Annotate Types in Large Codebase with Automated Refactoring
Jimmy Lai, Software Engineer at Carta
Feb. 9, 2022
Tech Stack
…
A Large Python Codebase
Python code
1.8 million lines
27,000 files
120,000 functions
~200 active developers
Lots of TypeError, AttributeError, ValueError
Type Annotation and Mypy
Mypy: Argument 1 to "add" has incompatible type "str"; expected "int"
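A minimal sketch of code that would produce the Mypy message above. The function name `add` comes from the error text; its body and the call sites are assumptions made for illustration.

```python
def add(a: int, b: int) -> int:
    """Return the sum of two integers."""
    return a + b


# Accepted by mypy and correct at runtime.
print(add(1, 2))

try:
    # mypy reports for this call:
    #   Argument 1 to "add" has incompatible type "str"; expected "int"
    add("1", 2)
except TypeError as exc:
    # Without the type checker, the mistake only surfaces when the line runs.
    print(f"runtime TypeError: {exc}")
```

The annotation lets the checker catch before deployment the same mistake that would otherwise show up as a production TypeError.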
Automated Refactoring
Automated code changes for fixing large-scale tech debt (Code Formatting, Type Annotation, Dead Code Cleanup)
Recommended tool: LibCST
A library for modifying Python code easily.
LibCST Features:
● Concrete Syntax Tree
● Transformer and Matcher API
● Metadata with static analysis
Code Review with Pull Requests
[Diagram: four boxes, each labeled "Pull Request"]
Add missing types based on static analysis
MonkeyType: add missing types based on runtime data
1. Collect types by running the Python program.
2. Aggregate collected types and apply to the code using LibCST.
Run test cases and apply types:
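As a rough illustration of step 1 only, the sketch below records argument and return types at runtime with the standard library's `sys.setprofile`. It is not MonkeyType's implementation, and it skips step 2 (rewriting the code). The names `collect_call_types`, `add`, and `run_tests` are hypothetical.

```python
import sys
from collections import defaultdict
from types import FrameType
from typing import Any, Callable, DefaultDict, Dict, Set


def collect_call_types(entry: Callable[[], Any]) -> Dict[str, Dict[str, Set[str]]]:
    """Run `entry` and record the type names seen per function and parameter."""
    observed: DefaultDict[str, DefaultDict[str, Set[str]]] = defaultdict(
        lambda: defaultdict(set)
    )

    def profiler(frame: FrameType, event: str, arg: Any) -> None:
        code = frame.f_code
        if event == "call":
            # Positional parameters come first in co_varnames.
            for name in code.co_varnames[: code.co_argcount]:
                if name in frame.f_locals:
                    observed[code.co_name][name].add(
                        type(frame.f_locals[name]).__name__
                    )
        elif event == "return":
            # Caveat: a frame unwound by an exception also reports arg=None.
            observed[code.co_name]["return"].add(type(arg).__name__)

    sys.setprofile(profiler)
    try:
        entry()
    finally:
        sys.setprofile(None)
    return {fn: dict(params) for fn, params in observed.items()}


def add(a, b):
    return a + b


def run_tests():
    # Stand-in for a test suite that exercises the code under study.
    add(1, 2)
    add("a", "b")


if __name__ == "__main__":
    for fn, params in collect_call_types(run_tests).items():
        print(fn, {name: sorted(types) for name, types in params.items()})
```

Aggregating the observations for `add` gives `int` and `str` for both parameters and the return value, which a later step could turn into a union annotation. MonkeyType itself is driven from the command line, for example `monkeytype run -m pytest` followed by `monkeytype apply <module>`.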
Make it more fun!
Automated weekly updates and leaderboards!
Fully Typed Function Coverage
[Chart: x-axis from 2018 to 2021; annotation "automated refactoring"]
Production Type Error Improvement
Carta
We are hiring! https://tinyurl.com/carta-jobs
Carta Engineering Blog https://medium.com/building-carta
Contact: jimmy.lai@carta.com