Deep Learning Type Inference for Dynamic
Programming Languages
Amir M. Mir
PhD Student in Software Engineering Research Group
s.a.m.mir@tudelft.nl
SERG Lunch
April 22, 2020
Content
● Introduction
● Type annotations
● Existing Deep Learning-based approaches
● Major Research Problem
● Our current approach
Introduction
Dynamic programming languages such as Python and JavaScript are extremely
popular nowadays.
Introduction
Dynamic languages enable fast prototyping.
Issues of Dynamic Languages
● Type errors
● Suboptimal IDE support
● Unexpected runtime behavior
● Difficult-to-understand APIs
Type Annotations
● Type hints for Python 3 (PEP 484, Sep. 2014)
● TypeScript with optional static types (Oct. 2012)
Type Annotations
TypeScript example:
Type Annotations
Python example:
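A minimal illustration of PEP 484 type hints. The function and its names are hypothetical and are not taken from the original slide:

```python
from typing import List, Optional


def find_user(names: List[str], query: str) -> Optional[int]:
    """Return the index of the first name equal to `query`, or None."""
    for index, name in enumerate(names):
        if name == query:
            return index
    return None


# Annotations are optional and are not enforced at runtime; a static
# checker such as mypy reads them to report type errors before execution.
position: Optional[int] = find_user(["ada", "grace"], "grace")
```

The hints also remain available for tooling through `find_user.__annotations__`.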
Type Annotations
Issues
● Relies on developers
● Cumbersome and error-prone process
● Two main approaches for inferring types:
○ Static analysis tools
○ ML-based approaches
Static Type Checkers
● Mypy (mypy-lang.org/)
● Pyre (pyre-check.org/)
● Flow (flow.org/)
Existing Deep Learning-based Approaches
● DeepTyper (Hellendoorn et al., 2018)
● NL2Type (Malik et al., 2019)
● TypeWriter (Pradel et al., 2020)
● LAMBDANET (Wei et al., 2020)
DeepTyper
● Inspired by part-of-speech (POS) tagging in NLP research
● Models type inference as a sequence annotation task: each token receives a type label.
● Employs a bi-directional recurrent neural network (bi-RNN).
● Adds a consistency layer to the bi-RNN to account for multiple usages of the same
variable.
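The idea behind the consistency layer can be sketched as follows, assuming it averages the hidden states of all tokens that share a name and adds that mean back to each of them. The function name, the toy vectors, and the plain-list representation are assumptions made for illustration; this is not the DeepTyper implementation:

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def consistency_layer(tokens: Sequence[str],
                      states: Sequence[Sequence[float]]) -> List[List[float]]:
    """Add to each token state the mean state of all tokens with the same name."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for position, token in enumerate(tokens):
        groups[token].append(position)

    output: List[List[float]] = []
    for position, token in enumerate(tokens):
        members = groups[token]
        dim = len(states[position])
        mean = [sum(states[m][d] for m in members) / len(members)
                for d in range(dim)]
        output.append([s + m for s, m in zip(states[position], mean)])
    return output


# Both occurrences of "x" receive the same shared component.
tokens = ["x", "=", "x", "+", "1"]
states = [[1.0, 0.0], [0.0, 0.0], [3.0, 2.0], [0.0, 0.0], [0.0, 1.0]]
mixed = consistency_layer(tokens, states)
```

Because every occurrence of a variable sees the same averaged component, the per-token predictions for that variable are nudged toward agreeing with each other.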
NL2Type
● Considers natural language information embedded in code
○ Name of the function
○ Name of the formal parameters
○ Comment associated with the function
○ Comment associated with the parameters
○ Comment associated with return type of the function
● Learns two word embeddings: one for comments and one for identifier names
● Uses a recurrent neural network with long short-term memory (LSTM)
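The natural-language inputs listed above could be assembled roughly as follows before being embedded. This is a sketch only: `split_identifier`, `build_inputs`, and the field names are hypothetical, and the tokenization is a simple regular-expression heuristic rather than NL2Type's actual preprocessing:

```python
import re
from typing import Dict, List


def split_identifier(name: str) -> List[str]:
    """Split a camelCase or snake_case identifier into lowercase words."""
    words: List[str] = []
    for part in re.split(r"[_\W]+", name):
        words += re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", part)
    return [w.lower() for w in words]


def build_inputs(function_name: str, function_comment: str,
                 parameters: Dict[str, str],
                 return_comment: str) -> Dict[str, List[str]]:
    """Collect the natural-language fields of one function as word sequences."""
    return {
        "function_name": split_identifier(function_name),
        "function_comment": function_comment.lower().split(),
        "parameter_names": [w for p in parameters for w in split_identifier(p)],
        "parameter_comments": [w for c in parameters.values()
                               for w in c.lower().split()],
        "return_comment": return_comment.lower().split(),
    }


example = build_inputs(
    "getUserName",
    "Looks up the display name of a user",
    {"userId": "numeric id of the user"},
    "the display name",
)
```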
TypeWriter
● Considers four kinds of context information:
○ Identifier names
○ Code occurrences
○ Function-level comments
○ Available type hints
● Similar to NL2Type, it trains two word embeddings
● Has three RNN submodels:
○ Learning from identifiers
○ Learning from token sequences
○ Learning from comments
● Feedback-guided search for consistent types
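A feedback-guided search can be sketched as a greedy loop that tries the model's ranked predictions for each slot and keeps the first one that does not increase the number of type errors. The `count_errors` callback stands in for a real type checker, and `TRUTH` and `toy_checker` are toy assumptions; this is a simplified variant, not the TypeWriter implementation:

```python
from typing import Callable, Dict, List

Assignment = Dict[str, str]


def greedy_search(candidates: Dict[str, List[str]],
                  count_errors: Callable[[Assignment], int]) -> Assignment:
    """Greedily accept, slot by slot, the first ranked type that adds no errors."""
    assignment: Assignment = {}
    baseline = count_errors(assignment)
    for slot, ranked_types in candidates.items():
        for candidate in ranked_types:
            trial = {**assignment, slot: candidate}
            if count_errors(trial) <= baseline:
                assignment = trial
                break  # keep the highest-ranked consistent type
    return assignment


# Toy stand-in for a type checker: one error per slot whose type
# differs from a hidden ground truth.
TRUTH = {"f.x": "int", "f.return": "str"}


def toy_checker(assignment: Assignment) -> int:
    return sum(1 for slot, t in assignment.items() if TRUTH[slot] != t)


predictions = {"f.x": ["str", "int"], "f.return": ["str", "bytes"]}
result = greedy_search(predictions, toy_checker)
```

Here the top-ranked prediction for `f.x` is rejected by the checker, so the search falls back to the second candidate.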
TypeWriter
[Figure: TypeWriter inputs, labelled: Available type hints, Identifiers, Comments, Code occurrences]
LAMBDANET
● Imposes hard constraints on types
● Contextual hints
● Type dependency graph, i.e. a set of predicates
● Uses a graph neural network (GNN) and proposes a pointer-network-like
architecture for handling user-defined types
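The pointer-like prediction step can be sketched as scoring a variable's embedding against every candidate type, where the candidates include types declared in the current project. The embeddings below are hand-written toy values and `pointer_scores` is a hypothetical helper; in LambdaNet the embeddings would come from the GNN:

```python
import math
from typing import Dict, List


def pointer_scores(variable: List[float],
                   candidates: Dict[str, List[float]]) -> Dict[str, float]:
    """Softmax over dot products between a variable embedding and each candidate."""
    logits = {name: sum(v * c for v, c in zip(variable, vec))
              for name, vec in candidates.items()}
    peak = max(logits.values())  # subtract the max for numerical stability
    exps = {name: math.exp(value - peak) for name, value in logits.items()}
    total = sum(exps.values())
    return {name: value / total for name, value in exps.items()}


# Candidate set = library types plus types declared in the current project,
# so a class never seen during training can still be selected.
candidates = {
    "number": [1.0, 0.0],
    "string": [0.0, 1.0],
    "MyWidget": [2.0, 2.0],  # hypothetical user-defined type
}
scores = pointer_scores([1.0, 1.0], candidates)
best = max(scores, key=scores.get)
```

Because the output distribution ranges over whatever candidates are supplied, it is not tied to a fixed output vocabulary.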
LAMBDANET
[Figure: Type dependency graph]
LAMBDANET
[Figure: Hyperedges in the type dependency graph]
Major Research Problem
Closed type vocabulary, i.e. limited to 1000 types.
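The effect of a closed vocabulary can be illustrated with a small sketch: only the most frequent training types get their own label, and every other type collapses into a single unknown label. The `<UNK>` token, the helper names, and the toy data are assumptions for illustration:

```python
from collections import Counter
from typing import Dict, Iterable, List

UNKNOWN = "<UNK>"  # placeholder label, chosen for this sketch


def build_type_vocab(train_types: Iterable[str], max_size: int) -> Dict[str, int]:
    """Map the `max_size` most frequent training types to ids; id 0 is UNKNOWN."""
    vocab = {UNKNOWN: 0}
    for type_name, _ in Counter(train_types).most_common(max_size):
        vocab[type_name] = len(vocab)
    return vocab


def encode(types: Iterable[str], vocab: Dict[str, int]) -> List[int]:
    """Replace every type outside the vocabulary with the UNKNOWN id."""
    return [vocab.get(t, vocab[UNKNOWN]) for t in types]


train = ["int", "str", "int", "bool", "str", "int", "MyConfig"]
vocab = build_type_vocab(train, max_size=2)
encoded = encode(["int", "MyConfig", "ClientSession"], vocab)
```

Rare and user-defined types such as `MyConfig` can never be predicted by a classifier trained on these labels, however informative the context is.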
Out-of-Vocabulary Problem
[Figure: a DNN model, with labels "Parameter type", "Return type", and "Unknown"]
Our Current Approach
New dataset
Re-implementation of TypeWriter with new dataset
[Figure: results, annotated "~27% higher" and "~7% higher"]
Our Current Approach
Improved available type extractor
● Lightweight static analysis with importlab and LibCST
[Figure, labelled: "Python dataset", "Visible type hints extractor", the extracted type list [AbstractResolver, ClientConnectionError, ClientHttpProxyError, …], and "Type mask vector" (1, 1, 0, 0)]
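A type mask of this kind can be sketched as follows. Note that this sketch uses the standard-library `ast` module in place of importlab and LibCST, and that the helper names, the sample source, and the fourth vocabulary entry (`TCPConnector`) are assumptions for illustration:

```python
import ast
from typing import List, Set


def visible_types(source: str) -> Set[str]:
    """Collect names imported via `from ... import ...` and classes defined in a module."""
    names: Set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ImportFrom):
            names.update(alias.asname or alias.name for alias in node.names)
        elif isinstance(node, ast.ClassDef):
            names.add(node.name)
    return names


def type_mask(vocabulary: List[str], visible: Set[str]) -> List[int]:
    """1 where a vocabulary type is visible in the module, 0 otherwise."""
    return [1 if type_name in visible else 0 for type_name in vocabulary]


# The source is only parsed, never executed, so aiohttp need not be installed.
source = (
    "from aiohttp import AbstractResolver, ClientConnectionError\n"
    "class Session:\n"
    "    pass\n"
)
vocabulary = ["AbstractResolver", "ClientConnectionError",
              "ClientHttpProxyError", "TCPConnector"]
mask = type_mask(vocabulary, visible_types(source))
```

The mask marks which vocabulary types are actually visible in the module, which a model can use as an additional input feature.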
Our Current Approach
Future
● Refinements to the search part and/or the neural model
● Performing extensive experiments to show the effectiveness of the
approach
● Writing a paper draft by the end of June.
Thank You!
References
1. Hellendoorn, V. J., Bird, C., Barr, E. T., & Allamanis, M. (2018, October). Deep learning type inference. In Proceedings of the 2018 26th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering (pp. 152-162).
2. Malik, R. S., Patra, J., & Pradel, M. (2019, May). NL2Type: Inferring JavaScript function types from natural language information. In 2019 IEEE/ACM 41st International Conference on Software Engineering (ICSE) (pp. 304-315). IEEE.
3. Pradel, M., Gousios, G., Liu, J., & Chandra, S. (2019). TypeWriter: Neural type prediction with search-based validation. arXiv preprint arXiv:1912.03768.
4. Wei, J., Goyal, M., Durrett, G., & Dillig, I. (2020). LambdaNet: Probabilistic type inference using graph neural networks. ICLR 2020.
5. Gage, P. (1994). A new algorithm for data compression. C Users Journal, 12(2), 23-38.
6. Sennrich, R., Haddow, B., & Birch, A. (2015). Neural machine translation of rare words with subword units. arXiv preprint arXiv:1508.07909.