A Conceptual Dependency Graph Based
Keyword Extraction Model for Source Code to
API Documentation Mapping
Prepared By
Nakul Sharma
Under Guidance of
Dr. Prasanth Yalla
Professor, Department of Computer Science and
Engineering.
Koneru Laxmiah Education Foundation.
Vijayawada, Andhra Pradesh
India
Table of Contents
• Introduction
• Background
• Mathematical Foundations
• Genesis of Research
• Proposed Methodology
• Results and Discussion
• Future Scope and Conclusion
• References
Introduction
Traditional key feature extraction techniques
• use terms or sentences from the project source code to form a unique code structure.
Almost all traditional document key phrase extraction techniques
• represent a document collection as a phrase or sentence matrix, in which each row denotes a phrase or sentence ID and the corresponding column represents the frequency.
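The matrix representation described above can be sketched as follows. This is only an illustration under stated assumptions: single whitespace-separated terms stand in for phrases, each column is one document, and the function name `phrase_frequency_matrix` is hypothetical rather than taken from the presentation.

```python
from collections import Counter


def phrase_frequency_matrix(documents):
    """Build a term-by-document frequency matrix.

    Assumption: a "phrase" is a single lowercase whitespace-separated term.
    Rows follow the sorted vocabulary; columns follow the input document order.
    """
    counts = [Counter(doc.lower().split()) for doc in documents]
    vocabulary = sorted(set().union(*counts))
    # A Counter returns 0 for terms that do not occur in a document.
    matrix = [[c[term] for c in counts] for term in vocabulary]
    return vocabulary, matrix


docs = ["parse source code", "source code graph"]
vocabulary, matrix = phrase_frequency_matrix(docs)
for term, row in zip(vocabulary, matrix):
    print(term, row)
```

A representation like this records only how often a term occurs, not where or alongside what, which is the limitation the following slide points to.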
Introduction (Continued)
The main problem with existing systems is that they ignore context-based textual information.
Contextual information holds more relevance, especially when undertaking a software change that affects not just the current phase of the project but also the previous and next phases.
Source code analysis also aids in checking the effect of a change on the code.
In the proposed model, a weighted dependency graph model is used to filter the candidate sets among the vertices for contextual similarity computation.
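One way to picture candidate filtering on a weighted dependency graph is sketched below. The slides do not specify how the weights are computed or how filtering is done, so the undirected edge dictionary, the `min_weight` threshold, and the function name `candidate_sets` are all assumptions made for illustration.

```python
def candidate_sets(edges, min_weight):
    """Return, for each vertex, the neighbours joined by a sufficiently heavy edge.

    Assumptions: `edges` maps an undirected pair (u, v) to a numeric weight,
    and a neighbour is a candidate when the edge weight is >= min_weight.
    """
    candidates = {}
    for (u, v), weight in edges.items():
        # Register both endpoints so isolated vertices get an empty set.
        candidates.setdefault(u, set())
        candidates.setdefault(v, set())
        if weight >= min_weight:
            candidates[u].add(v)
            candidates[v].add(u)
    return candidates


# Hypothetical class-level dependency weights.
edges = {("Parser", "Lexer"): 0.8, ("Parser", "Logger"): 0.1}
filtered = candidate_sets(edges, min_weight=0.5)
print(filtered)
```

Only the vertices that survive the filter would then be passed on to the similarity computation, which keeps the number of pairwise comparisons small.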
Background
• Source Code Analysis
• Text Mining
• Document Representation
• Clustering
• NLP/CL
Mathematical Framework
• Centrality Measures
• Document Clustering
• Document Metrics
• Source Code Metrics
Genesis of Research
Work Done in Text Mining and its related fields
Research conducted by various authors
Related Work
Sr. No. | Name of Authors | Work Done in Brief
1 | S. Mohammadi et al. | Presented a new approach to extract the knowledge of dependency between artifacts in the source code.
2 | V. U. Gómez et al. | Proposed a semantic model for visually characterizing source code modifications.
3 | S. L. Abebe et al. | Introduced a new extraction scheme that is sufficiently effective to extract domain concepts from the source code.
4 | S. Bajracharya et al. | Developed a new SCA framework to collect and analyze open source code on a large scale.
5 | A. S. Yumaganov | Proposed to compare different search models for similarity with limitations on the source code.
Related Work
Sr. No. | Name of Authors | Work Done in Brief
1 | A. Dimitriou et al. | Introduced a new top-k-size keyword search on tree-structured data.
2 | W. Ding | Proposed a review of knowledge-based techniques for the software documentation process.
3 | Hussain et al. | Proposed a new software design pattern classification and selection scheme.
4 | Ibrahim et al. | Presented a scientometric re-ranking technique.
5 | L. H. Lee et al. | Used Bayesian text classification to introduce a high-relevance keyword extraction process.
Related Work (Related to Software Metrics)
Sr. No. | Name of Authors | Work Done in Brief
1 | A. Dimitriou et al. | Introduced a new top-k-size keyword search on tree-structured data.
2 | W. Ding | Proposed a review of knowledge-based techniques for the software documentation process.
3 | Hussain et al. | Proposed a new software design pattern classification and selection scheme.
4 | Ibrahim et al. | Presented a scientometric re-ranking technique.
5 | L. H. Lee et al. | Used Bayesian text classification to introduce a high-relevance keyword extraction process.
Observations on Related Work
Large open source projects are not considered in the SCA systems and tools developed so far.
Existing systems also do not take contextual keyphrases into consideration when providing traceability links.
The current work proposes an alternative: a contextual dependency graph based software metric in the form of contextual similarity.
Proposed Methodology
Figure 1: Module-1 (block diagram). Blocks: Project source codes; Class parsing; Code dependency graph; Project API documentation; Text pre-processing; Filtered API documents; Proposed contextual dependency graph similarity.
Pre-processing of API Documents
Proposed Methodology
Phase 1: Source Code and API documents Pre-processing
Step 1: Read project source codes S.
Step 2: Read project API documents D.
Step 3: for each code Ci in S[]
Do
Parse source code Ci with methods M and Fields F.
Mi=ExtractMethods(Ci)
Fi=ExtractFields(Ci)
Mapping (Mi , Fi) to Ci
C1 (M1,F1)
C2 (M2,F2)
… …..
Cn (Mn,Fn)
done
Step 4: // Remove the duplicate methods and fields in each class
For each code Ci
Do
$M_i = \mathrm{Prob}(M_i \cap M_j / C);\ i \neq j$
$F_i = \mathrm{Prob}(F_i \cap F_j / C);\ i \neq j$
If( Mi!=0 AND Fi!=0)
Then
Remove Mi in Ci or Cj
Remove Fi in Ci or Cj
End if
Done
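The duplicate-removal step of Phase 1 can be sketched in Python as shown below. This is a minimal sketch, not the authors' implementation: it assumes the classes have already been parsed into sets of method and field names (Step 3), it replaces the probability term with a plain set intersection, and it resolves "remove in Ci or Cj" by always removing from the later class Cj. The name `remove_duplicates` and the input layout are hypothetical.

```python
from itertools import combinations


def remove_duplicates(classes):
    """Drop members shared between two classes, keeping them in the first class.

    Assumption: `classes` maps a class name to
    {"methods": set_of_names, "fields": set_of_names}.
    The input mapping is left unchanged; a cleaned copy is returned.
    """
    cleaned = {
        name: {"methods": set(m["methods"]), "fields": set(m["fields"])}
        for name, m in classes.items()
    }
    # Visit every pair (Ci, Cj) with i != j exactly once, in insertion order.
    for ci, cj in combinations(cleaned, 2):
        shared_methods = cleaned[ci]["methods"] & cleaned[cj]["methods"]
        shared_fields = cleaned[ci]["fields"] & cleaned[cj]["fields"]
        # Mirrors the slide's condition: act only if both overlaps are non-empty.
        if shared_methods and shared_fields:
            cleaned[cj]["methods"] -= shared_methods
            cleaned[cj]["fields"] -= shared_fields
    return cleaned


# Hypothetical parsed classes.
parsed = {
    "A": {"methods": {"size", "get"}, "fields": {"count"}},
    "B": {"methods": {"size", "put"}, "fields": {"count", "cap"}},
    "C": {"methods": {"size"}, "fields": {"other"}},
}
result = remove_duplicates(parsed)
print(result)
```

In this example class C keeps `size` because it shares a method with A but no field, so the combined condition from the slide is not met.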
Results and Discussion
Project | LDA | ONTOSE | Proposed Method
Apache Pluto | 0.846 | 0.835 | 0.9436
Apache Commons Collections | 0.736 | 0.753 | 0.879
JEuclid | 0.794 | 0.825 | 0.962
JFreeChart | 0.773 | 0.874 | 0.921
Kyro | 0.874 | 0.915 | 0.948
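To compare the three methods across projects, the scores in the table can be averaged per method. The sketch below uses only the values listed above; the slide does not name the metric being reported, and an unweighted mean over the five projects is an assumption chosen for illustration.

```python
from statistics import mean

# Scores copied from the results table (metric not named on the slide).
scores = {
    "Apache Pluto": {"LDA": 0.846, "ONTOSE": 0.835, "Proposed": 0.9436},
    "Apache Commons Collections": {"LDA": 0.736, "ONTOSE": 0.753, "Proposed": 0.879},
    "JEuclid": {"LDA": 0.794, "ONTOSE": 0.825, "Proposed": 0.962},
    "JFreeChart": {"LDA": 0.773, "ONTOSE": 0.874, "Proposed": 0.921},
    "Kyro": {"LDA": 0.874, "ONTOSE": 0.915, "Proposed": 0.948},
}


def mean_by_method(table):
    """Return the unweighted mean score of each method over all projects."""
    methods = list(next(iter(table.values())))
    return {m: mean(row[m] for row in table.values()) for m in methods}


averages = mean_by_method(scores)
for method, value in averages.items():
    print(f"{method}: {value:.4f}")
```

The proposed method has the highest score on every listed project, so its mean is also the highest of the three.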
Future Scope and Conclusion
The current paper proposes a novel approach to find the relationship between source code and API documents using a contextual dependency graph. A two-pronged approach is used in the proposed method. The project source code is scanned for the relevant metrics. On the other hand, the necessary information is extracted from the API documentation. The dependency graph is then used to compute the contextual similarity between the source code metrics and the API documents.
References
Amir Hossein Rasekh, Amir Hossein Arshia, “Mining and discovery of hidden relationships between
software source codes and related textual Documents”, Digital Scholarship in the Humanities ,
Published by Oxford University Press on behalf of EADH., doi:10.1093/llc/fqx052,
Chun Yong Chong , Sai Peck Lee , Automatic Clustering Constraints Derivation from Object-Oriented
Software Using Weighted Complex Network with Graph Theory Analysis, The Journal of Systems &
Software (2017), doi: 10.1016/j.jss.2017.08.017
Anh Tuan Nguyen, Tien N. Nguyen, Graph-based Statistical Language Model for Code, 2015
IEEE/ACM 37th IEEE International Conference on Software Engineering (ICSE), 2015, Florence,
Italy, Page 858-862.
Lars Ackermann, Bernhard Volz, “model[NL]generation: Natural Language Model Extraction”,
DCM’13: Proceedings of the 2013 workshop on Domain Specific Modeling: ACM New York,USA.
F. Meziane, N. Athanasakis, S. Ananiadou, "Generating Natural Language Specifications from UML
Class diagrams", Requirement Engineering Journal, 13(1):1-18, Springer-Verlag, London.
Fabian Friedrich, Jan Mendling, Frank Puhlmann, “Process Model Generation from Natural
Language Text”, In Advanced Information Systems Engineering, Eds. Lecture Notes in Computer
Science. Springer Berlin Heidelberg, Berlin, Heidelberg, 482-496.