Supporting Program Comprehension with Source Code Summarization

Sonia Haiduc, Jairo Aponte, Andrian Marcus
Presented By: Mohammad Masudur Rahman

Contents
• Why Code Summarization?
• Thesis Statement
• Research Questions about Summary
• Research Questions about Tool
• Automatic Code Summarization
• Evaluation
• Experiments Conducted
• Pyramid Method
• Important Findings
• My Observations & Future Work

Why Code Summarization?
• Program comprehension accounts for 50% of all maintenance work
• Two extreme approaches: skimming the code and reading it thoroughly
• Skimming leads to misunderstanding
• Reading thoroughly is time-consuming
• An intermediate solution: a source code entity with a comprehensive textual description

Thesis Statement
• New idea: code summarization to help with program comprehension (PC)
• Apply text retrieval (TR) methods such as Latent Semantic Indexing (LSI) to source code summarization
• Combine structural information with the retrieved code summary to make it effective for realistic purposes

Research Questions about Code Summarization
• The summary should be generated automatically
• Summaries should be generated at different granularity levels: class, method, package, etc.
• The summary should be shorter than the source code
• It should capture and preserve code semantics and structure, using the text as well as the structure of the code
• It should have a consistent structure, with the important items first

Research Questions about Code Summarization
• The summary should reflect the developer's understanding of the code
• The tool should let the user change a summary and should remember the user's choices in future summaries
• The tool should rebuild the summary when the code changes or developers provide feedback

Research Questions about Summarizer Tool
• Which summarization technique works best for source code?
• What type of structural information is necessary in a summary?
• Will the summary differ for different types of maintenance tasks?
• How long should it be?
• How closely will it resemble an actual summary?
• How do developers generate summaries?

Automatic Code Summarization
• Generate an extractive summary: the most important information extracted from the document

Automatic Code Summarization
• Two types of information are extracted: lexical and structural
• Lexical information: identifiers and comments are extracted
• Common English words and programming language (PL) keywords are removed
• Identifiers are split into their constituent words and stemming is performed

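The lexical steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the stop-word and keyword sets are tiny placeholders, `naive_stem` is a crude stand-in for a real stemmer (the slides do not say which one was used), and all function names are hypothetical.

```python
import re

# Placeholder lists (assumptions); a real tool would use much fuller ones.
ENGLISH_STOP_WORDS = {"the", "a", "an", "of", "to", "is", "and", "in"}
JAVA_KEYWORDS = {"public", "private", "static", "void", "int", "return", "new", "class"}


def split_identifier(text):
    """Split text on non-letters and camelCase boundaries, lower-casing the parts."""
    words = []
    for chunk in re.split(r"[^A-Za-z]+", text):
        # "XMLParser" -> "XML", "Parser"; "getSongTitle" -> "get", "Song", "Title"
        words.extend(re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+", chunk))
    return [w.lower() for w in words]


def naive_stem(word):
    """Very rough suffix stripping; a stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def extract_terms(identifiers_and_comments):
    """Turn identifiers and comment text into a list of stemmed terms."""
    terms = []
    for text in identifiers_and_comments:
        for word in split_identifier(text):
            if word in ENGLISH_STOP_WORDS or word in JAVA_KEYWORDS:
                continue  # drop common English words and PL keywords
            terms.append(naive_stem(word))
    return terms


print(extract_terms(["getSongTitle", "title of the playing song"]))
# → ['get', 'song', 'title', 'title', 'play', 'song']
```

Stop words are removed before stemming here; the slides do not fix the order, so that is a design choice of this sketch.
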
Automatic Code Summarization
• The extracted lexical information forms the text corpus of the code, to which TR methods (e.g. LSI) are applied to get the n most important words
• Once retrieved, the n words are combined with structural information such as class name, method name, package name, and parameter names and types
• How to apply structural information to the auto-generated summary is an important part of the problem

Automatic Code Summarization
• A method name reflects the description of what the method does
• If the method name is ignored by the TR method, the tool can introduce it automatically
• Additional information, such as user tags, can be added

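One possible reading of "introduce it automatically" is to prepend any method-name terms that the retrieval step left out. The sketch below assumes exactly that; the function name and the decision to put the method-name terms first are hypothetical, not something the slides specify.

```python
def add_method_name_terms(summary_terms, method_name_terms):
    """Prepend method-name terms that the retrieved summary is missing."""
    # dict.fromkeys removes duplicates while keeping the original order.
    missing = [t for t in dict.fromkeys(method_name_terms) if t not in summary_terms]
    return missing + list(summary_terms)


print(add_method_name_terms(["audio", "song", "tag"], ["get", "song", "title"]))
# → ['get', 'title', 'audio', 'song', 'tag']
```
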
Evaluation
• Two types: intrinsic and extrinsic
• Intrinsic: content evaluation, i.e. how closely the summary depicts the document or how close it is to a manually generated summary
• Metrics: precision, recall, pyramid method
• Extrinsic: how much utility and usability the summary has in supporting SE tasks such as concept location, impact analysis, software reuse, and traceability link recovery

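For term-based summaries, precision and recall against a manually generated summary can be computed as set overlaps. A minimal sketch, assuming both summaries are plain lists of terms; the function name and the example terms are made up for illustration.

```python
def precision_recall(automatic_terms, reference_terms):
    """Precision and recall of an automatic summary against a manual one."""
    automatic, reference = set(automatic_terms), set(reference_terms)
    overlap = automatic & reference
    # Precision: share of automatic terms that also appear in the reference.
    precision = len(overlap) / len(automatic) if automatic else 0.0
    # Recall: share of reference terms that the automatic summary recovered.
    recall = len(overlap) / len(reference) if reference else 0.0
    return precision, recall


print(precision_recall(["song", "title", "file", "audio", "tag"],
                       ["song", "title", "get", "artist"]))
# → (0.4, 0.5)
```
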
Experiments Conducted
• Pyramid method
• ATunes open-source (OS) project, 12 methods
• 6 developers from different locations: undergraduate students with 3 years of Java programming experience
• Developers were given a list of terms and had to choose the 5 terms that best suit each method, with 60 minutes in total

Experiments Conducted
• The corpus contains the whole code vocabulary
• Each method is a separate document
• LSI indexes the corpus against each method's terms
• The cosine measure between the corpus words and the method is computed, and the corpus words are ranked
• The top 5 words from the corpus are chosen

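The ranking step can be illustrated with a small LSI sketch. It assumes NumPy is available, uses raw term counts (the slides do not give the weighting scheme), and treats the number of latent dimensions `k` as a free parameter; the function name and the toy corpus are hypothetical.

```python
import numpy as np


def top_terms_lsi(documents, target, k=2, n=5):
    """Rank corpus terms by cosine similarity to one method in LSI space.

    documents: dict mapping a method name to its list of preprocessed terms.
    """
    names = list(documents)
    vocabulary = sorted({t for terms in documents.values() for t in terms})
    index = {t: i for i, t in enumerate(vocabulary)}

    # Term-by-document matrix of raw counts (an assumption of this sketch).
    counts = np.zeros((len(vocabulary), len(names)))
    for j, name in enumerate(names):
        for term in documents[name]:
            counts[index[term], j] += 1

    # Truncated SVD: keep only the k largest singular values.
    u, s, vt = np.linalg.svd(counts, full_matrices=False)
    k = min(k, len(s))
    term_vectors = u[:, :k] * s[:k]    # one row per term
    doc_vectors = vt[:k, :].T * s[:k]  # one row per method

    target_vector = doc_vectors[names.index(target)]
    norms = np.linalg.norm(term_vectors, axis=1) * np.linalg.norm(target_vector)
    # Terms with (near-)zero length in the reduced space get a cosine of 0.
    cosines = term_vectors @ target_vector / np.where(norms < 1e-12, np.inf, norms)

    ranked = sorted(zip(vocabulary, cosines), key=lambda pair: -pair[1])
    return [term for term, _ in ranked[:n]]


corpus = {
    "play": ["play", "song", "audio", "play"],
    "save": ["save", "file", "disk"],
    "load": ["load", "file", "disk"],
}
print(top_terms_lsi(corpus, "play", k=2, n=3))
```

In this toy corpus the three terms that occur only in `play` come out on top; their relative order is not meaningful because their cosines tie.
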
Pyramid method
• Pyramid score = (sum of A's score) / (total score A could make)

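A small worked version of the score, under two assumptions that the slide leaves open: "A" is taken to be the automatic summary, and a term's weight is the number of developers who chose it, as in the usual pyramid method. The function name and the example data are hypothetical.

```python
from collections import Counter


def pyramid_score(automatic_terms, developer_choices):
    """Weight achieved by the automatic summary over the best achievable weight."""
    # Weight of a term = number of developers who picked it.
    weights = Counter(t for choice in developer_choices for t in set(choice))
    chosen = set(automatic_terms)
    achieved = sum(weights[t] for t in chosen)  # unknown terms count as 0
    # Best possible total for a summary with the same number of terms.
    best = sum(sorted(weights.values(), reverse=True)[:len(chosen)])
    return achieved / best if best else 0.0


developers = [
    ["song", "title", "get"],
    ["song", "title", "artist"],
    ["song", "get", "file"],
]
# Weights: song=3, title=2, get=2, artist=1, file=1.
# The summary below scores 3 + 1 + 0 = 4 out of a possible 3 + 2 + 2 = 7.
print(pyramid_score(["song", "file", "audio"], developers))
```
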
Pyramid Score
[Figure: pyramid scores; image not preserved in the extracted text]

Important Findings
• Pyramid scores were between 0.1 and 0.5, which the authors marked as encouraging
• Words chosen by developers: 98.7% in the method name, 88.9% in the class name, and 84.6% in parameter names
• Automatic summary terms: 20% in the method name, 12.9% in the class name, and 30.7% in parameter names
• Structural information should be properly taken into account in automatic summaries
• Comment text is not included in the summary

My Observations & Future Work
• The corpus construction technique is not well specified; nothing is said about protection against redundancy
• LSI focuses on term frequency rather than structural information, which produces poor scores
• During cosine measurement, the structural information of a term in the method could be taken into account to get better results
• There should be some heuristic measure for structural information

Thank You
Questions?