CHANGES and BUGS
Mining and Predicting Software Development Activities




                  Thomas Zimmermann
Software development: build, collaboration
Archives: communication archive, version archive, bug database
  Mining Software Archives
MY THESIS
Word cloud: additions, analysis, architecture, archives, aspects, bug, cached, calls, changes, collaboration, complexities, component, concerns, cross-cutting, cvs, data, defects, design, development, drawing, dynamine, eclipse, effort, evolves, failures, fine-grained, fix, fix-inducing, graphs, hatari, history, locate, matching, method, mining, predicting, program, programmers, report, repositories, revision, software, support, system, taking, transactions, version, visualizing
Contributions of the thesis

1. Fine-grained analysis of version archives.
Project-specific usage patterns of methods (FSE 2005)
Identification of cross-cutting changes (ASE 2006)



2. Mining bug databases to predict defects.
Dependencies predict defects (ISSRE 2007, ICSE 2008)
Domino effect: depending on defect-prone binaries increases
the chances of having defects (Software Evolution 2008).
Fine-grained analysis

public void createPartControl(Composite parent) {
    // ...
    // add listener for editor page activation
    getSite().getPage().addPartListener(partListener);
}

public void dispose() {
    // ...
    getSite().getPage().removePartListener(partListener);
}
Fine-grained analysis
The call to removePartListener() in dispose() was co-added with the call to addPartListener() in createPartControl(), i.e., both calls were added together in the same transaction. Other calls shown: close, open, println, begin.

Co-added items = patterns
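Treating co-added items as patterns can be sketched as counting, over all transactions, how often two method calls are added together. A minimal sketch, assuming each transaction has already been reduced to the set of method calls it adds; the class CoAddedPairs, the "a+b" pair key, and the minSupport threshold are hypothetical illustrations, not the mining procedure used in the thesis.

```java
import java.util.*;

/** Hypothetical sketch: counts pairs of method calls that were added in the same transaction. */
final class CoAddedPairs {
    private CoAddedPairs() {}

    /**
     * @param transactions one set per transaction, holding the method calls that transaction added
     * @param minSupport   minimum number of transactions in which a pair must be co-added
     * @return pairs keyed as "a+b" (a before b alphabetically), mapped to their support
     */
    static SortedMap<String, Integer> mine(List<Set<String>> transactions, int minSupport) {
        Map<String, Integer> support = new HashMap<>();
        for (Set<String> added : transactions) {
            // Sort the calls so that each unordered pair gets one stable key.
            List<String> calls = new ArrayList<>(new TreeSet<>(added));
            for (int i = 0; i < calls.size(); i++) {
                for (int j = i + 1; j < calls.size(); j++) {
                    support.merge(calls.get(i) + "+" + calls.get(j), 1, Integer::sum);
                }
            }
        }
        // Keep only pairs that were co-added often enough to count as a pattern.
        SortedMap<String, Integer> patterns = new TreeMap<>();
        support.forEach((pair, count) -> {
            if (count >= minSupport) {
                patterns.put(pair, count);
            }
        });
        return patterns;
    }
}
```

With minSupport set to 2, addPartListener+removePartListener is reported only if the two calls were co-added in at least two transactions; a practical miner would likely also need to filter very common calls such as println.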
Fine-grained analysis
                  public static final native void _XFree(int address);
                  public static final void XFree(int /*long*/ address) {
                        lock.lock();
                        try {
                              _XFree(address);
                        } finally {
                              lock.unlock();
                        }
                  }

Changed in 1284 locations

Cross-cutting changes = aspect candidates
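A cross-cutting change adds the same call to many locations in one transaction, as in the lock()/unlock() wrapper around _XFree() shown above. A minimal sketch of flagging such aspect candidates, assuming a transaction is given as a list of (callee, location) pairs; AddedCall, CrossCuttingChanges, and minLocations are hypothetical names, and a fixed location threshold is a simplification rather than the criterion from the ASE 2006 work.

```java
import java.util.*;

/** Hypothetical record: a call to callee that a transaction added inside the method named location. */
record AddedCall(String callee, String location) {}

/** Hypothetical sketch: flags calls that one transaction added to many distinct locations. */
final class CrossCuttingChanges {
    private CrossCuttingChanges() {}

    /**
     * @param transaction  all calls added by a single transaction
     * @param minLocations number of distinct locations that makes a change cross-cutting (assumed threshold)
     * @return callee name mapped to the number of distinct locations it was added to
     */
    static SortedMap<String, Integer> candidates(List<AddedCall> transaction, int minLocations) {
        // Group the locations by the callee that was added there.
        Map<String, Set<String>> locationsPerCallee = new HashMap<>();
        for (AddedCall call : transaction) {
            locationsPerCallee.computeIfAbsent(call.callee(), k -> new HashSet<>()).add(call.location());
        }
        SortedMap<String, Integer> result = new TreeMap<>();
        locationsPerCallee.forEach((callee, locations) -> {
            if (locations.size() >= minLocations) {
                result.put(callee, locations.size());
            }
        });
        return result;
    }
}
```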
Bugs! Bugs! Bugs!
Quality assurance is limited...
...by time... ...and by money.
Spend resources on the components that need it most, i.e., those that are most likely to fail.
Indicators of defects
• Code complexity: complex code is more prone to defects.
• Code churn: changes are likely to introduce new defects.
• History: code with past defects is more likely to have future defects.
• Dependencies: using compiler packages is more difficult than using packages for UI.
2252 binaries, 28.3 MLOC
Hypotheses
Subsystem level: complexity of dependency graphs
   - correlates with the number of post-release defects (H1)
   - can predict the number of post-release defects (H2)
Binary level: network measures on dependency graphs
   - correlate with the number of post-release defects (H3)
   - can predict the number of post-release defects (H4)
   - can indicate critical “escrow” binaries (H5)
DATA
Data collection
Release point for Windows Server 2003, followed by six months to collect defects.
Collected data: dependencies, network measures, complexity metrics, defects.
Centrality
• Degree: the blue binary has dependencies to many other binaries.
• Closeness: the blue binary is close to all other binaries (only two steps).
• Betweenness: the blue binary connects the left with the right graph (bridge).
Centrality
• Degree centrality
   - counts the number of dependencies
• Closeness centrality
   - takes the distance to all other binaries into account
   - Closeness: How close are the other binaries?
   - Reach: How many binaries can be reached (weighted)?
   - Eigenvector: similar to PageRank
• Betweenness centrality
   - counts the number of shortest paths through a binary
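Betweenness centrality is commonly computed with Brandes' algorithm, which runs one breadth-first search per binary and then accumulates, for each node, the fraction of shortest paths between other pairs that pass through it. A minimal sketch for an unweighted, undirected graph, assuming every binary appears as a key of a symmetric adjacency map; the class Betweenness is hypothetical, not the tooling used in the study.

```java
import java.util.*;

/** Hypothetical sketch: Brandes' betweenness centrality for an unweighted, undirected graph. */
final class Betweenness {
    private Betweenness() {}

    /**
     * @param g symmetric adjacency map; every node must appear as a key
     * @return for each node, the summed fraction of shortest paths between other pairs passing through it
     */
    static Map<String, Double> compute(Map<String, Set<String>> g) {
        Map<String, Double> result = new HashMap<>();
        for (String v : g.keySet()) {
            result.put(v, 0.0);
        }
        for (String s : g.keySet()) {
            // Phase 1: BFS from s, counting shortest paths (sigma) and recording predecessors.
            Deque<String> stack = new ArrayDeque<>();
            Map<String, List<String>> pred = new HashMap<>();
            Map<String, Long> sigma = new HashMap<>();
            Map<String, Integer> dist = new HashMap<>();
            for (String v : g.keySet()) {
                pred.put(v, new ArrayList<>());
                sigma.put(v, 0L);
                dist.put(v, -1);
            }
            sigma.put(s, 1L);
            dist.put(s, 0);
            Deque<String> queue = new ArrayDeque<>();
            queue.add(s);
            while (!queue.isEmpty()) {
                String v = queue.poll();
                stack.push(v);
                for (String w : g.get(v)) {
                    if (dist.get(w) < 0) {
                        dist.put(w, dist.get(v) + 1);
                        queue.add(w);
                    }
                    if (dist.get(w) == dist.get(v) + 1) {
                        sigma.put(w, sigma.get(w) + sigma.get(v));
                        pred.get(w).add(v);
                    }
                }
            }
            // Phase 2: pop nodes in order of non-increasing distance and accumulate dependencies.
            Map<String, Double> delta = new HashMap<>();
            for (String v : g.keySet()) {
                delta.put(v, 0.0);
            }
            while (!stack.isEmpty()) {
                String w = stack.pop();
                for (String v : pred.get(w)) {
                    double share = sigma.get(v).doubleValue() / sigma.get(w).doubleValue();
                    delta.put(v, delta.get(v) + share * (1.0 + delta.get(w)));
                }
                if (!w.equals(s)) {
                    result.put(w, result.get(w) + delta.get(w));
                }
            }
        }
        // Each unordered pair was counted once from each endpoint, so halve the totals.
        result.replaceAll((node, value) -> value / 2.0);
        return result;
    }
}
```

On a star with center X and three leaves, X lies on the only shortest path of each of the three leaf pairs, so its betweenness is 3; a bridge binary between two subgraphs scores high for the same reason.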
Complexity metrics
Group                     Metrics                                   Aggregation
Module metrics            # functions in B
  for a binary B          # global variables in B
Per-function metrics      # executable lines in f()                 Total, Max
  for a function f()      # parameters in f()
                          # functions calling f()
                          # functions called by f()
                          McCabe’s cyclomatic complexity of f()
OO metrics                # methods in C                            Total, Max
  for a class C           # subclasses of C
                          Depth of C in the inheritance tree
                          Coupling between classes
                          Cyclic coupling between classes
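Per-function metrics become binary-level values through the Total and Max aggregations in the table. A minimal sketch, assuming the per-function values are already available; FunctionMetrics and BinaryMetrics are hypothetical names. The cyclomatic helper uses McCabe's formula M = E - N + 2P for a control-flow graph with E edges, N nodes, and P connected components.

```java
import java.util.*;
import java.util.function.ToIntFunction;

/** Hypothetical per-function measurements for one function f(). */
record FunctionMetrics(String name, int executableLines, int cyclomaticComplexity) {}

/** Hypothetical sketch: lifts per-function metrics to the binary level. */
final class BinaryMetrics {
    private BinaryMetrics() {}

    /** McCabe's cyclomatic complexity M = E - N + 2P of a control-flow graph. */
    static int cyclomatic(int edges, int nodes, int components) {
        return edges - nodes + 2 * components;
    }

    /** Returns {Total, Max} of one non-negative per-function metric over all functions of a binary. */
    static int[] totalAndMax(List<FunctionMetrics> functions, ToIntFunction<FunctionMetrics> metric) {
        int total = 0;
        int max = 0;
        for (FunctionMetrics f : functions) {
            int value = metric.applyAsInt(f);
            total += value;
            max = Math.max(max, value);
        }
        return new int[] {total, max};
    }
}
```

Keeping both aggregations lets a model distinguish a binary with many moderately complex functions (high Total) from one with a single very complex function (high Max).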
RESULTS
Prediction
Input metrics and measures → Model → Prediction
   Input: Metrics, SNA (network measures), or Metrics+SNA
   Model: PCA and regression
   Prediction: classification or ranking
Classification
Does a binary have a defect or not?
Ranking
Which binaries have the most defects?
Random splits




4×50×
Classification (logistic regression)
Compared with complexity metrics alone, SNA increases the recall by 0.10 (significant at p=0.01), while precision remains comparable.
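Precision and recall summarize the classification results: precision is the share of binaries predicted as defect-prone that actually had defects, and recall is the share of defective binaries that the model found. A minimal sketch over boolean prediction arrays; ClassificationMetrics is a hypothetical name, and returning 0 for an empty denominator is a convention chosen here.

```java
/** Hypothetical sketch: precision and recall for "has a defect or not" predictions. */
final class ClassificationMetrics {
    private ClassificationMetrics() {}

    /** Precision = TP / (TP + FP); 0 when nothing was predicted defect-prone. */
    static double precision(boolean[] predicted, boolean[] actual) {
        int[] c = counts(predicted, actual);
        return c[0] + c[1] == 0 ? 0.0 : (double) c[0] / (c[0] + c[1]);
    }

    /** Recall = TP / (TP + FN); 0 when no binary actually had a defect. */
    static double recall(boolean[] predicted, boolean[] actual) {
        int[] c = counts(predicted, actual);
        return c[0] + c[2] == 0 ? 0.0 : (double) c[0] / (c[0] + c[2]);
    }

    /** Returns {true positives, false positives, false negatives}. */
    private static int[] counts(boolean[] predicted, boolean[] actual) {
        if (predicted.length != actual.length) {
            throw new IllegalArgumentException("arrays must have the same length");
        }
        int tp = 0, fp = 0, fn = 0;
        for (int i = 0; i < predicted.length; i++) {
            if (predicted[i] && actual[i]) {
                tp++;
            } else if (predicted[i]) {
                fp++;
            } else if (actual[i]) {
                fn++;
            }
        }
        return new int[] {tp, fp, fn};
    }
}
```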
Ranking (linear regression)
SNA+METRICS increases the correlation between predicted and actual numbers of defects by 0.10 (significant at p=0.01).
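The slide does not say which correlation coefficient is used. A minimal sketch assuming Spearman's rank correlation, which compares the predicted ranking of binaries with the actual one and is computed here as the Pearson correlation of the ranks, with tied values receiving average ranks. Spearman is a hypothetical class name; the result is undefined (NaN) when either input is constant.

```java
import java.util.*;

/** Hypothetical sketch: Spearman's rank correlation between predicted and actual defect counts. */
final class Spearman {
    private Spearman() {}

    /** Assigns ranks 1..n by ascending value; tied values share the average of their ranks. */
    static double[] ranks(double[] x) {
        Integer[] order = new Integer[x.length];
        for (int k = 0; k < x.length; k++) {
            order[k] = k;
        }
        Arrays.sort(order, Comparator.comparingDouble(index -> x[index]));
        double[] rank = new double[x.length];
        int start = 0;
        while (start < order.length) {
            int end = start;
            while (end + 1 < order.length && x[order[end + 1]] == x[order[start]]) {
                end++;
            }
            double averageRank = (start + end) / 2.0 + 1.0; // positions are 0-based, ranks 1-based
            for (int k = start; k <= end; k++) {
                rank[order[k]] = averageRank;
            }
            start = end + 1;
        }
        return rank;
    }

    /** Pearson correlation of two equally long arrays (NaN if either is constant). */
    static double pearson(double[] a, double[] b) {
        double meanA = Arrays.stream(a).average().orElse(0.0);
        double meanB = Arrays.stream(b).average().orElse(0.0);
        double cov = 0.0, varA = 0.0, varB = 0.0;
        for (int k = 0; k < a.length; k++) {
            cov += (a[k] - meanA) * (b[k] - meanB);
            varA += (a[k] - meanA) * (a[k] - meanA);
            varB += (b[k] - meanB) * (b[k] - meanB);
        }
        return cov / Math.sqrt(varA * varB);
    }

    /** Spearman's rho: Pearson correlation of the two rank vectors. */
    static double rho(double[] predicted, double[] actual) {
        return pearson(ranks(predicted), ranks(actual));
    }
}
```

A rank-based coefficient only asks whether the binaries with the most predicted defects are also those with the most actual defects, which matches the ranking question posed above.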
FUTURE WORK
Word cloud: the thesis terms, extended with the new terms erose, factor, fm, guide, human, networking, quality, and social.
"Piled Higher and Deeper" by Jorge Cham. www.phdcomics.com