Towards a usable
defect prediction tool
Vladimir Kovalenko
JetBrains
Crossbreeding machine learning and heuristics
Defect Prediction
• Common goal: identify defect-prone entities in
advance
• Why?
• QA
• Resource allocation (testing, review, etc.)
Previous research
• Academia
• Microsoft Research
• Google case study
Research papers
(common points)
• ML defect prediction models work in general
• No universal model: projects are too different
• Code metrics as features improve prediction quality
• Typical precision/recall ~0.7
Google case study
• Collaborated with researchers to introduce defect
prediction in internal code review system
• Came up with a heuristic model
Google Bug Prediction Score (Time Weighted Risk)
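The slide names the score but does not give its formula. The following is a minimal sketch, assuming the formulation from Google's public write-up of the case study: each bug-fixing commit that touched a file contributes 1 / (1 + e^(-12t + 12)), where t is the commit time normalized to [0, 1] over the repository history. The function and parameter names are hypothetical.

```python
import math


def time_weighted_risk(fix_times, first_commit, now):
    """Score one file from the timestamps of its bug-fixing commits.

    fix_times    -- timestamps of bug-fix commits touching the file
    first_commit -- timestamp of the oldest commit in the repository
    now          -- timestamp of the newest point in the history
    All names are illustrative; the weighting is assumed, not taken from the slides.
    """
    span = now - first_commit
    if span <= 0:
        raise ValueError("now must be later than first_commit")
    score = 0.0
    for ts in fix_times:
        t = (ts - first_commit) / span  # 0 = oldest, 1 = most recent
        # Logistic weight: recent fixes count for much more than old ones.
        score += 1.0 / (1.0 + math.exp(-12.0 * t + 12.0))
    return score
```

Under this weighting a fix made at the very end of the history contributes 0.5, while a fix from the start of the history contributes almost nothing, so the score decays as a file stops attracting bug fixes.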
Tools
• No defect prediction tools known to be used in
industry
• Why?
• Too low accuracy
• Too much effort to set up
Tool usability criteria
• Language independent
• “entity” >= file
• Little or no effort to set up
• no plain supervised learning
• Near real-time
• Easy to use: VCS agnostic, etc
• Accurate!
Implementation
• CI server plugin
• Only use VCS metrics
• Automatic detection of bugfix changes (heuristics)
• Processing: detect bug-introducing changes
• ML classifier: Naive Bayes / Decision Tree
• Use the top of the prediction ranking, not absolute values
• Automatic quality evaluation
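Two of the steps above can be sketched briefly. The slides do not say which heuristics detect bugfix changes, so the keyword pattern below is an assumption, as are the function names. The second function illustrates taking the top of the ranking instead of thresholding absolute scores.

```python
import re

# Assumed heuristic: a commit is a bugfix if its message mentions a fix or a bug.
BUGFIX_PATTERN = re.compile(
    r"\b(bug\s*fix\w*|fix(?:es|ed)?|bug|defect)\b", re.IGNORECASE
)


def is_bugfix(message):
    """Guess from the commit message whether a change fixes a bug."""
    return BUGFIX_PATTERN.search(message) is not None


def top_files(scores, k):
    """Return the k highest-scoring file paths.

    scores maps file path -> model score. Only the ordering is used,
    so the scores do not need to be calibrated probabilities.
    Ties are broken by path to keep the result deterministic.
    """
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [path for path, _ in ranked[:k]]
```

Ranking sidesteps the question of what score counts as "risky", which differs from project to project.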
Features
• Local change frequency
• Number of authors
• File age
• Number of affecting commits
• Google Score
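The first four features can be derived from the commit log alone. The sketch below assumes a simplified commit record and interprets "local change frequency" as commits per unit time within a recent window; both the record type and that interpretation are assumptions, not details from the slides.

```python
from collections import namedtuple

# Hypothetical, simplified commit record: a timestamp, an author name,
# and the list of file paths the commit touched.
Commit = namedtuple("Commit", "timestamp author files")


def file_features(commits, now, window):
    """Compute per-file VCS features from a list of Commit records."""
    stats = {}
    for commit in commits:
        for path in commit.files:
            s = stats.setdefault(
                path,
                {"first": commit.timestamp, "authors": set(), "commits": 0, "recent": 0},
            )
            s["first"] = min(s["first"], commit.timestamp)
            s["authors"].add(commit.author)
            s["commits"] += 1
            if now - commit.timestamp <= window:
                s["recent"] += 1  # commit falls inside the recent window
    return {
        path: {
            "change_frequency": s["recent"] / window,
            "authors": len(s["authors"]),
            "age": now - s["first"],
            "commits": s["commits"],
        }
        for path, s in stats.items()
    }
```

Because every input comes from version control history, the same extraction applies regardless of the programming language of the project.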
Quality Evaluation
• Bug tracker integration: find bugfix changes
• Quality metric: fraction of files from model
predictions affected by bugfix changes in the future
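The quality metric above is a hit rate over the predicted files. A minimal sketch, with hypothetical function and parameter names:

```python
def prediction_hit_rate(predicted, future_bugfix_files):
    """Fraction of predicted files that a later bugfix change touched.

    predicted           -- file paths the model flagged at prediction time
    future_bugfix_files -- file paths touched by bugfix changes afterwards
    """
    predicted = list(predicted)
    if not predicted:
        return 0.0  # nothing was predicted, so nothing was hit
    fixed = set(future_bugfix_files)
    hits = sum(1 for path in predicted if path in fixed)
    return hits / len(predicted)
```

For example, if four files are flagged and two of them are later touched by bugfix changes, the metric is 0.5. Files that receive fixes but were not flagged do not affect it.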
Result samples
Project A, 2 years
Project B, 1 year
Conclusions
• It is possible to combine learning and heuristic
approaches to get the best of two worlds
• The accuracy is still not good enough
• No wonder no prediction tools are widely used yet
Thank you!
vladimir.kovalenko@jetbrains.com

TMPA-2015: Towards a Usable Defect Prediction Tool: Crossbreeding Machine Learning and Heuristics
