An {Execution-Semantic, Content-and-Context}-Based Code-Clone {Detection, Analysis}
Toshihiro Kamiya
Future University Hakodate
kamiya@fun.ac.jp
Toshihiro Kamiya: An Execution-Semantic and Content-and-Context-Based Code-Clone Detection and Analysis,
Proceedings of the 9th IEEE International Workshop on Software Clones (IWSC'15), pp. 1-7 (2015).
TOC
● Problem/Motivation
● Outline of proposed method
● Example
● Algorithm of clone detection
● Visualization
● Implementation
● Preliminary experiment
The problems / Motivation
● In functional PLs, developers can define their own control structures.
– Analyzing only the predefined control statements is no longer sufficient to represent code patterns.
– E.g., if (C) A; else B; ⇔ myIf(C, lambdaA, lambdaB);
→ Inter-procedural analysis is needed.
● Dynamic dispatch makes inter-procedural analysis difficult.
– Especially in functional + OO + dynamically typed PLs (no explicit type declarations → dispatches are hard to analyze statically)
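A minimal Python sketch of the equivalence in the example above. The function my_if is a hypothetical user-defined control structure (named after myIf on the slide), and sign_builtin / sign_custom are illustrative names, not part of the original material.

```python
def my_if(cond, then_branch, else_branch):
    """User-defined control structure: call exactly one of the two thunks."""
    return then_branch() if cond else else_branch()


def sign_builtin(x):
    # Built-in control statement: visible to a purely syntactic analysis.
    if x >= 0:
        return "non-negative"
    else:
        return "negative"


def sign_custom(x):
    # Same behaviour, but the branching happens inside my_if(), so an
    # analysis has to follow the call and the lambdas to see the pattern.
    return my_if(x >= 0, lambda: "non-negative", lambda: "negative")
```

Both functions return the same value for every input, yet only the first one contains an if statement.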
Idea
Detect clones from an execution trace!
● Dispatches and control structures have already been expanded (resolved) in the trace.
● Detected clones are inter-procedural, type-3 clones.
Outline of proposed method
● Execution trace
→ Call tree
→ Contents and Context (for each node)
[Figure: call tree of the example program, with nodes main(), os.listdir(), print_extensions_w_for_stmt(), print_extensions_w_map_func(), os.path.splitext(), print, str.join(), get_extensions(), map(), and lambda() at line 8. The contents and the context of a node are marked; the figure also carries the labels "Clone detection" and "Clone analysis".]
Example code
[Code listing not preserved in this transcript.]
● These two functions are semantic clones (a helper function is also used).
– The same functionality: both find the extensions of the given files and print them out.
● Shared items and differences
– Distinct loops: for vs. map.
– In one function, all shared items are contained in the function itself.
– In the other, the shared items are spread across several functions.
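Since the original listing is not available here, the following is a hypothetical reconstruction of what such an example program might look like. Only the function names are taken from the call tree on the slides; the function bodies and the files parameter are assumptions.

```python
import os


def print_extensions_w_for_stmt(files):
    # Loop written as a for statement; the shared calls
    # (os.path.splitext and print) are made directly in this function.
    for f in files:
        print(os.path.splitext(f)[1])


def get_extensions(files):
    # Helper: the os.path.splitext call is hidden inside a lambda
    # that is passed to map().
    return map(lambda f: os.path.splitext(f)[1], files)


def print_extensions_w_map_func(files):
    # Loop expressed with map(); the shared calls are spread over
    # this function, the helper, and the lambda.
    print("\n".join(get_extensions(files)))


def main():
    files = os.listdir(".")
    print_extensions_w_for_stmt(files)
    print_extensions_w_map_func(files)
```

Calling main() prints the extensions of the files in the current directory twice, once per variant.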
Detection steps
Input: a call tree (← execution trace ← target program)
1. Extract the contents and context of each node
2. Identify sets of contents-sharing nodes
3. Remove redundant nodes (filtering with contexts)
Input
…
call __main__//<module> runpy//_run_code 69
:
load_const __main__//<module> 0
load_const __main__//<module> 12
load_const __main__//<module> 21
load_const __main__//<module> 30
load_const __main__//<module> 39
call __main__//main __main__//<module> 63
:
call __main__//print_extensions_w_for_stmt __main__//main 24
: <list>
call posixpath//splitext __main__//print_extensions_w_for_stmt 25
: 'about.txt'
call genericpath//_splitext posixpath//splitext 18
: 'about.txt' '/' None '.'
load_const genericpath//_splitext 0
return genericpath//_splitext 139
: * 'about' '.txt'
return posixpath//splitext 21
: * 'about' '.txt'
call pygoat.hook/Out/write __main__//print_extensions_w_for_stmt 32
: '.txt'
return pygoat.hook/Out/write 15
call pygoat.hook/Out/write __main__//print_extensions_w_for_stmt 33
: '\n'
return pygoat.hook/Out/write 15
call posixpath//splitext __main__//print_extensions_w_for_stmt 25
: 'pygoat.data'
call genericpath//_splitext posixpath//splitext 18
: 'pygoat.data' '/' None '.'
load_const genericpath//_splitext 0
return genericpath//_splitext 139
: * 'pygoat' '.data'
return posixpath//splitext 21
: * 'pygoat' '.data'
call pygoat.hook/Out/write __main__//print_extensions_w_for_stmt 32
: '.data'
return pygoat.hook/Out/write 15
call pygoat.hook/Out/write __main__//print_extensions_w_for_stmt 33
: '\n'
return pygoat.hook/Out/write 15
call posixpath//splitext __main__//print_extensions_w_for_stmt 25
: 'greeting.md'
call genericpath//_splitext posixpath//splitext 18
: 'greeting.md' '/' None '.'
load_const genericpath//_splitext 0
return genericpath//_splitext 139
: * 'greeting' '.md'
return posixpath//splitext 21
: * 'greeting' '.md'
call pygoat.hook/Out/write __main__//print_extensions_w_for_stmt 32
: '.md'
return pygoat.hook/Out/write 15
call pygoat.hook/Out/write __main__//print_extensions_w_for_stmt 33
[Figure: the program, its execution trace (excerpted above), and the resulting call tree rooted at main().]
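A trace in the format shown above can be folded into a call tree with a simple stack. The sketch below assumes that every "call" line has the callee as its second field, that every "return" line closes the most recent open call, and that all other lines (arguments after ":", load_const, and so on) can be ignored. Node, build_call_tree, and to_nested are illustrative names, not the tool's API.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str
    children: list = field(default_factory=list)


def build_call_tree(trace_lines):
    """Fold call/return events into a tree under an artificial root."""
    root = Node("<root>")
    stack = [root]
    for line in trace_lines:
        parts = line.split()
        if not parts:
            continue
        if parts[0] == "call":
            node = Node(parts[1])          # second field: callee label
            stack[-1].children.append(node)
            stack.append(node)
        elif parts[0] == "return" and len(stack) > 1:
            stack.pop()
        # Argument lines (": ...") and load_const lines are skipped.
    return root


def to_nested(node):
    """Convert a Node into (label, [children]) tuples for easy printing."""
    return (node.label, [to_nested(c) for c in node.children])


excerpt = """\
call posixpath//splitext __main__//print_extensions_w_for_stmt 25
: 'about.txt'
call genericpath//_splitext posixpath//splitext 18
: 'about.txt' '/' None '.'
load_const genericpath//_splitext 0
return genericpath//_splitext 139
: * 'about' '.txt'
return posixpath//splitext 21
: * 'about' '.txt'
call pygoat.hook/Out/write __main__//print_extensions_w_for_stmt 32
: '.txt'
return pygoat.hook/Out/write 15
""".splitlines()

tree = build_call_tree(excerpt)
```

For this excerpt, the root gets two children: posixpath//splitext (which itself calls genericpath//_splitext) and pygoat.hook/Out/write.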
Step 1.
1. Extract the contents and context of each node
[Figure: the call tree, from which the following contexts and contents are extracted.]
main()
– Context: (none shown)
– Contents: get_extensions(), map(), lambda() at line 8, os.listdir(), os.path.splitext(), print, print_extensions_w_for_stmt(), print_extensions_w_map_func(), str.join()
print_extensions_w_for_stmt()
– Context: main()
– Contents: os.path.splitext(), print
print_extensions_w_map_func()
– Context: main()
– Contents: get_extensions(), map(), lambda() at line 8, os.path.splitext(), print, str.join()
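Step 1 can be sketched as a single traversal of the call tree: the contents of a node are the labels of everything called beneath it, and its context is the set of labels above it. The tree shape in CALL_TREE is an assumption read off the slide's figure, and merging nodes that carry the same label is a simplification made for this sketch.

```python
# (label, [children]) tuples; the exact nesting is assumed from the figure.
CALL_TREE = ("main()", [
    ("os.listdir()", []),
    ("print_extensions_w_for_stmt()", [
        ("os.path.splitext()", []),
        ("print", []),
    ]),
    ("print_extensions_w_map_func()", [
        ("get_extensions()", [
            ("map()", [
                ("lambda() at line 8", [
                    ("os.path.splitext()", []),
                ]),
            ]),
        ]),
        ("str.join()", []),
        ("print", []),
    ]),
])


def contents_and_contexts(tree):
    """Return {label: (context, contents)} for every label in the tree."""
    table = {}

    def visit(node, ancestors):
        label, children = node
        contents = set()
        for child in children:
            contents.add(child[0])
            contents |= visit(child, ancestors | {label})
        # Nodes with the same label are merged by taking unions.
        context, merged = table.setdefault(label, (set(), set()))
        context |= ancestors
        merged |= contents
        return contents

    visit(tree, frozenset())
    return table


table = contents_and_contexts(CALL_TREE)
```

With this tree, table["print_extensions_w_for_stmt()"] holds the context {main()} and the contents {os.path.splitext(), print}.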
Step 2.
2. Identify sets of contents-sharing nodes
● Nodes: main(), print_extensions_w_for_stmt(), print_extensions_w_map_func() (contexts and contents as listed in Step 1)
● Content items shared by all three nodes: os.path.splitext(), print
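The tool prototype uses Apriori for this step. As a stand-in, the sketch below takes the pairwise intersections of the contents as candidate item sets and then collects every node whose contents include a candidate. It is a brute-force illustration for tiny inputs, not the mining algorithm itself; content_sharing_sets and min_shared are hypothetical names.

```python
from itertools import combinations

CONTENTS = {
    "main()": {
        "get_extensions()", "map()", "lambda() at line 8", "os.listdir()",
        "os.path.splitext()", "print", "print_extensions_w_for_stmt()",
        "print_extensions_w_map_func()", "str.join()",
    },
    "print_extensions_w_for_stmt()": {"os.path.splitext()", "print"},
    "print_extensions_w_map_func()": {
        "get_extensions()", "map()", "lambda() at line 8",
        "os.path.splitext()", "print", "str.join()",
    },
}


def content_sharing_sets(contents, min_shared=2):
    """Map each shared item set to the set of nodes that contain it."""
    candidates = set()
    for a, b in combinations(sorted(contents), 2):
        shared = frozenset(contents[a] & contents[b])
        if len(shared) >= min_shared:
            candidates.add(shared)
    return {
        shared: {node for node, items in contents.items() if shared <= items}
        for shared in candidates
    }


sharing = content_sharing_sets(CONTENTS)
```

For the running example, the item set {os.path.splitext(), print} is shared by all three nodes, and the larger six-item set is shared only by main() and print_extensions_w_map_func().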
Step 3.
3. Remove redundant nodes (filtering with contexts)
● main() is included in the context of all the other nodes in the set ⇒ redundant
● Remaining nodes: print_extensions_w_for_stmt(), print_extensions_w_map_func()
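One reading of the redundancy rule is: a node is dropped when it appears in the context of every other node of the set, because it merely encloses them. The sketch below implements that reading; remove_redundant is an illustrative name, and the rule as coded is an interpretation of the slide rather than the paper's exact definition.

```python
CONTEXTS = {
    "main()": set(),
    "print_extensions_w_for_stmt()": {"main()"},
    "print_extensions_w_map_func()": {"main()"},
}


def remove_redundant(nodes, contexts):
    """Drop nodes that lie in the context of all other nodes of the set."""
    kept = set()
    for node in nodes:
        others = nodes - {node}
        if others and all(node in contexts[o] for o in others):
            continue  # node encloses every other member: redundant
        kept.add(node)
    return kept


clone_class = remove_redundant(set(CONTEXTS), CONTEXTS)
```

A set that shrinks to a single node after this filtering no longer describes a clone, so it would be discarded.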
Detection result
A clone class:
{ print_extensions_w_map_func(), print_extensions_w_for_stmt() }
Shared items:
{ os.path.splitext(), print }
print_extensions_w_for_stmt()
– Context: main()
– Contents: os.path.splitext(), print
print_extensions_w_map_func()
– Context: main()
– Contents: get_extensions(), map(), lambda() at line 8, os.path.splitext(), print, str.join()
Visualization: the call tree is dagified (merged) by label (DAG = directed acyclic graph), showing the context and the contents of the clone class.
[Figure: DAG with nodes main(), print_extensions_w_for_stmt(), print_extensions_w_map_func(), get_extensions(), print, map(), lambda() at line 8, and os.path.splitext().]
Content-and-context analysis for triaging
● A clone class (a), its shared items (b), and its distinct contents, or gap (c)
● The distinct contents (c) share the same set of (sub-)contents (d) → (c) is another clone class.
● If (c) is merged before (a), (c) is no longer a gap of (a).
[Figure: clone classes detected from markdown2's code (described later), with regions (a) to (d) marked.]
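The distinct contents (gap) of each member of a clone class can be computed as a set difference between the member's contents and the shared items. The sketch below does this for the small print_extensions example rather than for the markdown2 case in the figure; gaps is an illustrative name.

```python
CONTENTS = {
    "print_extensions_w_for_stmt()": {"os.path.splitext()", "print"},
    "print_extensions_w_map_func()": {
        "get_extensions()", "map()", "lambda() at line 8",
        "os.path.splitext()", "print", "str.join()",
    },
}
SHARED = {"os.path.splitext()", "print"}


def gaps(clone_class, contents, shared):
    """Return the distinct contents (gap) of every member of a clone class."""
    return {node: contents[node] - shared for node in clone_class}


gap_table = gaps(set(CONTENTS), CONTENTS, SHARED)
```

Here the for-statement variant has an empty gap, while the map variant has a gap of four items. When gaps of different members share sub-contents of their own, the same detection can be applied to them.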
Tool prototype
Detection:
Target program, inputs / test cases
→ Execution (Python interpreter) with execution trace extraction (debugging / profiling APIs)
→ Execution trace
→ Step 1: String balloon generation → String balloons
→ Step 2: Frequent item-set mining (Apriori) → Similar sets of contents
→ Step 3: Redundant context removal → Code clones
Analysis:
Code clones → Visualization, metrics calculation
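How the prototype hooks into the interpreter is not shown on the slides. One possibility is Python's standard profiling hook, sys.setprofile, which reports call and return events for both Python and C functions. The sketch below records such events; trace_calls, helper, and target are hypothetical names.

```python
import sys


def trace_calls(func, *args, **kwargs):
    """Run func under a profiling hook and return (result, events)."""
    events = []

    def profiler(frame, event, arg):
        if event == "call":
            events.append(("call", frame.f_code.co_name))
        elif event == "return":
            events.append(("return", frame.f_code.co_name))
        elif event == "c_call":
            # For C functions, arg is the function object being called.
            events.append(("call", getattr(arg, "__name__", repr(arg))))
        elif event in ("c_return", "c_exception"):
            events.append(("return", getattr(arg, "__name__", repr(arg))))

    sys.setprofile(profiler)
    try:
        result = func(*args, **kwargs)
    finally:
        sys.setprofile(None)
    return result, events


def helper(x):
    return len(str(x))


def target():
    return [helper(n) for n in (1, 22)]


result, events = trace_calls(target)
```

The recorded events also include the calls that install and remove the hook itself, so a real tracer would filter those out before building the call tree.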
● Input: Python source code
● Uses an implementation of a frequent item-set mining algorithm
– Apriori (www.borgelt.net/apriori.html)
● Heuristics / optimizations
– Max. depth of contents from a target node (default 5)
– Max. number of content items of a candidate node (default 25): filters out the nodes with large contents, i.e., nodes near the root of the call tree
– Removal of basic, primitive functions
– ...
[Figure: content-and-context clone on a call graph]
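The two numeric heuristics above can be sketched as a depth-limited contents extraction followed by a size filter. The defaults 5 and 25 come from the slide; the function names, the (label, [children]) tree encoding, and the interpretation of depth 1 as "direct callees only" are assumptions.

```python
def contents_up_to_depth(node, max_depth=5):
    """Collect labels of descendants at most max_depth levels below node."""
    label, children = node
    if max_depth <= 0:
        return set()
    contents = set()
    for child in children:
        contents.add(child[0])
        contents |= contents_up_to_depth(child, max_depth - 1)
    return contents


def candidate_nodes(contents_by_label, max_items=25):
    """Keep only nodes whose contents are small enough to be candidates."""
    return {
        label: items
        for label, items in contents_by_label.items()
        if len(items) <= max_items
    }


# A hypothetical chain a -> b -> c -> d to show the depth limit.
chain = ("a", [("b", [("c", [("d", [])])])])
shallow = contents_up_to_depth(chain, max_depth=2)
```

Nodes near the root tend to accumulate many content items, so the size filter removes them before item-set mining starts.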
Preliminary experiment
The tool was run once for each value of the parameter “Max. number of content items of a candidate node”: 10, 15, …, 30.

Target product   Collection of execution sequences        # function calls   # unique labels
markdown2        Running 144 unit tests                   227K               1128
wxPython         Invoking a sample program “pySketch”     483K               1058
Results
[Figures not preserved in this transcript.]
● Exponential in the number of contents
● Too “peaky” for practical use
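The measured curves are not available here, but one plausible source of exponential behaviour is easy to illustrate: a node with n content items has 2^n - 1 non-empty subsets that an item-set miner may have to consider in the worst case. The sketch below only tabulates that bound for the parameter values used in the experiment; it does not reproduce any result from the slides, and worst_case_item_sets is a hypothetical name.

```python
def worst_case_item_sets(n_items):
    """Number of non-empty subsets of a set with n_items elements."""
    return 2 ** n_items - 1


for n in range(10, 31, 5):
    print(n, worst_case_item_sets(n))
```

Raising the limit from 10 to 30 items multiplies this bound by roughly a million, which is consistent with a strong sensitivity to the parameter.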
Summary
● Code-clone detection from dynamic information: an execution trace
– Aims at functional / dynamically typed PLs
● Content-and-context analysis for triage
● Algorithm, implementation, heuristics
● Preliminary experiment
– Targets: markdown2 and wxPython
– Peaky: sensitive to the parameter “Max. number of content items of a candidate node” → needs refinement
Omitted here; refer to the paper:
● Threats to validity
● Future plans
