International Journal of Scientific Research and Engineering Development-– Volume 2 Issue 6, Nov- Dec 2019
Available at www.ijsred.com
ISSN : 2581-7175 ©IJSRED:All Rights are Reserved Page 202
Exploiting Semantics-Based Plagiarism Detection Methods
Sajin R Nair¹, Smita C Thomas²
¹ P G Scholar, Dept. of CSE, Mount Zion College of Engineering, Kadammanitta, Kerala, India
Email: sajinrnair1995@gmail.com
² Research Scholar, Vels University
Email: smitabejoy@gmail.com
----------------------------------------************************----------------------------------
Abstract:
Many complex networked systems exhibit natural divisions of network nodes. Each division, or community, is a
densely connected subgroup. Such community structure not only helps comprehension but also finds wide
applications in complex systems. Moreover, existing proposed applications of software community structure have not
been directly compared or combined with existing software engineering practices. Comparison with baseline practices
is needed to convince practitioners to adopt the proposed approaches. However, today's vetting mechanisms are slow
and less capable of catching new threats. The authors have implemented a set of tools collectively called TOB-PD
(TOB-based Plagiarism Detection tool) by applying TOB to three existing representative dynamic birthmarks: SCSSB
(System Call Short Sequence Birthmark), DYKIS (DYnamic Key Instruction Sequence birthmark) and JB (an
API-based birthmark for Java). Experiments conducted on a large number of binary programs show that the
proposed approach exhibits strong resilience against state-of-the-art semantics-preserving code obfuscation techniques.
Comparisons against the three existing tools SCSSB, DYKIS and JB show that the new framework is effective for
plagiarism detection of multithreaded programs. The tools, the benchmarks and the experimental results are all
publicly available.
Keywords: Multithreaded programming, Android malware, Software plagiarism, Birthmark,
SCSSB (System Call Short Sequence Birthmark), DYKIS (DYnamic Key Instruction Sequence birthmark), JB
(an API-based birthmark for Java), TOB-PD (TOB-based Plagiarism Detection tool).
----------------------------------------************************----------------------------------
I. INTRODUCTION
The growth of Android devices has brought a vibrant application ecosystem. Millions of
applications (apps for short) have been installed by Android users around the world from
various app markets. Prominent examples include Google Play, Amazon Appstore, Samsung
Galaxy Apps, and tens of smaller third-party markets. Software plagiarism, ranging from
open source code reuse to smartphone app repackaging, severely affects both open source
communities and software companies. Despite the tremendous progress in software
birthmarking, the trend towards multithreaded programming greatly threatens its
effectiveness, as the existing approaches remain optimized for sequential programs. For
example, birthmarks extracted from multiple runs of the same multithreaded program can be
very different due to the inherent non-determinism of thread scheduling. In this case,
software birthmarking fails to declare plagiarism even for simply duplicated
multithreaded programs.
RESEARCH ARTICLE OPEN ACCESS
The basic research problem for code similarity measurement techniques is to detect
whether a component in one program is similar to a component in another program and to
measure their similarity quantitatively. A component can be a set of functions or a
whole program. Existing methods include clone detection, binary similarity detection,
and software plagiarism detection. While these approaches have proven very useful, each
of them has its shortcomings. For software plagiarism detection technology in
particular, a new trend in software development greatly threatens its effectiveness. The
trend towards multithreaded programs is creating a gap between current software
development practice and software plagiarism detection technology, as the existing
dynamic approaches remain optimized for sequential programs. Due to the perturbation
caused by non-deterministic thread scheduling, existing birthmark generation and
comparison are no longer applicable to modern software with multiple threads. This paper
compares different technologies and methods that can be used to identify plagiarized
software, and determines the best one from the comparison.
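The effect of non-deterministic scheduling on sequence-based birthmarks can be illustrated with a small Python sketch. It is only an illustration under stated assumptions: the per-thread traces, the system call names, the `interleave` helper and the choice of 3-grams with a Jaccard score are all hypothetical and are not taken from any of the surveyed tools.

```python
def k_grams(seq, k):
    """Return the set of all length-k contiguous subsequences of seq."""
    return {tuple(seq[i:i + k]) for i in range(len(seq) - k + 1)}


def jaccard(a, b):
    """Jaccard index of two sets; defined as 1.0 when both are empty."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def interleave(t1, t2, schedule):
    """Merge two per-thread event lists according to a schedule.

    schedule is a string of '1' and '2' naming which thread runs next.
    """
    it1, it2 = iter(t1), iter(t2)
    return [next(it1) if s == "1" else next(it2) for s in schedule]


# Hypothetical per-thread traces (names chosen for illustration only).
thread1 = ["open", "read", "write", "close"]
thread2 = ["socket", "connect", "send", "recv"]

run_a = interleave(thread1, thread2, "11112222")  # thread 1 finishes first
run_b = interleave(thread1, thread2, "12121212")  # strict alternation

sim = jaccard(k_grams(run_a, 3), k_grams(run_b, 3))
print(f"similarity between two runs of the same program: {sim:.2f}")
```

Both runs execute exactly the same calls, yet under these two schedules they share no 3-gram, so the score drops to 0 even though the program is unchanged.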
II. LITERATURE SURVEY
K. Chen et al. [5] observe that new designs of vetting techniques have recently been
proposed by the research community for capturing new apps associated with known
suspicious behavior, such as dynamic loading of binary code from a remote untrusted
website, operations related to component hijacking, Intent injection, etc. All these
approaches involve a heavyweight information-flow analysis and require a set of
heuristics that characterize the known threats. They often need a dynamic analysis in
addition to the static inspection performed on app code, and further human intervention
to annotate the code or even participate in the analysis. Moreover, the emulators that
most dynamic analysis tools employ can be detected and evaded by malware. Chen et al.
instead pursue mass vetting at scale. Based on this simple idea, they developed a novel,
highly scalable vetting mechanism for detecting repackaged Android malware on one market
or across markets.
Z. Tian, T. Liu, Q. Zheng, M. Fan, E. Zhuang, and Z. Yang [4] note that, despite the
tremendous progress in software birthmarking, the trend towards multithreaded
programming greatly threatens its effectiveness, as the existing approaches remain
optimized for sequential programs. For example, birthmarks extracted from multiple runs
of the same multithreaded program can be very different due to the inherent
non-determinism of thread scheduling. In this case, software birthmarking fails to
declare plagiarism even for simply duplicated multithreaded programs. In that paper, the
authors introduce a thread-aware dynamic birthmark called TreSB (Thread-related System
call Birthmark) that can effectively detect plagiarism of multithreaded programs.
Because it is extracted by mining behavior characteristics from thread-related system
calls, TreSB is less susceptible to thread scheduling: these system calls are sources
that impose thread scheduling rather than being affected by it. In addition, unlike many
approaches, TreSB operates on binary executables rather than source code. The latter is
usually unavailable when birthmarking is used to obtain initial evidence of software
plagiarism. The authors implemented a prototype based on the PIN (Luk et al., 2005)
instrumentation framework and conducted extensive experiments on a publicly available
benchmark consisting of 234 versions of 35 different multithreaded programs. Their
empirical study shows that TreSB and its comparison algorithms are credible in
differentiating independently developed programs, and resilient to most state-of-the-art
semantics-preserving obfuscation techniques implemented in the best commercial and
academic tools such as SandMark and DashO. In addition, a comparison against two
recently proposed thread-aware birthmarks shows that TreSB outperforms both of them with
respect to each of the three performance metrics URC, F-Measure and MCC.
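Two of the three metrics named above have standard definitions, which a minimal Python sketch can make concrete. It assumes that detection results are summarized as counts of true/false positives and negatives over program pairs; the example counts are hypothetical and are not results from [4]. URC is not sketched because the text does not define it.

```python
import math


def f_measure(tp, fp, fn):
    """Harmonic mean of precision and recall; 0.0 when there are no true positives."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; 0.0 when the denominator is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical outcome of comparing 20 program pairs (not data from the paper).
tp, tn, fp, fn = 8, 9, 1, 2
print(f"F-Measure = {f_measure(tp, fp, fn):.3f}, MCC = {mcc(tp, tn, fp, fn):.3f}")
```

F-Measure ignores true negatives, whereas MCC uses all four counts, which is one reason the two are commonly reported together.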
The work in [3] introduces a thread-aware dynamic birthmark called TreSB (Thread-related
System call Birthmark) that can effectively detect plagiarism of multithreaded programs.
Unlike many approaches, TreSB operates on binary executables rather than source code.
The latter is usually unavailable when birthmarking is used to obtain initial evidence
of plagiarism. Extensive experiments conducted on a publicly available benchmark
consisting of 234 versions of 35 multithreaded programs show good resilience and
credibility of TreSB. A comparison of TreSB against two recently proposed thread-aware
birthmarks shows that TreSB outperforms both of them. System calls recorded during
program execution are a favorable basis for extracting high-quality birthmarks. The
hypothesis is that modifying system calls usually leads to incorrect program behavior,
and therefore a birthmark generated from system calls can be used to identify stolen
programs even after they have been modified. Previous empirical studies also show that
system-call-based birthmarks are resilient against various obfuscation techniques. Yet
these birthmarks are no longer effective for multithreaded programs due to the
non-determinism of thread scheduling. Based on a similar principle, the authors also
generate birthmarks from system calls. Yet rather than using all the calls, they extract
TreSB just from the thread-related system calls, which are essential to the semantics
and correct execution of a multithreaded program. A random or deliberate modification to
these calls can result in very subtle errors, and therefore they are the code least
likely to be changed. More importantly, these calls are the source that imposes thread
interleaving rather than being affected by the non-determinism, so a birthmark extracted
from them is less susceptible to thread scheduling. Specifically, the authors treat as
thread-related those system calls that accomplish tasks including thread and process
management (such as creation, join and termination, capability setting and getting),
thread synchronization, signal manipulation, as well as thread and process priority
setting.
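The idea of keeping only thread-related system calls can be sketched in Python as follows. This is a rough illustration, not the TreSB implementation: the `THREAD_RELATED` set is an illustrative, non-exhaustive selection of Linux system calls matching the categories listed above, and counting k-grams over the filtered trace is an assumption borrowed from short-sequence birthmarks rather than something the text specifies.

```python
from collections import Counter

# Illustrative (not exhaustive) Linux system calls for the categories named in
# the text; the exact list used by TreSB is an assumption here.
THREAD_RELATED = {
    "clone", "fork", "vfork", "exit", "exit_group", "wait4",  # creation, join, termination
    "capget", "capset",                                       # capability getting and setting
    "futex",                                                  # thread synchronization
    "rt_sigaction", "rt_sigprocmask", "kill", "tgkill",       # signal manipulation
    "setpriority", "getpriority", "sched_setscheduler",       # priority setting
}


def thread_related_birthmark(trace, k=2):
    """Drop calls that are not thread-related, then count k-grams of the rest."""
    kept = [call for call in trace if call in THREAD_RELATED]
    grams = (tuple(kept[i:i + k]) for i in range(len(kept) - k + 1))
    return Counter(grams)


# Hypothetical recorded trace of one run.
trace = ["open", "clone", "read", "futex", "write", "futex", "exit_group"]
birthmark = thread_related_birthmark(trace)
print(birthmark)
```

Because calls such as `read` and `write` are filtered out before the k-grams are formed, reordering them between runs does not change the resulting birthmark.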
L. Luo, J. Ming, D. Wu, P. Liu, and S. Zhu [2] propose a binary-oriented,
obfuscation-resilient method named CoP. CoP is based on a new concept, the longest
common subsequence of semantically equivalent basic blocks, which combines rigorous
program semantics with longest common subsequence based fuzzy matching. Specifically,
the authors model program semantics at three different levels: basic block, path, and
whole program. To model the semantics of a basic block, they adopt symbolic execution to
obtain a set of symbolic formulas that represent the input-output relations of the basic
block under consideration. They then calculate the percentage of the output variables of
the plaintiff block that have a semantically equivalent counterpart in the suspicious
block, and set a threshold for this percentage to allow some noise to be injected into
the suspicious block. At the path level, they use the Longest Common Subsequence (LCS)
algorithm to compare the semantic similarity of two paths, one from the plaintiff and
the other from the suspicious program, constructed based on the LCS dynamic programming
algorithm, with basic blocks as the sequence elements. By trying more than one path,
they use the path similarity scores from LCS collectively to model program semantic
similarity. Note that LCS is different from the longest common substring. Because LCS
allows skipping non-matching nodes, it naturally tolerates noise inserted by obfuscation
techniques. This novel combination of rigorous program semantics with longest common
subsequence based fuzzy matching results in strong resiliency to obfuscation. The
authors developed a prototype of CoP using the above method and evaluated it with
several different experiments to measure its obfuscation resiliency, precision, and
scalability. Benchmark programs, ranging from small programs to large real-world
production software, were subjected to different code obfuscation techniques and
semantics-preserving transformations, including different compilers and compiler
optimization levels. The authors also compared their results with four state-of-the-art
detection systems, MOSS, JPLag, Bdiff and DarunGrim2, where MOSS and JPLag are source
code based, and Bdiff and DarunGrim2 are binary code based. Their experimental results
show that CoP has stronger resiliency to the latest code obfuscation techniques as well
as other semantics-preserving transformations; it can be applied to software plagiarism
and algorithm detection, and is effective and practical for analyzing real-world
software.
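The path-level comparison described above can be sketched in Python. This is a simplified illustration, not CoP itself: basic blocks are modelled as sets of output formula strings, `block_equivalent` is a hypothetical stand-in for the symbolic-execution check, and the 0.75 threshold and the normalisation by plaintiff path length are assumptions.

```python
def lcs_length(path_a, path_b, equivalent):
    """Length of the longest common subsequence under a custom match predicate."""
    rows, cols = len(path_a), len(path_b)
    table = [[0] * (cols + 1) for _ in range(rows + 1)]
    for i in range(1, rows + 1):
        for j in range(1, cols + 1):
            if equivalent(path_a[i - 1], path_b[j - 1]):
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return table[rows][cols]


def block_equivalent(plaintiff_block, suspicious_block, threshold=0.75):
    """Stand-in check: share of plaintiff outputs that reappear in the suspicious block."""
    if not plaintiff_block:
        return False
    matched = sum(1 for out in plaintiff_block if out in suspicious_block)
    return matched / len(plaintiff_block) >= threshold


def path_similarity(plaintiff_path, suspicious_path):
    """LCS length normalised by the plaintiff path length (an assumption)."""
    if not plaintiff_path:
        return 0.0
    matched = lcs_length(plaintiff_path, suspicious_path, block_equivalent)
    return matched / len(plaintiff_path)


# Hypothetical paths: the suspicious one contains an inserted junk block and
# extra noise inside one block.
plaintiff = [{"x+y"}, {"x*2"}, {"y-1"}]
suspicious = [{"x+y"}, {"junk"}, {"x*2", "noise"}, {"y-1"}]
print(path_similarity(plaintiff, suspicious))
```

The inserted junk block is simply skipped by the subsequence matching, whereas a longest common substring would be cut short at that point.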
III. PROPOSED SYSTEM
This paper proposes TOB (Thread-Oblivious dynamic Birthmark), a framework that can
revive existing dynamic birthmarks such as SCSSB (System Call Short Sequence Birthmark),
DYKIS (DYnamic Key Instruction Sequence birthmark) and JB (an API-based birthmark for
Java) to handle multithreaded programs. The example program is a test case in the WET
project with slight modifications. Two typical plagiarism detection algorithms are
applied to multiple runs of this program. If the existing approaches fail to claim
plagiarism on different runs of the same small program, even without any code
modifications, the capability of such approaches is in doubt. Dynamic birthmarks usually
give a quantitative measurement between 0 and 1 to indicate the similarity between two
runs. A value of 1 indicates that the runs are identical and 0 indicates that they are
completely different. The measurement is obtained by applying metrics such as Cosine
distance, Jaccard index, Dice coefficient and Containment to the birthmarks obtained
from a pair of executions. Tables 1a and 1b show the experimental results of SCSSB and
DYKIS, where the column and row headings indicate the number of threads and the
evaluation metrics, respectively. When there is only one thread, the program becomes
sequential. Without non-deterministic thread scheduling, the executions are
deterministic. As expected, the similarity scores are all 1.0, correctly indicating that
the runs are from identical programs. However, as the number of threads increases, the
similarity scores quickly deteriorate.
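The four metrics named above can be written down in a few lines of Python. This is a minimal sketch assuming that a birthmark is a multiset of k-grams held in a `Counter`; the cosine score is computed as a similarity so that 1 means identical, matching the scale used in the text, and the two example birthmarks are hypothetical.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two k-gram frequency vectors."""
    dot = sum(a[g] * b[g] for g in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def jaccard(a: set, b: set) -> float:
    """Size of the intersection over size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def dice(a: set, b: set) -> float:
    """Twice the intersection size over the sum of both set sizes."""
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else 1.0


def containment(a: set, b: set) -> float:
    """Fraction of the elements of a that also occur in b."""
    return len(a & b) / len(a) if a else 1.0


# Hypothetical birthmarks from two runs (k-gram -> frequency).
run1 = Counter({("open", "read"): 2, ("read", "close"): 1})
run2 = Counter({("open", "read"): 2, ("read", "write"): 1})
keys1, keys2 = set(run1), set(run2)

for name, score in [("cosine", cosine(run1, run2)),
                    ("jaccard", jaccard(keys1, keys2)),
                    ("dice", dice(keys1, keys2)),
                    ("containment", containment(keys1, keys2))]:
    print(f"{name}: {score:.3f}")
```

Containment is the only asymmetric metric of the four: it asks how much of the first birthmark is found in the second, which is useful when one program is suspected of embedding the other.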
A similarity score greater than 1 − ε indicates a strong possibility of plagiarism,
while a score less than ε indicates the opposite. Considering that a typical value of ε
is between 0.15 and 0.35, SCSSB and DYKIS will not claim plagiarism when the number of
threads is 6, and start to claim that the runs are from different programs when the
number of threads becomes 8.
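The threshold rule above can be expressed as a small Python function. This is a sketch under assumptions: the function name, the verdict labels and the default ε = 0.25 (one value inside the typical range quoted above) are illustrative, and treating scores between the two thresholds as inconclusive is an inference from the text rather than something it states.

```python
def verdict(similarity: float, epsilon: float = 0.25) -> str:
    """Classify a similarity score using the two-sided threshold rule."""
    if not 0.0 < epsilon < 0.5:
        raise ValueError("epsilon must lie strictly between 0 and 0.5")
    if similarity > 1.0 - epsilon:
        return "plagiarism"       # strong possibility of plagiarism
    if similarity < epsilon:
        return "independent"      # runs are judged to come from different programs
    return "inconclusive"         # assumed label for the band in between


for score in (1.0, 0.5, 0.1):
    print(score, verdict(score))
```

With this rule, a detector whose scores drift downward as threads are added first stops claiming plagiarism and only later reports the runs as independent, which is the behaviour described for 6 and 8 threads.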
To the best of the authors' knowledge, this is the first work that discusses the impact
of thread scheduling on birthmark-based software plagiarism detection and proposes a
solution to remedy the problem. The var-gram algorithm is applied in birthmark
generation; as far as the authors know, this is the first time this algorithm has been
used for such a purpose, and the experiments confirm its effectiveness. A set of tools
collectively called TOB-PD (TOB-based Plagiarism Detection tool) has been implemented by
integrating the principle of TOB with existing algorithms, including SCSSB, DYKIS and
JB.
Experiments on 418 versions of 35 different multithreaded programs show that the new
tools are highly effective in detecting plagiarism and are resilient to most
state-of-the-art semantics-preserving obfuscation techniques implemented in tools such
as SandMark, DashO and UPX.
IV. CONCLUSION
This paper implements a simple method. As multithreaded software becomes increasingly
popular, current dynamic software plagiarism detection technology geared toward
sequential programs is no longer sufficient. The thread-oblivious birthmarks are not
only accurate in detecting plagiarism of multithreaded programs but also robust against
most state-of-the-art semantics-preserving obfuscation techniques. The new birthmark
technique can also be easy to implement. The proposed work addresses the challenges of
applying dynamic birthmark-based approaches to whole-program plagiarism detection of
multithreaded software. As far as the authors know, this is the first work that
discusses the impact of thread scheduling on birthmark-based plagiarism detection, and
the first work that proposes thread-oblivious birthmarks for solving the problem
systematically.
REFERENCES
[1] Z. Tian, T. Liu, Q. Zheng, “Reviving sequential program birthmarking for multithreaded
software plagiarism detection,” IEEE Trans. Softw. Eng., vol. 44, no. 5, pp. 491–511,
May 2018.
[2] L. Luo, J. Ming, D. Wu, P. Liu, and S. Zhu, “Semantics-based obfuscation-resilient
binary code similarity comparison with applications to software and algorithm plagiarism
detection,” IEEE Trans. Softw. Eng., 2017.
[3] Z. Tian, T. Liu, Q. Zheng, F. Tong, M. Fan, and Z. Yang, “A new thread-aware birthmark
for plagiarism detection of multithreaded programs,” in Proc. Int. Conf. Softw. Eng.
Companion, pp. 734–736, 2016.
[4] Z. Tian, T. Liu, Q. Zheng, M. Fan, E. Zhuang, and Z. Yang, “Exploiting thread-related
system calls for plagiarism detection of multithreaded programs,” J. Syst. and Softw.,
vol. 119, pp. 136–148, 2016.
[5] K. Chen, et al., “Finding unknown malice in 10 seconds: Mass vetting for new threats
at the Google-Play scale,” in Proc. USENIX Secur. Symp., pp. 659–674, 2015.


Exploiting Semantics-Based Plagiarism Detection Methods

  • 1. International Journal of Scientific Research and Engineering Development-– Volume 2 Issue 6, Nov- Dec 2019 Available at www.ijsred.com ISSN : 2581-7175 ©IJSRED:All Rights are Reserved Page 202 Exploiting Semantics-Based Plagiarism Detection Methods Sajin R Nair1 , Smita C Thomas2 1 P G scholar, Dept. of CSE, Mount Zion College of Engineering, Kadammanitta, Kerala, India Email:[email protected] 2 Research Scholar,Vels University Email: [email protected] ----------------------------------------************************---------------------------------- Abstract: Manycomplex networked systems exhibit natural divisions of network nodes. Each division, or community, is adversely connected subgroup. Such community structure not only helps comprehension but also finds wide applications in complex systems. Moreover, existing proposed applications of software community structure have not been directly compared or combined with existing software engineering practices. Comparison with baseline practices is needed to convince practitioners to adopt the proposed approaches. However today’s vetting mechanisms are slow and less capable of catching new threats author have implemented a set of tools collectively called TOB-PD (TOB based Plagiarism Detection tool) by applying TOB to three existing representative dynamic birthmarks, including SCSSB (System Call Short Sequence Birthmark), DYKIS (DYnamic Key Instruction Sequence birthmark) and JB (an API based birthmark for Java). This experiments conducted on large number of binary programs show that proposedapproach exhibits strong resilience against state-of-the-art semantics-preserving code obfuscation techniques. Comparisons against the three existing tools SCSSB, DYKIS and JB show that the new framework is effective for plagiarism detection of multithreaded programs. The tools, the benchmarks and the experimental results are all publicly available. 
Keywords —Multithreaded programming, Android malware,Software plagiarism, Birthmark, SCSSB (System Call Short Sequence Birthmark), DYKIS (DYnamic Key Instruction Sequence birthmark), JB (an API based birthmark for JavTOB-PD (TOB based Plagiarism Detection tool). ----------------------------------------************************---------------------------------- I. INTRODUCTION The growth of Android devices brings in a vibrant application ecosystem. Millions of applications (appforshort) have been installed by Android users around the world from various app markets. Prominent examples include Google Play, Amazon Appstore, Samsung Galaxy Apps,and tens of smaller third-party markets. Software plagiarism, ranging from open source code reusing to smartphone app repacking, severely a ect both open source communities and software companies.Despite the tremendous progress in software birthmarking, For example, birthmarks extracted from multiple runs of the same multithreaded programs can bevery different due to the inherent non-determinism of thread scheduling. In this case software birthmarking fails to declare plagiarism even for simply duplicated multithreaded programs.Despite the tremendous progress in software birthmarking, the trend towards multithreaded programming greatly threatens its effectiveness, as the existing approaches remain optimized for sequential RESEARCH ARTICLE OPEN ACCESS
  • 2. International Journal of Scientific Research and Engineering Development-– Volume 2 Issue 6, Nov- Dec 2019 Available at www.ijsred.com ISSN : 2581-7175 ©IJSRED: All Rights are Reserved Page 203 programs. Birthmarks extracted from multiple runs of the same multithreaded programs can be very different due to the inherent non-determinism of thread scheduling. The basic research problem for code similarity measurement techniques is to detect whether a component in one program is similar to a component in another program and quantitatively measure their similarity. A component can be a set of functions or a whole program. Existing methods include clone detection, binary similarity detection, and software plagiarism detection. While these approaches have been proven to be very useful, each of them has its shortcomings.software plagiarism detection technology, a new trend in software development greatly threatens its effectiveness. The trend towards multithreaded programs is creating a gap betauthoren the current software development practice and the software plagiarism detection technology, as the existing dynamic approaches remain optimized for sequential programs. Due to the perturbation caused by non-deterministic thread scheduling, existing birthmark generation and comparison are no longer applicable to modern software with multiple threads. In this paper, compare different technologies or methods that can be used to find out the plagiarized softwares. And also find out best one from the comparison. II. LITERATURE SURVEY K. Chen, et al.[5] propose a new designs of vetting techniques have recently been proposed by there search community for capturing new apps associated with known suspicious behavior, such as dynamic loading of binary code from a remote untrusted authorbsite operations related to component hijacking Intent injection , etc. 
All these approaches involve a heavyweight information-flow analysis and require a set of heuristics that characterize the known threats. They often need a dynamic analysis in addition to the static inspection performed on app code, and further human intervention to annotate the code or even participate in the analysis. Moreover, the emulators that most dynamic analysis tools employ can be detected and evaded by malware. To enable mass vetting at scale, Chen et al. developed a novel, highly scalable vetting mechanism for detecting repackaged Android malware on one market or across markets.

Z. Tian, T. Liu, Q. Zheng, M. Fan, E. Zhuang, and Z. Yang [4] observe that, despite the tremendous progress in software birthmarking, the trend towards multithreaded programming greatly threatens its effectiveness, as the existing approaches remain optimized for sequential programs. For example, birthmarks extracted from multiple runs of the same multithreaded program can be very different due to the inherent non-determinism of thread scheduling. In this case software birthmarking fails to declare plagiarism even for simply duplicated multithreaded programs. The authors introduce a thread-aware dynamic birthmark called TreSB (Thread-related System call Birthmark) that can effectively detect plagiarism of multithreaded programs. Because it is extracted by mining behavior characteristics from thread-related system calls, TreSB is less susceptible to thread scheduling: these system calls are the sources that impose thread scheduling rather than being affected by it. In addition, unlike many approaches, TreSB operates on binary executables rather than source code. The latter is usually unavailable when birthmarking is used to obtain initial evidence of software plagiarism. The authors implemented a prototype based on the PIN (Luk et al., 2005) instrumentation framework and conducted extensive experiments on a publicly available benchmark consisting of 234 versions of 35 different multithreaded programs.
Their empirical study shows that TreSB and its comparison algorithms are credible in differentiating independently developed programs, and resilient to most state-of-the-art semantics-preserving obfuscation techniques implemented in the best commercial and academic tools such as SandMark and DashO. In addition, a comparison against two recently proposed thread-aware birthmarks shows that TreSB outperforms both of them with respect to each of the three performance metrics URC, F-Measure and MCC.

The design of TreSB [4] rests on the observation that system calls recorded during program execution are a favorable basis for extracting high-quality birthmarks. The hypothesis is that modification of system calls usually leads to incorrect program behavior, and therefore a birthmark generated from system calls can be used to identify stolen programs even after they have been modified. Previous empirical study also shows that system call based birthmarks are resilient against various obfuscation techniques. Yet these birthmarks are no longer effective for multithreaded programs due to the non-determinism of thread scheduling. Following a similar principle, the authors also generate birthmarks from system calls. Rather than using all the calls, however, they extract TreSB just from the thread-related system calls, which are essential to the semantics and correct execution of a multithreaded program. A random or deliberate modification to these calls can result in very subtle errors, and therefore they are the code least likely to be changed. More importantly, these calls are the source that imposes thread interleaving rather than being affected by the non-determinism, so a birthmark extracted from these calls is less susceptible to thread scheduling.
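
The idea of building a birthmark from only a subset of the recorded system calls can be sketched as a simple filter over a trace. This is a minimal illustration, not the TreSB implementation: the function name `thread_related_calls` is hypothetical, and the set of Linux system call names below is an assumed selection chosen for the example, not the list used by the TreSB authors.

```python
# Assumed, illustrative selection of Linux system calls that deal with
# threads and processes. It is NOT the list used by the TreSB authors.
THREAD_RELATED = frozenset({
    "clone", "fork", "vfork", "exit", "exit_group", "wait4",
    "capget", "capset",
    "futex",
    "kill", "tgkill", "rt_sigaction", "rt_sigprocmask",
    "setpriority", "sched_setscheduler",
})


def thread_related_calls(trace):
    """Keep only the calls in THREAD_RELATED, preserving their order."""
    return [name for name in trace if name in THREAD_RELATED]


# A hypothetical recorded trace: I/O calls are dropped, thread calls are kept.
trace = ["open", "clone", "read", "futex", "write", "futex", "exit_group"]
print(thread_related_calls(trace))
```

The filtered sequence would then be the input to birthmark generation in place of the full trace.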
Specifically, the authors treat system calls that accomplish tasks including thread and process management (such as creation, join and termination, capability setting and getting), thread synchronization, signal manipulation, as well as thread and process priority setting, as thread-related.

L. Luo, J. Ming, D. Wu, P. Liu, and S. Zhu [2] propose a binary-oriented, obfuscation-resilient method named CoP. CoP is based on a new concept, the longest common subsequence of semantically equivalent basic blocks, which combines rigorous program semantics with longest common subsequence based fuzzy matching. Specifically, the authors model program semantics at three different levels: basic block, path, and whole program. To model the semantics of a basic block, they adopt symbolic execution to obtain a set of symbolic formulas that represent the input-output relations of the basic block under consideration. They then calculate the percentage of the output variables of the plaintiff block that have a semantically equivalent counterpart in the suspicious block, and set a threshold for this percentage to tolerate noise injected into the suspicious block. At the path level, they use the Longest Common Subsequence (LCS) algorithm to compare the semantic similarity of two paths, one from the plaintiff program and the other from the suspicious program; the comparison is built on the LCS dynamic programming algorithm, with basic blocks as the sequence elements. By trying more than one path, they use the path similarity scores from LCS collectively to model program semantic similarity. Note that LCS is different from the longest common substring. Because LCS allows skipping non-matching nodes, it naturally tolerates noise inserted by obfuscation techniques. This novel combination of rigorous program semantics with longest common subsequence based fuzzy matching results in strong resilience to obfuscation. The authors developed a prototype of CoP using the above method.
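
The path-level comparison can be sketched as an ordinary LCS dynamic program in which "equal" is replaced by a thresholded block-equivalence test. This is a simplified illustration under stated assumptions, not the CoP prototype: each basic block is modeled as a set of already-canonicalized output formulas (plain strings), which stands in for the symbolic execution step; the names `blocks_equivalent`, `lcs_length` and `path_similarity`, the default threshold of 0.8, and the normalization by the plaintiff path length are all assumptions made for the example.

```python
def blocks_equivalent(plaintiff_block, suspicious_block, threshold=0.8):
    """True if enough plaintiff outputs have a counterpart in the suspicious block.

    Blocks are sets of canonical output formulas (an assumption of this sketch).
    """
    if not plaintiff_block:
        return False
    matched = sum(1 for out in plaintiff_block if out in suspicious_block)
    return matched / len(plaintiff_block) >= threshold


def lcs_length(plaintiff_path, suspicious_path, equivalent=blocks_equivalent):
    """Length of the longest common subsequence of equivalent basic blocks."""
    rows, cols = len(plaintiff_path), len(suspicious_path)
    table = [[0] * (cols + 1) for _ in range(rows + 1)]
    for i in range(1, rows + 1):
        for j in range(1, cols + 1):
            if equivalent(plaintiff_path[i - 1], suspicious_path[j - 1]):
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                # Skipping a block on either side is what tolerates inserted noise.
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return table[rows][cols]


def path_similarity(plaintiff_path, suspicious_path):
    """LCS length normalized by the plaintiff path length (assumed normalization)."""
    if not plaintiff_path:
        return 0.0
    return lcs_length(plaintiff_path, suspicious_path) / len(plaintiff_path)


plaintiff = [{"a+b"}, {"x*2"}, {"y-1"}]
# The suspicious path carries an extra junk block and an extra output in one block.
suspicious = [{"a+b"}, {"junk"}, {"x*2", "noise"}, {"y-1"}]
print(path_similarity(plaintiff, suspicious))
```

In this example a longest common substring would stop at the inserted junk block, whereas the subsequence simply skips it.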
The authors evaluated CoP with several different experiments to measure its obfuscation resiliency, precision, and scalability. Benchmark programs, ranging from small programs to large real-world production software, were subjected to different code obfuscation techniques and semantics-preserving transformations, including different compilers and compiler optimization levels. They also compared their results with four state-of-the-art detection systems, MOSS, JPLag, Bdiff and DarunGrim2, where MOSS and JPLag are source code based, and Bdiff and DarunGrim2 are binary code based. Their experimental results show that CoP has stronger resiliency to the latest code obfuscation techniques as well as other semantics-preserving transformations; it can be applied to software plagiarism and algorithm detection, and is effective and practical for analyzing real-world software.

III. PROPOSED SYSTEM

This paper proposes TOB (Thread-Oblivious dynamic Birthmark), a framework that can revive existing dynamic birthmarks such as SCSSB (System Call Short Sequence Birthmark), DYKIS (DYnamic Key Instruction Sequence birthmark) and JB (an API based birthmark for Java) to handle multithreaded programs.

To illustrate the problem, consider a small multithreaded program, a test case in the WET project with slight modifications. We apply two typical plagiarism detection algorithms to multiple runs of this program. If the existing approaches fail to claim plagiarism on different runs of the same small program, even without any code modifications, the capability of such approaches is in doubt.

Dynamic birthmarks usually give a quantitative measurement between 0 and 1 to indicate the similarity between two runs. A value of 1 indicates that the runs are identical and 0 indicates complete difference. The measurement is given by applying metrics such as Cosine distance, Jaccard index, Dice coefficient and Containment to the birthmarks obtained from a pair of executions. Tables 1a and 1b show the experimental results of SCSSB and DYKIS, where the column and row headings indicate the number of threads and the evaluation metrics, respectively. When there is only one thread, the program becomes sequential. Without non-deterministic thread scheduling, the executions are deterministic. As expected, the similarity scores are all 1.0, correctly indicating that the runs are from identical programs. However, as the number of threads increases, the similarity scores quickly deteriorate. A similarity score greater than 1 − ε indicates a strong possibility of plagiarism, while a score less than ε indicates the opposite.
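
The scoring step described above can be sketched with set-based birthmarks. This is a minimal sketch under stated assumptions, not the code of SCSSB or DYKIS: a birthmark is taken to be the set of length-k windows of a trace, the four metrics are written in their plain set-based similarity forms (the original tools may weight by frequency), and the function names, the default `epsilon=0.25`, and the "inconclusive" label for scores between the two thresholds are all assumptions of this example.

```python
import math


def kgram_birthmark(trace, k=3):
    """Set of all length-k windows (k-grams) of an execution trace."""
    return {tuple(trace[i:i + k]) for i in range(len(trace) - k + 1)}


def cosine(a, b):
    return len(a & b) / math.sqrt(len(a) * len(b)) if a and b else 0.0


def jaccard(a, b):
    return len(a & b) / len(a | b) if a or b else 0.0


def dice(a, b):
    return 2 * len(a & b) / (len(a) + len(b)) if a or b else 0.0


def containment(a, b):
    """Share of the first birthmark that also appears in the second."""
    return len(a & b) / len(a) if a else 0.0


def verdict(score, epsilon=0.25):
    """Map a similarity score to a decision using the threshold epsilon."""
    if score > 1 - epsilon:
        return "plagiarism likely"
    if score < epsilon:
        return "independent"
    return "inconclusive"  # label assumed for the middle band


# Two hypothetical system call traces of the same program under different schedules.
run_a = kgram_birthmark(["open", "read", "write", "read", "write", "close"], k=2)
run_b = kgram_birthmark(["open", "read", "read", "write", "close"], k=2)
for metric in (cosine, jaccard, dice, containment):
    score = metric(run_a, run_b)
    print(metric.__name__, round(score, 2), verdict(score))
```

Even this small reordering of calls pulls every score below 1, which is the effect that grows as more threads interleave.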
Considering that a typical value of ε is between 0.15 and 0.35, SCSSB and DYKIS will not claim plagiarism when the number of threads is 6, and start to claim that the runs are from different programs when the number of threads becomes 8.

To the best of our knowledge, this is the first work that discusses the impact of thread scheduling on birthmark based software plagiarism detection and proposes a solution to remedy the problem. The work also applies the var-gram algorithm in birthmark generation; as far as we know, this is the first time this algorithm is used for such a purpose, and our experiments confirm its effectiveness. A set of tools collectively called TOB-PD (TOB based Plagiarism Detection tool) was implemented by integrating the principle of TOB with existing algorithms, including SCSSB, DYKIS and JB. Experiments on 418 versions of 35 different multithreaded programs show that the new tools are highly effective in detecting plagiarism and are resilient to most state-of-the-art semantics-preserving obfuscation techniques implemented in tools such as SandMark, DashO and UPX.

IV. CONCLUSION

In this paper, we implement a simple method. As multithreaded software becomes increasingly popular, current dynamic software plagiarism detection technology geared toward sequential programs is no longer sufficient. The proposed work addresses the challenges of applying dynamic birthmark based approaches to whole program plagiarism detection of multithreaded software. The TOB-based tools are not only accurate in detecting plagiarism of multithreaded programs but also robust against most state-of-the-art semantics-preserving obfuscation techniques, and the new birthmark technique can be easy to implement. As far as we know, this is the first work that discusses the impact of thread scheduling on birthmark based plagiarism detection, and the first work that proposes thread-oblivious birthmarks for solving the problem systematically.
REFERENCES

[1] Z. Tian, T. Liu, and Q. Zheng, “Reviving sequential program birthmarking for multithreaded software plagiarism detection,” IEEE Trans. Softw. Eng., vol. 44, no. 5, pp. 491–511, May 2018.
[2] L. Luo, J. Ming, D. Wu, P. Liu, and S. Zhu, “Semantics-based obfuscation-resilient binary code similarity comparison with applications to software and algorithm plagiarism detection,” IEEE Trans. Softw. Eng., 2017.
[3] Z. Tian, T. Liu, Q. Zheng, F. Tong, M. Fan, and Z. Yang, “A new thread-aware birthmark for plagiarism detection of multithreaded programs,” in Proc. Int. Conf. Softw. Eng. Companion, 2016, pp. 734–736.
[4] Z. Tian, T. Liu, Q. Zheng, M. Fan, E. Zhuang, and Z. Yang, “Exploiting thread-related system calls for plagiarism detection of multithreaded programs,” J. Syst. and Softw., vol. 119, pp. 136–148, 2016.
[5] K. Chen, et al., “Finding unknown malice in 10 seconds: Mass vetting for new threats at the Google-Play scale,” in Proc. USENIX Secur. Symp., 2015, pp. 659–674.