Evaluating software vulnerabilities using fuzzing
methods
Victor Varza, Laura Gheorghe
Faculty of Automatic Control and Computers
University Politehnica of Bucharest
Bucharest, Romania
victor.varza@cti.pub.ro, laura.gheorghe@cs.pub.ro
Abstract: Fuzzing is a software testing technique that has gained increasing interest from the
security research community. The purpose of fuzzing is to find software errors, bugs and
vulnerabilities. The target application is bombarded with random input data generated by another
program and is then monitored for malformed results. The principles of fuzzing have not
changed over time, but the mechanisms have evolved significantly, from dumb fuzzers to
modern fuzzers that are able to cover most of the source code of a software application. This paper
surveys the concept of fuzzing and presents a method that combines two fuzzing techniques to
improve the discovery of security weaknesses in a software application.
Key-Words: fuzzing, bugs, testing, security, software vulnerabilities, exploit
1 Introduction
Fuzzing is an automated security testing
method for discovering software
vulnerabilities by providing invalid or
random input and monitoring the system
under test for exceptions, errors or potential
vulnerabilities. Fuzzing was initially used in
the black-hat community to find zero-day
security vulnerabilities. The main idea is to
generate test data that is able to crash the
target application and to monitor the
results. There are other ways to discover
potential software vulnerabilities, such as
source code review, static analysis, beta
testing or unit testing. Despite these
alternatives, fuzzing is becoming widely
used because it is less expensive than
manual testing, does not have the blind
spots of human testers, is portable and
produces fewer false positives.
Fuzzing has grown from a simple input data
generator that discovers flaws during
software development to a complex system
that is able to discover bugs and security
vulnerabilities. For example, between 2007
and 2010, roughly a third of the Microsoft
Windows 7 bugs found by file fuzzing were
discovered by the SAGE fuzzer tool [4].
This paper presents a fuzzing technique
that combines whitebox and blackbox
fuzzing to improve the discovery of security
vulnerabilities. The paper is organized as
follows: section 2 describes the proposed
architecture, section 3 describes the path
predicates collector module, section 4
describes the input data generator and
section 5 describes the delivery
mechanism. Section 6 presents a simple
evaluation of our method and the last
section presents the open problems,
conclusions and future work.
2 Project architecture
Whitebox and blackbox fuzzing are two
fuzzing techniques that differ in how much
information the fuzzer has about the
internals of the target application. Whitebox
fuzzers use symbolic execution and
constraint solving, with complete knowledge
of the system under test (SUT). Blackbox
fuzzers generate input data randomly by
modifying correct input and do not require
any knowledge or understanding of the
target application internals.
Whitebox fuzzing uses symbolic execution,
which is in principle capable of discovering
all unique paths of the target application,
but it is limited by the number of test cases
and does not scale to large programs,
where the number of paths may grow
exponentially. Blackbox fuzzing is faster and
bombards the SUT with a large number of
test cases, so it can explore deeper code
paths, but its code coverage is limited.
Figure 1 shows the difference between
whitebox and blackbox fuzzing in terms of
code exploration.
Figure 1. Code exploration for both fuzzing methods
In order to combine whitebox and blackbox
fuzzing techniques in a single fuzzer, we use
four fuzzing components: a path predicates
collector, an input data generator, a delivery
mechanism and a monitoring system. We
use open source tools for these phases:
KLEE, PPL and Zzuf. Figure 2 shows the
relationship between these four
components.
The path predicates collector gathers all
paths of the target application as constraint
systems using symbolic execution. For this
phase we use KLEE, which is capable of
generating constraint systems as queries.
KLEE is a symbolic execution tool built on
top of LLVM (Low Level Virtual Machine)
and developed at Stanford University [10].
KLEE generates path predicates and test
cases with high coverage of the target
application source code. The queries can be
generated in different formats such as
KQuery (used by the Kleaver constraint
solver tool) or SMT-LIBv2 (the format of the
Satisfiability Modulo Theories Library). We
use KLEE only to generate queries
(constraint systems) in the KQuery
language. After the constraint systems are
generated with KLEE, we parse them and
transform them into systems of linear
inequalities with a simple parser written in
C++. These systems are then sent to the
PPL library to generate the input space.
Figure 2. Architecture overview
Input generation is the phase where test
cases are generated using numerical
abstraction and mixed integer linear
programming. For this we use the Parma
Polyhedra Library to generate a
preconfigured number of test cases that
represent the input space for the SUT. PPL
(Parma Polyhedra Library) [11] is a
mathematical library developed at the
University of Parma, Italy. It supports
numerical abstractions based on domains of
polyhedra, octagonal shapes and bounded
difference shapes, as well as mixed integer
linear programming problems. We use PPL
to generate the input space from the
systems of linear inequalities obtained from
KLEE.
The delivery mechanism and the monitoring
system represent the phases where the
SUT is tested and monitored for crashes
and hangs. We use the Zzuf tool for this
purpose. Zzuf [12] is a mutation-based
fuzzer that corrupts input data and delivers
it to the target application. It can also
monitor the SUT to discover security
weaknesses such as segmentation faults or
memory corruption. Because we use Zzuf
only as a delivery and monitoring
mechanism, we set the mutation ratio for
the input data to 0%.
3 Path predicates collector
To generate unique paths, we use symbolic
execution. Symbolic execution analyses the
program by treating its variables abstractly
and tracking them as symbolic expressions
rather than as concrete values. We use the
KLEE tool to collect the paths of the target
program. KLEE uses the online symbolic
execution technique: the predicates are
flipped at each branch, so it is not required
to rerun the program in order to find new
paths, in contrast to offline execution tools
such as SAGE. SAGE is a tool developed
by Microsoft, where the fuzzer must execute
the target program again, forcing it to take a
different path every time [2][4]. Online
symbolic execution may have limitations in
terms of memory usage for complex
programs because all states need to be
stored “online” in memory.
Because it is a whitebox fuzzer, KLEE uses
symbolic input that can initially be anything,
instead of random or mutated input. The
fuzzer replaces concrete operations with
operations that manipulate symbolic values.
When the SUT reaches a branch condition
(a boolean expression), the instruction
pointer is updated depending on whether
the condition is true or false. At this point,
KLEE queries the constraint solver to
determine whether the condition is provably
true or provably false along the current
path; if so, the instruction pointer is updated
accordingly. Otherwise, both branches are
possible. In this case, KLEE clones the
current state and updates the instruction
pointer and the path condition of each copy
in order to explore both paths [10].
Branches may also be generated by unsafe
operations that could cause errors, such as
a division instruction. In this case, a branch
is generated that checks whether the
divisor is 0 (zero). If an error is detected, the
execution continues on the other path (the
false branch), which adds the constraint
that the divisor is not 0.
A query constraint is generated when a path
terminates or a bug is caught during the
execution of the SUT. KLEE can also
generate a test case for each path by
solving the current path condition (query
constraint) with a constraint solver based on
STP. The STP library accepts as input
formulas over bit-vectors and arrays. If the
query (input) is satisfiable, STP generates
values that satisfy the input formula [13].
Since we want to integrate a blackbox
technique in the input generation process,
we additionally use a numerical abstraction
of the constraint system (represented by the
KLEE output query) with the help of the PPL
tool.
There are also other tools similar to KLEE,
such as SAGE (developed by Microsoft) or
the Mayhem symbolic executor (developed
at Carnegie Mellon University). We chose
KLEE because it is open source, easy to
use and well documented.
Because KLEE is built on top of LLVM,
before checking a program we have to
compile its source code to bytecode using
the LLVM compiler. After this, KLEE is used
to execute the resulting bytecode. Figure 3
shows the steps that we follow in order to
test a simple C program with KLEE.
Figure 3. KLEE steps (C/C++ application, LLVM-GCC, bytecode, KLEE, KQuery files)
KLEE requires no modification to the source
code of the target application. For testing
purposes we use the following source code,
named test1.c:
#include <stdio.h>
#include <stdlib.h>

int test(unsigned int t_a, int t_b) {
    unsigned int a = t_a; // unsigned 32-bit integer
    int b = t_b;          // signed 32-bit integer
    if (a >= 20) {
        if (b <= -80) {
            printf("PATH 1\t");
            printf("a=%u, b=%d\n", a, b);
            return 1;
        }
        printf("PATH 2\t");
        printf("a=%u, b=%d\n", a, b);
        return 2;
    }
    printf("PATH 3\t");
    printf("a=%u, b=%d\n", a, b);
    return -1;
}

int main(int argc, char** argv) {
    // the two integer inputs are taken from the command line
    return test(atoi(argv[1]), atoi(argv[2]));
}
To compile the source code with the
LLVM-GCC tool we have to type:
llvm-gcc --emit-llvm -c -g test1.c
The LLVM compiler compiles test1.c to a
bytecode file named test1.o. After this we
can run KLEE on the generated bytecode:
klee --write-pc test1.o
After running, KLEE generates test cases
for each unique path of the target program.
KLEE can also output statistics and the
constraint system of each path condition.
We use the option --write-pc to generate the
constraint systems as queries in the KQuery
language. For test1.c, the fuzzer finds three
unique paths and generates three KQuery
files. For example, the path corresponding
to “PATH 2” has the following KQuery file:
array b[4] : w32 -> w8 = symbolic
array a[4] : w32 -> w8 = symbolic
(query [(Ult 19
(ReadLSB w32 0 a))
(Eq false
(Sle (ReadLSB w32 0 b)
4294967216))]
false)
KQuery is a representation language for
constraint expressions. It is the language
used by Kleaver, a constraint solver tool
integrated in KLEE. KQuery represents
formulas over bitvectors and arrays, and
supports all standard operations on
bitvectors [10]. The language is easy to read
and write and accepts two kinds of
declarations: array declarations and query
commands.
Array declarations are used to declare
symbolic variables as arrays of bitvectors.
The syntax for declaring an array is:
"array" name "[" [ size ] "]" ":" domain
"->" range "=" array-initializer [10]
The parameter “name” is the name of the
variable; in the query for test1.c we have the
two variables a and b. Because a and b are
32-bit integer variables, each is declared as
an array of size 4 (four bytes) whose
domain, the width of the index, is w32 (32
bits) and whose range, the width of each
element, is w8 (8 bits). Each array can be
initialized as symbolic or with constant
values.
A query command represents the command
that the constraint solver has to execute. A
query is composed of a list of constraints
and a query expression; if the query
expression is invalid, the query may
additionally list expressions and arrays
whose values are to be computed [10].
The syntax of a query command is:
"(" "query" constraint-list query-
expression [ eval-expr-list [ eval-
array-list ] ] ")" [10]
A query command starts with the word
“query” and consists of a constraint list and
a query expression. To match “PATH 2” of
test1.c we have a query whose constraint
list contains two constraints, one for each
variable. An expression may contain
arithmetic operations (Add, Sub, Mul, UDiv,
SDiv, URem, SRem), bitwise operations
(And, Or, Xor, Shl, LShr, AShr),
comparisons (Eq, Ne, Ule, Ult, Ugt, Uge,
Slt, Sgt, Sle), bitvector manipulation
(Concat, Extract, ZExt, SExt), special
expressions (Read, Select) and macro
expressions (Neg, ReadLSB, ReadMSB)
[10].
In our example, the first constraint contains
the expression “ReadLSB w32 0 a”, which
reads a 32-bit value from the array a
starting at offset 0, least significant byte
first. The expression Ult (unsigned less
than) states that 19 is less than the value
read by that expression, that is, a > 19. The
same reasoning applies to the second
constraint, on the variable b: first the 32-bit
value of the symbolic variable b is read with
ReadLSB, then it is compared with Sle
(signed less than or equal) against
4294967216, which is the 32-bit two's
complement representation of -80. The
expression “Eq false” negates the result of
the previous comparison: not (b <= -80),
which is equivalent to b >= -79.
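The reading of the “PATH 2” query given
above can be checked with a small C
sketch. It is only an illustration under our
own assumptions: the function names
read_lsb_w32, as_signed32 and path2_holds
are hypothetical and are not part of KLEE or
Kleaver.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* ReadLSB w32 0 arr: assemble a 32-bit value from four bytes,
 * least significant byte first. */
uint32_t read_lsb_w32(const uint8_t arr[4]) {
    return (uint32_t)arr[0]
         | ((uint32_t)arr[1] << 8)
         | ((uint32_t)arr[2] << 16)
         | ((uint32_t)arr[3] << 24);
}

/* Interpret a 32-bit pattern as a two's complement signed value
 * without relying on an implementation-defined cast. */
int32_t as_signed32(uint32_t bits) {
    if (bits <= (uint32_t)INT32_MAX) {
        return (int32_t)bits;
    }
    return (int32_t)(bits - 2147483648u) + INT32_MIN;
}

/* PATH 2 predicate: (Ult 19 a) and (Eq false (Sle b 4294967216)). */
int path2_holds(const uint8_t a[4], const uint8_t b[4]) {
    assert(a != NULL && b != NULL);
    uint32_t a_val = read_lsb_w32(a);
    int32_t b_val = as_signed32(read_lsb_w32(b));
    int32_t bound = as_signed32(4294967216u); /* the constant from the query */
    int ult = 19u < a_val;    /* unsigned comparison */
    int sle = b_val <= bound; /* signed comparison */
    return ult && !sle;
}
```

With the byte arrays for a = 20 and b = -79
the predicate holds, while b = -80 or a = 19
falls outside “PATH 2”.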
4 Input data generator
After generating path predicates through
symbolic execution we obtain the
constraints for our system. We can generate
input data by solving these constraints with
a constraint solver or with a numerical
abstraction method based on convex
polyhedra defined by inequalities. For the
latter we use the Parma Polyhedra Library.
PPL can only handle linear inequalities, in
which each symbolic variable appears to
the first degree. For example, a_symb +
b_symb < c is a linear formula whereas
a_symb * b_symb < c is not. In section 3, we
saw that KLEE generates the constraint
system as queries in the KQuery language.
Because PPL handles only linear
inequalities, we have to transform those
queries into systems of linear inequalities.
Figure 4 shows the steps that we follow in
order to generate the input space:
Figure 4. Input data generation steps (KQuery constraints, parser, linear inequalities, PPL, input space)
To transform the queries generated by
KLEE into systems of linear inequalities we
wrote a simple C++ parser that takes a
KQuery file as input and outputs a system of
linear inequalities. Given the following
query constraint:
query [(Ult 19 (ReadLSB w32 0 a))
(Eq false (Sle (ReadLSB w32 0 b)
4294967216))
]
The transformation will be:
19 < a
b > -80
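To solve the linear inequalities we use
numerical abstraction and convex
polyhedra. A polyhedron is the solution set
of a system of linear inequalities, defined as
(1) and (2).
$a_i^T x \le b_i, \quad i = 1, \dots, m$ (1)
$A x \le b, \quad A \in \mathbb{R}^{m \times n}, \; b \in \mathbb{R}^m$ (2)
The result is the intersection of m
halfspaces with normal vectors $a_i$ and
represents a polytope object (3).
$P = \{ x \in \mathbb{R}^n \mid A x \le b \}$ (3)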
Figure 5. Convex polyhedron
The Parma Polyhedra Library supports
numerical abstractions based on domains of
polyhedra, octagonal shapes and bounded
difference shapes, as well as mixed integer
linear programming problems. Because PPL
can only manipulate polytope objects
defined by non-strict inequalities, we have to
transform all inequalities to be non-strict
[11].
To find the input space for a target program
we have to find the maximum and minimum
value of each variable. To do this, we
compose the system of linear non-strict
inequalities and then solve it as a mixed
integer linear programming problem, for
which PPL includes a solver.
Mixed integer linear programming is a
mathematical method for finding the best
outcome (maximum or minimum) in a
mathematical model described by linear
relationships. It is a technique for optimizing
a linear objective function subject to linear
equality and linear inequality constraints.
The feasible region is a convex polyhedron,
that is, an intersection of halfspaces, each
defined by a linear inequality. The mixed
integer linear programming algorithm finds
the minimum or the maximum value of the
objective function over the polyhedron, if
such a point exists.
The canonical form of the mixed integer
linear programming problem (MILP or MIP)
is defined in (4), (5) and (6).
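maximize $c^T x$ (4)
subject to $A x \le b$ (5)
$x \ge 0$, with $x_j \in \mathbb{Z}$ for the integer variables (6)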
For example, we use the constraints
system defined in (7), (8), (9), (10) and (11)
(7)
(8)
(9)
(10)
(11)
The feasible region is the white region in
the graph shown in Figure 6.
Figure 6. Feasible region
To generate random points inside the
polyhedron we use the linear programming
method described in [11] for each variable
(dimension). PPL provides methods to
maximize and minimize a given linear
objective function. The objective function
consists of a single variable (we use one
objective function for each dimension). First
we have to find the lower and upper bound
of each variable, which together represent
the boundaries of the feasible region.
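When every inequality involves a single
variable, as in test1.c, the lower and upper
bound of each variable can be found
without a solver by tightening an interval.
The C sketch below shows only this special
case and is not a replacement for the
PPL-based optimization; for constraints that
mix variables, a linear programming solver
is still needed. All names are hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Closed interval [lo, hi] of admissible values for one variable. */
typedef struct { long long lo, hi; } interval;

/* Constraint on one variable: x[var] >= value when is_lower is
 * non-zero, x[var] <= value otherwise. */
typedef struct { size_t var; int is_lower; long long value; } unit_constraint;

/* Start from the full range of a C int, the argument type in test1.c. */
interval full_int_range(void) {
    interval iv = { INT_MIN, INT_MAX };
    return iv;
}

/* Tighten each interval with the constraints that mention its variable.
 * Returns 1 if every interval is non-empty, 0 if the system is infeasible. */
int tighten_bounds(interval *box, size_t nvars,
                   const unit_constraint *cs, size_t ncs) {
    for (size_t i = 0; i < ncs; i++) {
        assert(cs[i].var < nvars);
        interval *iv = &box[cs[i].var];
        if (cs[i].is_lower) {
            if (cs[i].value > iv->lo) iv->lo = cs[i].value;
        } else {
            if (cs[i].value < iv->hi) iv->hi = cs[i].value;
        }
    }
    for (size_t v = 0; v < nvars; v++) {
        if (box[v].lo > box[v].hi) return 0;
    }
    return 1;
}
```

For “PATH 2”, the constraints a >= 20 and
b >= -79 leave the upper bounds at
INT_MAX.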
For our test1.c source code example we
obtained the constraint system described in
(12), (13), (14) and (15).
$a \ge 20$ (12)
$a \le \mathrm{INT\_MAX}$ (13)
$b \ge -79$ (14)
$b \le \mathrm{INT\_MAX}$ (15)
The feasible region is described in Figure 7.
Figure 7. Feasible region
The input space is randomly generated from
the feasible region using the upper and
lower bounds obtained by solving the
system of inequalities with PPL. The results
are given in (16) and (17).
$20 \le a \le \mathrm{INT\_MAX}$ (16)
$-79 \le b \le \mathrm{INT\_MAX}$ (17)
For complex systems of constraints it is
mandatory to check whether the randomly
generated values lie in the feasible region.
This can be done with PPL by checking
whether the points are in the interior of the
polyhedron.
Figure 8. Values checking (the red points are ok while
the yellow ones are not)
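The sampling and checking steps can be
sketched in plain C as follows. This is an
illustration under our own assumptions and
does not use PPL: the containment test
simply evaluates every inequality, the
random generator is a xorshift generator
chosen for brevity, and all names are
hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_VARS 8

/* One constraint: coef[0]*x[0] + ... + coef[n-1]*x[n-1] >= rhs. */
typedef struct { int64_t coef[MAX_VARS]; int64_t rhs; } linear_constraint;

/* xorshift64 pseudo-random generator; the state must be non-zero. */
uint64_t next_random(uint64_t *state) {
    assert(*state != 0);
    uint64_t x = *state;
    x ^= x << 13;
    x ^= x >> 7;
    x ^= x << 17;
    *state = x;
    return x;
}

/* Integer in [lo, hi]; the small modulo bias is ignored in this sketch. */
int32_t sample_in_range(int32_t lo, int32_t hi, uint64_t *state) {
    assert(lo <= hi);
    int64_t span = (int64_t)hi - (int64_t)lo + 1;
    int64_t r = (int64_t)(next_random(state) % (uint64_t)span);
    return (int32_t)((int64_t)lo + r);
}

/* Returns 1 if the point x satisfies every constraint, 0 otherwise.
 * Coefficients are assumed small enough that the sum cannot overflow. */
int satisfies_all(const int32_t *x, size_t nvars,
                  const linear_constraint *cs, size_t ncs) {
    assert(nvars <= MAX_VARS);
    for (size_t i = 0; i < ncs; i++) {
        int64_t lhs = 0;
        for (size_t j = 0; j < nvars; j++) {
            lhs += cs[i].coef[j] * (int64_t)x[j];
        }
        if (lhs < cs[i].rhs) return 0;
    }
    return 1;
}
```

A point is kept as a test case only if
satisfies_all returns 1; otherwise it is
discarded and a new point is drawn.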
5 Delivery and monitoring mechanism
The random values generated with PPL are
delivered to the target application using the
Zzuf fuzzer. For each path predicate, the
input data generator module creates a file
with the .zz extension that contains the
random values generated using PPL. The
Zzuf tool reads each of these files and
sends its contents to the SUT.
Zzuf is a mutation-based fuzzer that
randomly alters correct input of the target
program. Because we use this tool only as a
delivery and monitoring mechanism, the
mutation ratio is set to 0%.
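To illustrate what a mutation ratio means,
the following C sketch flips each bit of a
buffer with a given probability. It is a
generic illustration and is not Zzuf's actual
algorithm; the function name and the
random generator are our own
assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Flip each bit of buf with probability `ratio` (0.0 to 1.0).
 * `state` is a non-zero xorshift64 seed. Returns the number of
 * bits that were flipped. */
size_t mutate_bits(uint8_t *buf, size_t len, double ratio, uint64_t *state) {
    assert(ratio >= 0.0 && ratio <= 1.0);
    assert(*state != 0);
    size_t flipped = 0;
    for (size_t i = 0; i < len; i++) {
        for (int bit = 0; bit < 8; bit++) {
            uint64_t x = *state;
            x ^= x << 13;
            x ^= x >> 7;
            x ^= x << 17;
            *state = x;
            /* Map the top 53 bits to a double in [0, 1). */
            double u = (double)(x >> 11) / 9007199254740992.0;
            if (u < ratio) {
                buf[i] ^= (uint8_t)(1u << bit);
                flipped++;
            }
        }
    }
    return flipped;
}
```

With a ratio of 0 no bit is ever flipped, so
the generated values reach the SUT
unchanged.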
As a monitoring system, Zzuf can monitor
the target program for security weaknesses
such as segmentation faults or memory
corruption. The command that we use to
deliver the test cases and monitor the SUT
is:
zzuf -r 0 $gcc_app_path/$app_name $line
The parameter -r is the mutation ratio, set to
0; $gcc_app_path/$app_name is the path to
the application under test; and the
parameter $line is a line read from the .zz
file, which contains the inputs used by the
SUT.
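The command line above can be
assembled for each line of a .zz file as in
the following C sketch. The helper name
build_zzuf_command is hypothetical, and
the sketch assumes that each line holds
only the numeric arguments of the SUT, so
no shell quoting is attempted.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Build "zzuf -r 0 <app_path> <line>" into out.
 * A trailing newline in `line` (as left by fgets) is dropped.
 * Returns 0 on success, -1 if the command does not fit in out. */
int build_zzuf_command(char *out, size_t out_size,
                       const char *app_path, const char *line) {
    assert(out != NULL && app_path != NULL && line != NULL);
    size_t n = strcspn(line, "\r\n");
    int written = snprintf(out, out_size, "zzuf -r 0 %s %.*s",
                           app_path, (int)n, line);
    if (written < 0 || (size_t)written >= out_size) {
        return -1;
    }
    return 0;
}
```

A driver would read the .zz file line by line
with fgets, call this helper and run the
resulting command.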
6 Evaluation
In this section, we evaluate our method on a
simple example program (test1.c). As future
work we propose to use real applications
(such as coreutils) that KLEE is capable of
handling. In the evaluation of our method we
focus on code coverage and input
generation.
Code coverage is a standard metric in
software analysis because it shows how
much of the code is exercised in the testing
process. The higher the code coverage
ratio, the higher the probability of finding
bugs. We use the klee-stats command to
measure code coverage, which is the
number of lines of code executed over the
total number of lines of the source code of
the target application. Table 1 shows the
statistics for our test1.c example.
Instrs  Time(s)  ICov(%)  BCov(%)  ICount  TSolver(%)
45      0.44     75.00    100.00   60      52.76
Table 1. Statistics of symbolic execution
Instrs is the number of executed
instructions, Time is the total time, ICov is
the percentage of LLVM instructions that
were covered, BCov is the percentage of
branches that were covered, ICount is the
total number of static instructions in the
LLVM bitcode and TSolver is the
percentage of time spent in the constraint
solver (the internal KLEE constraint solver).
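A coverage percentage is the ratio of
covered items to total items. The C sketch
below states this ratio; the parameter
names are hypothetical. With 45 and 60
from Table 1 it yields 75.00, which matches
the reported ICov, although whether KLEE
derives ICov from exactly these two
columns is an assumption that may hold
only for this small, loop-free example.

```c
#include <assert.h>

/* Percentage of covered items (instructions, branches or lines). */
double coverage_percent(unsigned long covered, unsigned long total) {
    assert(total > 0 && covered <= total);
    return 100.0 * (double)covered / (double)total;
}
```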
Input generation is another important
measure in software analysis because the
more test cases we have, the higher the
probability of finding bugs. Symbolic
execution alone does not generate a large
number of test cases because, depending
on the constraint solver, only one test case
is generated for a given path. Because we
use a polytope as input space, the number
of test cases that we can generate is very
large. For example, for our test1.c, if we
want to go through “PATH 2”, the variable a
may take values between 20 and INT_MAX
and the variable b values between -79 and
INT_MAX. On a 32-bit Linux-based
operating system a signed integer is
represented on 32 bits and INT_MAX has
the value 2147483647. In this case the
number of test cases for “PATH 2” may be
as large as 2147483628 x 2147483727,
which is a very large input space.
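The size of this input space can be
computed by counting the integers in each
inclusive range [lo, hi] as hi - lo + 1 and
multiplying the counts. The following C
sketch does this for “PATH 2”, taking a in
[20, INT_MAX] and b in [-79, INT_MAX] on a
platform with 32-bit int; the function names
are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Number of integers in the inclusive range [lo, hi]. */
uint64_t range_count(int32_t lo, int32_t hi) {
    assert(lo <= hi);
    return (uint64_t)((int64_t)hi - (int64_t)lo) + 1u;
}

/* Number of (a, b) pairs that follow PATH 2 of test1.c.
 * The product stays below 2^63, so it fits in 64 bits. */
uint64_t path2_input_space(void) {
    uint64_t a_values = range_count(20, INT32_MAX);
    uint64_t b_values = range_count(-79, INT32_MAX);
    return a_values * b_values;
}
```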
7 Conclusion
This paper presents a method for
automatically finding software vulnerabilities
using both whitebox and blackbox fuzzing.
Traditional fuzzers are blackbox fuzzers,
while modern fuzzers tend to be whitebox
fuzzers. But which is better in practice:
whitebox or blackbox fuzzing? Blackbox
fuzzing is simple, fast, easy to use and
lightweight, but has issues regarding code
coverage. Whitebox fuzzing is smarter but
also more complex, and may not be as easy
to use.
We show that whitebox and blackbox
fuzzing can be combined to improve code
coverage and input data generation. The
technique presented in this paper shows
that efficient input generation is possible,
which cannot be achieved with blackbox or
whitebox fuzzing alone.
The next step in the development of this
solution is to evaluate real applications,
such as coreutils, and to compare blackbox
and whitebox fuzzers with our solution.
References:
[1] Richard McNally, Ken Yiu, Duncan
Grove and Damien Gerhardy, “Fuzzing: The
State of the Art”, Australian Command,
Control, Communications and Intelligence
Division Defence Science and Technology
Organisation, February 2012,
http://www.dtic.mil/cgi-
bin/GetTRDoc?AD=ADA558209.
[2] Brian S. Pak, “Hybrid Fuzz Testing:
Discovering Software Bugs via Fuzzing and
Symbolic Execution”, School of Computer
Science Carnegie Mellon University
Pittsburgh, Master thesis, May 2012.
[3] Sofia Bekrar, Chaouki Bekrar, Roland
Groz, Laurent Mounier, “Finding Software
Vulnerabilities by Smart Fuzzing”, Software
Testing, Verification and Validation (ICST),
2011 IEEE Fourth International Conference,
March 2011, pages 427- 430.
[4] Patrice Godefroid, Michael Y. Levin, and
David Molnar, “SAGE: Whitebox fuzzing for
security testing”, Communications of the
ACM, vol. 55, no. 3, March 2012.
[5] James C. King. “Symbolic execution and
program testing”, Commun. ACM 19(7),
July 1976, pages 385–394.
[6] Tsankov P., Dashti, M.T., Basin, D.,
“SECFUZZ: Fuzz-testing security
protocols”, Automation of Software Test
(AST), 2012 7th International Workshop,
June 2012, pages 1-7.
[7] HyoungChun Kim, YoungHan Choi,
DoHoon Lee, Donghoon Lee, “Advanced
Communication Technology, 2008. ICACT
2008. 10th International Conference”, Feb.
2008, pages 1304-1307.
[8] Adrian Furtuna, “Proactive cyber security
by red teaming”, PhD thesis, Military
Technical Academy, 2011.
[9] Marjan Aslani, Nga Chung, Jason
Doherty, Nichole Stockman, and William
Quach, “Comparison of Blackbox and
Whitebox Fuzzers in Finding Software
Bugs”, Summer Undergraduate Program in
Engineering Research at Berkeley, Talk or
presentation, 2008, URL
www.truststc.org/pubs/493.html
[10] Cristian Cadar, Daniel Dunbar, Dawson
Engler, “Unassisted and Automatic
Generation of High-Coverage Tests for
Complex Systems Programs”, OSDI'08
Proceedings of the 8th USENIX conference
on Operating systems design and
implementation, 2008, Pages 209-224,
URL: http://klee.llvm.org.
[11] Roberto Bagnara, Patricia M. Hill, Enea
Zaffanella, “The Parma Polyhedra Library:
Toward a Complete Set of Numerical
Abstractions for the Analysis and
Verification of Hardware and Software
Systems”, Journal Science of Computer
Programming, Volume 72 Issue 1-2, June,
2008, pages 3-21.
[12] Caca Labs, “Zzuf multi-purpose fuzzer”,
January, 2010, URL
http://caca.zoy.org/wiki/zzuf
[13] Vijay Ganesh and David L. Dill., “A
Decision Procedure for Bit-Vectors and
Arrays”, Proceedings of Computer Aided
Verification, Berlin, Germany, July 2007.
Ad

Recommended

Validation and Verification of SYSML Activity Diagrams Using HOARE Logic
Validation and Verification of SYSML Activity Diagrams Using HOARE Logic
ijseajournal
 
50120140502017
50120140502017
IAEME Publication
 
Pointcut rejuvenation
Pointcut rejuvenation
Ravi Theja
 
Specification-based Verification of Incomplete Programs
Specification-based Verification of Incomplete Programs
IDES Editor
 
Fuzz
Fuzz
Sunny Summer
 
Model Based Software Timing Analysis Using Sequence Diagram for Commercial Ap...
Model Based Software Timing Analysis Using Sequence Diagram for Commercial Ap...
iosrjce
 
A Search-based Testing Approach for XML Injection Vulnerabilities in Web Appl...
A Search-based Testing Approach for XML Injection Vulnerabilities in Web Appl...
Lionel Briand
 
Automated server-side model for recognition of security vulnerabilities in sc...
Automated server-side model for recognition of security vulnerabilities in sc...
IJECEIAES
 
FAULT MODELING OF COMBINATIONAL AND SEQUENTIAL CIRCUITS AT REGISTER TRANSFER ...
FAULT MODELING OF COMBINATIONAL AND SEQUENTIAL CIRCUITS AT REGISTER TRANSFER ...
VLSICS Design
 
Finding latent code errors via machine learning over program ...
Finding latent code errors via machine learning over program ...
butest
 
Code coverage based test case selection and prioritization
Code coverage based test case selection and prioritization
ijseajournal
 
A WHITE BOX TESTING TECHNIQUE IN SOFTWARE TESTING : BASIS PATH TESTING
A WHITE BOX TESTING TECHNIQUE IN SOFTWARE TESTING : BASIS PATH TESTING
Journal For Research
 
Crosscutting Specification Interference Detection at Aspect Oriented UML-Base...
Crosscutting Specification Interference Detection at Aspect Oriented UML-Base...
IJERA Editor
 
Building a new CTL model checker using Web Services
Building a new CTL model checker using Web Services
infopapers
 
Design and Implementation of Automated Visualization for Input/Output for Pro...
Design and Implementation of Automated Visualization for Input/Output for Pro...
ijseajournal
 
150104 3 methods for-binary_analysis_and_valgrind
150104 3 methods for-binary_analysis_and_valgrind
Raghu Palakodety
 
TBar: Revisiting Template-based Automated Program Repair
TBar: Revisiting Template-based Automated Program Repair
Dongsun Kim
 
Method-Level Code Clone Modification using Refactoring Techniques for Clone M...
Method-Level Code Clone Modification using Refactoring Techniques for Clone M...
acijjournal
 
Model-based GUI testing using Uppaal
Model-based GUI testing using Uppaal
Ulrik Hørlyk Hjort
 
A Novel Approach for Code Clone Detection Using Hybrid Technique
A Novel Approach for Code Clone Detection Using Hybrid Technique
INFOGAIN PUBLICATION
 
Opal Hermes - towards representative benchmarks
Opal Hermes - towards representative benchmarks
MichaelEichberg1
 
Lab manual object oriented technology (it 303 rgpv) (usefulsearch.org) (usef...
Lab manual object oriented technology (it 303 rgpv) (usefulsearch.org) (usef...
Make Mannan
 
DHSSTTSL11192.HSI Process (1)
DHSSTTSL11192.HSI Process (1)
John Chin
 
OORPT Dynamic Analysis
OORPT Dynamic Analysis
lienhard
 
Handout#03
Handout#03
Sunita Milind Dol
 
Bench4BL: Reproducibility Study on the Performance of IR-Based Bug Localization
Bench4BL: Reproducibility Study on the Performance of IR-Based Bug Localization
Dongsun Kim
 
4.analisisleksikalfix
4.analisisleksikalfix
yuster92
 
โครงการคัดกรองโรคทางหลอดเลือดและสุขภาพจิต ปี ๒๕๕๘
โครงการคัดกรองโรคทางหลอดเลือดและสุขภาพจิต ปี ๒๕๕๘
Fah Chimchaiyaphum
 
8. ptk optimasi kode
8. ptk optimasi kode
yuster92
 
Evaluating software vulnerabilities using fuzzing methods
Evaluating software vulnerabilities using fuzzing methods
Victor Ionel
 

More Related Content

What's hot (18)

FAULT MODELING OF COMBINATIONAL AND SEQUENTIAL CIRCUITS AT REGISTER TRANSFER ...
FAULT MODELING OF COMBINATIONAL AND SEQUENTIAL CIRCUITS AT REGISTER TRANSFER ...
VLSICS Design
 
Finding latent code errors via machine learning over program ...
Finding latent code errors via machine learning over program ...
butest
 
Code coverage based test case selection and prioritization
Code coverage based test case selection and prioritization
ijseajournal
 
A WHITE BOX TESTING TECHNIQUE IN SOFTWARE TESTING : BASIS PATH TESTING
A WHITE BOX TESTING TECHNIQUE IN SOFTWARE TESTING : BASIS PATH TESTING
Journal For Research
 
Crosscutting Specification Interference Detection at Aspect Oriented UML-Base...
Crosscutting Specification Interference Detection at Aspect Oriented UML-Base...
IJERA Editor
 
Building a new CTL model checker using Web Services
Building a new CTL model checker using Web Services
infopapers
 
Design and Implementation of Automated Visualization for Input/Output for Pro...
Design and Implementation of Automated Visualization for Input/Output for Pro...
ijseajournal
 
150104 3 methods for-binary_analysis_and_valgrind
150104 3 methods for-binary_analysis_and_valgrind
Raghu Palakodety
 
TBar: Revisiting Template-based Automated Program Repair
TBar: Revisiting Template-based Automated Program Repair
Dongsun Kim
 
Method-Level Code Clone Modification using Refactoring Techniques for Clone M...
Method-Level Code Clone Modification using Refactoring Techniques for Clone M...
acijjournal
 
Model-based GUI testing using Uppaal
Model-based GUI testing using Uppaal
Ulrik Hørlyk Hjort
 
A Novel Approach for Code Clone Detection Using Hybrid Technique
A Novel Approach for Code Clone Detection Using Hybrid Technique
INFOGAIN PUBLICATION
 
Opal Hermes - towards representative benchmarks
Opal Hermes - towards representative benchmarks
MichaelEichberg1
 
Lab manual object oriented technology (it 303 rgpv) (usefulsearch.org) (usef...
Evaluating software vulnerabilities using fuzzing methods

Victor Varza, Laura Gheorghe
Faculty of Automatic Control and Computers
University Politehnica of Bucharest
Bucharest, Romania
victor.varza@cti.pub.ro, laura.gheorghe@cs.pub.ro

Abstract: Fuzzing is a software testing technique that has gained growing interest from the security research community. The purpose of fuzzing is to find software errors, bugs and vulnerabilities. The target application is bombarded with random input data generated by another program and is then monitored for any malformed results. The principles of fuzzing have not changed over time, but the mechanisms have evolved significantly, from dumb fuzzers to modern fuzzers that are able to cover most of the source code of a software application. This paper surveys the concept of fuzzing and presents a method that combines two fuzzing techniques to improve the discovery of security weaknesses in a software application.

Key-Words: fuzzing, bugs, testing, security, software vulnerabilities, exploit

1 Introduction

Fuzzing is an automated security testing method for discovering software vulnerabilities by providing invalid or random input and monitoring the system under test for exceptions, errors or potential vulnerabilities. Fuzzing was initially used in the black-hat community to find zero-day security vulnerabilities. The main idea is to generate test data that is able to crash the target application and to monitor the results. There are other ways to discover potential software vulnerabilities, such as source code review, static analysis, beta testing or unit tests. Despite these alternatives, fuzzing is becoming widely used because it is less expensive than manual testing, does not have the blind spots of human testers, is portable and produces fewer false positives.
Fuzzing has grown from a simple input data generator that uncovers flaws during software development into a complex system that is able to discover bugs and security vulnerabilities. For example, between 2007 and 2010, roughly a third of the Microsoft Windows 7 bugs found by file fuzzing were discovered by the SAGE fuzzer [4].

This paper presents a fuzzing technique that combines whitebox and blackbox fuzzing to improve the discovery of security vulnerabilities. The paper is organized as follows: section 2 describes the proposed architecture, section 3 describes the path predicates collector module, section 4 describes the input data generator and section 5 describes the delivery mechanism. Section 6 presents a simple evaluation of our method and the last section presents open problems, conclusions and future work.

2 Project architecture

Whitebox and blackbox fuzzing differ in how much information the fuzzer has about the internals of the target application. Whitebox fuzzers use symbolic execution and constraint solving with complete knowledge of the system under test (SUT). Blackbox fuzzers generate input data randomly by modifying correct input and do not require any knowledge or understanding of the target application internals.

Whitebox fuzzing relies on symbolic execution, which can in principle discover all unique paths of the target application, but it is limited by the number of test cases and does not scale to large programs, where the number of paths may grow exponentially. Blackbox fuzzing is faster and bombards the SUT with a large number of test cases, so it can explore code paths more deeply, but it achieves limited code coverage. Figure 1 shows the difference between whitebox and blackbox fuzzing in terms of code exploration.

Figure 1. Code exploration for both fuzzing methods

In order to combine whitebox and blackbox fuzzing in a single fuzzer, we use four components: a path predicates collector, an input data generator, a delivery mechanism and a monitoring system. We use open source tools for each phase: KLEE, PPL and Zzuf. Figure 2 shows the relationship between these four components.

The path predicates collector gathers all paths of the target application as constraint systems, using symbolic execution. For this phase we use KLEE, which is able to emit constraint systems as queries. KLEE is a symbolic execution tool built on top of LLVM (Low Level Virtual Machine) and developed at Stanford University [10]. KLEE generates path predicates and test cases with high coverage of the target application source code. The queries can be generated in different formats such as KQuery (used by the Kleaver constraint solver tool) or SMT-LIBv2 (the format of the Satisfiability Modulo Theories Library). We use KLEE only to generate queries (constraint systems) in the KQuery language.
After the constraint systems are generated with KLEE, we parse them and transform them into equation systems with a simple parser written in C++. The equation systems are then passed to the PPL library to generate the input space.

Figure 2. Architecture overview (diagram labels: System under test; Path predicates collector (symbolic execution): KLEE; Input data generator: PPL; Delivery mechanism and monitoring system: Zzuf; Whitebox fuzzing; Blackbox fuzzing)

Input generation is the phase where test cases are generated using numerical abstraction and mixed integer linear programming. We use the Parma Polyhedra Library to generate a pre-configured number of test cases that represent the input space for the SUT. PPL (Parma Polyhedra Library) [11] is a mathematical library developed at the University of Parma, Italy. It supports numerical abstractions such as the domains of polyhedra, octagonal shapes and bounded difference shapes, as well as mixed integer linear programming problems. We use PPL to generate the input space from the linear systems obtained from KLEE.

The delivery mechanism and the monitoring system are the phases where the SUT is tested and monitored for crashes and hangs. We use Zzuf for this purpose. Zzuf [12] is a mutation-based fuzzer that corrupts input data and delivers it to the target application. It can also monitor the SUT to discover security weaknesses such as segmentation faults or memory corruption. Since we use Zzuf only as a delivery and monitoring mechanism, the mutation ratio for the input data is set to 0%.

3 Path predicates collector

To generate unique paths, we use symbolic execution. Symbolic execution analyses the program through abstract interpretation of its variables, tracking them symbolically rather than by their concrete values. We use KLEE to collect the paths of the target program. KLEE uses online symbolic execution, which means that the predicates are flipped at each branch and the program does not have to be rerun in order to find new paths, in contrast to offline execution tools such as SAGE. SAGE is a tool developed by Microsoft in which the fuzzer must execute the target program again, forcing it down a different path every time [2][4]. Online symbolic execution may be limited by memory usage for complex programs because all states need to be kept "online" in memory.

Because it is a whitebox fuzzer, KLEE uses symbolic input that can initially be anything, instead of random or mutated input. The fuzzer replaces concrete operations with ones that manipulate symbolic values. When the SUT reaches a branch condition (a boolean expression), the instruction pointer is altered depending on whether the condition is true or false. At this point, KLEE queries the constraint solver to determine whether the condition is provably true or provably false along the current path; if it is, the instruction pointer is updated accordingly. Otherwise, both paths are possible. In this case, KLEE clones the current state and updates the instruction pointer and the path condition of each copy in order to explore both paths [10].

Branches may also be generated by unsafe operations that could cause errors, such as a division instruction. In this case, a branch is generated that checks whether the divisor is 0 (zero).
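The state cloning described above can be illustrated with a small toy explorer. This is only a sketch under simplifying assumptions, not KLEE's implementation: the program is modelled as a tree of branch conditions, every branch is assumed to be feasible in both directions (no constraint solver is consulted), and the names Node, State and explore are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_STATES 32
#define MAX_TEXT   128

/* Toy control-flow tree: a node is either a branch on `cond` or a leaf. */
typedef struct Node {
    const char *cond;              /* NULL marks a leaf (the path terminates) */
    const struct Node *if_true;
    const struct Node *if_false;
} Node;

/* An execution state: current position plus the path condition so far. */
typedef struct {
    const Node *ip;                /* stands in for the instruction pointer */
    char pc[MAX_TEXT];             /* path condition as a readable conjunction */
} State;

/* Build a successor of `src` that took one side of the branch at src->ip. */
static void extend(State *dst, const State *src, const char *prefix,
                   const Node *next)
{
    snprintf(dst->pc, MAX_TEXT, "%s%s%s(%s)", src->pc,
             src->pc[0] ? " && " : "", prefix, src->ip->cond);
    dst->ip = next;
}

/* Explore every path with an explicit worklist. Each finished path
   condition is copied into out[]; the number of paths is returned. */
int explore(const Node *entry, char out[][MAX_TEXT], int max_out)
{
    State work[MAX_STATES];
    int top = 1, found = 0;

    work[0].ip = entry;
    work[0].pc[0] = '\0';
    while (top > 0) {
        State s = work[--top];     /* copy, so work[top] can be reused */
        if (s.ip->cond == NULL) {  /* leaf: record the path condition */
            assert(found < max_out);
            strcpy(out[found++], s.pc);
            continue;
        }
        /* Fork: one clone per branch direction. */
        assert(top + 2 <= MAX_STATES);
        extend(&work[top], &s, "!", s.ip->if_false);    /* explored later */
        extend(&work[top + 1], &s, "", s.ip->if_true);  /* explored first */
        top += 2;
    }
    return found;
}
```

A real symbolic executor would ask the solver before each fork and drop the infeasible side; the sketch only shows how one state becomes two, each with its own path condition.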
If an error occurs, the execution continues on the other path (the false branch), which constrains the divisor to be non-zero. A query constraint is generated when the path terminates or when a bug is caught during the execution of the SUT. KLEE can also generate a test case for each path by solving the current path condition (query constraint) with a constraint solver based on STP. The STP library accepts formulas over bit-vectors and arrays as input. If the query is satisfiable, STP generates values that satisfy the input formula [13]. Since we want to integrate a blackbox technique into the input generation process, we additionally use a numerical abstraction of the constraint system (represented by the KLEE output query) with the help of PPL.

There are other tools similar to KLEE, such as SAGE (developed by Microsoft) or the Mayhem symbolic executor (developed at Carnegie Mellon University). We chose KLEE because it is open source, easy to use and well documented. Because KLEE is built on top of LLVM, before checking a program we have to compile its source code to bytecode using the LLVM compiler. KLEE is then used to execute the resulting bytecode. Figure 3 shows the steps that we follow in order to test a simple C program with KLEE.

Figure 3. KLEE steps (diagram labels: C/C++ application, LLVM-GCC, bytecode, KLEE, KQuery files)

KLEE requires no modification to the source code of the target application. For testing purposes we use the following source code, named test1.c:

    #include <stdio.h>
    #include <stdlib.h>

    int test(unsigned int t_a, int t_b)
    {
        unsigned int a = t_a;  /* unsigned integer */
        int b = t_b;           /* signed integer */

        if (a >= 20) {
            if (b <= -80) {
                printf("PATH 1\t");
                printf("a=%u, b=%d\n", a, b);
                return 1;
            }
            printf("PATH 2\t");
            printf("a=%u, b=%d\n", a, b);
            return 2;
        }
        printf("PATH 3\t");
        printf("a=%u, b=%d\n", a, b);
        return -1;
    }

    int main(int argc, char **argv)
    {
        return test(atoi(argv[1]), atoi(argv[2]));
    }

To compile the source code with the LLVM-GCC tool we type:

    llvm-gcc --emit-llvm -c -g test1.c

The LLVM compiler compiles test1.c to a bytecode file named test1.o. After this we can run KLEE on the generated bytecode:

    klee --write-pc test1.o

KLEE then generates a test case for each unique path of the target program. It can also output statistics and the constraint system of each path condition. We use the option --write-pc to generate the constraint systems as queries in the KQuery language. For test1.c, the fuzzer finds three unique paths and generates three KQuery files. For example, the KQuery file for the path corresponding to “PATH 2” is the following:

    array b[4] : w32 -> w8 = symbolic
    array a[4] : w32 -> w8 = symbolic
    (query [(Ult 19 (ReadLSB w32 0 a))
            (Eq false (Sle (ReadLSB w32 0 b) 4294967216))]
           false)

KQuery is a representation language for constraint expressions. It is the language used by Kleaver, a constraint solver tool integrated in KLEE. KQuery represents formulas over bitvectors and arrays, and supports all standard operations on bitvectors [10]. The language is easy to read and write and accepts two kinds of declarations: array declarations and query commands. Array declarations are used to declare symbolic variables as arrays of bitvectors. The syntax of an array declaration is:

    "array" name "[" [ size ] "]" ":" domain "->" range "=" array-initializer [10]

The parameter “name” is the name of the variable; in the query for test1.c the two variables are named a and b. Because a and b are 32-bit integer variables, each array has size 4, that is, four cells. The domain w32 is the width of the array index (32 bits) and the range w8 is the width of each cell (one byte).
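To make the declaration `array a[4] : w32 -> w8` more concrete, the following sketch models such an array in plain C and shows how a 32-bit value is recovered from its four one-byte cells, which is what `ReadLSB w32 0 a` expresses (a read of 32 bits starting at index 0, least significant byte first). The type KArray4 and the functions store_w32 and read_lsb_w32 are hypothetical names introduced for illustration; they are not part of KLEE.

```c
#include <assert.h>
#include <stdint.h>

/* Model of `array a[4] : w32 -> w8`: four 8-bit cells. The index passed to
   read_lsb_w32 plays the role of the 32-bit wide array index. */
typedef struct {
    uint8_t cell[4];
} KArray4;

/* Store a 32-bit value with the least significant byte in cell 0. */
void store_w32(KArray4 *arr, uint32_t value)
{
    for (uint32_t i = 0; i < 4; i++)
        arr->cell[i] = (uint8_t)(value >> (8 * i));
}

/* Counterpart of `ReadLSB w32 <offset> <array>`: concatenate four cells
   starting at `offset`, least significant byte first. */
uint32_t read_lsb_w32(const KArray4 *arr, uint32_t offset)
{
    assert(offset == 0);  /* a 4-cell array holds exactly one 32-bit value */
    uint32_t value = 0;
    for (uint32_t i = 0; i < 4; i++)
        value |= (uint32_t)arr->cell[offset + i] << (8 * i);
    return value;
}
```

Storing the constant 4294967216 from the PATH 2 query, for instance, fills the cells with 0xB0, 0xFF, 0xFF, 0xFF, and reading them back with read_lsb_w32 returns the same value.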
Each array can be initialized as symbolic or with constant values.

A query command is the command that the constraint solver has to execute. A query is composed of constraints and a query expression and, if the query expression is invalid, it may also list the expressions and arrays whose values are to be computed [10]. The syntax of a query command is:

    "(" "query" constraint-list query-expression [ eval-expr-list [ eval-array-list ] ] ")" [10]

A query command starts with the word “query” and consists of a constraint list and a query expression. The query for “PATH 2” of test1.c has a constraint list with two constraints, one for each variable. A query expression may contain arithmetic operations (Add, Sub, Mul, UDiv, SDiv, URem, SRem), bitwise operations (And, Or, Xor, Shl, LShr, AShr), comparisons (Eq, Ne, Ult, Ule, Ugt, Uge, Slt, Sle, Sgt, Sge), bitvector manipulation (Concat, Extract, ZExt, SExt), special expressions (Read, Select) and macro expressions (Neg, ReadLSB, ReadMSB) [10].
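To make the comparison operators concrete, the following sketch evaluates the two constraints of the “PATH 2” query on concrete 32-bit values. It is an illustration only and does not use KLEE or Kleaver; the function names as_signed, kq_ult, kq_sle and path2_query are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Reinterpret a 32-bit vector as a two's complement signed value. */
int32_t as_signed(uint32_t x)
{
    if (x <= (uint32_t)INT32_MAX)
        return (int32_t)x;
    return (int32_t)(x - (uint32_t)INT32_MAX - 1u) + INT32_MIN;
}

/* Ult: unsigned less-than on 32-bit vectors. */
bool kq_ult(uint32_t x, uint32_t y)
{
    return x < y;
}

/* Sle: signed less-than-or-equal on 32-bit vectors. */
bool kq_sle(uint32_t x, uint32_t y)
{
    return as_signed(x) <= as_signed(y);
}

/* Constraints of the PATH 2 query:
   (Ult 19 a) and (Eq false (Sle b 4294967216)). */
bool path2_query(uint32_t a, uint32_t b)
{
    bool c1 = kq_ult(19u, a);
    bool c2 = (kq_sle(b, 4294967216u) == false);
    return c1 && c2;
}
```

Since as_signed(4294967216u) is -80, the second constraint holds exactly when b is greater than -80: path2_query(20, (uint32_t)-79) is true, while path2_query(20, (uint32_t)-80) is false.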
In our example, the first constraint contains the expression “ReadLSB w32 0 a”, which reads a 32-bit value from the array a starting at index 0, least significant byte first. The expression Ult (unsigned less than) states that 19 is less than the value read by this expression. The same reasoning applies to the second constraint, on the variable b: first the 32-bit value of the symbolic variable b is read, then Sle (signed less than or equal) compares it with 4294967216, which is -80 when interpreted as a signed 32-bit integer. The enclosing “Eq false” negates the result of the comparison: not (b <= -80), that is, b >= -79.

4 Input data generator

After generating the path predicates through symbolic execution we have the constraints of our system. We can generate input data by solving these constraints with a constraint solver or by using a numerical abstraction method based on the convex polyhedra defined by the inequalities. For the latter we use the Parma Polyhedra Library. PPL can only handle linear inequalities, in which the symbolic variables appear to the first degree. For example, a_symb + b_symb < c is a linear formula whereas a_symb * b_symb < c is not. In section 3 we saw that KLEE generates constraint systems as queries in the KQuery language. Since PPL handles linear inequalities, we have to transform those queries into systems of linear inequalities. Figure 4 shows the steps that we follow in order to generate the input space.

Figure 4. Input data generation steps (diagram labels: KQuery constraints, parser, linear inequalities, PPL, input space)

To transform the queries generated by KLEE into systems of linear inequalities we wrote a simple parser in C++ that takes a KQuery file as input and outputs a system of linear inequalities. Given the following query constraint:

    query [(Ult 19 (ReadLSB w32 0 a))
           (Eq false (Sle (ReadLSB w32 0 b) 4294967216))]

the transformation is:

    19 < a
    b > -80

To solve the linear inequalities we use numerical abstraction and convex polyhedra. A polyhedron represents the solution set of the linear inequalities defined in (1) and (2).
$a_{i1} x_1 + a_{i2} x_2 + \dots + a_{in} x_n \le b_i, \quad i = 1, \dots, m$ (1)

$A x \le b$ (2)

The result is the intersection of m halfspaces with normal vectors $a_i$ and represents a polytope (3).

$P = \{ x \in \mathbb{R}^n \mid A x \le b \}$ (3)

Figure 5. Convex polyhedron (diagram labels: halfspaces a1 to a5 enclosing the input space)

The Parma Polyhedra Library supports numerical abstractions for the domains of polyhedra, octagonal shapes and bounded difference shapes, as well as mixed integer linear programming problems. Because PPL can only manipulate polytopes defined by non-strict inequalities, we have to transform all inequalities into non-strict ones [11]. For integer variables this is straightforward: 19 < a becomes a >= 20.

To find the input space for a target program we have to find the maximum and minimum value of each variable. To do so, we build the system of non-strict linear inequalities and solve it with the mixed integer linear programming solver included in PPL. Mixed integer linear programming is a mathematical method that finds the best outcome (maximum or minimum) in a mathematical model described by linear relationships. It optimizes a linear objective function subject to linear equality and linear inequality constraints. The feasible region is a convex polyhedron, that is, an intersection of halfspaces, each defined by a linear inequality. The mixed integer linear programming algorithm finds the minimum or the maximum value of the objective function over the polyhedron, if such a point exists. The canonical form of the mixed integer linear programming problem (MILP or MIP) is defined in (4), (5) and (6).

maximize $c^T x$ (4)

subject to $A x \le b$ (5)

$x \ge 0$, with $x_j \in \mathbb{Z}$ for the integer variables (6)

For example, we use the constraint system defined in (7), (8), (9), (10) and (11).

[Equations (7) to (11) are not preserved in the source.]

The feasible region is represented by the white region of the graph in Figure 6.

Figure 6. Feasible region

To generate random points inside the polyhedron we use the linear programming method described in [11] for each variable (dimension). PPL provides methods to maximize and minimize a given linear objective function. The objective function consists of a single variable (we use one objective function per direction). First we find the lower and upper bound of each variable, which represent the boundaries of the feasible region. For the “PATH 2” query of our test1.c example we obtain the constraint system described in (12), (13), (14) and (15).

$a \ge 20$ (12)

$a \le 2147483647$ (13)

$b \ge -79$ (14)

$b \le 2147483647$ (15)

The feasible region is shown in Figure 7.

Figure 7. Feasible region

The input space is generated randomly from the feasible region, using the upper and lower bounds obtained by solving the system of inequalities with PPL. The results are described in (16) and (17).

$20 \le a \le 2147483647$ (16)

$-79 \le b \le 2147483647$ (17)

For complex constraint systems it is mandatory to check whether the randomly generated values lie in the feasible region, because the bounds of the individual variables only describe a box around it. This can be done with PPL by checking whether the points are inside the polyhedron.

Figure 8. Values checking (the red points are ok while the yellow ones are not)

5 Delivery and monitoring mechanism

The random values generated with PPL are delivered to the target application using the Zzuf fuzzer. For each path predicate, the input data generator module creates a file with the .zz extension that contains the random values generated with PPL. Each of these files is read and its contents are sent to the SUT through Zzuf. Zzuf is a mutation-based fuzzer that randomly alters correct input of the target program. Because we use it only as a delivery and monitoring mechanism, the mutation ratio is set to 0%. As a monitoring system, Zzuf watches the target program for security weaknesses such as segmentation faults or memory corruption. The command used to deliver the test cases and monitor the SUT is:

    zzuf -r 0 $gcc_app_path/$app_name $line

The parameter -r sets the mutation ratio (here 0), $gcc_app_path/$app_name is the path to the application under test and $line is a line read from the .zz file, which contains the inputs used by the SUT.

6 Evaluation

In this section, we evaluate our method on a simple example program (test1.c). As future work we plan to use real applications (such as coreutils) that KLEE is able to handle. Our evaluation focuses on code coverage and input generation.

Code coverage is a standard metric in software analysis because it shows how much of the code is exercised by the testing process. The higher the code coverage ratio, the higher the probability of finding bugs.
We use the klee-stats command to measure code coverage, which is the number of lines of code executed over the total number of lines in the source code of the target application. Table 1 shows the statistics for our test1.c example.

    Instrs   Time(s)   ICov(%)   BCov(%)   ICount   TSolver(%)
    45       0.44      75.00     100.00    60       52.76

Table 1. Statistics of symbolic execution

Instrs is the number of executed instructions, Time is the total time, ICov is the percentage of LLVM instructions that were covered, BCov is the percentage of branches that were covered, ICount is the total number of static instructions in the LLVM bitcode and TSolver is the percentage of time spent in the constraint solver (the internal KLEE constraint solver).
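As a quick consistency check on Table 1, and assuming that ICov is the ratio of covered static instructions to ICount as described above, the number of covered instructions follows directly from the reported values. The symbol N_cov is introduced here for illustration and does not appear in the original.

```latex
\[
\mathrm{ICov} = \frac{N_{\mathrm{cov}}}{\mathrm{ICount}} \times 100
\quad\Longrightarrow\quad
N_{\mathrm{cov}} = \frac{75.00}{100} \times 60 = 45
\]
```

The result coincides with the Instrs column for this small loop-free example. In general the two need not agree, since Instrs counts executed instructions rather than distinct ones.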
Input generation is another important measure in software analysis because the more test cases we have, the higher the probability of finding bugs. Symbolic execution alone does not generate a large number of test cases because, depending on the constraint solver, a single test case is generated for a given path. Because we use a polytope as input space, the number of test cases that we can generate is very large. For example, to go through “PATH 2” of test1.c, the variable a may take any value between 20 and INT_MAX and the variable b any value between -79 and INT_MAX. On a 32-bit Linux-based operating system a signed integer is represented on 32 bits and INT_MAX has the value 2147483647. In this case the number of test cases for “PATH 2” may be as large as 2147483628 x 2147483727, which is a very large input space.

7 Conclusion

This paper presents a method for automatically finding software vulnerabilities using both whitebox and blackbox fuzzing. Traditional fuzzers are blackbox, while modern fuzzers are whitebox. But which is best in practice, whitebox or blackbox fuzzing? Blackbox fuzzing is simple, fast, easy to use and lightweight, but it has issues with code coverage. Whitebox fuzzing is smarter but also more complex, and may not be easy to use. We show that whitebox and blackbox fuzzing can be combined to improve code coverage and input data generation. The technique presented in this paper shows that efficient input generation is possible in a way that cannot be achieved with blackbox or whitebox fuzzing alone. The next step in the development of this solution is to evaluate it on real applications, such as coreutils, and to compare blackbox and whitebox fuzzers with our solution.
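The input generation step that the method relies on can be sketched for “PATH 2” of test1.c as follows. This is a minimal illustration in plain C that does not call PPL: the per-variable bounds are written down by hand (a from 20 to INT_MAX, b from -79 to INT_MAX, which follows from the branch `b <= -80` in test1.c), rand() stands in for a proper random generator, and in_path2 stands in for the polyhedron membership check. The names Bounds, bounds_size, input_space_size, sample and in_path2 are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Closed interval [lo, hi] of admissible values for one variable. */
typedef struct {
    int64_t lo, hi;
} Bounds;

/* Number of integers contained in the interval. */
uint64_t bounds_size(Bounds b)
{
    assert(b.lo <= b.hi);
    return (uint64_t)(b.hi - b.lo) + 1u;
}

/* Size of the two-dimensional input space spanned by two intervals. */
uint64_t input_space_size(Bounds a, Bounds b)
{
    return bounds_size(a) * bounds_size(b);
}

/* Draw one value from the interval. Several rand() calls are combined
   because RAND_MAX may be as small as 32767; the result is slightly
   biased, which is acceptable for a sketch. */
int64_t sample(Bounds b)
{
    uint64_t r = 0;
    for (int i = 0; i < 5; i++)
        r = (r << 15) ^ (uint64_t)(rand() & 0x7fff);
    return b.lo + (int64_t)(r % bounds_size(b));
}

/* Membership check for PATH 2 of test1.c: a >= 20 and not (b <= -80). */
bool in_path2(int64_t a, int64_t b)
{
    return a >= 20 && b > -80;
}
```

For PATH 2 the feasible region is itself a box, so every sampled point passes in_path2. For constraint systems that couple several variables, the membership check is what rejects points that fall inside the bounding box but outside the polyhedron.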
References:
[1] Richard McNally, Ken Yiu, Duncan Grove and Damien Gerhardy, "Fuzzing: The State of the Art", Command, Control, Communications and Intelligence Division, Defence Science and Technology Organisation, Australia, February 2012, http://www.dtic.mil/cgi-bin/GetTRDoc?AD=ADA558209.
[2] Brian S. Pak, "Hybrid Fuzz Testing: Discovering Software Bugs via Fuzzing and Symbolic Execution", Master thesis, School of Computer Science, Carnegie Mellon University, Pittsburgh, May 2012.
[3] Sofia Bekrar, Chaouki Bekrar, Roland Groz, Laurent Mounier, "Finding Software Vulnerabilities by Smart Fuzzing", Software Testing, Verification and Validation (ICST), 2011 IEEE Fourth International Conference, March 2011, pages 427-430.
[4] Patrice Godefroid, Michael Y. Levin, and David Molnar, "SAGE: Whitebox fuzzing for security testing", Communications of the ACM, vol. 55, no. 3, March 2012.
[5] James C. King, "Symbolic execution and program testing", Commun. ACM 19(7), July 1976, pages 385-394.
[6] Tsankov P., Dashti M.T., Basin D., "SECFUZZ: Fuzz-testing security protocols", Automation of Software Test (AST), 2012 7th International Workshop, June 2012, pages 1-7.
[7] HyoungChun Kim, YoungHan Choi, DoHoon Lee, Donghoon Lee, "Advanced Communication Technology, 2008. ICACT 2008. 10th International Conference", Feb. 2008, pages 1304-1307.
[8] Adrian Furtuna, "Proactive cyber security by red teaming", PhD thesis, Military Technical Academy, 2011.
[9] Marjan Aslani, Nga Chung, Jason Doherty, Nichole Stockman, and William Quach, "Comparison of Blackbox and Whitebox Fuzzers in Finding Software Bugs", Summer Undergraduate Program in Engineering Research at Berkeley, talk or presentation, 2008, URL www.truststc.org/pubs/493.html
[10] Cristian Cadar, Daniel Dunbar, Dawson Engler, "Unassisted and Automatic Generation of High-Coverage Tests for Complex Systems Programs", OSDI'08 Proceedings of the 8th USENIX Conference on Operating Systems Design and Implementation, 2008, pages 209-224, URL http://klee.llvm.org.
[11] Roberto Bagnara, Patricia M. Hill, Enea Zaffanella, "The Parma Polyhedra Library: Toward a Complete Set of Numerical Abstractions for the Analysis and Verification of Hardware and Software Systems", Science of Computer Programming, Volume 72, Issue 1-2, June 2008, pages 3-21.
[12] Caca Labs, "Zzuf multi-purpose fuzzer", January 2010, URL http://caca.zoy.org/wiki/zzuf
[13] Vijay Ganesh and David L. Dill, "A Decision Procedure for Bit-Vectors and Arrays", Proceedings of Computer Aided Verification, Berlin, Germany, July 2007.