Automated Vulnerability Testing Using Machine Learning and Metaheuristic Search

Software Verification & Validation (SVV)
PI: Lionel Briand
Researchers: Annibale Panichella, Cu Nguyen, Nadia Alshahwan
PhD Students: Dennis Appelt, Sadeeq Jan

Attack categories (X-Force Threat Intelligence Index 2017, https://www.ibm.com/security/xforce/):
• Code Injection: 42%
• Manipulated data structures: 32%
• Collect and analyze information: 9%
• Indicator: 4%
• Employ probabilistic techniques: 3%
• Manipulate system resources: 3%
• Subvert access control: 3%
• Abuse existing functionality: 2%
• Engage in deceptive interactions: 2%
More than 40% of all attacks were injection attacks (e.g., SQLi)

Web Applications
Client → Server → SQL Database

Web Applications
Web form: Username = str1, Password = str2, OK
SQL query:
SELECT *
FROM Users WHERE
(usr = 'str1' AND psw = 'str2')
Result:
Name Surname …
John Smith …

Injection Attacks
Web form input: ') OR 1=1 --
SQL query:
SELECT *
FROM Users
WHERE (usr = '' AND
psw = '') OR 1=1 --
Query result:
Name Surname …
Aria Stark …
John Snow …
… … …
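
The injection on this slide can be reproduced with a minimal sketch. It assumes an in-memory SQLite database and a hypothetical Users table whose rows are invented for illustration; the function names are not from the slides. The first function concatenates inputs into the query as shown above, and the second binds them as parameters for contrast.

```python
import sqlite3

def make_db():
    """Create an in-memory database with a hypothetical Users table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Users (usr TEXT, psw TEXT, name TEXT, surname TEXT)")
    conn.executemany(
        "INSERT INTO Users VALUES (?, ?, ?, ?)",
        [("aria", "needle", "Aria", "Stark"), ("john", "ghost", "John", "Snow")],
    )
    return conn

def login_vulnerable(conn, usr, psw):
    """Build the query by string concatenation, as in the slide."""
    query = f"SELECT name, surname FROM Users WHERE (usr = '{usr}' AND psw = '{psw}')"
    return conn.execute(query).fetchall()

def login_parameterized(conn, usr, psw):
    """Bind the inputs as parameters so they are treated as data, not SQL."""
    query = "SELECT name, surname FROM Users WHERE (usr = ? AND psw = ?)"
    return conn.execute(query, (usr, psw)).fetchall()

conn = make_db()
payload = "') OR 1=1 --"          # the input string from the slide
leaked = login_vulnerable(conn, "", payload)   # condition is always true
safe = login_parameterized(conn, "", payload)  # no user has this password
```

With the concatenated query, the payload closes the parenthesis, appends OR 1=1, and comments out the rest, so every row matches.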

Protection Layers
Client → Web Application Firewall → Server (data input validation and sanitization) → Database Firewall → SQL Database

Protection Layers: Trade-Offs
[Figure: the three protection layers (front-end level defenses, web application firewall, database level defenses) compared by overhead and detection accuracy]

Testing Challenges
• All protection layers need to be tested
• No single layer can possibly block all attacks
• They need to be effective together
• Testing is extensive: Large input space
• Different test techniques for different layers
• Many types of vulnerabilities

Testing Front-end
Web Applications for XMLi

Testing the Front-end (XMLi)
[Figure: input strings (I1, I2, …, In) enter the front-end system, which generates XML messages that are sent to the back-end systems (System 1, System 2, …, System n)]

Security Mechanisms in Front-end
Web Applications
• Input Validation: rejects inputs
containing malicious characters (e.g., <)
• Input Sanitization: converts malicious
inputs to valid ones (e.g., deleting XML
tags)
• Other transformations: domain-specific
transformations (e.g., JSON to XML,
calculating age)
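
The rejecting and converting mechanisms listed above can be sketched as two small functions. This is an illustrative assumption, not the implementation of any particular front-end: the forbidden character set, the tag pattern, and both function names are hypothetical.

```python
import re

# Hypothetical set of characters that a front-end might refuse outright.
FORBIDDEN = set("<>&'\"")
# Hypothetical pattern for anything that looks like an XML tag.
TAG_PATTERN = re.compile(r"<[^>]*>")

def reject_if_malicious(value: str) -> bool:
    """Return True when the input is accepted, False when it is rejected."""
    return not any(ch in FORBIDDEN for ch in value)

def strip_xml_tags(value: str) -> str:
    """Convert the input into a tag-free string by deleting XML tags."""
    return TAG_PATTERN.sub("", value)

accepted = reject_if_malicious("Tom")
rejected = reject_if_malicious("Tom<role>admin</role>")
converted = strip_xml_tags("Tom<role>admin</role>")
```

The first mechanism blocks the request, while the second lets a modified request through, which is why a test generator has to account for both behaviours.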

Testing of the Front-end WAs
Does the front-end system (SUT) allow the
generation of XML injection attacks?
YES: the front-end is vulnerable
NO: the front-end is secure

Testing of the Front-end WAs
Step 1: Create malicious XML messages
Step 2: Verify whether the SUT can generate them (search for an input string)
Malicious XML message:
<user>
<username>Tom</username>
<password>m1U9q10</password>
<role>user</role>
<mail>role=Adm+ tom@uni.lu</mail>
</user>

Step 1: Generating Malicious Messages
Grammar-based generation: automatically generating malicious
messages for different types of XML injection attacks
Our tool: SOLMI (ISSTA'16)
[Figure: example of a message generated by SOLMI]
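
Grammar-based generation can be illustrated with a toy derivation engine. The grammar below is a hypothetical example written for this sketch; it is not SOLMI's grammar, and the payloads in the ATTACK rule are assumed for illustration only.

```python
import random

# Hypothetical toy grammar: each nonterminal maps to a list of alternative
# expansions; any symbol that is not a key is treated as a terminal.
GRAMMAR = {
    "MESSAGE": [["<user>", "USERNAME", "PASSWORD", "MAIL", "</user>"]],
    "USERNAME": [["<username>", "NAME", "</username>"]],
    "PASSWORD": [["<password>", "SECRET", "</password>"]],
    "MAIL": [["<mail>", "ATTACK", "ADDRESS", "</mail>"]],
    "NAME": [["Tom"], ["Ann"]],
    "SECRET": [["m1U9q10"]],
    "ADDRESS": [["tom@uni.lu"]],
    # Alternative injection payloads placed inside the mail element.
    "ATTACK": [
        ["</mail><role>admin</role><mail>"],  # injected extra element
        ["<!--"],                             # unterminated comment
        ["<![CDATA[<role>admin</role>]]>"],   # CDATA-wrapped element
    ],
}

def generate(symbol, rng):
    """Expand a symbol by recursively choosing one alternative per nonterminal."""
    if symbol not in GRAMMAR:
        return symbol  # terminal: emit as-is
    expansion = rng.choice(GRAMMAR[symbol])
    return "".join(generate(part, rng) for part in expansion)

rng = random.Random(0)
messages = [generate("MESSAGE", rng) for _ in range(3)]
```

Adding a new attack type then amounts to adding one alternative to the grammar rather than writing new generation code.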

Step 2: Searching for Input Strings
[Figure: a candidate input string is submitted to the front-end system, and the generated XML message is compared with the target malicious XML message]
The front-end web application (SUT) is a black box
The search space is huge: all possible input strings (I1, …, In)

Search algorithm: Initial solutions (random strings) → Evaluation → Selection → Crossover → Mutation → New input strings
Example input strings: Usr: Tom; Psw: m1U9q10; Email: "role=Adm"+tom@uni.lu
Evaluation: each candidate (I1, I2, …, In) is submitted to the front-end system, and the edit distance between the generated XML message and the target XML message is measured
Some Results
[Figure: percentage of covered XMLi messages (0 to 100) for SBANK (with validation), SSBANK (without validation), XMLMAO (open source), R (industrial), and M (industrial), comparing Real-Coded GA, Standard GA, Hill Climbing, and Random Search]

Testing Web Application
Firewalls (WAFs)

Web Application Firewalls (WAFs)
[Figure: a WAF placed in front of the server filters incoming requests, separating malicious requests from legitimate ones]

WAF Rule Set
Rule set of Apache ModSecurity
https://github.com/SpiderLabs/ModSecurity

Misconfigured WAFs
Legitimate request BLOCKED: False Positive
Malicious request ALLOWED: False Negative

Anatomy of SQLi attacks
Bypassing attack: ' OR"a"="a"#
Decomposition tree:
<START> → <sQuoteContext>
<sQuoteContext> → <sq> <wsp> <sqliAttack> <cmt>
<sq> → '
<wsp> → _
<cmt> → #
<sqliAttack> → <boolAttack>
<boolAttack> → <opOR> <boolTrueExpr>
<opOR> → OR
<boolTrueExpr> → <bynaryTrue>
<bynaryTrue> → <dq> <ch> <dq> <opEq> <dq> <ch> <dq>, i.e., "a"="a"
Attack slices: S = { ', _, OR"a"="a", # }

Learning Attack Patterns
Training set:
      S1  S2  S3  S4  …  Sn  Outcome
A1    1   1   0   0   …  0   Passed
A2    0   1   0   0   …  0   Blocked
…     …   …   …   …   …  …   …
Am    1   1   1   1   …  1   Blocked
[Figure: decision tree learned from the training set, with Yes/No tests on slices (S1, S2, S3, S4, …, Sn) and Passed/Blocked leaves]

Learning Attack Patterns
Each path of the decision tree that ends in a Passed leaf yields an attack pattern, for example:
S2 ∧ ¬ Sn ∧ S1
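
A minimal sketch of how attacks could be encoded as slice vectors and checked against a pattern such as S2 ∧ ¬Sn ∧ S1. The slice vocabulary and both helper functions are hypothetical and chosen only to make the example concrete.

```python
# Hypothetical slice vocabulary S1..Sn (here n = 5).
SLICES = ["'", "_", 'OR"a"="a"', "#", "--"]

def encode(attack_slices):
    """Binary feature vector: 1 if the slice occurs in the attack, else 0."""
    return [1 if s in attack_slices else 0 for s in SLICES]

def matches(vector, pattern):
    """pattern maps a 0-based slice index to the required presence (True/False)."""
    return all(bool(vector[i]) == required for i, required in pattern.items())

# S2 ∧ ¬Sn ∧ S1, with the slide's 1-based indices mapped to 0-based positions.
pattern = {1: True, len(SLICES) - 1: False, 0: True}

a1 = encode({"'", "_", "#"})   # contains S1 and S2, lacks Sn
a2 = encode({"_", "--"})       # lacks S1, contains Sn
```

New candidate attacks that match the pattern are the ones worth executing first, since similar attacks passed the firewall before.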

Generating Attacks via ML and EAs
(μ+λ) Evolutionary Algorithm driven by machine learning:
Initial attacks → Slice attacks → Prepare training data → Build classifier (decision tree) → Mutate best attacks → Execute new attacks → (repeat from Slice attacks)
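
The (μ+λ) scheme can be sketched as follows: μ parents produce λ offspring, and the best μ of parents plus offspring survive. This is a simplified illustration under stated assumptions: `toy_waf_blocks` is a hypothetical single-rule firewall, the mutation operators are invented, and `score` is a trivial stand-in for the ranking that the learned classifier would provide.

```python
import random
import re

def toy_waf_blocks(attack):
    """Hypothetical WAF rule: block a quote, whitespace, then the keyword OR."""
    return re.search(r"'\s+or\b", attack, re.IGNORECASE) is not None

# Hypothetical obfuscation-style mutation operators.
MUTATIONS = [
    lambda a: a.replace(" ", "/**/", 1),  # first space -> inline comment
    lambda a: a.replace(" ", "+", 1),     # first space -> plus sign
    lambda a: a.swapcase(),               # flip letter case
]

def score(attack):
    """Stand-in for the classifier's ranking: 1 if the attack passes, else 0."""
    return 0 if toy_waf_blocks(attack) else 1

def evolve(initial, rng, mu=4, lam=8, generations=10):
    population = list(initial)
    bypassing = set()
    for _ in range(generations):
        # lambda offspring, each a mutation of a randomly chosen parent
        offspring = [rng.choice(MUTATIONS)(rng.choice(population)) for _ in range(lam)]
        # (mu + lambda): parents compete with their offspring for survival
        candidates = population + offspring
        bypassing.update(a for a in candidates if score(a) == 1)
        population = sorted(candidates, key=score, reverse=True)[:mu]
    return bypassing

found = evolve(["' OR 1=1 --"], random.Random(1))
```

In the ML-driven variant described on the slide, the ranking would come from the classifier rebuilt in each iteration rather than from a fixed function.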

Some Results
Apache ModSecurity and industrial WAFs
• ML techniques outperform the
random technique
• ML-Driven E is superior to the
other ML techniques
[Figures: number of distinct attacks found, for Apache ModSecurity and for the industrial case]
Machine learning-driven attack generation led to more
distinct, successful attacks being discovered

Automated Repairing of
Vulnerable WAFs

Rule Set Customization
Customization is error-prone:
• Complex filter rules
• Limited time and resources
• Lack of automated tools
Rule customization is necessary:
• To protect from new threats
• To avoid false positives

Fixing Vulnerable WAFs
SQLi attacks (from the attack generation process) → Attack decomposition → Machine learning (DT) → Attack patterns

Fixing Vulnerable WAFs
SQLi attacks → Attack decomposition → Machine learning (DT) → New regular expressions
Existing rule set + new regular expressions → Fixed rule set
Measures: # blocked attacks, # blocked legitimate requests

Multi-Objective Optimization
Problem: select a subset of the regular expressions produced
by the decision tree so as to (1) maximize the recall (blocked
attacks) and (2) minimize the false positive rate.
[Figure: Pareto front; axes: recall and false positive rate]

Multi-Objective Genetic Algorithms
NSGA-II: Initial solutions → Evaluation → Selection → Crossover → Mutation → (repeat)

Multi-Objective Genetic Algorithms
Initial solutions (one bit per regular expression R1 … Rk):
R1  R2  R3  R4  …  Rk
1   1   0   0   …  0
0   1   1   1   …  1
NSGA-II: Initial solutions → Evaluation → Selection → Crossover → Mutation
Solutions are evaluated
and selected according
to Pareto optimality
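
The binary encoding and Pareto-based evaluation can be sketched on a toy instance. The regular expressions, attacks, and benign requests below are hypothetical, and the sketch enumerates all subsets exhaustively instead of running NSGA-II, which is only feasible because k is tiny here.

```python
import re
from itertools import product

# Hypothetical candidate regular expressions R1..R3.
rules = [re.compile(r"(?i)\bor\b\s+1=1"), re.compile(r"--"), re.compile(r"'")]
attacks = ["' OR 1=1 --", "admin' --", "' or 'a'='a"]
benign = ["O'Brien", "alice", "note -- draft"]

def evaluate(subset):
    """Return (recall, false positive rate) for a bit vector over the rules."""
    active = [r for r, bit in zip(rules, subset) if bit]
    blocked = lambda req: any(r.search(req) for r in active)
    recall = sum(blocked(a) for a in attacks) / len(attacks)
    fpr = sum(blocked(b) for b in benign) / len(benign)
    return recall, fpr

def dominates(p, q):
    """p dominates q: recall no lower, FPR no higher, and the points differ."""
    return p[0] >= q[0] and p[1] <= q[1] and p != q

def pareto_front(points):
    """Keep only the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

# Each solution is a bit vector such as (1, 1, 0): rule i is enabled if bit i is 1.
solutions = list(product([0, 1], repeat=len(rules)))
objectives = [evaluate(s) for s in solutions]
front = set(pareto_front(objectives))
```

The front contains the trade-offs among which an engineer would choose, for instance a subset with lower recall and no false positives versus one that blocks every attack at the cost of some blocked legitimate requests.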

Some Results
Target WAF:
ModSecurity
OWASP Core Rule Set
Target Operation:
doPayment()
# Attacks = 1234
# Benign Req = 1567
Hypervolume(NSGA-II) > Hypervolume(RS)

Hypervolume Results
[Figure: hypervolume (0.00 to 1.00) of NSGA-II versus random search. Panels: ModSecurity and Industrial WAF. Operations shown: doPayment, expireTicket, simulatePayment; Op1, Op2, Op3, Op3]
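
For two objectives, the hypervolume of a set of solutions can be computed with a simple sweep. The sketch below assumes points given as (recall, false positive rate) pairs and a reference point at recall 0 and false positive rate 1; that reference point is an assumption of this example, not a value taken from the slides.

```python
def hypervolume(points, ref=(0.0, 1.0)):
    """Area dominated by the points relative to the worst-case reference point.

    points: iterable of (recall, fpr) pairs; recall is maximized, fpr minimized.
    ref: (recall, fpr) of the reference point.
    """
    ref_recall, ref_fpr = ref
    area, best_recall = 0.0, ref_recall
    # Sweep from the lowest false positive rate upwards; each point contributes
    # only the strip of recall that earlier points did not already cover.
    for recall, fpr in sorted(points, key=lambda p: (p[1], -p[0])):
        if fpr < ref_fpr and recall > best_recall:
            area += (ref_fpr - fpr) * (recall - best_recall)
            best_recall = recall
    return area

front_hv = hypervolume([(1 / 3, 0.0), (1.0, 1 / 3)])
ideal_hv = hypervolume([(1.0, 0.0)])
```

With this reference point the value lies between 0 and 1, and a single ideal solution (recall 1, no false positives) scores 1; a larger hypervolume indicates a better set of trade-offs.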

Detecting Malicious SQL
Statements at Database Level

Using ML to Detect SQLi Statements
Phase 1: Training
SQL legitimate execution logs → Parsing → Pruning → Edit distance → Clustering
Phase 2: Testing (Detection)
SQL security testing logs → Parsing → Pruning → Classification

Detection Phase
[Figure: each incoming SQL statement is compared with the clusters learned during training and is either approved or rejected (Incoming SQL Statement 1, Incoming SQL Statement 2; APPROVE, REJECT)]
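
A simplified sketch of the detection idea: statements are pruned to their structure, and an incoming statement is approved only if it is close enough to a structure seen in the legitimate logs. The pruning rules, the threshold, and the use of each distinct structure as its own cluster are assumptions made for this example; the actual parsing, pruning, and clustering steps are not specified on the slides.

```python
import re

def prune(sql):
    """Replace literals with placeholders and return the statement as tokens."""
    sql = re.sub(r"'[^']*'", "?", sql)   # string literals
    sql = re.sub(r"\b\d+\b", "?", sql)   # numeric literals
    return re.sub(r"\s+", " ", sql).strip().upper().split(" ")

def edit_distance(a, b):
    """Levenshtein distance over token sequences."""
    prev = list(range(len(b) + 1))
    for i, ta in enumerate(a, 1):
        cur = [i]
        for j, tb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ta != tb)))
        prev = cur
    return prev[-1]

def train(legitimate_logs):
    """Each distinct pruned structure stands in for one cluster."""
    return {tuple(prune(s)) for s in legitimate_logs}

def classify(sql, model, threshold=0):
    """APPROVE if the statement is within the threshold of some known structure."""
    tokens = prune(sql)
    distance = min(edit_distance(tokens, list(m)) for m in model)
    return "APPROVE" if distance <= threshold else "REJECT"

model = train(["SELECT * FROM Users WHERE (usr = 'tom' AND psw = 's3cret')"])
legit = classify("SELECT * FROM Users WHERE (usr = 'ann' AND psw = 'pw')", model)
attack = classify("SELECT * FROM Users WHERE (usr = '' AND psw = '') OR 1=1 --')", model)
```

The injected statement differs from every legitimate structure by several tokens, so it falls outside the threshold and is rejected.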

Some Results
SUT                    Test Gen.   Recall  False Positive
HotelRS                Xavier      100%    0%
SugarCRM               Xavier      100%    0%
TaskFreak              Burpsuite   100%    0%
TaskFreak              SqlMap      100%    0.1%
TheOrganizer           Burpsuite   100%    0.6%
TheOrganizer           SqlMap      100%    0.3%
Wordpress-newstat      Burpsuite   100%    0.2%
Wordpress-newstat      SqlMap      100%    0.2%
Wordpress-landingpage  SqlMap      100%    0.1%

Publications
Automatic Generation of Tests to Exploit XML Injection Vulnerabilities in Web Applications
Jan, Sadeeq; Panichella, Annibale; Arcuri, Andrea; Briand, Lionel. To appear in IEEE Transactions on Software
Engineering (TSE), 2017

A Machine Learning-Driven Evolutionary Approach for Testing Web Application Firewalls
Appelt, Dennis; Nguyen, Duy Cu; Panichella, Annibale; Briand, Lionel. To appear in IEEE Transactions on
Reliability (TR)

Automatically Repairing Web Application Firewalls Based on Successful SQL Injection Attacks
Appelt, Dennis; Panichella, Annibale; Briand, Lionel. In IEEE 28th International Symposium on Software
Reliability Engineering (ISSRE 2017), Toulouse, France

Search-based Testing Approach for XML Injection Vulnerabilities in Web Applications
Jan, Sadeeq; Nguyen, Duy Cu; Arcuri, Andrea; Briand, Lionel. In Proc. of the 10th IEEE International
Conference on Software Testing, Verification and Validation (ICST 2017), Tokyo, Japan

Automated and Effective Testing of Web Services for XML Injection Attacks
Jan, Sadeeq; Nguyen, Duy Cu; Briand, Lionel. In Proc. of the International Symposium on Software Testing
and Analysis (ISSTA 2016), Saarbrücken, Germany

SOFIA: An Automated Security Oracle for Black-Box Testing of SQL-Injection Vulnerabilities
Ceccato, Mariano; Nguyen, Duy Cu; Appelt, Dennis; Briand, Lionel. In Proc. of the 31st IEEE/ACM
International Conference on Automated Software Engineering (ASE 2016)

Publications
Known XML Vulnerabilities Are Still a Threat to Popular Parsers and Open Source Systems
Jan, Sadeeq; Nguyen, Duy Cu; Briand, Lionel. In Proc. of the 2015 IEEE International Conference on
Software Quality, Reliability & Security (QRS 2015), Vancouver, Canada

Behind an Application Firewall, Are We Safe from SQL Injection Attacks?
Appelt, Dennis; Nguyen, Duy Cu; Briand, Lionel. In Proc. of the 8th International Conference on
Software Testing, Verification, and Validation (ICST 2015)

Automated Testing for SQL Injection Vulnerabilities: An Input Mutation Approach
Appelt, Dennis; Nguyen, Duy Cu; Briand, Lionel; Alshahwan, Nadia. In Proc. of the International
Symposium on Software Testing and Analysis (ISSTA 2014)
