Efficient Bytecode Analysis:
Linespeed Shellcode Detection
Georg Wicherski
Security Researcher
Anatomy of a Shellcode

• Little piece of Bytecode that gets jumped to in an exploit
  – Direct overwrite of EIP on the stack
  – Sprayed on the Heap and called as a function pointer
  – Allocated by small ROP payload and jumped to by last gadget
      • Minus Zynamics Google, they do ROPperies


• Usually subject to some constraints because it is delivered inline
  – Null byte free, because a null byte terminates a C string
  – \r\n (CRLF) free, because CRLF often is a delimiter in network protocols
  – ...
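
A minimal sketch of checking a payload against such constraints. The function name and the default set of forbidden bytes are assumptions for illustration; the real set depends on the parser being exploited.

```python
# Bytes that commonly break inline delivery. This default is an assumption;
# the actual set depends on the vulnerable protocol or parser.
DEFAULT_BAD_BYTES = b"\x00\r\n"


def find_bad_bytes(shellcode: bytes, bad: bytes = DEFAULT_BAD_BYTES) -> list:
    """Return (offset, byte value) for every forbidden byte in the shellcode."""
    return [(i, b) for i, b in enumerate(shellcode) if b in bad]


# mov eax, 1 (B8 01 00 00 00) carries three null bytes in its immediate ...
print(find_bad_bytes(b"\xb8\x01\x00\x00\x00"))  # [(2, 0), (3, 0), (4, 0)]
# ... while push 1 / pop eax (6A 01 58) loads the same value without any.
print(find_bad_bytes(b"\x6a\x01\x58"))          # []
```

This is also why decoder stubs tend to load constants with push / pop pairs rather than mov with a 32-bit immediate.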



[Figure: shellcode layout, Decoder Stub followed by Encoded Shellcode]
Shellcode Decoder Structure

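  BITS 32                     ; assemble as 32-bit code (NASM)
  jmp getpc                   ; jump forward to the call at getpc

  start:                      ; GetPC 2: ebp = address of payload
    pop ebp                   ;   (the return address pushed by the call)
    push 42                   ; load counter = 42
    pop ecx
    push 23                   ; load key = 23
    pop edx

  decrypt:
    xor byte [ebp+ecx-1], dl  ; unxor one byte (offsets 41 down to 0)
    loop decrypt              ; repeat until ecx = 0
  jmp payload

  getpc:
    call start                ; GetPC 1: push return address (= payload) to stack
  payload: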
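
The encoder side that pairs with such a stub can be sketched as follows. This is an illustration under assumptions: the function names, the forbidden byte set and the placeholder payload bytes are hypothetical, and real encoders are usually more elaborate than a single-byte XOR.

```python
def xor_bytes(data: bytes, key: int) -> bytes:
    """XOR every byte with a single-byte key (encoding and decoding are identical)."""
    return bytes(b ^ key for b in data)


def pick_key(payload: bytes, bad: bytes = b"\x00\r\n"):
    """Return the first key that leaves no forbidden byte in the encoded payload."""
    for key in range(1, 256):
        if key in bad:
            continue  # the key itself is embedded in the stub, so it must be clean too
        if not any(b in bad for b in xor_bytes(payload, key)):
            return key
    return None  # no single-byte key works for this payload


# Placeholder payload that contains forbidden bytes (0x00, 0x0d, 0x0a).
payload = b"\x31\xc0\x00\x0d\x0a\x40"
key = pick_key(payload)
encoded = xor_bytes(payload, key)

# The decoder stub performs exactly this second XOR at run time.
assert xor_bytes(encoded, key) == payload
```

Because only the small stub has to satisfy the byte constraints by construction, the payload behind it can be arbitrary code.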
GetPC Sequences

• call $+5, pop r32
  – Push return address for function call onto stack
  – Use stack access to read back the return address

• fnop, fnstenv [esp-0x0c], pop r32
  – Execute any floating point instruction; its address is recorded in the
    FPU environment
  – Save the FPU environment to the stack so that the instruction pointer
    field (offset 0x0c) lands at [esp]
  – Read back the instruction address from the stack


• Structured Exception Handling
  – Windows specific, trigger an exception
  – Get address of exception instruction in exception handler
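
A rough sketch of a byte-level filter for GetPC idioms. The opcode bytes are the standard 32-bit x86 encodings; the function name, the labels and the choice of patterns are assumptions and do not reflect the actual filter of libemu or libscizzle.

```python
import re

# 32-bit x86 encodings of common GetPC idioms:
#   E8 00 00 00 00, 58..5F : call $+5 ; pop r32
#   E8 xx FF FF FF         : call to a nearby earlier address (jmp / call / pop form)
#   D9 74 24 F4            : fnstenv [esp-0x0c]
PATTERNS = {
    "call/pop": re.compile(rb"\xe8\x00\x00\x00\x00[\x58-\x5f]"),
    "call-back": re.compile(rb"\xe8[\x00-\xff]\xff\xff\xff"),
    "fnstenv": re.compile(rb"\xd9\x74\x24\xf4"),
}


def find_getpc_candidates(data: bytes) -> list:
    """Return sorted (offset, label) pairs for every pattern match in data."""
    hits = []
    for label, pattern in PATTERNS.items():
        hits.extend((m.start(), label) for m in pattern.finditer(data))
    return sorted(hits)


sample = (
    b"\x90\x90\xe8\x00\x00\x00\x00\x5d"   # nop; nop; call $+5; pop ebp
    b"\x90\xd9\xd0\xd9\x74\x24\xf4\x5b"   # nop; fnop; fnstenv [esp-0x0c]; pop ebx
)
print(find_getpc_candidates(sample))  # [(2, 'call/pop'), (11, 'fnstenv')]
```

Such matches are only candidates: the same byte sequences occur by chance in arbitrary data, which is why a verification step has to follow.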
Existing Detection Approaches

• Static / Statistical Approaches
   – e.g. Markov Chains for Bytecode (Alme & Elser, Caro 2009)
       • Trained with shellcode / non-shellcode data
       • Measures likelihood of certain instructions following each other
   – Can only detect the decoder and therefore tend to be either false positive
     or false negative prone (weighting, training data, ...)

• GetPC Sequences + Backtracking + Emulation (libemu)
   – Identify possible GetPC sequences in data
   – Build up tree of possible starting locations by disassembling “backwards”
       • A problem on its own on the x86 CISC architecture
   – Software x86 emulation to weed out (the many) false positives
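
To make the statistical approach concrete, here is a toy first-order Markov model. As a simplification it works on raw byte transitions rather than on disassembled instructions, and the training data, function names and smoothing are assumptions, not the cited authors' method.

```python
import math
from collections import Counter


def train(samples):
    """Count byte-to-byte transitions over a list of training samples."""
    pairs, firsts = Counter(), Counter()
    for sample in samples:
        for a, b in zip(sample, sample[1:]):
            pairs[(a, b)] += 1
            firsts[a] += 1
    return pairs, firsts


def avg_log_likelihood(data, model):
    """Average log-probability of the transitions in data under the model."""
    pairs, firsts = model
    total = 0.0
    for a, b in zip(data, data[1:]):
        # Add-one smoothing over the 256 possible next bytes.
        total += math.log((pairs[(a, b)] + 1) / (firsts[a] + 256))
    return total / max(len(data) - 1, 1)


def score(data, shell_model, benign_model):
    """Positive if the transitions look more like the shellcode training set."""
    return avg_log_likelihood(data, shell_model) - avg_log_likelihood(data, benign_model)


# Tiny, artificial training sets purely for illustration.
shell_model = train([b"\x31\xc0\x50\x68" * 8])
benign_model = train([b"hello world " * 8])
print(score(b"\x31\xc0\x50\x68\x31\xc0", shell_model, benign_model) > 0)  # True
print(score(b"hello world", shell_model, benign_model) < 0)               # True
```

The verdict depends entirely on the training sets and on where the decision threshold is placed, which is the weakness noted above.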
libscizzle

• Identification of possible GetPC sequences
   – A little less strict than libemu in terms of triggering combinations


• Brute force possible starting location around sequence
   – Efficient emulation makes this feasible performance-wise


• Use efficient sandboxed hardware execution for verification
   – No, this is not virtualization, no VT involved
   – Yes, it is secure, so we do not get owned (trivially)
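
The brute-force step can be sketched like this. The window size and the `is_shellcode` callback are hypothetical stand-ins; the slides do not state how far around a hit libscizzle searches, and the real verifier is the sandboxed execution described above.

```python
def candidate_starts(hit: int, data_len: int, window: int = 64) -> range:
    """All offsets within `window` bytes around a GetPC hit, clamped to the buffer."""
    lo = max(0, hit - window)
    hi = min(data_len, hit + window + 1)
    return range(lo, hi)


def verify_all(data: bytes, hits, is_shellcode, window: int = 64) -> list:
    """Return the (start, hit) pairs that the verifier callback accepts."""
    accepted = []
    for hit in hits:
        for start in candidate_starts(hit, len(data), window):
            if is_shellcode(data, start):
                accepted.append((start, hit))
    return accepted


# Stand-in verifier: pretend execution succeeds only when started at a jmp short (EB).
data = b"\x90" * 4 + b"\xeb" + b"\x90" * 5
print(verify_all(data, [6], lambda d, s: d[s] == 0xEB, window=4))  # [(4, 6)]
```

Trying every offset avoids backwards disassembly altogether, at the price of many verification runs, so this only pays off when a single run is cheap.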




       http://code.mwcollect.org/projects/libscizzle
x86 Segmentation vs. Paging


[Figure: Segment, Virtual and Physical address spaces]
Code Execution / “Emulation”

• Disassemble guest code
   – Stop on any privileged or (potentially) execution flow modifying instruction
   – This is roughly equivalent to “basic blocks”
   – Segment register access is considered a privileged instruction ;)

• Execute one basic block at a time within the guest segment
• Emulate all other instructions
   – Conditional jumps, calls, ...
   – Abort analysis on any privileged instruction

• Exception: backwards short jumps
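
A toy illustration of the block splitting, working on an already decoded list of mnemonics. The mnemonic sets, the `mov_seg` tag and the function name are assumptions; a real implementation disassembles bytes, follows the emulated control flow and handles the backwards short jump exception, none of which is modelled here.

```python
# Instructions that end a block and would be emulated (illustrative subset).
FLOW = {"jmp", "jz", "jnz", "call", "ret", "loop"}
# Instructions this sketch treats as privileged; "mov_seg" is a hypothetical
# tag standing for any segment register access.
PRIVILEGED = {"hlt", "lgdt", "mov_seg"}


def split_blocks(mnemonics):
    """Split a linear instruction list into (block, terminator) pairs."""
    blocks, current = [], []
    for m in mnemonics:
        if m in PRIVILEGED:
            blocks.append((current, "abort:" + m))
            return blocks  # analysis stops at the first privileged instruction
        if m in FLOW:
            blocks.append((current, "emulate:" + m))
            current = []
        else:
            current.append(m)
    if current:
        blocks.append((current, "end"))
    return blocks


# Body of the decoder stub shown earlier, in linear order.
print(split_blocks(["pop", "push", "pop", "push", "pop", "xor", "loop", "jmp"]))
# [(['pop', 'push', 'pop', 'push', 'pop', 'xor'], 'emulate:loop'), ([], 'emulate:jmp')]
```

Only the plain instructions inside a block would run natively; every terminator is handled in software, so the guest never chooses where execution continues.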
Evaluation: Performance

$ ./libscizzle-test < urandom.bin
[*] Filtering / scanning over 32.0 MiB of data took 105 ms.
[*] Verifying 700 shellcode candidate offsets...
[*] Verification over 32.0 MiB of data took 217 ms.
[*] Everything over 32.0 MiB of data took 322 ms.


• 99.38 MiB / sec (795 Mib / sec) on my presentation laptop, single core
• About 1000x faster than libemu, a lot faster than Markov Chains

• This is fast enough to do it inline at Gigabit speed on a commodity
  server, think IPS
• Real-world data usually has better properties than purely random data
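
The throughput figures follow directly from the timing output above (32.0 MiB in 322 ms); a quick check, with the helper name chosen only for illustration:

```python
def throughput(mib: float, ms: float):
    """Return (MiB per second, Mib per second) for `mib` MiB processed in `ms` ms."""
    mib_per_s = mib / (ms / 1000.0)
    return mib_per_s, mib_per_s * 8  # 8 bits per byte


mib_s, mibit_s = throughput(32.0, 322.0)
print(f"{mib_s:.2f} MiB/s, {mibit_s:.0f} Mib/s")  # 99.38 MiB/s, 795 Mib/s
```

795 Mib/s is roughly 834 Mbit/s, so this single-core laptop run on random data sits slightly below a saturated Gigabit link; the Gigabit claim rests on the commodity server and on real traffic being more favorable than random data.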
Evaluation: Success Rate

• False Positives: none.
   – If it is detected, it resembles valid shellcode
   – Random data might resemble valid shellcode, but that is a philosophical
     problem and highly unlikely.

• False Negatives: none so far
   – Tested on a lot of public shellcodes (tricky Metasploit ones, egghunters)
   – Used during CTFs for testing libscizzle, detected everything
        • DefCon, ruCTFe, ...

• Manual evasion possible
Questions?




             Thanks for your attention!
