Finding Diversity in Remote Code Injection Exploits
Justin Ma, John Dunagan, Helen J. Wang, Stefan Savage, Geoffrey M. Voelker
University of California, San Diego; Microsoft Research
Internet Measurement Conference 2006

Outline
  • Introduction
  • Background and Related Work
  • Methodology
  • Exploit Diversity
  • Discussion and Conclusion

Introduction
  • Internet users are increasingly victimized by online criminal enterprises that span denial-of-service extortion, identity theft, piracy, and unsolicited bulk email.
  • At the core of these activities is malware: software used to remotely compromise and harness the resources of millions of hosts.
  • There is little research describing the malware ecosystem itself:
      • How does one piece of malware relate to another?
      • What pressures drive its structural and functional evolution?
  • This paper focuses on how to identify and measure the diversity among remote code injection exploits.

Introduction (cont’d)
  • Typically, a host is compromised via a software vulnerability (e.g., a buffer overflow) that allows network-based input to be “injected” into a running program and executed.
  • Subsequently, the exploit payload may:
      • Download additional software
      • Reconfigure the OS to evade detection, etc.
  • This paper focuses on the exploit and its initial payload, the so-called shellcode.
  • Shellcodes:
      • Are the first code executed on a newly compromised machine
      • Are typically small, simple, hand-coded machine programs
      • Are well suited to automated analysis
  • Understand how much variation exists among the shellcodes for an exploit.
  • Measure shellcode diversity to better understand how malware is created:
      • Infer the paternity of different samples
      • Construct a shellcode phylogeny

Background
  • Remote code injection attacks are a combination of a vulnerability, an exploit, and a shellcode.
  • The vulnerability is the particular software structure that allows data provided over the network to subvert and redirect execution control flow.
      • For example, an unchecked buffer can be used to overwrite the return address of the calling stack frame.
  • An exploit is a particular formulation of an attack against a vulnerability.
  • The shellcode is the payload carried by the exploit; it is the first code to execute.

Stack Buffer Overflow
  • Figure: a simple example of a remote stack-based buffer overflow. The shaded regions represent the shellcode of the exploit as sent in network packets, then as injected into the vulnerable buffer of the target host. The return address has been overwritten with injected data, thereby redirecting the execution flow to the shellcode residing in the vulnerable buffer.

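The mechanism in the figure can be modeled without touching real memory. The sketch below is a toy simulation, not an exploit: a stack frame is represented as a byte array holding a local buffer followed by a saved return address, and an unchecked copy is allowed to run past the buffer. The buffer size, the addresses, and all function names are assumptions chosen for illustration.

```python
import struct

BUF_SIZE = 16  # hypothetical size of the vulnerable buffer, in bytes


def make_frame(return_address: int) -> bytearray:
    """Model a stack frame: a local buffer followed by a 32-bit saved return address."""
    return bytearray(BUF_SIZE) + bytearray(struct.pack("<I", return_address))


def unchecked_copy(frame: bytearray, data: bytes) -> None:
    """Copy network input into the frame with no length check (the vulnerability)."""
    frame[0:len(data)] = data


def saved_return_address(frame: bytearray) -> int:
    """Read back the return address stored just past the buffer."""
    return struct.unpack_from("<I", frame, BUF_SIZE)[0]


# Input that fits leaves the return address alone.
frame = make_frame(0x00401000)
unchecked_copy(frame, b"hello")
intact = saved_return_address(frame)

# Input longer than the buffer overwrites the return address with injected data.
unchecked_copy(frame, b"\x90" * BUF_SIZE + struct.pack("<I", 0x0012FF00))
overwritten = saved_return_address(frame)
```

In this model, `intact` keeps the original value while `overwritten` holds the attacker-supplied address, which is the redirection the slide describes.
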
Background (cont’d)
  • Shellcodes are frequently limited by:
      • The size of the buffer being processed
      • The need for the buffer to contain “NOP sleds”, long regions of consecutive “do nothing” instructions
  • Shellcodes can be quite sophisticated in their construction:
      • The creation of pseudo-random NOP sleds
      • Polymorphic payloads that are encrypted (and potentially compressed) in transit and only decrypted just before execution
      • Some polymorphic shellcode generators also create random decryptors

Background (cont’d)
  • Early attempts to defeat polymorphism:
      • X-ray analysis: heuristically decode polymorphic code, using a portion of a known, decoded instance to recover the encryption key
      • Generic decryption: emulate execution while the shellcode decrypts itself, typically using a heuristic to guess when this process terminates
  • Having decoded a shellcode, comparing it to other shellcodes is another key problem. Approaches:
      • Model each shellcode as a binary string and use traditional lexical distance measures
      • Use structural distance measures that capture variation in the control flow and values at the instruction, basic block, or function level [11]
  • [11] C. Kruegel, E. Kirda, D. Mutz, W. Robertson, and G. Vigna. Polymorphic Worm Detection Using Structural Information of Executables. In Proceedings of Symposium on Recent Advances in Intrusion Detection (RAID), Seattle, WA, Sept. 2005.

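A minimal sketch of the X-ray idea, assuming the simplest possible encoder: a single-byte XOR key. Given an encoded payload and a fragment known to appear in the decoded form, every candidate key is tried until the fragment shows up. The function names and the sample fragment are hypothetical; real X-ray analysis handles richer encodings than this.

```python
def xor_bytes(data: bytes, key: int) -> bytes:
    """XOR every byte of data with a single-byte key."""
    return bytes(b ^ key for b in data)


def xray_recover_key(encoded: bytes, known_plain: bytes):
    """Try all 256 single-byte keys and return the first one whose
    decoding contains the known plaintext fragment, or None."""
    for key in range(256):
        if known_plain in xor_bytes(encoded, key):
            return key
    return None


# Illustrative payload only: two filler bytes followed by a command string.
plain = b"\x90\x90 cmd.exe /c tftp"
encoded = xor_bytes(plain, 0x99)
recovered = xray_recover_key(encoded, b"cmd.exe")
```

Because XOR is its own inverse, `xor_bytes(encoded, recovered)` returns the original bytes once the key has been found.
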
Methodology - Exploit Collection
  • The primary means of collecting exploits is examining network traces of traffic sent to and from active responders.
  • Active responders: hosts that respond to unsolicited probes (exploit attempts)
      • Emulating end-host behavior allows us to collect more session data
      • In particular, completing the infection handshake suffices to cause the attack to transmit the shellcode
      • For example, the ISystemActivator and RemoteActivation exploits require active responders in order to capture the RPC Bind and Request

Methodology - Extracting Shellcodes
  • Use Shield [29] to extract the shellcode for each exploit session from the traces.
  • However, not all of the collected data corresponds to executable code:
      • Execution starts at an offset within the vulnerable buffer
      • The buffer may contain random padding
  • [29] H. Wang, C. Guo, D. Simon, and A. Zugenmaier. Shield: Vulnerability-Driven Network Filters for Preventing Known Vulnerability Exploits. In Proceedings of the ACM SIGCOMM Conference, Portland, Oregon, Sept. 2004.

Methodology - Exploit Emulation
  • Decoding the exploits is often necessary to reveal most of the actual executable code.
  • The easiest way to deal with the variety of decoding routines is to use binary emulation.
  • We implement the emulator using Intel’s Pin [13] on Linux:
      • Given an encoded shellcode, we first declare it as a statically allocated buffer in C source code that treats the buffer as a function
      • We iteratively retry failed emulations at subsequent offsets to overcome any issues with non-executable prefixes
      • As Pin successfully emulates the binary, we mark the executed instruction bytes for later analysis

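The retry-at-subsequent-offsets loop can be sketched independently of Pin. In the sketch below, `toy_emulate` is a deliberately tiny stand-in for a real emulator: it understands only the one-byte x86 opcodes NOP (0x90) and RET (0xC3) and treats everything else as a fault. Both function names are hypothetical, and nothing here reflects Pin's actual API.

```python
NOP, RET = 0x90, 0xC3  # one-byte x86 opcodes understood by the toy emulator


def toy_emulate(buf: bytes, offset: int):
    """Stand-in for a real emulator. Returns the set of executed byte
    offsets if execution reaches RET, or None on a fault."""
    executed = set()
    pc = offset
    while pc < len(buf):
        executed.add(pc)
        if buf[pc] == RET:
            return executed
        if buf[pc] != NOP:
            return None  # unknown byte: treat as a failed emulation
        pc += 1
    return None  # ran off the end of the buffer


def emulate_with_retries(buf: bytes, emulate):
    """Retry emulation at each subsequent offset until one succeeds.
    Returns (start_offset, executed_offsets), or (None, set()) if none does."""
    for offset in range(len(buf)):
        executed = emulate(buf, offset)
        if executed is not None:
            return offset, executed
    return None, set()


# Two non-executable prefix bytes, then a short run that the toy can execute.
start, marked = emulate_with_retries(b"\x00\x41\x90\x90\xc3\xff", toy_emulate)
```

The returned set of marked offsets plays the role of the executed instruction bytes that are kept for later analysis.
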
Methodology - Clustering
  • Agglomerative clustering, a form of hierarchical clustering:
      • Begins with each unique shellcode belonging to its own cluster
      • Builds up a hierarchy of similarity among exploit samples by iteratively merging the closest pair of clusters (by distance) at each step
  • Distance between clusters: the distance between the furthest samples in the two respective clusters

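The procedure above (furthest-sample distance between clusters is commonly called complete linkage) can be sketched in a few lines. This is a naive illustration that takes an arbitrary pairwise distance function; the function names are assumptions, and integers with absolute difference stand in for shellcodes and their distance metric.

```python
def complete_linkage(a, b, dist):
    """Distance between two clusters: the distance between their furthest samples."""
    return max(dist(x, y) for x in a for y in b)


def agglomerate(samples, dist):
    """Merge the closest pair of clusters until one remains.
    Returns the merge history as (distance, cluster_a, cluster_b) tuples."""
    clusters = [(s,) for s in samples]  # each sample starts in its own cluster
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = complete_linkage(clusters[i], clusters[j], dist)
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((d, clusters[i], clusters[j]))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges


# Toy data: numbers stand in for shellcodes, absolute difference for distance.
history = agglomerate([0, 1, 10, 12], lambda x, y: abs(x - y))
```

The merge history is what a dendrogram draws: each entry records the height at which two clusters join.
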
Methodology - Clustering (cont’d)
  • Distance metrics:
      • Exedit distance
      • Edit distance
          • Does not distinguish code from data
          • Random padding generates further noise
      • Structural distance
          • Based on the control flow graph (CFG) [11]
          • Does not capture subtle variation between related exploits because entire basic blocks are summarized

Methodology - Clustering (cont’d)
  • Exedit distance metric:
      • Edit distance over the executed parts of the shellcode
      • Distinguishes code from data
      • Maintains instruction-level details
      • Uses a canonical string for each shellcode

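A rough sketch of the idea, under stated assumptions: the canonical string is taken to be the executed bytes in offset order, and the distance is normalized by the longer string so it reads as a fraction. The slides do not spell out either detail, so both choices, along with the function names, are illustrative rather than the authors' definition.

```python
def edit_distance(a: bytes, b: bytes) -> int:
    """Classic Levenshtein distance between two byte strings."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (x != y)))  # substitution
        prev = cur
    return prev[-1]


def executed_string(shellcode: bytes, executed_offsets) -> bytes:
    """Keep only the bytes that were marked as executed (assumed canonical form)."""
    return bytes(shellcode[i] for i in sorted(executed_offsets))


def exedit_distance(a: bytes, a_exec, b: bytes, b_exec) -> float:
    """Edit distance over executed bytes, normalized by the longer string (an assumption)."""
    sa, sb = executed_string(a, a_exec), executed_string(b, b_exec)
    longest = max(len(sa), len(sb))
    return edit_distance(sa, sb) / longest if longest else 0.0


# Same executed code, different trailing padding.
a = b"\x90\x90\xc3AAAA"
b = b"\x90\x90\xc3ZZZZ"
plain_edit = edit_distance(a, b)
exedit = exedit_distance(a, {0, 1, 2}, b, {0, 1, 2})
```

In this toy pair the plain edit distance counts the padding bytes as differences, while the executed-bytes variant sees two identical samples.
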
Exploit Diversity
  • Four well-known vulnerabilities:
      • SQL Name Resolution (Slammer)
      • LSASS (Sasser)
      • MS RPC ISystemActivator (Blaster)
      • MS RPC RemoteActivation (Blaster)
  • Use the methodology from Section 3 to:
      • Cluster the shellcodes according to their variability and thus identify shellcode families
      • Provide a detailed characterization of each family to convey both the structure of shellcode families and the subtle functional variations among them
      • Show the prevalence of each shellcode family in the trace
  • Trace 1:
      • Captures exploit attempts on a residential DSL network for 2 days (2005 9/6)
      • Fully patched Windows XP SP2
      • 29 IP addresses
      • Responds to incoming requests
  • Trace 2:
      • From a honeyfarm at the Lawrence Berkeley National Laboratory

SQL Name Resolution
  • The Slammer worm (Jan. 2003)
  • The outlier:
      • Its payload was likely corrupted on the network before being captured
      • The last 91 bytes consisted of 22 unidentified bytes, a 20-byte IP header, an 8-byte UDP header, and the first 41 bytes of the Slammer exploit (22 + 20 + 8 + 41 = 91)
  • No exploit diversity

LSASS (Local Security Authority Subsystem Service)
  • The original Sasser worm: Apr. 2004
  • A handful of variants were responsible for a large number of occurrences

LSASS (cont’d)
  • Figure panels: exedit distance, edit distance, structural distance
  • Edit distance: differences not fundamental to the code
  • Structural distance: ignores subtle differences between shellcodes

LSASS (cont’d)
  • Inter-family analysis (manual analysis)
  • The differences among variants are 2–20 bytes and correspond to phone-home/connect-back IP addresses, hostnames, and ports encoded in the payload.
  • LSASS-1:
      • The main body of the exploit follows immediately after the decoding loop
      • The main body and data section are XOR-ed one byte at a time with key 0x99
  • LSASS-0:
      • An unencoded main body followed by an encoded data section (byte-wise XOR with key 0xff)
      • Contains embedded URL strings belonging to previously classified malware
  • LSASS-2, 3, 4:
      • Share the same encoding scheme and roughly the same flow of execution

LSASS (cont’d) - Prevalence
ISystemActivator
  • The Blaster worm (Aug. 2003)
      • Originally exploited the RemoteActivation vulnerability
  • The result of polymorphism?
  • The clustering results indicate that exploits within a family are similar, but that ISys families differ more substantially from each other than the LSASS exploit families do.

ISystemActivator
  • We confirmed that there were six different code bases.
  • There was no code polymorphism: the differences were due to variations in data constants, such as encodings of phone-home addresses and hostnames, as well as names of executables.
  • ISys-0 used a 4-byte, non-overlapping XOR to encode its payload, whereas all other exploits used a byte-by-byte XOR.
  • ISys-4 had the largest payload length, and its flow of execution was the most complicated.
  • The moderate exedit distance within the ISys-1 family (9%) reflects some differing instructions; the exploits were otherwise very similar.
  • ISys-5 exploits had a characteristic execution flow that performed consecutive jumps over two text sections:
      • “tftp.exe -i <address> get <executable name>”
      • The <address> and <executable name> fields accounted for 6.5% distance

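The two encoding styles mentioned above differ only in key width. The sketch below contrasts a byte-by-byte XOR with a 4-byte, non-overlapping XOR, where each successive 4-byte block of the payload is XOR-ed with the same 4-byte key. The key values and payload bytes are made up for illustration and are not taken from any observed sample.

```python
def xor_bytewise(data: bytes, key: int) -> bytes:
    """XOR each byte with the same single-byte key."""
    return bytes(b ^ key for b in data)


def xor_4byte(data: bytes, key: bytes) -> bytes:
    """XOR the data in non-overlapping 4-byte blocks with a 4-byte key.
    A trailing partial block uses the leading bytes of the key."""
    if len(key) != 4:
        raise ValueError("key must be exactly 4 bytes")
    return bytes(b ^ key[i % 4] for i, b in enumerate(data))


payload = b"example payload bytes"           # illustrative data only
key4 = bytes([0x12, 0x34, 0x56, 0x78])       # hypothetical 4-byte key

encoded = xor_4byte(payload, key4)
decoded = xor_4byte(encoded, key4)           # XOR is its own inverse

# A 4-byte key made of one repeated byte degenerates to the byte-wise scheme.
same = xor_4byte(payload, bytes([0x99] * 4)) == xor_bytewise(payload, 0x99)
```

Since both schemes are self-inverse, running the decoder in an emulator and comparing the executed bytes sidesteps the need to know which one a sample uses.
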
ISystemActivator
  • Figure labels: 4-byte decoding key; kernel-address loading function; function-finding block; 4-byte encoding key; kernel base loader; function finder

ISystemActivator
  • ISys-4: largest payload length; its flow of execution was the most complicated

ISystemActivator
  • ISys-5: performed consecutive jumps over two text sections
      • “tftp.exe -i <address> get <executable name>”
      • The <address> and <executable name> fields accounted for 6.5% distance

ISystemActivator
  • ISys-1: different instructions in parts, otherwise very similar

ISystemActivator
  • “Bind” version: required the newly infected host to bind to a socket and wait for a connection attempt from the infecting host
  • “Connect-back” version: required the newly infected host to connect back to the infecting host
  • Interestingly, the number of iterations in ISys-3’s loop overshoots the exploit payload. Thus, it seems that either ISys-2 was a refinement of ISys-3, or ISys-3 was a poor imitation of ISys-2.

RemoteActivation
  • Unlike the other exploits, RemoteActivation exploits exhibited a high amount of exploit diversity per host.

RemoteActivation (cont’d)
  • Exedit distance is very small.
  • The byte-wise encoding scheme only covered the main bodies of the exploits, but different exploits used different keys.
  • Manual inspection confirmed that variable encoding of the exploit’s main body contributed to the jump in average intra-family distance.
  • Changing keys and random filler characters are commonly described techniques for polymorphism, and the RemoteActivation exploits had both of these features.
  • Families: 0 is the “Bind” version; 1 is the “Connect-back” version
  • Manual inspection: the last third (roughly 300 bytes) of the payload contained randomly generated characters

Diversity Across Vulnerabilities
  • The trace is a full-payload, 4.5-day trace from a Windows honeyfarm running at the Lawrence Berkeley National Laboratory, starting on April 19, 2006.
  • Hosts in this honeyfarm served as active responders to incoming requests.

Diversity Across Vulnerabilities (cont’d)
  • Figure: dendrogram for the LBL trace exploits using exedit distance. The first set of hash marks just below 0% represents ISystemActivator, the second LSASS, the third PNP, and the fourth RemoteActivation.

Diversity Across Vulnerabilities (cont’d)
  • Multi-vector family

Discussion - Polymorphism
  • We generated a small set of signatures that exhaustively covered all exploits we observed for each vulnerability in the DSL residential trace.
  • Each signature was a contiguous sequence of 100 bytes.
  • For each individual vulnerability except LSASS, one signature sufficed to cover the set of exploits. LSASS required two: one covered 1645 of the 1769 exploits, and the other covered the remaining 124.
  • Manual investigation of these signatures showed that they primarily focused on the portions of the shellcode that were mostly (but not entirely) NOPs.
  • We then tested the signatures against a 5-GB trace on our internal network for false positives. None of the signatures yielded false positives in the internal trace.
  • The polymorphism was not effective for evasion. Other possible purposes:
      • Functional variation
      • Increasing the difficulty of reverse engineering

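How a contiguous-byte signature can cover a set of payloads is easy to illustrate. The sketch below is not the authors' signature generation procedure, which the slides do not describe; it simply searches the first payload for a fixed-length window that occurs in every payload, then counts matches against exploit and benign data. Function names and sample bytes are hypothetical, and a 4-byte window stands in for the 100-byte signatures to keep the example small.

```python
def common_signature(payloads, length):
    """Return the first window of `length` bytes from the first payload
    that occurs in every payload, or None if there is none."""
    first = payloads[0]
    for start in range(len(first) - length + 1):
        window = first[start:start + length]
        if all(window in p for p in payloads):
            return window
    return None


def coverage(signature: bytes, payloads) -> int:
    """Count how many payloads contain the signature."""
    return sum(signature in p for p in payloads)


# Toy payloads sharing a short NOP-like run; real signatures were 100 bytes.
exploits = [b"AAAA\x90\x90\x90\x90XYZ", b"BB\x90\x90\x90\x90QQ", b"\x90\x90\x90\x90C"]
benign = [b"GET /index.html HTTP/1.0", b"hello world"]

sig = common_signature(exploits, 4)
hits = coverage(sig, exploits)             # exploits covered by the signature
false_positives = coverage(sig, benign)    # matches in benign traffic
```

A signature that covers every exploit while matching nothing in benign traffic is the situation the slide reports for the observed samples.
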
Conclusion
  • This paper presents a methodology for constructing the phylogeny of remote code injection exploits and evaluates it on network traces taken from several vantage points.
  • The methodology is robust to the observed polymorphism.
  • The techniques reveal non-trivial code sharing among different exploit families, and the resulting phylogenies accurately capture the subtle variations among exploits within each family.
  • Analyzing both the emergence of polymorphism and the phylogeny of remote code injection exploits is important.

Finding Diversity In Remote Code Injection Exploits

  • 1. Finding Diversity in Remote Code Injection Exploits Justin Ma , John Dunagan , Helen J. Wang , Stefan Savage , Geoffrey M. Voelker University of California, San Diego Microsoft Research Internet Measurement Conference 2006
  • 2. Outline Introduction Background and Related Work Methodology Exploit Diversity Discussion and Conclusion
  • 3. Introduction Internet users are increasingly victimized by online criminal enterprise that spans denial-of-service extortion, identity theft, piracy and unsolicited bulk email At the core of these activities is malware Software used to remotely compromise and harness the resources of millions of hosts There is little research describing the malware ecosystem itself How does one piece of malware relate to another? What pressures drive its structural and functional evolution? This paper focuses on how to identify and measure the diversity among remote code injection exploits
  • 4. Introduction (cont’d) Typically, a host is compromised via a software vulnerability (e.g. buffer overflow ) that allows network-based input to be “injected” into a running program and executed. Subsequently, the exploit payload may Download additional software Reconfigure the OS to evade detection, etc. This paper focuses on the exploit and its initial payload -the so-called shellcodes. Shellcodes : First executed on a newly compromised machine Are typically small, simple, hand-coded machine programs Are well-suited to automated analysis Understand how much variation exists among the shellcodes for an exploit Measure shellcode diversity better understand how malware is created. Infer the paternity of different samples Construct a shellcode phylogeny
  • 5. Outline Introduction Background and Related Work Methodology Exploit Diversity Discussion and Conclusion
  • 6. Background Remote code injection attacks are a combination of vulnerability, exploit and shellcode The vulnerability is the particular software structure that allows data provided over the network to subvert and redirect execution control flow An unchecked buffer Overwrite the return address of the calling stack frame An exploit is a particular formulation of a attack against a vulnerability The shellcode is the payload carried by the exploit—it is the first code to execute
  • 7. Stack Buffer Overflow Simple example of a remote stack-based buffer overflow. The shaded regions represent the shellcode of the exploit as sent over network packets, then as injected into the vulnerable buffer of the target host. The return address has been overwritten with injected data, thereby redirecting the execution flow to the shellcode residing in the vulnerable buffer .
  • 8. Background (cont’d) Shellcodes: Are frequently limited by the size of the buffer being processed The need for the buffer to contain “NOP sleds” or long regions of consecutive “do nothing” instructions Can be quite sophisticated in their construction : The creation of pseudo-random NOP sleds Polymorphic payloads that are encrypted (and potentially compressed) in transit and only decrypted just before execution Some polymorphic shellcode generators also create random decryptors
  • 9. Background (cont’d) Early attempts to defeat polymorphic : X-ray analysis Heuristically decode polymorphic codes based on a portion of known, decoded instance to recover the encryption key Generic decryption Emulate execution while the shellcode decrypts itself Typically using a heuristic to guess when this process terminates Having decoded a malware shellcode, comparing it to other shellcode is another key problem. Approaches: Model each shellcode as a binary string and use traditional lexical distance measures Use structural distance measures that capture variation in the control flow and values at instruction, basic block, or function levels [11] [11] C. Kruegel, E. Kirda, D. Mutz, W. Robertson, and G. Vigna. Polymorphic Worm Detection Using Structural Information of Executables. In Proceedings of Symposium on Recent Advances in Intrusion Detection (RAID) , Seattle, WA, Sept.2005.
  • 10. Outline Introduction Background and Related Work Methodology Exploit Diversity Discussion and Conclusion
  • 11. Methodology-Exploit Collection Primary means of collecting exploits is by examining network traces of traffic sent to and from active responders . Active responders: hosts that respond to unsolicited probes (exploit attempts) Emulating end-host behavior allows us to collect more session data In particular, completing the infection handshake will suffice to cause the attack to transmit the shellcode For example: ISystemActivator and RemoteActivation exploit Require active responders to capture RPC Bind and Request
  • 12. Methodology-Extracting Shellcodes Use Shield[29] to extract the shellcode for each exploit session from the traces However, not all of the collected data corresponds to executable code. Execution starts at an offset within the vulnerable buffer The buffer may contain random padding [29] H. Wang, C. Guo, D. Simon, and A. Zugenmaier. Shield: Vulnerability-Driven Network Filters for Preventing Known Vulnerability Exploits. In Proceedings of the ACM SIGCOMM Conference , Portland, Oregon, Sept. 2004.
  • 13. Methodology-Exploit Emulation Decoding the exploits is often necessary to reveal most of the actual executable code The easiest way to deal with the variety of decoding routines is to use binary emulation We implement the emulator using Intel’s Pin[13] on Linux Given an encoded shellcode, we first declare it as a statically allocated buffer in C source code that treats the buffer as a function By Iteratively retrying failed emulations at subsequent offsets To overcome any issues with non-executable prefixes As Pin successfully emulates the binary, we mark the executed instruction bytes for later analysis
  • 14. Methodology-Clustering Agglomerative clustering A form of hierarchical clustering Begins with each unique shellcode belonging to it own cluster. Performs merging on the closest ( distance ) pair of clusters Builds up a hierarchy of similarity among exploit samples by iteratively merging the closest pair of clusters at each step distance between clusters: the distance between the furthest samples in the two respective clusters
  • 15. Methodology-Clustering (cont’d) Distance Metrics Exedit Distance Edit Distance Does not distinguish code from data Random padding generates further noise Structural Distance Control flow graph (CFG)[11] Do not capture subtle variation between related exploits because entire basic blocks are summarized [11]C. Kruegel, E. Kirda, D. Mutz, W. Robertson, and G. Vigna. Polymorphic Worm Detection Using Structural Information of Executables. In Proceedings of Symposium on Recent Advances in Intrusion Detection (RAID), Seattle, WA, Sept. 2005
  • 16. Methodology-Clustering (cont’d) Exedit distance metric: edit distance over executed parts of shellcode. Distinguishes code from data. Maintains instruction-level details. Canonical string for shellcode.
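A minimal sketch of the exedit idea: build a canonical string from only the bytes the emulator marked as executed, then take the edit distance between two such strings. The normalization by the longer string's length is an assumption made here so that distances read as fractions (the slides report percentages but do not state the exact formula); the function names are likewise illustrative.

```python
def executed_bytes(payload: bytes, executed_offsets) -> bytes:
    """Canonical string: the executed bytes of the payload, in address order."""
    return bytes(payload[i] for i in sorted(set(executed_offsets)))

def edit_distance(a: bytes, b: bytes) -> int:
    """Classic Levenshtein distance using a rolling row."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (x != y)))  # substitution or match
        prev = cur
    return prev[-1]

def exedit(p1: bytes, off1, p2: bytes, off2) -> float:
    """Edit distance over executed bytes, normalized (assumption) to [0, 1]."""
    a, b = executed_bytes(p1, off1), executed_bytes(p2, off2)
    longest = max(len(a), len(b))
    return edit_distance(a, b) / longest if longest else 0.0
```

Because padding and data bytes never enter the canonical string, two payloads that differ only in filler compare as identical under this metric.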
  • 17. Outline Introduction Background and Related Work Methodology Exploit Diversity Discussion and Conclusion
  • 18. Exploit Diversity Four well-known vulnerabilities: SQL Name Resolution (Slammer), LSASS (Sasser), MS RPC ISystemActivator (Blaster), and MS RPC RemoteActivation (Blaster). For each, we use the methodology from Section 3 to cluster the shellcodes according to their variability and thus identify shellcode families; provide a detailed characterization of each family to convey both the structure of shellcode families and the subtle functional variations among them; and show the prevalence of each shellcode family in the trace. Trace 1: captured exploit attempts on a residential DSL network for 2 days (2005 9/6); fully patched Windows XP SP2; 29 IP addresses; responded to incoming requests. Trace 2: from a honeyfarm at the Lawrence Berkeley National Laboratory.
  • 19. SQL Name Resolution The Slammer worm (Jan. 2003). The outlier: its payload was likely corrupted on the network before being captured. The last 91 bytes: 22 unidentified bytes, a 20-byte IP header, a UDP header, and the first 41 bytes of the Slammer exploit. No exploit diversity.
  • 20. LSASS (Local Security Authority Subsystem Service) The original Sasser worm: Apr. 2004. A handful of variants were responsible for a large number of occurrences.
  • 21. LSASS (cont’d) Comparison of exedit, edit, and structural distance. Edit distance: picks up differences that are not fundamental to the code. Structural distance: ignores subtle differences between shellcodes.
  • 22. LSASS (cont’d) Inter-family analysis (manual analysis): the differences in variations are 2–20 bytes and correspond to phone-home/connect-back IP addresses, hostnames, and ports encoded in the payload. LSASS-1: the main body of the exploit followed immediately after the decoding loop; the main body and data section were XOR-ed one byte at a time with key 0x99. LSASS-0: an unencoded main body followed by an encoded data section (byte-wise XOR with key 0xff); its embedded URL strings belonged to previously classified malware. LSASS-2,3,4: share the same encoding scheme and roughly the same flow of execution.
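The byte-wise XOR encoding mentioned above is its own inverse, which is why a short decoding loop at the front of the payload is enough to recover the encoded bytes. A minimal sketch follows; the function name is illustrative, and the keys 0x99 and 0xff are the ones quoted on the slide.

```python
def xor_bytes(data: bytes, key: int) -> bytes:
    """XOR every byte with a single-byte key; applying it twice restores the input."""
    return bytes(b ^ key for b in data)
```

For instance, `xor_bytes(xor_bytes(body, 0x99), 0x99)` returns `body` unchanged, so the same routine serves as encoder and decoder.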
  • 23. LSASS (cont’d) - Prevalence
  • 24. ISystemActivator The Blaster worm (Aug. 2003) originally exploited the RemoteActivation vulnerability. Is the diversity the result of polymorphism? The results indicate that exploits within a family are similar, but that ISys families differ more substantially from each other than the LSASS exploit families do.
  • 25. ISystemActivator We confirmed that there were six different code bases. There was no code polymorphism: the differences were due to variations in data constants, such as encodings of phone-home addresses and hostnames, as well as names of executables. ISys-0 used a 4-byte, non-overlapping XOR to encode its payload, whereas all other exploits used a byte-by-byte XOR. ISys-4 had the largest payload length, and its flow of execution was the most complicated. The moderate exedit distance within the ISys-1 family (9%) came from some different instructions; the exploits were otherwise very similar. ISys-5 exploits had a characteristic execution flow that performed consecutive jumps over two text sections; the <address> and <executable name> fields of the embedded command “tftp.exe -i <address> get <executable name>” accounted for 6.5% distance.
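To illustrate the difference between the two encodings named above, here is a sketch of a 4-byte, non-overlapping XOR: the payload is processed in consecutive 4-byte blocks, each XOR-ed with the same 4-byte key. The function name is illustrative, and handling a trailing partial block by cycling the key is a simplification made here; a real decoder would typically operate on whole 32-bit words.

```python
def xor_dwords(data: bytes, key: bytes) -> bytes:
    """XOR consecutive, non-overlapping 4-byte blocks of data with a 4-byte key."""
    if len(key) != 4:
        raise ValueError("key must be exactly 4 bytes")
    # Byte i belongs to block i // 4 and meets key byte i % 4.
    return bytes(b ^ key[i % 4] for i, b in enumerate(data))
```

When all four key bytes are equal, this reduces to the byte-by-byte XOR used by the other families; a key with distinct bytes yields a visibly different encoded payload from the same plaintext.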
  • 26. ISystemActivator Figure labels: 4-byte decoding key, kernel-address loading function, function-finding block; 4-byte encoding key, kernel base loader, function finder.
  • 27. ISystemActivator ISys-4: the largest payload length, and its flow of execution was the most complicated.
  • 28. ISystemActivator ISys-5: performed consecutive jumps over two text sections; the <address> and <executable name> fields of “tftp.exe -i <address> get <executable name>” accounted for 6.5% distance.
  • 29. ISystemActivator ISys-1: different instructions in parts, otherwise very similar.
  • 30. ISystemActivator The “Bind” version required the newly-infected host to bind on a socket and wait for a connection attempt from the infecting host. The “Connect-back” version required the newly-infected host to connect back to the infecting host. Interestingly, the number of iterations in ISys-3’s loop overshoots the exploit payload. Thus, it seems that either ISys-2 was a refinement of ISys-3, or that ISys-3 was a poor imitation of ISys-2.
  • 32. RemoteActivation Unlike the other exploits, RemoteActivation exploits exhibited a high amount of exploit diversity per host
  • 33. RemoteActivation (cont’d) Exedit distance is very small. The byte-wise encoding scheme only covered the main bodies of the exploits, but different exploits used different keys. With manual inspection we confirmed that variable encoding of the exploit’s main body contributed to the jump in average intra-family distance. Changing keys along with random filler characters are commonly described techniques for polymorphism, and the RemoteActivation exploits had both of these features. 0: “Bind” version. 1: “Connect-back” version. Manual inspection: the last third (roughly 300 bytes) of the payload contained randomly generated characters.
  • 34. Diversity Across Vulnerabilities The trace is a full-payload 4.5-day trace from a Windows honeyfarm running at the Lawrence Berkeley National Laboratory starting on April 19, 2006. Hosts in this honeyfarm served as active responders to incoming requests
  • 35. Diversity Across Vulnerabilities (cont’d) Dendrogram for the LBL trace exploits using exedit distance. The 1st set of hash marks just below 0% represent ISystemActivator, the 2nd represent LSASS, the 3rd represent PNP, and the 4th represent RemoteActivation.
  • 36. Diversity Across Vulnerabilities (cont’d) Multi-vector family
  • 37. Discussion - Polymorphism We generated a small set of signatures that exhaustively covered all exploits we observed for each vulnerability in the DSL residential trace. Each signature was a contiguous sequence of 100 bytes. For each individual vulnerability except LSASS, one signature sufficed to cover the set of exploits. LSASS required two: one covered 1645/1769 exploits, and the other covered the rest. Manual investigation of these signatures showed that they primarily focused on the portions of the shellcode that were mostly (but not entirely) NOPs. We then tested the signatures against a 5-GB trace on our internal network for false positives. None of the signatures yielded false positives in the internal trace. The polymorphism was not effective for evasion? Other possible purposes: functional variation; increasing the difficulty of reverse engineering.
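The signature check described above amounts to a substring test: a payload is covered if it contains a signature as a contiguous byte run. The sketch below shows how coverage and false positives could be counted under that reading; the function names are illustrative, and it does not reproduce how the signatures were generated.

```python
def coverage(signatures, payloads) -> int:
    """Count payloads containing at least one signature as a contiguous byte run."""
    return sum(1 for p in payloads if any(sig in p for sig in signatures))

def false_positives(signatures, benign_payloads):
    """Return the benign payloads that match any signature."""
    return [p for p in benign_payloads if any(sig in p for sig in signatures)]
```

A signature set is exhaustive for a trace when `coverage(signatures, exploits) == len(exploits)`, and it is clean against a benign trace when `false_positives` returns an empty list.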
  • 38. Conclusion This paper presents a methodology for constructing the phylogeny of remote code injection exploits and evaluates this methodology on network traces taken from several vantage points. The methodology is robust to the observed polymorphism. The techniques reveal non-trivial code sharing among different exploit families, and the resulting phylogenies accurately capture the subtle variations among exploits within each family. Analyzing both the emergence of polymorphism and the phylogeny of remote code injection exploits is important.