Detection of unknown computer worms based on behavioral classification of the host
  Robert Moskovitch, Yuval Elovici, Lior Rokach
  Presented by: Sandeep, Dept. of Computer & Information Sciences, University of Delaware
Worms
  Worms are considered malicious in nature.
  Worms propagate actively over a network, while other types of malicious code, such as viruses, commonly require human activity to propagate.
  A virus infects a file (its host); a worm does not require a host file.
What do Antivirus Packages do?
  Antivirus software packages inspect each file that enters the system, looking for known signatures that uniquely identify an instance of known malcode.
  Polymorphism and metamorphism are two common obfuscation techniques used by malware writers.
  A polymorphic virus obfuscates its decryption loop using several transformations, such as nop-insertion and code transposition.
Obfuscation Techniques
  Metamorphic viruses attempt to evade detection by obfuscating the entire virus.
  When they replicate, these viruses change their code in a variety of ways, such as code transposition, substitution of equivalent instruction sequences, change of conditional jumps, and register reassignment.
Example
  Virus code:
    Loop:
        pop ecx
        jecxz SFModMark
        mov esi, ecx
        mov eax, 0d601h
        pop edx
        pop ecx
        call edi
  Morphed virus code (from Chernobyl CIH 1.4), listing 1:
    Loop:
        pop ecx
        nop
        jecxz SFModMark
        xor ebx, ebx
        beqz N1
    N1: mov esi, ecx
        nop
        mov eax, 0d601h
        pop edx
        pop ecx
        nop
        call edi
        xor ebx, ebx
        beqz N2
    N2: jmp Loop
  Morphed virus code (from Chernobyl CIH 1.4), listing 2:
    Loop:
        pop ecx
        nop
        jmp L1
    L3: call edi
        xor ebx, ebx
        beqz N2
    N2: jmp Loop
        jmp L4
    L2: nop
        mov eax, 0d601h
        pop edx
        pop ecx
        nop
        jmp L3
    L1: jecxz SFModMark
        xor ebx, ebx
        beqz N1
    N1: mov esi, ecx
        jmp L2
    L4:
Current Methods
  Existing methods rely on analysis of the binary to detect unknown malcode.
  Some less typical worms are left undetected.
  Therefore an additional detection layer at runtime is required.
Proposed Approach
  Malicious actions are reflected in the general behavior of the host.
  By monitoring the host, one can implicitly identify malcode.
  A classifier is trained with computer measurements from infected and uninfected computers.
Contributions of the Paper
  Machine learning techniques are capable of detecting and classifying worms.
  Feature selection techniques show that a relatively small set of features is sufficient for solving the problem without sacrificing accuracy.
  Empirical results from an extensive study of various machine configurations suggest that the proposed methods achieve high detection rates on previously unseen worms.
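As a rough illustration of training a classifier on host measurements, here is a minimal Gaussian Naive Bayes sketch in plain Python. Naive Bayes is one of the algorithm families the slides name, but this code is not the authors' implementation. The two features (packets sent per second, CPU percent) and all sample values are hypothetical; the slides do not list individual measurements.

```python
import math
from collections import defaultdict

def train_gaussian_nb(rows, labels):
    """Estimate a prior plus per-feature mean and variance for each class."""
    by_class = defaultdict(list)
    for row, label in zip(rows, labels):
        by_class[label].append(row)
    model = {}
    for label, members in by_class.items():
        n = len(members)
        columns = list(zip(*members))
        means = [sum(col) / n for col in columns]
        # A small floor keeps the variance strictly positive.
        variances = [
            sum((x - m) ** 2 for x in col) / n + 1e-9
            for col, m in zip(columns, means)
        ]
        model[label] = (n / len(rows), means, variances)
    return model

def predict(model, row):
    """Return the class with the highest log-posterior for one measurement vector."""
    def log_posterior(prior, means, variances):
        score = math.log(prior)
        for x, m, v in zip(row, means, variances):
            score += -0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)
        return score
    return max(model, key=lambda label: log_posterior(*model[label]))

# Hypothetical per-second measurements: [packets sent per second, CPU percent].
rows = [[20, 5], [25, 7], [22, 6], [900, 60], [950, 65], [870, 58]]
labels = ["clean", "clean", "clean", "infected", "infected", "infected"]

model = train_gaussian_nb(rows, labels)
print(predict(model, [910, 61]))
print(predict(model, [23, 6]))
```

In the study each vector would have far more features than the two assumed here, which is why feature selection matters for this approach.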
Train and Test Phase
Dataset creation
  The lab network consisted of seven computers with heterogeneous hardware and a server computer simulating the internet.
  Windows performance counters and VTrace were used to monitor system features.
  A vector of 323 features was recorded every second.
  Worms that differ in their behavior were chosen from among the available worms.
Dataset Description
Feature selection methods
  Chi-Square
  Gain Ratio
  Relief
  Features' ensemble: f_i is a feature, and filter is one of the k filtering (feature selection) methods.
Consolidating features from different environments: Averaged and Unified
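The ensemble formula itself did not survive in this transcript, so the following is only a sketch of one plausible reading: each of the k filters scores every feature f_i, the scores are scaled to a common range, and the ensemble score is their plain average. The averaging rule, the min-max scaling, and all feature names and score values below are assumptions for illustration, not the paper's definition.

```python
def normalize(scores):
    """Scale one filter's scores to [0, 1] so different filters are comparable."""
    lo, hi = min(scores.values()), max(scores.values())
    span = (hi - lo) or 1.0  # avoid dividing by zero when all scores are equal
    return {feature: (s - lo) / span for feature, s in scores.items()}

def ensemble_rank(filter_scores):
    """Average the normalized scores of k filters and rank features, best first."""
    k = len(filter_scores)
    normalized = [normalize(s) for s in filter_scores.values()]
    features = list(normalized[0].keys())
    combined = {f: sum(n[f] for n in normalized) / k for f in features}
    return sorted(combined, key=combined.get, reverse=True)

# Hypothetical scores from three filters over three features.
scores = {
    "chi_square": {"f1": 10.0, "f2": 2.0, "f3": 6.0},
    "gain_ratio": {"f1": 0.9, "f2": 0.1, "f3": 0.2},
    "relief": {"f1": 0.3, "f2": 0.0, "f3": 0.6},
}

ranking = ensemble_rank(scores)
print(ranking)          # ['f1', 'f3', 'f2']
print(ranking[:2])      # a "top 2" feature set under this assumed rule
```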
Feature Sets
Classification algorithms
  Decision Trees, Naïve Bayes, Bayesian Networks, Artificial Neural Networks
Evaluation measures
Experiment I
  Each classifier is trained on a single dataset i and tested on each one (j) of the eight datasets.
  Eight corresponding evaluations were done on each one of the datasets, resulting in 64 evaluation runs.
  When i = j, 10-fold cross-validation was used: the dataset is randomly partitioned into ten partitions, and the classifier is repeatedly trained on nine partitions and tested on the tenth.
Experiment I (Contd)
  Each evaluation run (out of the 64) was repeated for each combination of feature selection method, classification algorithm, and number of top features.
  Each evaluation run was repeated for the 33 feature sets described earlier.
  132 evaluations (four classification algorithms applied to 33 feature sets), each comprising 64 runs, sum to 8448 evaluation runs.
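The train/test grid described for Experiment I can be sketched as follows. This only enumerates the 64 dataset pairs and builds a random ten-way partition; the function names, the seed, and the row count of 1000 are placeholders, and no classifier is actually trained.

```python
import random

def ten_fold_partitions(n_rows, folds=10, seed=0):
    """Randomly split row indices into `folds` near-equal partitions."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    return [indices[i::folds] for i in range(folds)]

def evaluation_plan(n_datasets=8):
    """List (train dataset i, test dataset j, mode) for every pair."""
    plan = []
    for i in range(n_datasets):
        for j in range(n_datasets):
            mode = "10-fold cross-validation" if i == j else "train on i, test on j"
            plan.append((i, j, mode))
    return plan

plan = evaluation_plan()
partitions = ten_fold_partitions(n_rows=1000)  # 1000 rows is a placeholder size

print(len(plan))                                                # 64
print(sum(1 for _, _, m in plan if m.startswith("10-fold")))    # 8

# Each cross-validation round tests on one partition and trains on the other nine.
for held_out in range(len(partitions)):
    test_rows = partitions[held_out]
    train_rows = [r for p, part in enumerate(partitions) if p != held_out for r in part]
    assert len(test_rows) + len(train_rows) == 1000
```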
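The run count on this slide follows from simple multiplication, checked here as a short worked example. The variable names are illustrative; the numbers are the ones stated on the slide.

```python
classification_algorithms = 4
feature_sets = 33
runs_per_evaluation = 8 * 8  # every train dataset i against every test dataset j

evaluations = classification_algorithms * feature_sets
total_runs = evaluations * runs_per_evaluation

print(evaluations)  # 132
print(total_runs)   # 8448
```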
Results
Results(Contd)
Results(Contd)
Experiment II
  Classifiers were trained on part of the (five) worms and the none activity (no worm), and tested on the worms excluded from the training set and the none activity.
  The training set consisted of 5 − k worms and the testing set contained the k excluded worms, while the none activity appeared in both datasets.
  This process was repeated for all possible combinations of the k worms (k = 1–4).
  The Top20 features, which performed best in Experiment I (e1), were used.
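The train/test splits of Experiment II can be enumerated with a short sketch. The worm names are placeholders (the slides do not name the five worms), and the helper `leave_k_out_splits` is a hypothetical name introduced only for this example.

```python
from itertools import combinations

WORMS = ["worm_1", "worm_2", "worm_3", "worm_4", "worm_5"]  # placeholder names
NONE = "none"  # the no-worm activity, present in both training and testing

def leave_k_out_splits(worms, k):
    """Yield (train classes, test classes) for every choice of k excluded worms."""
    for excluded in combinations(worms, k):
        train = [w for w in worms if w not in excluded] + [NONE]
        test = list(excluded) + [NONE]
        yield train, test

splits_per_k = {k: len(list(leave_k_out_splits(WORMS, k))) for k in range(1, 5)}
print(splits_per_k)                # {1: 5, 2: 10, 3: 10, 4: 5}
print(sum(splits_per_k.values()))  # 30

first_train, first_test = next(leave_k_out_splits(WORMS, 2))
print(first_train)  # ['worm_3', 'worm_4', 'worm_5', 'none']
print(first_test)   # ['worm_1', 'worm_2', 'none']
```

Because the excluded worms never appear in training, every test-set worm is unseen by the classifier, which is what makes this a test of unknown-worm detection.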
Results
Conclusion
  Q1: In the detection of known malicious code, based on a computer's measurements, using machine learning techniques, what is the achievable level of accuracy?
  Q2: Is it possible to reduce the number of features to below 30, while maintaining a high level of accuracy?
Conclusions (Contd)
  Q3: Will the computer configuration and the computer background activity, from which the training sets were taken, have a significant influence on the detection accuracy?
  Q4: Is the detection of unknown worms possible, based on a training set of known worms?

CISC 879 - Machine Learning for Solving Systems Problems
