Contents
• Abstract
• Existing System
• Proposed System
• Literature Survey
• Requirements
• System Design
• Results
A Novel Malware Detection Technique using Machine Learning
Abstract
• Data is generated every day in enormous volumes from many sources and in many formats:
structured, unstructured, and semi-structured data, as well as PDF, DOC, CSV, and raw files.
• Not all of these files are genuine, because they come from both identified and
unidentified sources.
• One of the major challenges that anti-malware companies face today is detecting, classifying, and
removing malware across these vast amounts of data.
• Malware is a contraction of "malicious software". Modern malware is designed with mutation
characteristics, meaning that it can change its behavior based on the properties of the physical file.
• The tremendous growth of data helps malware designers, who can hide malware such as
viruses, Trojans, and ransomware in any file.
• The design of modern malware poses a variety of challenges to the antivirus industry.
• Antivirus software and automatic or manual signature-extraction approaches identify only
known malware, but malware changes its properties day by day as new data arrives.
• It is impractical to collect new data every day to retrain antivirus software and signatures, or to
have security experts manually update the software to detect new malware, because analyzing all the
data manually to update the system is a time-consuming and costly process.
• To overcome this major limitation, companies are adopting machine learning to detect malware.
A model trained on the data can detect malware as it occurs, with high accuracy.
Existing System
• In the existing system, the model is well trained, but maintaining the trade-off between
complexity and performance is a key issue. Very often, the gain in performance of a complex
system on validation data is negligible compared to the performance of less complex ones.
Proposed System
• In the proposed system, we design a lightweight model that detects malware with high accuracy
for industrial use while maintaining the trade-off between the complexity of the data and the
performance of the model.
Literature Survey
S.No. : 1
Title & Author : Nir Nissim et al., "Novel active learning methods for enhanced PC malware detection", 2014.
Method of Developing :
• Static and dynamic features.
• An active learning (AL) framework developed to detect malware using a support vector machine (SVM) model.
Remarks :
• The SVM-Margin AL method showed a decrease in performance, while the other AL methods continued to
acquire an increasing number of malware samples.

S.No. : 2
Title & Author : Liu Liu et al., "Automatic malware classification and new malware detection using machine learning", 2017.
Method of Developing :
• An incremental malware detection system to classify malware families and to detect new malware.
• Shared nearest neighbour (SNN) clustering to find new malware on static features.
Remarks :
• Resulted in 86.7% when detecting new malware.

S.No. : 3
Title & Author : Dragos Gavrilut et al., "Malware Detection Using Machine Learning", 2011.
Method of Developing :
• Training, test, and scaled-up datasets derived to improve performance.
• Multilayer perceptron algorithms used to detect malware on byte features.
Remarks :
• Overfitting occurred when the algorithm was applied to large datasets.
• Resulted in an average of 89.77%.

S.No. : 4
Title & Author : Yujie Fan et al., "Malicious sequential pattern mining for automatic malware detection", 2016.
Method of Developing :
• API calls as features.
• A heuristic-based detection method that uses data mining as well as machine learning techniques.
• Aims to learn special patterns that capture the characteristics of malware.
Remarks :
• Unable to mine discriminative features.
• The model cannot perform malware classification.

S.No. : 5
Title & Author : Mansour Ahmadi et al., "Novel Feature Extraction, Selection and Fusion for Effective Malware Family Classification", 2016.
Method of Developing :
• Static analysis on assembly and byte features.
• Feature selection and feature fusion.
• A learning-based system to classify malware using the Random Forest and XGBoost algorithms.
• 99.7% accuracy in the test results with 5-fold cross validation.
Remarks :
• The method has not yet been tested for robustness against evasion attacks or poisoning attacks.
• Many features are used to classify the malware.
Literature Survey Diagram
Figure: Malware detection and classification.
• Analysis types: Dynamic Analysis, Static Analysis
• Features: Assembly Features, Byte Features, API Call Features
• Machine learning methodologies: K-Nearest Neighbour, Naive Bayes, Support Vector Machines,
Decision Trees, Artificial Neural Networks, Logistic Regression, Random Forest
Requirements
Hardware Requirements:
The hardware consists of the physical components of the computer: input, storage,
processing, control, and output devices. The hardware used in the project is
• Processor : Intel Core i3
• Hard Disk : 1 TB
• RAM : 4 GB
Software Requirements:
Software is a set of programs that performs a particular task and is an essential requirement of
computer systems. The software used in the project is
• Operating System : Windows, Linux (e.g., Ubuntu), or macOS
• IDE : Anaconda
• Script Book : Jupyter Notebook
• Coding Language : Python
System Design
• Before signatures can be developed for anti-malware products, two main tasks have to be
carried out within malware analysis: malware detection and malware classification.
• While the goal of malware detection mechanisms is to catch malware in the wild, malware
classification systems assign each sample to the correct malware family.
• In this project, we propose a model that uses different malware characteristics to
assign malware samples to their corresponding families effectively, without any deobfuscation
or unpacking.
Steps to design a Machine Learning Model
• Gathering the data
• Preparing the data
• Choosing a model
• Training
• Evaluation
• Hyperparameter tuning
• Prediction
Malware Classification System Architecture
Figure: components of the malware classification system architecture: .asm and .byte files,
Feature Extraction, Feature Engineering, Statistical Analysis, Feature Selection & Feature Fusion,
Train/Test Split, Training Data, Testing Data, Classification Model, Parameter Tuning,
Evaluation Measure, Process Execution, and Prediction.
Data Description
Human-understandable code → Compiler → assembly-level instructions → Assembler → byte code
.asm Files (Assembly View)
.text:00419B02 2B CE sub ecx, esi
.text:00419B07 8D 4C 85 E0 lea ecx, [ebp+eax*4+var_20]
.text:00419B0B 8B 31 mov esi, [ecx]
.text:00419B0D 8D 3C 16 lea edi, [esi+edx]
.byte files (Hex-View)
00401000 56 8D 44 24 08 50 8B F1 E8 1C 1B 00 00 C7 06 08
00401010 BB 42 00 8B C6 5E C2 04 00 CC
00401020 C7 01 08 BB 42 00 E9 26 1C 00 00 CC
00401030 56 8B F1 C7 06 08 BB 42 00 E8 13 1C 00 00 F6
• Split the dataset randomly into three parts: Train, Cross Validation, and Test, with 64%, 16%, and 20% of the data respectively.
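The 64% / 16% / 20% split can be obtained by first holding out 20% for Test and then taking 20% of the remaining 80% for Cross Validation (0.8 × 0.2 = 0.16). The following is a minimal sketch using only the Python standard library; the function name `three_way_split`, the seed, and the sample identifiers are illustrative assumptions, and the slides do not say whether the original split was stratified by malware class.

```python
import random

def three_way_split(items, test_frac=0.20, cv_frac_of_rest=0.20, seed=42):
    """Shuffle items and split them into train, cross-validation and test lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)      # reproducible random order

    n_test = round(len(shuffled) * test_frac)  # 20% of all samples
    test, rest = shuffled[:n_test], shuffled[n_test:]

    n_cv = round(len(rest) * cv_frac_of_rest)  # 20% of the remaining 80% = 16%
    cv, train = rest[:n_cv], rest[n_cv:]       # what is left is 64%
    return train, cv, test

# Hypothetical sample identifiers standing in for the malware files.
sample_ids = [f"sample_{i:04d}" for i in range(1000)]
train, cv, test = three_way_split(sample_ids)
print(len(train), len(cv), len(test))  # 640 160 200
```

A stratified variant would apply the same two-step split separately within each malware class so that every part keeps the class proportions.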
Malware Classes
• Ramnit
• Lollipop
• Kelihos_ver3
• Vundo
• Simda
• Tracur
• Kelihos_ver1
• Obfuscator.ACY
• Gatak
Feature Engineering (Feature Selection & Feature Fusion)
.byte file features
• N-gram
• Metadata
.asm file features
• Symbol
• Operation Code
• Register
• Section
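As an illustration of two of the feature types listed above, the sketch below counts byte n-grams in hex-view lines and operation codes in assembly-view lines. It is a simplified assumption of how such features could be extracted, not the extraction code used in the project: the helper names, the small `KNOWN_OPCODES` set, and the choice to count n-grams only within a line are all hypothetical. The sample lines are adapted from the Data Description example.

```python
from collections import Counter

def byte_ngrams(hex_lines, n=2):
    """Count byte n-grams in lines of the form '<address> <byte> <byte> ...'."""
    counts = Counter()
    for line in hex_lines:
        tokens = line.split()[1:]                  # drop the leading address column
        for i in range(len(tokens) - n + 1):       # n-grams within one line only
            counts[" ".join(tokens[i:i + n])] += 1
    return counts

# Illustrative subset of x86 mnemonics; a real extractor would use a full list.
KNOWN_OPCODES = {"mov", "lea", "sub", "add", "push", "pop", "call", "jmp"}

def opcode_counts(asm_lines):
    """Count the first known mnemonic that appears on each assembly-view line."""
    counts = Counter()
    for line in asm_lines:
        for token in line.split():
            if token in KNOWN_OPCODES:
                counts[token] += 1
                break
    return counts

hex_lines = [
    "00401000 56 8D 44 24 08 50 8B F1 E8 1C 1B 00 00 C7 06 08",
    "00401010 BB 42 00 8B C6 5E C2 04 00 CC",
]
asm_lines = [
    ".text:00419B02 2B CE sub ecx, esi",
    ".text:00419B07 8D 4C 85 E0 lea ecx, [ebp+eax*4+var_20]",
    ".text:00419B0B 8B 31 mov esi, [ecx]",
    ".text:00419B0D 8D 3C 16 lea edi, [esi+edx]",
]

unigrams = byte_ngrams(hex_lines, n=1)
bigrams = byte_ngrams(hex_lines, n=2)
opcodes = opcode_counts(asm_lines)
print(unigrams["00"], bigrams["00 00"], dict(opcodes))
```

Each counter can then be turned into a fixed-length feature vector by choosing a vocabulary (for example, all 256 byte values for unigrams) and reading off the counts in a fixed order.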
Classification Model
• K-Nearest Neighbour
• Logistic Regression
• Random Forest
• XGBoost
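To make the simplest of these classifiers concrete, here is a minimal K-Nearest Neighbour sketch in plain Python: a sample is assigned the majority class among its k closest training samples by Euclidean distance. The two-dimensional feature vectors and the value of k are made-up assumptions for illustration; the project itself presumably used library implementations on much larger feature vectors.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k=3):
    """Return the majority label among the k training points nearest to query."""
    distances = sorted(
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]

# Hypothetical normalised feature vectors (e.g. two opcode frequencies).
train_X = [(1.0, 0.0), (0.9, 0.1), (0.0, 1.0), (0.1, 0.9)]
train_y = ["Ramnit", "Ramnit", "Gatak", "Gatak"]

prediction = knn_predict(train_X, train_y, query=(0.8, 0.2), k=3)
print(prediction)  # Ramnit
```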
Evaluation Measures
• Confusion Matrix
• Multi Class Log-Loss Function
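Both evaluation measures can be computed directly from the true labels and the predicted class probabilities. The multi-class log-loss is the average of -log(p), where p is the probability the model assigned to the true class of each sample; the confusion matrix counts how often each true class was predicted as each class. The sketch below is a plain-Python illustration with made-up labels and probabilities for three classes, not the project's evaluation code.

```python
import math

def confusion_matrix(y_true, y_pred, n_classes):
    """matrix[t][p] = number of samples of true class t predicted as class p."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix

def multiclass_log_loss(y_true, probs, eps=1e-15):
    """Mean negative log-probability assigned to the true class."""
    total = 0.0
    for t, row in zip(y_true, probs):
        p = min(max(row[t], eps), 1 - eps)   # clip to avoid log(0)
        total -= math.log(p)
    return total / len(y_true)

# Hypothetical true labels and predicted probabilities for 3 classes.
y_true = [0, 1, 2, 1]
probs = [
    [0.8, 0.1, 0.1],
    [0.2, 0.7, 0.1],
    [0.1, 0.2, 0.7],
    [0.5, 0.4, 0.1],
]
y_pred = [row.index(max(row)) for row in probs]    # most probable class

cm = confusion_matrix(y_true, y_pred, n_classes=3)
loss = multiclass_log_loss(y_true, probs)
misclassification_rate = sum(t != p for t, p in zip(y_true, y_pred)) / len(y_true)
print(cm, round(loss, 4), misclassification_rate)
```

Log-loss penalises confident wrong predictions more heavily than the misclassification rate does, which is why the two measures are reported side by side.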
Conclusion
• In this project, we identify 9 different types of malware in a large dataset (0.5 TB)
provided by Microsoft.
• Malware detection and malware classification are two different tasks performed by
anti-malware companies.
• First, an executable is analysed to detect whether it exhibits any malicious content. Then, if
malware is detected, it is assigned to the most appropriate malware family through a
classification mechanism.
Results
Figures:
• Malware Probability Count
• Malware Train and Test Data Values Count
• Malware Cross Validation Data Values Count
• K-NN Malware Class Count on Byte Malware Data
• Logistic Regression Malware Class Count on Byte Malware Data
• Random Forest Malware Class Count on Byte Malware Data
• XGBoost Malware Class Count on Byte Malware Data

Accuracy Values on Byte Malware Data
Classifier            Test Log-loss   Misclassification Rate   Accuracy
K-NN                  0.24            4.50%                    95.50%
Logistic Regression   0.528           12.32%                   77.68%
Random Forest         0.085           2.02%                    97.98%
XGBoost               0.078           1.24%                    98.76%
Figures:
• K-NN Malware Class Count on ASM Malware Data
• Logistic Regression Malware Class Count on ASM Malware Data
• Random Forest Malware Class Count on ASM Malware Data
• XGBoost Malware Class Count on ASM Malware Data

Accuracy Values on ASM Malware Data
Classifier            Test Log-loss   Misclassification Rate   Accuracy
K-NN                  0.089           2.02%                    97.98%
Logistic Regression   0.415           9.61%                    90.39%
Random Forest         0.057           1.15%                    98.85%
XGBoost               0.048           0.87%                    99.13%
Thank You.

Design and Development of an Efficient Malware Detection Using ML
