IOSR Journal of Computer Engineering (IOSR-JCE)
e-ISSN: 2278-0661,p-ISSN: 2278-8727, Volume 17, Issue 3, Ver. 1 (May – Jun. 2015), PP 76-81
www.iosrjournals.org
DOI: 10.9790/0661-17317681
Internet Worm Classification and Detection using Data Mining Techniques
Dipali Kharche1, Anuradha Thakare2
1 (Department of Computer Engg., PCCOE, Savitribai Phule Pune University, Pune India)
2 (Department of Computer Engg., PCCOE, Savitribai Phule Pune University, Pune India)
Abstract: An Internet worm is a standalone malware program that replicates itself in order to spread from one computer to another. Malware includes computer viruses, worms, rootkits, keyloggers, Trojan horses, dialers, adware, spyware, rogue security software and other malicious programs. Attackers write it to disrupt computer operations, gather sensitive information, or gain access to private computer systems. Worms on the Internet must be detected because they can create network vulnerabilities and degrade system performance. Several types of worm can be detected, such as port-scan, UDP, HTTP, user-to-root and remote-to-local worms. Existing approaches find worm activity difficult to detect. Because Internet worms spread quickly and propagate themselves, they are a critical threat to computer networks, and in our proposed system we detect and classify them with data mining algorithms. Using machine learning algorithms such as Random Forest, Decision Tree and Bayesian Network, worms on the Internet can be classified effectively.
Keywords: Bayesian Network, Classification, Data Mining, Decision Tree, Random Forest, Worm Detection.
I. Introduction
Internet worms are a critical threat to computer networks because they propagate themselves and spread quickly. When the first Internet worm [1] was released, more than a hundred hosts were infected, and since then the threat of Internet worms has grown and caused increasing harm to network systems. Many methods for Internet worm detection have been proposed, most of them based on intrusion detection systems (IDS) [2]. Automatic detection is challenging because it is hard to predict what form the next worm will take; at the same time, automatic detection and response are becoming imperative because a newly released worm can infect many hosts within seconds. Worm-detecting IDS fall into two categories: network-based and host-based. Network-based detection inspects network packets before they reach an end host, whereas host-based detection inspects packets that have already reached the end host. Moreover, host-based detection examines decoded network packets, by which time the host may already have been struck by the worm. When packets are examined without decoding, detection must instead rely on the behavior of traffic in the network. Many machine learning techniques have been used for intrusion detection in general and for worm detection in particular. Data mining plays an essential role in worm detection systems, and several models built with different data mining techniques have been proposed to detect worms.
In this paper, we present a new method for network-based Internet worm detection. We preprocess network packet data by extracting a set of features from abnormal and normal traffic, and we use three different data mining algorithms to classify the data. Our model detects Internet worms with a detection rate of up to 99.6% and a near-zero false-alarm rate.
The paper is structured as follows. Section II reviews related methods of Internet worm detection. Section III describes the irregular behavior and patterns of attacks and worms in network traffic data. Sections IV and V present the classification algorithms we use and our proposed model, respectively. Sections VI and VII present the experimental results and the conclusion.
II. Related Work
Several studies in recent years have proposed data-mining-based worm detection as an efficient way to increase network security, and classification techniques have performed best in many of them.
Some data mining algorithms are effective at classifying the behavior of Internet worms. For example, Siddiqui et al. [6] mined worm features from clean and infected platforms, trained a data mining model on this behavior, and reported Internet worm detection with high overall accuracy and a low false-positive rate.
Rasheed et al. [3] use connection behavior to detect Internet worms. They compare normal connections with worm connections, which are expected to produce a high number of failed connections. A connection fails when a source IP sends a packet to an unused IP address or to a port that is no longer in service; the replies are then packets such as SYN/ACK, ICMP and TCP RST, so the number of these packets is high [4].
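To make the failed-connection idea concrete, the sketch below counts failure replies per connection initiator and flags hosts above a threshold. It is a minimal illustration, not the algorithm of [3]: the reply labels `TCP_RST` and `ICMP_UNREACHABLE`, the `(initiator_ip, reply_type)` input format and the threshold are all hypothetical, and which reply types to count as failures is an assumption.

```python
from collections import Counter

# Hypothetical reply labels treated as connection failures (an assumption).
FAILURE_REPLIES = {"TCP_RST", "ICMP_UNREACHABLE"}


def failed_connection_counts(replies):
    """Count failure replies per connection initiator.

    replies: iterable of (initiator_ip, reply_type) pairs.
    """
    counts = Counter()
    for initiator, reply_type in replies:
        if reply_type in FAILURE_REPLIES:
            counts[initiator] += 1
    return counts


def suspicious_hosts(replies, threshold):
    """Return initiators whose failure count reaches a hypothetical threshold."""
    counts = failed_connection_counts(replies)
    return sorted(ip for ip, n in counts.items() if n >= threshold)
```

A scanning worm probes many unused addresses and closed ports, so its initiator accumulates failures far faster than a normal client, which is what the threshold tries to capture.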
Khayam et al. [5] proposed a worm detection method that analyzes traffic on the source-destination ports that worms use to spread. They use Kullback-Leibler (K-L) divergence to identify features of abnormal activity and a Support Vector Machine (SVM) to classify this activity, obtaining a 90% detection rate across all endpoints with a false-alarm rate close to zero.
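The K-L divergence mentioned above measures how far an observed traffic distribution drifts from a baseline. The sketch below applies it to destination-port distributions; it only illustrates the divergence measure, not the full feature set or the SVM stage of [5], and the port lists, the fixed `support` and the smoothing constant `eps` are assumptions made for the example.

```python
import math
from collections import Counter


def port_distribution(ports, support, eps=1e-6):
    """Smoothed probability distribution of destination ports over a fixed support."""
    counts = Counter(ports)
    # eps keeps every probability positive so the divergence stays finite.
    total = sum(counts[p] for p in support) + eps * len(support)
    return {p: (counts[p] + eps) / total for p in support}


def kl_divergence(p, q):
    """D_KL(p || q) in bits; p and q must share the same keys."""
    return sum(p[k] * math.log2(p[k] / q[k]) for k in p)


support = [53, 80, 135, 443]
baseline = port_distribution([80, 80, 443, 53], support)
observed = port_distribution([135, 135, 135, 80], support)
```

Here `kl_divergence(observed, baseline)` is large because the observed traffic concentrates on port 135, which the baseline almost never uses; identical distributions give a divergence of zero.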
We focus on network-based Internet worm detection. We preprocess raw network packets before they reach an end user, considering the relationship between source and destination IP addresses, the relationship between source and destination ports, and the number of certain abnormal packets that occur when hosts generate worm traffic. We then use three data mining algorithms, Bayesian Network, Decision Tree and Random Forest, to classify the data as worm, normal, or network attack data (i.e., DoS and Port Scan).
III. Attack And Worm Characteristics
In this paper, we consider the Blaster worm, one of the well-known public worms. Most worms behave similarly to Port Scan and Denial of Service (DoS) attacks, so our method classifies and detects Blaster worm, Port Scan and DoS attack behavior. For DoS attacks, we consider UDP flood and HTTP flood. Details of each data type are given below.
• The Blaster worm exploits a buffer overflow vulnerability in DCOM RPC on Windows platforms, spreading through TCP ports 135 and 4444 and UDP port 69. The worm can transfer and execute itself. It then launches a DoS attack, a SYN flood against port 80, to prevent patch updates.
• A UDP flood is a type of DoS attack that sends a large number of UDP packets to a target host or network, consuming bandwidth.
• An HTTP flood is also a DoS attack, analogous to the UDP flood: it sends many useless requests to a target to consume bandwidth on a Web server.
• A Port Scan probes a host for open ports and for the services running on them.
IV. Classification Algorithms
4.1 C4.5 Decision Tree [10]:
It is a well-known data mining algorithm that classifies a data set using the nodes of a tree, which it builds with a divide-and-conquer procedure. Decision trees are prone to over-fitting on large datasets. The classification model is created by deriving rules from the training set; these rules are then used to classify new, unseen data called the testing set. The tree predicts a class by starting at the root and following a path to a leaf node, which holds the resulting class. C4.5 is a well-known decision tree algorithm with good classification efficiency.
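The root-to-leaf procedure above can be sketched as a small recursive tree builder. It picks each split with the gain ratio used by C4.5, but it is a simplified sketch: it assumes discrete (already binned) attributes and omits C4.5's continuous-attribute thresholds, missing-value handling and pruning. WEKA's J48 classifier, which the paper's WEKA setup would use for C4.5, provides those. The attribute names in the example are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())


def gain_ratio(rows, labels, attr):
    """Information gain of splitting on attr, normalized by split information."""
    n = len(labels)
    parts = {}
    for row, y in zip(rows, labels):
        parts.setdefault(row[attr], []).append(y)
    remainder = sum(len(p) / n * entropy(p) for p in parts.values())
    split_info = -sum(len(p) / n * math.log2(len(p) / n) for p in parts.values())
    gain = entropy(labels) - remainder
    return gain / split_info if split_info > 0 else 0.0


def build_tree(rows, labels, attrs):
    """Divide and conquer: split on the best attribute, recurse on each branch."""
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not attrs:
        return majority  # leaf node holds a class label
    best = max(attrs, key=lambda a: gain_ratio(rows, labels, a))
    if gain_ratio(rows, labels, best) <= 0:
        return majority
    node = {"attr": best, "children": {}, "default": majority}
    for value in {row[best] for row in rows}:
        idx = [i for i, row in enumerate(rows) if row[best] == value]
        node["children"][value] = build_tree(
            [rows[i] for i in idx], [labels[i] for i in idx],
            [a for a in attrs if a != best])
    return node


def classify(tree, row):
    """Start at the root and follow branches until a leaf label is reached."""
    while isinstance(tree, dict):
        tree = tree["children"].get(row[tree["attr"]], tree["default"])
    return tree


rows = [{"syn": "high", "udp": "low"}, {"syn": "high", "udp": "low"},
        {"syn": "low", "udp": "high"}, {"syn": "low", "udp": "low"},
        {"syn": "low", "udp": "low"}]
labels = ["worm", "worm", "udp_flood", "normal", "normal"]
tree = build_tree(rows, labels, ["syn", "udp"])
```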
4.2 Random Forest [8]:
It is an effective data mining algorithm because it mitigates over-fitting on large datasets and can train and test quickly on large, complex data. Each tree is built from a random sample of the training dataset drawn with replacement; most of the data is used to train that tree, and the remainder is used for testing, i.e., evaluating its results. The model can estimate the importance of the features used in classification, and its unpruned rules are formed and evaluated on the training dataset. A Random Forest model contains many classification trees, each of which is distinct and votes for a class; the final class is the one that receives the most votes.
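The two mechanisms described above, bootstrap sampling with an out-of-bag remainder and majority voting, can be sketched as follows. To stay short, each "tree" here is a hypothetical one-split stump on a randomly chosen feature, standing in for the full unpruned trees that a real Random Forest (such as WEKA's RandomForest) grows with random feature subsets at every split; only the bagging and voting logic is meant to be representative.

```python
import random
from collections import Counter


def bootstrap(n, rng):
    """Sample n row indices with replacement; rows never drawn are out-of-bag."""
    in_bag = [rng.randrange(n) for _ in range(n)]
    out_of_bag = sorted(set(range(n)) - set(in_bag))
    return in_bag, out_of_bag


def majority_vote(predictions):
    """Return the class label chosen by the most trees."""
    return Counter(predictions).most_common(1)[0][0]


def fit_random_stump(X, y, rng):
    """Hypothetical stand-in for an unpruned tree: one split on a random feature."""
    f = rng.randrange(len(X[0]))
    values = [row[f] for row in X]
    t = (min(values) + max(values)) / 2
    overall = majority_vote(y)
    left = [lab for row, lab in zip(X, y) if row[f] <= t]
    right = [lab for row, lab in zip(X, y) if row[f] > t]
    left_label = majority_vote(left) if left else overall
    right_label = majority_vote(right) if right else overall
    return lambda x: left_label if x[f] <= t else right_label


def train_forest(X, y, n_trees=25, seed=0, fit=fit_random_stump):
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        # The out-of-bag rows could be used to evaluate each tree; unused here.
        in_bag, _out_of_bag = bootstrap(len(X), rng)
        forest.append(fit([X[i] for i in in_bag], [y[i] for i in in_bag], rng))
    return forest


def predict(forest, x):
    return majority_vote([tree(x) for tree in forest])
```

Because each tree sees a different bootstrap sample, individual trees make different errors, and the vote averages them out.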
4.3 Bayesian network [9]:
It is a probabilistic graphical model. A Bayesian network consists of nodes (variables) linked by probabilistic dependencies. It learns these dependencies from the training dataset in order to classify or predict unknown cases, and it can avoid over-fitting on large data.
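The simplest Bayesian network for classification makes the class node the only parent of every feature node, which gives the naive Bayes classifier. The sketch below implements that fixed structure for discrete features with add-one smoothing; it is an illustration only, since WEKA's BayesNet can search for richer structures, and the feature values in the example are hypothetical.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesNet:
    """Bayesian network in which the class node is the sole parent of every feature."""

    def fit(self, rows, labels):
        self.class_counts = Counter(labels)
        self.n = len(labels)
        self.counts = defaultdict(Counter)  # (feature index, class) -> value counts
        self.values = defaultdict(set)      # feature index -> observed values
        for row, y in zip(rows, labels):
            for i, v in enumerate(row):
                self.counts[(i, y)][v] += 1
                self.values[i].add(v)
        return self

    def log_posterior(self, row, y):
        # log P(y) + sum_i log P(x_i | y), with Laplace (add-one) smoothing
        score = math.log(self.class_counts[y] / self.n)
        for i, v in enumerate(row):
            k = len(self.values[i])
            score += math.log((self.counts[(i, y)][v] + 1) / (self.class_counts[y] + k))
        return score

    def predict(self, row):
        return max(self.class_counts, key=lambda y: self.log_posterior(row, y))


rows = [("high", "low"), ("high", "low"), ("low", "high"), ("low", "low")]
labels = ["worm", "worm", "udp_flood", "normal"]
model = NaiveBayesNet().fit(rows, labels)
```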
4.4 Information Gain:
It is a feature selection measure. Information Gain computes an entropy-based score for each attribute, which can be used as a rank. A feature's rank represents its importance, i.e., its association with the class used to label the data, so features with comparatively high rank are among the most important for classification.
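The ranking described above can be sketched as follows: the information gain of a feature is the class entropy minus the weighted entropy of the class within each feature value, and features are sorted by that gain. This sketch assumes the features have already been discretized, as numeric features usually are before computing information gain; the feature names are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())


def information_gain(column, labels):
    """H(class) minus the expected class entropy after splitting on this feature."""
    n = len(labels)
    groups = {}
    for v, y in zip(column, labels):
        groups.setdefault(v, []).append(y)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def rank_features(rows, labels, names):
    """Return (feature name, gain) pairs, highest-ranked feature first."""
    gains = {name: information_gain([r[i] for r in rows], labels)
             for i, name in enumerate(names)}
    return sorted(gains.items(), key=lambda kv: kv[1], reverse=True)


rows = [("high", "a"), ("high", "b"), ("low", "a"), ("low", "b")]
labels = ["worm", "worm", "normal", "normal"]
ranking = rank_features(rows, labels, ["syn_level", "other"])
```

In this toy data, `syn_level` separates the classes perfectly (gain of 1 bit) while `other` carries no information (gain of 0), so `syn_level` ranks first.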
V. Proposed Model
5.1 Overview
Our worm detection model consists of a preprocessing part and a classification part, as shown in Figure 1. In the preprocessing part, we release the actual Blaster worm, obtained from a reliable online source, into a local area network (LAN). At the same time, we also generate UDP flood, HTTP flood and Port Scan attacks in the LAN.
Fig. 1. Worm Detection Model
We capture raw network packets from the LAN and select only those packet-header features that are necessary to predict or classify the data. The preprocessing and feature selection technique is described in detail in Section 5.2. After preprocessing, the resulting dataset is split into two parts, one for training and one for testing. In the classification part, data mining algorithms classify the features of Worm, HTTP flood, UDP flood, Port Scan and Normal network behavior, as discussed in Section 5.3.
5.2 Preprocessing Part
Each record collects the packets sent by one source IP address within one second, and each record has 13 features extracted from all packets in that second. The features are listed below.
• Number of individual source IP addresses in 1 second
• Number of destination IP addresses
• Number of packets with a TCP header
• Number of packets with an ICMP header
• Number of packets with a UDP header
• Number of packets with the SYN (synchronize) flag set (bit = 1)
• Number of packets with the ACK (acknowledgement) flag set (bit = 1)
• Number of packets with the RST (reset) flag set (bit = 1)
• Total number of source ports
• Total number of destination ports
• Number of different packet sizes
• Port ratio: the number of source ports divided by the number of destination ports
• SYN ratio: the number of packets with the SYN flag set divided by the number of destination IP addresses
Preprocessing is the major task in data mining. We import the worm detection dataset and preprocess it into the per-second records described above. Finally, the preprocessing part splits the records into a training dataset and a testing dataset, where the testing dataset is half the size of the training dataset.
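The record construction above can be sketched as a grouping step over captured packets. The packet dictionary layout (`ts`, `src`, `dst`, `proto`, `sport`, `dport`, `flags`, `size`) is hypothetical, and three feature meanings are assumptions because the list is terse: feature 1 is read as the number of distinct source IPs seen in the same 1-second window, and features 9 and 10 as counts of distinct ports. The split keeps two thirds of the records for training so that the test set is half the size of the training set.

```python
import random
from collections import defaultdict


def extract_features(packets):
    """Build one 13-feature record per (source IP, 1-second window)."""
    groups = defaultdict(list)
    sources_per_second = defaultdict(set)
    for p in packets:
        second = int(p["ts"])
        groups[(p["src"], second)].append(p)
        sources_per_second[second].add(p["src"])

    records = {}
    for (src, second), pkts in groups.items():
        dst_ips = {p["dst"] for p in pkts}
        sports = {p["sport"] for p in pkts if p["sport"] is not None}
        dports = {p["dport"] for p in pkts if p["dport"] is not None}
        syn = sum("SYN" in p["flags"] for p in pkts)
        records[(src, second)] = [
            len(sources_per_second[second]),               # 1 source IPs this second (assumed)
            len(dst_ips),                                  # 2 destination IP addresses
            sum(p["proto"] == "TCP" for p in pkts),        # 3 TCP packets
            sum(p["proto"] == "ICMP" for p in pkts),       # 4 ICMP packets
            sum(p["proto"] == "UDP" for p in pkts),        # 5 UDP packets
            syn,                                           # 6 SYN flag set
            sum("ACK" in p["flags"] for p in pkts),        # 7 ACK flag set
            sum("RST" in p["flags"] for p in pkts),        # 8 RST flag set
            len(sports),                                   # 9 distinct source ports (assumed)
            len(dports),                                   # 10 distinct destination ports (assumed)
            len({p["size"] for p in pkts}),                # 11 different packet sizes
            len(sports) / len(dports) if dports else 0.0,  # 12 port ratio
            syn / len(dst_ips),                            # 13 SYN ratio
        ]
    return records


def split_train_test(records, seed=0):
    """Shuffle, then keep 2/3 for training so the test set is half the training set."""
    rows = list(records)
    random.Random(seed).shuffle(rows)
    cut = 2 * len(rows) // 3
    return rows[:cut], rows[cut:]
```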
5.3 Classification Part
In this part, we first train the Random Forest, C4.5 Decision Tree and Bayesian Network classifiers in the WEKA tool [7] on the training dataset, and then test them on a separate testing dataset. We test the models by classifying normal data, UDP flood, HTTP flood, Blaster worm and Port Scan using the 13 features of the preprocessed dataset.
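WEKA reads datasets in its ARFF format, so the preprocessed records have to be written out before training. The sketch below serializes numeric feature rows and class labels into an ARFF string; the relation name, feature names and class labels are hypothetical. The resulting training and testing files can be loaded in the WEKA Explorer, where J48 (WEKA's C4.5 implementation), RandomForest and BayesNet can be trained on one file and evaluated with the other as a "Supplied test set".

```python
def to_arff(relation, feature_names, rows, labels, classes):
    """Serialize numeric feature rows and class labels as a WEKA ARFF string."""
    lines = [f"@RELATION {relation}", ""]
    for name in feature_names:
        lines.append(f"@ATTRIBUTE {name} NUMERIC")
    # The class attribute is nominal: its allowed values are listed in braces.
    lines.append("@ATTRIBUTE class {" + ",".join(classes) + "}")
    lines += ["", "@DATA"]
    for row, label in zip(rows, labels):
        lines.append(",".join(str(v) for v in row) + "," + label)
    return "\n".join(lines) + "\n"


arff = to_arff("worm_traffic", ["syn_count", "dst_ips"], [[2, 1]], ["worm"], ["normal", "worm"])
```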
VI. Experimental Evaluation
6.1 Parameter Evaluation
The performance of each classification model is measured and compared using two detection measures, True Positive and False Alarm, defined as follows:
• True Positive: the classifier classifies the input data correctly.
• False Alarm: the classifier misclassifies normal input data and reports it as anomalous.
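The measures above can be computed from the actual and predicted labels of the test set, as sketched below. The overall detection rate is taken as the fraction of all test records classified correctly, and the per-class false-alarm rate as the fraction of records from other classes wrongly assigned to that class (the per-class FP rate WEKA reports); both are assumptions about how the tables were aggregated, and the labels are hypothetical.

```python
def evaluate(actual, predicted, classes):
    """Overall detection rate plus per-class (true-positive rate, false-alarm rate)."""
    pairs = list(zip(actual, predicted))
    detection_rate = sum(a == p for a, p in pairs) / len(pairs)
    per_class = {}
    for c in classes:
        pos = [p for a, p in pairs if a == c]   # records that truly belong to c
        neg = [p for a, p in pairs if a != c]   # records from every other class
        tp_rate = sum(p == c for p in pos) / len(pos) if pos else 0.0
        fa_rate = sum(p == c for p in neg) / len(neg) if neg else 0.0
        per_class[c] = (tp_rate, fa_rate)
    return detection_rate, per_class


rate, per_class = evaluate(["normal", "normal", "worm", "worm"],
                           ["normal", "worm", "worm", "worm"],
                           ["normal", "worm"])
```

Here one normal record is misclassified as a worm, which lowers the overall detection rate to 75% and gives the worm class a 50% false-alarm rate.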
6.2 Experimental Results
For our experiment, we report classification results in terms of detection rate and false-alarm rate, evaluating the three data mining techniques one by one. As Table I shows, with our 13-feature input data, each technique classifies normal Internet data, UDP flood, Internet worm, HTTP flood and Port Scan attacks with a detection rate of at least 97.8%. Specifically, the Decision Tree, Random Forest and Bayesian Network techniques give detection rates of 99.4%, 99.6% and 97.8%, respectively. The Bayesian Network has the lowest true-positive rate for worm detection, at 91.6%, while UDP flood detection is perfect, with a 100% true-positive rate. As Table II shows, the Random Forest, Decision Tree and Bayesian Network models detect and classify Internet worms with false-alarm rates of 0.3%, 0.2% and 1.9%, respectively. All three techniques classify the network attacks (HTTP flood, UDP flood and Port Scan) with a false-alarm rate of zero.
Table I. Detection Rate and True-Positive Rate
Model                 Detection    True Positive (%)
                      Rate (%)     Normal   Worm   UDP Flood   HTTP Flood   Port Scan
Bayesian Network      97.8         98.2     91.6   100.0       98.0         99.8
C4.5 Decision Tree    99.4         99.6     99.0   100.0       98.2         99.8
Random Forest         99.6         99.7     99.2   100.0       98.8         99.8
Table II. False Alarm Rate
Model                 False Alarm (%)
                      Worm   UDP Flood   HTTP Flood   Port Scan
Bayesian Network      1.9    0.0         0.0          0.0
C4.5 Decision Tree    0.2    0.0         0.0          0.0
Random Forest         0.3    0.0         0.0          0.0
Figure 2 shows the worm detection true-positive and false-alarm rates of each classification technique, taken from Tables I and II.
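Fig. 2. Performance of Worm Detection Model (true-positive and false-positive rates, 0 to 100%, for Bayesian Network, Decision Tree and Random Forest)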
VII. Conclusion
In this paper, our worm detection model consists of preprocessing and classification techniques, with a preprocessing method that extracts 13 features from network packets. Three data mining algorithms, Random Forest, Bayesian Network and Decision Tree, are evaluated on classifying the behavior of normal network data, UDP flood, HTTP flood, Blaster worm and Port Scan. Because most Internet worms behave similarly to Port Scan and DoS attacks, the proposed model not only detects Internet worms efficiently but also classifies attack types such as HTTP flood, UDP flood and Port Scan with a low false-alarm rate and a high detection rate. The Bayesian Network is the weakest of the three for worms: its worm true-positive rate is only 91.6%, and its worm false-alarm rate of 1.9% is too high for practical use. In contrast, the Random Forest and Decision Tree algorithms detect Internet worms and classify DoS and Port Scan attacks with detection rates over 99% and false-alarm rates close to zero.
References
[1]. N. Weaver, V. Paxson, S. Staniford and R. Cunningham, “Taxonomy of Computer Worms,” Proc. of the ACM Workshop on Rapid Malcode (WORM '03), 2003, pp. 11-18.
[2]. C. Smith, A. Matrawy, S. Chow and B. Abdelaziz, “Computer Worms: Architecture, Evasion Strategies, and Detection Mechanisms,” Journal of Information Assurance and Security, 2009, pp. 69-83.
[3]. M. M. Rasheed, N. M. Norwawi, O. Ghazali and M. M. Kadhum, “Intelligent Failure Connection Algorithm for Detecting Internet Worms,” International Journal of Computer Science and Network Security, Vol. 9, No. 5, 2009, pp. 280-285.
[4]. D. R. Ellis, J. G. Aiken, K. S. Attwood and S. D. Tenaglia, “A Behavioral Approach to Worm Detection,” Proc. of the 2004 ACM Workshop on Rapid Malcode, 2004, pp. 43-53.
[5]. S. A. Khayam, H. Radha and D. Loguinov, “Worm Detection at Network Endpoints Using Information-Theoretic Traffic Perturbations,” IEEE International Conference on Communications (ICC), 2008, pp. 1561-1565.
[6]. M. Siddiqui, M. C. Wang and J. Lee, “Detecting Internet Worms Using Data Mining Techniques,” Cybernetics and Information Technologies, Systems and Applications (CITSA), 2008.
[7]. Weka 3.7.0 tools [Online]. Available: www.cs.waikato.ac.nz/ml/weka/ [2009, July 2].
[8]. N. Pater, “Enhancing Random Forest Implementation in Weka,” Machine Learning Conference Paper for ECE591Q, 2005.
[9]. D. Heckerman, “A Tutorial on Learning with Bayesian Networks,” Microsoft Research, Advanced Technology Division, Microsoft Corporation, 1996.
[10]. Classification via Trees in WEKA [Online]. Available: http:/maya.cs.depaul.edu/~classses/ect584/WEKA/classify.html
[11]. T. S. Barhoom and H. A. Qeshta, “Adaptive Worm Detection Model Based on Multi Classifiers,” 2013 Palestinian International Conference on Information and Communication Technology, 2013, 978-0-7695-4984-2/13.

More Related Content

What's hot (19)

PDF
An approach for ids by combining svm and ant colony algorithm
eSAT Journals
 
PDF
IJAEIT 20
Jackson Christian
 
PDF
An analysis of Network Intrusion Detection System using SNORT
ijsrd.com
 
PDF
Current issues - International Journal of Network Security & Its Applications...
IJNSA Journal
 
PDF
Low Priced And Efficient Energy Replica Detection In WSN
IRJET Journal
 
PDF
L018118083.new ramya publication (1)
IOSR Journals
 
PDF
A NOVEL HEADER MATCHING ALGORITHM FOR INTRUSION DETECTION SYSTEMS
IJNSA Journal
 
PDF
Icacci presentation-cnn intrusion
vinaykumar R
 
PDF
IRJET- Review on Intrusion Detection System using Recurrent Neural Network wi...
IRJET Journal
 
PDF
IRJET - Securing Computers from Remote Access Trojans using Deep Learning...
IRJET Journal
 
PDF
A Study on Data Mining Based Intrusion Detection System
AM Publications
 
PDF
Ant Colony Optimization for Wireless Sensor Network: A Review
iosrjce
 
PDF
Utilizing Data Mining Approches in the Detection of Intrusion in IPv6 Network...
IDES Editor
 
PDF
Image Based Relational Database Watermarking: A Survey
iosrjce
 
PDF
Evaluation of network intrusion detection using markov chain
IJCI JOURNAL
 
PDF
A honeynet framework to promote enterprise network security
IAEME Publication
 
PDF
Review of Intrusion and Anomaly Detection Techniques
IJMER
 
PDF
Ijetr012045
ER Publication.org
 
An approach for ids by combining svm and ant colony algorithm
eSAT Journals
 
An analysis of Network Intrusion Detection System using SNORT
ijsrd.com
 
Current issues - International Journal of Network Security & Its Applications...
IJNSA Journal
 
Low Priced And Efficient Energy Replica Detection In WSN
IRJET Journal
 
L018118083.new ramya publication (1)
IOSR Journals
 
A NOVEL HEADER MATCHING ALGORITHM FOR INTRUSION DETECTION SYSTEMS
IJNSA Journal
 
Icacci presentation-cnn intrusion
vinaykumar R
 
IRJET- Review on Intrusion Detection System using Recurrent Neural Network wi...
IRJET Journal
 
IRJET - Securing Computers from Remote Access Trojans using Deep Learning...
IRJET Journal
 
A Study on Data Mining Based Intrusion Detection System
AM Publications
 
Ant Colony Optimization for Wireless Sensor Network: A Review
iosrjce
 
Utilizing Data Mining Approches in the Detection of Intrusion in IPv6 Network...
IDES Editor
 
Image Based Relational Database Watermarking: A Survey
iosrjce
 
Evaluation of network intrusion detection using markov chain
IJCI JOURNAL
 
A honeynet framework to promote enterprise network security
IAEME Publication
 
Review of Intrusion and Anomaly Detection Techniques
IJMER
 
Ijetr012045
ER Publication.org
 

Similar to Internet Worm Classification and Detection using Data Mining Techniques (20)

PDF
BOTNET DETECTION USING VARIOUS MACHINE LEARNING ALGORITHMS: A REVIEW
IRJET Journal
 
PPTX
Synopsis viva presentation
kirubavenkat
 
PDF
Bu24478485
IJERA Editor
 
PDF
Eh34803812
IJERA Editor
 
PPTX
2 dc meet new
kirubavenkat
 
PDF
Classification Rule Discovery Using Ant-Miner Algorithm: An Application Of N...
IJMER
 
PDF
Machine Learning Techniques Used for the Detection and Analysis of Modern Typ...
IRJET Journal
 
PDF
Intrusion detection with Parameterized Methods for Wireless Sensor Networks
rahulmonikasharma
 
PDF
Limiting Self-Propagating Malware Based on Connection Failure Behavior
csandit
 
PDF
2011 modeling and detection of camouflaging worm
deepikareddy123
 
PDF
2011 modeling and detection of camouflaging worm
deepikareddy123
 
PDF
A Dynamic Botnet Detection Model based on Behavior Analysis
idescitation
 
PDF
Botnet detection using ensemble classifiers of network flow
IJECEIAES
 
PDF
Limiting self propagating malware based
IJNSA Journal
 
PDF
Detection of malicious attacks by Meta classification algorithms
Eswar Publications
 
PDF
DETECTION OF PEER-TO-PEER BOTNETS USING GRAPH MINING
IJCNCJournal
 
PDF
Detection of Peer-to-Peer Botnets using Graph Mining
IJCNCJournal
 
PPT
CISC 879 - Machine Learning for Solving Systems Problems
butest
 
PDF
IRJET- Machine Learning based Network Security
IRJET Journal
 
PDF
Balasaraswathi2017 article feature_selectiontechniquesfori
boloKiKa
 
BOTNET DETECTION USING VARIOUS MACHINE LEARNING ALGORITHMS: A REVIEW
IRJET Journal
 
Synopsis viva presentation
kirubavenkat
 
Bu24478485
IJERA Editor
 
Eh34803812
IJERA Editor
 
2 dc meet new
kirubavenkat
 
Classification Rule Discovery Using Ant-Miner Algorithm: An Application Of N...
IJMER
 
Machine Learning Techniques Used for the Detection and Analysis of Modern Typ...
IRJET Journal
 
Intrusion detection with Parameterized Methods for Wireless Sensor Networks
rahulmonikasharma
 
Limiting Self-Propagating Malware Based on Connection Failure Behavior
csandit
 
2011 modeling and detection of camouflaging worm
deepikareddy123
 
2011 modeling and detection of camouflaging worm
deepikareddy123
 
A Dynamic Botnet Detection Model based on Behavior Analysis
idescitation
 
Botnet detection using ensemble classifiers of network flow
IJECEIAES
 
Limiting self propagating malware based
IJNSA Journal
 
Detection of malicious attacks by Meta classification algorithms
Eswar Publications
 
DETECTION OF PEER-TO-PEER BOTNETS USING GRAPH MINING
IJCNCJournal
 
Detection of Peer-to-Peer Botnets using Graph Mining
IJCNCJournal
 
CISC 879 - Machine Learning for Solving Systems Problems
butest
 
IRJET- Machine Learning based Network Security
IRJET Journal
 
Balasaraswathi2017 article feature_selectiontechniquesfori
boloKiKa
 
Ad

More from iosrjce (20)

PDF
An Examination of Effectuation Dimension as Financing Practice of Small and M...
iosrjce
 
PDF
Does Goods and Services Tax (GST) Leads to Indian Economic Development?
iosrjce
 
PDF
Childhood Factors that influence success in later life
iosrjce
 
PDF
Emotional Intelligence and Work Performance Relationship: A Study on Sales Pe...
iosrjce
 
PDF
Customer’s Acceptance of Internet Banking in Dubai
iosrjce
 
PDF
A Study of Employee Satisfaction relating to Job Security & Working Hours amo...
iosrjce
 
PDF
Consumer Perspectives on Brand Preference: A Choice Based Model Approach
iosrjce
 
PDF
Student`S Approach towards Social Network Sites
iosrjce
 
PDF
Broadcast Management in Nigeria: The systems approach as an imperative
iosrjce
 
PDF
A Study on Retailer’s Perception on Soya Products with Special Reference to T...
iosrjce
 
PDF
A Study Factors Influence on Organisation Citizenship Behaviour in Corporate ...
iosrjce
 
PDF
Consumers’ Behaviour on Sony Xperia: A Case Study on Bangladesh
iosrjce
 
PDF
Design of a Balanced Scorecard on Nonprofit Organizations (Study on Yayasan P...
iosrjce
 
PDF
Public Sector Reforms and Outsourcing Services in Nigeria: An Empirical Evalu...
iosrjce
 
PDF
Media Innovations and its Impact on Brand awareness & Consideration
iosrjce
 
PDF
Customer experience in supermarkets and hypermarkets – A comparative study
iosrjce
 
PDF
Social Media and Small Businesses: A Combinational Strategic Approach under t...
iosrjce
 
PDF
Secretarial Performance and the Gender Question (A Study of Selected Tertiary...
iosrjce
 
PDF
Implementation of Quality Management principles at Zimbabwe Open University (...
iosrjce
 
PDF
Organizational Conflicts Management In Selected Organizaions In Lagos State, ...
iosrjce
 
An Examination of Effectuation Dimension as Financing Practice of Small and M...
iosrjce
 
Does Goods and Services Tax (GST) Leads to Indian Economic Development?
iosrjce
 
Childhood Factors that influence success in later life
iosrjce
 
Emotional Intelligence and Work Performance Relationship: A Study on Sales Pe...
iosrjce
 
Customer’s Acceptance of Internet Banking in Dubai
iosrjce
 
A Study of Employee Satisfaction relating to Job Security & Working Hours amo...
iosrjce
 
Consumer Perspectives on Brand Preference: A Choice Based Model Approach
iosrjce
 
Student`S Approach towards Social Network Sites
iosrjce
 
Broadcast Management in Nigeria: The systems approach as an imperative
iosrjce
 
A Study on Retailer’s Perception on Soya Products with Special Reference to T...
iosrjce
 
A Study Factors Influence on Organisation Citizenship Behaviour in Corporate ...
iosrjce
 
Consumers’ Behaviour on Sony Xperia: A Case Study on Bangladesh
iosrjce
 
Design of a Balanced Scorecard on Nonprofit Organizations (Study on Yayasan P...
iosrjce
 
Public Sector Reforms and Outsourcing Services in Nigeria: An Empirical Evalu...
iosrjce
 
Media Innovations and its Impact on Brand awareness & Consideration
iosrjce
 
Customer experience in supermarkets and hypermarkets – A comparative study
iosrjce
 
Social Media and Small Businesses: A Combinational Strategic Approach under t...
iosrjce
 
Secretarial Performance and the Gender Question (A Study of Selected Tertiary...
iosrjce
 
Implementation of Quality Management principles at Zimbabwe Open University (...
iosrjce
 
Organizational Conflicts Management In Selected Organizaions In Lagos State, ...
iosrjce
 
Ad

Recently uploaded (20)

PPTX
Comparison of Flexible and Rigid Pavements in Bangladesh
Arifur Rahman
 
PDF
Designing for Tomorrow – Architecture’s Role in the Sustainability Movement
BIM Services
 
PDF
Rapid Prototyping for XR: Lecture 1 Introduction to Prototyping
Mark Billinghurst
 
PDF
Rapid Prototyping for XR: Lecture 2 - Low Fidelity Prototyping.
Mark Billinghurst
 
PDF
Rapid Prototyping for XR: Lecture 3 - Video and Paper Prototyping
Mark Billinghurst
 
PDF
Rapid Prototyping for XR: Lecture 4 - High Level Prototyping.
Mark Billinghurst
 
PPTX
How to Un-Obsolete Your Legacy Keypad Design
Epec Engineered Technologies
 
PDF
Rapid Prototyping for XR: Lecture 6 - AI for Prototyping and Research Directi...
Mark Billinghurst
 
PPTX
Functions in Python Programming Language
BeulahS2
 
PDF
輪読会資料_Miipher and Miipher2 .
NABLAS株式会社
 
PPTX
CST413 KTU S7 CSE Machine Learning Clustering K Means Hierarchical Agglomerat...
resming1
 
PPTX
Tesla-Stock-Analysis-and-Forecast.pptx (1).pptx
moonsony54
 
PPT
دراسة حاله لقرية تقع في جنوب غرب السودان
محمد قصص فتوتة
 
PPTX
Bharatiya Antariksh Hackathon 2025 Idea Submission PPT.pptx
AsadShad4
 
PDF
Decision support system in machine learning models for a face recognition-bas...
TELKOMNIKA JOURNAL
 
PPSX
OOPS Concepts in Python and Exception Handling
Dr. A. B. Shinde
 
PPTX
Mobile database systems 20254545645.pptx
herosh1968
 
PPTX
Introduction to Python Programming Language
merlinjohnsy
 
PPTX
Bharatiya Antariksh Hackathon 2025 Idea Submission PPT.pptx
AsadShad4
 
PDF
CLIP_Internals_and_Architecture.pdf sdvsdv sdv
JoseLuisCahuanaRamos3
 
Comparison of Flexible and Rigid Pavements in Bangladesh
Arifur Rahman
 
Designing for Tomorrow – Architecture’s Role in the Sustainability Movement
BIM Services
 
Rapid Prototyping for XR: Lecture 1 Introduction to Prototyping
Mark Billinghurst
 
Rapid Prototyping for XR: Lecture 2 - Low Fidelity Prototyping.
Mark Billinghurst
 
Rapid Prototyping for XR: Lecture 3 - Video and Paper Prototyping
Mark Billinghurst
 
Rapid Prototyping for XR: Lecture 4 - High Level Prototyping.
Mark Billinghurst
 
How to Un-Obsolete Your Legacy Keypad Design
Epec Engineered Technologies
 
Rapid Prototyping for XR: Lecture 6 - AI for Prototyping and Research Directi...
Mark Billinghurst
 
Functions in Python Programming Language
BeulahS2
 
輪読会資料_Miipher and Miipher2 .
NABLAS株式会社
 
CST413 KTU S7 CSE Machine Learning Clustering K Means Hierarchical Agglomerat...
resming1
 
Tesla-Stock-Analysis-and-Forecast.pptx (1).pptx
moonsony54
 
دراسة حاله لقرية تقع في جنوب غرب السودان
محمد قصص فتوتة
 
Bharatiya Antariksh Hackathon 2025 Idea Submission PPT.pptx
AsadShad4
 
Decision support system in machine learning models for a face recognition-bas...
TELKOMNIKA JOURNAL
 
OOPS Concepts in Python and Exception Handling
Dr. A. B. Shinde
 
Mobile database systems 20254545645.pptx
herosh1968
 
Introduction to Python Programming Language
merlinjohnsy
 
Bharatiya Antariksh Hackathon 2025 Idea Submission PPT.pptx
AsadShad4
 
CLIP_Internals_and_Architecture.pdf sdvsdv sdv
JoseLuisCahuanaRamos3
 

Internet Worm Classification and Detection using Data Mining Techniques

  • 1. IOSR Journal of Computer Engineering (IOSR-JCE) e-ISSN: 2278-0661,p-ISSN: 2278-8727, Volume 17, Issue 3, Ver. 1 (May – Jun. 2015), PP 76-81 www.iosrjournals.org DOI: 10.9790/0661-17317681 www.iosrjournals.org 76 | Page Internet Worm Classification and Detection using Data Mining Techniques Dipali Kharche1 , Anuradha Thakare2 1 (Department of Computer Engg., PCCOE, Savitribai Phule Pune University, Pune India) 2 (Department of Computer Engg., PCCOE, Savitribai Phule Pune University, Pune India) Abstract: Internet worm means separate malware computer programs that repeated itself and in order to spread one computer to another computer. Malware includes computer viruses, worms, root kits, key loggers, Trojan horse, and dialers, adware, malicious, spyware, rogue security software and other malicious programs. It is programmed by attackers to interrupt computer process, gatherDelicate Information, or gain entry to private computer systems. We need to detect a worm on the internet, because it may create network vulnerabilities and also it will reduce the system performance. We can detect the various types of Internet worm the worm like, Port scan worm, Udp worm, http worm, User to Root Worm and Remote to Local Worm. In existing process it is not easy to detect the worm, there is difficult to detect the worm process. In our proposed systems, internet worm is a critical threat in computer networks. Internet worm is fast spreading and self propagating. We need to detect the worm and classify the worm using data mining algorithms. For use data mining, machine learning algorithm like Random Forest, Decision Tree, Bayesian Network we can effectively classify the worm in internet. Keywords: Bayesian Network, Classification,Data Mining, Decision Tree, Random Forest, Worm Detection. I. Introduction Internet worm is a critical threat in computer networks. Internet worm is self propagating, and fast scattering. 
When the first Internet worm [1] was released, more than a hundred hosts were infected. Since then, the threat of Internet worms has kept increasing and causing more harm to network systems. Many methods for Internet worm detection have been proposed, and most of them are based on intrusion detection systems (IDS) [2]. Automatic detection is challenging because it is hard to predict what form the next worm will take; at the same time, automatic detection and response are becoming imperative because a newly released worm can infect many hosts in a matter of seconds.

IDS-based worm detection can be divided into two categories: network-based and host-based. Network-based worm detection inspects network packets before they reach an end host, whereas host-based detection inspects packets that have already reached the end host. Host-based detection also analyzes decoded network packets so that the signature of the worm can be matched. When we focus on network packets without decoding them, we must instead analyze the behavior of the traffic in the network.

Many different machine learning techniques have been used in intrusion detection in general and in worm detection in particular. Data mining plays an essential role in worm detection systems, and several models built with different data mining techniques have been proposed to detect worms. In this paper, we propose a new method for network-based Internet worm detection. We preprocess network packet data by extracting a fixed set of features from abnormal and normal traffic, and we use three different data mining algorithms to classify the data. Our model detects Internet worms with a detection rate of nearly 99.6% and a false-alarm rate close to zero.

The paper is structured as follows. Section II reviews related methods of Internet worm detection.
Section III describes the irregular behaviors and patterns of attack and worm traffic in the network. Sections IV and V present the classification algorithms used and our proposed model, respectively. Sections VI and VII give the experimental results and the conclusion.

II. Related Work
Several studies in recent years have based worm detection on data mining as an efficient way to increase network security, and classification techniques performed best in many of them. Some data mining algorithms are effective at classifying the behavior of Internet worms. For example, one approach [6] classified Internet worms by mining their features from clean and infected platforms; the authors built a data mining model, trained it on these behaviors, and reported worm detection with high overall accuracy and a low false-positive rate.

A method in [3] uses connection-failure behavior to detect Internet worms. It considers how normal connections differ from worm connections: worm connections are expected to produce a high number of failed connections. A connection fails when a source IP sends a packet to an unused IP address or to a port that is no longer in service; SYN/ACK, ICMP and TCP RESET packets are then returned, so the number of these packets will be high [4].

Another method [5] categorizes alarms by the source-destination ports that worms use to spread themselves. It uses Kullback-Leibler (K-L) divergence to identify features of abnormal activity and a Support Vector Machine (SVM) to classify these activities, obtaining a 90% detection rate at all endpoints with a false-alarm rate near zero.

We focus on network-based Internet worm detection. We preprocess raw network packets before they reach an end user and consider the association of source-destination IP addresses, the association of source-destination ports, and the number of certain abnormal packets that occur when a user generates worm traffic. We use three kinds of data mining algorithms, Bayesian Network, Decision Tree and Random Forest, to classify data as worm, normal, or network attack data (i.e., DoS and Port Scan).

III. Attack and Worm Characteristics
In this paper, we consider the Blaster worm, one of the well-known public worms. Most worms behave similarly to Port Scan and Denial of Service (DoS) attacks.
Thus, our method classifies and detects Blaster worm, Port Scan and DoS attack behavior. We consider UDP flood and HTTP flood as DoS attacks. Details of each data type are given below.
• Blaster worm exploits a buffer overflow vulnerability in DCOM RPC on Windows platforms, spreading via TCP ports 135 and 4444 and UDP port 69. The worm can transfer and run itself. It then launches a DoS attack to hinder patch updates by creating a SYN flood to port 80.
• UDP flood is a type of DoS attack that sends a large number of UDP packets to a target host or network system, consuming bandwidth.
• HTTP flood is also a DoS attack, analogous to the UDP flood: it sends a large number of useless packets to a target to consume bandwidth on a Web server.
• Port Scan is a procedure that probes a host for accessible ports or for services running on any port.

IV. Classification Algorithms
4.1 C4.5 Decision Tree [10]: C4.5 is a well-known data mining algorithm that classifies a dataset using the nodes of a tree, which it builds with a divide-and-conquer procedure. A decision tree is prone to over-fitting on large datasets. The decision tree classification model is created by extracting rules from the training set; these rules are then used to predict and classify a new, unseen dataset called the testing set. The tree finds the class of a record by starting at the root and traversing to a leaf node, where the prediction is found. C4.5 is widely used and efficient for classification.
4.2 Random Forest [8]: Random Forest is an effective data mining algorithm because it can address over-fitting on large datasets and can train and test rapidly on large, complex datasets.
Each tree is built from a random sample of the training dataset drawn with replacement; the sampled records are used for training, and the records left out of the sample are used for testing or error estimation. The model can estimate the importance of the features used in classification, and its trees are grown as unpruned rules from the training dataset. A Random Forest model contains many classification trees; each tree is distinct and votes for a class, and the final class is the one with the most votes.
4.3 Bayesian Network [9]: A Bayesian network is a probabilistic graphical model. It consists of nodes (variables) with probabilistic relations to one another, and it learns these relations from the training dataset in order to classify or predict unknown cases. Moreover, it can avoid over-fitting on large data.
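To make the bootstrap-and-vote mechanism of Section 4.2 concrete, here is a minimal Python sketch. It is not WEKA's RandomForest and not the authors' code: each "tree" is a hypothetical one-split decision stump (a toy stand-in for the unpruned trees described above) trained on its own bootstrap sample and one randomly chosen feature. The helper names and the two-feature toy data are assumptions for illustration.

```python
import random
from collections import Counter


def bootstrap_indices(n, rng):
    """Sample n row indices with replacement; also return the out-of-bag indices."""
    in_bag = [rng.randrange(n) for _ in range(n)]
    out_of_bag = sorted(set(range(n)) - set(in_bag))  # never drawn: held out for testing
    return in_bag, out_of_bag


def majority_vote(predictions):
    """Most frequent class label; ties go to the label seen first."""
    return Counter(predictions).most_common(1)[0][0]


def train_stump(rows, labels, feature):
    """Toy stand-in for an unpruned tree: one threshold split on one feature."""
    best = None
    for threshold in sorted({row[feature] for row in rows}):
        left = [y for row, y in zip(rows, labels) if row[feature] <= threshold]
        right = [y for row, y in zip(rows, labels) if row[feature] > threshold]
        left_cls = majority_vote(left)
        right_cls = majority_vote(right) if right else left_cls
        errors = sum(y != left_cls for y in left) + sum(y != right_cls for y in right)
        if best is None or errors < best[0]:
            best = (errors, threshold, left_cls, right_cls)
    _, threshold, left_cls, right_cls = best
    return lambda row: left_cls if row[feature] <= threshold else right_cls


def train_forest(rows, labels, n_trees=15, seed=0):
    """Each tree sees its own bootstrap sample and one randomly chosen feature."""
    rng = random.Random(seed)
    trees = []
    for _ in range(n_trees):
        in_bag, _oob = bootstrap_indices(len(rows), rng)
        feature = rng.randrange(len(rows[0]))
        trees.append(train_stump([rows[i] for i in in_bag],
                                 [labels[i] for i in in_bag], feature))
    return trees


def predict(trees, row):
    """Every tree votes; the class with the most votes is assigned."""
    return majority_vote([tree(row) for tree in trees])
```

For example, `train_forest([[i, i] for i in range(10)] + [[100 + i, 100 + i] for i in range(10)], ["normal"] * 10 + ["worm"] * 10)` yields 15 stumps whose majority vote separates a low-valued record from a high-valued one. A real Random Forest grows full unpruned trees and draws a random feature subset at every split rather than once per tree.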
4.4 Information Gain: Information gain is a feature-selection measure. It computes an entropy-based score for each attribute, and this score serves as the attribute's rank. The rank of a feature represents its importance, that is, its association with the class used to label the data, so a feature with a relatively high rank is one of the most important features for classification.

V. Proposed Model
5.1 Overview
Our worm detection model is divided into a preprocessing part and a classification part, as shown in Figure 1. In the preprocessing part, we release the actual Blaster worm, obtained from a reliable online source, into a local area network (LAN). At the same time, we also generate UDP flood, HTTP flood and Port Scan attacks in the LAN.
Fig. 1. Worm Detection Model
We then capture raw network packets from the LAN and select only those packet-header features that are important and necessary to predict or classify the data. The preprocessing and feature selection technique is described in detail in Section 5.2. After preprocessing, the resulting dataset is split into two parts, one for training and the other for testing. In the classification part, data mining algorithms classify the features of Worm, HTTP flood, UDP flood, Port Scan and Normal network behavior, as discussed in more detail in Section 5.3.

5.2 Preprocessing Part
The packets sent by each source IP address within one second form one record, and each record has 13 features extracted from all of those packets. The features are:
• Number of individual source IP addresses in 1 second
• Number of destination IP addresses
• Number of packets with a TCP header
• Number of packets with an ICMP header
• Number of packets with a UDP header
• Number of SYN (Synchronization) flags set (bit 1)
• Number of ACK (Acknowledgement) flags set (bit 1)
• Number of RST flags set (bit 1)
• Total number of source ports
• Total number of destination ports
• Number of different packet sizes
• Port ratio: the number of source ports divided by the number of destination ports
• SYN ratio: the number of SYN flags set (bit 1) divided by the number of destination IP addresses
Preprocessing is a major task in data mining. We import the worm detection dataset, extract one 13-feature record per source IP address per second as described above, and finally split the records into a training dataset and a testing dataset, with the testing dataset half the size of the training set.

5.3 Classification Part
In this part, we first train the Random Forest, C4.5 Decision Tree and Bayesian Network classifiers in the WEKA tool [7] on the training dataset and then test them on a separate testing dataset.
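The 13-feature records these classifiers consume come from the preprocessing in Section 5.2. A minimal Python sketch of that extraction and of the train/test split follows. The `Packet` record and its field names are hypothetical (the paper does not give its packet format), and several features are interpreted as assumptions: the number of source IP addresses is taken as the number of distinct sources active in that second, the port "totals" and packet sizes as distinct counts, and a ratio is set to 0.0 when its denominator is zero (ICMP-only traffic has no ports).

```python
import random
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Packet:
    """Hypothetical parsed packet header; field names are illustrative only."""
    ts: float                       # capture time in seconds
    src_ip: str
    dst_ip: str
    proto: str                      # "TCP", "UDP" or "ICMP"
    src_port: Optional[int] = None  # None for ICMP, which has no ports
    dst_port: Optional[int] = None
    size: int = 0
    flags: frozenset = frozenset()  # TCP flags set to 1, e.g. {"SYN"}


def extract_features(packets):
    """Map (second, source IP) -> list of the 13 features from Section 5.2."""
    windows = defaultdict(list)
    for p in packets:
        windows[int(p.ts)].append(p)              # 1-second windows
    records = {}
    for second, pkts in windows.items():
        n_sources = len({p.src_ip for p in pkts})  # assumption: distinct sources this second
        by_src = defaultdict(list)
        for p in pkts:
            by_src[p.src_ip].append(p)
        for src, group in by_src.items():
            dst_ips = {p.dst_ip for p in group}
            src_ports = {p.src_port for p in group if p.src_port is not None}
            dst_ports = {p.dst_port for p in group if p.dst_port is not None}
            syn = sum("SYN" in p.flags for p in group)
            records[(second, src)] = [
                n_sources,
                len(dst_ips),
                sum(p.proto == "TCP" for p in group),
                sum(p.proto == "ICMP" for p in group),
                sum(p.proto == "UDP" for p in group),
                syn,
                sum("ACK" in p.flags for p in group),
                sum("RST" in p.flags for p in group),
                len(src_ports),
                len(dst_ports),
                len({p.size for p in group}),                           # distinct sizes
                len(src_ports) / len(dst_ports) if dst_ports else 0.0,  # port ratio
                syn / len(dst_ips),                                     # SYN ratio
            ]
    return records


def split_train_test(records, seed=0):
    """Shuffle and hold out one third, so the test set is half the training set's size."""
    rows = list(records)
    random.Random(seed).shuffle(rows)
    n_test = len(rows) // 3
    return rows[n_test:], rows[:n_test]
```

A Blaster-like source that sends SYN packets to port 135 on three hosts within one second, for instance, yields a SYN ratio of 1.0 and a port ratio of 3.0 under these assumptions.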
We test the models by classifying normal data, UDP flood, HTTP flood, Blaster worm and Port Scan traffic using the 13 features of the preprocessed dataset.

VI. Experimental Evaluation
6.1 Parameter Evaluation
The performance of each classification model is measured and compared using two detection rates, the True Positive rate and the False Alarm rate, defined as follows:
• True Positive: the model classifies the input data correctly.
• False Alarm: the model misclassifies normal input data and reports it as anomalous.

6.2 Experimental Results
We report classification results in terms of detection rate and false-alarm rate, evaluating the three data mining techniques one by one. As Table I shows, with our 13-feature input data each technique classifies normal Internet traffic, UDP flood, Internet worm, HTTP flood and Port Scan attacks with a detection rate of at least 97.8%. In particular, the Decision Tree, Random Forest and Bayesian Network give detection rates of 99.4%, 99.6% and 97.8%, respectively. The Bayesian Network has the lowest true-positive rate for worm detection, 91.6%, while UDP flood detection is perfect, with a 100% true-positive rate for all three techniques. As Table II shows, the Random Forest, Decision Tree and Bayesian Network models detect and classify Internet worms with false-alarm rates of 0.3%, 0.2% and 1.9%, respectively. For the network attacks (HTTP flood, UDP flood and Port Scan), every technique gives a false-alarm rate of zero.
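The rates reported in Tables I and II can be computed from lists of true and predicted labels. The following Python sketch assumes (the paper does not state these formulas) that the overall Detection Rate is the share of all test records classified correctly, that a class's True Positive rate is the share of that class's records classified correctly, and that a class's False Alarm rate is the usual one-vs-rest false-positive rate, i.e. the share of records from other classes wrongly labelled as that class. The function name `per_class_rates` is hypothetical.

```python
from collections import Counter


def per_class_rates(y_true, y_pred):
    """Return (detection rate %, {class: (true-positive %, false-alarm %)})."""
    pairs = Counter(zip(y_true, y_pred))      # (actual, predicted) -> count
    classes = sorted(set(y_true) | set(y_pred))
    total = len(y_true)
    detection_rate = 100.0 * sum(n for (t, p), n in pairs.items() if t == p) / total
    rates = {}
    for c in classes:
        tp = pairs[(c, c)]
        actual_c = sum(n for (t, _), n in pairs.items() if t == c)
        # Records of other classes that the model wrongly labelled as c.
        fp = sum(n for (t, p), n in pairs.items() if p == c and t != c)
        negatives = total - actual_c
        tp_rate = 100.0 * tp / actual_c if actual_c else 0.0
        fp_rate = 100.0 * fp / negatives if negatives else 0.0
        rates[c] = (tp_rate, fp_rate)
    return detection_rate, rates
```

Under this reading, a worm false-alarm rate of 1.9% means that 1.9% of the non-worm test records were labelled as worm.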
Table I. Detection Rate and True Positive Rate

| Model              | Detection Rate (%) | TP Normal (%) | TP Worm (%) | TP UDP Flood (%) | TP HTTP Flood (%) | TP Port Scan (%) |
|--------------------|--------------------|---------------|-------------|------------------|-------------------|------------------|
| Bayesian Network   | 97.8               | 98.2          | 91.6        | 100.0            | 98.0              | 99.8             |
| C4.5 Decision Tree | 99.4               | 99.6          | 99.0        | 100.0            | 98.2              | 99.8             |
| Random Forest      | 99.6               | 99.7          | 99.2        | 100.0            | 98.8              | 99.8             |

Table II. False Alarm Rate

| Model              | Worm (%) | UDP Flood (%) | HTTP Flood (%) | Port Scan (%) |
|--------------------|----------|---------------|----------------|---------------|
| Bayesian Network   | 1.9      | 0.0           | 0.0            | 0.0           |
| C4.5 Decision Tree | 0.2      | 0.0           | 0.0            | 0.0           |
| Random Forest      | 0.3      | 0.0           | 0.0            | 0.0           |

The true-positive and false-alarm rates of the classification techniques for worm detection, taken from Tables I and II, are shown in Figure 2.
Fig. 2. Performance of Worm Detection Model (bar chart; y-axis 0-100; series: True Positive, False Positive; groups: Bayesian Network, Decision Tree, Random Forest)

VII. Conclusion
In this paper, our worm detection model consists of preprocessing and classification. The proposed model uses a preprocessing method that extracts 13 features from network packets. Three data mining algorithms, Random Forest, Bayesian Network and Decision Tree, are evaluated for classifying Normal network data, UDP flood, HTTP flood, Blaster worm and Port Scan behavior. Most Internet worms behave similarly to Port Scan and DoS attacks, so the proposed model not only detects Internet worms efficiently but also classifies attack types such as HTTP flood, UDP flood and Port Scan with a low false-alarm rate and a high detection rate. However, the Bayesian Network classifies Internet worms with a true-positive rate of only 91.6%, below 99%, and a false-alarm rate of 1.9%, which is very high for practical use. In contrast, the Random Forest and Decision Tree algorithms detect Internet worms and classify DoS and Port Scan attacks with detection rates over 99% and false-alarm rates close to zero.

References
[1] N. Weaver, V. Paxson, S. Staniford and R. Cunningham, "Taxonomy of computer worms," Proc. of the ACM Workshop on Rapid Malcode (WORM '03), 2003, pp. 11-18.
[2] C. Smith, A. Matrawy, S. Chow and B. Abdelaziz, "Computer Worms: Architecture, Evasion Strategies, and Detection Mechanisms," J. of Information Assurance and Security, 2009, pp. 69-83.
[3] M. M. Rasheed, N. M. Norwawi, O. Ghazali and M. M. Kadhum, "Intelligent Failure Connection Algorithm for Detecting Internet Worms," International Journal of Computer Science and Network Security, Vol. 9, No. 5, 2009, pp. 280-285.
[4] D. R. Ellis, J. G. Aiken, K. S. Attwood and S. D. Tenaglia, "A Behavioral Approach to Worm Detection," Proc. of the 2004 ACM Workshop on Rapid Malcode, 2004, pp. 43-53.
[5] S. A. Khayam, H. Radha and D. Loguinov, "Worm Detection at Network Endpoints Using Information-Theoretic Traffic Perturbations," IEEE International Conference on Communications (ICC), 2008, pp. 1561-1565.
[6] M. Siddiqui, M. C. Wang and J. Lee, "Detecting Internet Worms Using Data Mining Techniques," Cybernetics and Information Technologies, Systems and Applications (CITSA), 2008.
[7] Weka 3.7.0 tools [Online]. Available: www.cs.waikato.ac.nz/ml/weka/ [2009, July 2]
[8] N. Pater, "Enhancing Random Forest Implementation in Weka," Machine Learning Conference Paper for ECE591Q, 2005.
[9] D. Heckerman, "A Tutorial on Learning with Bayesian Networks," Microsoft Research, Advanced Technology Division, Microsoft Corporation, 1996.
[10] Classification via Trees in WEKA [Online]. Available: http://maya.cs.depaul.edu/~classses/ect584/WEKA/classify.html
[11] T. S. Barhoom and H. A. Qeshta, "Adaptive Worm Detection Model Based on Multi Classifiers," 2013 Palestinian International Conference on Information and Communication Technology, 978-0-7695-4984-2/13, 2013.