International Journal of Control Theory and Computer Modelling (IJCTCM) Vol.2, No.1, January 2012
DOI: 10.5121/ijctcm.2012.2101
An Empirical Comparison and Feature Reduction
Performance Analysis of Intrusion Detection
Upendra¹ and Yogendra Kumar Jain²
¹ Research Scholar (M.Tech), Department of Computer Science & Engineering, Samrat Ashok Technological Institute, Vidisha, M.P., India
upendra.chaurasiya@gmail.com
² Head of Department, Computer Science & Engineering, Samrat Ashok Technological Institute, Vidisha, M.P., India
ABSTRACT
This paper reports an empirical evaluation of five machine learning algorithms, J48, BayesNet, OneR, Naive Bayes (NB) and ZeroR, using ten performance criteria: accuracy, precision, recall, F-measure, incorrectly classified instances, kappa statistic, mean absolute error, root mean squared error, relative absolute error and root relative squared error. The aim of this paper is to find out which classifier performs best for an intrusion detection system (IDS); machine learning is one of the methods used in intrusion detection. Based on this study, it can be concluded that the J48 decision tree is the most suitable of the five algorithms. We also compare the performance of the IDS classifiers after reducing the feature set to seven attributes using information gain.
KEYWORDS
Intrusion Detection, Machine Learning, BayesNet, C4.5, NB, ZeroR, KDD 99
1. INTRODUCTION
Empirical studies indicate that feature reduction techniques are capable of reducing the size of a dataset [28]. Recently, research on machine learning for intrusion detection has attracted much attention in the computational intelligence community. In intrusion detection, immense volumes of audit data must be analyzed in order to construct new detection rules for the increasing number of novel attacks in high-speed networks. An intrusion detection algorithm should consider the composite properties of attack behaviors to improve detection speed and detection accuracy. Because of the large volume of network data and the demand for better detection accuracy, intrusion detection has become an important research field for machine learning. In this work we present the J48 decision tree algorithm for intrusion detection based on machine learning. An Intrusion Detection System (IDS) monitors the events occurring in a computer system or network and analyzes them for signs of possible incidents. The IDS was first introduced in 1980 by James P. Anderson [3] and then improved by D. Denning [4] in 1987.
Various paradigms, namely Support Vector Machines [30], neural networks [31] and K-means based clustering [32], have been applied to intrusion detection because they have the advantage of discovering useful knowledge that describes a user's or program's behavior. There are two basic approaches to intrusion detection, i.e. anomaly detection and misuse detection (signature-based ID) [17]. Anomaly detection is based on the assumption that attacker behavior differs from normal user behavior [1]. In this paper, we present the application of machine learning to intrusion detection. We analyse five learning algorithms (J48, BayesNet, OneR, NB and ZeroR) for the task of detecting intrusions and compare their relative performances. The KDD data set is the standard publicly available data set for intrusion detection experiments; it [2] contains 42 attributes. The classes in the KDD99 dataset [18] can be categorized into five main classes (one normal class and four main intrusion classes: Probe, DoS, U2R and R2L). Many feature reduction methods use information-theoretic metrics to measure the relevance of features.
2. RELATED WORK
Intrusion detection started in the 1980s, and since then a number of techniques have been introduced to build intrusion detection systems [12], [13], [14]. In 2007, Panda and Patra [10] presented a method using naive Bayes to detect signatures of specific attacks, using the KDD99 dataset for their experiments. In the early 1980s, the Stanford Research Institute (SRI) developed an Intrusion Detection Expert System (IDES) that monitors user behavior and detects suspicious events. Meng Jianliang [6] used the K-means algorithm to cluster and analyze the data, applying this unsupervised learning technique to intrusion detection. Mohammadreza Ektefa et al. [8], in 2010, compared C4.5 with SVM, and the results revealed that C4.5 is better than SVM at detecting network intrusions and in its false alarm rate. Zubair A. Baig et al. (2011) proposed an AODE-based intrusion detection system for computer networks; they suggested that naive Bayes (NB) does not accurately detect network intrusions [7]. In 2010, Hai Nguyen et al. [5] applied C4.5 and BayesNet to intrusion detection on the KDD CUP'99 dataset. Jiong Zhang and Mohammad Zulkernine [9] performed intrusion detection using the random forest algorithm in an anomaly-based NIDS. Cuixiao Zhang, Guobing Zhang and Shanshan Sun [15] used a mixed approach for intrusion detection: they designed a model combining anomaly detection and misuse detection, in which the anomaly detection module is built with an unsupervised clustering method based on an improved K-means algorithm. The new algorithm keeps the strong points of K-means and improves it using the triangle inequality theorem. Gary Stein [11] applied a genetic algorithm and a decision tree algorithm to intrusion detection, using the genetic algorithm for feature reduction. W. Lee et al. [29] proposed a technique to measure the performance of an intrusion detection system by quantifying the benefits and costs of detection rules.
3. METHODOLOGICAL APPROACH
Decision tree technology is a common, intuitive and fast classification method [21]. Its construction process is top-down and divide-and-conquer; essentially it is a greedy algorithm. Starting from the root node, for each non-leaf node it first chooses an attribute with which to test the sample set; second, it divides the training sample set into several sub-sample sets according to the test results, each of which constitutes a new child node; third, it repeats the division process until specified stopping conditions are reached. In the process of constructing a decision tree, selecting the test attribute and deciding how to divide the sample set are crucial, and different decision tree algorithms use different techniques. In practice, because the training sample set is usually large, the generated tree tends to have many branches and layers. In addition, anomalies and noise in the training sample set will cause some abnormal branches, so the decision tree needs to be pruned. One of the greatest advantages of decision tree classification algorithms is that they do not require users to have much background knowledge during the learning process: as long as the training samples can be expressed in attribute-value form, the algorithm can be applied. However, decision tree technology also has deficiencies; for example, when there are too many categories, classification accuracy is significantly reduced, and it is difficult to find rules based on combinations of several variables. At present there are many decision tree algorithms, such as ID3, SLIQ, CART and CHAID, but C4.5, proposed by Quinlan in 1993 and implemented in WEKA as J48, is the most representative and widely used.
A Naive Bayes classifier [19] is a simple probabilistic classifier based on applying Bayes'
theorem (from Bayesian statistics) with strong (naive) independence assumptions. A more
descriptive term for the underlying probability model would be "independent feature model". In
simple terms, a naive Bayes classifier assumes that the presence (or absence) of a particular
feature of a class is unrelated to the presence (or absence) of any other feature. For example, a
fruit may be considered to be an apple if it is red, round, and about 4" in diameter. Even if these
features depend on each other or upon the existence of the other features, a naive Bayes classifier
considers all of these properties to independently contribute to the probability that this fruit is an
apple. Depending on the precise nature of the probability model, naive Bayes classifiers can be
trained very efficiently in a supervised learning setting. In many practical applications, parameter
estimation for naive Bayes models uses the method of maximum likelihood; in other words, one
can work with the naive Bayes model without believing in Bayesian probability or using any
Bayesian methods.
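To make the independence assumption concrete, the following is a minimal Java sketch (our own illustrative code, not the authors' implementation) that scores a single unseen instance of the weather data in Table I below, using maximum-likelihood class-conditional counts and no smoothing:

```java
// Naive Bayes scoring under the independence assumption, using counts from
// the weather data of Table I. Classifies the instance:
//   outlook=sunny, temperature=cool, humidity=high, windy=strong
public class NaiveBayesExample {
    public static void main(String[] args) {
        double pYes = 9.0 / 14, pNo = 5.0 / 14;        // class priors
        // score(class) = P(class) * product of P(feature value | class)
        double scoreYes = pYes * (2.0 / 9) * (3.0 / 9) * (3.0 / 9) * (3.0 / 9);
        double scoreNo  = pNo  * (3.0 / 5) * (1.0 / 5) * (4.0 / 5) * (3.0 / 5);
        System.out.println(scoreYes > scoreNo ? "play = yes" : "play = no");
    }
}
```

The larger score wins; for this instance the classifier predicts "no".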
3.1 INFORMATION GAIN BY AN EXAMPLE DATA SET
The proposed feature reduction technique can be easily understood through the following example. To demonstrate the efficiency of the proposed technique, we use the weather database [20] to calculate information gain.
TABLE I. WEATHER EXAMPLE DATASET
Day Outlook Temperature Humidity Windy Class: Play
D1 Sunny Hot High Weak No
D2 Sunny Hot High Strong No
D3 Overcast Hot High Weak Yes
D4 Rainy Mild High Weak Yes
D5 Rainy Cool Normal Weak Yes
D6 Rainy Cool Normal Strong No
D7 Overcast Cool Normal Strong Yes
D8 Sunny Mild High Weak No
D9 Sunny Cool Normal Weak Yes
D10 Rainy Mild Normal Weak Yes
D11 Sunny Mild Normal Strong Yes
D12 Overcast Mild High Strong Yes
D13 Overcast Hot Normal Weak Yes
D14 Rainy Mild High Strong No
Table I presents a training set, D, of class-labelled tuples randomly selected from the weather database [20]. In this example, each attribute is discrete-valued. The class label attribute, Play, has two distinct values (namely, yes and no); therefore, there are two distinct classes (that is, m = 2). Let class C1 correspond to yes and class C2 correspond to no. There are nine tuples of class yes and five tuples of class no. A (root) node N is created for the tuples in D. We compute the information gain of each attribute. We first compute the expected information needed to classify a tuple in D:
\[ \mathrm{Info}(D) = -\frac{9}{14}\log_2\frac{9}{14} - \frac{5}{14}\log_2\frac{5}{14} = 0.940 \text{ bits} \tag{1} \]
Next, we need to compute the expected information requirement for each attribute. Let's start with the attribute outlook. We need to look at the distribution of yes and no tuples for each category of outlook. For the category sunny, there are two yes tuples and three no tuples; for overcast, four yes tuples and zero no tuples; for rainy, three yes tuples and two no tuples. The expected information needed to classify a tuple in D if the tuples are partitioned according to outlook is:
\[ \mathrm{Info}_{outlook}(D) = \frac{5}{14}\left(-\frac{2}{5}\log_2\frac{2}{5} - \frac{3}{5}\log_2\frac{3}{5}\right) + \frac{4}{14}\left(-\frac{4}{4}\log_2\frac{4}{4} - \frac{0}{4}\log_2\frac{0}{4}\right) + \frac{5}{14}\left(-\frac{3}{5}\log_2\frac{3}{5} - \frac{2}{5}\log_2\frac{2}{5}\right) = 0.694 \text{ bits} \tag{2} \]

Hence, the gain in information from such a partitioning is equation (1) minus equation (2):

\[ \mathrm{Gain}(outlook) = \mathrm{Info}(D) - \mathrm{Info}_{outlook}(D) = 0.940 - 0.694 = 0.246 \text{ bits} \]
Similarly, we can compute Gain(temperature) = 0.029 bits, Gain(humidity) = 0.151 bits, and Gain(windy) = 0.048 bits.
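This calculation translates directly into code. The following minimal Java sketch (our own illustrative code) reproduces the entropy and information-gain figures for the outlook attribute:

```java
// Information gain for the outlook attribute of the weather data (Table I).
public class InfoGain {
    // Entropy of a two-class split with p positive and n negative tuples.
    static double entropy(double p, double n) {
        double t = p + n, e = 0.0;
        if (p > 0) e -= (p / t) * (Math.log(p / t) / Math.log(2));
        if (n > 0) e -= (n / t) * (Math.log(n / t) / Math.log(2));
        return e;
    }

    public static void main(String[] args) {
        double infoD = entropy(9, 5);                  // 0.940 bits, equation (1)
        // outlook partitions: sunny (2 yes, 3 no), overcast (4, 0), rainy (3, 2)
        double infoOutlook = 5.0 / 14 * entropy(2, 3)
                           + 4.0 / 14 * entropy(4, 0)
                           + 5.0 / 14 * entropy(3, 2); // 0.694 bits, equation (2)
        System.out.printf("Gain(outlook) = %.3f bits%n", infoD - infoOutlook); // 0.246
    }
}
```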
Using the method above, we calculate the information gain of all the attributes of the KDD99 data set in the same way. In our proposed technique we use the KDD99 dataset with the selected features to train and test the algorithms; for testing we use 10-fold cross-validation. Feature selection techniques have been employed by researchers in other domains to extract important features; Skurichina and Duin [16] suggested that predictive accuracy can be improved by combining feature sets.
TABLE II. A SAMPLE CONFUSION MATRIX

                        Predicted Class Positive   Predicted Class Negative
Actual Class Positive              a                          b
Actual Class Negative              c                          d
In this confusion matrix, the value a is called a true positive and the value d is called a true negative. The value b is referred to as a false negative and c is known as a false positive.
3.2 True Positive Rate, False Positive Rate

In the context of intrusion detection, a true positive is an instance which is normal and is also classified as normal by the intrusion detector. For a good IDS the TP rate should be high. A false positive means there is no attack, yet the IDS detects one; for a good IDS the FP rate should be low.
3.3 Accuracy
This is the most basic measure of the performance of a learning method. This measure determines
the percentage of correctly classified instances. From the confusion matrix, we can say that:
\[ \mathrm{Accuracy} = \frac{a + d}{a + b + c + d} \]
This metric gives the number of instances from the dataset which are classified correctly, i.e. the ratio of true positives and true negatives to the total number of instances.
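These ratios are straightforward to compute from the four cells of Table II; a small Java sketch (the counts are made up for illustration):

```java
// Metrics from the confusion matrix of Table II: a = TP, b = FN, c = FP, d = TN.
public class ConfusionMetrics {
    public static void main(String[] args) {
        double a = 950, b = 50, c = 10, d = 990;       // illustrative counts only
        double accuracy = (a + d) / (a + b + c + d);   // fraction classified correctly
        double tpr = a / (a + b);                      // true positive rate
        double fpr = c / (c + d);                      // false positive rate
        System.out.printf("accuracy=%.4f  TPR=%.4f  FPR=%.4f%n", accuracy, tpr, fpr);
    }
}
```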
1 Mean Absolute Error: In statistics, the mean absolute error (MAE) is a quantity used to
measure how close forecasts or predictions are to the eventual outcomes. The mean absolute
error (MAE) is given by
\[ \mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n} |f_i - y_i| = \frac{1}{n}\sum_{i=1}^{n} |e_i| \]

The mean absolute error is an average of the absolute errors \(e_i = f_i - y_i\), where \(f_i\) is the prediction and \(y_i\) the true value.
2 Root Mean Squared Error (RMSE), also called root-mean-square deviation (RMSD): a frequently used measure of the differences between values predicted by a model or estimator and the values actually observed:

\[ \mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} \left(f(x_i) - y_i\right)^2} \]
3 Relative Absolute Error: the relative absolute error \(E_i\) of an individual program i is evaluated by the equation:

\[ E_i = \frac{\sum_{j=1}^{n} \left|P_{ij} - T_j\right|}{\sum_{j=1}^{n} \left|T_j - \bar{T}\right|} \]

where \(P_{ij}\) is the value predicted by the individual program i for sample case j (out of n sample cases), \(T_j\) is the target value for sample case j, and \(\bar{T}\) is given by the formula:

\[ \bar{T} = \frac{1}{n}\sum_{j=1}^{n} T_j \]
For a perfect fit, the numerator is equal to 0 and \(E_i = 0\). The \(E_i\) index therefore ranges from 0 to infinity, with 0 corresponding to the ideal.

4 Root Relative Squared Error: the root relative squared error \(E_i\) of an individual program i is evaluated by the equation:

\[ E_i = \sqrt{\frac{\sum_{j=1}^{n} \left(P_{ij} - T_j\right)^2}{\sum_{j=1}^{n} \left(T_j - \bar{T}\right)^2}} \]
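For reference, the four error measures can be computed as in the following minimal Java sketch of the formulas above (not WEKA's internal implementation):

```java
// MAE, RMSE, RAE and RRSE for predictions f[] against targets y[].
public class ErrorMetrics {
    static double mean(double[] y) {
        double s = 0;
        for (double v : y) s += v;
        return s / y.length;
    }
    static double mae(double[] f, double[] y) {
        double s = 0;
        for (int i = 0; i < f.length; i++) s += Math.abs(f[i] - y[i]);
        return s / f.length;
    }
    static double rmse(double[] f, double[] y) {
        double s = 0;
        for (int i = 0; i < f.length; i++) s += (f[i] - y[i]) * (f[i] - y[i]);
        return Math.sqrt(s / f.length);
    }
    // RAE: total absolute error normalised by that of always predicting the mean target.
    static double rae(double[] f, double[] y) {
        double m = mean(y), num = 0, den = 0;
        for (int i = 0; i < f.length; i++) {
            num += Math.abs(f[i] - y[i]);
            den += Math.abs(y[i] - m);
        }
        return num / den;
    }
    // RRSE: like RAE, but with squared differences and a square root.
    static double rrse(double[] f, double[] y) {
        double m = mean(y), num = 0, den = 0;
        for (int i = 0; i < f.length; i++) {
            num += (f[i] - y[i]) * (f[i] - y[i]);
            den += (y[i] - m) * (y[i] - m);
        }
        return Math.sqrt(num / den);
    }
}
```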
3.4 J48
The J48 decision tree was developed by John Ross Quinlan [25]. C4.5 is an extension of Quinlan's earlier Iterative Dichotomiser 3 (ID3) algorithm, and J48 builds decision trees from a set of labelled training data using the concept of information entropy. A decision tree is a classifier expressed as a recursive partition of the instance space. It consists of nodes that form a rooted tree, that is, a directed tree with a node called the root that has no incoming edges; a node with outgoing edges is referred to as an internal or test node, and all other nodes are called leaves (also known as terminal or decision nodes). Decision trees [22], [23], [24] are one of the most commonly used classification methods in supervised learning approaches.
3.5 KDD Cup 1999 Intrusion Detection Data
The data used in this paper are those proposed for the KDD'99 intrusion detection benchmark of the International Knowledge Discovery and Data Mining Tools Competition [2]; they are generally used for benchmarking intrusion detection problems and are derived from the DARPA (Defense Advanced Research Projects Agency) 1998 dataset. The 1999 KDD Cup data set [18] contains a set of records that represent connections to a military computer network subjected to multiple intrusions and attacks. The KDD dataset contains symbolic as well as continuous features. Attacks fall into four main categories: DoS (Denial of Service), R2L (Remote to Local), U2R (User to Root) and Probe. This data set was obtained from the UCI KDD archive [26]. The training data set has 65,535 instances with 32 continuous attributes and 9 categorical attributes, for a total of 41 features. The testing data set is smaller and contains several new intrusions that were not present in the training data set. The KDD dataset is divided into training and testing record sets.
4. PERFORMANCE EVALUATION AND RESULT
Tables III, IV and V show the performance of the five classification methods based on correctly classified instances, incorrectly classified instances, kappa statistic, mean absolute error, root mean squared error, relative absolute error, root relative squared error, coverage of cases (0.95 level) and time taken to build the models, respectively. The comparison is performed for 41 and for 7 attributes. The five classifier models are built on the dataset and tested by means of 10-fold cross-validation. The Java heap size was set to 1024 MB for WEKA 3.6.2; the simulation platform was an Intel™ Core i3-2100 (3.10 GHz) system with 3 GB RAM and a 500 GB disk under the Microsoft Windows XP™ Service Pack 2 operating system. The mapped and normalized dataset is further discretized to obtain discrete values for continuous features using WEKA [27].
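A sketch of how such an experiment can be scripted against the WEKA 3.6 Java API is shown below; the file name kdd99.arff and the random seed are placeholders, and the information-gain ranking that keeps the top 7 attributes mirrors the feature reduction of Section 3.1:

```java
import java.util.Random;
import weka.attributeSelection.InfoGainAttributeEval;
import weka.attributeSelection.Ranker;
import weka.classifiers.Evaluation;
import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;
import weka.filters.Filter;
import weka.filters.supervised.attribute.AttributeSelection;

public class KddExperiment {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("kdd99.arff");  // placeholder file name
        data.setClassIndex(data.numAttributes() - 1);    // class is the last attribute

        // Rank attributes by information gain and keep the top 7.
        AttributeSelection select = new AttributeSelection();
        select.setEvaluator(new InfoGainAttributeEval());
        Ranker ranker = new Ranker();
        ranker.setNumToSelect(7);
        select.setSearch(ranker);
        select.setInputFormat(data);
        Instances reduced = Filter.useFilter(data, select);

        // 10-fold cross-validation of J48 on the reduced attribute set.
        Evaluation eval = new Evaluation(reduced);
        eval.crossValidateModel(new J48(), reduced, 10, new Random(1));
        System.out.printf("Accuracy: %.4f%%  Kappa: %.4f  MAE: %.4f  RMSE: %.4f%n",
                eval.pctCorrect(), eval.kappa(),
                eval.meanAbsoluteError(), eval.rootMeanSquaredError());
    }
}
```

The same loop can be repeated with the BayesNet, OneR, NaiveBayes and ZeroR classifiers (all under the weka.classifiers package) to reproduce the comparisons in the tables.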
TABLE III. COMPARISON OF THE RESULTS FOR J48, BAYESNET, ONER, NB AND ZEROR WITH ALL ATTRIBUTES

Parameter                          J48        BayesNet   OneR      NB         ZeroR
Correctly Classified Instances     99.5594%   96.5624%   96.18%    89.591%    53.3%
Incorrectly Classified Instances   0.4406%    3.4376%    3.810%    10.408%    46.6%
Kappa statistic                    0.9911     0.9307     0.923     0.7906     0
Mean absolute error                0.0064     0.0378     0.038     0.1034     0.49
Root mean squared error            0.0651     0.175      0.195     0.3152     0.49
Relative absolute error            1.2854%    7.6037%    7.656%    20.781%    100%
Root relative squared error        13.059%    35.0792%   39.13%    63.189%    100%
Coverage of cases (0.95 level)     99.6229%   97.781%    96.18%    90.9654%   54.1%
TABLE IV. COMPARISON OF THE RESULTS FOR J48, BAYESNET, ONER, NB AND ZEROR WITH 7 ATTRIBUTES

Parameter                          J48        BayesNet   OneR       NB         ZeroR
Correctly Classified Instances     99.890%    99.243%    97.6761%   93.569%    59.9649%
Incorrectly Classified Instances   0.1099%    0.7568%    2.3239%    6.4302%    40.0351%
Kappa statistic                    0.9978     0.9846     0.9529     0.8708     0
Mean absolute error                0.0007     0.0032     0.0093     0.0266     0.1963
Root mean squared error            0.0206     0.0488     0.0964     0.1538     0.3133
Relative absolute error            0.3358%    1.6253%    4.7346%    13.556%    100%
Root relative squared error        6.5596%    15.5687%   30.7728%   49.080%    100%
Coverage of cases (0.95 level)     99.91%     99.6414%   97.6761%   94.700%    89.6221%

From Tables III and IV it is clear that J48 gave the best performance. We compare the results of the J48, BayesNet, OneR, NB and ZeroR algorithms in two steps: first we compare the results after running each algorithm with all attributes, and then after running it with the reduced set of 7 attributes; only then do we conclude which algorithm is best for intrusion detection.

Figure 1. Comparison of accuracy for J48, BayesNet, OneR, NB and ZeroR.
From Figure 1 it is clear that the information gain feature reduction method gives better accuracy, which is desirable for a good intrusion detection system; in the case of J48 the accuracy is 99.9%.

Next we compare the TPR of the J48, BayesNet, OneR, NB and ZeroR algorithms with all attributes and with the selected 7 attributes.

Figure 2. TPR comparison of J48, BayesNet, OneR, NB and ZeroR.

For a good IDS the TP rate should be high. Figure 2 shows that the TP rate of the J48 algorithm is higher when we reduce the features of the data set using information gain; in the case of J48 the TPR is 1. Figures 2 and 3 show the TPR (true positive rate) and FPR (false positive rate) of the J48, BayesNet, OneR, NB and ZeroR algorithms when run with all attributes of the data set. Figure 2 shows that the TPR of J48 is higher than that of the remaining four algorithms, which is desirable, and Figure 3 shows that the FPR of J48 is almost zero, which is desirable for a good intrusion detection algorithm.
Figure 3. FPR comparison of J48, BayesNet, OneR, NB and ZeroR.

For a good IDS the FPR should be low. Figure 3 shows that the FPR of the J48 algorithm is lower when we reduce the features of the data set using information gain; in the case of J48 the FPR is 0, whereas for the BayesNet, OneR, NB and ZeroR algorithms the FPR is greater than 0.

From Figures 1, 2 and 3 it is clear that the accuracy, TPR and FPR of the J48 algorithm are better than those of the other four algorithms, so we can say that feature reduction using information gain is the better technique.

Figure 4. Error rate comparison of J48, BayesNet, OneR, NB and ZeroR.

The experimental results show that, in the performance evaluation of the five classification models, J48 has much better performance than the other four methods. It is also observed that the overall performance of J48 classification improved with the feature reduction method, a notable improvement: the classification accuracy increases after feature selection.
Figure 5. Kappa statistic comparison of J48, BayesNet, OneR, NB and ZeroR.

In this paper, the performance of five well-known data mining classifier algorithms, namely J48, BayesNet, OneR, Naïve Bayes and ZeroR, is evaluated based on the 10-fold cross-validation test. Experimental results using the KDD CUP99 IDS data set demonstrate that while J48 is one of the most effective inductive learning algorithms, decision trees are more interesting as far as the detection of new attacks is concerned. From Figures 4 and 5 it is clear from our evaluation that J48 has relatively high detection rates; when the kappa statistic and incorrectly classified instances are compared, J48 is found to be excellent in performance compared with the other classifiers.

TABLE V. COMPARISON OF THE RESULTS FOR J48, BAYESNET, ONER, NB AND ZEROR WITH 7 ATTRIBUTES
5. CONCLUSIONS
In this paper we compared the performance of five machine learning classifiers: the J48 decision tree, BayesNet, OneR, Naive Bayes and ZeroR. The results were compared, and J48 was found to be excellent in performance compared with the other classifiers with respect to accuracy. We reduced the features of the data set using the information gain of the attributes. This study set out to discover the best classification algorithm for applying machine learning to intrusion detection. Our simulation results show that, in general, J48 has the highest classification accuracy with the lowest error rate. We also found that feature reduction drastically decreased the learning time of the algorithms while increasing accuracy and TPR. The comparison shows that information gain is a suitable technique for feature reduction. Using WEKA, we analysed five algorithms for their suitability for detecting intrusions in the KDD99 dataset. We showed that machine learning can be effectively applied to detect novel intrusions, focusing on anomaly detection. The five learning algorithms J48, BayesNet, OneR, Naïve Bayes and ZeroR were compared at the task of detecting intrusions. J48, with an accuracy rate of approximately 99%, was found to perform much better at detecting intrusions than BayesNet, OneR, NB and ZeroR. Based on the experiments in this paper and their corresponding results, we can state the following: the J48 classifier shows better performance for all the classes (Normal, DoS, R2L, U2R, Probe).
REFERENCES
[1] Lida Rashidi, Sattar Hashem and Ali Hamzeh, "Anomaly detection in categorical datasets using Bayesian networks," AICI'11: Proceedings of the Third International Conference on Artificial Intelligence and Computational Intelligence, Part II, Springer-Verlag, Berlin, Heidelberg, 2011, pp. 610–619.
[2] Knowledge Discovery in Databases DARPA Archive, Task Description, KDD Cup 1999 data set, http://www.kdd.ics.uci.edu/databases/kddcup99/task.html
[3] James P. Anderson, "Computer Security Threat Monitoring and Surveillance," Technical Report, James P. Anderson Co., Fort Washington, Pennsylvania, USA, pp. 98–17, April 1980.
[4] Dorothy E. Denning, "An Intrusion Detection Model," IEEE Transactions on Software Engineering (TSE), Vol. 13, No. 2, pp. 222–232, February 1987.
[5] Hai Nguyen, Katrin Franke and Slobodan Petrović, "Improving Effectiveness of Intrusion Detection by Correlation Feature Selection," International Conference on Availability, Reliability and Security, pp. 17–24, IEEE 2010.
[6] Meng Jianliang, Shang Haikun, "The application on intrusion detection based on K-means cluster algorithm," International Forum on Information Technology and Application, 2009.
[7] Zubair A. Baig, Abdulrhman S. Shaheen, and Radwan AbdelAal, "An AODE-based Intrusion Detection System for Computer Networks," pp. 28–35, IEEE 2011.
[8] Mohammadreza Ektefa, Sara Memar, Fatimah Sidi, Lilly Suriani Affendey, "Intrusion Detection Using Data Mining Techniques," Proceedings of the IEEE International Conference on Information Retrieval & Knowledge Management, Exploring Invisible World, CAMP'10, 2010, pp. 200–203.
[9] Jiong Zhang and Mohammad Zulkernine, "Anomaly based Network Intrusion Detection with Unsupervised Outlier Detection," School of Computing, Queen's University, Kingston, Ontario, Canada, IEEE International Conference ICC 2006, Vol. 9, pp. 2388–2393, 11–15 June 2006.
[10] M. Panda and M. R. Patra, "Network intrusion detection using naive Bayes," International Journal of Computer Science and Network Security (IJCSNS), Vol. 7, No. 12, December 2007, pp. 258–263.
[11] Gary Stein, Bing Chen, "Decision Tree Classifier for network intrusion detection with GA-based feature selection," University of Central Florida, ACM-SE 43: Proceedings of the 43rd Annual Southeast Regional Conference, Vol. 2, 2005, ACM, New York, USA.
[12] Shai Rubin, Somesh Jha, and Barton P. Miller, "Protomatching Network Traffic for High Throughput Network Intrusion Detection," Proceedings of the 13th ACM Conference on Computer and Communications Security, pp. 47–58, ACM, 2006.
[13] Marco Cova, Davide Balzarotti, Viktoria Felmetsger, and Giovanni Vigna, "Swaddler: An Approach for the Anomaly-Based Detection," Symposium on Recent Advances in Intrusion Detection (RAID), pp. 63–86, Springer, 2007.
[14] Pavel Kachurka, Vladimir Golovko, "Neural Network Approach to Real-Time Network Intrusion Detection and Recognition," The 6th IEEE International Conference on Intelligent Data Acquisition and Advanced Computing Systems: Technology and Applications, 15–17 September 2011, pp. 393–397, IEEE 2011.
[15] Cuixiao Zhang, Guobing Zhang, Shanshan Sun, "A mixed unsupervised clustering-based intrusion detection model," Third International Conference on Genetic and Evolutionary Computing, 2009.
[16] M. Skurichina and R. P. W. Duin, "Combining feature subsets in feature selection," Lecture Notes in Computer Science, Vol. 3541, pp. 165–175, Springer-Verlag, Berlin, 2005.
[17] Li Min and Wang Dongliang, "Anomaly Intrusion Detection Based on SOM," IEEE WASE International Conference on Information Engineering, IEEE Computer Society, 2009, pp. 40–44.
[18] Mahbod Tavallaee, Ebrahim Bagheri, Wei Lu, and Ali A. Ghorbani, "A Detailed Analysis of the KDD CUP 99 Data Set," Proceedings of the 2009 IEEE Symposium on Computational Intelligence in Security and Defense Applications (CISDA 2009), IEEE 2009.
[19] R. Dogaru, "A modified Naive Bayes classifier for efficient implementations in embedded systems," Signals, Circuits and Systems (ISSCS), IEEE 10th International Symposium, Iasi, June 30–July 1, 2011, pp. 1–4.
[20] Jiawei Han and Micheline Kamber, "Data Mining: Concepts and Techniques," Second Edition, University of Illinois at Urbana-Champaign, The Morgan Kaufmann Series in Data Management Systems, Elsevier, 2007.
[21] Juan Wang, Qiren Yang, Dasen Ren, "An intrusion detection algorithm based on decision tree technology," Asia-Pacific Conference on Information Processing, APCIP 2009, Shenzhen, IEEE, 18–19 July 2009, pp. 333–335.
[22] John Ross Quinlan, "Learning with Continuous Classes," 5th Australian Joint Conference on Artificial Intelligence, Singapore, 1992, pp. 343–348.
[23] Kamarulrifin Abd Jalil and Mohamad Noorman Masrek, "Comparison of Machine Learning Algorithms Performance in Detecting Network Intrusion," IEEE 2010 International Conference on Networking and Information Technology, pp. 221–226.
[24] G. Meera Gandhi, Kumaravel Appavoo, S. K. Srivatsa, "Effective Network Intrusion Detection using Classifiers Decision Trees and Decision Rules," International Journal of Advanced Networking and Applications, Vol. 2, Issue 3, pp. 686–692, 2010.
[25] John Ross Quinlan, "C4.5: Programs for Machine Learning," Morgan Kaufmann Publishers, San Mateo, CA, 1993.
[26] C. Blake, E. Keogh and C. Merz, UCI Repository of Machine Learning Databases, 1998.
[27] I. H. Witten, E. Frank, "Data Mining: Practical Machine Learning Tools and Techniques," Second Edition, Morgan Kaufmann, San Francisco, 2005.
[28] A. K. Jain, D. Zongker, "Feature Selection: Evaluation, Application, and Small Sample Performance," IEEE Transactions on Pattern Analysis and Machine Intelligence, 19(2), pp. 153–158, 1997.
[29] W. Lee, J. Cabrera, A. Thomas, N. Balwalli, S. Saluja, and Y. Zhang, "Performance adaptation in real-time intrusion detection systems," Recent Advances in Intrusion Detection (RAID), Springer, 2002.
[30] Jiaqi Jiang, Ru Li, Tianhong Zheng, Feiqin Su, Haicheng Li, "A new intrusion detection system using Class and Sample Weighted C-Support Vector Machine," Third International Conference on Communications and Mobile Computing, IEEE Computer Society, 2011, pp. 51–54.
[31] E. T. Ferreira, G. A. Carrijo, R. de Oliveira and N. V. S. Araujo, "Intrusion Detection System with Wavelet and Artificial Neural Network Approach for Network Computers," IEEE Latin America Transactions, Vol. 9, No. 5, September 2011, pp. 832–837.
[32] Yang Zhong, Hirohumi Yamaki, Hiroki Takakura, "A Grid-Based Clustering for Low-Overhead Anomaly Intrusion Detection," IEEE 2011, pp. 17–24.

More Related Content

PDF
ANALYSIS OF MACHINE LEARNING ALGORITHMS WITH FEATURE SELECTION FOR INTRUSION ...
PDF
A novel ensemble modeling for intrusion detection system
PDF
IRJET - A Novel Approach for Software Defect Prediction based on Dimensio...
PDF
Benchmarks for Evaluating Anomaly Based Intrusion Detection Solutions
PDF
COMPUTER INTRUSION DETECTION BY TWOOBJECTIVE FUZZY GENETIC ALGORITHM
PDF
Adapting New Data In Intrusion Detection Systems
PDF
GROUP FUZZY TOPSIS METHODOLOGY IN COMPUTER SECURITY SOFTWARE SELECTION
PDF
A new clutering approach for anomaly intrusion detection
ANALYSIS OF MACHINE LEARNING ALGORITHMS WITH FEATURE SELECTION FOR INTRUSION ...
A novel ensemble modeling for intrusion detection system
IRJET - A Novel Approach for Software Defect Prediction based on Dimensio...
Benchmarks for Evaluating Anomaly Based Intrusion Detection Solutions
COMPUTER INTRUSION DETECTION BY TWOOBJECTIVE FUZZY GENETIC ALGORITHM
Adapting New Data In Intrusion Detection Systems
GROUP FUZZY TOPSIS METHODOLOGY IN COMPUTER SECURITY SOFTWARE SELECTION
A new clutering approach for anomaly intrusion detection

What's hot (17)

PDF
Comparative Study on Machine Learning Algorithms for Network Intrusion Detect...
PDF
Comparative Analysis: Effective Information Retrieval Using Different Learnin...
PDF
K-Medoids Clustering Using Partitioning Around Medoids for Performing Face Re...
PDF
A Defect Prediction Model for Software Product based on ANFIS
PDF
A New Active Learning Technique Using Furthest Nearest Neighbour Criterion fo...
PDF
A NOVEL EVALUATION APPROACH TO FINDING LIGHTWEIGHT MACHINE LEARNING ALGORITHM...
PDF
A Software Measurement Using Artificial Neural Network and Support Vector Mac...
PDF
IRJET- Credit Card Fraud Detection using Isolation Forest
PDF
ADDRESSING IMBALANCED CLASSES PROBLEM OF INTRUSION DETECTION SYSTEM USING WEI...
PDF
Improving the performance of Intrusion detection systems
PDF
IRJET- A Detailed Study on Classification Techniques for Data Mining
PDF
Classification of medical datasets using back propagation neural network powe...
PDF
EFFICIENT FEATURE SUBSET SELECTION MODEL FOR HIGH DIMENSIONAL DATA
PDF
Minkowski Distance based Feature Selection Algorithm for Effective Intrusion ...
PDF
Disease Identification and Detection in Apple Tree
PDF
Gene Selection Based on Rough Set Applications of Rough Set on Computational ...
Comparative Study on Machine Learning Algorithms for Network Intrusion Detect...
Comparative Analysis: Effective Information Retrieval Using Different Learnin...
K-Medoids Clustering Using Partitioning Around Medoids for Performing Face Re...
A Defect Prediction Model for Software Product based on ANFIS
A New Active Learning Technique Using Furthest Nearest Neighbour Criterion fo...
A NOVEL EVALUATION APPROACH TO FINDING LIGHTWEIGHT MACHINE LEARNING ALGORITHM...
A Software Measurement Using Artificial Neural Network and Support Vector Mac...
IRJET- Credit Card Fraud Detection using Isolation Forest
ADDRESSING IMBALANCED CLASSES PROBLEM OF INTRUSION DETECTION SYSTEM USING WEI...
Improving the performance of Intrusion detection systems
IRJET- A Detailed Study on Classification Techniques for Data Mining
Classification of medical datasets using back propagation neural network powe...
EFFICIENT FEATURE SUBSET SELECTION MODEL FOR HIGH DIMENSIONAL DATA
Minkowski Distance based Feature Selection Algorithm for Effective Intrusion ...
Disease Identification and Detection in Apple Tree
Gene Selection Based on Rough Set Applications of Rough Set on Computational ...
Ad

Similar to An Empirical Comparison and Feature Reduction Performance Analysis of Intrusion Detection (20)

PDF
CLASSIFIER SELECTION MODELS FOR INTRUSION DETECTION SYSTEM (IDS)
PDF
Layering Based Network Intrusion Detection System to Enhance Network Attacks ...
PDF
Survey of network anomaly detection using markov chain
PDF
International Journal of Computer Science, Engineering and Information Techno...
PDF
SURVEY OF NETWORK ANOMALY DETECTION USING MARKOV CHAIN
PDF
Ijcet 06 07_002
PDF
FORTIFICATION OF HYBRID INTRUSION DETECTION SYSTEM USING VARIANTS OF NEURAL ...
PDF
PDF
Intrusion Detection System(IDS) Development Using Tree-Based Machine Learning...
PDF
Intrusion Detection System (IDS) Development Using Tree-Based Machine Learnin...
PDF
IRJET- Intrusion Detection based on J48 Algorithm
PDF
PERFORMANCE EVALUATION OF J48 AND BAYES ALGORITHMS FOR INTRUSION DETECTION SY...
PDF
Decision Tree Based Algorithm for Intrusion Detection
PDF
New Fuzzy Logic Based Intrusion Detection System
PDF
Constructing a predictive model for an intelligent network intrusion detection
PDF
COMBINING NAIVE BAYES AND DECISION TREE FOR ADAPTIVE INTRUSION DETECTION
PDF
1725 1731
PDF
1725 1731
PDF
Critical analysis of genetic algorithm based IDS and an approach for detecti...
PDF
An Investigation into the Effectiveness of Machine Learning Techniques for In...
CLASSIFIER SELECTION MODELS FOR INTRUSION DETECTION SYSTEM (IDS)
Layering Based Network Intrusion Detection System to Enhance Network Attacks ...
Survey of network anomaly detection using markov chain
International Journal of Computer Science, Engineering and Information Techno...
SURVEY OF NETWORK ANOMALY DETECTION USING MARKOV CHAIN
Ijcet 06 07_002
FORTIFICATION OF HYBRID INTRUSION DETECTION SYSTEM USING VARIANTS OF NEURAL ...
Intrusion Detection System(IDS) Development Using Tree-Based Machine Learning...
Intrusion Detection System (IDS) Development Using Tree-Based Machine Learnin...
IRJET- Intrusion Detection based on J48 Algorithm
PERFORMANCE EVALUATION OF J48 AND BAYES ALGORITHMS FOR INTRUSION DETECTION SY...
Decision Tree Based Algorithm for Intrusion Detection
New Fuzzy Logic Based Intrusion Detection System
Constructing a predictive model for an intelligent network intrusion detection
COMBINING NAIVE BAYES AND DECISION TREE FOR ADAPTIVE INTRUSION DETECTION
1725 1731
1725 1731
Critical analysis of genetic algorithm based IDS and an approach for detecti...
An Investigation into the Effectiveness of Machine Learning Techniques for In...
Ad

Recently uploaded (20)

PPTX
Introduction-to-Literarature-and-Literary-Studies-week-Prelim-coverage.pptx
PDF
Black Hat USA 2025 - Micro ICS Summit - ICS/OT Threat Landscape
PDF
Module 4: Burden of Disease Tutorial Slides S2 2025
PDF
Trump Administration's workforce development strategy
PDF
Computing-Curriculum for Schools in Ghana
PDF
Weekly quiz Compilation Jan -July 25.pdf
PDF
Chapter 2 Heredity, Prenatal Development, and Birth.pdf
PDF
GENETICS IN BIOLOGY IN SECONDARY LEVEL FORM 3
PPTX
History, Philosophy and sociology of education (1).pptx
PDF
Yogi Goddess Pres Conference Studio Updates
PDF
A GUIDE TO GENETICS FOR UNDERGRADUATE MEDICAL STUDENTS
PPTX
Final Presentation General Medicine 03-08-2024.pptx
PDF
A systematic review of self-coping strategies used by university students to ...
PDF
Microbial disease of the cardiovascular and lymphatic systems
PDF
STATICS OF THE RIGID BODIES Hibbelers.pdf
PPTX
Lesson notes of climatology university.
PDF
Practical Manual AGRO-233 Principles and Practices of Natural Farming
PDF
01-Introduction-to-Information-Management.pdf
PDF
Anesthesia in Laparoscopic Surgery in India
PDF
Supply Chain Operations Speaking Notes -ICLT Program
Introduction-to-Literarature-and-Literary-Studies-week-Prelim-coverage.pptx
Black Hat USA 2025 - Micro ICS Summit - ICS/OT Threat Landscape
Module 4: Burden of Disease Tutorial Slides S2 2025
Trump Administration's workforce development strategy
Computing-Curriculum for Schools in Ghana
Weekly quiz Compilation Jan -July 25.pdf
Chapter 2 Heredity, Prenatal Development, and Birth.pdf
GENETICS IN BIOLOGY IN SECONDARY LEVEL FORM 3
History, Philosophy and sociology of education (1).pptx
Yogi Goddess Pres Conference Studio Updates
A GUIDE TO GENETICS FOR UNDERGRADUATE MEDICAL STUDENTS
Final Presentation General Medicine 03-08-2024.pptx
A systematic review of self-coping strategies used by university students to ...
Microbial disease of the cardiovascular and lymphatic systems
STATICS OF THE RIGID BODIES Hibbelers.pdf
Lesson notes of climatology university.
Practical Manual AGRO-233 Principles and Practices of Natural Farming
01-Introduction-to-Information-Management.pdf
Anesthesia in Laparoscopic Surgery in India
Supply Chain Operations Speaking Notes -ICLT Program

An Empirical Comparison and Feature Reduction Performance Analysis of Intrusion Detection

  • 1. International Journal of Control Theory and Computer Modelling (IJCTCM) Vol.2, No.1, January 2012 DOI : 10.5121/ijctcm.2012.2101 1 An Empirical Comparison and Feature Reduction Performance Analysis of Intrusion Detection Upendra1 and Yogendra Kumar Jain2 1 Research Scholar M.Tech, Department of Computer Science & Engineering, Samrat Ashok Technological Institute, Vidisha, M.P., India [email protected] 2 Head of Department, Computer Science & Engineering, Samrat Ashok Technological Institute, Vidisha, M.P., India ABSTRACT This paper reports on the empirical evaluation of five machine learning algorithm such as J48, BayesNet, OneR, NB and ZeroR using ten performance criteria: accuracy, precision, recall, F-Measure, incorrectly classified instances, kappa statistic, mean absolute error, root mean squared error, relative absolute error, root relative squared error. The aim of this paper is to find out which classifier is better in its performance for intrusion detection system. Machine Learning is one of the methods used in the intrusion detection system (IDS).Based on this study, it can be concluded that J48 decision tree is the most suitable associated algorithm than the other four algorithms. In this paper we compared the performance of Intrusion Detection System (IDS) Classifiers using seven feature reduction techniques. KEYWORDS Intrusion Detection, Machine Learning, BayesNet, C 4.5, NB, ZeroR, KDD 99 1. INTRODUCTION Empirical studies indicate that feature reduction technique is capable of reducing the size of dataset [28]. Recently research on machine learning for intrusion detection has standard much attention in the computational intelligence community. In intrusion detection algorithm, immense strengths of audit data must be analyzed in order to conception new detection rules for increasing number of novel attacks in high speed network. Intrusion detection algorithm should consider the composite properties of attack behaviors to improve the detection speed and detection accuracy. Analyze the large volume of network dataset and the better performances of detection accuracy, intrusion detection become an important research field for machine learning. In this work we have presented J48 decision tree algorithm for intrusion detection based on machine learning. The Intrusion Detection System (IDS) is Process of monitoring the events occurring in a computer system or network and analyzing them for signs of possible incidents. IDS was first introduced in 1980 by James. P. Anderson [3] and then improved by D. Denning [4] in 1987.
  • 2. International Journal of Control Theory and Computer Modelling (IJCTCM) Vol.2, No.1, January 2012 2 Various paradigms namely Support Vector Machine [30], Neural Networks[31], K-means based clustering[32] have been applied to intrusion detection because it has the advantage of discovering useful knowledge that describes a user’s or program’s behavior. They are two basic approaches for Intrusion Detection techniques, i.e. Anomaly Detection and Misuse Detection (signature-based ID) [17]. Anomaly Detection is basically based on assumption that attacker behavior is different from normal user's behavior [1]. In this paper, we present the application of machine learning to intrusion detection. We analyse five learning algorithms (J48, BayesNet, OneR,NB and ZeroR) for the task of detecting intrusions and compare their relative performances. There is only available data set is KDD data set for the purpose of experiment for intrusion detection.KDD data set [2] contain 42 attributes. The classes in KDD99 [18] dataset can be categorized into five main classes (one normal class and four main intrusion classes: probe, Dos, U2R and R2L).Many feature reduction methods use information theory based metrics to measure the relevance of features. 2. RELATED WORK Intrusion detection started in 1980’s and since then a number of techniques have been introduced to build intrusion detection systems [12], [13], [14]. In 2007, Panda and Patra [10] determined a method using naive Bayes to detect signatures of specific attacks. They used KDD99 dataset for experiment, in the early 1980’s, Stanford Research Institute (SRI) developed an Intrusion Detection Expert System (IDES) that monitors user behavior and detects suspicious events. Meng Jianliang [6] used the K Mean algorithm to cluster and analyze the data. He used the unsupervised learning technique for the intrusion detection. Mohammadreza Ektefa et al., [8] in 2010, compared C4.5 with SVM and the results revealed that C4.5 algorithms better than SVM in detecting network intrusions and false alarm rate. Zubair A.Baig et al. (2011) proposed An AODE-based Intrusion Detection System for Computer Networks. They suggested that the Naive Bayes (NB) does not accurately detect network intrusions [7]. In 2010, Hai Nguyen et al. [5] applied C4.5 and BayesNet for intrusion detection on KDD CUP’99 Dataset. Jiong Zhang and Mohammad Zulkernine [9] done the intrusion detection using the random forest algorithms in anomaly based NIDS. Cuixio Zhang, Guobing Zhang, Shanshan Sun [15] used the missed approach for the intrusion detection. He designed the mixed combining the anomaly detection and misuse detection in this model the anomaly detection module is built using unsupervised clustering method and the algorithm is an improved algorithm of K means clustering algorithm. The new algorithm learns the strong points from the k-means and improved relations trilateral triangle theorem. Gary Stein [11] applied the genetic algorithm and the decision tree algorithm for the intrusion detection. He used the genetic algorithm technique for the feature reduction. W.Lee et al. [29] propose a technique to measure the performance of an intrusion detection system by quantifying the benefits and costs of detection rules. 3. METHODOLOGICAL APPROACH Decision tree technology is a common, intuitionist and fast classification method [21]. Its construction process is top-down, divide-and-rule. Essentially it is a greedy algorithm. 
Starting from root node, for each non-leaf node, firstly choose an attribute to test the sample set; Secondly divide training sample set into several sub-sample sets according to testing results, each sub- sample set constitutes a new leaf node; Thirdly repeat the above division process, until having reached specific end conditions. In the process of constructing decision tree, selecting testing
  • 3. International Journal of Control Theory and Computer Modelling (IJCTCM) Vol.2, No.1, January 2012 3 attribute and how to divide sample set are very crucial. Different decision tree algorithm uses different technology. In practice, because the size of training sample set is usually large, the branches and layers of generated tree are also more. In addition, abnormity and noise existed in training sample set will also cause some abnormal branches, so we need to prune decision tree. One of the greatest advantages of decision tree classification algorithm is that: It does not require users to know a lot of background knowledge in the learning process. As long as training samples can be expressed as the form of attribute-conclusion, you can use this algorithm to study. But decision tree technology also has a lot of deficiency, such as: When there are too many categories, classification accuracy is significantly reduced; It is difficult to find rules based on the combination of several variables. At present, there are a lot of decision algorithms, such as: ID3, SLIQ, CART, CHAID and so on. But J48 algorithm is the most representative and widely used. It is proposed by Quinlan in 1993. A Naive Bayes classifier [19] is a simple probabilistic classifier based on applying Bayes' theorem (from Bayesian statistics) with strong (naive) independence assumptions. A more descriptive term for the underlying probability model would be "independent feature model". In simple terms, a naive Bayes classifier assumes that the presence (or absence) of a particular feature of a class is unrelated to the presence (or absence) of any other feature. For example, a fruit may be considered to be an apple if it is red, round, and about 4" in diameter. Even if these features depend on each other or upon the existence of the other features, a naive Bayes classifier considers all of these properties to independently contribute to the probability that this fruit is an apple Depending on the precise nature of the probability model; naive Bayes classifiers can be trained very efficiently in a supervised learning setting. In many practical applications, parameter estimation for naive Bayes models uses the method of maximum likelihood; in other words, one can work with the naive Bayes model without believing in Bayesian probability or using any Bayesian methods. 3.1 INFORMATION GAIN BY AN EXAMPLE DATA SET The proposed feature reduction technique can be easily understood by the following example. To demonstrate efficiency of the proposed technique, we have used weather database [20] to calculate information gain. TABLE I. WEATHER EXAMPLE DATASET day Outlook Temperature Humidity Windy Class:Play D1 Sunny Hot High Weak No D2 Sunny Hot High Strong No D3 Overcast Hot High Weak Yes D4 Rainy Mild High Weak Yes D5 Rainy Cool Normal Weak Yes D6 Rainy Cool Normal Strong No D7 Overcast Cool Normal Strong Yes
  • 4. International Journal of Control Theory and Computer Modelling (IJCTCM) Vol.2, No.1, January 2012 4 D8 Sunny Mild High Weak No D9 Sunny Cool Normal Weak Yes D10 Rainy Mild Normal Weak Yes D11 Sunny Mild Normal Strong Yes D12 Overcast Mild High Strong Yes D13 Overcast Hot Normal Weak Yes D14 Rainy Mild High Strong No Table I. presents a training set, D, of class-labelled tuples randomly selected from the All Electronics weather database. In this example, each attribute is discrete-valued. The class label attribute, play compute, has two distinct values (namely, yes, no); therefore, there are two distinct classes (that is, m = 2).Let class C1 correspond to yes and class C2 correspond to no. There are nine tuples of class yes and five tuples of class no. A (root) node N is created for the tuples in D. We compute the information gain of each attribute. We first compute the expected information needed to classify a tuple in D: Info (D)= - 9/14 log2 (9/14) –5/14 log2 (5/14)=0.940 bits....(1) Next, we need to compute the expected information requirement for each attribute. Let’s start with the attribute outlook.We need to look at the distribution of yes and no tuples for each category of outlook. For the outlook category sunny, there are two yes tuples and three no tuples. For the category overcast, there are four yes tuples and zero no tuples. For the category rainy, there are three yes tuples and two no tuples. Now we calculate the Info for an attribute Outlook. The expected information needed to classify a tuple in D if the tuples are partitioned according to outlook is: Info outlook (D) = 5/14 × (- 2/5 log2 2/5 – 3/5 log2 3/5) + 4/14 × (-4/4 log2 4/4 -0/4 log2 0/4) + 5/14 × (-3/5 log2 3/5 – 2/5 log2 2/5) = 0.694 bits................................................... (2) Hence, the gain in information from such a partitioning would be equation (1) – (2) Gain (outlook) = Info (D) – Info outlook (D) = 0.940 – 0.694 = 0.246 bits Similarly, we can compute Gain (temperature) = 0.029 bits, Gain(humidity) = 0.151bits, and Gain (windy) = 0.048 bits
  • 5. International Journal of Control Theory and Computer Modelling (IJCTCM) Vol.2, No.1, January 2012 5 Using the method above for calculation of information gain, we calculate the info gain of the all the attribute of the KDD99 data set. The info gain of the all the attribute is given below in table I. In our proposed technique we are using the KDD99 dataset with these selected features and train and test the algorithm. For the testing we are using the 10 fold cross validation. Features selection techniques have been employed by Researchers. In other domain to extract important features.Skurichina and Duin [16] suggested that predictive accuracy can be improved by combining feature sets. TABLE II. A SAMPLE CONFUSION MATRIX Predicted Class Positive Predicted Class Negative Actual Class Positive a b Actual Class Negative c d In this confusion matrix, the value a is called a true positive and the value d is called a true negative. The value b is referred to as a false negative and c is known as false positive. 3.2 True Positive Rate,Fale Positive Rate In the context of intrusion detection, a true positive is an instance which is normal and is also classified as normal by the intrusion detector. For a good IDS TP rate should be high. False positive means no attack but IDS detect the attack. For a good IDS FP should be low. 3.3 Accuracy This is the most basic measure of the performance of a learning method. This measure determines the percentage of correctly classified instances. From the confusion matrix, we can say that: Accuracy = ࢇ ା ࢊ ࢇ ା ࢈ ା ࢉ ା ࢊ This metric gives the number of instances from the dataset which are classified correctly i.e. the ratio of true positives and true negatives to the total number of instances. 1 Mean Absolute Error: In statistics, the mean absolute error (MAE) is a quantity used to measure how close forecasts or predictions are to the eventual outcomes. The mean absolute error (MAE) is given by 1 n ෍ | f୧ିy୧| ୬ ୧ୀଵ = 1 n ෍ |e୧| ୬ ୧ୀଵ The mean absolute error is an average of the absolute errors ݁௜=݂௜ି‫ݕ‬௜ , where f୧ is the prediction and ‫ݕ‬௜ the true value.
2 Root Mean Squared Error (RMSE): also known as the root-mean-square deviation (RMSD). It is a frequently used measure of the differences between values predicted by a model or an estimator and the values actually observed from the thing being modelled or estimated:

RMSE = sqrt( Σ_{i=1}^{n} (f(x_i) - y_i)² / n )

3 Relative Absolute Error: The relative absolute error E_i of an individual program i is evaluated by the equation:

E_i = Σ_{j=1}^{n} |P_(ij) - T_j| / Σ_{j=1}^{n} |T_j - T̄|

where P_(ij) is the value predicted by the individual program i for sample case j (out of n sample cases), T_j is the target value for sample case j, and T̄ is given by the formula:

T̄ = (1/n) Σ_{j=1}^{n} T_j

For a perfect fit, the numerator is equal to 0 and E_i = 0. So the E_i index ranges from 0 to infinity, with 0 corresponding to the ideal.

4 Root Relative Squared Error: The root relative squared error E_i of an individual program i is evaluated by the equation:

E_i = sqrt( Σ_{j=1}^{n} (P_(ij) - T_j)² / Σ_{j=1}^{n} (T_j - T̄)² )

3.4 J48 Decision Tree

J48 was developed by John Ross Quinlan [25]. C4.5 is an extension of Quinlan's earlier Iterative Dichotomiser 3 (ID3) algorithm. J48 builds decision trees from a set of labelled training data using the concept of information entropy. A decision tree is a classifier expressed as a recursive partition of the instance space. It consists of nodes that form a rooted tree: a directed tree with a node called the root that has no incoming edges. A node with outgoing edges is referred to as an internal or test node; all other nodes are called leaves (also known as terminal or decision nodes). Decision trees [22], [23], [24] are among the classification methods most commonly used in supervised learning approaches.

3.5 KDD Cup 1999 Intrusion Detection Data

The KDD 99 intrusion detection benchmark was introduced at the International Knowledge Discovery and Data Mining Tools Competition. The data used in this paper are those proposed in KDD'99 for intrusion detection [2], which are generally used for benchmarking intrusion detection problems and are derived from the DARPA (Defense Advanced Research Projects Agency) 1998 dataset. The 1999 KDD Cup data set [18] contains a set of records that represent connections to a military computer network where there have been multiple intrusions and attacks. The KDD dataset contains symbolic as well as continuous features. Attacks fall into four main categories: DoS (Denial of Service), R2L (Remote to Local), U2R (User to Root) and Probe.
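As an illustration of the four attack categories just named, the following sketch (ours, not from the paper) maps a handful of raw KDD99 attack labels to their categories; the label spellings follow the public KDD99 task description, and the selection of labels is only a sample, not the full list.

```python
# Partial mapping of raw KDD99 attack labels to the four main categories.
CATEGORY = {
    "smurf": "DoS", "neptune": "DoS", "back": "DoS", "teardrop": "DoS",
    "guess_passwd": "R2L", "ftp_write": "R2L", "imap": "R2L",
    "buffer_overflow": "U2R", "rootkit": "U2R", "loadmodule": "U2R",
    "portsweep": "Probe", "ipsweep": "Probe", "nmap": "Probe", "satan": "Probe",
}

def categorize(label: str) -> str:
    """Map a raw KDD99 connection label to DoS / R2L / U2R / Probe or 'normal'."""
    return "normal" if label == "normal" else CATEGORY.get(label, "unknown")

print(categorize("smurf"))         # DoS
print(categorize("guess_passwd"))  # R2L
```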
This data set was obtained from the UCI KDD archive [26]. The training data set has 65535 data instances with 32 continuous attributes and 9 categorical attributes; the dataset includes a set of 41 features. The testing data set is smaller and contains several new intrusions that were not present in the training data set. The KDD dataset is divided into training and testing record sets.

4. PERFORMANCE EVALUATION AND RESULT

Tables III, IV and V show the performance of the five classification methods based on correctly classified instances, incorrectly classified instances, Kappa statistic, mean absolute error, root mean squared error, relative absolute error, root relative squared error, coverage of cases (0.95 level) and time taken to build the models, respectively. The comparison is performed for 41 attributes and for 7 attributes. The five classifier models are built on the dataset and tested by means of 10-fold cross-validation (a sketch of this protocol is given after Table III below). The Java heap size was set to 1024 MB for WEKA 3.6.2; the simulation platform is a 3.10 GHz Intel Core i3-2100 processor system with 3 GB RAM and a 500 GB disk, running the Microsoft Windows XP Service Pack 2 operating system. The mapped and normalized dataset is further discretized to obtain discrete values for continuous features using WEKA [27].

TABLE III. COMPARISON OF THE RESULTS FOR J48, BAYESNET, ONER, NB AND ZEROR WITH ALL ATTRIBUTES

Parameter                        | J48      | BayesNet | OneR    | NB       | ZeroR
Correctly Classified Instances   | 99.5594% | 96.5624% | 96.18%  | 89.591%  | 53.3%
Incorrectly Classified Instances | 0.4406%  | 3.4376%  | 3.810%  | 10.408%  | 46.6%
Kappa statistic                  | 0.9911   | 0.9307   | 0.923   | 0.7906   | 0
Mean absolute error              | 0.0064   | 0.0378   | 0.038   | 0.1034   | 0.49
Root mean squared error          | 0.0651   | 0.175    | 0.195   | 0.3152   | 0.49
Relative absolute error          | 1.2854%  | 7.6037%  | 7.656%  | 20.781%  | 100%
Root relative squared error      | 13.059%  | 35.0792% | 39.13%  | 63.189%  | 100%
Coverage of cases (0.95 level)   | 99.6229% | 97.781%  | 96.18%  | 90.9654% | 54.1%
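As referenced above, the following sketch illustrates the 10-fold cross-validation protocol only, not the authors' exact WEKA 3.6.2 runs. The scikit-learn classifiers are our stand-ins (DecisionTreeClassifier with the entropy criterion approximates J48, GaussianNB approximates Naive Bayes, DummyClassifier approximates ZeroR), and kdd99_numeric.csv is a hypothetical pre-processed, fully numeric copy of the KDD99 training data.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.dummy import DummyClassifier
from sklearn.model_selection import cross_val_score

# Hypothetical pre-processed KDD99 file: numeric features, class label last.
data = np.loadtxt("kdd99_numeric.csv", delimiter=",")
X, y = data[:, :-1], data[:, -1]

models = {
    "J48-like tree": DecisionTreeClassifier(criterion="entropy"),
    "Naive Bayes":   GaussianNB(),
    "ZeroR-like":    DummyClassifier(strategy="most_frequent"),
}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=10)  # 10-fold cross-validation
    print(f"{name}: mean accuracy {scores.mean():.4f}")
```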
TABLE IV. COMPARISON OF THE RESULTS FOR J48, BAYESNET, ONER, NB AND ZEROR WITH 7 ATTRIBUTES

Parameter                        | J48      | BayesNet | OneR     | NB      | ZeroR
Correctly Classified Instances   | 99.890%  | 99.243%  | 97.6761% | 93.569% | 59.9649%
Incorrectly Classified Instances | 0.1099%  | 0.7568%  | 2.3239%  | 6.4302% | 40.0351%
Kappa statistic                  | 0.9978   | 0.9846   | 0.9529   | 0.8708  | 0
Mean absolute error              | 0.0007   | 0.0032   | 0.0093   | 0.0266  | 0.1963
Root mean squared error          | 0.0206   | 0.0488   | 0.0964   | 0.1538  | 0.3133
Relative absolute error          | 0.3358%  | 1.6253%  | 4.7346%  | 13.556% | 100%
Root relative squared error      | 6.5596%  | 15.5687% | 30.7728% | 49.080% | 100%
Coverage of cases (0.95 level)   | 99.91%   | 99.6414% | 97.6761% | 94.700% | 89.6221%

From Tables III and IV it is clear that J48 gave the best performance. We now compare the results of the J48, BayesNet, OneR, NB and ZeroR algorithms: first we compare the results after running the algorithms with all attributes, and then after running them with the reduced set of 7 attributes; only then can we conclude which algorithm is best for intrusion detection.

Figure 1. Comparison of accuracy for J48, BayesNet, OneR, NB and ZeroR.
From Figure 1 it is clear that the information gain feature reduction method gives better accuracy, which is desirable for a good Intrusion Detection System; in the case of J48 in particular, the accuracy is 99.9%. We now compare the TPR of the J48, BayesNet, OneR, NB and ZeroR algorithms with all attributes and with the selected 7 attributes.

Figure 2. TPR comparison of J48, BayesNet, OneR, NB and ZeroR.

For a good IDS the TP rate should be high. Figure 2 shows that the TP rate of the J48 algorithm is higher when we reduce the features of the data set using information gain; in the case of J48 in particular, the TPR is 1. Figures 2 and 3 show the TPR (True Positive Rate) and FPR (False Positive Rate) of the J48, BayesNet, OneR, NB and ZeroR algorithms when run with all attributes of the data set. Figure 2 shows that the TPR of J48 is higher than that of the remaining four algorithms, which is desirable. Figure 3 also shows that the FPR of J48 is almost zero, which is desirable for a good intrusion detection algorithm.
Figure 3. FPR comparison of J48, BayesNet, OneR, NB and ZeroR.

For a good IDS the FPR should be low. Figure 3 shows that the FPR of the J48 algorithm is lower when we reduce the features of the data set using information gain. In the case of J48 in particular, the FPR is 0, while for the BayesNet, OneR, NB and ZeroR algorithms the FPR is greater than 0. From Figures 1, 2 and 3 it is clear that the accuracy, TPR and FPR of the J48 algorithm are better than those of the other four algorithms, so we can say that feature reduction using information gain is the better technique.

Figure 4. Error rate (incorrectly classified instances) comparison of J48, BayesNet, OneR, NB and ZeroR.

The experimental results show that, in the performance evaluation of the five classification models, J48 has much better performance than the other four methods. It is also observed that the overall performance of the J48 classifier increased with the feature reduction method, a notable improvement in classification: the classification accuracy increases after feature selection.
Figure 5. Kappa statistic comparison of J48, BayesNet, OneR, NB and ZeroR.

In this paper, the performance of five well known data mining classifier algorithms, namely J48, BayesNet, OneR, Naive Bayes and ZeroR, is evaluated based on the 10-fold cross validation test. Experimental results using the KDD CUP 99 IDS data set demonstrate that, while J48 is one of the most effective inductive learning algorithms, decision trees are more interesting as far as the detection of new attacks is concerned. From Figures 4 and 5 it is clear from our evaluation that J48 has relatively better detection rates and Kappa statistic; when the incorrectly classified instances are compared, J48 is again found to perform better than the other classifiers.

TABLE V. COMPARISON OF THE RESULTS FOR J48, BAYESNET, ONER, NB AND ZEROR WITH 7 ATTRIBUTES
5. CONCLUSIONS

In this paper we compared the performance measures of five machine learning classifiers: the J48 decision tree, BayesNet, OneR, Naive Bayes and ZeroR. The results were compared, and J48 was found to perform better than the other classifiers with respect to accuracy. We reduced the features of the data set using the information gain of the attributes. This study set out to discover the best classification algorithm for applications of machine learning to intrusion detection. Our simulation results show that, in general, J48 has the highest classification accuracy with the lowest error rate. We also found that feature reduction drastically decreases the learning time of the algorithms while increasing accuracy and TPR. The comparison shows that feature reduction using the information gain technique is well suited to this task. Using WEKA, we analysed five algorithms for their suitability for detecting intrusions in the KDD99 dataset. We showed that machine learning can be effectively applied to detect novel intrusions, and we focused on anomaly detection. The five learning algorithms J48, BayesNet, OneR, Naive Bayes and ZeroR were compared at the task of detecting intrusions. J48, with an accuracy rate of approximately 99%, was found to perform much better at detecting intrusions than BayesNet, OneR, NB and ZeroR. Based on the experiments done in this paper and their corresponding results, we can state the following: the J48 classifier shows better performance for all the classes (Normal, DoS, R2L, U2R, Probe).

REFERENCES

[1] Lida Rashidi, Sattar Hashem and Ali Hamzeh, "Anomaly detection in categorical datasets using bayesian networks," AICI'11: Proceedings of the Third International Conference on Artificial Intelligence and Computational Intelligence, Volume Part II, Springer-Verlag, Berlin, Heidelberg, 2011, pp. 610-619.

[2] Knowledge Discovery in Databases DARPA archive, Task Description, KDD Cup 1999 Data Set, http://www.kdd.ics.uci.edu/databases/kddcup99/task.html

[3] James P. Anderson, "Computer Security Threat Monitoring and Surveillance," Technical Report, James P. Anderson Co., Fort Washington, Pennsylvania, USA, pp. 98-17, April 1980.

[4] Dorothy E. Denning, "An Intrusion Detection Model," IEEE Transactions on Software Engineering (TSE), Volume 13, No. 2, pp. 222-232, February 1987.

[5] Hai Nguyen, Katrin Franke and Slobodan Petrović, "Improving Effectiveness of Intrusion Detection by Correlation Feature Selection," International Conference on Availability, Reliability and Security, pp. 17-24, IEEE 2010.

[6] Meng Jianliang, Shang Haikun, "The application on intrusion detection based on K-Means cluster algorithm," International Forum on Information Technology and Application, 2009.

[7] Zubair A. Baig, Abdulrhman S. Shaheen, and Radwan AbdelAal, "An AODE-based Intrusion Detection System for Computer Networks," pp. 28-35, IEEE 2011.

[8] Mohammadreza Ektefa, Sara Memar, Fatimah Sidi, Lilly Suriani Affendey, "Intrusion Detection Using Data Mining Techniques," Proceedings of the IEEE International Conference on Information Retrieval & Knowledge Management, Exploring Invisible World, CAMP'10, 2010, pp. 200-203.
[9] Jiong Zhang and Mohammad Zulkernine, "Anomaly based Network Intrusion detection with unsupervised outlier detection," School of Computing, Queen's University, Kingston, Ontario, Canada, IEEE International Conference ICC 2006, Volume 9, pp. 2388-2393, 11-15 June 2006.

[10] M. Panda and M. R. Patra, "Network intrusion detection using naive Bayes," International Journal of Computer Science and Network Security (IJCSNS), Volume 7, No. 12, December 2007, pp. 258-263.

[11] Gary Stein, Bing Chen, "Decision Tree Classifier for network intrusion detection with GA-based feature selection," University of Central Florida, ACM-SE 43: Proceedings of the 43rd Annual Southeast Regional Conference, Volume 2, 2005, ACM, New York, USA.

[12] Shai Rubin, Somesh Jha, and Barton P. Miller, "Protomatching Network Traffic for High Throughput Network Intrusion Detection," Proceedings of the 13th ACM Conference on Computer and Communications Security, pp. 47-58, ACM, 2006.

[13] Marco Cova, Davide Balzarotti, Viktoria Felmetsger, and Giovanni Vigna, "Swaddler: An Approach for the Anomaly-Based Detection," Symposium on Recent Advances in Intrusion Detection (RAID), pp. 63-86, Springer, 2007.

[14] Pavel Kachurka, Vladimir Golovko, "Neural Network Approach to Real-Time Network Intrusion Detection and Recognition," The 6th IEEE International Conference on Intelligent Data Acquisition and Advanced Computing Systems: Technology and Application, 15-17 September 2011, pp. 393-397, IEEE 2011.

[15] Cuixiao Zhang, Guobing Zhang, Shanshan Sen, "A mixed unsupervised clustering based Intrusion detection model," Third International Conference on Genetic and Evolutionary Computing, 2009.

[16] M. Skurichina and R.P.W. Duin, "Combining feature subsets in feature selection," Lecture Notes in Computer Science, Vol. 3541, pp. 165-175, Springer Verlag, Berlin, 2005.

[17] Li Min and Wang Dongliang, "Anomaly Intrusion Detection Based on SOM," IEEE WASE International Conference on Information Engineering, IEEE Computer Society, 2009, pp. 40-44.

[18] Mahbod Tavallaee, Ebrahim Bagheri, Wei Lu, and Ali A. Ghorbani, "A Detailed Analysis of the KDD CUP 99 Data Set," Proceedings of the 2009 IEEE Symposium on Computational Intelligence in Security and Defense Applications (CISDA 2009), IEEE 2009.

[19] R. Dogaru, "A modified Naive Bayes classifier for efficient implementations in embedded systems," Signals, Circuits and Systems (ISSCS), IEEE 10th International Symposium, Iasi, June 30 - July 1, 2011, pp. 1-4.

[20] Jiawei Han and Micheline Kamber, "Data Mining: Concepts and Techniques," Second Edition, University of Illinois at Urbana-Champaign, The Morgan Kaufmann Series in Data Management Systems, Elsevier 2007.

[21] Juan Wang, Qiren Yang, Dasen Ren, "An intrusion detection algorithm based on decision tree technology," Asia-Pacific Conference on Information Processing, APCIP 2009, Shenzhen, IEEE, 18-19 July 2009, pp. 333-335.

[22] John Ross Quinlan, "Learning with Continuous Classes," 5th Australian Joint Conference on Artificial Intelligence, Singapore, 1992, pp. 343-348.
[23] Kamarulrifin Abd Jalil and Mohamad Noorman Masrek, "Comparison of Machine Learning Algorithms Performance in Detecting Network Intrusion," IEEE 2010 International Conference on Networking and Information Technology, pp. 221-226.

[24] G. Meera Gandhi, Kumaravel Appavoo, S.K. Srivatsa, "Effective Network Intrusion Detection using Classifiers Decision Trees and Decision Rules," International Journal of Advanced Networking and Applications, Volume 2, Issue 3, pp. 686-692, 2010.

[25] John Ross Quinlan, "C4.5: Programs for Machine Learning," Morgan Kaufmann Publishers, San Mateo, CA, 1993.

[26] C. Blake, E. Keogh and C. Merz, UCI repository of machine learning databases, 1998.

[27] I. H. Witten, E. Frank, "Data Mining: Practical Machine Learning Tools and Techniques," Morgan Kaufmann, San Francisco, Second Edition, 2005.

[28] A.K. Jain, D. Zongker, "Feature Selection: Evaluation, Application, and Small Sample Performance," IEEE Transactions on Pattern Analysis and Machine Intelligence, 19(2), pp. 153-158, 1997.

[29] W. Lee, J. Cabrera, A. Thomas, N. Balwalli, S. Saluja, and Y. Zhang, "Performance adaptation in real-time intrusion detection systems," Recent Advances in Intrusion Detection (RAID), Springer, 2002.

[30] Jiaqi Jiang, Ru Li, Tianhong Zheng, Feiqin Su, Haicheng Li, "A new intrusion detection system using Class and Sample Weighted C-Support Vector Machine," Third International Conference on Communications and Mobile Computing, IEEE Computer Society, 2011, pp. 51-54.

[31] E. T. Ferreira, G. A. Carrijo, R. de Oliveira and N. V. S. Araujo, "Intrusion Detection System with Wavelet and Artificial Neural Network Approach for Network Computers," IEEE Latin America Transactions, Vol. 9, No. 5, September 2011, pp. 832-837.

[32] Yang Zhong, Hirohumi Yamaki, Hiroki Takakura, "A Grid-Based Clustering for Low-Overhead Anomaly Intrusion Detection," IEEE 2011, pp. 17-24.