David C. Wyld, et al. (Eds): CCSEA, SEA, CLOUD, DKMP, CS & IT 05, pp. 359–369, 2012.
© CS & IT-CSCP 2012 DOI : 10.5121/csit.2012.2236
Analysis of Bayes, Neural Network and Tree
Classifier of Classification Technique in Data
Mining using WEKA
Yugal Kumar¹ and G. Sahoo²
¹ Assistant Professor in CSE/IT Dept., Hindu College of Engineering, Industrial
Area, Sonepat, Haryana, India.
yugalkumar.14@gmail.com
² Professor in Dept. of Information Technology, Birla Institute of Technology,
Mesra, Ranchi, Jharkhand, India.
gsahoo@bitmesra.ac.in
ABSTRACT
In today’s world, a gigantic amount of data is available in science, industry, business and many
other areas. This data can provide valuable information that management can use for
making important decisions. The problem is how to find that valuable information. The answer
is data mining. Data mining is a popular topic among researchers, and much of the field
remains unexplored. This paper focuses on a fundamental concept of data
mining, namely classification techniques. In this paper, the BayesNet, NaiveBayes, NaiveBayes
Updatable, Multilayer Perceptron, Voted Perceptron and J48 classifiers are used for the
classification of data sets. The performance of these classifiers is analyzed with the help of
Mean Absolute Error, Root Mean-Squared Error and the time taken to build the model, and the
results are shown statistically as well as graphically. For this purpose the WEKA data mining
tool is used.
KEY TERMS
BayesNet, J48, Mean Absolute Error, NaiveBayes, Root Mean-Squared Error
1. INTRODUCTION
In recent years, electronic data management methods have grown steadily. Every
company, whether large, medium or small, has its own database system that is used for
collecting and managing information, and this information is used in the decision process.
The database of a firm may consist of thousands of instances and hundreds of attributes, so it is
quite difficult to process the data and retrieve meaningful information from the data set in
a short span of time. Researchers and scientists face the same problem of how to process
large data sets for further research. Data mining came into existence to overcome this problem.
Data mining refers to the process of retrieving information from large sets of data. A
number of algorithms and tools have been developed and implemented to retrieve information
and discover knowledge patterns that may be useful for decision support [2]. The term Data
Mining, also known as Knowledge Discovery in Databases (KDD), refers to the nontrivial
extraction of implicit, previously unknown and potentially useful information from data in
databases [1]. Data mining techniques include pattern recognition, clustering, association and
classification [4]. Classification has been identified as an important problem in the emerging
field of data mining [3], as researchers try to find meaningful ways to interpret data sets. Data
mining also raises ethical issues: for example, processing a data set that contains racial,
sexual or religious attributes may lead to discrimination.
360 Computer Science & Information Technology (CS & IT)
2. CLASSIFICATION
Classification is a typical task in data mining. A large number of classifiers
are used to classify data, such as Bayes, function, rule-based and tree classifiers. The goal of
classification is to correctly predict the value of a designated discrete class variable, given a
vector of predictors or attributes [5].
2.1. BayesNet
BayesNet is based on Bayes' theorem. In BayesNet, a conditional probability is calculated for
each node and a Bayesian network is formed. A Bayesian network is a directed acyclic graph.
BayesNet assumes that all attributes are nominal and that there are no missing values; any such
values are replaced globally. Different search algorithms, such as Hill Climbing, Tabu Search,
Simulated Annealing, Genetic Algorithm and K2, are used to learn the network structure for which
the conditional probabilities are estimated. The output of BayesNet can be visualized as a graph.
Figure 1 shows the visualized graph of the BayesNet for a bank data set [9]. The graph is formed
using the children attribute of the bank data set. In this graph, each node holds a probability
distribution table.
Fig. 1 Visualized graph of the BayesNet for a bank data set
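To make the role of the per-node probability tables concrete, the following minimal sketch multiplies the conditional probabilities along a directed acyclic graph to obtain a joint probability. The two-node network, the node names and every number in the tables are hypothetical values chosen for illustration; they are not taken from the bank data set or from WEKA.

```python
# Hypothetical two-node network: children -> loan (names and numbers are illustrative only).
# Each node stores a conditional probability table (CPT) keyed by its parents' values.
parents = {"children": (), "loan": ("children",)}
cpt = {
    "children": {(): {"yes": 0.4, "no": 0.6}},
    "loan": {
        ("yes",): {"approve": 0.3, "reject": 0.7},
        ("no",): {"approve": 0.8, "reject": 0.2},
    },
}


def joint_probability(assignment):
    """Multiply P(node | parents) over all nodes of the DAG."""
    p = 1.0
    for node, node_parents in parents.items():
        parent_values = tuple(assignment[q] for q in node_parents)
        p *= cpt[node][parent_values][assignment[node]]
    return p


def posterior(query_node, evidence):
    """P(query_node | evidence); assumes the evidence assigns every other node."""
    values = next(iter(cpt[query_node].values())).keys()
    scores = {v: joint_probability({**evidence, query_node: v}) for v in values}
    total = sum(scores.values())
    return {v: s / total for v, s in scores.items()}


p_joint = joint_probability({"children": "yes", "loan": "approve"})
p_children = posterior("children", {"loan": "approve"})
```

Here `posterior` simply normalises the joint probabilities, which is enough for a network this small; larger networks need a proper inference procedure.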
2.2. NaiveBayes
NaiveBayes is widely used for classification due to its simplicity, elegance, and robustness.
The name NaiveBayes combines Naive and Bayes. Naive refers to the independence assumption, i.e.
it is valid to multiply probabilities when the events are independent, and Bayes refers to Bayes'
rule. The technique assumes that the attributes are independent of one another given the class,
an assumption that rarely holds exactly in real life. NaiveBayes nevertheless performs well on
actual data sets. Kernel density estimators can be used to estimate the probabilities in
NaiveBayes, which can improve the performance of the model. A large number of
modifications have been introduced by the statistical, data mining, machine learning, and pattern
recognition communities in an attempt to make it more flexible, but one has to recognize that
such modifications are necessarily complications, which detract from its basic simplicity.
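The multiplication of probabilities under the independence assumption can be illustrated with a small sketch for nominal attributes. The toy rows, the attribute values and the use of Laplace smoothing are assumptions made for this example; WEKA's NaiveBayes implementation is more elaborate.

```python
from collections import Counter, defaultdict


def train_naive_bayes(rows, labels):
    """Count class frequencies and per-class attribute-value frequencies."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (attribute index, class) -> value counts
    values_seen = defaultdict(set)       # attribute index -> distinct values
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(i, label)][value] += 1
            values_seen[i].add(value)
    return class_counts, value_counts, values_seen


def predict(model, row):
    """Return the class with the largest prior times product of likelihoods."""
    class_counts, value_counts, values_seen = model
    total = sum(class_counts.values())
    scores = {}
    for label, n in class_counts.items():
        score = n / total  # prior P(class)
        for i, value in enumerate(row):
            # Laplace-smoothed P(value | class); the smoothing is an added assumption
            score *= (value_counts[(i, label)][value] + 1) / (n + len(values_seen[i]))
        scores[label] = score
    return max(scores, key=scores.get)


# Hypothetical toy data: two nominal attributes and a binary class.
rows = [("sunny", "hot"), ("sunny", "mild"), ("rainy", "mild"), ("rainy", "cool")]
labels = ["no", "no", "yes", "yes"]
model = train_naive_bayes(rows, labels)
```

Calling `predict(model, ("sunny", "hot"))` multiplies the prior of each class by the two smoothed likelihoods and picks the larger product.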
2.3. NaiveBayes Updatable
This is the updateable version of NaiveBayes, which supports incremental updates: the model can
be updated one training instance at a time. The classifier uses a default precision of 0.1 for
numeric attributes when buildClassifier is called with zero training instances.
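One way to picture an incremental update for a numeric attribute is a running mean and variance that absorb one value at a time. The sketch below uses Welford's method, which is a substitute chosen for illustration and not a description of WEKA's internal estimator; the class name `RunningGaussian` and the `min_std` floor are hypothetical and are not claimed to correspond to WEKA's precision setting.

```python
import math


class RunningGaussian:
    """Running mean/variance (Welford's method) for one numeric attribute of one class."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # sum of squared deviations from the current mean

    def update(self, x):
        """Fold a single new value into the statistics without revisiting old data."""
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def variance(self):
        """Sample variance; zero until at least two values have been seen."""
        return self.m2 / (self.n - 1) if self.n > 1 else 0.0

    def density(self, x, min_std=0.1):
        # min_std is an illustrative floor that keeps the density finite
        std = max(math.sqrt(self.variance()), min_std)
        z = (x - self.mean) / std
        return math.exp(-0.5 * z * z) / (std * math.sqrt(2 * math.pi))


estimator = RunningGaussian()
for value in [2.0, 4.0, 6.0]:
    estimator.update(value)
```

Because each call to `update` touches only three numbers, the model can keep learning as new instances arrive instead of being rebuilt from the full training set.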
2.4. Multi Layer Perceptron
When the terms neural network and artificial intelligence are used without qualification, they
often refer to the Multi Layer Perceptron. A Multi Layer Perceptron (MLP) is a feedforward
neural network with one or more layers between the input and output layers. The following
diagram illustrates a perceptron network with three layers:
Each neuron in each layer is connected to every neuron in the adjacent layers. The training or
testing vectors are presented to the input layer and processed by the hidden and output layers. A
detailed analysis of multi-layer perceptrons has been presented by Hassoun [11] and by Żak [10].
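The flow of a vector from the input layer through the hidden layer to the output layer can be sketched as a forward pass. The layer sizes, the weights and the sigmoid activation are hypothetical choices for illustration; training by backpropagation is not shown.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def layer_forward(inputs, weights, biases):
    """Fully connected layer: every output neuron sees every input."""
    return [
        sigmoid(sum(w * x for w, x in zip(neuron_weights, inputs)) + b)
        for neuron_weights, b in zip(weights, biases)
    ]


def mlp_forward(inputs, layers):
    """Feed a vector through the layers in order (input -> hidden -> output)."""
    activation = inputs
    for weights, biases in layers:
        activation = layer_forward(activation, weights, biases)
    return activation


# Hypothetical weights: 2 inputs -> 2 hidden neurons -> 1 output neuron.
layers = [
    ([[0.5, -0.5], [0.25, 0.75]], [0.0, 0.0]),  # hidden layer
    ([[1.0, -1.0]], [0.0]),                      # output layer
]
output = mlp_forward([0.0, 0.0], layers)
```

With an all-zero input both hidden neurons output 0.5, so the two output weights cancel and the network returns 0.5.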
2.5. Voted Perceptron
The Voted Perceptron (VP), proposed by Collins, can be viewed as a simplified version of CRF [1],
and it has been suggested that the voted perceptron is preferable in cases of noisy or
inseparable data [3]. The voted perceptron is suited to small-sample analysis and takes advantage
of the boundary data with the largest margin. The voted perceptron method is based on the
perceptron algorithm of Frank Rosenblatt [2].
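A minimal sketch of the voting idea, following the scheme described by Freund and Schapire [12], is given below: every intermediate weight vector is kept together with the number of examples it survived, and prediction is a weighted vote. The toy samples (with a constant first feature acting as a bias) and the number of epochs are assumptions for illustration.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def sign(value):
    return 1 if value >= 0 else -1


def train_voted_perceptron(samples, labels, epochs=10):
    """Return a list of (weight_vector, survival_count) pairs; labels must be -1 or +1."""
    weights = [0.0] * len(samples[0])
    count = 0
    history = []
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            if y * dot(weights, x) <= 0:
                # Mistake: retire the current vector and apply the perceptron update
                history.append((weights, count))
                weights = [w + y * xi for w, xi in zip(weights, x)]
                count = 1
            else:
                # Correct: the current vector survives one more example
                count += 1
    history.append((weights, count))
    return history


def predict(history, x):
    """Each stored vector votes with a weight equal to how long it survived."""
    return sign(sum(c * sign(dot(w, x)) for w, c in history))


# Hypothetical linearly separable data; the leading 1.0 acts as a bias input.
samples = [(1.0, 2.0, 2.0), (1.0, 1.0, 3.0), (1.0, -1.0, -2.0), (1.0, -2.0, -1.0)]
labels = [1, 1, -1, -1]
history = train_voted_perceptron(samples, labels)
```

Vectors that classify many consecutive examples correctly receive a larger say in the vote, which is what makes the method more stable than using the final weight vector alone.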
2.6. J48
J48 is an improved version of the C4.5 algorithm, and can be regarded as an optimized
implementation of C4.5. The output of J48 is a decision tree. A decision tree is a tree structure
with a root node, intermediate nodes and leaf nodes. Each node in the tree contains a decision,
and that decision leads to the result. A decision tree divides the input space of a data set into
mutually exclusive areas, each area having a label, a value or an action to describe its data
points. A splitting criterion is used to calculate which attribute is the best one on which to
split the portion of the training data that reaches a particular node. Fig. 2 shows the decision
tree built by J48 for a bank data set, which decides whether a bank grants a loan to a person or
not. The decision tree is formed using the children attribute of the bank data set.
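The splitting criterion can be illustrated with the gain ratio, the measure associated with C4.5. The sketch below handles nominal attributes only and omits pruning and numeric thresholds, so it should be read as an approximation of the idea rather than as J48 itself. The attribute `married` and all data values are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of a list of nominal values."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def gain_ratio(values, labels):
    """Information gain of splitting on `values`, normalised by the split information."""
    total = len(labels)
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    gain = entropy(labels) - remainder
    split_info = entropy(values)
    return gain / split_info if split_info > 0 else 0.0


def best_attribute(columns, labels):
    """Pick the attribute name whose split has the highest gain ratio."""
    return max(columns, key=lambda name: gain_ratio(columns[name], labels))


# Hypothetical toy data: which attribute best separates the loan decision?
columns = {
    "children": ["yes", "yes", "no", "no"],
    "married": ["yes", "no", "yes", "no"],
}
loan = ["no", "no", "yes", "yes"]
chosen = best_attribute(columns, loan)
```

In this toy data `children` separates the two classes perfectly while `married` carries no information, so the former is chosen for the split.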
Fig. 2 Decision Tree using J48 for Bank Data Set
3. TOOL
The WEKA toolkit is used to analyze the data sets with the data mining algorithms [7]. WEKA is
a collection of tools for data classification, regression, clustering, association rules and
visualization. The toolkit is developed in Java and is open source software issued under the GNU
General Public License [8]. The WEKA tool incorporates four applications:
• Weka Explorer
• Weka Experimenter
• Weka Knowledge Flow
• Simple CLI
For the classification of the data sets, Weka Explorer is used to generate the results and
statistics. Weka Explorer incorporates the following features:
Fig. 3 Preprocessing of data using WEKA
• Preprocess: Used to process the input data. Filters transform the data from one form to
another. Two types of filters are available: supervised and unsupervised.
• Classify: Used for classification. WEKA provides a large number of classifiers, such as
bayes, function, rule, tree and meta classifiers. Four types of test options are available.
• Cluster: Used for clustering the data.
• Associate: Establishes association rules for the data.
• Select attributes: Used to select the most relevant attributes in the data.
• Visualize: Shows an interactive 2D plot of the data.
Data sets used in WEKA are in the Attribute-Relation File Format (ARFF), which consists of
special tags that indicate different things in the data set, such as attribute names, attribute
types, attribute values and the data. This paper uses two data sets, sick.arff and
breast-cancer-wisconsin. The sick.arff data set has been taken from the WEKA tool website, while
the breast cancer data set, a real-world multivariate data set, has been taken from the UCI
repository [7, 9]. The breast cancer data set is in the form of a text file. It is first
converted into .xls format, then from .xls to .csv format, and finally from .csv into .arff
format. The .arff headers of both data sets are given as:-
Sick.arff Data Set:
@relation sick.nm
@attribute age real
@attribute sex {M,F}
@attribute on_thyroxine {f,t}
@attribute query_on_thyroxine {f,t}
@attribute on_antithyroid_medication {f,t}
@attribute sick {f,t}
@attribute pregnant {f,t}
@attribute thyroid_surgery {f,t}
@attribute I131_treatment {f,t}
@attribute query_hypothyroid {f,t}
@attribute query_hyperthyroid {f,t}
@attribute lithium {f,t}
@attribute goitre {f,t}
@attribute tumor {f,t}
@attribute hypopituitary {f,t}
@attribute psych {f,t}
@attribute TSHmeasured {f,t}
@attribute TSH real
@attribute T3measured {f,t}
@attribute T3 real
@attribute TT4measured {f,t}
@attribute TT4 real
@attribute T4Umeasured {f,t}
@attribute T4U real
@attribute FTImeasured {f,t}
@attribute FTI real
@attribute TBGmeasured {f,t}
@attribute TBG real
@attribute referral_source {WEST,STMW,SVHC,SVI,SVHD,other}
@attribute class {sick,negative}
@data
Breast-cancer-wisconsin_data.arff Data Set:
@relation breast-cancer
@attribute age {'10-19','20-29','30-39','40-49','50-59','60-69','70-79','80-89','90-99'}
@attribute menopause {'lt40','ge40','premeno'}
@attribute tumor-size {'0-4','5-9','10-14','15-19','20-24','25-29','30-34','35-39','40-44','45-49','50-54','55-59'}
@attribute inv-nodes {'0-2','3-5','6-8','9-11','12-14','15-17','18-20','21-23','24-26','27-29','30-32','33-35','36-39'}
@attribute node-caps {'yes','no'}
@attribute deg-malig {'1','2','3'}
@attribute breast {'left','right'}
@attribute breast-quad {'left_up','left_low','right_up','right_low','central'}
@attribute 'irradiat' {'yes','no'}
@attribute 'Class' {'no-recurrence-events','recurrence-events'}
@data
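The final .csv to .arff step of the conversion described before the listings can be sketched by hand as follows. This is a minimal illustration that treats every column as nominal and marks missing values with `?`; the function name `csv_to_arff` and the three-column sample are hypothetical, and the paper does not state which converter was actually used.

```python
import csv
import io


def csv_to_arff(csv_text, relation):
    """Convert CSV text with a header row into ARFF text, treating every column as nominal."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]
    lines = [f"@relation {relation}"]
    for i, name in enumerate(header):
        # Collect distinct values in first-seen order; '?' marks a missing value in ARFF
        values = list(dict.fromkeys(row[i] for row in data if row[i] != "?"))
        quoted = ",".join(f"'{v}'" for v in values)
        lines.append(f"@attribute {name} {{{quoted}}}")
    lines.append("@data")
    lines.extend(",".join(f"'{v}'" if v != "?" else "?" for v in row) for row in data)
    return "\n".join(lines)


# Hypothetical three-column excerpt in CSV form.
sample = "age,breast,Class\n40-49,left,no-recurrence-events\n50-59,right,recurrence-events\n"
arff = csv_to_arff(sample, "breast-cancer")
```

Numeric columns such as those in sick.arff would instead need an `@attribute name real` declaration, which this sketch does not attempt to infer.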
4. RESULT & DISCUSSION
In this paper, the following parameters are used to evaluate the performance of the
above-mentioned classification techniques:
• Mean Absolute Error (MAE): A statistical measure of how far an estimate is
from the actual values, i.e. the average of the absolute magnitudes of the individual errors. It
is usually similar in magnitude to, but slightly smaller than, the root mean squared error.
• Root Mean-Squared Error (RMSE): Measures the differences between the values predicted by a
model or estimator and the values actually observed from the thing being modeled or estimated.
RMSE is used to measure accuracy; the smaller it is, the better.
• Time: The amount of time required to build the model.
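The two error measures defined above can be computed as in the following sketch, where MAE is the mean of |predicted − actual| and RMSE is the square root of the mean of (predicted − actual)². The sample values are hypothetical; reading them as predicted class probabilities compared with 0/1 class indicators is an assumption made here for illustration.

```python
import math


def mean_absolute_error(actual, predicted):
    """Average absolute difference between predictions and actual values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def root_mean_squared_error(actual, predicted):
    """Square root of the average squared difference."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


# Hypothetical values: 0/1 class indicators and predicted probabilities.
actual = [1.0, 0.0, 1.0, 0.0]
predicted = [0.9, 0.2, 0.6, 0.1]
mae = mean_absolute_error(actual, predicted)
rmse = root_mean_squared_error(actual, predicted)
```

Because squaring penalises large errors more heavily, RMSE is never smaller than MAE on the same data, which matches the remark in the MAE definition.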
Table 1 Comparison of the different classifiers
Fig. 4 Comparison of Correctly Classified Parameter of Datasets
[Bar chart, vertical axis 0 to 120. Data labels in extraction order:
Correctly Classified Instance 2800: 97.14, 97.28, 92.57, 97.82, 93.64, 99.67
Correctly Classified Instance 286: 72.02, 71.67, 71.67, 64.68, 71.32, 75.52]
Fig. 5 Comparison of Time Taken Parameter of Datasets
[Bar chart, vertical axis 0 to 120. Data labels in extraction order:
Time Taken for 2800 Instance: 0.2, 0.13, 0.03, 110.94, 0.77, 0.3
Time Taken for 286 Instance: 0.13, 0.02, 0, 8.91, 0.03, 0.02]
Fig. 6 Comparison of Mean Absolute Error Parameter
[Bar chart, vertical axis 0 to 0.4. Data labels in extraction order:
Mean Absolute Error for 2800 Instance: 0.047, 0.045, 0.088, 0.026, 0.063, 0.006
Mean Absolute Error for 286 Instance: 0.329, 0.327, 0.327, 0.355, 0.284, 0.367]
Fig. 7 Comparison of Root Mean Squared Error Parameter
[Bar chart, vertical axis 0 to 1. Data labels in extraction order:
Root Mean Squared Error for 2800 Instance: 0.16, 0.15, 0.22, 0.13, 0.25, 0.05
Root Mean Squared Error for 286 Instance: 0.45, 0.45, 0.45, 0.54, 0.53, 0.43]
Table 1 shows the comparison of BayesNet, NaiveBayes, NaiveBayes Updatable, Multilayer
Perceptron, Voted Perceptron and J48. Two data sets are used for the analysis of these
classifiers: the breast cancer data set has 286 instances and 10 attributes, while the sick
data set has 2800 instances and 30 attributes. From Table 1, it is clear that the time taken by
the NaiveBayes Updatable classifier to build the model is the smallest for both data sets, i.e.
0.03 s and 0.0 s, whereas the time taken by the Multilayer Perceptron is the largest. So, in terms
of time taken, the NaiveBayes Updatable classifier is the best among these. In terms of the other
two parameters, i.e. MAE and RMSE, however, the model formed by the J48 classifier is better. The
J48 classifier also classifies the instances more correctly than BayesNet and NaiveBayes. It is
also seen that the performance of the NaiveBayes Updatable and NaiveBayes classifiers is almost
the same when the data set is small.
5. CONCLUSION
In this paper, six different classifiers are used for the classification of data. These
techniques are applied to two data sets, one of which has about one tenth of the instances and
one third of the attributes of the other. The purpose of using two data sets is to analyze the
performance of the discussed classifiers on small as well as large data sets. It is not easy to
say which one is better. For example, the mean absolute error of J48 is the minimum for the
breast cancer data set (i.e. the small data set) but not the minimum for the sick data set (i.e.
the large data set). Overall, however, Table 1 indicates that the performance of the J48
classifier is better than that of the other classifiers.
6. FUTURE WORK
WEKA offers a large number of other classifiers, such as fuzzy rules, REP tree, Random tree,
Gaussian function, regression and so on. Future work will apply these classifiers
to the data sets and analyze their performance. In this
paper, six parameters are used to analyze the performance of the classifiers. In future, the
number of parameters will be increased so that better results can be obtained.
REFERENCES
[1] J. Han and M. Kamber, (2000) “Data Mining: Concepts and Techniques”, Morgan Kaufmann.
[2] K.C. Desouza, (2001) “Artificial intelligence for healthcare management”, in Proceedings of the
First International Conference on Management of Healthcare and Medical Technology, Enschede,
Netherlands, Institute for Healthcare Technology Management.
[3] Rakesh Agrawal, Tomasz Imielinski and Arun Swami, (1993) “Data mining: A Performance
perspective”, IEEE Transactions on Knowledge and Data Engineering, 5(6):914-925.
[4] Ritu Chauhan, Harleen Kaur, M. Afshar Alam, (2010) “Data Clustering Method for Discovering
Clusters in Spatial Cancer Databases”, International Journal of Computer Applications (0975 – 8887),
Volume 10, No. 6.
[5] Daniel Grossman and Pedro Domingos, (2004) “Learning Bayesian Network Classifiers by
Maximizing Conditional Likelihood”, in Proceedings of the 21st International Conference on
Machine Learning, Banff, Canada.
[6] G. Ridgeway, D. Madigan, T. Richardson, (1998) “Interpretable boosted naive Bayes
classification”, in R. Agrawal, P. Stolorz, G. Piatetsky-Shapiro (eds), Proceedings of the Fourth
International Conference on Knowledge Discovery and Data Mining, AAAI Press, Menlo Park, pp. 101–104.
[7] Weka: Data Mining Software in Java, http://www.cs.waikato.ac.nz/ml/weka/
[8] Ian H. Witten and Eibe Frank, (2005) “Data Mining: Practical Machine Learning Tools and
Techniques”, Second Edition, Morgan Kaufmann, San Francisco.
[9] www.ics.uci.edu/~mlearn
[10] S.H. Zak, (2003) “Systems and Control”, NY: Oxford University Press.
[11] M.H. Hassoun, (1999) “Fundamentals of Artificial Neural Networks”, Cambridge, MA: MIT Press.
[12] Yoav Freund, Robert E. Schapire, (1999) “Large Margin Classification Using the Perceptron
Algorithm”, Machine Learning, 37(3).
[13] Yunhua Hu, Hang Li, Yunbo Cao, Li Teng, Dmitriy Meyerzon, Qinghua Zheng, (2006) “Automatic
extraction of titles from general documents using machine learning”, Information Processing and
Management (published by Elsevier), 42, 1276–1293.
[14] Michael Collins and Nigel Duffy, (2002) “New Ranking Algorithms for Parsing and Tagging:
Kernels over Discrete Structures, and the Voted Perceptron”, in Proceedings of the 40th Annual
Meeting of the Association for Computational Linguistics (ACL), Philadelphia, pp. 263-270.
Authors' Biographies
G. Sahoo received his MSc in Mathematics from Utkal University in
1980 and his PhD in the area of Computational Mathematics from the Indian Institute of
Technology, Kharagpur in 1987. He has been associated with Birla
Institute of Technology, Mesra, Ranchi, India since 1988, and is currently
working as a Professor and Head of the Department of Information Technology.
His research interests include theoretical computer science, parallel and
distributed computing, cloud computing, evolutionary computing, information
security, image processing and pattern recognition.
Mr. Yugal Kumar received his B.Tech in Information Technology from
Maharishi Dayanand University, Rohtak, India in 2006 and his M.Tech in
Computer Engineering from Maharishi Dayanand University, Rohtak, India
in 2009. His research interests include fuzzy logic, computer networks,
data mining and swarm intelligence systems. At present, he is
working as an Assistant Professor in the Department of Computer Science and
Engineering, Hindu College of Engineering, Sonepat, Haryana, India.
PROBABILITY BASED CLUSTER EXPANSION OVERSAMPLING TECHNIQUE FOR IMBALANCED DATA
cscpconf
 
CHARACTER AND IMAGE RECOGNITION FOR DATA CATALOGING IN ECOLOGICAL RESEARCH
CHARACTER AND IMAGE RECOGNITION FOR DATA CATALOGING IN ECOLOGICAL RESEARCH
cscpconf
 
SOCIAL MEDIA ANALYTICS FOR SENTIMENT ANALYSIS AND EVENT DETECTION IN SMART CI...
SOCIAL MEDIA ANALYTICS FOR SENTIMENT ANALYSIS AND EVENT DETECTION IN SMART CI...
cscpconf
 
SOCIAL NETWORK HATE SPEECH DETECTION FOR AMHARIC LANGUAGE
SOCIAL NETWORK HATE SPEECH DETECTION FOR AMHARIC LANGUAGE
cscpconf
 
GENERAL REGRESSION NEURAL NETWORK BASED POS TAGGING FOR NEPALI TEXT
GENERAL REGRESSION NEURAL NETWORK BASED POS TAGGING FOR NEPALI TEXT
cscpconf
 
Ad

Recently uploaded (20)

This is why students from these 44 institutions have not received National Se...
This is why students from these 44 institutions have not received National Se...
Kweku Zurek
 
Pests of Maize: An comprehensive overview.pptx
Pests of Maize: An comprehensive overview.pptx
Arshad Shaikh
 
VCE Literature Section A Exam Response Guide
VCE Literature Section A Exam Response Guide
jpinnuck
 
M&A5 Q1 1 differentiate evolving early Philippine conventional and contempora...
M&A5 Q1 1 differentiate evolving early Philippine conventional and contempora...
ErlizaRosete
 
Values Education 10 Quarter 1 Module .pptx
Values Education 10 Quarter 1 Module .pptx
JBPafin
 
ENGLISH-5 Q1 Lesson 1.pptx - Story Elements
ENGLISH-5 Q1 Lesson 1.pptx - Story Elements
Mayvel Nadal
 
English 3 Quarter 1_LEwithLAS_Week 1.pdf
English 3 Quarter 1_LEwithLAS_Week 1.pdf
DeAsisAlyanajaneH
 
A Visual Introduction to the Prophet Jeremiah
A Visual Introduction to the Prophet Jeremiah
Steve Thomason
 
GREAT QUIZ EXCHANGE 2025 - GENERAL QUIZ.pptx
GREAT QUIZ EXCHANGE 2025 - GENERAL QUIZ.pptx
Ronisha Das
 
OBSESSIVE COMPULSIVE DISORDER.pptx IN 5TH SEMESTER B.SC NURSING, 2ND YEAR GNM...
OBSESSIVE COMPULSIVE DISORDER.pptx IN 5TH SEMESTER B.SC NURSING, 2ND YEAR GNM...
parmarjuli1412
 
Paper 107 | From Watchdog to Lapdog: Ishiguro’s Fiction and the Rise of “Godi...
Paper 107 | From Watchdog to Lapdog: Ishiguro’s Fiction and the Rise of “Godi...
Rajdeep Bavaliya
 
YSPH VMOC Special Report - Measles Outbreak Southwest US 6-14-2025.pptx
YSPH VMOC Special Report - Measles Outbreak Southwest US 6-14-2025.pptx
Yale School of Public Health - The Virtual Medical Operations Center (VMOC)
 
Q1_ENGLISH_PPT_WEEK 1 power point grade 3 Quarter 1 week 1
Q1_ENGLISH_PPT_WEEK 1 power point grade 3 Quarter 1 week 1
jutaydeonne
 
Photo chemistry Power Point Presentation
Photo chemistry Power Point Presentation
mprpgcwa2024
 
Code Profiling in Odoo 18 - Odoo 18 Slides
Code Profiling in Odoo 18 - Odoo 18 Slides
Celine George
 
Peer Teaching Observations During School Internship
Peer Teaching Observations During School Internship
AjayaMohanty7
 
University of Ghana Cracks Down on Misconduct: Over 100 Students Sanctioned
University of Ghana Cracks Down on Misconduct: Over 100 Students Sanctioned
Kweku Zurek
 
Aprendendo Arquitetura Framework Salesforce - Dia 02
Aprendendo Arquitetura Framework Salesforce - Dia 02
Mauricio Alexandre Silva
 
Romanticism in Love and Sacrifice An Analysis of Oscar Wilde’s The Nightingal...
Romanticism in Love and Sacrifice An Analysis of Oscar Wilde’s The Nightingal...
KaryanaTantri21
 
LDMMIA Yoga S10 Free Workshop Grad Level
LDMMIA Yoga S10 Free Workshop Grad Level
LDM & Mia eStudios
 
This is why students from these 44 institutions have not received National Se...
This is why students from these 44 institutions have not received National Se...
Kweku Zurek
 
Pests of Maize: An comprehensive overview.pptx
Pests of Maize: An comprehensive overview.pptx
Arshad Shaikh
 
VCE Literature Section A Exam Response Guide
VCE Literature Section A Exam Response Guide
jpinnuck
 
M&A5 Q1 1 differentiate evolving early Philippine conventional and contempora...
M&A5 Q1 1 differentiate evolving early Philippine conventional and contempora...
ErlizaRosete
 
Values Education 10 Quarter 1 Module .pptx
Values Education 10 Quarter 1 Module .pptx
JBPafin
 
ENGLISH-5 Q1 Lesson 1.pptx - Story Elements
ENGLISH-5 Q1 Lesson 1.pptx - Story Elements
Mayvel Nadal
 
English 3 Quarter 1_LEwithLAS_Week 1.pdf
English 3 Quarter 1_LEwithLAS_Week 1.pdf
DeAsisAlyanajaneH
 
A Visual Introduction to the Prophet Jeremiah
A Visual Introduction to the Prophet Jeremiah
Steve Thomason
 
GREAT QUIZ EXCHANGE 2025 - GENERAL QUIZ.pptx
GREAT QUIZ EXCHANGE 2025 - GENERAL QUIZ.pptx
Ronisha Das
 
OBSESSIVE COMPULSIVE DISORDER.pptx IN 5TH SEMESTER B.SC NURSING, 2ND YEAR GNM...
OBSESSIVE COMPULSIVE DISORDER.pptx IN 5TH SEMESTER B.SC NURSING, 2ND YEAR GNM...
parmarjuli1412
 
Paper 107 | From Watchdog to Lapdog: Ishiguro’s Fiction and the Rise of “Godi...
Paper 107 | From Watchdog to Lapdog: Ishiguro’s Fiction and the Rise of “Godi...
Rajdeep Bavaliya
 
Q1_ENGLISH_PPT_WEEK 1 power point grade 3 Quarter 1 week 1
Q1_ENGLISH_PPT_WEEK 1 power point grade 3 Quarter 1 week 1
jutaydeonne
 
Photo chemistry Power Point Presentation
Photo chemistry Power Point Presentation
mprpgcwa2024
 
Code Profiling in Odoo 18 - Odoo 18 Slides
Code Profiling in Odoo 18 - Odoo 18 Slides
Celine George
 
Peer Teaching Observations During School Internship
Peer Teaching Observations During School Internship
AjayaMohanty7
 
University of Ghana Cracks Down on Misconduct: Over 100 Students Sanctioned
University of Ghana Cracks Down on Misconduct: Over 100 Students Sanctioned
Kweku Zurek
 
Aprendendo Arquitetura Framework Salesforce - Dia 02
Aprendendo Arquitetura Framework Salesforce - Dia 02
Mauricio Alexandre Silva
 
Romanticism in Love and Sacrifice An Analysis of Oscar Wilde’s The Nightingal...
Romanticism in Love and Sacrifice An Analysis of Oscar Wilde’s The Nightingal...
KaryanaTantri21
 
LDMMIA Yoga S10 Free Workshop Grad Level
LDMMIA Yoga S10 Free Workshop Grad Level
LDM & Mia eStudios
 

Analysis of Bayes, Neural Network and Tree Classifier of Classification Technique in Data Mining using WEKA

David C. Wyld, et al. (Eds): CCSEA, SEA, CLOUD, DKMP, CS & IT 05, pp. 359–369, 2012.
© CS & IT-CSCP 2012 DOI : 10.5121/csit.2012.2236

Yugal Kumar¹ and G. Sahoo²

¹ Assistant Professor in CSE/IT Dept., Hindu College of Engineering, Industrial Area, Sonepat, Haryana, India.
yugalkumar.14@gmail.com
² Professor in Dept. of Information Technology, Birla Institute of Technology, Mesra, Ranchi, Jharkhand, India.
gsahoo@bitmesra.ac.in

ABSTRACT

In today's world, a gigantic amount of data is available in science, industry, business and many other areas. This data can provide valuable information that management can use to make important decisions. The problem is how to find this valuable information, and the answer is data mining. Data mining is a popular topic among researchers, and much of it remains unexplored. This paper focuses on a fundamental concept of data mining, namely classification techniques. The BayesNet, NaiveBayes, NaiveBayes Updatable, Multilayer Perceptron, Voted Perceptron and J48 classifiers are used to classify the data sets. The performance of these classifiers is analyzed in terms of Mean Absolute Error, Root Mean-Squared Error and the time taken to build the model, and the results are shown statistically as well as graphically. The WEKA data mining tool is used for this purpose.

KEY TERMS

BayesNet, J48, Mean Absolute Error, NaiveBayes, Root Mean-Squared Error

1. INTRODUCTION

In recent years, there has been incremental growth in electronic data management methods. Every company, whether large, medium or small, has its own database system for collecting and managing information, and this information is used in the decision process. The database of any firm consists of thousands of instances and hundreds of attributes. So it is quite difficult to process the data and retrieve meaningful information from the data set in a short span of time. Researchers and scientists face the same problem of how to process large data sets for further research. The term data mining came into existence to overcome this problem. Data mining refers to the process of retrieving information from large sets of data. A number of algorithms and tools have been developed and implemented to retrieve information and discover knowledge patterns that may be useful for decision support [2]. The term Data Mining, also known as Knowledge Discovery in Databases (KDD), refers to the nontrivial extraction of implicit, previously unknown and potentially useful information from data in databases [1]. Data mining techniques include pattern recognition, clustering, association and classification [4]. Classification has been identified as an important problem in the emerging field of data mining [3], as researchers try to find meaningful ways to interpret data sets. Some ethical issues are also related to data mining; for example, processing a data set that contains racial, sexual or religious attributes may lead to discrimination.

2. CLASSIFICATION

Classification of data is a very typical task in data mining. A large number of classifiers are used to classify data, such as Bayes, function, rule-based and tree classifiers. The goal of classification is to correctly predict the value of a designated discrete class variable, given a vector of predictors or attributes [5].

2.1. BayesNet

BayesNet is based on Bayes' theorem. In BayesNet, the conditional probability at each node is calculated and a Bayesian Network is formed. A Bayesian Network is a directed acyclic graph. BayesNet assumes that all attributes are nominal and that there are no missing values; any such values are replaced globally. Different types of algorithms are used to estimate the conditional probability, such as Hill Climbing, Tabu Search, Simulated Annealing, Genetic Algorithm and K2. The output of BayesNet can be visualized as a graph. Figure 1 shows the visualized graph of the BayesNet for a bank data set [9]. The graph is formed by using the children attribute of the bank data set. In this graph, each node contains its probability distribution table.

Fig. 1 Visualize Graph of the BayesNet for a bank data set

2.2. NaiveBayes

NaiveBayes is widely used for classification due to its simplicity, elegance and robustness. The name can be split into Naive and Bayes. Naive stands for independence, i.e. it is valid to multiply probabilities when the events are independent, and Bayes stands for Bayes' rule. The technique assumes that the attributes of a class are independent in real life. NaiveBayes performs better when the data set consists of actual data. Kernel density estimators can be used to estimate the probabilities in NaiveBayes, which improves the performance of the model.
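The independence assumption described above can be sketched in a few lines of Python. This is only an illustrative sketch for nominal attributes, not WEKA's NaiveBayes implementation: the function names, the add-one (Laplace) smoothing and the toy rows are assumptions made for the example.

```python
from collections import Counter, defaultdict
import math

def train_naive_bayes(rows, labels):
    """Count class frequencies and per-class attribute-value frequencies."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)   # (attribute index, class) -> value counts
    values_seen = defaultdict(set)        # attribute index -> distinct values
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(i, label)][value] += 1
            values_seen[i].add(value)
    return class_counts, value_counts, values_seen

def predict(model, row):
    """Return the class with the highest log-probability score for one row."""
    class_counts, value_counts, values_seen = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n in class_counts.items():
        # Naive step: add the log of each attribute's conditional probability,
        # i.e. multiply the probabilities as if the attributes were independent.
        score = math.log(n / total)
        for i, value in enumerate(row):
            count = value_counts[(i, label)][value]
            # Add-one smoothing (an assumption) avoids log(0) for unseen values.
            score += math.log((count + 1) / (n + len(values_seen[i])))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical toy data: (age group, has children) -> loan decision.
rows = [("young", "no"), ("young", "no"), ("old", "yes"),
        ("old", "yes"), ("young", "yes"), ("old", "no")]
labels = ["no", "no", "yes", "yes", "yes", "no"]
model = train_naive_bayes(rows, labels)
print(predict(model, ("old", "yes")))
```

Working in log space keeps the product of many small probabilities from underflowing; the ranking of the classes is the same as with the plain product.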
A large number of modifications have been introduced by the statistical, data mining, machine learning and pattern recognition communities in an attempt to make it more flexible, but one has to recognize that such modifications are necessarily complications, which detract from its basic simplicity.

2.3. NaiveBayes Updatable

This is the updateable version of NaiveBayes, also known as the incremental update version. The classifier uses a default precision of 0.1 for numeric attributes when buildClassifier is called with zero training instances.

2.4. Multi Layer Perceptron

The terms Neural Network and Artificial Intelligence, used without qualification, often refer to the Multi Layer Perceptron. A Multi Layer Perceptron (MLP) is a feedforward neural network with one or more layers between the input and output layers. The following diagram illustrates a perceptron network with three layers:

[Diagram: perceptron network with three layers]

Each neuron in each layer is connected to every neuron in the adjacent layers. The training or testing vectors are presented to the input layer and processed by the hidden and output layers. A detailed analysis of multi-layer perceptrons has been presented by Hassoun [11] and by Żak [10].

2.5. Voted Perceptron

The Voted Perceptron (VP) proposed by Collins can be viewed as a simplified version of CRF [1], and it is suggested that the voted perceptron is preferable in cases of noisy or non-separable data [3]. The voted perceptron is suited to small-sample analysis and takes advantage of the boundary data with the largest margin. The voted perceptron method is based on the perceptron algorithm of Frank Rosenblatt [2].

2.6. J48

J48 is an improved version of the C4.5 algorithm, or can be called an optimized implementation of C4.5. The output of J48 is a decision tree. A decision tree is a tree structure with a root node, intermediate nodes and leaf nodes. Each node in the tree contains a decision, and that decision leads to the result. A decision tree divides the input space of a data set into mutually exclusive areas, each area having a label, a value or an action to describe its data points. A splitting criterion is used to determine which attribute is the best one to split the portion of the training data that reaches a particular node. Fig. 2 shows the decision tree built by J48 for a bank data set, which decides whether a bank provides a loan to a person or not. The decision tree is formed by using the children attribute of the bank data set.
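The splitting criterion mentioned above can be illustrated with a short Python sketch. C4.5 is commonly described as ranking attributes by gain ratio (information gain divided by split information); the code below assumes that criterion and does not reproduce J48 itself. The function names and the toy bank rows are hypothetical.

```python
from collections import Counter
import math

def entropy(items):
    """Shannon entropy (in bits) of a list of nominal values."""
    total = len(items)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(items).values())

def information_gain(rows, labels, attribute):
    """Reduction in class entropy after splitting on one attribute index."""
    partitions = {}
    for row, label in zip(rows, labels):
        partitions.setdefault(row[attribute], []).append(label)
    remainder = sum(len(p) / len(labels) * entropy(p)
                    for p in partitions.values())
    return entropy(labels) - remainder

def gain_ratio(rows, labels, attribute):
    """Information gain normalised by the entropy of the split itself."""
    split_info = entropy([row[attribute] for row in rows])
    if split_info == 0:          # attribute has a single value: no useful split
        return 0.0
    return information_gain(rows, labels, attribute) / split_info

# Hypothetical toy data: (children, income) -> loan decision.
rows = [("yes", "high"), ("yes", "low"), ("no", "high"), ("no", "low")]
labels = ["loan", "loan", "no loan", "no loan"]
best = max(range(2), key=lambda a: gain_ratio(rows, labels, a))
print("best attribute index:", best)
```

In this toy data the first attribute separates the classes perfectly, so it gets the higher gain ratio and would be chosen for the root split.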
Fig. 2 Decision Tree using J48 for Bank Data Set

3. TOOL

The WEKA toolkit is used to analyze the data sets with the data mining algorithms [7]. WEKA is a collection of tools for data classification, regression, clustering, association rules and visualization. The toolkit is developed in Java and is open source software issued under the GNU General Public License [8]. The WEKA tool incorporates four applications:

• Weka Explorer
• Weka Experiment
• Weka Knowledge Flow
• Simple CLI

For the classification of the data sets, Weka Explorer is used to generate the results and statistics. Weka Explorer incorporates the following features:
Fig. 3 Pre process of data using weka

• Preprocess: It is used to process the input data. For this purpose, filters are used that transform the data from one form to another. Two types of filters are used, i.e. supervised and unsupervised.
• Classify: The Classify tab is used for classification. A large number of classifiers are available in Weka, such as bayes, function, rule, tree and meta. Four types of test options are provided within it.
• Cluster: It is used for clustering the data.
• Associate: It establishes association rules for the data.
• Select attributes: It is used to select the most relevant attributes in the data.
• Visualize: It shows an interactive 2D plot of the data.

A data set used in Weka is in the Attribute-Relation File Format (ARFF), which consists of special tags that indicate different things in the data set, such as attribute names, attribute types, attribute values and the data. This paper uses two data sets, sick.arff and breast-cancer-wisconsin. The sick.arff data set has been taken from the Weka tool website, while the breast cancer data set, a real-life multivariate data set, has been taken from the UCI repository [7, 9]. The breast cancer data set is in the form of a text file. It is first converted into the .xls format, then from .xls to .csv, and finally from .csv into the .arff format. The .arff format of both data sets is given as:
Sick.arff Data Set:

@relation sick.nm
@attribute age real
@attribute sex {M,F}
@attribute on_thyroxine {f,t}
@attribute query_on_thyroxine {f,t}
@attribute on_antithyroid_medication {f,t}
@attribute sick {f,t}
@attribute pregnant {f,t}
@attribute thyroid_surgery {f,t}
@attribute I131_treatment {f,t}
@attribute query_hypothyroid {f,t}
@attribute query_hyperthyroid {f,t}
@attribute lithium {f,t}
@attribute goitre {f,t}
@attribute tumor {f,t}
@attribute hypopituitary {f,t}
@attribute psych {f,t}
@attribute TSHmeasured {f,t}
@attribute TSH real
@attribute T3measured {f,t}
@attribute T3 real
@attribute TT4measured {f,t}
@attribute TT4 real
@attribute T4Umeasured {f,t}
@attribute T4U real
@attribute FTImeasured {f,t}
@attribute FTI real
@attribute TBGmeasured {f,t}
@attribute TBG real
@attribute referral_source {WEST,STMW,SVHC,SVI,SVHD,other}
@attribute class {sick,negative}
@data

Breast-cancer-wisconsin_data.arff Data Set:

@relation breast-cancer
@attribute age {'10-19','20-29','30-39','40-49','50-59','60-69','70-79','80-89','90-99'}
@attribute menopause {'lt40','ge40','premeno'}
@attribute tumor-size {'0-4','5-9','10-14','15-19','20-24','25-29','30-34','35-39','40-44','45-49','50-54','55-59'}
@attribute inv-nodes {'0-2','3-5','6-8','9-11','12-14','15-17','18-20','21-23','24-26','27-29','30-32','33-35','36-39'}
@attribute node-caps {'yes','no'}
@attribute deg-malig {'1','2','3'}
@attribute breast {'left','right'}
@attribute breast-quad {'left_up','left_low','right_up','right_low','central'}
@attribute 'irradiat' {'yes','no'}
@attribute 'Class' {'no-recurrence-events','recurrence-events'}
@data
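The last conversion step described earlier (.csv to .arff) can be sketched in Python as follows. This is a minimal illustration rather than the procedure the authors actually used: the function name `csv_to_arff` is hypothetical, every column is assumed to be nominal with no missing values, and the two sample rows are invented for the example.

```python
import csv
import io

def csv_to_arff(csv_text, relation):
    """Build ARFF text from CSV text, treating every column as nominal."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]
    lines = [f"@relation {relation}"]
    for i, name in enumerate(header):
        # The nominal value set is the sorted set of values seen in the column.
        values = sorted({row[i] for row in data})
        quoted = ",".join(f"'{v}'" for v in values)
        lines.append(f"@attribute {name} {{{quoted}}}")
    lines.append("@data")
    for row in data:
        lines.append(",".join(f"'{v}'" for v in row))
    return "\n".join(lines)

# Hypothetical two-row sample using attribute names from the header above.
sample = ("breast,irradiat,Class\n"
          "left,no,no-recurrence-events\n"
          "right,yes,recurrence-events\n")
arff = csv_to_arff(sample, "breast-cancer")
print(arff)
```

Numeric columns such as age or TSH in the sick data set would need a `real` attribute declaration instead of a value list, which this sketch does not attempt.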
4. RESULT & DISCUSSION

In this paper, the following parameters are used to evaluate the performance of the classification techniques mentioned above:

• Mean Absolute Error (MAE): It is a statistical measure of how far an estimate is from the actual values, i.e. the average of the absolute magnitudes of the individual errors. It is usually similar in magnitude to, but slightly smaller than, the root mean squared error.
• Root Mean-Squared Error (RMSE): It is calculated from the differences between the values predicted by a model or estimator and the values actually observed from the thing being modeled or estimated. RMSE is used to measure accuracy; the smaller it is, the better.
• Time: The amount of time required to build the model.

Table 1 Comparison of the different classifiers
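The two error measures defined above can be written out as a short Python sketch. The function names and the toy values are hypothetical, and the paper does not describe how WEKA derives these errors for nominal classes, so the numbers below only illustrate the formulas.

```python
import math

def mean_absolute_error(actual, predicted):
    """Average of the absolute differences between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def root_mean_squared_error(actual, predicted):
    """Square root of the average squared difference."""
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))

# Hypothetical values: 1/0 class indicators and predicted scores.
actual = [1, 0, 1, 1]
predicted = [0.9, 0.2, 0.6, 1.0]
mae = mean_absolute_error(actual, predicted)
rmse = root_mean_squared_error(actual, predicted)
print(round(mae, 3), round(rmse, 3))
```

Because squaring weights large errors more heavily, RMSE is never smaller than MAE on the same data, which matches the remark in the MAE definition.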
Fig. 4 Comparison of Correctly Classified Parameter of Datasets
Correctly Classified Instance, 2800 instances: 97.14, 97.28, 92.57, 97.82, 93.64, 99.67
Correctly Classified Instance, 286 instances: 72.02, 71.67, 71.67, 64.68, 71.32, 75.52

Fig. 5 Comparison of Time Taken Parameter of Datasets
Time Taken for 2800 Instance: 0.2, 0.13, 0.03, 110.94, 0.77, 0.3
Time Taken for 286 Instance: 0.13, 0.02, 0, 8.91, 0.03, 0.02

Fig. 6 Comparison of Mean Absolute Error Parameter
Mean Absolute Error for 2800 Instance: 0.047, 0.045, 0.088, 0.026, 0.063, 0.006
Mean Absolute Error for 286 Instance: 0.329, 0.327, 0.327, 0.355, 0.284, 0.367
Fig. 7 Comparison of Root Mean Squared Error Parameter
Root Mean Squared Error for 2800 Instance: 0.16, 0.15, 0.22, 0.13, 0.25, 0.05
Root Mean Squared Error for 286 Instance: 0.45, 0.45, 0.45, 0.54, 0.53, 0.43

Table 1 shows the comparison of BayesNet, NaiveBayes, NaiveBayes Updatable, Multilayer Perceptron, Voted Perceptron and J48. Two data sets are used to analyze these classifiers: the breast cancer data set has 286 instances and 10 attributes, while the sick data set has 2800 instances and 30 attributes. From Table 1, it is clear that the time taken by the NaiveBayes Updatable classifier to build the model is the smallest for both data sets, i.e. 0.03 s and 0.0 s, whereas the time taken by the Multilayer Perceptron is the largest. So, in terms of time taken, the NaiveBayes Updatable classifier is the best among these. On the other two parameters, MAE and RMSE, the model formed by the J48 classifier is better. The J48 classifier also classifies the instances more correctly than BayesNet and NaiveBayes. It is also seen that the performance of the NaiveBayes Updatable and NaiveBayes classifiers is almost the same when the data set is small.

5. CONCLUSION

In this paper, six different classifiers are used for the classification of data. These techniques are applied to two data sets, one of which has one tenth of the instances and one third of the attributes of the other. The reason for taking two data sets is to analyze the performance of the discussed classifiers on a small as well as a large data set. However, it cannot easily be said which classifier is better. For example, the mean absolute error of J48 is the minimum for the sick data set (the large data set) but not for the breast cancer data set (the small data set), although Table 1 indicates that the overall performance of the J48 classifier is better than that of the other classifiers.

6. FUTURE WORK

Weka provides a large number of classifiers, such as fuzzy rules, REP tree, Random tree, Gaussian Function, Regression and so on. Future work will be based on these classifiers, i.e. applying them to the data sets and analyzing their performance. In this paper, six parameters are used to analyze the performance of the classifiers. In future, the number of parameters will be increased so that better results can be obtained.
REFERENCES

[1] J. Han and M. Kamber, (2000) "Data Mining: Concepts and Techniques", Morgan Kaufmann.
[2] Desouza, K.C., (2001) "Artificial intelligence for healthcare management", In Proceedings of the First International Conference on Management of Healthcare and Medical Technology, Enschede, Netherlands, Institute for Healthcare Technology Management.
[3] Rakesh Agrawal, Tomasz Imielinski and Arun Swami, (1993) "Database Mining: A Performance Perspective", IEEE Transactions on Knowledge and Data Engineering, 5(6):914-925.
[4] Ritu Chauhan, Harleen Kaur, M. Afshar Alam, (2010) "Data Clustering Method for Discovering Clusters in Spatial Cancer Databases", International Journal of Computer Applications (0975 – 8887), Volume 10, No. 6.
[5] Daniel Grossman and Pedro Domingos, (2004) "Learning Bayesian Network Classifiers by Maximizing Conditional Likelihood", In Proceedings of the 21st International Conference on Machine Learning, Banff, Canada.
[6] Ridgeway G., Madigan D., Richardson T., (1998) "Interpretable boosted naive Bayes classification", In: Agrawal R., Stolorz P., Piatetsky-Shapiro G. (eds) Proceedings of the Fourth International Conference on Knowledge Discovery and Data Mining, AAAI Press, Menlo Park, pp. 101–104.
[7] Weka: Data Mining Software in Java, http://www.cs.waikato.ac.nz/ml/weka/
[8] Ian H. Witten and Eibe Frank, (2005) "Data Mining: Practical Machine Learning Tools and Techniques", Second Edition, Morgan Kaufmann, San Francisco.
[9] www.ics.uci.edu/~mlearn
[10] Zak S.H., (2003) "Systems and Control", NY: Oxford University Press.
[11] Hassoun M.H., (1999) "Fundamentals of Artificial Neural Networks", Cambridge, MA: MIT Press.
[12] Yoav Freund, Robert E. Schapire, (1999) "Large Margin Classification Using the Perceptron Algorithm", In: Machine Learning, 37(3).
[13] Yunhua Hu, Hang Li, Yunbo Cao, Li Teng, Dmitriy Meyerzon, Qinghua Zheng, (2006) "Automatic extraction of titles from general documents using machine learning", in Information Processing and Management (published by Elsevier), 42, 1276–1293.
[14] Michael Collins and Nigel Duffy, (2002) "New Ranking Algorithms for Parsing and Tagging: Kernels over Discrete Structures, and the Voted Perceptron", in Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL), Philadelphia, pp. 263-270.
Authors

G. Sahoo received his MSc in Mathematics from Utkal University in 1980 and his PhD in the area of Computational Mathematics from the Indian Institute of Technology, Kharagpur in 1987. He has been associated with Birla Institute of Technology, Mesra, Ranchi, India since 1988, and is currently working as Professor and Head of the Department of Information Technology. His research interests include theoretical computer science, parallel and distributed computing, cloud computing, evolutionary computing, information security, image processing and pattern recognition.

Mr. Yugal Kumar received his B.Tech in Information Technology from Maharishi Dayanand University, Rohtak, India in 2006 and his M.Tech in Computer Engineering from Maharishi Dayanand University, Rohtak, India in 2009. His research interests include fuzzy logic, computer networks, data mining and swarm intelligence systems. At present, he is working as an Assistant Professor in the Department of Computer Science and Engineering, Hindu College of Engineering, Sonepat, Haryana, India.