Analysis of Imbalanced Classification Algorithms A Perspective View

International Journal of Trend in Scientific Research and Development (IJTSRD)
Volume: 3 | Issue: 2 | Jan-Feb 2019 Available Online: www.ijtsrd.com e-ISSN: 2456 - 6470
@ IJTSRD | Unique Reference Paper ID – IJTSRD21574 | Volume – 3 | Issue – 2 | Jan-Feb 2019 Page: 974
Analysis of Imbalanced Classification Algorithms:
A Perspective View
Priyanka Singh1, Prof. Avinash Sharma2
1PG Scholar, 2Assistant Professor
1,2Department of CSE, MITS, Bhopal, Madhya Pradesh, India
ABSTRACT
Classification of data has become an important research area. The process of classifying documentsintopredefined categories
Unbalanced data set, a problem often found in real world application, can cause seriously negative effect on classification
performance of machine learning algorithms. There have been many attemptsatdealingwith classificationofunbalanceddata
sets. In this paper we present a brief review of existing solutions to the class-imbalance problemproposed bothatthedata and
algorithmic levels. Even though a common practice to handle the problem of imbalanced data is to rebalance them artificially
by oversampling and/or under-sampling, some researchers proved that modified support vector machine, rough set based
minority class oriented rule learning methods, cost sensitive classifier perform goodonimbalanceddataset. Weobserved that
current research in imbalance data problem is moving to hybrid algorithms.
Keywords: cost-sensitive learning, imbalanced data set, modified SVM, oversampling, undersampling
I. INTRODUCTION
A data set is called imbalanced if it contains many more
samples from one class than from the rest of theclasses.Data
sets are unbalanced when atleast one class is representedby
only a small number of trainingexamples(calledtheminority
class) while other classes make up the majority. In this
scenario, classifiers can have good accuracy on the majority
class but very poor accuracy on the minority class(es) due to
the influence that the larger majority class has on traditional
training criteria. Most original classification algorithms
pursue to minimize the error rate: the percentage of the
incorrect prediction ofclasslabels.Theyignorethedifference
between types of misclassification errors. In particular, they
implicitly assume that all misclassification errors cost
equally.
In many real-world applications, this assumption is not true.
Thedifferencesbetweendifferentmisclassificationerrorscan
be quite large. For example, in medical diagnosis of a certain
cancer, if the cancer is regarded as the positive class, and
non-cancer (healthy) as negative, then missing a cancer (the
patientis actually positive but is classified asnegative;thusit
is also called ―false negative‖) is much more serious (thus
expensive) than the false-positive error. The patient could
lose his/her life because of the delay in the correct diagnosis
and treatment. Similarly, if carryingabombispositive,thenit
is much more expensive to miss a terrorist who carries a
bomb to a flight than searching an innocent person.
The unbalanced data set problem appearsinmanyrealworld
applications like text categorization, fault detection, fraud
detection, oil-spills detection in satellite images, toxicology,
cultural modeling, medical diagnosis.[1] Many research
papers on imbalanced data sets have commonly agreed that
because of this unequal classdistribution,theperformanceof
the existing classifiers tends to be biased towards the
majority class. The reasons for poor performance of the
existing classificationalgorithmsonimbalanceddatasetsare:
1. They are accuracy driven i.e.,theirgoalistominimizethe
overall error to whichtheminorityclasscontributesvery
little.
2. They assume that there is equal distribution of data for
all the classes.
3. They also assume that the errors coming from different
classes have the same cost[2].
With unbalanced data sets, data mining learning algorithms
produce degenerated models that do not take into account
the minority class as most data mining algorithms assume
balanced data set.
A number of solutions to the class-imbalance problem were
previously proposed both at the data and algorithmic levels
[3]. At the data level, these solutions include many different
forms of re-sampling such as random oversampling with
replacement, randomundersampling,directedoversampling
(in which no new examples are created, but the choice of
samples to replace is informed ratherthanrandom),directed
undersampling (where, again, the choice of examples to
eliminate is informed), oversampling with informed
generation of new samples, and combinations of the above
techniques. At the algorithmic level, solutions include
adjusting the costs of the various classes so as to counter the
class imbalance, adjusting the probabilistic estimate at the
tree leaf (when working with decision trees), adjusting the
decision threshold, and recognition-based(i.e.,learningfrom
one class) rather than discrimination-based (two class)
learning. The most common techniques to deal with
unbalanced data include resizing training datasets, cost-
sensitive classifier, and snowball method. Recently, several
methods have been proposed with good performance on
unbalanced data. These approachesincludemodifiedSVMs,k
nearest neighbor (kNN), neural networks, genetic
programming, rough set based algorithms, probabilistic
decision tree and learning methods. The next sections focus
on some of the method in detail.
II. SAMPLING METHODS
An easy Datalevel methodsfor balancing theclassesconsists
of re-sampling the original data set, either by oversampling
the minority class or by under-sampling the majority class,

International Journal of Trend in Scientific Research and Development (IJTSRD) @ www.ijtsrd.com eISSN: 2456-6470
until the classes are approximatelyequallyrepresented.Both
strategies can be applied in any learning system, since they
act as a preprocessingphase, allowingthe learningsystemto
receive the training instances as if they belonged to a well-
balanced data set. Thus, any bias of the system towards the
majority class due tothedifferentproportionofexamplesper
class would be expected to be suppressed.
Hulse et al. [4] suggest that the utility of the re-sampling
methods depends on a number of factors, including the ratio
between positive and negative examples, other
characteristics of data, and the nature of the classifier.
However, re-sampling methods have shown important
drawbacks. Under-samplingmaythrowoutpotentiallyuseful
data, while over-sampling artificially increases thesizeofthe
data set and consequently, worsens the computational
burden of the learning algorithm.
A. Oversampling
The simplestmethod toincrease thesizeoftheminorityclass
corresponds to random over-sampling, that is, a non-
heuristicmethod thatbalances theclassdistributionthrough
the random replication of positive examples. Nevertheless,
sincethis methodreplicatesexistingexamplesintheminority
class, over fitting is more likely to occur.
Chawla proposed Synthetic Minority Over-sampling
Technique (SMOTE) [5] an over-samplingapproachinwhich
the minority class is over-sampled by creating synthetically
examples rather than by over-sampling with replacement.
The minority class is over-sampled by taking each minority
class sample and introducing synthetic examples along the
line segments joining any/all of the k minority class nearest
neighbors. Depending upon the amount of over-sampling
required, neighbors from the k nearest neighbors are
randomlychosen.FromtheoriginalSMOTEalgorithm,several
modifications have been proposed in the literature. While
SMOTE approach does not handle data sets with all nominal
features, it was generalized to handle mixed datasets of
continuousandnominalfeatures.ChawlaproposeSMOTE-NC
(Synthetic Minority Over-sampling Technique Nominal
Continuous) and SMOTE-N (Synthetic Minority Over-
sampling Technique Nominal), the SMOTE can also be
extended for nominal features.
Andrew Estabrooks et al. proposed a multiple re- sampling
method which selected the most appropriate re-sampling
rate adaptively [6]. Taeho Jo et al. put forward a cluster-
based over-samplingmethodwhichdealtwithbetween-class
imbalance and within-class imbalance simultaneously [7].
Hongyu Guo et al. found out hard examples of the majority
and minority classes dur-ing the process of boosting, then
generated new synthetic examples from hard examples and
add them to the data sets [8].Based on SMOTE method, Hui
Han and Wen-Yuan Wang [9] presented two new minority
over-samplingmethods,borderline-SMOTE1andborderline-
SMOTE2, in which only the minority examples near the
borderline are over- sampled. These approaches achieve
better TP rate and F- value than SMOTE and random over-
sampling methods.
B. Undersampling
Under-samplingisanefficientmethodforclassing-imbalance
learning. This method uses a subset of the majority class to
train the classifier. Since many majority class examples are
ignored, the training set becomes more balanced and the
training process becomes faster. The most common
preprocessing technique israndommajorityunder-sampling
(RUS), IN RUS, Instances of the majority class are randomly
discarded from the dataset.
However, the main drawback of under-sampling is that
potentially useful information contained in these ignored
examples is neglected. There many ways attempts to
improve upon the performance of random sampling, such as
Tomek links, Condensed Nearest Neighbor Rule and One-
sided selection etc. one-sided selection (OSS) is proposed
by Rule Kubat and Matwin attempts to intelligently under-
sample the majority class by removing majority class
examples that are considered either redundant or noisy.‘
Over-sampling is a method for improve minority class
recognition, randomly duplicate the minority data not only
without increase any category of a small number of new
information, but also will lead to over-fitting.
For some problems like fraud detection which is highly
overlapped unbalanced data classification problem, where
non-fraud samples heavily outnumber the fraud samples,T.
Maruthi Padmaja[10]proposedhybridsamplingtechnique, a
combination of SMOTE to over-sample the minority data
(fraud samples) and random undersampling to under-
sample the majoritydata(non-fraudsamples)ifweeliminate
extreme outliers from the minority samples for highly
skewed imbalanced data sets like fraud detection
classification accuracy can be improved.
Sampling methods consider the class skew and properties of
the dataset as a whole. However, machine learning and data
mining often face nontrivial datasets, which often exhibit
characteristics and properties at a local, rather than global
level. It is noted that a classifier improved through global
sampling levels may be insensitive to the peculiarities of
different components or modalities in the data, resulting in a
suboptimal performance. David A. Cieslak, Nitesh V.
Chawla[11] has suggested that for improving classifier
performance sampling can be treated locally, instead of
applying uniform levels of sampling globally. They proposed
a framework which first identifiesmeaningfulregionsofdata
and then proceeds to find optimal sampling levels within
each.
There are known disadvantages associated with the use of
sampling to implement cost-sensitive learning. The
disadvantage with undersampling is that it discards
potentially useful data. The main disadvantage with
oversampling, from our perspective, is that by making exact
copies of existing examples, it makes over fitting likely. In
fact, with oversampling it is quite common for a learner to
generate a classification rule to cover a single, replicated,
example. A second disadvantage of oversampling is that it
increases the number of training examples, thus increasing
the learning time.
Given the disadvantages with sampling, still sampling is a
popular way to deal with imbalanced data rather than a cost-
sensitive learning algorithm. There are several reasons for
this. The most obvious reason is there are not cost- sensitive
implementations of all learning algorithms and therefore a
wrapper-based approach using sampling is the only option.
While this is certainly less true today than in the past, many
learning algorithms (e.g., C4.5) still do not directly handle
costs in the learning process. A second reason for using

sampling is thatmany highly skewed data sets are enormous
and the size of the training set must be reduced in order for
learning to be feasible.
In this case, undersampling seems to be a reasonable, and
valid, strategy. if one needs to discard some training data, it
still might be beneficial to discard some of the majority class
examples in order to reduce the training set size to the
required size, and then alsoemploy a cost- sensitive learning
algorithm, so that the amount of discarded training data is
minimized. A final reason that may have contributed to the
use of sampling rather than a cost-sensitive learning
algorithm is that misclassification costs are often unknown.
However, this is not a valid reason for using sampling over a
cost-sensitive learning algorithm, since the analogous issue
arises with sampling—what should the class distribution of
the final training data be? If this cost information is not
known, a measure such as the area under the ROC curve
could be used to measure classifier performance and both
approaches could then empiricallydeterminethepropercost
ratio/class distribution [12].
III. COST-SENSITIVE LEARNING
At the algorithmic level, solutions include adjusting the costs
of the various classes so as to counter the class imbalance,
adjusting the probabilistic estimate at the tree leaf (when
working with decision trees), adjusting the decision
threshold, and recognition-based (i.e., learning from one
class) rather than discrimination-based (two class) learning.
Cost-Sensitive Learning is a type of learning in data mining
that takes the misclassification costs (and possibly other
types of cost) into consideration. There are many ways to
implement cost sensitive learning, in [13], it is categorized
into three, the first class of techniquesapplymisclassification
costs to the data set as a form of data space weighting, the
second class applies cost-minimizing techniques to the
combination schemesofensemblemethods,andthelastclass
of techniques incorporates cost sensitive features directly
into classification paradigms to essentially fit the cost
sensitive framework into these classifiers.
Incorporating costintodecision tree classification algorithm
which is one of the most widely used and simple classifier.
Cost can be incorporated into it in various ways. First way is
cost can be applied to adjust the decision threshold, second
way is cost can be used in splitting attribute selection during
decision treeconstruction and theother way is costsensitive
pruningschemes can be applied to the tree. Ref.[14]propose
a method for building and testing decision trees that
minimizes total sum of the misclassification and test costs.
The algorithm used by them chooses an splitting attribute
that minimizes the total cost, the sum of the test cost and the
misclassification cost rather than choosing an attribute that
minimizes the entropy. Information gain, Gini measures are
considered to be skew sensitive [15]. In Ref. [16] a new
decision tree algorithm called Class Confidence Proportion
Decision Tree (CCPDT) is proposed which is robust and
insensitive to size of classes and generates rules which are
statistically significant. Ref. [17] analytically and empirically
demonstrates the strong skew insensitivity of Hellinger
Distance and its advantages over popularalternativemetrics.
They arrived at a conclusion that for imbalanced data it is
sufficient to use Hellinger trees with bagging without any
sampling methods. Ref. [18] uses different operators of
Genetic algorithms for oversampling to enlarge the ratio of
positive samples and then apply clustering to the
oversampled training data set as adata clearning method for
both classes, removing the redundant or noisysamples.They
used AUC as evaluation metricand foundthattheiralgorithm
performed better.
Nguyen ha vo, Yonggwan won[19] extended Regularized
Least Square(RLS) algorithm that penalizes errors of
different samples with different weights and some rules of
thumb to determine those weights. The significantly better
classification accuracy of weighted RLS classifiers showed
that it is promising substitution of other previous cost-
sensitiveclassification methodsfor unbalanceddataset.This
approach is equivalent to up- sampling or down-sampling
depending on the cost we choose. For example, doubling the
cost-sensitivity of one class is said to be equivalent to
doubling the number of samples in that class.
Ref[20] proposed a novel approach reducing each within
group error, BABoostthat is a variant of AdaBoost. Adaboost
algorithm gives equal weight to each misclassified example.
But the misclassification error of each class is not same.
Generally, the misclassificationerroroftheminorityclasswill
larger than themajority‘s. SoAdaboostalgorithm will lead to
higher bias and smaller margin when encountering skew
distribution. BABoost algorithm in each round of boosting
assigns more weights to the misclassified examples,
especially those in the minority class.
Yanmin Sun a and Mohamed S. Kamel[21] explored three
cost-sensitive boosting algorithms, which are developed by
introducing cost items into the learning framework of
AdaBoost. These boosting algorithms are also studied with
respect to their weighting strategies towards different types
of samples, and their effectiveness in identifying rare cases
through experiments on several real worldmedicaldatasets,
where the class imbalance problem prevails.
IV. SVM AND IMBALANCED DATASETS
The success of SVM is very limited when it is applied to the
problem of learning from imbalanced datasets in which
negative instances heavily outnumber the positiveinstances.
Even though undersamplingthe majorityclass doesimprove
SVM performance, there is an inherent loss of valuable
information in this process. Rehan Akbani[22]combined
sampling and cost sensitive learning for improving
performance of SVM. Their algorithm is based on a variantof
the SMOTE algorithm by Chawla et al, combined with
Veropoulos et al‘s different error costs algorithm.
TAO Xiao-yan[23] presented A modified proximal support
vector machine (MPSVM) which assigns different penalty
coefficients to the positive and negative samplesrespectively
by adding a new diagonal matrix in the primal optimization
problem. And further the decision function is obtained. The
real-coded immune clone algorithm (RICA) is employed to
select the global optimal parameters to get the high
generalization performance.
M. Muntean 1 and H. Vălean[24] provided the Enhancer, a
viable algorithm for improving the SVM classification of
unbalanced datasets. They improve the Cost-sensitive
classification for Support Vector Machines, by multiplying in
the training step the instances of the underrepresented
classes.

Yuchun Tang and nitesh chawla[25] also implemented and
rigorouslyevaluated four SVM modelingtechniquesSVM can
be effective if incorporate different ―rebalance‖heuristics in
SVM modeling, including cost-sensitive learning, and over
and under sampling.
Geneticprogramming(GP)canevolvebiasedclassifierswhen
data sets are unbalanced. The cost sensitive learning uses
cost adjustment within the learningalgorithmtofactorinthe
uneven distribution of class examples in the original
(unmodified) unbalanced data set, during the training
process. In GP, cost adjustment can be enforced by adapting
the fitness function. Here, solutions with good classification
accuracy on both classes are rewarded with better fitness,
while those that are biased toward one class only are
penalized with poor fitness.
Common techniques include using fixed misclassification
costs for minority and majority class examples [26], [27], or
improved performance criteria such as the area under the
receiver operating characteristic (ROC) curve (AUC) [28], in
the fitness function. While these techniques have
substantially improved minority class performances in
evolved classifiers, they can incur both a tradeoff in majority
class accuracy and, thus, a loss in overall classificationability,
and long training times duetothe computationaloverheadin
evaluating these improved fitness measures. In addition,
these approaches can be problem specific, i.e., ﬁtness
functions are handcrafted for a particular problem domain
only.
V. HYBRID ALGORITHMS
The EasyEnsemble classifierisanunder-samplingalgorithm,
which independently samples several subsets from negative
examples and one classifier is built for each subset. All
generated classifiers arethencombinedfor thefinaldecision
by using Adaboost. In imbalanced problems, some features
are redundant and even irrelevant; these features will hurt
thegeneralizationperformanceoflearningmachines.Feature
selection, a process of choosing a subset of features from the
original ones, isfrequentlyusedasapreprocessingtechnique
in analysis of data. It has been proved effective in reducing
dimensionality, improving mining efficiency, increasing
mining accuracy and enhancing result comprehensibility.
Ref[29] combined the feature selection method with Easy
Ensemble in order to improve the accuracy.
In ref[30] a hybrid algorithm based on random over-
sampling, decision tree (DT), particle swarm optimization
(PSO) and feature selection is proposed to classify
unbalanced data. The proposed algorithm has the ability to
select beneficial feature subsets, automatically adjust values
of parameter and obtain the bestclassification accuracy. The
zoo dataset is used to test the performance. From simulation
results, the classification accuracy ofthisproposedalgorithm
outperforms other existing methods
Decision trees,supplementedwithsamplingtechniques,have
proven to be an effectiveway to address the imbalanceddata
problem. Despite their effectiveness, however, sampling
methods add complexity and the need for parameter
selection. To bypass these difficulties a new decision tree
technique called Hellinger Distance Decision Trees (HDDT)
which uses Hellinger distance as the splitting criterion is
suggested in ref[17]. Theytook advantageofthestrongskew
insensitivity of Hellinger distance and its advantages over
popular alternatives such as entropy (gain ratio). For
imbalanced data it is sufficient to use Hellinger trees with
bagging without any sampling methods.
VI. CONCLUSION
This paper provides an overview of the classification of
imbalanced data sets. At data level, sampling is the most
common approach to deal with imbalanced data. Over-
sampling clearly appears as better than under-sampling for
local classifiers, whereas some under-sampling strategies
outperform over-sampling when employing classifiers with
global learning. Researchers proved that Hybrid sampling
techniques can perform better than just oversampling or
under sampling. At the algorithmic level, solutions include
adjusting the costs of the various classes so as to counter the
class imbalance, adjusting the probabilistic estimate at the
tree leaf (when working with decision trees), adjusting the
decision threshold, and recognition-based(i.e.,learningfrom
one class) rather than discrimination-based (two class)
learning. Solutions based on modified support vector
machine, rough set based minority class oriented rule
learning methods, cost sensitive classifier are also proposed
to deal with unbalanced data. There areof coursemanyother
worthwhile research possibilities that are not included here.
DevelopingClassifierswhicharerobustandskew-insensitive
or hybrid algorithms can be point of interest for the future
research in imbalanced dataset.
REFERENCE
[1] Miho Ohsaki, Peng Wang, Kenji Matsuda, Shigeru
Katagiri, Hideyuki Watanabe, and Anca Ralescu,
“Confusion-matrix-based Kernel Logistic Regression
for Imbalanced Data Classification”, IEEE Transactions
on Knowledge and Data Engineering, 2017.
[2] Alberto Fernández, Sara del Río, Nitesh V. Chawla,
Francisco Herrera, “An insight into imbalanced Big
Data classification: outcomes and challenges”,Springer
journal of bigdata, 2017.
[3] Vaibhav P. Vasani1, Rajendra D. Gawali, “Classification
and performance evaluation using data mining
algorithms”, International Journal of Innovative
Research in Science, Engineering and Technology,
2014.
[4] Kaile Su, Huijing Huang, Xindong Wu, Shichao Zhang,
“Rough Sets for FeatureSelectionand Classification:An
Overview with Applications”, International Journal of
Recent Technology and Engineering (IJRTE) ISSN:
2277-3878, Volume-3, Issue-5, November 2014.
[5] Senzhang Wang, Zhoujun Li, Wenhan Chao and
Qinghua Cao, “Applying Adaptive Over-sampling
Technique Based on Data Density and Cost-Sensitive
SVM to Imbalanced Learning”,IEEE World Congresson
Computational Intelligence June, 2012.
[6] Mikel Galar, Alberto Fernandez, Edurne Barrenechea,
Humberto Bustince and Francisco Herrera, “A Review
on Ensembles for the Class Imbalance Problem:
Bagging, Boosting, and Hybrid-Based Approaches”,
IEEE Transactions on Systems, Man and Cybernetics—
Part C: Applications and Reviews, Vol. 42, No. 4, July
2012.
[7] Nada M. A. Al Salami, “Mining High Speed Data
Streams”. UbiCC Journal, 2011.

[8] Dian Palupi Rini, Siti Mariyam Shamsuddin and Siti
Sophiyati, “Particle Swarm Optimization: Technique,
System and Challenges”, International Journal of
Computer Applications (0975 – 8887) Volume 14–
No.1, January 2011.
[9] Amit Saxena, Leeladhar Kumar Gavel, Madan Madhaw
Shrivas, “Online Streaming Feature Selection”, 27th
International Conference on Machine Learning, 2010.
[10] Yuchun Tang, Member, Yan-Qing Zhang, Nitesh V.
Chawla and Sven Krasser, “SVMs Modeling for Highly
Imbalanced Classification”, IEEE Transaction on
Systems, Man and Cybernetics,Vol.39, NO.1,Feb2009.
[11] Haibo He and Edwardo A. Garcia, “Learning from
Imbalanced Data”, IEEE Transactions on Knowledge
and Data Engineering, September 2009.
[12] Thair Nu Phyu, “Survey of Classification Techniques in
Data Mining”, International Multi Conference of
Engineers and Computer Scientists, IMECS 2009,
March, 2009.
[13] Haibo He, Yang Bai, Edwardo A. Garcia and Shutao Li,
“ADASYN: Adaptive Synthetic Sampling Approach for
Imbalanced Learning”, IEEE Transaction of Data
Mining, 2009.
[14] Swagatam Das, Ajith Abraham and Amit Konar,
“Particle Swarm Optimization and Differential
Evolution Algorithms: TechnicalAnalysis, Applications
and Hybridization Perspectives”, Springer journal on
knowledge engineering, 2008.
[15] “A logical framework for identifyingqualityknowledge
from different data sources”, International Conference
on Decision Support Systems, 2006.
[16] “Database classification for multi-database mining”,
International Conferenceon DecisionSupportSystems,
2005.
[17] Volker Roth, “Probabilistic Discriminative Kernel
Classifiers for Multi-class Problems”, Springer-Verlag
journal, 2001.
[18] R. Chen, K. Sivakumar and H. Kargupta “Collective
Mining of Bayesian Networks from Distributed
Heterogeneous Data”, Kluwer Academic Publishers,
2001.
[19] Shigeru Katagiri, Biing-Hwang Juang and Chin-HuiLee,
“Pattern Recognition Using a Family of Design
Algorithms Based Upon the Generalized Probabilistic
Descent Method”, IEEE Journal of Data Minig, 1998.
[20] I. Katakis, G. Tsoumakas, and I. Vlahavas. Tracking
recurring contexts using ensemble classifiers: an
application to email filtering. Knowledge and
Information Systems, Pp 371–391, 2010.
[21] J. Kolter and M. Maloof. Using additive expert
ensembles to cope with concept drift. In Proc. ICML,Pp
449–456, 2005.
[22] D. D. Lewis, Y. Yang, T. Rose, and F. Li. Rcv1: A new
benchmark collection for text categorization research.
Journal of Machine Learning Research, Pp 361–397,
2004.
[23] X. Li, P. S. Yu, B. Liu, and S.-K. Ng. Positive unlabeled
learning for data stream classification. In Proc.SDM,Pp
257–268, 2009.
[24] M. M. Masud, Q. Chen, J. Gao, L. Khan, J. Han, and B. M.
Thuraisingham. Classificationand novel classdetection
of data streams in a dynamic feature space. In Proc.
ECML PKDD, volume II, Pp 337–352, 2010.
[25] P. Zhang, X. Zhu, J. Tan, and L. Guo, “Classifier and
Cluster Ensembles for Mining Concept Drifting Data
Streams,” Proc. 10th Int’l Conf. Data Mining, 2010.
[26] X. Zhu, P. Zhang, X. Lin, and Y. Shi, “Active Learning
from Stream Data Using Optimal Weight Classifier
Ensemble,” IEEE Trans. Systems,Man, CyberneticsPart
B, vol. 40, no. 6, Pp 1607- 1621, Dec. 2010.
[27] Q. Zhang, J. Liu, and W. Wang, “Incremental Subspace
Clustering over Multiple Data Streams,” Proc. Seventh
Int’l Conf. Data Mining, 2007.
[28] Q. Zhang, J. Liu, and W. Wang, “Approximate Clustering
on Distributed Data Streams,” Proc. 24th Int’l Conf.
Data Eng., 2008.
[29] C. C. Aggarwal. On classification and segmentation of
massive audio data streams. Knowl. and Info. Sys., Pp
137–156, July 2009.
[30] C. C. Aggarwal, J. Han, J. Wang, and P. S. Yu. A
framework for on-demand classification of evolving
data streams. IEEE Trans. Knowl. Data Eng, Pp 577–
589, 2006.
[31] A. Bifet, G. Holmes, B. Pfahringer, R. Kirkby, and R.
Gavald. New ensemble methods for evolving data
streams. In Proc. SIGKDD, Pp 139–148, 2009.
[32] S. Chen, H. Wang, S. Zhou, and P. Yu. Stop chasing
trends: Discovering highorder modelsin evolvingdata.
In Proc. ICDE, Pp 923–932, 2008.
[33] P. Zhang, X. Zhu, and L. Guo. Mining data streams with
labeled and unlabeled training examples. In Proc.
ICDM, Pp 627–636, 2009.
[34] O. R. Terrades, E. Valveny, and S. Tabbone, “Optimal
classifier fusion in a non-Bayesian probabilistic
framework,” IEEE Trans.Pattern Anal.Mach.Intell., vol.
31, no. 9, Pp 1630–1644, Sep. 2009.

Analysis of Imbalanced Classification Algorithms A Perspective View

More Related Content

What's hot (20)

Similar to Analysis of Imbalanced Classification Algorithms A Perspective View (20)

More from ijtsrd (20)

Recently uploaded (20)

Analysis of Imbalanced Classification Algorithms A Perspective View