MULTICLASS CLASSIFICATION OF IMBALANCED DATA
SAURABH WANI
27/04/2019
What are Imbalanced Datasets?
Why do we need to work with Imbalanced Datasets?
Applications of Handling Imbalanced Data
• Terrorist Identification
• Credit Card Fraud Detection
• Rare Disease Identification
• Anomaly Detection
The MYTH of Accuracy
Confusion Matrix
                   Predicted Positive   Predicted Negative
Actual Positive    TP                   FN
Actual Negative    FP                   TN
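$$\text{Acc} = \frac{TP + TN}{TP + FP + TN + FN}$$

$$\text{Precision} = \frac{TP}{TP + FP}$$

$$\text{Recall} = \frac{TP}{TP + FN}$$

$$F_1 = 2 \cdot \frac{P \cdot R}{P + R}$$

where $P$ is precision and $R$ is recall.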
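The accuracy myth can be made concrete with a small pure-Python sketch (no libraries assumed) that computes the formulas above one-vs-rest for each class. The labels, the 90/7/3 class split and the helper names are hypothetical. A classifier that always predicts the majority class scores 90% accuracy but has zero precision, recall and F1 on both minority classes; averaging F1 over classes (macro-F1, one common multiclass choice) makes the failure visible.

```python
def confusion_counts(y_true, y_pred, positive):
    """Return (TP, FP, FN, TN), treating `positive` as the positive class (one-vs-rest)."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1
            else:
                fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def precision_recall_f1(y_true, y_pred, positive):
    tp, fp, fn, _ = confusion_counts(y_true, y_pred, positive)
    # Convention: a ratio with a zero denominator is reported as 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    classes = sorted(set(y_true) | set(y_pred))
    return sum(precision_recall_f1(y_true, y_pred, c)[2] for c in classes) / len(classes)


# Hypothetical 3-class data: 90 "normal", 7 "fraud", 3 "error" samples
y_true = ["normal"] * 90 + ["fraud"] * 7 + ["error"] * 3
# A useless model that always predicts the majority class
y_pred = ["normal"] * len(y_true)

print(accuracy(y_true, y_pred))                      # 0.9
print(precision_recall_f1(y_true, y_pred, "fraud"))  # (0.0, 0.0, 0.0)
print(round(macro_f1(y_true, y_pred), 3))            # 0.316
```

In practice, scikit-learn's `f1_score(y_true, y_pred, average="macro")` computes the same kind of macro average.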
Overfitting and Underfitting
Handling Imbalanced Data
Resampling Techniques
1. Undersampling
2. Oversampling
3. Clustering-based resampling
4. Sample Synthesis

Algorithmic Ensemble Techniques
1. Bootstrap Aggregating (Bagging)
2. Boosting
3. AdaBoost
Resampling Techniques
Undersampling and Oversampling
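A minimal sketch of random undersampling and oversampling for multiclass labels, assuming features and labels are held in plain Python lists. Undersampling draws every class down to the size of the smallest one without replacement; oversampling duplicates randomly chosen samples (with replacement) until every class matches the largest one. The function names and data are hypothetical; imbalanced-learn provides tested `RandomUnderSampler` and `RandomOverSampler` classes.

```python
import random
from collections import Counter, defaultdict


def group_by_class(X, y):
    groups = defaultdict(list)
    for xi, yi in zip(X, y):
        groups[yi].append(xi)
    return groups


def random_undersample(X, y, seed=0):
    """Shrink every class to the size of the smallest class (sampling without replacement)."""
    rng = random.Random(seed)
    groups = group_by_class(X, y)
    n_min = min(len(samples) for samples in groups.values())
    X_out, y_out = [], []
    for label, samples in groups.items():
        kept = rng.sample(samples, n_min)
        X_out.extend(kept)
        y_out.extend([label] * n_min)
    return X_out, y_out


def random_oversample(X, y, seed=0):
    """Grow every class to the size of the largest class by duplicating random samples."""
    rng = random.Random(seed)
    groups = group_by_class(X, y)
    n_max = max(len(samples) for samples in groups.values())
    X_out, y_out = [], []
    for label, samples in groups.items():
        extra = [rng.choice(samples) for _ in range(n_max - len(samples))]  # with replacement
        X_out.extend(samples + extra)
        y_out.extend([label] * n_max)
    return X_out, y_out


# Hypothetical data: 80 / 15 / 5 samples in classes "a", "b", "c"
X = list(range(100))
y = ["a"] * 80 + ["b"] * 15 + ["c"] * 5

X_under, y_under = random_undersample(X, y)
X_over, y_over = random_oversample(X, y)
print(Counter(y_under))  # each class now has 5 samples
print(Counter(y_over))   # each class now has 80 samples
```

Undersampling discards majority-class information, while plain oversampling only repeats existing points, which can encourage overfitting.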
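Clustering-based Undersampling
Cluster the majority class and replace its samples with the cluster centroids, creating a smaller majority set.
• Can cause accuracy loss on negative (majority-class) cases, since individual samples are discarded
• Centroids depend on random initialization, so results can vary between runs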
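A minimal sketch of clustering-based undersampling, assuming numeric feature tuples and plain k-means (Lloyd's algorithm) as the clustering method; the slide does not name one. The randomly chosen initial centroids are why results can vary from run to run. `cluster_centroid_undersample`, `kmeans` and the toy data are hypothetical; imbalanced-learn ships a comparable `ClusterCentroids` sampler.

```python
import random


def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, n_iter=20, seed=0):
    """Plain Lloyd's k-means; the random initial centroids mean results can differ between seeds."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(n_iter):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: sq_dist(p, centroids[j]))
            clusters[nearest].append(p)
        for j, members in enumerate(clusters):
            if members:  # keep the previous centroid if a cluster becomes empty
                centroids[j] = tuple(sum(coord) / len(members) for coord in zip(*members))
    return centroids


def cluster_centroid_undersample(X, y, majority_label, k, seed=0):
    """Replace the majority-class samples by k cluster centroids; keep all other samples."""
    majority = [x for x, t in zip(X, y) if t == majority_label]
    others = [(x, t) for x, t in zip(X, y) if t != majority_label]
    centroids = kmeans(majority, k, seed=seed)
    X_out = centroids + [x for x, _ in others]
    y_out = [majority_label] * k + [t for _, t in others]
    return X_out, y_out


# Hypothetical 2-D data: 6 majority points in two tight groups, 2 minority points
X = [(0, 0), (0, 1), (1, 0), (20, 20), (20, 21), (21, 20), (10, 9), (9, 10)]
y = ["neg"] * 6 + ["pos"] * 2

X_new, y_new = cluster_centroid_undersample(X, y, majority_label="neg", k=2)
print(sorted(X_new[:2]))  # approximately [(0.33, 0.33), (20.33, 20.33)]
print(y_new)              # ['neg', 'neg', 'pos', 'pos']
```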
Sample Synthesis
SMOTE and ADASYN
SMOTE (Synthetic Minority Oversampling Technique)
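SMOTE creates new minority samples instead of copying existing ones: it picks a minority sample, picks one of its k nearest minority-class neighbours, and places a synthetic point at a random position on the line segment between them. The pure-Python sketch below shows only that core step, assuming numeric feature tuples and Euclidean distance; `smote`, `k=3` and the data are hypothetical. ADASYN builds on the same interpolation idea but generates more synthetic points around minority samples that are harder to learn.

```python
import random


def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def smote(minority, n_synthetic, k=5, seed=0):
    """Generate synthetic minority points by interpolating between a sample and one of its
    k nearest minority-class neighbours (the core SMOTE idea)."""
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        x = minority[i]
        # k nearest neighbours of x among the other minority samples
        others = [minority[j] for j in range(len(minority)) if j != i]
        others.sort(key=lambda p: sq_dist(p, x))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # uniform in [0, 1): position along the segment x -> neighbour
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, neighbour)))
    return synthetic


# Hypothetical minority class with five 2-D samples
minority = [(1.0, 1.0), (2.0, 1.0), (1.0, 2.0), (2.0, 2.0), (1.5, 1.5)]
new_points = smote(minority, n_synthetic=10, k=3)
print(len(new_points))  # 10
```

For real projects, imbalanced-learn provides `SMOTE` and `ADASYN` samplers.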
Algorithmic Ensemble Techniques
Bagging (Bootstrap Aggregating)
Bootstrapping
The method of randomly drawing k samples, with replacement, out of the n available samples to form a subset
• Can be used for both regression and classification
• Can increase accuracy by reducing variance
• Reduces the problem of overfitting
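The weak learners (models) are chosen so that each one specializes in predictions based on a different part of the data: each is trained on its own bootstrap sample (and, in variants such as random forests, on a random subset of the feature space). Their predictions are then combined, by majority vote for classification or by averaging for regression.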
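A minimal sketch of bagging, assuming decision stumps (single-split trees) as the weak learners and majority voting to combine them; both are common choices but are not specified on the slide. Each stump is trained on its own bootstrap sample of n draws with replacement. The function names and toy data are hypothetical.

```python
import random
from collections import Counter


def fit_stump(X, y):
    """Single-split decision tree: pick the (feature, threshold) with the fewest training
    errors; each side predicts its majority label."""
    best = None
    for f in range(len(X[0])):
        for thr in sorted({x[f] for x in X}):
            left = [t for x, t in zip(X, y) if x[f] <= thr]
            right = [t for x, t in zip(X, y) if x[f] > thr]
            left_label = Counter(left).most_common(1)[0][0]
            right_label = Counter(right).most_common(1)[0][0] if right else left_label
            errors = sum(t != left_label for t in left) + sum(t != right_label for t in right)
            if best is None or errors < best[0]:
                best = (errors, f, thr, left_label, right_label)
    _, f, thr, left_label, right_label = best
    return lambda x: left_label if x[f] <= thr else right_label


def bagging_fit(X, y, n_estimators=25, seed=0):
    rng = random.Random(seed)
    n = len(X)
    models = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap: n draws with replacement
        models.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))
    return models


def bagging_predict(models, x):
    """Aggregate the stumps' predictions by majority vote."""
    return Counter(m(x) for m in models).most_common(1)[0][0]


# Hypothetical 1-D data: class "a" below 5, class "b" from 5 upwards
X = [(v,) for v in range(10)]
y = ["a"] * 5 + ["b"] * 5
models = bagging_fit(X, y)
print(bagging_predict(models, (0,)), bagging_predict(models, (9,)))  # a b
```

Because every stump sees a slightly different bootstrap sample, their individual errors partly cancel in the vote, which is where the variance reduction comes from.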
Boosting
The term ‘Boosting’ refers to a family of algorithms that convert weak learners into a strong learner. Boosting is an ensemble method for improving the predictions of a given learning algorithm: weak learners are trained sequentially, each one trying to correct the errors of its predecessor.
AdaBoost (Adaptive Boosting)
It focuses on classification problems and aims to convert a set of weak classifiers into a strong one.
1. Decision ‘stumps’ (trees with a single split) are almost always used as the weak learners
2. Each sample is assigned a weight (equal for all samples at the start)
3. After each round, weights are updated according to whether that sample was classified correctly: misclassified samples gain weight
4. The higher a sample's weight, the more the next weak learner focuses on it
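The four steps above can be sketched in pure Python. Because the deck is about multiclass problems, the sketch uses SAMME, a standard multiclass extension of AdaBoost, in which a learner with weighted error err among K classes gets the vote weight alpha = ln((1 - err) / err) + ln(K - 1). The weighted decision stump, the function names and the toy data are hypothetical illustrations, not the author's implementation.

```python
import math
from collections import defaultdict


def fit_weighted_stump(X, y, w):
    """Decision stump minimising the weighted error; each side predicts its heaviest label."""
    best = None
    for f in range(len(X[0])):
        for thr in sorted({x[f] for x in X}):
            left, right = defaultdict(float), defaultdict(float)
            for xi, yi, wi in zip(X, y, w):
                side = left if xi[f] <= thr else right
                side[yi] += wi
            left_label = max(left, key=left.get)
            right_label = max(right, key=right.get) if right else left_label
            err = (sum(left.values()) - left[left_label]) + (
                sum(right.values()) - right.get(right_label, 0.0)
            )
            if best is None or err < best[0]:
                best = (err, f, thr, left_label, right_label)
    _, f, thr, left_label, right_label = best
    return lambda x: left_label if x[f] <= thr else right_label


def samme_fit(X, y, n_rounds=10):
    K = len(set(y))
    n = len(X)
    w = [1.0 / n] * n  # step 2: equal weights at the start
    ensemble = []
    for _ in range(n_rounds):
        stump = fit_weighted_stump(X, y, w)
        miss = [stump(xi) != yi for xi, yi in zip(X, y)]
        err = sum(wi for wi, m in zip(w, miss) if m) / sum(w)
        if err >= 1 - 1 / K:  # no better than random guessing among K classes
            break
        err = max(err, 1e-10)  # guard against division by zero for a perfect stump
        alpha = math.log((1 - err) / err) + math.log(K - 1)
        ensemble.append((alpha, stump))
        # step 3: boost the weights of misclassified samples, then renormalise
        w = [wi * math.exp(alpha) if m else wi for wi, m in zip(w, miss)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def samme_predict(ensemble, x):
    """Weighted vote: each stump adds its alpha to the class it predicts."""
    scores = defaultdict(float)
    for alpha, stump in ensemble:
        scores[stump(x)] += alpha
    return max(scores, key=scores.get)


# Hypothetical imbalanced 1-D data: 6 x "a", 4 x "b", 2 x "c"
X = [(v,) for v in range(12)]
y = ["a"] * 6 + ["b"] * 4 + ["c"] * 2
ensemble = samme_fit(X, y, n_rounds=3)
print([samme_predict(ensemble, x) for x in X] == y)  # True
```

No single stump can separate three classes, but after the first round the rare class "c" carries most of the weight, so later stumps are forced to model it (step 4).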
Gradient Boosting
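1. Instead of re-weighting samples in each iteration, gradient boosting fits each new weak learner to the remaining error of the current ensemble (the residuals, i.e. the negative gradient of the loss)
2. Gradient descent is used for error correction: each new learner is a step along the negative gradient of the loss
3. Highly effective (often very accurate)
4. Can be slow, because the learners must be trained one after another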
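A minimal sketch of the gradient-boosting loop, assuming squared-error regression on 1-D inputs (the slide does not fix a loss). For the loss (y - F)^2 / 2, the negative gradient with respect to the current prediction F is simply the residual y - F, so each new stump is fit to the residuals and added with a small learning rate. The function names, `learning_rate=0.1`, `n_rounds=50` and the data are hypothetical.

```python
def fit_regression_stump(x, r):
    """Best single split of 1-D inputs x for targets r (least squares); leaves predict means."""
    best = None
    for thr in sorted(set(x))[:-1]:  # the largest value would leave the right side empty
        left = [ri for xi, ri in zip(x, r) if xi <= thr]
        right = [ri for xi, ri in zip(x, r) if xi > thr]
        left_mean, right_mean = sum(left) / len(left), sum(right) / len(right)
        sse = sum((ri - left_mean) ** 2 for ri in left) + sum((ri - right_mean) ** 2 for ri in right)
        if best is None or sse < best[0]:
            best = (sse, thr, left_mean, right_mean)
    _, thr, left_mean, right_mean = best
    return lambda v: left_mean if v <= thr else right_mean


def gradient_boost(x, y, n_rounds=50, learning_rate=0.1):
    """Squared-error gradient boosting with stumps; returns a prediction function."""
    base = sum(y) / len(y)  # F0: the constant minimising squared error
    pred = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        # For L = (y - F)^2 / 2 the negative gradient w.r.t. F is the residual y - F
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        stump = fit_regression_stump(x, residuals)
        stumps.append(stump)
        pred = [pi + learning_rate * stump(xi) for xi, pi in zip(x, pred)]
    return lambda v: base + learning_rate * sum(s(v) for s in stumps)


def mse(y, p):
    return sum((a - b) ** 2 for a, b in zip(y, p)) / len(y)


# Hypothetical 1-D regression data
x = list(range(10))
y = [0.0, 0.5, 1.0, 1.0, 2.0, 4.0, 4.5, 5.0, 7.0, 9.0]
model = gradient_boost(x, y)
baseline = [sum(y) / len(y)] * len(y)
print(mse(y, baseline), mse(y, [model(v) for v in x]))  # the second value is lower
```

With squared error each added stump is a least-squares fit to the residuals, so the training error cannot increase from one round to the next; the strictly sequential loop is also why gradient boosting is slow to train.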
Thank You
