Binary Class and Multi-Class Strategies for Machine Learning
By Vaibhav Arora
Two Popular Approaches to Algorithm Design
Approach 1 (careful design up front): the benefit is that we may end up with better, more scalable algorithms, and we may come up with new, elegant learning algorithms and contribute to basic research in machine learning. With sufficient time at hand, this approach should be followed.
Approach 2 (build something quickly, then diagnose and iterate): this will often get your algorithm working more quickly [3] and has a faster time to market. It gives us timely feedback and helps us utilize our resources better.
What can go wrong?
1) High bias (underfitting)
2) High variance (overfitting)
How to detect it?
There are a number of ways to do this, but I will mention two.
First, if you have a 2-D or 3-D dataset, you can visualize the decision boundary and try to figure out what is possibly wrong with your implementation: if the model fits the training set closely but fails on the validation set, there is an overfitting problem (high variance); if it fails to fit both the training and validation sets, there may be a high-bias problem (underfitting).
Learning Curves
Second, when you have more than three dimensions you cannot visualize the data, so we use learning curves instead to see which problem we have.
Definition: a learning curve is a plot of the training and validation error as a function of the number of training examples. Learning curves give us important clues about the problems associated with our model.
Learning Curves: Procedure
1) Let Xtr denote the training examples, Ytr the training targets, and n the number of training examples.
2) Then do:
for i = 1:n
    train the learning algorithm on the subset (Xtr(1:i), Ytr(1:i))
    train_error(i) = training set error on (Xtr(1:i), Ytr(1:i))
    valid_error(i) = validation set error on the complete validation set
Note: if you are using regularization, set the regularization parameter to 0 while calculating train_error and valid_error (train with regularization, but report the unregularized error).
3) Plot train_error and valid_error against i.
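A minimal Python sketch of this procedure, assuming scikit-learn is available; X_tr, y_tr, X_val, y_val are placeholder train/validation arrays, and each training subset is assumed to contain every class:

import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import zero_one_loss

def learning_curve(X_tr, y_tr, X_val, y_val, step=10):
    sizes, train_err, valid_err = [], [], []
    for i in range(step, len(X_tr) + 1, step):
        # train on the first i examples only
        model = LogisticRegression().fit(X_tr[:i], y_tr[:i])
        # training error, measured on the same subset the model saw
        train_err.append(zero_one_loss(y_tr[:i], model.predict(X_tr[:i])))
        # validation error, measured on the complete validation set
        valid_err.append(zero_one_loss(y_val, model.predict(X_val)))
        sizes.append(i)
    return sizes, train_err, valid_err

# plotting them:
# sizes, tr, va = learning_curve(X_tr, y_tr, X_val, y_val)
# plt.plot(sizes, tr, label="train error")
# plt.plot(sizes, va, label="validation error")
# plt.xlabel("number of training examples"); plt.legend(); plt.show()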
Learning Curves (typical plots)
[Plots of the typical high-bias and high-variance learning curves, and of the curve we want, appeared here.]
How can you correct it? (a small example follows this list)
● To fix high variance:
o Get more training examples.
o Try a smaller set of features.
o Tweak the algorithm's parameters.
● To fix high bias:
o Try a larger set of features.
o Use polynomial features.
o Tweak the algorithm's parameters.
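As one illustration of the high-bias fixes, a sketch of the polynomial-features remedy using scikit-learn (X_tr, y_tr are the same placeholder arrays as above):

from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LogisticRegression

# degree=2 expands features x1, x2 into x1, x2, x1^2, x1*x2, x2^2,
# letting a linear model fit a nonlinear decision boundary
model = make_pipeline(PolynomialFeatures(degree=2), LogisticRegression())
# model.fit(X_tr, y_tr)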
Multi class classification strategies
Whenever we have a multi-class classification problem, there are many strategies we can follow:
1) Use an inherently multi-class classifier, e.g. a neural network.
2) Combine a number of binary classifiers to solve the problem at hand, using one of these strategies:
a) one vs all
b) one vs one
c) ECOC (error-correcting output codes)
There are advantages and disadvantages to both approaches (you can experiment with both and try to find out!), but for the purpose of this presentation we are only going to discuss the latter.
one vs all
In a one-vs-all strategy we divide the dataset so that each hypothesis separates one class label from all of the rest.
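A minimal sketch of one vs all (not the slide's exact implementation), assuming integer labels 0..k-1 and scikit-learn's logistic regression as the binary hypothesis:

import numpy as np
from sklearn.linear_model import LogisticRegression

def train_one_vs_all(X, y):
    # one binary classifier per class: class k versus all the rest
    return [LogisticRegression().fit(X, (y == k).astype(int))
            for k in np.unique(y)]

def predict_one_vs_all(classifiers, X):
    # choose the class whose hypothesis scores the example highest
    scores = np.column_stack([c.decision_function(X) for c in classifiers])
    return np.argmax(scores, axis=1)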
one vs one
In a one-vs-one strategy we divide the dataset into pairs of classes and train a classifier for each pair. Note that each hypothesis separates only two classes, irrespective of the other classes.
How to select the class of a test example
Now that we have trained a number of classifiers in the one-vs-one configuration, we may choose a class for a test example as follows (note that this can be done for both one vs one and one vs all; in the examples I have considered, there are three hypotheses in both cases):
a) Feed the test example to all three hypotheses (classifiers).
b) Select the class (label) for which the hypothesis function is maximum.
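For one vs one, a common concrete variant of this selection rule is majority voting over the pairwise classifiers. A sketch, again assuming integer labels 0..k-1 (the slide's max-hypothesis rule corresponds to the argmax in the one-vs-all sketch above):

from itertools import combinations
import numpy as np
from sklearn.linear_model import LogisticRegression

def train_one_vs_one(X, y):
    # one binary classifier per pair of classes (a vs b)
    classifiers = {}
    for a, b in combinations(np.unique(y), 2):
        mask = (y == a) | (y == b)
        classifiers[(a, b)] = LogisticRegression().fit(
            X[mask], (y[mask] == b).astype(int))
    return classifiers

def predict_one_vs_one(classifiers, X, n_classes):
    # each pairwise classifier votes for one of its two classes
    votes = np.zeros((X.shape[0], n_classes))
    for (a, b), clf in classifiers.items():
        pred = clf.predict(X)        # 1 means class b, 0 means class a
        votes[pred == 0, a] += 1
        votes[pred == 1, b] += 1
    return np.argmax(votes, axis=1)  # ties broken arbitrarily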
Error correcting output codes
In an ECOC strategy we build classifiers that each target classification between different subsets of the classes, i.e. we divide the main classification problem into sub-classification tasks.
How do we do that?
1) First, choose a number of binary classifiers and build a matrix as shown (three classifiers C1, C2, C3 and three labels L1, L2, L3), where 1 marks the positive class and 0 the negative class. (It is also possible to exclude certain classes, in which case you can use 1, -1, and 0, with 0 indicating the excluded class.)
2) Next, when we have a new example, we feed it to all the classifiers; the resulting output vector is then compared with each label's codeword using a distance measure, say the Hamming distance [1], and the label with the minimum Hamming distance is assigned to the test example (if more than one row is equidistant, assign arbitrarily). A decoding sketch follows this list.
3) E.g., suppose we have a test example t1 and, when we feed it to the classifiers, we get the output 001. If its Hamming distance from label 1's codeword is 2, from label 2's is 3, and from label 3's is 1, then label 3 (the minimum distance) is assigned to the example.
4) Something to keep in mind: if n is the number of classifiers and m is the number of labels, then n > log2(m), so that the codewords can distinguish all m labels.
5) There are also a number of ways to select the coding matrix (the classifier-label matrix): one is through the use of a generator matrix; other possible strategies include learning it from data (e.g. via genetic algorithms). For more information, check the references.
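A sketch of the decoding step, using an illustrative coding matrix (here the one-vs-all coding; the slide's matrix is not reproduced in this transcript):

import numpy as np

# rows are the codewords for labels L1, L2, L3; columns are the
# classifiers C1, C2, C3 (this illustrative matrix is the one-vs-all coding)
coding_matrix = np.array([[1, 0, 0],
                          [0, 1, 0],
                          [0, 0, 1]])

def decode(output_bits):
    # Hamming distance from the classifier outputs to each codeword;
    # ties are broken arbitrarily by argmin
    distances = np.sum(coding_matrix != np.asarray(output_bits), axis=1)
    return np.argmin(distances)

print(decode([0, 0, 1]))  # distances are (2, 2, 0), so label index 2 (L3) wins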
Personal experience
This is all based on personal experience, so you may choose to agree or not:
1) When working with SVMs, ECOC with a one-vs-one coding seems to work better, so try that first.
2) With logistic regression, try the one-vs-all approach first.
3) When you have a large number of training examples, prefer fitting logistic regression as well as possible over a support vector machine with a linear kernel: the SVM may work well, but you will end up storing a lot of support vectors. With a reasonable compromise on accuracy, you can get by with far less storage.
4) When programming, try to implement all these strategies through the ECOC coding matrix; you can then select the label either by the maximum hypothesis value or by the minimum Hamming distance.
References
1) Hamming distance, https://en.wikipedia.org/wiki/Hamming_distance
2) For detail on learning curves and classification strategies: Prof. Andrew Ng's machine learning course on Coursera.
3) Prof. Andrew Ng's course slides, http://cs229.stanford.edu/materials/ML-advice.pdf
4) For ECOC classification, a nice paper: "Error-Correcting Output Coding for Text Classification" by Adam Berger, School of Computer Science, Carnegie Mellon University, www.cs.cmu.edu/~aberger/pdf/ecoc.pdf
Thank You..
You can contact us at info@paxcel.net
Your feedback is appreciated.