Supervised vs. Unsupervised Learning
• Supervised learning (classification)
◦ Supervision: the training data (observations, measurements, etc.) are accompanied by labels indicating the class of each observation
◦ New data is classified based on the training set
• Unsupervised learning (clustering)
◦ The class labels of the training data are unknown
◦ Given a set of measurements, observations, etc., the aim is to establish the existence of classes or clusters in the data
Classification Algorithm in Machine Learning
CLASSIFICATION
Classification vs. Prediction
• Classification
◦ predicts categorical class labels (discrete or nominal)
◦ constructs a model from the training set and the values (class labels) of a classifying attribute, and uses that model to classify new data
• Prediction
◦ models continuous-valued functions, i.e., predicts unknown or missing values
• Typical applications
◦ Credit approval
◦ Target marketing
◦ Medical diagnosis
◦ Fraud detection
Classification: Definition
• Given a collection of records (the training set)
• Each record contains a set of attributes; one of the attributes is the class.
• Find a model for the class attribute as a function of the values of the other attributes.
• Goal: previously unseen records should be assigned a class as accurately as possible.
• A test set is used to determine the accuracy of the model. Usually, the given data set is divided into training and test sets, with the training set used to build the model and the test set used to validate it.
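The division of the data set into training and test sets can be sketched as a simple random holdout split. This is a minimal illustration under stated assumptions: the helper name `holdout_split`, the two-thirds training fraction, and the toy records are hypothetical, not taken from these slides.

```python
import random


def holdout_split(records, train_fraction=2 / 3, seed=0):
    """Randomly divide records into a training set and a disjoint test set."""
    shuffled = list(records)               # copy so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)  # fixed seed makes the split reproducible
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical records: (attribute value, class label)
records = [(i, "yes" if i % 2 else "no") for i in range(9)]
train, test = holdout_split(records)
print(len(train), len(test))  # 6 3
```

The training part is used to build the model and the held-out part only to validate it, so no record is used for both.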
Classification—A Two-Step Process
• Model construction: describing a set of predetermined classes
◦ Each tuple/sample is assumed to belong to a predefined class, as determined by the class label attribute
◦ The set of tuples used for model construction is the training set
◦ The model is represented as classification rules, decision trees, or mathematical formulae
Classification—A Two-Step Process
• Model usage: for classifying future or unknown objects
◦ Estimate the accuracy of the model
   • The known label of each test sample is compared with the result classified by the model
   • The accuracy rate is the percentage of test set samples that are correctly classified by the model
   • The test set must be independent of the training set; otherwise over-fitting goes undetected and the accuracy estimate is overly optimistic
◦ If the accuracy is acceptable, use the model to classify data tuples whose class labels are not known
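The accuracy-rate estimate and the "use the model only if the accuracy is acceptable" step might look like the following minimal sketch. The function names, the 80% cut-off, and the one-feature toy model are hypothetical choices for illustration.

```python
def accuracy_rate(model, test_set):
    """Percentage of test samples whose known label matches the model's output."""
    correct = sum(1 for features, label in test_set if model(features) == label)
    return 100.0 * correct / len(test_set)


def classify_if_acceptable(model, test_set, unlabeled, min_accuracy=80.0):
    """Apply the model to unlabeled tuples only if its test-set accuracy is acceptable.

    The 80% default is an arbitrary, hypothetical cut-off.
    """
    if accuracy_rate(model, test_set) >= min_accuracy:
        return [model(x) for x in unlabeled]
    return None  # accuracy not acceptable: do not use this model yet


def model(x):
    # Hypothetical one-feature classifier
    return "yes" if x > 5 else "no"


# Hypothetical labeled test set, independent of whatever data built the model
test_set = [(7, "yes"), (2, "no"), (6, "no"), (9, "yes")]

print(accuracy_rate(model, test_set))                                    # 75.0
print(classify_if_acceptable(model, test_set, [1, 8]))                   # None
print(classify_if_acceptable(model, test_set, [1, 8], min_accuracy=70))  # ['no', 'yes']
```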
Classification Process (1): Model Construction
Training Data:
NAME   RANK            YEARS   TENURED
Mike   Assistant Prof  3       no
Mary   Assistant Prof  7       yes
Bill   Professor       2       yes
Jim    Associate Prof  7       yes
Dave   Assistant Prof  6       no
Anne   Associate Prof  3       no

Training Data → Classification Algorithms → Classifier (Model):
IF rank = ‘professor’
OR years > 6
THEN tenured = ‘yes’
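As a quick check, the learned rule can be written as a Python function and applied to the six training tuples in the table. Comparing `rank` case-insensitively (so "Professor" matches ‘professor’) is an assumption about how the rule is meant to be read.

```python
# Training data from the slide: (name, rank, years, tenured)
training_data = [
    ("Mike", "Assistant Prof", 3, "no"),
    ("Mary", "Assistant Prof", 7, "yes"),
    ("Bill", "Professor", 2, "yes"),
    ("Jim", "Associate Prof", 7, "yes"),
    ("Dave", "Assistant Prof", 6, "no"),
    ("Anne", "Associate Prof", 3, "no"),
]


def tenured(rank, years):
    """IF rank = 'professor' OR years > 6 THEN tenured = 'yes' (else 'no')."""
    return "yes" if rank.lower() == "professor" or years > 6 else "no"


correct = sum(tenured(rank, years) == label for _, rank, years, label in training_data)
print(f"{correct}/{len(training_data)} training tuples classified correctly")  # 6/6
```

Dave (6 years) is the boundary case: 6 > 6 is false, so the rule correctly outputs "no".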
Classification Process (2): Use the Model in Prediction
Testing Data and Unseen Data → Classifier

Testing Data:
NAME     RANK            YEARS   TENURED
Tom      Assistant Prof  2       no
Merlisa  Associate Prof  7       no
George   Professor       5       yes
Joseph   Assistant Prof  7       yes

Unseen Data: (Jeff, Professor, 4) → Tenured?
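Applying the same rule to the testing data estimates its accuracy on tuples it was not built from, and then classifies the unseen tuple. This sketch assumes the rule from the model-construction step and, as an assumption, compares `rank` case-insensitively.

```python
def tenured(rank, years):
    """Model from the construction step: IF rank = 'professor' OR years > 6 THEN 'yes'."""
    return "yes" if rank.lower() == "professor" or years > 6 else "no"


# Testing data from the slide: (name, rank, years, tenured)
testing_data = [
    ("Tom", "Assistant Prof", 2, "no"),
    ("Merlisa", "Associate Prof", 7, "no"),
    ("George", "Professor", 5, "yes"),
    ("Joseph", "Assistant Prof", 7, "yes"),
]

# Estimate accuracy on the independent test set
correct = sum(tenured(rank, years) == label for _, rank, years, label in testing_data)
accuracy = 100.0 * correct / len(testing_data)
print(f"test accuracy: {accuracy:.0f}%")  # test accuracy: 75%

# Classify the unseen tuple (Jeff, Professor, 4)
print("Jeff tenured?", tenured("Professor", 4))  # Jeff tenured? yes
```

Merlisa (Associate Prof, 7 years, not tenured) satisfies years > 6, so the rule misclassifies her; the other three test tuples are correct.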
The Learning Process in the Spam Mail Example
[Figure: Email Server → features → Model Learning → Model → Testing]
Features extracted from each email:
● Number of recipients
● Size of message
● Number of attachments
● Number of "re's" in the subject line
…
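The feature list above can be turned into a feature vector with a small function. This is only a sketch: the `Email` class is hypothetical, message size is approximated by the UTF-8 byte length of the body, and a "re" is assumed to mean an occurrence of "re:" in the subject, matched case-insensitively.

```python
from dataclasses import dataclass


@dataclass
class Email:
    # Hypothetical minimal email representation, for illustration only
    recipients: list[str]
    body: str
    attachments: list[str]
    subject: str


def extract_features(email):
    """Map an email to a feature vector built from the features listed above."""
    return [
        len(email.recipients),               # number of recipients
        len(email.body.encode("utf-8")),     # size of message (body bytes, an approximation)
        len(email.attachments),              # number of attachments
        email.subject.lower().count("re:"),  # number of "re's" in the subject line
    ]


sample = Email(["a@x.com", "b@y.com"], "hello", [], "Re: RE: lunch")
print(extract_features(sample))  # [2, 5, 0, 2]
```

Vectors like this are what the model-learning step consumes; the "…" on the slide means further features could be appended in the same way.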
An Example
• A fish-packing plant wants to automate the process of sorting incoming fish according to species
• As a pilot project, it is decided to try to separate sea bass from salmon using optical sensing
Classification
An Example (continued)
• Features/attributes:
◦ Length
◦ Lightness
◦ Width
◦ Position of mouth
An Example (continued)
• Preprocessing: images of different fish are isolated from one another and from the background;
• Feature extraction: the information about a single fish is then sent to a feature extractor that measures certain “features” or “properties”;
• Classification: the values of these features are passed to a classifier that evaluates the evidence presented and builds a model to discriminate between the two species
An Example (continued)
• Domain knowledge:
◦ A sea bass is generally longer than a salmon
• Related feature (or attribute):
◦ Length
• Training the classifier:
◦ Some examples are provided to the classifier in this form: <fish_length, fish_name>
◦ These examples are called training examples
◦ From the training examples, the classifier learns how to distinguish salmon from sea bass based on fish_length
An Example (continued)
• Classification model (hypothesis):
◦ The classifier generates a model from the training data to classify future examples (test examples)
◦ An example of such a model is a rule like this: If Length >= l* then sea bass, otherwise salmon
◦ Here the value of l* is determined by the classifier
• Testing the model:
◦ Once we get a model out of the classifier, we may use it to classify future examples
◦ The test data is provided in the form <fish_length>
◦ The classifier outputs <fish_type> by checking fish_length against the model
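One simple way a classifier could determine l* is to try candidate thresholds and keep the one with the fewest training errors. The sketch below assumes that strategy (an exhaustive search over the observed lengths), which is one possibility rather than the method these slides prescribe; the function names are hypothetical, and the training pairs are the four <fish_length, fish_name> examples used in the worked example that follows.

```python
def learn_threshold(examples):
    """Choose l* so that 'length >= l* -> sea bass, else salmon' makes the fewest
    training errors. examples: list of (fish_length, fish_name) pairs."""
    def errors(t):
        return sum(("sea bass" if length >= t else "salmon") != name
                   for length, name in examples)

    candidates = sorted({length for length, _ in examples})
    candidates.append(candidates[-1] + 1)  # a threshold that labels every fish salmon
    return min(candidates, key=errors)     # ties go to the smallest threshold


def classify(fish_length, l_star):
    """Output <fish_type> by checking fish_length against the learned model."""
    return "sea bass" if fish_length >= l_star else "salmon"


training = [(12, "salmon"), (15, "sea bass"), (8, "salmon"), (5, "sea bass")]
l_star = learn_threshold(training)
print(l_star)  # 15
```

On these pairs the search returns l* = 15, so the learned rule is Length >= 15. It makes one training error (the 5-unit sea bass), the same as the rule len > 12 in the worked example, and the two rules agree on the test lengths 15, 10, 18, and 8.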
An Example (continued)
• So the overall classification process goes like this:
[Figure: Training Data → Preprocessing and feature extraction → Feature vector → Training → Model;
Test/Unlabeled Data → Preprocessing and feature extraction → Feature vector → Testing against model/Classification → Prediction/Evaluation]
An Example (continued)
[Figure: the classification process using one feature, fish length (len)]
Training data (labeled), after preprocessing and feature extraction into feature vectors (len, fish_name):
12, salmon
15, sea bass
8, salmon
5, sea bass
Training → Model: If len > 12, then sea bass, else salmon
Test data (labeled and unlabeled) → Test/Classify → Evaluation/Prediction:
15, salmon → sea bass (error!)
10, salmon → salmon (correct)
18, ? → sea bass
8, ? → salmon
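The evaluation/prediction split in this example can be sketched as follows: labeled test data is used to check the model, while unlabeled data only receives a prediction. The function name `model` is hypothetical; the rule is the one shown on this slide.

```python
def model(length):
    """Rule produced by training: If len > 12, then sea bass, else salmon."""
    return "sea bass" if length > 12 else "salmon"


labeled_test = [(15, "salmon"), (10, "salmon")]  # known labels -> evaluation
unlabeled_test = [18, 8]                         # unknown labels -> prediction

for length, truth in labeled_test:
    guess = model(length)
    print(length, guess, "(correct)" if guess == truth else "(error!)")
# 15 sea bass (error!)
# 10 salmon (correct)

print([model(length) for length in unlabeled_test])  # ['sea bass', 'salmon']
```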
An Example (continued)
• Why the error?
◦ Insufficient training data
◦ Too few features
◦ Too many/irrelevant features
◦ Overfitting / specialization
An Example (continued)
[Figure: the same process with two features, length (len) and lightness (ltns)]
Training data (labeled), after preprocessing and feature extraction into feature vectors (len, ltns, fish_name):
12, 4, salmon
15, 8, sea bass
8, 2, salmon
5, 10, sea bass
Training → Model: If ltns > 6 or len*5 + ltns*2 > 100 then sea bass else salmon
Test data → Test/Classify → Evaluation/Prediction:
15, 2, salmon → salmon (correct)
10, 7, salmon → salmon (correct)
18, 7, ? → sea bass
8, 5, ? → salmon
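A minimal sketch of the two-feature rule, assuming `ltns` denotes the lightness feature and that each feature vector is ordered (len, ltns). Evaluating the rule exactly as written shows that the second test fish (10, 7) satisfies ltns > 6 and is therefore classified as sea bass, not salmon; a cutoff of 7 (a hypothetical adjustment, exposed here as the `ltns_cutoff` parameter) reproduces every label shown on the slide.

```python
def model(length, ltns, ltns_cutoff=6):
    """Two-feature rule: If ltns > 6 or len*5 + ltns*2 > 100 then sea bass else salmon.

    ltns_cutoff is a hypothetical parameter so the first threshold can be varied.
    """
    if ltns > ltns_cutoff or length * 5 + ltns * 2 > 100:
        return "sea bass"
    return "salmon"


# Feature vectors from the slide, ordered (len, ltns[, fish_name])
training = [(12, 4, "salmon"), (15, 8, "sea bass"), (8, 2, "salmon"), (5, 10, "sea bass")]
test = [(15, 2), (10, 7), (18, 7), (8, 5)]

# The rule as written fits all four training examples
assert all(model(length, ltns) == name for length, ltns, name in training)

print([model(length, ltns) for length, ltns in test])
# ['salmon', 'sea bass', 'sea bass', 'salmon']   (10, 7): 7 > 6 -> sea bass
print([model(length, ltns, ltns_cutoff=7) for length, ltns in test])
# ['salmon', 'salmon', 'sea bass', 'salmon']
```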
Linear, Non-linear, Multi-class and Multi-label Classification
Linear Classification
• A linear classifier makes its classification decision based on the value of a linear combination of the feature values.
• It is a classification algorithm (classifier) whose decision is based on a linear predictor function that combines a set of weights with the feature vector.
• The decision boundary is flat
◦ a line (2-D), a plane (3-D), or a hyperplane in higher dimensions
• May involve non-linear operations
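A linear classifier computes a weighted sum of the features plus a bias and compares it with zero; that comparison is, for instance, a non-linear step applied on top of the linear combination. The function names below are hypothetical. For illustration the sketch reuses the second clause of the two-feature fish rule, len*5 + ltns*2 > 100, which is a linear test with weights (5, 2) and bias -100. The full fish rule ORs this with ltns > 6, combining two linear tests, so its boundary is not a single flat line.

```python
def linear_score(weights, bias, x):
    """Linear predictor function: w . x + b."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias


def linear_classify(weights, bias, x, positive, negative):
    """Threshold the score; the boundary w . x + b = 0 is a line, plane, or hyperplane."""
    return positive if linear_score(weights, bias, x) > 0 else negative


# len*5 + ltns*2 > 100 as weights (5, 2) and bias -100 on the vector (len, ltns)
w, b = (5, 2), -100
print(linear_score(w, b, (18, 7)))                            # 4
print(linear_classify(w, b, (18, 7), "sea bass", "salmon"))   # sea bass
print(linear_classify(w, b, (15, 2), "sea bass", "salmon"))   # salmon (score -21)
```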
Linear Classifiers
How would you classify this data?
[Figure: emails plotted by New Recipients vs. Email Length]
Linear Classifiers
Any of these would be fine...
...but which is best?
[Figure: several candidate linear decision boundaries on the same axes (New Recipients, Email Length)]
Classifier Margin
Define the margin of a linear classifier as the width by which the boundary could be increased before hitting a data point.
[Figure: a linear decision boundary and its margin; axes: New Recipients, Email Length]
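Under this definition the margin can be computed for any boundary that separates the data. The sketch assumes the boundary is widened equally on both sides, so the margin is twice the distance from the boundary to the nearest point; the function name, the +1/-1 labels, and the four data points (new recipients, email length) are hypothetical.

```python
import math


def margin(weights, bias, points, labels):
    """Margin of the linear classifier sign(w . x + b) on points labelled +1 / -1.

    Assumes symmetric widening: twice the smallest point-to-boundary distance.
    """
    scores = [sum(w * xi for w, xi in zip(weights, p)) + bias for p in points]
    if any(s * y <= 0 for s, y in zip(scores, labels)):
        raise ValueError("boundary does not separate the two classes")
    return 2 * min(abs(s) for s in scores) / math.hypot(*weights)


# Hypothetical emails as (new recipients, email length)
points = [(1, 1), (2, 1), (4, 4), (5, 3)]
labels = [-1, -1, +1, +1]

print(round(margin((1, 1), -5, points, labels), 3))  # 2.828
print(round(margin((1, 0), -3, points, labels), 3))  # 2.0
```

Both lines separate the two classes, but x1 + x2 = 5 leaves a wider empty band than x1 = 3, which is the sense in which a larger-margin boundary is preferred.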
No Linear Classifier can cover all instances
How would you classify this data?
[Figure: emails plotted by New Recipients vs. Email Length]
• Ideally, the best decision boundary is the one that provides optimal performance, as in the following figure
No Linear Classifier can cover all instances
[Figure: the best decision boundary for this data; axes: Email Length, New Recipients]
What is multi-class classification?
• Output
◦ One of K classes; in some cases, the output space can be very large (i.e., K is very large)
• Each input belongs to exactly one class (cf. multi-label classification, where an input can belong to many classes)
Multi-class Classification
• Multi-class classification is simply classifying objects into exactly one of multiple categories, for example classifying each image in a dataset into a single category such as dog or cat.
• Consider the following two conditions:
  1. There are more than two categories into which the images can be classified.
  2. An image does not belong to more than one class.
• If both of the above conditions are satisfied, it is referred to as a multi-class image classification problem.
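Because each input belongs to exactly one of K classes, a multi-class label can be stored as a one-hot vector, and a prediction picks the single highest-scoring class. A minimal sketch, assuming a hypothetical three-class list and hypothetical per-class scores:

```python
# Hypothetical class list with more than two categories
CLASSES = ["cat", "dog", "rabbit"]


def one_hot(label, classes=CLASSES):
    """Exactly one class per input: a vector containing a single 1."""
    return [1 if c == label else 0 for c in classes]


def predict_one(scores, classes=CLASSES):
    """Pick the single highest-scoring class (argmax)."""
    best = max(range(len(classes)), key=lambda i: scores[i])
    return classes[best]


print(one_hot("dog"))                # [0, 1, 0]
print(predict_one([0.2, 0.7, 0.1]))  # dog
```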
Multi-label Classification
• When an image can be classified into more than one class (as in the image beside), it is known as a multi-label image classification problem.
• Multi-label classification is a type of classification in which an object can be categorized into more than one class.
• For example, in an image dataset, we may classify a picture as an image of a dog or a cat and also classify the same image by the breed of the dog or cat.
These are all labels of the given images. Each image here belongs to more than one class, and hence it is a multi-label image classification problem.
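In contrast with the multi-class case, a multi-label target marks every label that applies (a multi-hot vector), and each label is decided independently. A minimal sketch, assuming a hypothetical label set that mixes animal type and breed and a hypothetical 0.5 decision threshold:

```python
# Hypothetical label set: animal type plus breed
LABELS = ["cat", "dog", "persian", "labrador"]


def multi_hot(label_set, labels=LABELS):
    """An input may carry several labels: one 0/1 entry per label."""
    return [1 if name in label_set else 0 for name in labels]


def predict_many(scores, labels=LABELS, threshold=0.5):
    """Decide each label independently by thresholding its own score."""
    return {name for name, score in zip(labels, scores) if score >= threshold}


print(multi_hot({"dog", "labrador"}))                # [0, 1, 0, 1]
print(sorted(predict_many([0.1, 0.9, 0.05, 0.8])))   # ['dog', 'labrador']
```

Unlike the argmax used for multi-class output, thresholding each score separately lets one image receive both "dog" and "labrador".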
Binary vs. Multi-class
Multi-class vs. Multi-label Classification