Similarity-based Learning
By
Sharmila Chidaravalli
Assistant Professor
Department of ISE
Global Academy of Technology
Similarity-based Learning
• A supervised learning technique.
• Predicts the class label of a test instance by gauging the similarity of this test instance with the training instances.
• Refers to a family of instance-based learning methods used to solve both classification and regression problems.
Instance-based learning
• Makes predictions by computing distances or similarities between the test instance and a specific set of training instances local to the test instance, in an incremental process.
• Considers only the nearest instance or instances to predict the class of unseen instances.
Similarity-based classification is useful in various fields such as image processing, text classification, pattern recognition, bioinformatics, data mining, information retrieval, natural language processing, etc.
Similarity-based learning
• Also called Instance-based learning or Just-in-time learning, since it does not build an abstract model of the training instances and performs lazy learning when classifying a new instance.
• This learning mechanism simply stores all the data and uses it only when it needs to classify an unseen instance.
• The advantage of this approach is that processing occurs only when a request to classify a new instance is received.
• The drawback is that it requires large memory to store the data, since a global abstract model is not constructed initially from the training data.
Classification of instances is done based on the measure of similarity in the form of
distance functions over data instances.
Several distance metrics are used to estimate the similarity or dissimilarity between
instances required for clustering, nearest neighbor classification, anomaly detection,
and so on.
Popular distance metrics include Hamming distance, Euclidean distance, Manhattan distance, Minkowski distance, cosine similarity, Mahalanobis distance, and Pearson’s correlation.
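As a quick illustration of how a few of these metrics are computed, here is a minimal Python sketch using only the standard library; the function names and sample vectors are chosen for this example and are not from the slides.
```python
# Illustrative implementations of a few common distance/similarity measures.
import math

def euclidean(a, b):
    # square root of the sum of squared coordinate differences
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    # sum of absolute coordinate differences
    return sum(abs(x - y) for x, y in zip(a, b))

def hamming(a, b):
    # number of positions at which the two sequences differ
    return sum(x != y for x, y in zip(a, b))

def cosine_similarity(a, b):
    # dot product divided by the product of the vector norms
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

print(euclidean((2, 2), (1, 2)))                     # 1.0
print(manhattan((2, 2), (3, 3)))                     # 2
print(hamming(("Yes", "Yes"), ("Yes", "No")))        # 1
print(round(cosine_similarity((1, 2), (2, 4)), 4))   # 1.0
```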
Differences Between Instance- and Model-based Learning
Instance-based Learning | Model-based Learning
Lazy learners | Eager learners
Processing of training instances is done only during the testing phase | Processing of training instances is done during the training phase
No model is built with the training instances before it receives a test instance | Generalizes a model with the training instances before it receives a test instance
Predicts the class of the test instance directly from the training data | Predicts the class of the test instance from the model built
Slow in the testing phase | Fast in the testing phase
Learns by making many local approximations | Learns by creating a global approximation
NEAREST-NEIGHBOR Algorithm
The k-Nearest Neighbours (KNN) algorithm is one of the simplest supervised machine learning
algorithms and is used to solve both classification and regression problems.
KNN is also known as an instance-based model or a lazy learner because it doesn’t construct an
internal model.
It is a simple and powerful non-parametric algorithm that predicts the category of the test
instance from the ‘k’ training samples closest to the test instance and assigns it to the
category with the largest probability.
For classification problems, it finds the k nearest neighbors and predicts the class by the
majority vote of those neighbors.
For regression problems, it finds the k nearest neighbors and predicts the value by calculating
the mean value of those neighbors.
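A minimal sketch of the classification case, assuming Euclidean distance and simple majority voting with ties broken by order of appearance; the function and variable names are illustrative, not from the slides. It reproduces Problem 1 below.
```python
# Minimal k-NN classifier sketch: Euclidean distance + majority vote.
import math
from collections import Counter

def knn_classify(train, test_point, k):
    # train: list of (features, label) pairs
    nearest = sorted(train, key=lambda item: math.dist(item[0], test_point))[:k]
    labels = [label for _, label in nearest]
    # Counter breaks ties by order of first appearance
    return Counter(labels).most_common(1)[0][0]

# Problem 1 data from the slides
train = [((1, 2), "Red"), ((2, 3), "Blue"), ((3, 3), "Blue"), ((5, 1), "Red")]
print(knn_classify(train, (2, 2), k=3))  # Blue
```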
k-NEAREST-NEIGHBOR Algorithm
Problem 1: Classification (Continuous Attributes)
Training Data
Instance x y Class
A 1 2 Red
B 2 3 Blue
C 3 3 Blue
D 5 1 Red
Test Instance (t): x = 2, y = 2
k = 3
Step 1: Compute the Euclidean distance from the test instance (2, 2) to each training instance
Instance Distance Class
A 1.00 Red
B 1.00 Blue
C 1.41 Blue
D 3.16 Red
Step 2: Sort by distance
Step 3: Choose the top k = 3 neighbors
• A (Red), B (Blue), C (Blue)
Step 4: Majority vote
• Blue: 2 votes
• Red: 1 vote
Predicted class: Blue
Problem 2: Regression (Continuous Target)
Training Data
Instance x y Price
A 1 2 200
B 2 3 250
C 3 5 300
D 5 1 400
Test Instance (t): x = 2, y = 2
k = 2
Step 1: Compute the Euclidean distances to the test instance: A = 1.00, B = 1.00, C = 3.16, D = 3.16
Step 2: Choose the k = 2 nearest neighbors: A (Price 200) and B (Price 250)
Step 3: Predicted Price = (200 + 250) / 2 = 225
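A short sketch of the regression case under the same assumptions (Euclidean distance, illustrative names), reproducing the prediction of 225.
```python
# k-NN regression sketch: average the target values of the k nearest neighbors.
import math

def knn_regress(train, test_point, k):
    # train: list of (features, target_value) pairs
    nearest = sorted(train, key=lambda item: math.dist(item[0], test_point))[:k]
    return sum(target for _, target in nearest) / k

# Problem 2 data from the slides
train = [((1, 2), 200), ((2, 3), 250), ((3, 5), 300), ((5, 1), 400)]
print(knn_regress(train, (2, 2), k=2))  # 225.0
```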
Problem 3: Classification (Categorical/Binary Features)
Training Data
Instance Fever Cough Class
A Yes No Flu
B No Yes Cold
C Yes Yes Flu
D No No Healthy
Test Instance:
Fever = Yes, Cough = Yes
Step 1: Hamming distances
(Count differences in categorical features)
Instance Hamming Distance Class
A 1 Flu
B 1 Cold
C 0 Flu
D 2 Healthy
Step 2: Select k = 3 nearest:
C (0)
A (1)
B (1)
Step 3: Majority vote
•Flu: 2
•Cold: 1
Prediction: Flu
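With categorical features, only the distance function changes; a sketch using Hamming distance (illustrative names) that reproduces the Flu prediction of Problem 3.
```python
# k-NN with Hamming distance for categorical features (Problem 3 data).
from collections import Counter

def hamming(a, b):
    # count of attribute positions where the two instances differ
    return sum(x != y for x, y in zip(a, b))

def knn_classify_categorical(train, test_point, k):
    nearest = sorted(train, key=lambda item: hamming(item[0], test_point))[:k]
    labels = [label for _, label in nearest]
    return Counter(labels).most_common(1)[0][0]

# Features are (Fever, Cough)
train = [(("Yes", "No"), "Flu"), (("No", "Yes"), "Cold"),
         (("Yes", "Yes"), "Flu"), (("No", "No"), "Healthy")]
print(knn_classify_categorical(train, ("Yes", "Yes"), k=3))  # Flu
```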
Problem 4
Using k-NN, classify the given test instance.
Training Data
Test Instance (t): height = 150 cm, weight = 61
k = 3
Problem 5
Consider the student performance training dataset. Given a test instance (6.1, 40, 5) and a set of
categories {Pass, Fail}, classify the test instance considering k = 3.
WEIGHTED K-NEAREST-NEIGHBOR ALGORITHM
The Weighted k-NN algorithm is an extension of k-NN that weights the selected neighbors by their distance to the test instance.
The k-Nearest Neighbor (k-NN) algorithm has some serious limitations, as its performance depends solely on the choice of the k nearest
neighbors, the distance metric used, and the decision rule.
The principal idea of Weighted k-NN is that the k closest neighbors to the test instance are
assigned a higher weight in the decision than neighbors that are farther away from the test
instance.
The idea is that weights are inversely proportional to distances.
The selected k nearest neighbors can be assigned uniform weights, meaning all the instances in
each neighborhood are weighted equally, or weights can be assigned by the inverse of their distance.
In the second case, closer neighbors of a query point have a greater influence than neighbors which
are farther away.
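A sketch of the inverse-distance-weighted variant for classification, assuming weight = 1/distance with a small epsilon to guard against a zero distance; names are illustrative, not from the slides.
```python
# Weighted k-NN classification sketch: each of the k nearest neighbors votes
# with weight 1 / distance (epsilon guards against division by zero).
import math
from collections import defaultdict

def weighted_knn_classify(train, test_point, k, eps=1e-9):
    nearest = sorted(train, key=lambda item: math.dist(item[0], test_point))[:k]
    votes = defaultdict(float)
    for features, label in nearest:
        votes[label] += 1.0 / (math.dist(features, test_point) + eps)
    return max(votes, key=votes.get)

# Problem 1 data: weights are A = 1.0 (Red), B = 1.0 (Blue), C ≈ 0.71 (Blue)
train = [((1, 2), "Red"), ((2, 3), "Blue"), ((3, 3), "Blue"), ((5, 1), "Red")]
print(weighted_knn_classify(train, (2, 2), k=3))  # Blue (total weight ≈ 1.71 vs 1.0)
```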
Weighted k-NN Algorithm
Problem 1: Classification (Continuous Attributes)
Training Data
Instance x y Class
A 1 2 Red
B 2 3 Blue
C 3 3 Blue
D 5 1 Red
Test Instance (t): x = 2, y = 2
k = 3
Step 1: Compute the Euclidean distances to the test instance: A = 1.00, B = 1.00, C = 1.41, D = 3.16
Step 2: Choose the k = 3 nearest neighbors: A (Red), B (Blue), C (Blue)
Step 3: Compute inverse-distance weights: A = 1/1.00 = 1.00, B = 1/1.00 = 1.00, C = 1/1.41 ≈ 0.71
Step 4: Sum the weights per class: Red = 1.00, Blue = 1.00 + 0.71 = 1.71
Predicted Class: Blue (higher total weight)
Problem 2: Regression (Continuous Target)
Training Data
Instance x y Price
A 1 2 200
B 2 3 250
C 3 5 300
D 5 1 400
Test Instance (t): x = 2, y = 2
k = 3
Step 1: Compute the Euclidean distances to the test instance: A = 1.00, B = 1.00, C = 3.16, D = 3.16
Step 2: Choose the k = 3 nearest neighbors: A, B and C (C and D are equidistant; C is taken first)
Step 3: Compute inverse-distance weights: A = 1.00, B = 1.00, C = 1/3.16 ≈ 0.316
Step 4: Predicted Price = (1.00×200 + 1.00×250 + 0.316×300) / (1.00 + 1.00 + 0.316) ≈ 235.23
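The regression variant predicts a weighted average of the neighbors’ target values; the sketch below (same assumptions and illustrative names as above) reproduces the result up to rounding of the intermediate weights.
```python
# Weighted k-NN regression sketch: weighted mean of the k nearest targets,
# with weight 1 / distance. Distance ties are broken by input order.
import math

def weighted_knn_regress(train, test_point, k, eps=1e-9):
    nearest = sorted(train, key=lambda item: math.dist(item[0], test_point))[:k]
    weights = [1.0 / (math.dist(f, test_point) + eps) for f, _ in nearest]
    targets = [t for _, t in nearest]
    return sum(w * t for w, t in zip(weights, targets)) / sum(weights)

# Problem 2 data: neighbors A, B, C with weights 1, 1, 1/3.16 ≈ 0.316
train = [((1, 2), 200), ((2, 3), 250), ((3, 5), 300), ((5, 1), 400)]
print(round(weighted_knn_regress(train, (2, 2), k=3), 2))  # ≈ 235.24 (slide reports ≈ 235.23 with rounded weights)
```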
Problem 5
Consider the student performance training dataset. Given a test instance (6.1, 40, 5) and a set of
categories {Pass, Fail}, classify the test instance considering k = 3, using Weighted k-NN.
NEAREST CENTROID CLASSIFIER
The nearest centroid classifier is a simple classifier, also called the Mean Difference classifier. The idea is to
classify a test instance to the class whose centroid (mean) is closest to that instance.
Algorithm
• Input: Training dataset T, distance metric d, test instance t
• Output: Predicted class/category
Steps:
1. Compute the mean (centroid) of each class
2. Compute the Euclidean distance between the test instance and each centroid
3. Predict the class with the smallest distance
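A minimal sketch of the nearest centroid classifier (illustrative names), which only needs the per-class means; it reproduces Problem 1 below.
```python
# Nearest centroid classifier sketch: classify to the class whose mean is closest.
import math
from collections import defaultdict

def nearest_centroid(train, test_point):
    # Step 1: compute the centroid (per-feature mean) of each class
    grouped = defaultdict(list)
    for features, label in train:
        grouped[label].append(features)
    centroids = {
        label: tuple(sum(col) / len(points) for col in zip(*points))
        for label, points in grouped.items()
    }
    # Steps 2-3: return the class whose centroid is nearest to the test instance
    return min(centroids, key=lambda label: math.dist(centroids[label], test_point))

# Problem 1 data: centroids are C1 = (2, 2.67) and C2 = (7, 6)
train = [((1, 2), "C1"), ((2, 3), "C1"), ((3, 3), "C1"),
         ((6, 5), "C2"), ((7, 7), "C2"), ((8, 6), "C2")]
print(nearest_centroid(train, (4, 4)))  # C1 (distance ≈ 2.40 vs ≈ 3.61)
```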
Problem 1
Consider the training dataset. Given a test instance t = (4, 4), classify the test instance using the
nearest centroid classifier.
Instance X1 X2 Class
A1 1 2 C1
A2 2 3 C1
A3 3 3 C1
B1 6 5 C2
B2 7 7 C2
B3 8 6 C2
Centroid of C1 = ((1+2+3)/3, (2+3+3)/3) = (2, 2.67); centroid of C2 = ((6+7+8)/3, (5+7+6)/3) = (7, 6)
Distance from t = (4, 4) to the C1 centroid ≈ 2.40; distance to the C2 centroid ≈ 3.61
Since 2.40 < 3.61, classify the test instance t = (4, 4) as class C1.
Problem 2
Consider the training dataset. Given a test instance t = (6, 5), classify the test instance using the
nearest centroid classifier.
Problem 3
Consider the training dataset below. Given a test instance t = (3, 2.5), classify the test instance using the
nearest centroid classifier.
x y Class
1 1 Cat
2 2 Cat
6 5 Dog
7 6 Dog
Locally Weighted Regression (LWR)
Locally Weighted Regression (LWR) is a non-parametric supervised learning algorithm that performs
local regression by combining a regression model with the nearest neighbors model.
LWR is also referred to as a memory-based method, as it requires the training data at prediction time.
The key idea is to approximate linear functions over the ‘k’ neighbors so as to minimize the
error, such that the prediction line is no longer a single straight line but rather a curve.
Ordinary linear regression finds a linear relationship between the input x and the output y.
Locally Weighted Regression (LWR)
1. Given a training dataset T
2. Training set {(xᵢ, yᵢ)}
3. The standard linear regression hypothesis function is given by: h(x) = β₀ + β₁x₁ + … + βₙxₙ = βᵀx
4. Compute a weight for each training instance relative to the query point, typically with a Gaussian kernel:
   wᵢ = exp( −(xᵢ − x)² / (2τ²) ), where τ is the bandwidth parameter
5. Compute the cost function: J(β) = Σᵢ wᵢ (yᵢ − βᵀxᵢ)²
6. Minimize the cost to find the β specific to this query point (this gives a different model for each test point)
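A sketch following the standard LWR formulation above: Gaussian weights per query point and a closed-form weighted least-squares solution β = (XᵀWX)⁻¹XᵀWy. The toy data, the bandwidth value, and all names are assumptions of this sketch, not from the slides.
```python
# Locally Weighted Regression sketch: for each query point, weight training
# instances by a Gaussian kernel and solve weighted least squares, giving a
# separate beta per query.
import numpy as np

def lwr_predict(X, y, x_query, tau=0.5):
    # Add a bias column so beta includes an intercept
    Xb = np.column_stack([np.ones(len(X)), X])
    xq = np.concatenate([[1.0], np.atleast_1d(x_query)])
    # Gaussian kernel weights: w_i = exp(-||x_i - x_q||^2 / (2 * tau^2))
    dists_sq = np.sum((X - x_query) ** 2, axis=1)
    w = np.exp(-dists_sq / (2 * tau ** 2))
    W = np.diag(w)
    # Minimizing J(beta) = sum_i w_i (y_i - beta^T x_i)^2 via the normal equations
    beta = np.linalg.pinv(Xb.T @ W @ Xb) @ Xb.T @ W @ y
    return xq @ beta

# Toy 1-D data (assumed for illustration): y roughly follows sin(x)
rng = np.random.default_rng(0)
X = np.linspace(0, 6, 40).reshape(-1, 1)
y = np.sin(X).ravel() + rng.normal(0, 0.1, 40)
print(round(lwr_predict(X, y, np.array([3.0])), 3))  # close to sin(3) ≈ 0.141
```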