Instance Based Learning in machine learning

Unsupervised Learning:
Customer Segmentation: Unsupervised learning groups customers by their
buying behaviour, so a company can identify its distinct customer
segments and target its advertising at each group.
Market Basket Analysis: This technique explores the relations between
products that are usually bought together, and it also drives product
suggestions. Think of a store placing peanut butter and jelly next to each
other because they are frequently bought together.

Model-Based Learning:
• Model-based learning involves creating a mathematical model that
predicts outcomes from input data.
• The model is trained on a large dataset and then used to make
predictions on new data.
• The model can be thought of as a set of rules that the machine uses to
make predictions.
• The model is typically built with algorithms such as linear
regression, logistic regression, decision trees, and neural networks.
• Parameterized: a model is parameterized (parametric) if it learns by
fitting the parameters of a predefined mapping function.
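
As a rough illustration of the parameterized case, the sketch below fits a straight line y = w*x + b by ordinary least squares. After fitting, only the two parameters w and b are kept and the training data is no longer needed. The function name fit_line and the toy data are assumptions made for this example and do not come from the slides.

```python
def fit_line(xs, ys):
    """Fit y = w * x + b by ordinary least squares (closed form)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    w = cov_xy / var_x
    b = mean_y - w * mean_x
    return w, b


# Hypothetical toy data lying exactly on y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

w, b = fit_line(xs, ys)      # "training": the data is summarized as (w, b)
prediction = w * 5.0 + b     # prediction uses only the learned parameters
print(w, b, prediction)
```

The contrast with instance-based learning is that here the dataset could be deleted after fit_line returns.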

Instance-based learning:
• Instance-based learning, sometimes called memory-based learning, is a
family of learning algorithms that, instead of performing explicit
generalization, compare new problem instances with instances seen in
training, which have been stored in memory.
• Instead of summarizing the training data into a model, these algorithms
use the training instances themselves to make predictions.

• Lazy Learning: Unlike eager learning algorithms (which generalize the
training data into a model before any query arrives), instance-based
learning algorithms delay processing until a prediction is needed.
• Some instance-based learning algorithms are:
K Nearest Neighbor (KNN)
Self-Organizing Map (SOM)
Learning Vector Quantization (LVQ)
Locally Weighted Learning (LWL)
Case-Based Reasoning (CBR)
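
The lazy-learning idea can be sketched with a 1-nearest-neighbour learner whose "training" step only memorizes the data, while all distance computation is deferred to prediction time. The class name LazyNearestNeighbor and the toy data are hypothetical, chosen only for this illustration.

```python
import math


class LazyNearestNeighbor:
    """1-nearest-neighbour learner: fit() only stores the instances."""

    def fit(self, points, labels):
        # No generalization happens here; the instances are simply kept.
        self.points = list(points)
        self.labels = list(labels)
        return self

    def predict(self, query):
        # All the work is done now, when a prediction is requested.
        distances = [math.dist(query, p) for p in self.points]
        nearest = distances.index(min(distances))
        return self.labels[nearest]


# Hypothetical toy data with two stored instances.
model = LazyNearestNeighbor().fit([(0, 0), (10, 10)], ["low", "high"])
print(model.predict((1, 2)))  # the closest stored point is (0, 0)
```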

KNN Algorithm:
• The K-nearest neighbours (KNN) algorithm is a supervised ML
algorithm that can be used for both classification and regression
problems. In industry it is mainly used for classification.
• Lazy learning algorithm: KNN has no specialized training phase. It
stores all the training data and uses it at prediction time.
• Non-parametric learning algorithm: KNN doesn’t assume anything about
the underlying data distribution.

• KNN makes predictions based on the similarity (typically distance)
between the new data point (new instance) and the stored instances.

NAME     AGE  GENDER  CLASS OF SPORTS
Ajay     32   0       Football
Mark     40   0       Neither
Sara     16   1       Cricket
Zaira    34   1       Cricket
Sachin   55   0       Neither
Rahul    40   0       Cricket
Pooja    20   1       Neither
Smith    15   0       Cricket
Laxmi    55   1       Football
Michael  15   0       Football

Let’s find the class of sports for Angelina, whose age is 5 and gender
is 1, using k = 3. To do so, we compute the distance between Angelina and
every person in the table, using the distance between two points
(x1, y1) and (x2, y2):
d = √((x2-x1)² + (y2-y1)²)

Distance between Ajay and Angelina using the formula:
d = √((age2-age1)² + (gender2-gender1)²)
d = √((5-32)² + (1-0)²)
d = √(729 + 1) = √730
d ≈ 27.02

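Similarly, we find all the distances one by one.

Distance between Angelina and each person:
NAME     DISTANCE  CLASS OF SPORTS
Ajay     27.02     Football
Mark     35.01     Neither
Sara     11.00     Cricket
Zaira    29.00     Cricket
Sachin   50.01     Neither
Rahul    35.01     Cricket
Pooja    15.00     Neither
Smith    10.05     Cricket
Laxmi    50.00     Football
Michael  10.05     Football

The three nearest neighbours (k = 3) are Smith (10.05, Cricket), Michael
(10.05, Football) and Sara (11.00, Cricket). Cricket wins the majority
vote 2 to 1, so Angelina is classified as Cricket.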
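
The worked example can be checked with a short script. This is a minimal sketch that reuses the table values as given; the variable names are chosen only for this illustration, and ties in distance are left in table order because Python's sort is stable.

```python
import math
from collections import Counter

# (name, age, gender, class of sports), copied from the table.
people = [
    ("Ajay", 32, 0, "Football"),
    ("Mark", 40, 0, "Neither"),
    ("Sara", 16, 1, "Cricket"),
    ("Zaira", 34, 1, "Cricket"),
    ("Sachin", 55, 0, "Neither"),
    ("Rahul", 40, 0, "Cricket"),
    ("Pooja", 20, 1, "Neither"),
    ("Smith", 15, 0, "Cricket"),
    ("Laxmi", 55, 1, "Football"),
    ("Michael", 15, 0, "Football"),
]
angelina = (5, 1)  # age 5, gender 1
k = 3

# Distance from Angelina to every stored person, nearest first.
scored = sorted(
    ((math.dist(angelina, (age, gender)), name, sport)
     for name, age, gender, sport in people),
    key=lambda row: row[0],
)
nearest = scored[:k]

# Majority vote among the k nearest neighbours.
votes = Counter(sport for _, _, sport in nearest)
prediction = votes.most_common(1)[0][0]

for dist, name, sport in nearest:
    print(f"{name:<8} {dist:5.2f}  {sport}")
print("Angelina ->", prediction)
```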

Training data:
BRIGHTNESS  SATURATION  CLASS
40          20          Red
50          50          Blue
60          90          Blue
10          25          Red
70          70          Blue
60          10          Red
25          80          Blue

New instance to classify (K = 5):
BRIGHTNESS  SATURATION  CLASS
20          35          ?

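Distance from each training point to the new instance (20, 35):
BRIGHTNESS  SATURATION  CLASS  DISTANCE
40          20          Red    25.00
50          50          Blue   33.54
60          90          Blue   68.01
10          25          Red    14.14
70          70          Blue   61.03
60          10          Red    47.17
25          80          Blue   45.28

Sorted by distance (nearest first):
BRIGHTNESS  SATURATION  CLASS  DISTANCE
10          25          Red    14.14
40          20          Red    25.00
50          50          Blue   33.54
25          80          Blue   45.28
60          10          Red    47.17
70          70          Blue   61.03
60          90          Blue   68.01

With K = 5, the nearest neighbours are 3 Red and 2 Blue, so the majority
vote gives Red.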

Dataset after classifying the new instance (20, 35) as Red:
BRIGHTNESS  SATURATION  CLASS
40          20          Red
50          50          Blue
60          90          Blue
10          25          Red
70          70          Blue
60          10          Red
25          80          Blue
20          35          Red
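
The distance table for this example can be recomputed and ranked with a few lines. This is a sketch using the training values above; the helper name vote is an assumption for this illustration.

```python
import math
from collections import Counter

# (brightness, saturation, class), copied from the training table.
training = [
    (40, 20, "Red"), (50, 50, "Blue"), (60, 90, "Blue"), (10, 25, "Red"),
    (70, 70, "Blue"), (60, 10, "Red"), (25, 80, "Blue"),
]
query = (20, 35)

# Rank every training point by Euclidean distance to the query.
ranked = sorted(
    ((math.dist(query, (b, s)), b, s, label) for b, s, label in training),
    key=lambda row: row[0],
)
for dist, b, s, label in ranked:
    print(f"{b:>3} {s:>3} {label:<5} {dist:6.2f}")


def vote(k):
    """Majority class among the k nearest rows of the ranked table."""
    return Counter(row[3] for row in ranked[:k]).most_common(1)[0][0]


print("K=5 ->", vote(5))
```

Calling vote with a different k shows how sensitive the result is to that choice: with all 7 points, Blue outnumbers Red 4 to 3.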

How it Works:
Training Phase:
In k-NN, there is no explicit training phase. The algorithm simply stores
the training data.

Prediction Phase:
When a new instance is introduced for prediction, the algorithm follows these steps:
• Compute Distances: Calculate the distance between the new instance and every instance in the
training set. Common distance metrics include Euclidean distance and Manhattan distance for continuous
variables, and Hamming distance for categorical variables.
• Identify Neighbors: Select the 'k' instances from the training set that are closest to the new instance (the
'k' nearest neighbors).
• Aggregate the Output:
For classification: Perform a majority vote among the 'k' nearest neighbors. The class that appears most
frequently among the neighbors is assigned to the new instance.
For regression: Calculate the average of the target values of the 'k' nearest neighbors and assign this
average to the new instance.
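
The three prediction steps can be sketched as one function with a pluggable distance metric. The function names, the task argument and the toy data are assumptions made for this illustration; a production implementation would normally also handle ties and feature scaling.

```python
import math
from collections import Counter


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def hamming(a, b):
    # Number of positions where two categorical vectors differ.
    return sum(x != y for x, y in zip(a, b))


def knn_predict(train_X, train_y, query, k, metric=euclidean,
                task="classification"):
    # Step 1: compute the distance from the query to every stored instance.
    order = sorted(range(len(train_X)),
                   key=lambda i: metric(query, train_X[i]))
    # Step 2: keep the targets of the k nearest neighbours.
    neighbours = [train_y[i] for i in order[:k]]
    # Step 3: aggregate by majority vote or by averaging.
    if task == "classification":
        return Counter(neighbours).most_common(1)[0][0]
    return sum(neighbours) / len(neighbours)


# Hypothetical toy data: two clusters of 2-D points.
X = [(1, 1), (2, 1), (8, 9), (9, 8)]
label = knn_predict(X, ["A", "A", "B", "B"], (2, 2), k=3)
value = knn_predict(X, [1.0, 2.0, 10.0, 11.0], (2, 2), k=2,
                    metric=manhattan, task="regression")
print(label, value)
```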

Step 1: Dataset and New Point
Dataset:
x y
1 2
2 3
3 5
4 4
5 7
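New Point:
x_new = 3.5

Distance between the new instance and each sample in the dataset (only x
is used, because y is the target to be predicted):
D1 = sqrt((3.5 - 1)**2) = 2.5
D2 = sqrt((3.5 - 2)**2) = 1.5
D3 = sqrt((3.5 - 3)**2) = 0.5
D4 = sqrt((3.5 - 4)**2) = 0.5
D5 = sqrt((3.5 - 5)**2) = 1.5
Select the 3 nearest neighbours:
(x3, y3) = (3, 5)
(x4, y4) = (4, 4)
(x2, y2) = (2, 3)
Note that D2 and D5 are tied at 1.5; here the tie is broken in favour of
(x2, y2).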

Compute Weights
Weights are the inverse of the distances. To avoid division by zero, we add a
small value (0.00001) to the distances.
W3 = 1/(0.5 + 0.00001) = 1.99996
W4 = 1/(0.5 + 0.00001) = 1.99996
W2 = 1/(1.5 + 0.00001) = 0.66666
Compute Weighted Average
Compute the weighted sum of the target values and the sum of weights:
Weighted sum of y = (5 * 1.99996) + (4 * 1.99996) + (3 * 0.66666) = 19.99962
Sum of weights = 1.99996 + 1.99996 + 0.66666 = 4.66658
Weighted average = 19.99962 / 4.66658 = 4.2857
So the prediction for x_new = 3.5 is y = 4.2857, i.e. the point (3.5, 4.2857).
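
The weighted-average calculation can be reproduced with the short sketch below. It uses the same dataset, k = 3 and the same small constant 0.00001; the function name weighted_knn_regress is an assumption for this illustration. Because Python's sort is stable, the tie between x2 and x5 resolves to x2, matching the worked example.

```python
def weighted_knn_regress(xs, ys, x_new, k, eps=0.00001):
    """Distance-weighted k-NN regression for one-dimensional inputs."""
    # Rank training points by distance to x_new; ties keep dataset order.
    order = sorted(range(len(xs)), key=lambda i: abs(x_new - xs[i]))
    nearest = order[:k]
    # Weight = 1 / (distance + eps); eps avoids division by zero.
    weights = [1.0 / (abs(x_new - xs[i]) + eps) for i in nearest]
    weighted_sum = sum(w * ys[i] for w, i in zip(weights, nearest))
    return weighted_sum / sum(weights)


xs = [1, 2, 3, 4, 5]
ys = [2, 3, 5, 4, 7]
prediction = weighted_knn_regress(xs, ys, 3.5, k=3)
print(round(prediction, 4))  # 4.2857
```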

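• Once we add distance weighting, there is really no harm in allowing all
training examples to have an influence on the classification of the query
instance x_q, because very distant examples will have very little effect
on f(x_q).
• If all training examples are considered, the algorithm is a global method
(with distance weighting this is known as Shepard's method); if only the
nearest examples are considered, it is a local method.
• The only disadvantage of considering all examples is that the classifier
will run more slowly.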
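
To make the local versus global distinction concrete, the sketch below computes the same inverse-distance-weighted average either over the k nearest examples or over every example (k=None). The function name and the k=None convention are assumptions for this illustration, and it uses plain 1/distance weights; other weightings, such as inverse squared distance, are also common.

```python
def distance_weighted_predict(xs, ys, x_query, k=None):
    """Inverse-distance-weighted average of the targets.

    k=None uses every training example (global method);
    an integer k uses only the k nearest examples (local method).
    """
    order = sorted(range(len(xs)), key=lambda i: abs(x_query - xs[i]))
    chosen = order if k is None else order[:k]
    # If the query coincides with a stored point, return its target directly.
    for i in chosen:
        if xs[i] == x_query:
            return ys[i]
    weights = [1.0 / abs(x_query - xs[i]) for i in chosen]
    return sum(w * ys[i] for w, i in zip(weights, chosen)) / sum(weights)


xs = [1, 2, 3, 4, 5]
ys = [2, 3, 5, 4, 7]
local_pred = distance_weighted_predict(xs, ys, 3.5, k=3)
global_pred = distance_weighted_predict(xs, ys, 3.5)
print(round(local_pred, 4), round(global_pred, 4))
```

The global variant touches every stored example for each query, which is why it runs more slowly as the training set grows.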

CASE-BASED REASONING:
• Instance-based methods such as k-NEAREST NEIGHBOR share three key properties.
First, they are lazy learning methods. Second, they classify new query instances by
analysing similar instances while ignoring instances that are very different from the
query.
• Third, they represent instances as real-valued points in an n-dimensional Euclidean
space.
• Case-based reasoning (CBR) is a learning paradigm based on the first two of these
principles, but not the third.
• In CBR, instances are typically represented using richer symbolic descriptions,
and the methods used to retrieve similar instances are correspondingly more
elaborate.
• CBR has been applied to problems such as conceptual design of mechanical devices
based on a stored library of previous designs.
• Reasoning about new legal cases based on previous rulings.

• Solving planning and scheduling problems by reusing and combining
portions of previous solutions to similar problems.
• The CADET system: CADET is a Case-based Design Tool, a system that aids
the conceptual design of electro-mechanical devices and is based on the
paradigm of case-based reasoning.
• It uses a library containing approximately 75 previous designs and design
fragments to suggest conceptual designs that meet the specifications of new
design problems. Each instance stored in memory (e.g., a water pipe) is
represented by describing both its structure and its qualitative function.

• Given this functional specification for the new design problem, CADET
searches its library for stored cases whose functional descriptions match
the design problem.
• If an exact match is found, indicating that some stored case implements
exactly the desired function, then this case can be returned as a suggested
solution to the design problem.
• If no exact match occurs, CADET may find cases that match various
subgraphs of the desired functional specification.
• The T-junction function matches a subgraph of the water faucet function
graph.

• By retrieving multiple cases that match different subgraphs, the entire
design can sometimes be pieced together.
• It may also require backtracking on earlier choices of design subgoals
and, therefore, rejecting cases that were previously retrieved.
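
The exact-match-then-subgraph retrieval order described above can be sketched with a toy model in which each functional description is a set of labelled edges. This is only an illustration under strong simplifying assumptions: the case names, the edge labels and the functions retrieve and covers are all hypothetical, and CADET's real representation and matching are considerably more elaborate.

```python
def retrieve(library, spec):
    """Return ('exact', names) or ('partial', names) for a functional spec.

    library maps a case name to a frozenset of (source, relation, target)
    edges; spec is the edge set of the desired function.
    """
    exact = [name for name, edges in library.items() if edges == spec]
    if exact:
        return "exact", exact
    # No exact match: keep cases whose function is a proper subgraph.
    partial = [name for name, edges in library.items()
               if edges and edges < spec]
    return "partial", partial


def covers(library, names, spec):
    """True if the retrieved cases together account for the whole spec."""
    combined = set()
    for name in names:
        combined |= library[name]
    return combined == spec


# Hypothetical library of stored cases (not CADET's actual contents).
library = {
    "t_junction": frozenset({("Q1", "+", "Q3"), ("Q2", "+", "Q3")}),
    "valve": frozenset({("C", "+", "Q1")}),
    "heater": frozenset({("H", "+", "T")}),
}
spec = frozenset({("Q1", "+", "Q3"), ("Q2", "+", "Q3"), ("C", "+", "Q1")})

kind, names = retrieve(library, spec)
print(kind, names, covers(library, names, spec))
```

A real system would also need the backtracking step mentioned above when the retrieved pieces cannot be combined into a complete design.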
