MACHINE LEARNING
(22ISE62)
Module-2
Dr. Shivashankar
Professor
Department of Information Science & Engineering
GLOBAL ACADEMY OF TECHNOLOGY-Bengaluru
GLOBAL ACADEMY OF TECHNOLOGY
Ideal Homes Township, Rajarajeshwari Nagar, Bengaluru – 560 098
Department of Information Science & Engineering
Module 2 - Understanding Data – 2
Bivariate Data
• Bivariate analysis is a statistical analysis in which two variables are observed.
• One variable is independent (X) while the other is dependent (Y).
• Bivariate data can be used to determine whether or not two variables are related.
• The aim of bivariate analysis is to find relationships among data.
• The relationships can then be used in comparisons, finding causes, and in further explorations.
• To do that, graphical display of the data is necessary.
• One such graph method is called a scatter plot.
• Scatter plots are graphs that present the relationship between two variables in a dataset.
• A scatter plot is a 2D graph showing the relationship between two variables.
• It is useful in exploratory data analysis before calculating a correlation coefficient or fitting a regression curve.
Conti..
Table 2.1: Temperature in a Shop and Sales Data

Temperature (in centigrade)    Sales of Sweaters (in thousands)
5                              300
12                             250
15                             200
20                             110
23                             45
27                             10
35                             5

[Figure 2.11: Scatter plot of Sales of Sweaters (in thousands) against Temperature]
Line graphs are similar to scatter plots.
[Figure 2.12: Line chart of Sales of Sweaters (in thousands) against Temperature]
Bivariate Statistics
• Bivariate analysis is stated to be an analysis of any concurrent relation between two variables or attributes.
• Examples: students' study time vs. their exam scores, ice cream sales vs. temperature, height vs. weight, income vs. years of education, and patients' BMI vs. blood pressure.
• Covariance and correlation are methods of bivariate statistics.
• Covariance is a measure of the joint probability of random variables, say X and Y.
• It is denoted as covariance(X, Y) or COV(X, Y) and is used to measure the variance between two dimensions.
• The formula for finding the covariance of X and Y is:
COV(X, Y) = (1/N) Σ_{i=1}^{N} (x_i − E(X))(y_i − E(Y))
Here, x_i and y_i are data values from X and Y, E(X) and E(Y) are the mean values of x_i and y_i, and N is the number of data points.
Also, COV(X, Y) is the same as COV(Y, X).
Bivariate Statistics
Problem 1: Find the covariance of data X = {1, 2, 3, 4, 5} and Y = {1, 4, 9, 16, 25}.
Solution: Mean(X) = E(X) = 15/5 = 3
Mean(Y) = E(Y) = 55/5 = 11
COV(X, Y) = (1/N) Σ_{i=1}^{N} (x_i − E(X))(y_i − E(Y))
= [(1 − 3)(1 − 11) + (2 − 3)(4 − 11) + (3 − 3)(9 − 11) + (4 − 3)(16 − 11) + (5 − 3)(25 − 11)] / 5
= (20 + 7 + 0 + 5 + 28) / 5 = 12
The covariance between X and Y is 12.
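The result can be verified with NumPy (a minimal sketch; np.cov with bias=True uses the same population formula, dividing by N):

import numpy as np

X = np.array([1, 2, 3, 4, 5])
Y = np.array([1, 4, 9, 16, 25])

# Population covariance: mean of the products of the deviations from the means
cov_xy = np.mean((X - X.mean()) * (Y - Y.mean()))
print(cov_xy)                         # 12.0
print(np.cov(X, Y, bias=True)[0, 1])  # 12.0, the off-diagonal entry of the covariance matrix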
Bivariate Statistics
Problem 2: Find the covariance between X and Y for the following data:

X: 3  4  5  8  7  9  6  2  1
Y: 4  3  4  7  8  7  6  3  2

Solution: E(X) = 45/9 = 5 and E(Y) = 44/9 ≈ 4.89
COV(X, Y) = (1/9) Σ (x_i − E(X))(y_i − E(Y)) = (1/9) Σ x_i y_i − E(X)E(Y) = 263/9 − 220/9 = 43/9 ≈ 4.78
Correlation
Correlation refers to a process for establishing the relationships between two variables.
The correlation coefficient is a statistical measure of the strength of a linear relationship between two variables. Its values can range from −1 to +1.
The sign is more important than the actual value.
1. If the value is positive, it indicates that the dimensions increase together.
2. If the value is negative, it indicates that while one dimension increases, the other dimension decreases.
3. If the value is zero, it indicates that the two dimensions are independent of each other.
If the given attributes are X = (x₁, x₂, x₃, …, xₙ) and Y = (y₁, y₂, y₃, …, yₙ), then the Pearson correlation coefficient, denoted as r, is given as:
r = COV(X, Y) / (σ_x σ_y)
where σ_x and σ_y are the standard deviations of X and Y.
Conti..
Problem 1: Find the correlation coefficient of data X = {1, 2, 3, 4, 5} and Y = {1, 4, 9, 16, 25}.
Solution: Step 1: The mean values of X and Y
Mean(X) = X̄ = 15/5 = 3
Mean(Y) = Ȳ = 55/5 = 11
Step 2: Calculate the squared differences from the mean

For X                         For Y
(X1 − X̄)² = (1 − 3)² = 4      (Y1 − Ȳ)² = (1 − 11)² = 100
(X2 − X̄)² = (2 − 3)² = 1      (Y2 − Ȳ)² = (4 − 11)² = 49
(X3 − X̄)² = (3 − 3)² = 0      (Y3 − Ȳ)² = (9 − 11)² = 4
(X4 − X̄)² = (4 − 3)² = 1      (Y4 − Ȳ)² = (16 − 11)² = 25
(X5 − X̄)² = (5 − 3)² = 4      (Y5 − Ȳ)² = (25 − 11)² = 196

Sum of squared differences for X: 10    Sum of squared differences for Y: 374
CONTI..
Step 3: Calculate the variance
• The variance for each set is the average of these squared differences.
For X: Variance of X = 10/5 = 2
For Y: Variance of Y = 374/5 = 74.8
Step 4: Calculate the standard deviation
• The standard deviation is the square root of the variance.
• For X: σ_X = √2 ≈ 1.414
• For Y: σ_Y = √74.8 ≈ 8.6486
• The correlation coefficient is the ratio of the covariance to the product of the standard deviations.
• COV(X, Y) = (1/N) Σ_{i=1}^{N} (x_i − E(X))(y_i − E(Y)) = [(1 − 3)(1 − 11) + (2 − 3)(4 − 11) + (3 − 3)(9 − 11) + (4 − 3)(16 − 11) + (5 − 3)(25 − 11)] / 5 = 12
• Therefore, the correlation coefficient r = 12 / (1.414 × 8.6486) ≈ 0.981
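This can be cross-checked with NumPy (a minimal sketch; np.std defaults to the population standard deviation used above):

import numpy as np

X = np.array([1, 2, 3, 4, 5])
Y = np.array([1, 4, 9, 16, 25])

cov = np.mean((X - X.mean()) * (Y - Y.mean()))   # population covariance = 12
r = cov / (X.std() * Y.std())
print(r)                        # ≈ 0.981
print(np.corrcoef(X, Y)[0, 1])  # same value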
Conti..
Problem 2: Find the correlation coefficient of data X = {5, 9, 10, 3, 5, 7} and Y = {6, 11, 6, 4, 6, 9}.
Solution: E(X) = 39/6 = 6.5 and E(Y) = 42/6 = 7
COV(X, Y) = (1/6) Σ x_i y_i − E(X)E(Y) = 294/6 − 6.5 × 7 = 49 − 45.5 = 3.5
σ_X = √(289/6 − 6.5²) ≈ 2.4324 and σ_Y = √(326/6 − 7²) ≈ 2.3094
r = 3.5 / (2.4324 × 2.3094) ≈ 0.62
Multivariate Statistics
• Multivariate statistics refers to methods that examine the simultaneous effect of multiple variables.
• In machine learning, almost all datasets are multivariate.
• Multivariate analysis deals with more than two observable variables, and often thousands of measurements need to be conducted for one or more subjects.
• Multivariate data is like bivariate data but may have more than two dependent variables.
• Some of the multivariate analyses are regression analysis, principal component analysis, and path analysis.

id    Attribute-1    Attribute-2    Attribute-3
1     1              4              1
2     2              5              2
3     3              6              1

• The mean of multivariate data is a mean vector, and the mean of the above three attributes is (2, 5, 1.33).
• The variance of multivariate data becomes the covariance matrix.
• The mean vector is called the centroid and the variance is called the dispersion matrix.
• Multivariate data has three or more variables.
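The mean vector and covariance matrix of a small table like this can be computed directly with NumPy (a minimal sketch):

import numpy as np

# Rows are observations; columns are Attribute-1, Attribute-2, Attribute-3
data = np.array([[1, 4, 1],
                 [2, 5, 2],
                 [3, 6, 1]])

centroid = data.mean(axis=0)                        # mean vector: [2.  5.  1.33]
dispersion = np.cov(data, rowvar=False, bias=True)  # 3x3 covariance (dispersion) matrix
print(centroid)
print(dispersion)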
Heatmap
• In machine learning, a heatmap is a data visualization technique that uses color-coding to represent the
magnitude of individual values within a dataset, often displayed as a grid or matrix.
• It helps to identify patterns, correlations, and anomalies within complex datasets by highlighting areas of
significance.
• It takes a matrix as input and colours it.
• The darker colours indicate very large values and lighter colours indicate smaller values.
• The advantage of this method is that humans perceive colours well.
• So, by colour shading, larger values can be perceived well.
• For example, in vehicle traffic data, heavy traffic regions can be differentiated from low traffic regions
through heatmap.
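For example, a matrix of values can be rendered as a heatmap with seaborn (a minimal sketch; the random matrix stands in for real data such as traffic counts):

import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

rng = np.random.default_rng(0)
matrix = rng.random((10, 10))          # any matrix of magnitudes works here

# Each cell is colour-coded by its value; darker and lighter shades mark larger and smaller values
sns.heatmap(matrix, cmap="viridis")
plt.title("Grid with heatmap pattern")
plt.show()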
Figure 2.3: Grid with Heatmap Pattern
Pairplot
• A scatterplot matrix is a data visualization tool that displays pairwise relationships between all variables in a dataset, helping to understand distributions and correlations at a glance.
• Pairplot or scatter matrix is a data visual technique for multivariate data.
• A scatter matrix consists of several pair-wise scatter plots of variables of the multivariate data.
• All the results are presented in a matrix format.
• By visual examination of the chart, one can easily find relationships among the variables such as
correlation between the variables.
• A random matrix of three columns is chosen and the relationships of the columns are plotted as a pairplot, as in the sketch below.
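Such a pairplot can be produced with seaborn (a minimal sketch; the three column names are assumed for illustration):

import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
df = pd.DataFrame(rng.normal(size=(100, 3)), columns=["A", "B", "C"])

# Diagonal cells show each variable's distribution; off-diagonal cells show pairwise scatter plots
sns.pairplot(df)
plt.show()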
Figure 1: Pairplot Visualization
Essential Mathematics for Multivariate Data
• Machine learning involves many mathematical concepts from the domain of Linear algebra, Statistics,
Probability and Information theory.
• Linear algebra deals with linear equations, vectors, matrices, vector spaces and transformations.
• These are the driving forces of machine learning and machine learning cannot exist without these data
types.
Linear Systems and Gaussian Elimination for Multivariate Data
• A linear system of equations is a group of equations with unknown variables. Let Ax = y; then the solution is given as x = A⁻¹y.
• This holds if A is a square, non-singular (invertible) matrix.
• The logic can be extended to a set of N equations with ‘n’ unknown variables: if A is the coefficient matrix and y = (y₁, y₂, …, yₙ), then the unknown vector is x = A⁻¹y.
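In code this is a single call (a minimal sketch; np.linalg.solve is preferred over explicitly forming A⁻¹):

import numpy as np

A = np.array([[2.0, 4.0],
              [4.0, 3.0]])
y = np.array([6.0, 7.0])

x = np.linalg.solve(A, y)   # solves Ax = y without computing the inverse
print(x)                    # [1. 1.]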
Conti..
For solving a large system of equations, Gaussian elimination can be used.
The procedure for applying Gaussian elimination is given as follows (a worked sketch in code follows the list):
1. Write the given matrix A.
2. Append the vector y to the matrix A. This matrix is called the augmented matrix.
3. Keep the element a₁₁ as the pivot and eliminate the first-column entry of the second row using the row operation R₂ ← R₂ − (a₂₁/a₁₁)R₁; here R₂ is the second row and a₂₁/a₁₁ is called the multiplier. The same logic is used to eliminate the first-column entries of all the remaining rows.
4. Repeat the same logic to reduce the matrix to echelon form. Then, the last unknown variable is:
x_n = y_n / a_nn
5. The remaining unknown variables are then found by back-substitution as:
x_{n−1} = (y_{n−1} − a_{(n−1)n} · x_n) / a_{(n−1)(n−1)}
This part is called backward substitution.
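The procedure can be sketched directly in Python (a minimal implementation without pivot swapping, so it assumes no zero pivots are encountered):

import numpy as np

def gaussian_elimination(A, y):
    """Solve Ax = y by forward elimination followed by back-substitution."""
    A = A.astype(float)
    y = y.astype(float)
    n = len(y)
    # Forward elimination: zero out the entries below each pivot a_kk
    for k in range(n):
        for i in range(k + 1, n):
            m = A[i, k] / A[k, k]       # the multiplier a_ik / a_kk
            A[i, k:] -= m * A[k, k:]
            y[i] -= m * y[k]
    # Backward substitution: solve from the last row upwards
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):
        x[i] = (y[i] - A[i, i + 1:] @ x[i + 1:]) / A[i, i]
    return x

A = np.array([[2, 4], [4, 3]])
y = np.array([6, 7])
print(gaussian_elimination(A, y))   # [1. 1.], matching Problem 4 below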
Conti..
Problem 4: Solve the following set of equations using the Gaussian elimination method.
2x₁ + 4x₂ = 6
4x₁ + 3x₂ = 7
Solution: The augmented matrix is:
[2  4 | 6]
[4  3 | 7]
Apply the transformation R₁ = R₁/2:
[1  2 | 3]
[4  3 | 7]
R₂ = R₂ − 4R₁:
[1   2 |  3]
[0  −5 | −5]
R₂ = R₂/(−5):
[1  2 | 3]
[0  1 | 1]
R₁ = R₁ − 2R₂:
[1  0 | 1]
[0  1 | 1]
Therefore, x₁ = 1, x₂ = 1.
Conti..
Problem 5: Solve the following set of equations using the Gaussian elimination method.
2x + y = −1
3x − 5y = −21
Solution: Reducing the augmented matrix gives:
[1  0 | −2]
[0  1 |  3]
Therefore, x = −2, y = 3.
Machine Learning and Importance of Probability and Statistics
• Machine learning is linked with statistics and probability.
• Like linear algebra, statistics is the heart of machine learning.
• The importance of statistics needs to be stressed, as without statistics, analysis of data is difficult.
• Probability is especially important for machine learning.
• In machine learning, probability is a fundamental concept that deals with the likelihood of events or
outcomes. It's used to model uncertainty and make predictions, especially in algorithms that deal with
probabilistic models like Naive Bayes.
Probability Distributions
• A probability distribution is the mathematical function that gives the probabilities of occurrence of the possible outcomes of an experiment.
• In other words, a distribution is a function that describes the relationship between the observations in a sample space.
Probability distributions are of two types:
1. Discrete probability distribution
2. Continuous probability distribution
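Both families can be evaluated with scipy.stats (a minimal sketch):

from scipy.stats import binom, norm

# Discrete: probability of exactly 3 heads in 10 fair coin tosses
print(binom.pmf(3, n=10, p=0.5))    # ≈ 0.117

# Continuous: density of the standard normal distribution at x = 0
print(norm.pdf(0.0))                # ≈ 0.399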
FEATURE ENGINEERING AND DIMENSIONALITY REDUCTION TECHNIQUES
• The process of selecting, transforming, and creating new features (or variables) from raw data to improve the performance of
machine learning models.
• It involves carefully preparing the input data so that machine learning algorithms can learn effectively and make accurate
predictions.
• Features are attributes.
• Feature engineering is about determining the subset of features that form an important part of the input that improves the
performance of the model, be it classification or any other model in machine learning.
• Feature engineering deals with two problems – Feature Transformation and Feature Selection.
• Feature transformation is extraction of features and creating new features that may be helpful in increasing performance.
• For example, the height and weight may give a new attribute called Body Mass Index (BMI).
• Feature subset selection is another important aspect of feature engineering that focuses on selection of features to reduce
the time but not at the cost of reliability.
The features can be removed based on two aspects:
1. Feature relevancy – Some features contribute more to classification than other features.
For example, a mole on the face can help more in face detection than common features like the nose.
2. Feature redundancy – Some features are redundant.
For example, when a database table has a field called Date of Birth, an Age field is redundant, as age can be computed easily from the date of birth. Removing the Age column reduces the dimensionality by one.
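Both ideas can be illustrated with pandas (a minimal sketch; the column names are assumptions made for illustration):

import pandas as pd

df = pd.DataFrame({
    "height_m": [1.60, 1.75, 1.82],
    "weight_kg": [55, 70, 90],
    "date_of_birth": pd.to_datetime(["1990-01-01", "1985-06-15", "2000-03-30"]),
})

# Feature transformation: derive a new feature (BMI) from height and weight
df["bmi"] = df["weight_kg"] / df["height_m"] ** 2

# Feature redundancy: age is computable from date of birth, so a separate age column can be dropped
df["age"] = (pd.Timestamp.today() - df["date_of_birth"]).dt.days // 365
df = df.drop(columns=["age"])
print(df)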
conti..
1 Stepwise Forward Selection:
• This procedure starts with an empty set of attributes.
• Every time, an attribute is tested for statistical significance for best quality and is added to the reduced
set. This process is continued till a good reduced set of attributes is obtained.
2 Stepwise Backward Elimination:
• This procedure starts with a complete set of attributes.
• At every stage, the procedure removes the worst attribute from the set, leading to the reduced set.
Combined Approach: Both the forward and backward methods can be combined so that the procedure can add the best attribute and remove the worst attribute.
3 Principal Component Analysis
• The idea of the principal component analysis (PCA) or KL transform is to transform a given set of
measurements to a new set of features so that the features exhibit high information packing properties.
This leads to a reduced and compact set of features.
• Basically, this elimination is made possible because of the information redundancies.
• This compact representation is of a reduced dimension.
PCA
Consider a group of random vectors of the form:
x = (x₁, x₂, x₃, …, xₙ)ᵀ
The mean vector of the set of random vectors is defined as: m_x = E{x}
The operator E refers to the expected value of the population.
This is calculated theoretically using the probability density functions (PDFs) of the elements x_i and the joint probability density functions between the elements x_i and x_j.
From this, the covariance matrix can be calculated as:
C = E{(x − m_x)(x − m_x)ᵀ}
For M random vectors, when M is large enough, the mean vector and covariance matrix can be approximately calculated as:
m_x = (1/M) Σ_{k=1}^{M} x_k
C = (1/M) Σ_{k=1}^{M} x_k x_kᵀ − m_x m_xᵀ
conti..
The mapping of the vectors x to y using the transformation can now be described as:
y = A(x − m_x)
This transform is also called the Karhunen-Loève or Hotelling transform. The original vector x can then be reconstructed as follows:
x = Aᵀy + m_x
The goal of PCA is to reduce the set of attributes to a newer, smaller set that captures the variance of the data. The variance is captured by fewer components, which would give the same result as the original with all the attributes.
If only the K largest eigenvalues are used, the recovered information is approximately:
x̂ = A_Kᵀ y + m_x
The advantages of PCA are immense. It reduces the attribute list by eliminating all irrelevant attributes.
The PCA algorithm is as follows:
1. The target dataset x is obtained.
2. The mean is subtracted from the dataset. Let the mean be m. Thus, the adjusted dataset is x − m. The objective of this process is to transform the dataset to zero mean.
3. The covariance matrix of dataset x is obtained. Let it be C.
conti..
4. The eigenvalues and eigenvectors of the covariance matrix are calculated.
5. The eigenvector of the highest eigenvalue is the principal component of the dataset. The eigenvalues are arranged in descending order, and the feature vector is formed with the corresponding eigenvectors in its columns:
Feature vector = {eigenvector₁, eigenvector₂, eigenvector₃, …, eigenvectorₙ}
6. Obtain the transpose of the feature vector. Let it be A.
7. The PCA transform is y = A × (x − m), where x is the input dataset, m is the mean, and A is the transpose of the feature vector.
The original data can be retrieved using the formula given below:
Original data (f) = {A⁻¹ × y} + m = {Aᵀ × y} + m
The new data is a dimensionally reduced matrix that represents the original data.
Therefore, PCA is effective in removing the attributes that do not contribute.
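These steps can be sketched in NumPy and checked on the two points of Problem 1 below (a minimal sketch; np.linalg.eigh is used because the covariance matrix is symmetric):

import numpy as np

def pca_transform(X):
    """X holds one data point per column. Returns (y, A, m) with y = A(X - m)."""
    m = X.mean(axis=1, keepdims=True)         # step 2: mean vector
    centered = X - m                          # adjusted, zero-mean data
    C = (centered @ centered.T) / X.shape[1]  # step 3: covariance matrix
    eigvals, eigvecs = np.linalg.eigh(C)      # step 4: eigenvalues and eigenvectors
    order = np.argsort(eigvals)[::-1]         # step 5: sort eigenvalues in descending order
    A = eigvecs[:, order].T                   # step 6: transposed feature vector
    return A @ centered, A, m                 # step 7: PCA transform y = A(x - m)

X = np.array([[2.0, 1.0],
              [6.0, 7.0]])       # the points (2, 6) and (1, 7) as columns
y, A, m = pca_transform(X)
X_back = A.T @ y + m             # inverse transform: x = A^T y + m
print(np.allclose(X, X_back))    # True: the original data is recovered without loss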
Conti..
Problem 1: Let the data points be (2, 6)ᵀ and (1, 7)ᵀ. Apply PCA and find the transformed data. Then, apply the inverse and prove that PCA works.
Solution: One can combine the two vectors into a matrix. The mean vector can be computed as follows:
μ = ((2 + 1)/2, (6 + 7)/2)ᵀ = (1.5, 6.5)ᵀ
As part of PCA, the mean must be subtracted from the data to get the adjusted data:
x₁ = (2 − 1.5, 6 − 6.5)ᵀ = (0.5, −0.5)ᵀ
x₂ = (1 − 1.5, 7 − 6.5)ᵀ = (−0.5, 0.5)ᵀ
The covariance can be obtained as follows:
m₁ = (x₁ − μ)(x₁ − μ)ᵀ = (0.5, −0.5)ᵀ (0.5, −0.5) = [ 0.25  −0.25 ; −0.25  0.25 ]
m₂ = (x₂ − μ)(x₂ − μ)ᵀ = (−0.5, 0.5)ᵀ (−0.5, 0.5) = [ 0.25  −0.25 ; −0.25  0.25 ]
m = m₁ + m₂ = [ 0.5  −0.5 ; −0.5  0.5 ]
Conti..
The final covariance matrix is obtained by adding these two matrices:
C = [ 0.5  −0.5 ; −0.5  0.5 ]
λ is an eigenvalue of a matrix C if it is a solution of the characteristic equation |C − λI| = 0:
| 0.5 − λ   −0.5    |
| −0.5      0.5 − λ | = 0
(0.5 − λ)(0.5 − λ) − (−0.5)(−0.5) = 0
0.25 − λ + λ² − 0.25 = 0, so λ² − λ = 0 and λ(λ − 1) = 0
Therefore, λ = 1, 0.
For λ = 0:
[ 0.5  −0.5 ; −0.5  0.5 ] (x, y)ᵀ = 0
0.5x − 0.5y = 0 and −0.5x + 0.5y = 0, giving x = 1, y = 1, i.e., the eigenvector (1, 1)ᵀ.
For λ = 1:
[ −0.5  −0.5 ; −0.5  −0.5 ] (x, y)ᵀ = 0
giving x = −1, y = 1, i.e., the eigenvector (−1, 1)ᵀ.
Conti..
From the eigenvectors for λ = 1 and λ = 0, the (unnormalized) eigenvector matrix and its transpose are:
A = [ −1  1 ; 1  1 ],  Aᵀ = [ −1  1 ; 1  1 ]
Normalization factors: for λ = 1, √((−1)² + 1²) = √2; for λ = 0, √(1² + 1²) = √2.
Therefore,
A = [ −1/√2  1/√2 ; 1/√2  1/√2 ]
Transformed data = A × (adjusted data) = [ −1/√2  1/√2 ; 1/√2  1/√2 ] × [ 0.5  −0.5 ; −0.5  0.5 ]
= [ −1/√2  1/√2 ; 0  0 ]
Conti..
One can check that the PCA matrix A is orthogonal. A matrix is orthogonal if A⁻¹ = Aᵀ, i.e., AAᵀ = I:
AAᵀ = [ −1/√2  1/√2 ; 1/√2  1/√2 ] × [ −1/√2  1/√2 ; 1/√2  1/√2 ] = [ 1  0 ; 0  1 ]
The transformed matrix y is given as:
y = A(x − m)
Recollect that (x − m) is the adjusted matrix:
y = A(x − m) = [ −1/√2  1/√2 ; 1/√2  1/√2 ] × [ 0.5  −0.5 ; −0.5  0.5 ] = [ −1/√2  1/√2 ; 0  0 ]
One can check that the original matrix can be retrieved from this matrix as:
x = Aᵀy + m = [ −1/√2  1/√2 ; 1/√2  1/√2 ] × [ −1/√2  1/√2 ; 0  0 ] + (1.5, 6.5)ᵀ = [ 2  1 ; 6  7 ]
Therefore, one can infer that the original data is obtained without any loss of information.
Conti..
Problem 2: Consider the two-dimensional patterns (2, 1), (3, 5), (4, 3), (5, 6), (6, 7), (7, 8). Compute the principal component using the PCA algorithm.
Solution: Mean vector, µ = ((2 + 3 + 4 + 5 + 6 + 7)/6, (1 + 5 + 3 + 6 + 7 + 8)/6) = (4.5, 5)
Subtract the mean vector (µ) from the given feature vectors:
x₁ − µ = (2 − 4.5, 1 − 5) = (−2.5, −4)
x₂ − µ = (3 − 4.5, 5 − 5) = (−1.5, 0)
x₃ − µ = (4 − 4.5, 3 − 5) = (−0.5, −2)
x₄ − µ = (5 − 4.5, 6 − 5) = (0.5, 1)
x₅ − µ = (6 − 4.5, 7 − 5) = (1.5, 2)
x₆ − µ = (7 − 4.5, 8 − 5) = (2.5, 3)
Conti…
The outer products of the mean-subtracted feature vectors are:
m₁ = (x₁ − µ)(x₁ − µ)ᵀ = (−2.5, −4)ᵀ (−2.5, −4) = [ 6.25  10 ; 10  16 ]
m₂ = [ 2.25  0 ; 0  0 ]
m₃ = [ 0.25  1 ; 1  4 ]
m₄ = [ 0.25  0.5 ; 0.5  1 ]
m₅ = [ 2.25  3 ; 3  4 ]
m₆ = [ 6.25  7.5 ; 7.5  9 ]
Covariance, C = m₁ + m₂ + … + m₆ = [ 17.5  22 ; 22  34 ]
Conti..
• Calculate the eigenvalues and eigenvectors of the covariance matrix.
• λ is an eigenvalue of a matrix M if it is a solution of the characteristic equation |M − λI| = 0.
So we have:
| 17.5 − λ    22     |
| 22       34 − λ    | = 0
From here, (17.5 − λ)(34 − λ) − (22 × 22) = 0, i.e., λ² − 51.5λ + 111 = 0.
Solving the quadratic gives λ₁ ≈ 49.25 and λ₂ ≈ 2.25; the eigenvector corresponding to the largest eigenvalue λ₁ is the principal component.
Basic Learning Theory
Design of Learning System
In machine learning, a learning system is a framework that allows machines to learn from data,
identify patterns, and make decisions with minimal human intervention, improving their
performance and accuracy over time.
A system that is built around a learning algorithm is called a learning system.
The design of a learning system focuses on these steps:
1. Choosing a training experience
2. Choosing a target function
3. Representation of the target function
4. Function approximation
Training Experience: Let us consider the design of a chess game.
Conti..
Training Experience
• Machine learning algorithms are trained on datasets, which provide examples of inputs and outputs. The algorithm
uses these examples to identify patterns and relationships in the data.
• It refers to the process of a machine learning algorithm learning from data to make predictions or
decisions. This involves exposing the algorithm to a dataset, allowing it to identify patterns and adjust
its parameters to improve its performance on future, unseen data.
• Example: designing of a chess game.
• If the training samples and testing samples have the same distribution, the results would be good.
Determine the Target Function
• In machine learning, the "target function" is the relationship a model aims to learn and predict,
mapping input variables (features) to an output variable.
• The goal is to approximate this function from training data and use it to make predictions on new
data.
• If x and y are variables, the target function: y = f(x)
• Example:
• Imagine you want to predict house prices (Y) based on features like size (X1), location (X2), and age
(X3). The target function would be the relationship between these features and the house price.
Conti..
Determine the Target Function Representation
The representation of knowledge may be a table, collection of rules or a neural network.
The linear combination of these factors can be written as:
V = w₀ + w₁x₁ + w₂x₂ + … + wₙxₙ
where x₁, x₂, …, xₙ represent different board features and w₀, w₁, …, wₙ represent weights.
Choosing an Approximation Algorithm for the Target Function
The focus is to choose weights that fit the given training samples effectively. The aim is to reduce the error, given as:
E ≡ Σ over training samples (V_train(b) − V̂(b))²
where b is a sample board state and V̂ is the predicted hypothesis.
The approximation is carried out as follows:
• Compute the error as the difference between the trained and expected hypothesis. Let the error be error(b).
• Then, for every board feature x_i, the weights are updated as:
w_i = w_i + μ × error(b) × x_i
Here, μ is a constant that moderates the size of the weight update.
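This is the classic LMS weight-update rule; a minimal sketch in Python (the feature values and the learning rate μ are assumptions made for illustration):

import numpy as np

def lms_update(w, x, v_train, mu=0.01):
    """One LMS step: nudge the weights toward the training value for board state x."""
    v_hat = w @ x                  # current prediction V̂(b) = w0*x0 + w1*x1 + ...
    error = v_train - v_hat        # error(b) = V_train(b) - V̂(b)
    return w + mu * error * x      # w_i <- w_i + mu * error(b) * x_i

w = np.zeros(4)
x = np.array([1.0, 3.0, 0.0, 2.0])   # x[0] = 1 acts as the bias feature for w0
for _ in range(100):
    w = lms_update(w, x, v_train=1.0)
print(w @ x)                         # approaches 1.0, the training value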
INTRODUCTION TO CONCEPT LEARNING
• The process where a machine learns a general rule or function from a set of specific examples or data
points, enabling it to recognize and classify new, unseen instances.
• It is a learning strategy of acquiring abstract knowledge or inferring a general concept or deriving a
category from the given training samples.
• It is a process of abstraction and generalization from the data.
• Concept learning helps to classify an object that has a set of common, relevant features.
• Thus, it helps a learner compare and contrast categories based on the similarity and association of
positive and negative instances in the training data to classify an object.
• The learner tries to simplify by observing the common features from the training samples and then apply
this simplified model to the future samples.
• This task is also known as learning from experience.
• Each concept or category obtained by learning is a Boolean valued function which takes true or false
value.
• This way of learning categories for objects and recognizing new instances of those categories is called concept learning.
• It is formally defined as inferring a Boolean-valued function by processing training instances.
Conti..
Concept learning requires three things:
1. Input: Training dataset which is a set of training instances, each labeled
with the name of a concept or category to which it belongs. Use this past
experience to train and build the model.
2. Output: Target concept or Target function. It is a mapping function f(x)
from input x to output y. It is to determine the specific features or
common features to identify an object. In other words, it is to find the
hypothesis to determine the target concept. For e.g., the specific set of
features to identify an elephant from all animals.
3. Test: New instances to test the learned model.
Formally, Concept learning is defined as–"Given a set of hypotheses, the
learner searches through the hypothesis space to identify the best
hypothesis that matches the target concept".
Conti..
Representation of a Hypothesis
• A hypothesis ‘h’ approximates a target function ‘f ’ to represent the relationship
between the independent attributes and the dependent attribute of the training
instances.
• The hypothesis is the predicted approximate model that best maps the inputs to
outputs.
• Each hypothesis is represented as a conjunction of attribute conditions in the
antecedent part. For example, (Tail = Short) ^(Color = Black)….
• The set of hypotheses in the search space is collectively called hypotheses.
Conti..
Hypothesis Space
• Hypothesis space is the set of all possible hypotheses that approximates the target function f.
• The set of all possible approximations of the target function can be defined as hypothesis space.
• From this set of hypotheses in the hypothesis space, a machine learning algorithm would
determine the best possible hypothesis that would best describe the target function or best fit
the outputs.
• For example, a regression algorithm represents the hypothesis space as a linear function
whereas a decision tree algorithm represents the hypothesis space as a tree.
• The set of hypotheses that can be generated by a learning algorithm can be further reduced by
specifying a language bias.
• The subset of the hypothesis space that is consistent with all observed training instances is called the Version Space.
• Version space represents the only hypotheses that are used for the classification.
• Example attributes and their possible values:
Horns – Yes, No; Tail – Long, Short; Tusks – Yes, No; Paws – Yes, No; Fur – Yes, No; Color – Brown, Black, White; Hooves – Yes, No; Size – Medium, Big
Conti..
Hypothesis Space Search by Find-S Algorithm
• The Find-S algorithm is a basic concept learning algorithm in machine learning.
• The Find-S algorithm finds the most specific hypothesis that fits all the positive examples.
• Thus, this algorithm considers only the positive instances and eliminates negative instances while generating the hypothesis.
• It initially starts with the most specific hypothesis.
• Input: Positive instances in the training dataset
• Output: Hypothesis ‘h’
1. Initialize ‘h’ to the most specific hypothesis:
h = <Ψ, Ψ, Ψ, Ψ, Ψ, Ψ>
2. Generalize the initial hypothesis for the first positive instance [since ‘h’ is most specific].
3. For each subsequent instance:
If it is a positive instance,
check each attribute value in the instance against the hypothesis ‘h’:
if the attribute value is the same as the hypothesis value, then do nothing;
else, if the attribute value is different from the hypothesis value, change it to ‘?’ in ‘h’.
Else, if it is a negative instance,
Ignore it.
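A minimal Python sketch of Find-S (None stands for the most specific value Ψ and "?" for "any value"; the dataset is Table 3.2 from the next example):

def find_s(examples):
    """examples: list of (attribute_tuple, label) pairs with label 'Yes' or 'No'."""
    h = None                              # most specific hypothesis <Ψ, Ψ, ..., Ψ>
    for x, label in examples:
        if label != "Yes":                # negative instances are ignored
            continue
        if h is None:                     # first positive instance: adopt it as-is
            h = list(x)
        else:                             # generalize every mismatching attribute to '?'
            h = [hv if hv == xv else "?" for hv, xv in zip(h, x)]
    return h

data = [
    ((">=9", "Yes", "Excellent", "Good", "Fast", "Yes"), "Yes"),
    ((">=9", "Yes", "Good", "Good", "Fast", "Yes"), "Yes"),
    ((">=8", "No", "Good", "Good", "Fast", "No"), "No"),
    ((">=9", "Yes", "Good", "Good", "Slow", "No"), "Yes"),
]
print(find_s(data))   # ['>=9', 'Yes', '?', 'Good', '?', '?']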
Conti..
3.4: Consider the training dataset of 4 instances shown in Table 3.2. It contains the details of the performance of students and their likelihood of getting a job offer or not in their final semester. Apply the Find-S algorithm.

Table 3.2:
CGPA   Interactiveness   Practical knowledge   Communication skill   Logical thinking   Interest   Job offer
≥ 9    Yes               Excellent             Good                  Fast               Yes        Yes
≥ 9    Yes               Good                  Good                  Fast               Yes        Yes
≥ 8    No                Good                  Good                  Fast               No         No
≥ 9    Yes               Good                  Good                  Slow               No         Yes

Solution:
Step 1: Initialize ‘h’ to the most specific hypothesis. There are 6 attributes, so for each attribute we initially fill ‘Ψ’ in the initial hypothesis ‘h’:
h = <Ψ, Ψ, Ψ, Ψ, Ψ, Ψ>
Step 2: Generalize the initial hypothesis for the first positive instance. I1 is a positive instance, so generalize the most specific hypothesis ‘h’ to include this positive instance. Hence,
I1: <≥9, Yes, Excellent, Good, Fast, Yes> (Positive instance)
h = <≥9, Yes, Excellent, Good, Fast, Yes>
Conti..
Step 3: Scan the next instance I2. Since I2 is a positive instance, generalize ‘h’ to include it: for each non-matching attribute value in ‘h’, put a ‘?’. The third attribute value is mismatching in ‘h’ with I2, so put a ‘?’:
I2: <≥9, Yes, Good, Good, Fast, Yes> (Positive instance)
h = <≥9, Yes, ?, Good, Fast, Yes>
Now, scan I3. Since it is a negative instance, ignore it. Hence, the hypothesis remains the same without any change after scanning I3.
I3: <≥8, No, Good, Good, Fast, No> (Negative instance)
h = <≥9, Yes, ?, Good, Fast, Yes>
Now scan I4. Since it is a positive instance, check for mismatches in the hypothesis ‘h’ with I4. The 5th and 6th attribute values are mismatching, so put ‘?’ for those attributes in ‘h’:
I4: <≥9, Yes, Good, Good, Slow, No> (Positive instance)
h = <≥9, Yes, ?, Good, ?, ?>
Now, the final hypothesis generated with the Find-S algorithm is:
h = <≥9, Yes, ?, Good, ?, ?>
It includes all positive instances and ignores every negative instance.
Conti..
3.6: Consider the training dataset of 4 instances shown in Table 3.6. It contains the details of the weather conditions to play football. Apply the Find-S algorithm.

Table 3.6:
Example   Sky     AirTemp   Humidity   Wind     Water   Forecast   EnjoySport
1         Sunny   Warm      Normal     Strong   Warm    Same       Yes
2         Sunny   Warm      High       Strong   Warm    Same       Yes
3         Rainy   Cold      High       Strong   Warm    Change     No
4         Sunny   Warm      High       Strong   Cool    Change     Yes

Solution: Initialize h to the most specific hypothesis in H:
h0 = <Ψ, Ψ, Ψ, Ψ, Ψ, Ψ>
I1: <Sunny, Warm, Normal, Strong, Warm, Same>
Iteration 1: h1 = <Sunny, Warm, Normal, Strong, Warm, Same>
Conti..
h1: <Sunny, Warm, Normal, Strong, Warm, Same>
Iteration 2: I2 = <Sunny, Warm, High, Strong, Warm, Same>
h2: <Sunny, Warm, ?, Strong, Warm, Same>
Iteration 3: I3 is a negative instance (Rainy), so ignore it.
h3: <Sunny, Warm, ?, Strong, Warm, Same>
Iteration 4: I4 = <Sunny, Warm, High, Strong, Cool, Change>
Output: h4 = <Sunny, Warm, ?, Strong, ?, ?>
Conti..
3.6: Consider the training dataset of 4 instances shown in Table 3.6. It contains the details of the weather conditions to play tennis. Apply the Find-S algorithm.
Candidate Elimination Algorithm
Version space learning generates all the hypotheses that are consistent with the training data.
This algorithm computes the version space by combining two cases, namely:
• Specific to General learning – generalize S to include the positive example
• General to Specific learning – specialize G to exclude the negative example
Candidate Elimination Algorithm:
Input: Set of instances in the training dataset
Output: Hypothesis boundaries G and S
1. Initialize G to the maximally general hypothesis.
2. Initialize S to the maximally specific hypothesis.
• Generalize the initial hypothesis for the first positive instance.
3. For each subsequent new training instance:
• If the instance is positive,
– Generalize S to include the positive instance:
– check each attribute value of the positive instance against S;
– if the attribute value of the positive instance and S are different, fill that field value with ‘?’;
– if the attribute value of the positive instance and S are the same, make no change.
Dr. Shivashankar-ISE-GAT
Candidate Elimination Algorithm
– Prune G to exclude all hypotheses in G that are inconsistent with the positive instance.
• If the instance is negative,
– Specialize G to exclude the negative instance:
– add to G all minimal specializations that exclude the negative example and are consistent with S;
– if an attribute value of S and the negative instance are different, fill that attribute value with the S value;
– if an attribute value of S and the negative instance are the same, no need to update ‘G’; fill that attribute value with ‘?’.
– Remove from S all hypotheses inconsistent with the negative instance.
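A simplified Python sketch of candidate elimination for conjunctive hypotheses (it keeps a single S boundary and assumes noise-free data whose first example is positive, which suffices for the example below):

def matches(h, x):
    """A hypothesis matches an instance when every attribute is '?' or equal."""
    return all(hv == "?" or hv == xv for hv, xv in zip(h, x))

def candidate_elimination(examples):
    n = len(examples[0][0])
    S = None                       # most specific boundary <Ψ, ..., Ψ>
    G = [tuple(["?"] * n)]         # most general boundary
    for x, label in examples:
        if label == "Yes":
            # prune G, then generalize S just enough to cover x
            G = [g for g in G if matches(g, x)]
            S = tuple(x) if S is None else tuple(
                sv if sv == xv else "?" for sv, xv in zip(S, x))
        else:
            # specialize every g that wrongly covers the negative instance
            new_G = []
            for g in G:
                if not matches(g, x):
                    new_G.append(g)
                    continue
                for i in range(n):
                    # minimal specialization: pin attribute i to the S value
                    if g[i] == "?" and S[i] != "?" and S[i] != x[i]:
                        new_G.append(g[:i] + (S[i],) + g[i + 1:])
            G = new_G
    return S, G

data = [
    ((">=9", "Yes", "Excellent", "Good", "Fast", "Yes"), "Yes"),
    ((">=9", "Yes", "Good", "Good", "Fast", "Yes"), "Yes"),
    ((">=8", "No", "Good", "Good", "Fast", "No"), "No"),
    ((">=9", "Yes", "Good", "Good", "Slow", "No"), "Yes"),
]
S, G = candidate_elimination(data)
print(S)   # ('>=9', 'Yes', '?', 'Good', '?', '?')
print(G)   # [('>=9', '?', '?', '?', '?', '?'), ('?', 'Yes', '?', '?', '?', '?')]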
Conti..
3.4: Consider the training dataset of 4 instances shown in Table 3.2. It contains the details of the performance of students and their likelihood of getting a job offer or not in their final semester. Apply the Candidate Elimination algorithm. (Source: Sridhar, S. and Vijayalakshmi, M., Machine Learning, p. 92.)

Table 3.2:
CGPA   Interactiveness   Practical knowledge   Communication skill   Logical thinking   Interest   Job offer
≥ 9    Yes               Excellent             Good                  Fast               Yes        Yes
≥ 9    Yes               Good                  Good                  Fast               Yes        Yes
≥ 8    No                Good                  Good                  Fast               No         No
≥ 9    Yes               Good                  Good                  Slow               No         Yes

Solution:
Step 1: Initialize the ‘G’ boundary to the maximally general hypothesis: G = <?, ?, ?, ?, ?, ?>
Step 2: Initialize the ‘S’ boundary to the maximally specific hypothesis: S = <Ψ, Ψ, Ψ, Ψ, Ψ, Ψ>
Step 3: Generalize the initial hypothesis for the first positive instance. I1 is a positive instance, so generalize the most specific hypothesis ‘S’ to include it. Hence,
I1: <≥9, Yes, Excellent, Good, Fast, Yes> (Positive instance)
S1 = <≥9, Yes, Excellent, Good, Fast, Yes>
G1 = <?, ?, ?, ?, ?, ?>
Conti..
Iteration 1
The third attribute value is mismatching in ‘S1’ with I2, so put a ‘?’:
I2: <≥9, Yes, Good, Good, Fast, Yes> (Positive instance)
S2 = <≥9, Yes, ?, Good, Fast, Yes>
Since G1 is consistent with this positive instance, there is no change. The resulting G2 is:
G2 = <?, ?, ?, ?, ?, ?>
Iteration 2
Now scan I3:
I3: <≥8, No, Good, Good, Fast, No> (Negative instance)
There is no hypothesis in S2 that is inconsistent with the negative instance, hence S3 remains the same:
S3 = <≥9, Yes, ?, Good, Fast, Yes>
G is specialized to exclude the negative instance:
G3 = <≥9, ?, ?, ?, ?, ?>
     <?, Yes, ?, ?, ?, ?>
     <?, ?, ?, ?, ?, Yes>
Iteration 3
Now scan I4. Since it is a positive instance, check for mismatches in the hypothesis ‘S3’ with I4. The 5th and 6th attribute values are mismatching, so put ‘?’ for those attributes in ‘S4’:
I4: <≥9, Yes, Good, Good, Slow, No> (Positive instance)
S4 = <≥9, Yes, ?, Good, ?, ?>
Candidate Elimination Algorithm
Prune G3 to exclude all hypotheses inconsistent with the positive instance I4:
G3 = <≥9, ?, ?, ?, ?, ?>
     <?, Yes, ?, ?, ?, ?>
     <?, ?, ?, ?, ?, Yes>  (inconsistent)
Since the third hypothesis in G3 is inconsistent with this positive instance, remove it. The resulting G4 is:
G4 = <≥9, ?, ?, ?, ?, ?>
     <?, Yes, ?, ?, ?, ?>
Using the two boundary sets, S4 and G4, the version space is converged to contain the set of consistent hypotheses. The final version space is:
<≥9, Yes, ?, ?, ?, ?>
<≥9, ?, ?, Good, ?, ?>
<?, Yes, ?, Good, ?, ?>
Thus, the algorithm finds the version space to contain only those hypotheses that are most general and most specific.
The diagrammatic representation of deriving the version space is summarized below.
S boundary:
S:  <Ψ, Ψ, Ψ, Ψ, Ψ, Ψ>
S1: <≥9, Yes, Excellent, Good, Fast, Yes>
S2 = S3: <≥9, Yes, ?, Good, Fast, Yes>
S4: <≥9, Yes, ?, Good, ?, ?>
Version space (between S4 and G4):
<?, Yes, ?, Good, ?, ?>, <≥9, ?, ?, Good, ?, ?>, <≥9, Yes, ?, ?, ?, ?>
G boundary:
G4: <≥9, ?, ?, ?, ?, ?>, <?, Yes, ?, ?, ?, ?>
G3: <≥9, ?, ?, ?, ?, ?>, <?, Yes, ?, ?, ?, ?>, <?, ?, ?, ?, ?, Yes>
G2 = G1 = G: <?, ?, ?, ?, ?, ?>
Conti…
Problem 2: Generate consistent hypotheses for the following training dataset using the Candidate Elimination algorithm.

Table 2: “Enjoy Sport”
Example   Sky     AirTemp   Humidity   Wind     Water   Forecast   EnjoySport
1         Sunny   Warm      Normal     Strong   Warm    Same       Yes
2         Sunny   Warm      High       Strong   Warm    Same       Yes
3         Rainy   Cold      High       Strong   Warm    Change     No
4         Sunny   Warm      High       Strong   Cool    Change     Yes

More Related Content

PDF
Regression lineaire simple
PDF
Module - 5 Machine Learning-22ISE62.pdf
PDF
Module - 4 Machine Learning -22ISE62.pdf
PDF
Machine Learning_2025_First Module_1.pdf
PDF
Dr. Shivu__Machine Learning-Module 3.pdf
PDF
Exposé segmentation
PPTX
PPTX
Clusters techniques
Regression lineaire simple
Module - 5 Machine Learning-22ISE62.pdf
Module - 4 Machine Learning -22ISE62.pdf
Machine Learning_2025_First Module_1.pdf
Dr. Shivu__Machine Learning-Module 3.pdf
Exposé segmentation
Clusters techniques

What's hot (20)

PPTX
K nearest neighbor
PPTX
Knn Algorithm presentation
PPT
Data Mining
PDF
Anomaly detection
PPT
k Nearest Neighbor
PPTX
Ml8 boosting and-stacking
PPTX
adversarial robustness lecture
PPTX
Logistic Regression | Logistic Regression In Python | Machine Learning Algori...
PPT
CS8091_BDA_Unit_II_Clustering
ODP
Introduction to Bayesian Statistics
PPTX
Anomaly Detection Technique
PDF
Data Visualization in Exploratory Data Analysis
PPTX
Machine Learning
PDF
Lecture 9 Perceptron
PDF
Default Credit Card Prediction
PPTX
Linear models and multiclass classification
PPTX
Data science Big Data
PDF
Ridge regression
K nearest neighbor
Knn Algorithm presentation
Data Mining
Anomaly detection
k Nearest Neighbor
Ml8 boosting and-stacking
adversarial robustness lecture
Logistic Regression | Logistic Regression In Python | Machine Learning Algori...
CS8091_BDA_Unit_II_Clustering
Introduction to Bayesian Statistics
Anomaly Detection Technique
Data Visualization in Exploratory Data Analysis
Machine Learning
Lecture 9 Perceptron
Default Credit Card Prediction
Linear models and multiclass classification
Data science Big Data
Ridge regression
Ad

Similar to Dr. Shivu___Machine Learning_Module 2pdf (20)

PPTX
MODULE-2.pptx machine learning notes for vtu 6th sem cse
PPTX
MODULE-3edited.pptx machine learning modulk
PPTX
data analysis
PPTX
UNIT 4.pptx
PPTX
GROUP_3_DATA_ANALYSIS presentations.pptx
PDF
Correlation and Regression
PDF
PPTX
Data Processing and Statistical Treatment.pptx
PPTX
QR II Lect 15 (Bivariate analysis and scatter plot, correlation).pptx
PDF
Lecture_note1.pdf
PPT
Econometrics
PPTX
Applied Stats for Real-Life Decisions.pptx
PDF
Applied Statistics In Business
PPTX
2.4 Scatterplots, correlation, and regression
PPTX
Chap2-Data.pptx. It is all about data in data mining.
PPTX
Unit 4_3 Correlation Regression.pptx
PDF
Correlation.pptx.pdf
PPTX
Advanced Excel, Day 4
PPTX
Lesson 27 using statistical techniques in analyzing data
PPTX
Data Processing and Statistical Treatment: Spreads and Correlation
MODULE-2.pptx machine learning notes for vtu 6th sem cse
MODULE-3edited.pptx machine learning modulk
data analysis
UNIT 4.pptx
GROUP_3_DATA_ANALYSIS presentations.pptx
Correlation and Regression
Data Processing and Statistical Treatment.pptx
QR II Lect 15 (Bivariate analysis and scatter plot, correlation).pptx
Lecture_note1.pdf
Econometrics
Applied Stats for Real-Life Decisions.pptx
Applied Statistics In Business
2.4 Scatterplots, correlation, and regression
Chap2-Data.pptx. It is all about data in data mining.
Unit 4_3 Correlation Regression.pptx
Correlation.pptx.pdf
Advanced Excel, Day 4
Lesson 27 using statistical techniques in analyzing data
Data Processing and Statistical Treatment: Spreads and Correlation
Ad

More from Dr. Shivashankar (20)

PDF
Dr Shivu_GAT_Computer Network_Module 5.pdf
PDF
Dr Shivu_GAT_Computer Network_22ISE52_Module 4.pdf
PDF
DrShivashankar_Computer Net_Module-3.pdf
PPTX
22ISE52_Computer Networks_Module _2.pptx
PDF
22ISE52_COMPUTER NETWORKS _Module 1+.pdf
PDF
5th Module_Machine Learning_Reinforc.pdf
PDF
Module 4_Machine Learning_Evaluating Hyp
PDF
Module 3_Machine Learning Bayesian Learn
PDF
Machine Learning- Perceptron_Backpropogation_Module 3.pdf
PDF
Machine Learning_SVM_KNN_K-MEANSModule 2.pdf
PDF
21 Scheme_21EC53_MODULE-5_CCN_Dr. ShivaS
PDF
21 SCHEME_21EC53_VTU_MODULE-4_COMPUTER COMMUNCATION NETWORK.pdf
PDF
21 Scheme_ MODULE-3_CCN.pdf
PDF
21_Scheme_MODULE-1_CCN.pdf
PDF
21 Scheme_MODULE-2_CCN.pdf
PDF
Network Security_Dr Shivashankar_Module 5.pdf
PDF
Wireless Cellular Communication_Module 3_Dr. Shivashankar.pdf
PDF
Wireless Cellular Communication_Mudule2_Dr.Shivashankar.pdf
PDF
Network Security_4th Module_Dr. Shivashankar
PDF
Network Security_3rd Module_Dr. Shivashankar
Dr Shivu_GAT_Computer Network_Module 5.pdf
Dr Shivu_GAT_Computer Network_22ISE52_Module 4.pdf
DrShivashankar_Computer Net_Module-3.pdf
22ISE52_Computer Networks_Module _2.pptx
22ISE52_COMPUTER NETWORKS _Module 1+.pdf
5th Module_Machine Learning_Reinforc.pdf
Module 4_Machine Learning_Evaluating Hyp
Module 3_Machine Learning Bayesian Learn
Machine Learning- Perceptron_Backpropogation_Module 3.pdf
Machine Learning_SVM_KNN_K-MEANSModule 2.pdf
21 Scheme_21EC53_MODULE-5_CCN_Dr. ShivaS
21 SCHEME_21EC53_VTU_MODULE-4_COMPUTER COMMUNCATION NETWORK.pdf
21 Scheme_ MODULE-3_CCN.pdf
21_Scheme_MODULE-1_CCN.pdf
21 Scheme_MODULE-2_CCN.pdf
Network Security_Dr Shivashankar_Module 5.pdf
Wireless Cellular Communication_Module 3_Dr. Shivashankar.pdf
Wireless Cellular Communication_Mudule2_Dr.Shivashankar.pdf
Network Security_4th Module_Dr. Shivashankar
Network Security_3rd Module_Dr. Shivashankar

Recently uploaded (20)

PDF
Well-logging-methods_new................
PDF
III.4.1.2_The_Space_Environment.p pdffdf
PPTX
UNIT 4 Total Quality Management .pptx
PPTX
Fundamentals of Mechanical Engineering.pptx
PPTX
UNIT-1 - COAL BASED THERMAL POWER PLANTS
PPTX
Artificial Intelligence
PPTX
Foundation to blockchain - A guide to Blockchain Tech
PDF
PREDICTION OF DIABETES FROM ELECTRONIC HEALTH RECORDS
PDF
The CXO Playbook 2025 – Future-Ready Strategies for C-Suite Leaders Cerebrai...
PPTX
Current and future trends in Computer Vision.pptx
PPTX
additive manufacturing of ss316l using mig welding
PPT
Project quality management in manufacturing
DOCX
573137875-Attendance-Management-System-original
PPTX
Safety Seminar civil to be ensured for safe working.
PPTX
CARTOGRAPHY AND GEOINFORMATION VISUALIZATION chapter1 NPTE (2).pptx
PDF
A SYSTEMATIC REVIEW OF APPLICATIONS IN FRAUD DETECTION
PPTX
M Tech Sem 1 Civil Engineering Environmental Sciences.pptx
PPT
Introduction, IoT Design Methodology, Case Study on IoT System for Weather Mo...
PDF
Embodied AI: Ushering in the Next Era of Intelligent Systems
PPTX
Geodesy 1.pptx...............................................
Well-logging-methods_new................
III.4.1.2_The_Space_Environment.p pdffdf
UNIT 4 Total Quality Management .pptx
Fundamentals of Mechanical Engineering.pptx
UNIT-1 - COAL BASED THERMAL POWER PLANTS
Artificial Intelligence
Foundation to blockchain - A guide to Blockchain Tech
PREDICTION OF DIABETES FROM ELECTRONIC HEALTH RECORDS
The CXO Playbook 2025 – Future-Ready Strategies for C-Suite Leaders Cerebrai...
Current and future trends in Computer Vision.pptx
additive manufacturing of ss316l using mig welding
Project quality management in manufacturing
573137875-Attendance-Management-System-original
Safety Seminar civil to be ensured for safe working.
CARTOGRAPHY AND GEOINFORMATION VISUALIZATION chapter1 NPTE (2).pptx
A SYSTEMATIC REVIEW OF APPLICATIONS IN FRAUD DETECTION
M Tech Sem 1 Civil Engineering Environmental Sciences.pptx
Introduction, IoT Design Methodology, Case Study on IoT System for Weather Mo...
Embodied AI: Ushering in the Next Era of Intelligent Systems
Geodesy 1.pptx...............................................

Dr. Shivu___Machine Learning_Module 2pdf

  • 1. MACHINE LEARNING (22ISE62) Module-2 Dr. Shivashankar Professor Department of Information Science & Engineering GLOBAL ACADEMY OF TECHNOLOGY-Bengaluru 24-05-2025 1 GLOBAL ACADEMY OF TECHNOLOGY Ideal Homes Township, Rajarajeshwari Nagar, Bengaluru – 560 098 Department of Information Science & Engineering Dr. Shivashankar-ISE-GAT
  • 2. Module 2 - Understanding Data – 2 Bivariate Data • Bivariate analysis is one of the statistical analysis where two variables are observed. • One variable here is dependent (X) while the other is independent (Y). • Bivariate data can be used to determine whether or not two variables are related. • The aim of bivariate analysis is to find relationships among data. • The relationships can then be used in comparisons, finding causes, and in further explorations. • To do that, graphical display of the data is necessary. • One such graph method is called scatter plot. • Scatter plots are the graphs that present the relationship between two variables in a data-set. • It is a 2D graph showing the relationship between two variables. • It is useful in exploratory data before calculating a correlation coefficient or fitting regression curve. 24-05-2025 2 Dr. Shivashankar-ISE-GAT
  • 3. Conti.. Temperature (in centigrade) Sales of Sweaters (in thousands) 5 300 12 250 15 200 20 110 23 45 27 10 35 5 24-05-2025 3 Dr. Shivashankar-ISE-GAT 300 250 200 110 45 10 5 -100 -50 0 50 100 150 200 250 300 350 0 5 10 15 20 25 30 35 40 Sales of Sweaters Temparature Sales of Sweaters (in thousands) Figure 2.11: Scatter Plot Line graphs are similar to scatter plots. 300 250 200 110 45 10 5 0 50 100 150 200 250 300 350 1 2 3 4 5 6 7 sales of Sweaters Temparature Sales of Sweaters (in thousands) Figure 2.12: Line Chart Table 2.1: Temperature in a Shop and Sales Data
  • 4. Bivariate Statistics • Bivariate analysis is stated to be an analysis of any concurrent relation between two variables or attributes. • Example: examples: student's study time vs. their exam scores, ice cream sales vs. temperature, height vs. weight, income vs. years of education, and patient's BMI vs. blood pressure. • Covariance and Correlation are methods of bivariate statistics. • Covariance is a measure of joint probability of random variables, say X and Y. • It is defined as covariance(X, Y) or COV(X, Y) and is used to measure the variance between two dimensions. • The formula for finding co-variance for specific x, and y are: 𝐶𝑂𝑉(𝑋, 𝑌) = 1 𝑁 ෍ 𝑖=1 𝑁 𝑥𝑖 − 𝐸(𝑋) 𝑦𝑖 − 𝐸(𝑌) Here, 𝑥𝑖and 𝑦𝑖are data values from X and Y. E(X) and E(Y) are the mean values of 𝑥𝑖 and 𝑦𝑖. N is the number of given data. Also, the COV(X, Y) is same as COV(Y, X). 24-05-2025 4 Dr. Shivashankar-ISE-GAT
  • 5. Bivariate Statistics Problem 1: Find the covariance of data X = {1, 2, 3, 4, 5} and Y = {1, 4, 9, 16, 25}. Solution: Mean(X) = E(X) = 15 5 = 3 Mean(Y) = E(Y) = 55 5 = 11 𝐶𝑂𝑉(𝑋, 𝑌) = 1 𝑁 ෍ 𝑖=1 𝑁 𝑥𝑖 − 𝐸(𝑋) 𝑦𝑖 − 𝐸(𝑌) = 1 − 3 1 − 11 + 2 − 3 4 − 11 + 3 − 3 9 − 11 + 4 − 3 16 − 11 + (5 − 3)(25 − 11) 5 = 12 The covariance between X and Y is 12. 24-05-2025 5 Dr. Shivashankar-ISE-GAT
  • 6. Bivariate Statistics Problem 2: Find the covariance between X and Y for the following data: Solution: 24-05-2025 6 Dr. Shivashankar-ISE-GAT X 3 4 5 8 7 9 6 2 1 Y 4 3 4 7 8 7 6 3 2
  • 7. Correlation Correlation refers to a process for establishing the relationships between two variables. The correlation coefficient is a statistical measure of the strength of a linear relationship between two variables. Its values can range from -1 to 1. The sign is more important than the actual value. 1. If the value is positive, it indicates that the dimensions increase together. 2. If the value is negative, it indicates that while one-dimension increases, the other dimension decreases. 3. If the value is zero, then it indicates that both the dimensions are independent of each other. If the given attributes are X = (𝑥1, 𝑥2, 𝑥3, …, 𝑥𝑛) and Y = (𝑦1, 𝑦2, 𝑦3, …, 𝑦𝑛), then the Pearson correlation coefficient, that is denoted as r, is given as: r= 𝐶𝑂𝑉(𝑋,𝑌) 𝜎𝑥,𝜎𝑦 where, 𝜎𝑥, 𝜎𝑦 are the standard deviations of X and Y. 24-05-2025 7 Dr. Shivashankar-ISE-GAT
  • 8. Conti.. Problem 1: Find the correlation coefficient of data X = {1, 2, 3, 4, 5} and Y = {1, 4, 9, 16, 25}. Solution: Step 1: The mean values of X and Y Mean(X) = ത 𝑋= 15 5 = 3 Mean(Y) = = ത 𝑌 = 55 5 = 11 Step 2: Calculate the squared differences from the mean For X for Y (𝑋1 − ത 𝑋)2= 1 − 3 2=4 (𝑌1 − ത 𝑌)2= 1 − 11 2=100 (𝑋2 − ത 𝑋)2 = 2 − 3 2 =1 (𝑌2 − ത 𝑌)2 = 4 − 11 2 =49 (𝑋3 − ത 𝑋)2= 3 − 3 2=0 (𝑌3 − ത 𝑌)2= 9 − 11 2=4 (𝑋4 − ത 𝑋)2 = 4 − 3 2 =1 (𝑌4 − ത 𝑌)2 = 16 − 11 2 =25 (𝑋5 − ത 𝑋)2 = 5 − 3 2 =4 (𝑌5 − ത 𝑌)2 = 25 − 11 2 =196 Sum of squared differences for X: 10 Sum of squared differences for X: 374 24-05-2025 8 Dr. Shivashankar-ISE-GAT
  • 9. CONTI.. Step 3: Calculate the variance • The variance for each set is the average of these squared differences. For X: • Variance of X = 10 5 = 2 For Y: • Variance of Y= 374 5 = 74.8 Step 4: Calculate the standard deviation • The standard deviation is the square root of the variance. • For X: • 𝜎𝑋 = 2≈1.414 • For Y: • 𝜎𝑌 = 74.8≈8.6486 • Therefore, the correlation coefficient is given as ratio of covariance • 𝐶𝑂𝑉 𝑋, 𝑌 = 1 𝑁 σ𝑖=1 𝑁 𝑥𝑖 − 𝐸 𝑋 𝑦𝑖 − 𝐸 𝑌 = 1−3 1−11 + 2−3 4−11 + 3−3 9−11 + 4−3 16−11 + 5−3 25−11 5 = 12 • Therefore, correlation coefficient, r= 12 1.414+8.6486 = 0.984 24-05-2025 9 Dr. Shivashankar-ISE-GAT
  • 10. Conti.. Problem 1: Find the correlation coefficient of data X = {5,9,10,3,5,7} and Y = {6,11,6,4,6,9}. Solution: 24-05-2025 10 Dr. Shivashankar-ISE-GAT
  • 11. Multivariate Statistics • Multivariate statistics refers to methods that examine the simultaneous effect of multiple variables. • In machine learning, almost all datasets are multivariable. • Multivariate data is the analysis of more than two observable variables, and often, thousands of multiple measurements need to be conducted for one or more subjects. • The multivariate data is like bivariate data but may have more than two dependent variables. • Some of the multivariate analysis are regression analysis, principal component analysis, and path analysis. id Attribute-1 Attribute-2 Attribute-3 1 1 4 1 2 2 5 2 3 3 6 1 • The mean of multivariate data is a mean vector and the mean of the above three attributes is given as (2, 5, 1.33). • The variance of multivariate data becomes the covariance matrix. • The mean vector is called centroid and variance is called dispersion matrix. • Multivariate data has three or more variables. 24-05-2025 11 Dr. Shivashankar-ISE-GAT
  • 12. Heatmap • In machine learning, a heatmap is a data visualization technique that uses color-coding to represent the magnitude of individual values within a dataset, often displayed as a grid or matrix. • It helps to identify patterns, correlations, and anomalies within complex datasets by highlighting areas of significance. • It takes a matrix as input and colours it. • The darker colours indicate very large values and lighter colours indicate smaller values. • The advantage of this method is that humans perceive colours well. • So, by colour shaping, larger values can be perceived well. • For example, in vehicle traffic data, heavy traffic regions can be differentiated from low traffic regions through heatmap. 24-05-2025 12 Dr. Shivashankar-ISE-GAT Figure 2.3 : Grid with Heatmap Pattern
  • 13. Pairplot • A scatterplot matrix, is a data visualization tool that displays pairwise relationships between all variables in a dataset, helping to understand distributions and correlations at a glance. • Pairplot or scatter matrix is a data visual technique for multivariate data. • A scatter matrix consists of several pair-wise scatter plots of variables of the multivariate data. • All the results are presented in a matrix format. • By visual examination of the chart, one can easily find relationships among the variables such as correlation between the variables. • A random matrix of three columns is chosen and the relationships of the columns is plotted as a pairplot. 24-05-2025 13 Dr. Shivashankar-ISE-GAT Figure 1: PAIRPLOT VISUALIZATION
  • 14. Essential Mathematics for Multivariate Data • Machine learning involves many mathematical concepts from the domain of Linear algebra, Statistics, Probability and Information theory. • Linear algebra deals with linear equations, vectors, matrices, vector spaces and transformations. • These are the driving forces of machine learning and machine learning cannot exist without these data types. Linear Systems and Gaussian Elimination for Multivariate Data • A linear system of equations is a group of equations with unknown variables. Let Ax = y, then the solution x is given as: x = y/A = 𝐴−1 y This is true if y is not zero and A is not zero. The logic can be extended for N-set of equations with ‘n’ unknown variables. • It means if and y=(𝑦1, 𝑦2,……, 𝑦𝑛) • Then unknown variable x= y/A = 𝐴−1 y 24-05-2025 14 Dr. Shivashankar-ISE-GAT
  • 15. Conti.. For solving large number of system of equations, Gaussian elimination can be used. The procedure for applying Gaussian elimination is given as follows: 1. Write the given matrix. 2. Append vector y to the matrix A. This matrix is called augmentation matrix. 3. Keep the element 𝑎11as pivot and eliminate all 𝑎11 in second row using the matrix operation, 𝑅2− 𝑎21 𝑎11 , here 𝑅2 is the second row and 𝑎21 𝑎11 is called as multiplier. The same logic can be used to remove 𝑎12in all other equations. 4. Repeat the same logic and reduce it to reduced echelon form. Then, the unknown variable as: 𝑥𝑛= 𝑦𝑛𝑛 𝑥𝑛𝑛 5. Then, the remaining unknown variables can be found by back-substitution as: 𝑥𝑛−1 = 𝑦𝑛−1 − 𝑎𝑛−1 𝑥 𝑎𝑛 𝑎(𝑛−1)(𝑥−1) This part is called backward substitution. 24-05-2025 15 Dr. Shivashankar-ISE-GAT
  • 16. Bivariate Statistics Problem 4: Solve the following set of equations using Gaussian Elimination method. 2𝑥1 + 4𝑥2 = 6 4𝑥1 + 3𝑥2 = 7 Solution: 2 4 | 6 4 3 | 7 Apply the transformation by dividing the row 1 by 2 (R1/2). - 1 2 | 3 4 3 | 7 R2=R2-4R1 - 1 2 | 3 0 − 5 | − 5 R2=R2/-5 - 1 2 | 3 0 1 | 1 R1=R1-2R2 - 1 0 | 1 0 1 | 1 x1 = 1, x2 = 1 24-05-2025 16 Dr. Shivashankar-ISE-GAT
  • 17. Bivariate Statistics Problem 5: Solve the following set of equations using Gaussian Elimination method. 2x+y=-1 3x-5y= -21 solution: 1 0 | − 2 0 1 | 3 24-05-2025 17 Dr. Shivashankar-ISE-GAT
  • 18. Machine Learning and Importance of Probability and Statistics • Machine learning is linked with statistics and probability. • Like linear algebra, statistics is the heart of machine learning. • The importance of statistics needs to be stressed as without statistics; analysis of data is difficult. • Probability is especially important for machine learning. • In machine learning, probability is a fundamental concept that deals with the likelihood of events or outcomes. It's used to model uncertainty and make predictions, especially in algorithms that deal with probabilistic models like Naive Bayes. Probability Distributions • The mathematical function that gives the probabilities of occurrence of possible outcomes for an experiment. • In other words, distribution is a function that describes the relationship between the observations in a sample space. Probability distributions are of two types: 1. Discrete probability distribution 2. Continuous probability distribution 24-05-2025 18 Dr. Shivashankar-ISE-GAT
  • 19. FEATURE ENGINEERING AND DIMENSIONALITY REDUCTION TECHNIQUES • The process of selecting, transforming, and creating new features (or variables) from raw data to improve the performance of machine learning models. • It involves carefully preparing the input data so that machine learning algorithms can learn effectively and make accurate predictions. • Features are attributes. • Feature engineering is about determining the subset of features that form an important part of the input that improves the performance of the model, be it classification or any other model in machine learning. • Feature engineering deals with two problems – Feature Transformation and Feature Selection. • Feature transformation is extraction of features and creating new features that may be helpful in increasing performance. • For example, the height and weight may give a new attribute called Body Mass Index (BMI). • Feature subset selection is another important aspect of feature engineering that focuses on selection of features to reduce the time but not at the cost of reliability. The features can be removed based on two aspects: 1. Feature relevancy – Some features contribute more for classification than other features. For example, a mole on the face can help in face detection than common features like nose. 2. Feature redundancy – Some features are redundant. For example, when a database table has a field called Date of birth, then age field is not relevant as age can be computed easily from date of birth. This helps in removing the column age that leads to reduction of dimension one. 24-05-2025 19 Dr. Shivashankar-ISE-GAT
  • 20. conti.. 1 Stepwise Forward Selection: • This procedure starts with an empty set of attributes. • Every time, an attribute is tested for statistical significance for best quality and is added to the reduced set. This process is continued till a good reduced set of attributes is obtained. 2 Stepwise Backward Elimination: • This procedure starts with a complete set of attributes. • At every stage, the procedure removes the worst attribute from the set, leading to the reduced set. Combined Approach Both forward and reverse methods can be combined so that the procedure can add the best attribute and remove the worst attribute. 3 Principal Component Analysis • The idea of the principal component analysis (PCA) or KL transform is to transform a given set of measurements to a new set of features so that the features exhibit high information packing properties. This leads to a reduced and compact set of features. • Basically, this elimination is made possible because of the information redundancies. • This compact representation is of a reduced dimension. 24-05-2025 20 Dr. Shivashankar-ISE-GAT
• 21. PCA
Consider a group of random vectors of the form:
$x = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}$
The mean vector of the set of random vectors is defined as:
$m_x = E\{x\}$
The operator E refers to the expected value of the population. It is calculated theoretically using the probability density functions (PDF) of the elements $x_i$ and the joint probability density functions between the elements $x_i$ and $x_j$.
From this, the covariance matrix can be calculated as:
$C = E\{(x - m_x)(x - m_x)^T\}$
For M random vectors, when M is large enough, the mean vector and covariance matrix can be approximated as:
$m_x = \frac{1}{M} \sum_{k=1}^{M} x_k$
$C = \frac{1}{M} \sum_{k=1}^{M} x_k x_k^T - m_x m_x^T$
• 22. Conti..
The mapping of the vectors x to y using the transformation can now be described as:
$y = A(x - m_x)$
This transform is also called the Karhunen-Loeve or Hotelling transform.
The original vector x can be reconstructed as:
$x = A^T y + m_x$
The goal of PCA is to reduce the set of attributes to a newer, smaller set that captures the variance of the data. The variance is captured by fewer components, which give (nearly) the same result as the original with all the attributes. If only the K largest eigenvalues and their eigenvectors $A_K$ are used, the recovered information is:
$\hat{x} = A_K^T y + m_x$
The advantages of PCA are immense. It reduces the attribute list by eliminating all irrelevant attributes.
The PCA algorithm is as follows:
1. The target dataset x is obtained.
2. The mean m is subtracted from the dataset. The adjusted dataset is x – m. The objective of this step is to transform the dataset to zero mean.
3. The covariance matrix C of dataset x is obtained.
• 23. Conti..
4. The eigenvalues and eigenvectors of the covariance matrix are calculated.
5. The eigenvector corresponding to the highest eigenvalue is the principal component of the dataset. The eigenvalues are arranged in descending order, and the feature vector is formed with the corresponding eigenvectors as its columns:
Feature vector = {Eigen vector_1, Eigen vector_2, Eigen vector_3, ..., Eigen vector_n}
6. The transpose of the feature vector is obtained. Let it be A.
7. The PCA transform is y = A × (x – m), where x is the input dataset, m is the mean, and A is the transpose of the feature vector.
The original data can be retrieved as:
Original data = ($A^{-1}$ × y) + m = ($A^T$ × y) + m
(since A is orthogonal, $A^{-1} = A^T$).
The new data is a dimensionally reduced matrix that represents the original data. Therefore, PCA is effective in removing attributes that do not contribute.
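A minimal numpy sketch of steps 1-7, assuming the data matrix holds one sample per column; the toy data reuses the two points of Problem 1 on the next slide, and dividing the scatter by the number of samples in step 3 is a conventional choice the slides leave implicit.

```python
# Minimal numpy sketch of the 7-step PCA procedure above.
import numpy as np

x = np.array([[2.0, 1.0],
              [6.0, 7.0]])            # step 1: dataset (2 features, 2 samples)

m = x.mean(axis=1, keepdims=True)     # step 2: mean vector
x_adj = x - m                         # zero-mean adjusted data

C = (x_adj @ x_adj.T) / x.shape[1]    # step 3: covariance matrix

vals, vecs = np.linalg.eigh(C)        # step 4: eigenvalues / eigenvectors
order = np.argsort(vals)[::-1]        # step 5: eigenvalues in descending order
feature_vector = vecs[:, order]       # eigenvectors as columns

A = feature_vector.T                  # step 6: transpose of the feature vector
y = A @ x_adj                         # step 7: PCA transform

x_back = A.T @ y + m                  # inverse: A is orthogonal, so A^-1 = A^T
print(np.allclose(x_back, x))         # True: original data recovered losslessly
```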
• 24. Conti..
Problem 1: Let the data points be (2, 6) and (1, 7). Apply PCA and find the transformed data. Then apply the inverse and verify that PCA works.
Solution: The two vectors can be combined into a matrix, one point per column. The mean vector is:
$\mu = \begin{bmatrix} (2+1)/2 \\ (6+7)/2 \end{bmatrix} = \begin{bmatrix} 1.5 \\ 6.5 \end{bmatrix}$
As part of PCA, the mean must be subtracted from the data to get the adjusted data:
$x_1 - \mu = \begin{bmatrix} 2-1.5 \\ 6-6.5 \end{bmatrix} = \begin{bmatrix} 0.5 \\ -0.5 \end{bmatrix}$, $x_2 - \mu = \begin{bmatrix} 1-1.5 \\ 7-6.5 \end{bmatrix} = \begin{bmatrix} -0.5 \\ 0.5 \end{bmatrix}$
The covariance can be obtained from the outer products of the adjusted vectors:
$m_1 = (x_1 - \mu)(x_1 - \mu)^T = \begin{bmatrix} 0.25 & -0.25 \\ -0.25 & 0.25 \end{bmatrix}$
$m_2 = (x_2 - \mu)(x_2 - \mu)^T = \begin{bmatrix} 0.25 & -0.25 \\ -0.25 & 0.25 \end{bmatrix}$
$m = m_1 + m_2 = \begin{bmatrix} 0.5 & -0.5 \\ -0.5 & 0.5 \end{bmatrix}$
• 25. Conti..
The final covariance matrix is obtained by adding these two matrices:
$C = \begin{bmatrix} 0.5 & -0.5 \\ -0.5 & 0.5 \end{bmatrix}$
λ is an eigenvalue of a matrix C if it is a solution of the characteristic equation |C – λI| = 0:
$\begin{vmatrix} 0.5-\lambda & -0.5 \\ -0.5 & 0.5-\lambda \end{vmatrix} = 0$
$(0.5-\lambda)(0.5-\lambda) - (-0.5)(-0.5) = 0$
$\lambda^2 - \lambda + 0.25 - 0.25 = 0 \;\Rightarrow\; \lambda^2 - \lambda = 0 \;\Rightarrow\; \lambda(\lambda - 1) = 0$
Therefore, λ = 1, 0.
For λ = 0: $\begin{bmatrix} 0.5 & -0.5 \\ -0.5 & 0.5 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} = 0$ gives 0.5x – 0.5y = 0, so x = y. Taking x = 1, y = 1, the eigenvector is $\begin{bmatrix} 1 \\ 1 \end{bmatrix}$.
For λ = 1: $\begin{bmatrix} -0.5 & -0.5 \\ -0.5 & -0.5 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} = 0$ gives y = –x. Taking x = –1, y = 1, the eigenvector is $\begin{bmatrix} -1 \\ 1 \end{bmatrix}$.
• 26. Conti..
From λ = 1, 0, the eigenvectors are stacked as rows, largest eigenvalue first:
$A = \begin{bmatrix} -1 & 1 \\ 1 & 1 \end{bmatrix}$
Each eigenvector is normalized by its length:
for λ = 1, $\sqrt{(-1)^2 + 1^2} = \sqrt{2}$; for λ = 0, $\sqrt{1^2 + 1^2} = \sqrt{2}$
Therefore,
$A = \begin{bmatrix} -1/\sqrt{2} & 1/\sqrt{2} \\ 1/\sqrt{2} & 1/\sqrt{2} \end{bmatrix}$
The transformed data Y = A(x – m) is computed on the next slide.
• 27. Conti..
One can check that the PCA matrix A is orthogonal. A matrix is orthogonal if $A^{-1} = A^T$, i.e., $AA^T = I$:
$AA^T = \begin{bmatrix} -1/\sqrt{2} & 1/\sqrt{2} \\ 1/\sqrt{2} & 1/\sqrt{2} \end{bmatrix} \begin{bmatrix} -1/\sqrt{2} & 1/\sqrt{2} \\ 1/\sqrt{2} & 1/\sqrt{2} \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$
The transformed matrix Y is given as Y = A(x – m). Recollect that (x – m) is the adjusted matrix:
$Y = \begin{bmatrix} -1/\sqrt{2} & 1/\sqrt{2} \\ 1/\sqrt{2} & 1/\sqrt{2} \end{bmatrix} \begin{bmatrix} 0.5 & -0.5 \\ -0.5 & 0.5 \end{bmatrix} = \begin{bmatrix} -1/\sqrt{2} & 1/\sqrt{2} \\ 0 & 0 \end{bmatrix}$
One can check that the original matrix is retrieved as:
$X = A^T Y + m = \begin{bmatrix} -1/\sqrt{2} & 1/\sqrt{2} \\ 1/\sqrt{2} & 1/\sqrt{2} \end{bmatrix} \begin{bmatrix} -1/\sqrt{2} & 1/\sqrt{2} \\ 0 & 0 \end{bmatrix} + \begin{bmatrix} 1.5 \\ 6.5 \end{bmatrix} = \begin{bmatrix} 2 & 1 \\ 6 & 7 \end{bmatrix}$
Therefore, one can infer that the original data is recovered without any loss of information.
• 28. Conti..
Problem 2: Consider the two-dimensional patterns (2, 1), (3, 5), (4, 3), (5, 6), (6, 7), (7, 8). Compute the principal component using the PCA algorithm.
Solution:
Mean vector: µ = ((2 + 3 + 4 + 5 + 6 + 7) / 6, (1 + 5 + 3 + 6 + 7 + 8) / 6) = (4.5, 5)
Subtract the mean vector µ from each feature vector:
x1 – µ = (2 – 4.5, 1 – 5) = (-2.5, -4)
x2 – µ = (3 – 4.5, 5 – 5) = (-1.5, 0)
x3 – µ = (4 – 4.5, 3 – 5) = (-0.5, -2)
x4 – µ = (5 – 4.5, 6 – 5) = (0.5, 1)
x5 – µ = (6 – 4.5, 7 – 5) = (1.5, 2)
x6 – µ = (7 – 4.5, 8 – 5) = (2.5, 3)
• 29. Conti..
The outer products of the mean-adjusted vectors are:
$m_1 = (x_1 - \mu)(x_1 - \mu)^T = \begin{bmatrix} -2.5 \\ -4 \end{bmatrix} \begin{bmatrix} -2.5 & -4 \end{bmatrix} = \begin{bmatrix} 6.25 & 10 \\ 10 & 16 \end{bmatrix}$
$m_2 = \begin{bmatrix} 2.25 & 0 \\ 0 & 0 \end{bmatrix}$, $m_3 = \begin{bmatrix} 0.25 & 1 \\ 1 & 4 \end{bmatrix}$, $m_4 = \begin{bmatrix} 0.25 & 0.5 \\ 0.5 & 1 \end{bmatrix}$, $m_5 = \begin{bmatrix} 2.25 & 3 \\ 3 & 4 \end{bmatrix}$, $m_6 = \begin{bmatrix} 6.25 & 7.5 \\ 7.5 & 9 \end{bmatrix}$
Their sum is $\begin{bmatrix} 17.5 & 22 \\ 22 & 34 \end{bmatrix}$; dividing by the number of patterns (6) gives the covariance matrix:
$C = \begin{bmatrix} 2.92 & 3.67 \\ 3.67 & 5.67 \end{bmatrix}$
• 30. Conti..
• Calculate the eigenvalues and eigenvectors of the covariance matrix.
• λ is an eigenvalue of a matrix C if it is a solution of the characteristic equation |C – λI| = 0. So we have:
$\begin{vmatrix} 2.92-\lambda & 3.67 \\ 3.67 & 5.67-\lambda \end{vmatrix} = 0$
$(2.92 - \lambda)(5.67 - \lambda) - (3.67 \times 3.67) = 0$
$\lambda^2 - 8.59\lambda + 3.09 = 0$
$\lambda = \frac{8.59 \pm \sqrt{8.59^2 - 4 \times 3.09}}{2} = \frac{8.59 \pm 7.84}{2}$
λ1 ≈ 8.22, λ2 ≈ 0.38
The eigenvector of the largest eigenvalue λ1 ≈ 8.22 solves (2.92 – 8.22)x + 3.67y = 0, i.e., y ≈ 1.44x; after normalization, the principal component is approximately (0.57, 0.82).
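A quick numpy cross-check of these numbers (a verification sketch, not part of the original solution):

```python
# Verify the covariance matrix, eigenvalues, and principal component
# for the six two-dimensional patterns of Problem 2.
import numpy as np

pts = np.array([[2, 1], [3, 5], [4, 3], [5, 6], [6, 7], [7, 8]], dtype=float)
adj = pts - pts.mean(axis=0)          # subtract the mean vector (4.5, 5)

C = (adj.T @ adj) / len(pts)          # covariance ~ [[2.92, 3.67], [3.67, 5.67]]
vals, vecs = np.linalg.eigh(C)        # eigh returns eigenvalues in ascending order
print(np.round(C, 2))
print(np.round(vals, 2))              # ~ [0.38, 8.22]
print(np.round(vecs[:, -1], 2))       # principal component ~ [0.57, 0.82] (up to sign)
```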
• 31. Basic Learning Theory
Design of a Learning System
In machine learning, a learning system is a framework that allows machines to learn from data, identify patterns, and make decisions with minimal human intervention, improving their performance and accuracy over time. A system that is built around a learning algorithm is called a learning system. The design of a learning system focuses on these steps:
1. Choosing a training experience
2. Choosing a target function
3. Representation of the target function
4. Function approximation
Training Experience
Let us consider the design of a chess game as a running example.
• 32. Conti..
Training Experience
• Machine learning algorithms are trained on datasets, which provide examples of inputs and outputs. The algorithm uses these examples to identify patterns and relationships in the data.
• Training experience refers to the process of a machine learning algorithm learning from data to make predictions or decisions. This involves exposing the algorithm to a dataset, allowing it to identify patterns and adjust its parameters to improve its performance on future, unseen data.
• Example: the design of a chess game.
• If the training samples and the testing samples have the same distribution, the results will be good.
Determine the Target Function
• In machine learning, the "target function" is the relationship a model aims to learn and predict, mapping input variables (features) to an output variable.
• The goal is to approximate this function from training data and use it to make predictions on new data.
• If x and y are variables, the target function is y = f(x).
• Example: Imagine you want to predict house prices (Y) based on features like size (X1), location (X2), and age (X3). The target function is the relationship between these features and the house price.
• 33. Conti..
Determine the Target Function Representation
The representation of knowledge may be a table, a collection of rules, or a neural network. A linear combination of board features can be written as:
$V = w_0 + w_1 x_1 + w_2 x_2 + \dots + w_n x_n$
where $x_1, \dots, x_n$ represent different board features and $w_0, w_1, \dots, w_n$ represent weights.
Choosing an Approximation Algorithm for the Target Function
The focus is to choose weights that fit the given training samples effectively. The aim is to reduce the error, given as:
$E \equiv \sum_{\text{training samples } b} \left( V_{train}(b) - \hat{V}(b) \right)^2$
where b is a training sample and $\hat{V}$ is the predicted hypothesis. The approximation is carried out as follows (see the sketch after this slide):
• Compute the error as the difference between the trained and expected hypothesis. Let the error be error(b).
• Then, for every board feature $x_i$, the weights are updated as:
$w_i \leftarrow w_i + \mu \times error(b) \times x_i$
Here, μ is a constant that moderates the size of the weight update.
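A minimal sketch of this weight-update (LMS) rule on synthetic data; the feature matrix, true weights, learning rate μ = 0.01, and number of passes are illustrative assumptions, not values from the slides.

```python
# LMS weight updates: w_i <- w_i + mu * error(b) * x_i, per training sample.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))            # "board features" x_1..x_3 per sample
w_true = np.array([2.0, -1.0, 0.5])
V_train = X @ w_true + 4.0               # training values with intercept 4

w = np.zeros(4)                          # w_0 (bias) plus w_1..w_3
mu = 0.01                                # step size moderating each update

for _ in range(50):                      # repeated passes over the samples
    for x, v in zip(X, V_train):
        xb = np.concatenate(([1.0], x))  # prepend 1 so w_0 acts as the bias
        error = v - w @ xb               # error(b) = V_train(b) - V_hat(b)
        w = w + mu * error * xb          # the LMS update from the slide

print(np.round(w, 2))                    # ~ [4.0, 2.0, -1.0, 0.5]
```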
• 34. INTRODUCTION TO CONCEPT LEARNING
• Concept learning is the process where a machine learns a general rule or function from a set of specific examples or data points, enabling it to recognize and classify new, unseen instances.
• It is a learning strategy of acquiring abstract knowledge, inferring a general concept, or deriving a category from the given training samples.
• It is a process of abstraction and generalization from the data.
• Concept learning helps to classify an object that has a set of common, relevant features.
• Thus, it helps a learner compare and contrast categories based on the similarity and association of positive and negative instances in the training data to classify an object.
• The learner tries to simplify by observing the common features from the training samples and then applies this simplified model to future samples.
• This task is also known as learning from experience.
• Each concept or category obtained by learning is a Boolean-valued function which takes a true or false value.
• This way of learning categories for objects, and recognizing new instances of those categories, is called concept learning.
• It is formally defined as inferring a Boolean-valued function by processing training instances.
• 35. Conti..
Concept learning requires three things:
1. Input: a training dataset, which is a set of training instances, each labeled with the name of the concept or category to which it belongs. This past experience is used to train and build the model.
2. Output: the target concept or target function. It is a mapping function f(x) from input x to output y. Its purpose is to determine the specific or common features needed to identify an object, i.e., to find the hypothesis that determines the target concept. For example, the specific set of features that identifies an elephant among all animals.
3. Test: new instances to test the learned model.
Formally, concept learning is defined as: "Given a set of hypotheses, the learner searches through the hypothesis space to identify the best hypothesis that matches the target concept."
• 36. Conti..
Representation of a Hypothesis
• A hypothesis 'h' approximates a target function 'f' to represent the relationship between the independent attributes and the dependent attribute of the training instances.
• The hypothesis is the predicted approximate model that best maps the inputs to outputs.
• Each hypothesis is represented as a conjunction of attribute conditions in the antecedent part, for example, (Tail = Short) ∧ (Color = Black).
• The set of hypotheses in the search space is collectively referred to as the hypotheses.
• 37. Conti..
Hypothesis Space
• Hypothesis space is the set of all possible hypotheses that approximate the target function f.
• In other words, the set of all possible approximations of the target function can be defined as the hypothesis space.
• From this set of hypotheses, a machine learning algorithm determines the best possible hypothesis that best describes the target function or best fits the outputs.
• For example, a regression algorithm represents the hypothesis space as a linear function, whereas a decision tree algorithm represents the hypothesis space as a tree.
• The set of hypotheses that can be generated by a learning algorithm can be further reduced by specifying a language bias.
• The subset of the hypothesis space that is consistent with all observed training instances is called the Version Space.
• The version space contains the only hypotheses that are used for classification.
• Example attributes and their values (see the counting sketch after this list):
Horns – Yes, No; Tail – Long, Short; Tusks – Yes, No; Paws – Yes, No; Fur – Yes, No; Color – Brown, Black, White; Hooves – Yes, No; Size – Medium, Big
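For the eight attributes above, the sizes of the instance space and hypothesis space can be counted directly. The sketch below uses the standard counting scheme (each attribute in a hypothesis may also take '?' or the empty symbol Ψ); the slide itself does not perform this count, so the formulas are an assumption based on that scheme.

```python
# Counting sketch: sizes of the instance space and hypothesis space
# for the eight attributes listed above.
values = {"Horns": 2, "Tail": 2, "Tusks": 2, "Paws": 2,
          "Fur": 2, "Color": 3, "Hooves": 2, "Size": 2}

instances = 1
syntactic = 1   # each attribute: its own values, plus '?' and the empty symbol
semantic = 1    # any hypothesis containing the empty symbol matches nothing

for k in values.values():
    instances *= k
    syntactic *= k + 2
    semantic *= k + 1

print(instances)       # 2^7 * 3 = 384 distinct instances
print(syntactic)       # 4^7 * 5 = 81920 syntactically distinct hypotheses
print(semantic + 1)    # 3^7 * 4 + 1 = 8749 semantically distinct hypotheses
```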
• 38. Conti..
Hypothesis Space Search by the Find-S Algorithm
• Find-S is a basic concept learning algorithm in machine learning.
• The Find-S algorithm finds the most specific hypothesis that fits all the positive examples.
• Thus, this algorithm considers only the positive instances and ignores the negative instances while generating the hypothesis.
• It initially starts with the most specific hypothesis.
• Input: positive instances in the training dataset
• Output: hypothesis 'h'
1. Initialize 'h' to the most specific hypothesis: h = <Ψ, Ψ, Ψ, Ψ, Ψ, Ψ>
2. Generalize the initial hypothesis for the first positive instance (since 'h' is most specific).
3. For each subsequent instance:
 If it is a positive instance, check each attribute value in the instance against the hypothesis 'h':
  if the attribute value is the same as the hypothesis value, do nothing;
  else, if the attribute value is different from the hypothesis value, change it to '?' in 'h'.
 Else, if it is a negative instance, ignore it.
A runnable sketch of this procedure follows.
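A minimal implementation of Find-S as stated above, applied to the job-offer data of the next slide; the tuple encoding of attribute values is an assumption made for illustration.

```python
# Find-S: the most specific hypothesis consistent with all positive examples.
def find_s(instances):
    h = None                                   # <PSI, ..., PSI>: matches nothing
    for attrs, label in instances:
        if label != "Yes":                     # negative instances are ignored
            continue
        if h is None:
            h = list(attrs)                    # generalize to the first positive
        else:
            h = [hv if hv == av else "?"       # mismatching values become '?'
                 for hv, av in zip(h, attrs)]
    return h

data = [
    ((">=9", "Yes", "Excellent", "Good", "Fast", "Yes"), "Yes"),
    ((">=9", "Yes", "Good",      "Good", "Fast", "Yes"), "Yes"),
    ((">=8", "No",  "Good",      "Good", "Fast", "No"),  "No"),
    ((">=9", "Yes", "Good",      "Good", "Slow", "No"),  "Yes"),
]
print(find_s(data))   # ['>=9', 'Yes', '?', 'Good', '?', '?']
```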
• 39. Conti..
Problem 3.4: Consider the training dataset of 4 instances shown in Table 3.2. It contains the details of the performance of students and their likelihood of getting a job offer in their final semester. Apply the Find-S algorithm.

Table 3.2: Training dataset
CGPA | Interactiveness | Practical knowledge | Communication skill | Logical thinking | Interest | Job offer
≥9 | Yes | Excellent | Good | Fast | Yes | Yes
≥9 | Yes | Good | Good | Fast | Yes | Yes
≥8 | No | Good | Good | Fast | No | No
≥9 | Yes | Good | Good | Slow | No | Yes

Solution:
Step 1: Initialize 'h' to the most specific hypothesis. There are 6 attributes, so each position of the initial hypothesis 'h' is filled with Ψ:
h = <Ψ, Ψ, Ψ, Ψ, Ψ, Ψ>
Step 2: Generalize the initial hypothesis for the first positive instance. I1 is a positive instance, so generalize the most specific hypothesis 'h' to include it:
I1: <≥9, Yes, Excellent, Good, Fast, Yes> Positive instance
h = <≥9, Yes, Excellent, Good, Fast, Yes>
• 40. Conti..
Step 3: Scan the next instance I2. Since I2 is a positive instance, generalize 'h' to include it: for each non-matching attribute value in 'h', put a '?'. The third attribute value of 'h' mismatches I2, so it becomes '?':
I2: <≥9, Yes, Good, Good, Fast, Yes> Positive instance
h = <≥9, Yes, ?, Good, Fast, Yes>
Now scan I3. Since it is a negative instance, ignore it. The hypothesis remains unchanged after scanning I3:
I3: <≥8, No, Good, Good, Fast, No> Negative instance
h = <≥9, Yes, ?, Good, Fast, Yes>
Now scan I4. Since it is a positive instance, check for mismatches between 'h' and I4. The 5th and 6th attribute values mismatch, so they become '?':
I4: <≥9, Yes, Good, Good, Slow, No> Positive instance
h = <≥9, Yes, ?, Good, ?, ?>
The final hypothesis generated by the Find-S algorithm is:
h = <≥9, Yes, ?, Good, ?, ?>
It includes all positive instances and ignores every negative instance.
• 41. Conti..
Problem 3.6: Consider the training dataset of 4 instances shown in Table 3.6. It contains the details of the weather conditions for playing Football. Apply the Find-S algorithm.

Table 3.6: EnjoySport training dataset
Example | Sky | AirTemp | Humidity | Wind | Water | Forecast | EnjoySport
1 | Sunny | Warm | Normal | Strong | Warm | Same | Yes
2 | Sunny | Warm | High | Strong | Warm | Same | Yes
3 | Rainy | Cold | High | Strong | Warm | Change | No
4 | Sunny | Warm | High | Strong | Cool | Change | Yes

Solution: Initialize h to the most specific hypothesis in H:
h0 = <Ψ, Ψ, Ψ, Ψ, Ψ, Ψ>
Iteration 1: I1 = <Sunny, Warm, Normal, Strong, Warm, Same> (positive)
h1 = <Sunny, Warm, Normal, Strong, Warm, Same>
• 42. Conti..
Iteration 2: I2 = <Sunny, Warm, High, Strong, Warm, Same> (positive)
h2 = <Sunny, Warm, ?, Strong, Warm, Same>
Iteration 3: I3 is a negative instance (Rainy), so it is ignored:
h3 = <Sunny, Warm, ?, Strong, Warm, Same>
Iteration 4: I4 = <Sunny, Warm, High, Strong, Cool, Change> (positive)
Output:
h4 = <Sunny, Warm, ?, Strong, ?, ?>
• 43. Conti..
Exercise: Consider the training dataset of 4 instances shown in Table 3.6, now interpreted as the weather conditions for playing Tennis, and apply the Find-S algorithm.
• 44. Candidate Elimination Algorithm
Version space learning generates all hypotheses consistent with the training data. The algorithm computes the version space by combining two directions of search:
• Specific-to-general learning – generalize S to include each positive example
• General-to-specific learning – specialize G to exclude each negative example
Candidate Elimination Algorithm:
Input: set of instances in the training dataset
Output: hypothesis boundaries G and S
1. Initialize G to the maximally general hypothesis.
2. Initialize S to the maximally specific hypothesis, and generalize it for the first positive instance.
3. For each subsequent new training instance:
• If the instance is positive,
 ➢ Generalize S to include the positive instance:
 ➢ check each attribute value of the positive instance against S;
 ➢ if an attribute value of the positive instance differs from S, fill that field with '?';
 ➢ if the attribute values are the same, make no change.
• 45. Candidate Elimination Algorithm (Conti..)
 ▪ Prune from G all hypotheses inconsistent with the positive instance.
• If the instance is negative,
 ▪ Specialize G to exclude the negative instance:
 ➢ add to G all minimal specializations that exclude the negative example and are consistent with S;
 • if an attribute value of S differs from the negative instance, fill that attribute with the value from S;
 • if they are the same, that attribute cannot exclude the negative instance, so it stays '?' and no new hypothesis is added for it.
 ▪ Remove from S all hypotheses inconsistent with the negative instance.
A compact implementation sketch follows.
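A compact sketch of the procedure above for conjunctive hypotheses, tracking a single maximally specific hypothesis S and a set of general hypotheses G, as the slides do; the string encoding (with "0" standing for Ψ) is an assumption.

```python
# Candidate Elimination over conjunctive hypotheses (simplified: one S, set G).
PSI = "0"                                        # stands for the empty symbol Ψ

def matches(h, x):
    return all(hv in ("?", xv) for hv, xv in zip(h, x))

def candidate_elimination(instances, n_attrs):
    S = [PSI] * n_attrs                          # maximally specific boundary
    G = [["?"] * n_attrs]                        # maximally general boundary
    for x, label in instances:
        if label == "Yes":
            # Generalize S minimally to cover the positive instance.
            S = list(x) if S[0] == PSI else \
                [sv if sv == xv else "?" for sv, xv in zip(S, x)]
            # Prune hypotheses in G inconsistent with the positive instance.
            G = [g for g in G if matches(g, x)]
        else:
            # Specialize G minimally to exclude x, staying consistent with S.
            newG = []
            for g in G:
                if not matches(g, x):
                    newG.append(g)               # already excludes x
                    continue
                for i in range(n_attrs):
                    if g[i] == "?" and S[i] not in ("?", PSI, x[i]):
                        spec = list(g)
                        spec[i] = S[i]           # fill with the value from S
                        newG.append(spec)
            G = newG
    return S, G

data = [
    ((">=9", "Yes", "Excellent", "Good", "Fast", "Yes"), "Yes"),
    ((">=9", "Yes", "Good",      "Good", "Fast", "Yes"), "Yes"),
    ((">=8", "No",  "Good",      "Good", "Fast", "No"),  "No"),
    ((">=9", "Yes", "Good",      "Good", "Slow", "No"),  "Yes"),
]
S, G = candidate_elimination(data, 6)
print(S)   # ['>=9', 'Yes', '?', 'Good', '?', '?']
print(G)   # [['>=9', '?', '?', '?', '?', '?'], ['?', 'Yes', '?', '?', '?', '?']]
```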
• 46. Conti..
Problem 3.4: Consider the training dataset of 4 instances shown in Table 3.2 (the job-offer data used for Find-S). It contains the details of the performance of students and their likelihood of getting a job offer in their final semester. Apply the Candidate Elimination algorithm.
Solution:
Step 1: Initialize the 'G' boundary to the maximally general hypothesis:
G = <?, ?, ?, ?, ?, ?>
Step 2: Initialize the 'S' boundary to the maximally specific hypothesis:
S = <Ψ, Ψ, Ψ, Ψ, Ψ, Ψ>
Step 3: Generalize the initial hypothesis for the first positive instance. I1 is a positive instance, so generalize the most specific hypothesis 'S' to include it:
I1: <≥9, Yes, Excellent, Good, Fast, Yes> Positive instance
S1 = <≥9, Yes, Excellent, Good, Fast, Yes>
G1 = <?, ?, ?, ?, ?, ?>
• 47. Conti..
Iteration 1: Scan I2, a positive instance. The third attribute value of S1 mismatches I2, so it becomes '?':
I2: <≥9, Yes, Good, Good, Fast, Yes> Positive instance
S2 = <≥9, Yes, ?, Good, Fast, Yes>
Since G1 is consistent with this positive instance, there is no change. The resulting G2 is:
G2 = <?, ?, ?, ?, ?, ?>
Iteration 2: Now scan I3:
I3: <≥8, No, Good, Good, Fast, No> Negative instance
There is no hypothesis in S2 inconsistent with the negative instance, hence S3 remains the same:
S3 = <≥9, Yes, ?, Good, Fast, Yes>
G2 is specialized minimally to exclude I3, using the attribute values where S3 and I3 differ:
G3 = {<≥9, ?, ?, ?, ?, ?>, <?, Yes, ?, ?, ?, ?>, <?, ?, ?, ?, ?, Yes>}
Iteration 3: Now scan I4. Since it is a positive instance, check for mismatches between S3 and I4. The 5th and 6th attribute values mismatch, so they become '?':
I4: <≥9, Yes, Good, Good, Slow, No> Positive instance
S4 = <≥9, Yes, ?, Good, ?, ?>
• 48. Candidate Elimination Algorithm (Conti..)
Prune G3 to exclude all hypotheses inconsistent with the positive instance I4:
G3 = {<≥9, ?, ?, ?, ?, ?>, <?, Yes, ?, ?, ?, ?>, <?, ?, ?, ?, ?, Yes>}
Since the third hypothesis in G3 is inconsistent with this positive instance (I4 has Interest = No), remove it. The resulting G4 is:
G4 = {<≥9, ?, ?, ?, ?, ?>, <?, Yes, ?, ?, ?, ?>}
Using the two boundary sets, S4 and G4, the version space converges to the set of consistent hypotheses. The final version space is:
<≥9, Yes, ?, ?, ?, ?>
<≥9, ?, ?, Good, ?, ?>
<?, Yes, ?, Good, ?, ?>
Thus, the algorithm finds the version space to contain only those hypotheses bounded by the most general and most specific ones.
• 49. The diagrammatic representation of deriving the version space is shown below.
[Figure: Derivation of the version space. The specific boundary evolves as S: <Ψ, Ψ, Ψ, Ψ, Ψ, Ψ> → S1: <≥9, Yes, Excellent, Good, Fast, Yes> → S2 = S3: <≥9, Yes, ?, Good, Fast, Yes> → S4: <≥9, Yes, ?, Good, ?, ?>. The general boundary evolves as G = G1 = G2: <?, ?, ?, ?, ?, ?> → G3: {<≥9, ?, ?, ?, ?, ?>, <?, Yes, ?, ?, ?, ?>, <?, ?, ?, ?, ?, Yes>} → G4: {<≥9, ?, ?, ?, ?, ?>, <?, Yes, ?, ?, ?, ?>}. The version space {<≥9, Yes, ?, ?, ?, ?>, <≥9, ?, ?, Good, ?, ?>, <?, Yes, ?, Good, ?, ?>} lies between S4 and G4.]
• 50. Conti..
Problem 2: Generate the consistent hypotheses for the following training dataset using the Candidate Elimination algorithm.

Table 2: "Enjoy Sport"
Example | Sky | AirTemp | Humidity | Wind | Water | Forecast | Enjoy Sport
1 | Sunny | Warm | Normal | Strong | Warm | Same | Yes
2 | Sunny | Warm | High | Strong | Warm | Same | Yes
3 | Rainy | Cold | High | Strong | Warm | Change | No
4 | Sunny | Warm | High | Strong | Cool | Change | Yes