Machine Learning
REGRESSION
MR. U. A. NULI
COMPUTER SCIENCE AND ENGINEERING DEPARTMENT
TEXTILE AND ENGINEERING INSTITUTE, ICHALKARANJI
What is Regression?
Regression is a technique used to model and analyse the relationships between
variables and how they jointly contribute to producing a particular outcome.
Regression analysis is a form of predictive modelling technique which investigates the
relationship between a dependent (target) variable and one or more independent (predictor) variables.
Regression analysis is a conceptually simple method for investigating functional
relationships among variables.
Regression predicts a real and continuous value y for a given set of inputs X (X = x1, x2, …).
Regression is a Supervised Learning technique.
Regression Fundamentals
The simplest case to examine is one in which a variable Y, referred to as the dependent
or target variable, is related to one variable X, called an independent or
explanatory variable, a predictor variable, or simply a regressor.
In the simplest terms, the purpose of regression is to find the best-fit line or equation
that expresses the relationship between Y and X.
The simplest way to express a linear relation between Y and X is a line equation:
Y = W0 + W1*X
The relationship is expressed in the form of an equation or a model connecting the
response or dependent variable and one or more explanatory or predictor variables.
Regression Typical Examples
This technique is used for forecasting, time series modelling and finding
relationships between variables.
For example:
The relationship between rash driving and the number of road accidents caused by a driver is best
studied through regression.
A real estate appraiser may wish to relate the sale price of a home to selected
physical characteristics of the building and the taxes (local, school, county) paid on the
building.
Regression Applications
Predicting stock prices
Forecasting monthly sales
Predicting airfare
….
Regression Fundamentals
A regression model establishes a relation between the response/dependent variable y and the
independent/predictor variable x.
We can write the relationship using a hypothesis function h as
y = h(x)
where h is called the hypothesis function.
The hypothesis function describes the relationship between the x and y variables.
If the relationship is linear, the regression is called Linear Regression.
If the relationship is non-linear, the regression is called Non-Linear Regression.
Sometimes h(x) is also written as f(x).
(Block diagram: x → h → y)
Regression Fundamentals
h(x) can be expressed in different ways, e.g.:
h(x) = w0 + w1x -------------- (1)
h(x) = w0 + w1x1 + w2x2 + w3x3 + …. -------- (2)
h(x) = w0 + w1x² ----------------- (3)
h(x) = w0 + w1x1 + w2x2² ----------------- (4)
Here w0, w1, w2 are called the coefficients of regression or model parameters;
x, x1, x2 are independent/predictor variables.
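To make these hypothesis forms concrete, here is a minimal Python/NumPy sketch; the function names and weight values are illustrative assumptions, not from the slides:

```python
import numpy as np

# Equation (1): simple linear hypothesis with one predictor x
def h_linear(x, w0, w1):
    return w0 + w1 * x

# Equation (2): multiple linear hypothesis with predictors x1..xn
def h_multiple(x_vec, w0, w_vec):
    return w0 + np.dot(w_vec, x_vec)

# Equation (3): a non-linear hypothesis in a single predictor
def h_quadratic(x, w0, w1):
    return w0 + w1 * x**2

print(h_linear(3.0, 1.0, 2.0))                                       # 7.0
print(h_multiple(np.array([3.0, 4.0]), 1.0, np.array([2.0, 0.5])))   # 9.0
print(h_quadratic(3.0, 1.0, 2.0))                                    # 19.0
```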
Types of Regression:
Based on the hypothesis function used, regression can be categorized as:
Linear Regression:
The relation between the independent and dependent variables is linear, usually expressed
by a straight-line equation.
Example – equations 1 and 2
Simple Linear Regression:
There exists only one dependent variable, related to only one independent variable.
For example:
y = h(x) = w0 + w1x
(Figure: example of a linear regression line fitted to data. Source:
https://in.mathworks.com/help/matlab/data_analysis/linear-regression.html)
Multiple Linear Regression:
Most of the time the output Y cannot be predicted from a single independent variable but needs
multiple independent variables.
Regression that has one output variable and more than one input/independent
variable, with a linear relationship between inputs and output, is called multiple linear
regression.
Example:
y = h(x) = w0 + w1x1 + w2x2 + w3x3 + ….
Prediction of house price based on size of the house, age of the house, distance from the center
of the city, etc.
A graph with more than two independent variables is difficult to plot.
Non-linear Regression:
Non-linear regression has a non-linear relationship between the independent variable(s) and the
dependent variable.
The number of independent variables can be one or more than one.
A straight line cannot fit the data properly, so a linear equation is not suitable; instead,
non-linear regression is often expressed by a polynomial, and is hence also called polynomial
regression.
Examples:
y = h(x)
h(x) = w0 + w1x²    h(x) = w0 + w1 ln(x)    h(x) = w0 + w1e^x
h(x) = w0 + w1x1 + w2x2²    h(x) = w0 + w1 sin(x)
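As a hedged illustration of polynomial regression, a quadratic hypothesis can be fitted with NumPy's least-squares polynomial fit; the data below is synthetic, invented for this sketch:

```python
import numpy as np

# Synthetic data roughly following y = 2 + 0.5*x^2 (illustrative only)
rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 50)
y = 2 + 0.5 * x**2 + rng.normal(scale=2.0, size=x.size)

# Fit h(x) = w0 + w1*x + w2*x^2; polyfit returns the highest power first
coeffs = np.polyfit(x, y, deg=2)
print(coeffs)                    # ≈ [0.5, ~0, 2]

y_hat = np.polyval(coeffs, x)    # predictions of the fitted polynomial
```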
Which Regression to Select?
It depends on the number of independent variables and their relationship with the dependent
variable in the data.
Single input variable and single output variable with a linear relationship
– Simple Linear Regression
Multiple input variables and one output variable with a linear relationship
– Multiple Linear Regression
Single/multiple input variables and one output variable with a nonlinear relationship
– Nonlinear or Polynomial Regression
Assumptions for Linear Regression:
Before analysing data using linear regression, it is necessary to make sure that the data
you want to analyse can actually be analysed using linear regression.
This can be checked with the following assumptions:
1. Variables used should be measured at the continuous level (variables need to be continuous
variables).
2. There needs to be a linear relationship between the independent and dependent variables.
(Check whether there exists a linear relationship using a suitable statistical test, e.g. the correlation
coefficient.)
3. Little or no multi-collinearity.
Multi-collinearity – one independent variable is correlated with another independent variable.
4. There should be no significant outliers.
An outlier is an observed data point whose dependent-variable value is very different from
the value predicted by the regression equation.
As such, an outlier will be a point on a scatterplot that is (vertically) far away from the regression
line, indicating that it has a large residual, as illustrated at:
https://statistics.laerd.com/spss-tutorials/linear-regression-using-spss-statistics.php
Correlation coefficient:
For a data set comprising n points of two variables x and y, the covariance is computed as:

cov(x, y) = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})
However, covariance can be a very large number. It is best to express it as a normalized
number between -1 and 1 to understand the relation between the quantities. This is
achieved by normalizing covariance with the standard deviations of both variables (s_x and s_y):

r = \frac{cov(x, y)}{s_x s_y}

This is called the correlation coefficient between x and y.
It is also called the Pearson correlation.
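A minimal NumPy sketch of this computation, using the house size/price pairs that appear in a later slide as sample data:

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation: covariance normalized by both standard deviations."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    cov = np.mean((x - x.mean()) * (y - y.mean()))
    return cov / (x.std() * y.std())

x = [200, 300, 400, 500, 600]                  # house size (ft^2)
y = [250000, 350000, 450000, 550000, 650000]   # house price
print(pearson_r(x, y))            # 1.0 – a perfectly linear relationship
print(np.corrcoef(x, y)[0, 1])    # same value from NumPy's built-in
```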
Correlation measures the strength of linear dependence between X and Y and lies
between -1 and 1. (Figure: scatter plots illustrating how different correlation values
correspond to different strengths of linear dependence.)
Simple Linear Regression:
Simple linear regression has only one independent variable and one dependent variable.

Training Dataset:
    House Size (ft²) – x    House Price – y
1   200                     250000
2   300                     350000
3   400                     450000
4   500                     550000
5   600                     650000

Terminology:
n = total number of training examples (ex: 5)
x: input/independent/predictor variable
y: actual output variable
(x, y): one training example
(x(i), y(i)): i-th training example
Ex: x(1) = 200, y(1) = 250000
Simple linear regression
The response or target variable is predicted as
ŷ = h(x) = w0 + w1x
Since there is a possibility of a difference between the actual output value and the
predicted value, we can write the actual output as
y = ŷ + e = w0 + w1x + e
e = y − ŷ = y − (w0 + w1x);  if e is negative, take e = ŷ − y
Cost Function:
Objective: the error e ≈ 0, i.e. the difference between the
predicted output value and the actual output value
should be nearly zero.
A measure of how well the line fits the data, or how
well the hypothesis function predicts the output,
is specified by the cost function.
Different values of the weights (w0, w1) give us
different lines, and our task is to find the weights for
which we get the best fit.
Cost Function
For linear regression, the most commonly used cost function is the Mean
Squared Error (MSE) cost function.
It is the average over the data points (x_i, y_i) of the squared error
between the predicted value h(x_i) and y_i:

J(w_0, w_1) = \frac{1}{2n} \sum_{i=1}^{n} \big( h(x_i) - y_i \big)^2,   where h(x_i) = w_0 + w_1 x_i

Without the averaging factor, the sum of squared errors is also called the Residual Sum of Squares (RSS).
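A minimal sketch of this cost in NumPy; the 1/2n factor follows the formula above, and the sample data is the house-price table from the earlier slide:

```python
import numpy as np

def cost_J(w0, w1, x, y):
    """J(w0, w1) = (1/2n) * sum over i of (h(x_i) - y_i)^2, with h(x) = w0 + w1*x."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    h = w0 + w1 * x                             # predictions for all training examples
    return np.sum((h - y) ** 2) / (2 * len(x))

x = [200, 300, 400, 500, 600]
y = [250000, 350000, 450000, 550000, 650000]
print(cost_J(50000.0, 1000.0, x, y))   # 0.0 – this line fits the data exactly
print(cost_J(0.0, 1000.0, x, y))       # a worse line gives a larger cost
```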
Cost Function
Goal: to find values of w0 and w1 that minimize J(w0, w1).
Different values of w0, w1 give different lines fitting the data, and these different lines have
different costs.
Cost vs. w0, w1
(Figures: plots of the cost J(w0, w1) over the parameter space.)
How to estimate parameters w0, w1
The Least Squares Approach, using Normal Equations
The predicted output in simple linear regression is
ŷ = h(x) = w0 + w1x
The observed or actual output is y.
The error is e = ŷ − y.
Least squares starts with the sum of squared errors:

J(w_0, w_1) = \frac{1}{2n} \sum_{i=1}^{n} \big( h(x_i) - y_i \big)^2
1. Take the partial derivatives of J with respect to w0 and w1 and equate them to zero:

\frac{\partial J(w_0, w_1)}{\partial w_0} = 0        (1)

\frac{\partial J(w_0, w_1)}{\partial w_1} = 0        (2)

where J(w_0, w_1) = \frac{1}{2n} \sum_{i=1}^{n} \big( h(x_i) - y_i \big)^2
From equation 1:

\frac{\partial J}{\partial w_0} = \frac{1}{n} \sum_{i=1}^{n} \big( h(x_i) - y_i \big) \, \frac{\partial h(x_i)}{\partial w_0} = 0,   with h(x_i) = w_0 + w_1 x_i

Since \partial h(x_i) / \partial w_0 = 1, this gives

\sum_{i=1}^{n} \big( w_0 + w_1 x_i - y_i \big) = 0        (3)
From equation 2:

\frac{\partial J}{\partial w_1} = \frac{1}{n} \sum_{i=1}^{n} \big( h(x_i) - y_i \big) \, \frac{\partial h(x_i)}{\partial w_1} = 0,   with h(x_i) = w_0 + w_1 x_i

Since \partial h(x_i) / \partial w_1 = x_i, this gives

\sum_{i=1}^{n} \big( w_0 + w_1 x_i - y_i \big) x_i = 0        (4)
These equations are called the normal equations:

\sum_{i=1}^{n} \big( w_0 + w_1 x_i - y_i \big) = 0        (5)

\sum_{i=1}^{n} \big( w_0 + w_1 x_i - y_i \big) x_i = 0        (6)
From the first normal equation (equation 5):

\sum_{i=1}^{n} \big( w_0 + w_1 x_i - y_i \big) = 0
n w_0 + w_1 \sum_i x_i - \sum_i y_i = 0
w_0 = \frac{1}{n} \sum_i y_i - w_1 \frac{1}{n} \sum_i x_i
w_0 = \bar{y} - w_1 \bar{x}        (7)
From the second normal equation (equation 6), substituting w_0 = \bar{y} - w_1 \bar{x} (equation 7):

\sum_i \big( \bar{y} - w_1 \bar{x} + w_1 x_i - y_i \big) x_i = 0
\sum_i \big( \bar{y} - y_i + w_1 (x_i - \bar{x}) \big) x_i = 0
\sum_i \big( \bar{y} - y_i \big) x_i + w_1 \sum_i \big( x_i - \bar{x} \big) x_i = 0
w_1 \sum_i \big( x_i - \bar{x} \big) x_i = \sum_i \big( y_i - \bar{y} \big) x_i
w_1 = \frac{\sum_i (y_i - \bar{y}) \, x_i}{\sum_i (x_i - \bar{x}) \, x_i}
    = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sum_i (x_i - \bar{x})^2}

The final equation is obtained from
PROBABILITY AND STATISTICS FOR COMPUTER SCIENTISTS, Second Edition, by Michael Baron,
CRC Press, page 366, equations 11.4 and 11.5.
w_0 = \frac{\sum_i y_i}{n} - w_1 \frac{\sum_i x_i}{n} = \bar{y} - w_1 \bar{x}

w_1 = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sum_i (x_i - \bar{x})^2}
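A minimal NumPy sketch of these closed-form estimates, applied to the house-price data from the earlier slide (a sketch of the formulas above, not the author's code):

```python
import numpy as np

def fit_simple_linear(x, y):
    """w1 = sum((x-xbar)(y-ybar)) / sum((x-xbar)^2), then w0 = ybar - w1*xbar."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    xbar, ybar = x.mean(), y.mean()
    w1 = np.sum((x - xbar) * (y - ybar)) / np.sum((x - xbar) ** 2)
    w0 = ybar - w1 * xbar
    return w0, w1

x = [200, 300, 400, 500, 600]
y = [250000, 350000, 450000, 550000, 650000]
w0, w1 = fit_simple_linear(x, y)
print(w0, w1)          # 50000.0 1000.0 for this perfectly linear data
print(w0 + w1 * 450)   # predicted price of a 450 ft^2 house: 500000.0
```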
How to estimate parameters w0, w1
Gradient Descent Algorithm
Optimization:
Optimization refers to the task of minimizing/maximizing an objective
function f(x) parameterized by x.
In machine/deep learning terminology, it is the task of minimizing the
cost/loss function J(w) parameterized by the model's parameters w ∈ R^d.
Optimization algorithms (in the case of minimization) have one of the
following goals:
• Find the global minimum of the objective function. This is feasible if the
objective function is convex, i.e. any local minimum is a global
minimum.
• Find the lowest possible value of the objective function within its
neighbourhood. That is usually the case if the objective function is not
convex, as it is in most deep learning problems.
There are three kinds of optimization algorithms:
• Optimization algorithms that are not iterative and simply solve for one point.
• Optimization algorithms that are iterative in nature and converge to an
acceptable solution regardless of the parameter initialization, such as
gradient descent applied to regression.
• Optimization algorithms that are iterative in nature and applied to a set of
problems that have non-convex cost functions, such as neural networks.
Here, parameter initialization plays a critical role in speeding up
convergence and achieving lower error rates.
Gradient Descent is the most common optimization algorithm in machine
learning and deep learning. It is used to find the values of the function
parameters (coefficients) that minimize the cost function as far as possible.
It is a first-order optimization algorithm. This means it only takes into account
the first derivative when performing the updates on the parameters.
On each iteration, we update the parameters in the opposite direction of
the gradient of the objective function J(w) w.r.t. the parameters, where the
gradient gives the direction of the steepest ascent.
The size of the step we take on each iteration to reach the local minimum is
determined by the learning rate α. Therefore, we follow the direction of the
slope downhill until we reach a local minimum.
Simplified example of Gradient Descent
Suppose you are at the top of a mountain and have to reach a lake at the lowest point of
the mountain (a.k.a. the valley). The twist is that you are blindfolded, with zero visibility of
where you are headed. So, what approach will you take to reach the lake?
The best way is to check the ground near you and observe where the land tends to descend.
This gives an idea of the direction in which to take your first step. If you keep following the
descending path, it is very likely you will reach the lake.
https://www.analyticsvidhya.com/blog/2017/03/introduction-to-gradient-descent-algorithm-along-its-variants/
Gradient Descent for Simple Linear Regression
The predicted output in simple linear regression is
ŷ = h(x) = w0 + w1x
The observed or actual output is y, and the error is e = ŷ − y.
The cost function used is:

J(w_0, w_1) = \frac{1}{2n} \sum_{i=1}^{n} \big( h(x_i) - y_i \big)^2
(Figure: graph of the cost J versus the weight w.)
General equation for Gradient Descent:

W_{new} = W - \alpha \nabla J(W)

Here:
J(W) is the cost function J(w_0, w_1),
\nabla J(W) can be written as \frac{\partial J}{\partial W},
\alpha is called the learning rate.
Hence the above equation can be written as:

W_{new} = W - \alpha \frac{\partial J}{\partial W}
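As a toy illustration of this update rule, the sketch below minimizes the convex function f(w) = (w − 3)², whose gradient is 2(w − 3); the function, learning rate and iteration count are made up for the example:

```python
alpha = 0.1   # learning rate
w = 0.0       # initial guess

for _ in range(100):
    grad = 2 * (w - 3)    # gradient of f(w) = (w - 3)^2 at the current w
    w = w - alpha * grad  # update step: move against the gradient
print(w)                  # ≈ 3.0, the minimizer of f
```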
For the cost function

J(w_0, w_1) = \frac{1}{2n} \sum_{i=1}^{n} \big( w_0 + w_1 x_i - y_i \big)^2

the gradient descent update can be written as

w_k^{new} = w_k - \alpha \frac{\partial J(w_0, w_1)}{\partial w_k},   where k = 0, 1.

We can write the equations for w0 and w1 as:

w_0^{new} = w_0 - \alpha \frac{\partial J(w_0, w_1)}{\partial w_0}

w_1^{new} = w_1 - \alpha \frac{\partial J(w_0, w_1)}{\partial w_1}
The required partial derivatives, with h(x_i) = w_0 + w_1 x_i, are:

\frac{\partial J(w_0, w_1)}{\partial w_0} = \frac{1}{n} \sum_{i=1}^{n} \big( h(x_i) - y_i \big) = \frac{1}{n} \sum_{i=1}^{n} \big( w_0 + w_1 x_i - y_i \big)

\frac{\partial J(w_0, w_1)}{\partial w_1} = \frac{1}{n} \sum_{i=1}^{n} \big( h(x_i) - y_i \big) x_i = \frac{1}{n} \sum_{i=1}^{n} \big( w_0 + w_1 x_i - y_i \big) x_i
Basic Gradient Descent Algorithm:

Repeat until convergence {
    w_k := w_k - \alpha \frac{\partial J}{\partial w_k}
}

This can be written as:

Repeat until convergence {
    w_0 := w_0 - \alpha \frac{\partial J(w_0, w_1)}{\partial w_0}
    w_1 := w_1 - \alpha \frac{\partial J(w_0, w_1)}{\partial w_1}
}
Substituting the derivatives:

Repeat until convergence {
    w_0 := w_0 - \alpha \frac{1}{n} \sum_{i=1}^{n} \big( w_0 + w_1 x_i - y_i \big)
    w_1 := w_1 - \alpha \frac{1}{n} \sum_{i=1}^{n} \big( w_0 + w_1 x_i - y_i \big) x_i
}
(Both updates use the current values of w_0 and w_1, i.e. the parameters are updated simultaneously.)
Learning Rate (α)
(Figures: two plots of J(W) versus W, illustrating how the choice of the learning rate α affects
the size of the update steps and the convergence towards the minimum.)
Steps in the Gradient Descent Algorithm:
1. Initialize W0 and W1 with random initial values.
2. Initialize the learning rate α.
3. Set the number of epochs.
4. Calculate the predicted output ŷ = h(x) for all the samples in the training dataset.
5. Calculate the cost J.
6. Estimate the new parameters W0new and W1new.
7. Set new values for the parameters W0 and W1 from W0new and W1new.
8. Repeat from step 4 until the number of epochs is over (see the sketch after this list).
Example split: a complete dataset of samples {1, 2, 3, 4, 5, 6, 7} might be divided into a
training dataset {2, 4, 5, 7} and a testing dataset {1, 3, 6}.
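A minimal NumPy sketch of these steps for simple linear regression; the toy data (following y = 2x + 1), learning rate and epoch count are illustrative assumptions:

```python
import numpy as np

# Illustrative training data following y = 2x + 1
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([3.0, 5.0, 7.0, 9.0, 11.0])

rng = np.random.default_rng(42)
w0, w1 = rng.normal(), rng.normal()   # step 1: random initial parameters
alpha = 0.05                          # step 2: learning rate
epochs = 2000                         # step 3: number of epochs
n = len(x)

for _ in range(epochs):
    h = w0 + w1 * x                              # step 4: predictions for all samples
    J = np.sum((h - y) ** 2) / (2 * n)           # step 5: cost (useful for monitoring)
    w0_new = w0 - alpha * np.sum(h - y) / n      # step 6: estimate new parameters
    w1_new = w1 - alpha * np.sum((h - y) * x) / n
    w0, w1 = w0_new, w1_new                      # step 7: simultaneous update

print(w0, w1)   # ≈ 1.0 and 2.0 after convergence
```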
Multiple Linear Regression
Multiple regression is used to predict the value of one output variable based on two
or more input variables.
Multiple linear regression (MLR), also known simply as multiple regression, is a
statistical technique that uses several explanatory variables to predict the
outcome of a response variable.
The goal of multiple linear regression (MLR) is to model the linear relationship
between the explanatory (independent) variables and the response (dependent)
variable.
Examples:
1. House price prediction based on the size of the house, the number of rooms in the
house, the number of floors, the age of the building, and the open space around the
building.
Here:
y = House Price
X1 = Size of the House
X2 = Number of Rooms
X3 = Number of floors
X4 = Age of the building
X5 = Open space
2. Prediction of a person's income based on education, work-class, country,
experience, etc.

The training data is laid out with one row per sample:

Sr. No.   X1      X2      X3      X4      X5      y
0         X1(0)   X2(0)   X3(0)   X4(0)   X5(0)   y(0)
1         X1(1)   X2(1)   X3(1)   X4(1)   X5(1)   y(1)
2         X1(2)   X2(2)   X3(2)   X4(2)   X5(2)   y(2)
3         X1(3)   X2(3)   X3(3)   X4(3)   X5(3)   y(3)
4         X1(4)   X2(4)   X3(4)   X4(4)   X5(4)   y(4)
The response or target variable is predicted as
ŷ = h(x) = w0 + w1x1 + w2x2 + w3x3 + … + wnxn
where
x1, x2, x3, ….., xn are input/independent/predictor variables,
ŷ is the predicted output variable,
w0, w1, w2, ….., wn are parameters or coefficients of regression.
Since there is a possibility of a difference between the actual output value and the
predicted value, we can write the actual output as
y = ŷ + e = w0 + w1x1 + w2x2 + w3x3 + … + wnxn + e
e = y − ŷ = y − (w0 + w1x1 + w2x2 + w3x3 + … + wnxn);  if e is negative, take e = ŷ − y
Parameter Estimation in Multiple Linear Regression:
The Gradient Descent Algorithm is used to estimate the parameters in
multiple linear regression.
Basic Gradient Descent Algorithm:

Repeat until convergence {
    w_k := w_k - \alpha \frac{\partial J(W)}{\partial w_k}
}

The cost function is:

J(W) = \frac{1}{2n} \sum_{i=1}^{n} \big( h(X_i) - y_i \big)^2

where X_i is the i-th input in the dataset and y_i is the i-th output in the dataset.
Basic Gradient Descent Algorithm:

Repeat until convergence {
    w_0 := w_0 - \alpha \frac{\partial J(w_0, w_1, \ldots, w_n)}{\partial w_0}
    w_k := w_k - \alpha \frac{\partial J(w_0, w_1, \ldots, w_n)}{\partial w_k},   where k = 1, 2, …, n
}
The required partial derivatives, with h(X_i) = w_0 + w_1 x_1^{(i)} + w_2 x_2^{(i)} + \ldots + w_n x_n^{(i)}, are:

\frac{\partial J(w_0, w_1, \ldots, w_n)}{\partial w_0} = \frac{1}{n} \sum_{i=1}^{n} \big( h(X_i) - y_i \big)

\frac{\partial J(w_0, w_1, \ldots, w_n)}{\partial w_1} = \frac{1}{n} \sum_{i=1}^{n} \big( h(X_i) - y_i \big) x_1^{(i)}

\frac{\partial J(w_0, w_1, \ldots, w_n)}{\partial w_k} = \frac{1}{n} \sum_{i=1}^{n} \big( h(X_i) - y_i \big) x_k^{(i)},   where k = 1, 2, 3, …, n
Repeat until convergence {
    w_0 := w_0 - \alpha \frac{1}{n} \sum_{i=1}^{n} \big( w_0 + w_1 x_1^{(i)} + w_2 x_2^{(i)} + \ldots + w_n x_n^{(i)} - y_i \big)
    w_1 := w_1 - \alpha \frac{1}{n} \sum_{i=1}^{n} \big( w_0 + w_1 x_1^{(i)} + \ldots + w_n x_n^{(i)} - y_i \big) x_1^{(i)}
    ⋮
    w_k := w_k - \alpha \frac{1}{n} \sum_{i=1}^{n} \big( w_0 + w_1 x_1^{(i)} + \ldots + w_n x_n^{(i)} - y_i \big) x_k^{(i)}
}
where k = 1, 2, 3, …, n and x_k^{(i)} denotes the k-th input variable of the i-th sample.
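A vectorized NumPy sketch of these updates; the data (generated from y = 1 + 2x1 + 3x2) and the hyperparameters are illustrative assumptions, not from the slides:

```python
import numpy as np

# Illustrative data: 5 samples, 2 features, generated from y = 1 + 2*x1 + 3*x2
X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 3.0], [4.0, 2.0], [5.0, 4.0]])
y = 1 + 2 * X[:, 0] + 3 * X[:, 1]

n, d = X.shape
Xb = np.hstack([np.ones((n, 1)), X])   # prepend a column of 1s so w[0] plays the role of w0
w = np.zeros(d + 1)                    # [w0, w1, ..., wn]
alpha, epochs = 0.05, 20000

for _ in range(epochs):
    residuals = Xb @ w - y                  # h(X_i) - y_i for every sample at once
    w = w - alpha * (Xb.T @ residuals) / n  # simultaneous update of all w_k

print(np.round(w, 3))   # ≈ [1. 2. 3.]
```

On real data, scaling the features to similar ranges usually makes this loop converge much faster.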