DA 5230 – Statistical & Machine Learning
Lecture 4 – Linear Regression
Maninda Edirisooriya
manindaw@uom.lk
What is Regression? (Reminder)
• A Supervised ML problem type, where we have labelled data
• Some of the labelled data is used for training the model
• The rest of the data is used to test the model (predicting the labels)
• The goal is to predict a continuous variable (the dependent variable, Y)
• From one or more independent variables (X values)
What is Regression? (Reminder)
• Regression can be achieved with various types of ML algorithms
• All of them try to explain the given dataset (the training set) using
a function defined by the model
• Parameters of the function are set so that the function can give
predictions for unseen data with the least amount of error
• In other words, during training, a Regression model tries to
set its parameters to approximate the true relationship (i.e. to
minimize the errors)
• Mathematically, f̂(X) tries to approximate f(X) by minimizing the
error ε, which is the difference between Ŷ and Y
What is Regression? (Reminder)
[Figure: scatterplot of data points with a fitted regression line. For a data point at X1, the vertical gap between the actual value Y1 and the predicted value Ŷ1 on the line is the error ε.]
Linear Regression
• A specific type of Regression
• Using a straight line as the Regression model
• The model is of the form y = mx + c
• Where m and c are model parameters
• The training process should find values for m and c so that the errors are
minimized
• But there is no single error to be minimized; there is an error for every data
point
• We minimize the Mean Squared Error of the residuals,
MSE = (1/n) Σᵢ₌₁ⁿ (Yᵢ − Ŷᵢ)², where n is the number of data points
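As a minimal sketch (not part of the original slides, assuming NumPy and a small synthetic dataset), m and c can be fitted by least squares and the MSE computed as follows:

```python
import numpy as np

# Hypothetical data: a noisy straight line
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=50)
Y = 2.5 * X + 1.0 + rng.normal(0, 1.0, size=50)

# Fit y = mx + c by least squares (degree-1 polynomial fit)
m, c = np.polyfit(X, Y, deg=1)

# MSE = (1/n) * sum((Yi - Yhat_i)^2)
Y_hat = m * X + c
mse = np.mean((Y - Y_hat) ** 2)
print(f"m = {m:.3f}, c = {c:.3f}, MSE = {mse:.3f}")
```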
Linear Regression - Example
• Problem: An investment company is looking
for companies to invest in. They are
interested in modeling how the profit of
a company varies with its R&D cost
• Analysis: We do an Exploratory Data
Analysis and draw a scatterplot to see
the relationship between the R&D cost
(X) and the profit (Y)
• As the relationship looks linear, we may
use linear regression
Linear Regression - Example
• With linear regression we fit the data to a straight line by finding m
and c from the data points in the training dataset
• We can then evaluate/test the model using the test dataset
• Either using the Mean Squared Error (MSE)
• Or using the R-squared (R²) measure, as sketched below
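A sketch of this workflow with scikit-learn, using hypothetical R&D cost and profit figures (the variable names and numbers here are illustrative assumptions):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score

# Hypothetical data: R&D cost (X) vs profit (Y)
rng = np.random.default_rng(1)
rnd_cost = rng.uniform(10, 200, size=100).reshape(-1, 1)
profit = 0.8 * rnd_cost.ravel() + 20 + rng.normal(0, 10, size=100)

# Hold out a test set, fit the line on the training set
X_train, X_test, y_train, y_test = train_test_split(
    rnd_cost, profit, test_size=0.2, random_state=42)
model = LinearRegression().fit(X_train, y_train)

# Evaluate on the test set with MSE and R-squared
y_pred = model.predict(X_test)
print("MSE:", mean_squared_error(y_test, y_pred))
print("R^2:", r2_score(y_test, y_pred))
```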
Multiple Linear Regression
• When there is more than one independent variable the model cannot be
shown in a 2-dimensional scatterplot
• For example, when there are 2 independent variables X1 and X2 the graph has to be 3
dimensional
• With more variables than that, the data points cannot be visualized at all
• Still, for Linear Regression the model generalizes to a multidimensional
hyperplane
• For example, when there are 2 independent variables X1 and X2 the model can be
represented with a flat 2D plane instead of a straight line
• This generic form of Linear Regression is known as Multiple Linear
Regression, while the case with only one independent variable is called
Simple Linear Regression
Multiple Linear Regression
That model hyperplane can be represented as,
Y = β0 + β1*X1 + β2*X2 + ... + βn*Xn
Where,
β0 : the intercept (like c in Y = mX + c)
βi : the coefficient of variable Xi (like m in Y = mX + c), for each
variable index i
Just as we find values for m and c in Y = mX + c, we have to find βi for all i
(where 0 ≤ i ≤ n) in Multiple Linear Regression
Linear Regression - Assumptions
There are several assumptions that should hold true for applying Linear
Regression to a dataset
• Linearity
• Homoscedasticity of Residuals or Equal Variances
• No Autocorrelation in residuals
• Residuals are distributed Normally
• No Multicollinearity of data (covered below)
Linearity
• As we are going to model the data with a linear model, each
independent variable should be linearly correlated with the
dependent variable
• A scatterplot can be used to view the relationship between each X
variable and the Y variable
• If there is no linearity, you can try applying a non-linear function to
such a variable to make the relationship linear (see the sketch after this list)
• Use Polynomials of X (described later in Polynomial Regression)
• Use exponentials/logarithms of X
• …
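A small illustration of such a transformation, assuming a hypothetical exponential X-Y relationship (the data here is synthetic):

```python
import numpy as np
from scipy.stats import pearsonr

# Hypothetical non-linear relationship: Y grows exponentially with X
rng = np.random.default_rng(2)
X = rng.uniform(1, 5, size=200)
Y = np.exp(1.2 * X) * rng.lognormal(0, 0.1, size=200)

# Y itself is poorly described by a straight line in X,
# but log(Y) is approximately linear in X
r_xy, _ = pearsonr(X, Y)
r_xlog, _ = pearsonr(X, np.log(Y))
print("corr(X, Y)    :", r_xy)
print("corr(X, log Y):", r_xlog)
```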
Homoscedasticity or Equal Variances of Residuals
• The variance of the residuals should be the same across all values of
any given independent variable
• The opposite of this phenomenon is known as Heteroscedasticity, where
the variance changes with the value of that variable
• Residuals can be plotted against the X values to check for a consistent
spread, or a statistical test like the Breusch-Pagan test or
White's test can be used (see the sketch below)
• When Heteroscedasticity is present, Weighted Least Squares
Regression can be used instead of ordinary Linear Regression,
down-weighting the data points with higher variance
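A sketch of the Breusch-Pagan test with statsmodels, on hypothetical data whose noise grows with X (the data-generating numbers are assumptions for illustration):

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan

# Hypothetical heteroscedastic data: noise grows with X
rng = np.random.default_rng(3)
X = rng.uniform(1, 10, size=200)
Y = 3.0 * X + rng.normal(0, 0.5 * X)  # residual variance increases with X

# Fit OLS, then test the residuals for heteroscedasticity
exog = sm.add_constant(X)
results = sm.OLS(Y, exog).fit()
lm_stat, lm_pvalue, f_stat, f_pvalue = het_breuschpagan(results.resid, exog)
print("Breusch-Pagan p-value:", lm_pvalue)  # small p => heteroscedasticity
```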
No Autocorrelation in Residuals
• Errors/residuals of the data points should be independent of each other
• Otherwise, autocorrelation would underestimate the standard error, which
would cause other issues like Heteroscedasticity and less precise variance estimates
• This issue is often found in time series data, as there can be correlation
between the errors of adjacent data points
• Autocorrelation between residuals one timestep apart can be measured with the
Durbin-Watson statistic d, where t is the timestep
d = Σₜ₌₂ᵀ (eₜ − eₜ₋₁)² / Σₜ₌₁ᵀ eₜ²
d < 1.8 ⇒ Positive Autocorrelation ; d > 2.2 ⇒ Negative Autocorrelation
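A sketch with statsmodels' durbin_watson on hypothetical AR(1) errors (the series and coefficients are illustrative assumptions):

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson

# Hypothetical time series with autocorrelated errors
rng = np.random.default_rng(4)
t = np.arange(200, dtype=float)
e = np.zeros(200)
for i in range(1, 200):           # AR(1) errors: e_t depends on e_{t-1}
    e[i] = 0.7 * e[i - 1] + rng.normal(0, 1)
Y = 1.5 * t + e

results = sm.OLS(Y, sm.add_constant(t)).fit()
d = durbin_watson(results.resid)
print("Durbin-Watson d:", d)      # d < 1.8 suggests positive autocorrelation
```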
Residuals are Distributed Normally
• The residuals of the model should be normally
distributed
• Normality can be checked visually in a histogram, a KDE plot or a
QQ plot
• Statistical tests like the Shapiro-Wilk test or the Anderson-Darling test can
also be used, as sketched below
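A sketch of both tests with SciPy, on hypothetical residuals (synthetic data, for illustration only):

```python
import numpy as np
from scipy.stats import shapiro, anderson

# Hypothetical residuals from a fitted model
rng = np.random.default_rng(5)
residuals = rng.normal(0, 1, size=100)

# Shapiro-Wilk: a large p-value means normality is not rejected
stat, p_value = shapiro(residuals)
print("Shapiro-Wilk p-value:", p_value)

# Anderson-Darling: compare the statistic against the critical values
result = anderson(residuals, dist="norm")
print("A-D statistic:", result.statistic)
print("Critical values:", result.critical_values)
```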
No Multicollinearity of Data
• Each independent variable should have little or no correlation
with the other independent variables
• If two variables are correlated, one carries similar information to the
other, which adds a redundant variable to the model; this makes the
model more complex without adding new information
• Pairwise correlation can be viewed using a Pair plot or a Heatmap
• The Variance Inflation Factor (VIF) can be used to measure how strongly
each independent variable is explained by the others (R² is explained later)
VIF = 1 / (1 − R²)
• VIF = 1 ⇒ No collinearity ; VIF > 10 ⇒ Problematic, in general
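A sketch with statsmodels' variance_inflation_factor, on hypothetical features where X3 is nearly a copy of X1 (the feature construction is an assumption for illustration):

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

# Hypothetical features: X3 is almost a copy of X1 (strong collinearity)
rng = np.random.default_rng(6)
X1 = rng.normal(size=200)
X2 = rng.normal(size=200)
X3 = X1 + rng.normal(0, 0.05, size=200)
exog = sm.add_constant(np.column_stack([X1, X2, X3]))

# VIF = 1 / (1 - R^2) of each variable regressed on the others
for i, name in enumerate(["const", "X1", "X2", "X3"]):
    print(name, variance_inflation_factor(exog, i))
# Expect VIF near 1 for X2 but very large for X1 and X3
```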
Training
• During training (fitting the model to data) we try to assign the best possible
values to βi for each i, where,
• Y = β0 + β1*X1 + β2*X2 + ... + βn*Xn + ε or,
• Y = Xᵀβ + ε in vector form, where X = [1, X1, X2, … Xn]ᵀ
• The values of βi are selected so that the Mean Squared Error
(MSE) is minimized over the training data, where,
• MSE = (1/n) Σᵢ₌₁ⁿ (Yᵢ − Ŷᵢ)², where n is the number of data points or,
• MSE = (1/n) Σᵢ₌₁ⁿ eᵢ² = (1/n) eᵀe in vector form, where eᵢ = Yᵢ − Ŷᵢ and
e is an n x 1 column vector
Training
• There are 2 main types of techniques used in linear regression
• Closed-form solutions – give the exact solution that minimizes the residuals
• E.g.: Ordinary Least Squares (OLS)
• These can lead to sub-optimal models when the linear regression assumptions
(linearity, homoscedasticity and normality of residuals) are violated
• Noise in the data can cause overfitting, where the model contains unnecessary
complexity by fitting the noise
• Easy to compute when the number of parameters is small (e.g.: Simple Linear
Regression) but expensive when the number of parameters is large
• Faster as there are no iterations
• Iterative methods – compute towards the solution step by step with numerical
methods
• E.g.: Gradient Descent method – will be discussed later (a sketch of both
approaches follows)
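A minimal sketch contrasting the two approaches in NumPy, assuming synthetic data (the closed form shown is the OLS normal equation; the learning rate and iteration count are illustrative choices):

```python
import numpy as np

# Hypothetical data with an intercept column and two independent variables
rng = np.random.default_rng(7)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
beta_true = np.array([1.0, 2.0, -3.0])
Y = X @ beta_true + rng.normal(0, 0.5, size=n)

# Closed form (OLS normal equation): beta = (X^T X)^{-1} X^T Y,
# solved here with a numerically stable least-squares routine
beta_ols, *_ = np.linalg.lstsq(X, Y, rcond=None)

# Iterative alternative: gradient descent on the MSE
beta_gd = np.zeros(3)
lr = 0.1
for _ in range(1000):
    grad = (2 / n) * X.T @ (X @ beta_gd - Y)  # gradient of MSE w.r.t. beta
    beta_gd -= lr * grad

print("OLS:", beta_ols.round(3))
print("GD :", beta_gd.round(3))
```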
Testing/Evaluating
• Once the model is created in the training phase, it has to be evaluated
for the accuracy of its predictions using the test dataset
• There are 2 main evaluation measures for the accuracy of a
linear regression model
• Mean Squared Error (MSE) method – find the MSE on the test dataset, which
measures the inaccuracy level of the model
• R-squared (R²) method – find the R-squared measure for the model on the test
dataset, which measures the accuracy of the model (the better the
model, the closer R² is to 1)
R-squared (R²) Measure
• Mean = Ȳ = (1/n) Σᵢ₌₁ⁿ Yᵢ
• Residual Sum of Squares = SSres = Σᵢ (Yᵢ − fᵢ)² = Σᵢ eᵢ²
• Total Sum of Squares = SStot = Σᵢ (Yᵢ − Ȳ)²
R² = 1 − SSres / SStot
• In a perfect model, Yᵢ = fᵢ ⇒ SSres = 0 ⇒ R² = 1
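These definitions translate directly into a few lines of NumPy (a sketch; the actual and predicted values below are made-up numbers):

```python
import numpy as np

def r_squared(y, y_pred):
    """R^2 = 1 - SSres/SStot, computed from the definitions above."""
    ss_res = np.sum((y - y_pred) ** 2)      # residual sum of squares
    ss_tot = np.sum((y - np.mean(y)) ** 2)  # total sum of squares
    return 1 - ss_res / ss_tot

# Hypothetical actual vs predicted values
y = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.8, 5.1, 7.2, 8.9])
print("R^2:", r_squared(y, y_pred))  # close to 1 for a good fit
```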
Polynomial Regression
• In Simple Linear Regression we learned above that we have to assume
the X variable is linearly correlated with the Y variable
• But in nature that is not the only possible X-Y relationship; there
can be various non-linear relationships as well
Polynomial Regression
• In such cases, although X itself does not have a linear
relationship with Y, X² may have one
• Or X³ may have a linear relationship with Y
• Or both X² and X³ may have linear relationships with Y
• In such cases we can engineer (create) new features by raising the
independent variable to some power (e.g.: X²)
• Generally we start with power 2 and increase the power iteratively
• Then we can train and test the result as a Multiple Linear Regression problem
• This is known as Polynomial Regression
Polynomial Regression - Example
• Suppose we have X as the independent variable and Y as the dependent
variable
• Simple Linear Regression would be, Y = β0 + β1*X
• This is a polynomial of X to the power 1
• Let’s move to the 2nd order polynomial (quadratic form)
Y = β0 + β1*X + β2*X²
• Depending on the performance results of the tests, we can increase
the power to 3 too (cubic form)
Y = β0 + β1*X + β2*X² + β3*X³
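A sketch of this with scikit-learn's PolynomialFeatures, on hypothetical quadratic data (the coefficients used to generate the data are assumptions):

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

# Hypothetical quadratic relationship between X and Y
rng = np.random.default_rng(8)
X = rng.uniform(-3, 3, size=100).reshape(-1, 1)
Y = 1.0 + 2.0 * X.ravel() + 0.5 * X.ravel() ** 2 + rng.normal(0, 0.3, size=100)

# Degree-2 polynomial regression = linear regression on [X, X^2]
model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
model.fit(X, Y)
print("R^2 on training data:", model.score(X, Y))
```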
Multivariate Polynomial Regression
• Let’s consider the case where there is more than one independent
variable
• When we use a polynomial of degree n (e.g.: n=2 for quadratic),
not only are the individual Xi variables raised to powers up to n, but there
are also new features that are products of the engineered
variables
• We can iteratively add/remove such newly engineered features
depending on their significance, using an algorithm, as the number of
new feature combinations grows rapidly with the degree of
the polynomial (we will discuss such algorithms later in this subject
module)
Multivariate Polynomial Regression - Example
• Suppose we have X1 and X2 as independent variables and Y as the
dependent variable
• Multiple Linear Regression would use, Y = β0 + β1*X1 + β2*X2
• This is a polynomial of the X variables to the power 1
• Let’s move to the 2nd order polynomial
Y = β0 + β1*X1 + β2*X2 + β3*X1² + β4*X2² + β5*X1*X2
• Features in the cubic form would be,
X1, X2, X1², X2², X1³, X2³, X1*X2, X1²*X2, X1*X2²
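scikit-learn can generate exactly these feature combinations; a sketch (the variable values are placeholders):

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# Two independent variables X1 and X2 (hypothetical values)
X = np.array([[2.0, 3.0]])

# Degree 2 generates X1, X2, X1^2, X1*X2 and X2^2
poly = PolynomialFeatures(degree=2, include_bias=False)
poly.fit(X)
print(poly.get_feature_names_out(["X1", "X2"]))
# ['X1' 'X2' 'X1^2' 'X1 X2' 'X2^2']
```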
Two Hour Homework
• Officially we have two more hours of work after the end of the lectures
• Therefore, for this week’s extra hours you have homework
• After today’s tutorial, figure out all the sections you found difficult in it
• Try to complete it yourself, referring to the Internet when needed
• Try applying linear regression to the different types of ML problems you have and
get familiar with them
• Try to verify that the linear regression assumptions are satisfied in each of
the problems, and feature engineer with EDA done iteratively
• You need to know linear regression well for the further ML and SL topics ahead
• Good Luck!
Questions?