Linear Regression Analysis | Linear Regression in Python | Machine Learning Algorithms | Simplilearn
Profit Estimation of a Company
A venture capital firm is trying to decide which companies it should invest in.
Idea: based on each company's expenses, predict the profit the company makes, and use that prediction to decide which companies to invest in.
A company's profit is calculated from its expenditure (R&D, Administration, Marketing) and its location (State).
For simplicity, let's consider a single variable (R&D) and find out which companies to invest in.
[Figure: profit plotted against R&D expenditure, with a prediction line to estimate profit]
Companies that spend more on R&D make a good profit, so let's invest in them.
What’s in it for you?
Introduction to Machine Learning
Machine Learning Algorithms
Applications of Linear Regression
Understanding Linear Regression
Multiple Linear Regression
Use Case – Profit Estimation of Companies
Introduction to Machine Learning
Based on the amount of rainfall, how much would the crop yield be?
[Figure: a crop field; the crop yield is predicted based on rainfall]
Independent and Dependent Variables
Independent variable: a variable whose value is not changed by the other variables and which is used to manipulate the dependent variable. It is often denoted as X.
Dependent variable: a variable whose value changes when the values of the independent variables are manipulated. It is often denoted as Y.
In our example, crop yield depends on the amount of rainfall received:
Rainfall – Independent variable
Crop yield – Dependent variable
Numerical and Categorical Values
Data can be numerical or categorical.
Numerical (for example 12345, 46920, 167891): Salary, Age, Height
Categorical (for example A, B, C, D, E): Gender, Dog's Breed, Color
Machine Learning Algorithms
Machine learning algorithms are grouped into Supervised, Unsupervised, and Reinforcement learning.
Supervised learning is divided into Classification and Regression.
Regression includes Simple Linear Regression, Multiple Linear Regression, and Polynomial Linear Regression.
Applications of Linear Regression
Economic growth: used to estimate the economic growth of a country or a state in the coming quarter; it can also be used to predict the GDP of a country.
Product price: can be used to predict what the price of a product will be in the future.
Housing sales: to estimate the number of houses a builder will sell, and at what price, in the coming months.
Score prediction: to predict the number of runs a player will score in the coming matches based on previous performance.
Understanding Linear Regression
Linear Regression is a statistical model that describes the relationship between independent and dependent variables and is used to predict the dependent variable.
It examines 2 factors:
1. Which variables in particular are significant predictors of the outcome variable?
2. How well does the regression line fit the data, so that predictions are as accurate as possible?
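Regression Equation
The simplest form of a linear regression equation, with one dependent and one independent variable, is:
$$y = m x + c$$
y ---> Dependent variable
x ---> Independent variable
m ---> Slope of the line
c ---> Intercept of the line (the value of y when x = 0)
For two points (x1, y1) and (x2, y2) on the line, the slope is:
$$m = \frac{y_2 - y_1}{x_2 - x_1}$$
[Figure: a line on X and Y axes showing the intercept c, the slope m, and the points x1, x2, y1, y2]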
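As a minimal sketch of the definitions above, the following Python snippet computes the slope from two points on a line, derives the intercept, and uses y = m x + c to predict y. The function names and the two sample points are illustrative assumptions and do not come from the slides.

```python
def slope(x1, y1, x2, y2):
    """Slope m of the line through (x1, y1) and (x2, y2)."""
    return (y2 - y1) / (x2 - x1)


def intercept(m, x1, y1):
    """Intercept c of the line with slope m that passes through (x1, y1)."""
    return y1 - m * x1


def predict(m, c, x):
    """Value of y on the line y = m * x + c."""
    return m * x + c


# Hypothetical points, chosen only for illustration.
m = slope(1, 3, 3, 7)      # (7 - 3) / (3 - 1)
c = intercept(m, 1, 3)     # 3 - m * 1
y_at_5 = predict(m, c, 5)  # point on the same line at x = 5
print(m, c, y_at_5)
```

Any point on the line could be used to recover c; the first point is used here only for convenience.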
Prediction using the Regression line
[Figure: crop yield (Y) plotted against rainfall (X), with the regression line drawn through the data points]
The red point on the Y axis is the crop yield you can expect for a given amount of rainfall (X), represented by the green dot.
Intuition behind the Regression line
Let's consider a sample dataset with 5 rows and find out how to draw the regression line.
        X (independent variable)    Y (dependent variable)
        1                           2
        2                           4
        3                           5
        4                           4
        5                           5
Mean    3                           4
First plot the data points, then calculate the mean of X and the mean of Y and plot that point, (3, 4). The least-squares regression line always passes through the mean of X and Y.
Finding the equation of the regression line
        X     Y     X²    Y²    XY
        1     2     1     4     2
        2     4     4     16    8
        3     5     9     25    15
        4     4     16    16    16
        5     5     25    25    25
Sum     15    20    55    86    66
The linear equation is represented as Y = mX + c, where, for n = 5 data points:
$$m = \frac{n\sum XY - \sum X \sum Y}{n\sum X^2 - \left(\sum X\right)^2} = \frac{(5 \times 66) - (15 \times 20)}{(5 \times 55) - 225} = \frac{30}{50} = 0.6$$
$$c = \frac{\sum Y \sum X^2 - \sum X \sum XY}{n\sum X^2 - \left(\sum X\right)^2} = \frac{(20 \times 55) - (15 \times 66)}{(5 \times 55) - 225} = \frac{110}{50} = 2.2$$
At the mean of X, the line gives Y = mX + c = 0.6 × 3 + 2.2 = 4, which is the mean of Y.
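The calculation of m and c can be reproduced in a few lines of Python. This is a minimal sketch using the five-row sample dataset from the slides; the variable names are illustrative assumptions.

```python
# Sample dataset from the slides.
X = [1, 2, 3, 4, 5]
Y = [2, 4, 5, 4, 5]
n = len(X)

# Sums used by the closed-form least-squares formulas.
sum_x = sum(X)                              # 15
sum_y = sum(Y)                              # 20
sum_xx = sum(x * x for x in X)              # 55
sum_xy = sum(x * y for x, y in zip(X, Y))   # 66

denominator = n * sum_xx - sum_x ** 2       # 5 * 55 - 225 = 50
m = (n * sum_xy - sum_x * sum_y) / denominator
c = (sum_y * sum_xx - sum_x * sum_xy) / denominator

# The fitted line passes through the point of means (3, 4).
mean_x, mean_y = sum_x / n, sum_y / n
print(f"m = {m:.1f}, c = {c:.1f}")
print(f"line at mean of X: {m * mean_x + c:.1f}, mean of Y: {mean_y:.1f}")
```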
Let's find the predicted values of Y for the corresponding values of X using the linear equation with m = 0.6 and c = 2.2:
$$\begin{aligned}
Y_{pred} &= 0.6 \times 1 + 2.2 = 2.8 \\
Y_{pred} &= 0.6 \times 2 + 2.2 = 3.4 \\
Y_{pred} &= 0.6 \times 3 + 2.2 = 4 \\
Y_{pred} &= 0.6 \times 4 + 2.2 = 4.6 \\
Y_{pred} &= 0.6 \times 5 + 2.2 = 5.2
\end{aligned}$$
[Figure: actual and predicted Y values plotted against X, with the regression line passing through (3, 4)]
In the plot, the blue points represent the actual Y values and the brown points represent the predicted Y values. The distances between the actual and predicted values are known as residuals or errors. The best-fit line should have the least sum of squares of these errors, also known as e square.
For each row, the error is the difference between the actual Y and the predicted Y (with m = 0.6 and c = 2.2):
        X    Y    Y_pred    (Y - Y_pred)    (Y - Y_pred)²
        1    2    2.8       -0.8            0.64
        2    4    3.4       0.6             0.36
        3    5    4         1               1
        4    4    4.6       -0.6            0.36
        5    5    5.2       -0.2            0.04
Sum                                         2.4
The sum of squared errors for this regression line is 2.4. We compute this error for each candidate line and choose as the best-fit line the one with the least e square value.
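The residuals and their squared sum can be checked with a short Python sketch. It uses the sample dataset and the values m = 0.6 and c = 2.2 from the slides; the variable names are illustrative assumptions.

```python
# Sample dataset and fitted line from the slides.
X = [1, 2, 3, 4, 5]
Y = [2, 4, 5, 4, 5]
m, c = 0.6, 2.2

# Predicted value for each X, then residual = actual - predicted.
y_pred = [m * x + c for x in X]
residuals = [y - yp for y, yp in zip(Y, y_pred)]

# Sum of squared errors ("e square" in the slides).
sse = sum(r ** 2 for r in residuals)

for x, y, yp, r in zip(X, Y, y_pred, residuals):
    print(f"X={x}  Y={y}  Y_pred={yp:.1f}  error={r:+.1f}  squared={r ** 2:.2f}")
print(f"Sum of squared errors: {sse:.1f}")
```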
Finding the Best fit line
Minimizing the distance: there are many ways to measure the distance between the line and the data points, such as the sum of squared errors, the sum of absolute errors, and the root mean square error.
We keep moving the line through the data points until we find the line with the least squared distance between the data points and the regression line; that line is the best-fit line.
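To make the comparison concrete, the sketch below computes the three error measures named above for the line m = 0.6, c = 2.2 and for a few alternative lines on the sample dataset. The alternative lines are hypothetical values chosen only for illustration; the point is that each of them has a larger sum of squared errors than the least-squares line.

```python
import math

# Sample dataset from the slides.
X = [1, 2, 3, 4, 5]
Y = [2, 4, 5, 4, 5]


def errors(m, c):
    """Return (SSE, SAE, RMSE) for the line y = m * x + c on the sample data."""
    residuals = [y - (m * x + c) for x, y in zip(X, Y)]
    sse = sum(r ** 2 for r in residuals)      # sum of squared errors
    sae = sum(abs(r) for r in residuals)      # sum of absolute errors
    rmse = math.sqrt(sse / len(residuals))    # root mean square error
    return sse, sae, rmse


# The least-squares line computed in the slides.
best = errors(0.6, 2.2)
print(f"m=0.6 c=2.2  SSE={best[0]:.2f}  SAE={best[1]:.2f}  RMSE={best[2]:.3f}")

# Hypothetical alternative lines, obtained by nudging the slope or intercept.
candidates = [(0.5, 2.2), (0.7, 2.2), (0.6, 2.0), (0.6, 2.4), (1.0, 1.0)]
for m, c in candidates:
    sse, sae, rmse = errors(m, c)
    print(f"m={m} c={c}  SSE={sse:.2f}  SAE={sae:.2f}  RMSE={rmse:.3f}")
```

In practice the best-fit line is obtained directly from the closed-form formulas rather than by trying candidate lines; the loop here only illustrates what "least squared distance" means.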
Multiple Linear Regression
Simple Linear Regression:
$$Y = m x + c$$
Multiple Linear Regression:
$$Y = m_1 x_1 + m_2 x_2 + m_3 x_3 + \dots + m_n x_n + c$$
Y ---> Dependent variable (DV)
x_1, x_2, x_3, ..., x_n ---> Independent variables (IDVs)
m_1, m_2, m_3, ..., m_n ---> Slopes
c ---> Intercept
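The multiple regression equation can be fitted by least squares in the same way as the single-variable case. The sketch below is one possible approach using NumPy's `np.linalg.lstsq`; the data, the "true" slopes, and the intercept are hypothetical values made up for illustration, and the slides do not say which method or library they use at this point.

```python
import numpy as np

# Hypothetical data: 6 observations of 3 independent variables (x_1, x_2, x_3).
X = np.array([
    [1.0, 2.0, 0.5],
    [2.0, 1.0, 1.5],
    [3.0, 4.0, 1.0],
    [4.0, 3.0, 2.5],
    [5.0, 6.0, 2.0],
    [6.0, 5.0, 3.5],
])

# Hypothetical slopes m_1..m_3 and intercept c used to generate noise-free Y.
true_slopes = np.array([2.0, -1.0, 0.5])
true_intercept = 3.0
y = X @ true_slopes + true_intercept

# Append a column of ones so that the intercept c is estimated with the slopes.
design = np.column_stack([X, np.ones(len(X))])
solution, *_ = np.linalg.lstsq(design, y, rcond=None)
slopes, intercept = solution[:-1], solution[-1]

print("slopes:", np.round(slopes, 3))
print("intercept:", round(float(intercept), 3))
```

Because the generated Y values contain no noise, the fitted slopes and intercept match the values used to generate them, up to floating-point error.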
Implementation of Linear Regression
Use case implementation of Linear Regression
Let's understand how Multiple Linear Regression works by implementing it in Python.
The goal is to predict the profit of 1000 companies based on their expenditure and location, using the following attributes:
1. R&D Spend
2. Administration
3. Marketing Spend
4. State
The value to predict is Profit.
1. Import the libraries:
2. Load the Dataset and extract independent and dependent variables:
3. Data Visualization:
4. Encoding Categorical Data:
5. Avoiding Dummy Variable Trap:
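The code for steps 4 and 5 is not part of this transcript, so the following is only a sketch of the idea, written with plain NumPy rather than whichever encoder the original slides used. The category labels "A", "B", and "C" are placeholders for the values of the State column, which the transcript does not list.

```python
import numpy as np

# Hypothetical State column; "A", "B", "C" are placeholder category names.
states = np.array(["A", "B", "C", "A", "C", "B"])

# Step 4: one-hot encode. np.unique returns the sorted categories and, for
# each row, the index of its category; indexing an identity matrix with
# those indices gives one 0/1 column per category.
categories, codes = np.unique(states, return_inverse=True)
one_hot = np.eye(len(categories), dtype=int)[codes]

# Step 5: every row of one_hot sums to 1, so the columns are perfectly
# collinear with the intercept term. Dropping one column (here the first)
# avoids this "dummy variable trap"; the dropped category becomes the baseline.
dummies = one_hot[:, 1:]

print(categories)
print(one_hot)
print(dummies)
```

Rows belonging to the dropped category are encoded as all zeros, so no information is lost.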
6. Splitting the data into Train and Test set:
7. Fitting Multiple Linear Regression Model to Training set:
8. Predicting the Test set results:
9. Calculating the Coefficients and Intercepts:
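Steps 6 to 9 can be sketched as follows. The original code and dataset are not included in this transcript, so this example makes several assumptions: it generates synthetic data with made-up coefficients in place of the 1000-company dataset, leaves out the State column for brevity, uses an 80/20 split, and fits the model with NumPy least squares instead of whichever library the slides use.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for the dataset (all numbers here are made up).
n = 200
rd = rng.uniform(0, 100, n)          # stands in for "R&D Spend"
admin = rng.uniform(0, 100, n)       # stands in for "Administration"
marketing = rng.uniform(0, 100, n)   # stands in for "Marketing Spend"
X = np.column_stack([rd, admin, marketing])
y = 0.8 * rd + 0.1 * admin + 0.3 * marketing + 50 + rng.normal(0, 5, n)

# 6. Split the data into a training set (80%) and a test set (20%).
indices = rng.permutation(n)
split = int(0.8 * n)
train_idx, test_idx = indices[:split], indices[split:]
X_train, X_test = X[train_idx], X[test_idx]
y_train, y_test = y[train_idx], y[test_idx]

# 7. Fit the multiple linear regression model on the training set.
#    The column of ones lets least squares estimate the intercept.
design_train = np.column_stack([X_train, np.ones(len(X_train))])
solution, *_ = np.linalg.lstsq(design_train, y_train, rcond=None)
coefficients, intercept = solution[:-1], solution[-1]

# 8. Predict the test set results.
y_pred = X_test @ coefficients + intercept

# 9. Inspect the coefficients (slopes) and the intercept.
print("coefficients:", np.round(coefficients, 3))
print("intercept:", round(float(intercept), 3))
print("first predictions:", np.round(y_pred[:3], 1))
```

Keeping the test rows out of the fit is what makes the later evaluation meaningful: the model is scored on companies it has not seen.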
10. Evaluating the model:
An R-squared value of 0.91 means the model explains about 91% of the variance in profit, which indicates a good fit.
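R squared compares the model's sum of squared errors with the squared deviations of Y from its mean: R² = 1 - SS_res / SS_tot. As a small illustration, the sketch below computes it for the five-row sample dataset and the line m = 0.6, c = 2.2 used earlier in the slides; it does not reproduce the 0.91 reported for the companies dataset, and the variable names are illustrative assumptions.

```python
# Sample dataset and fitted line from the earlier slides.
X = [1, 2, 3, 4, 5]
Y = [2, 4, 5, 4, 5]
m, c = 0.6, 2.2

y_pred = [m * x + c for x in X]
mean_y = sum(Y) / len(Y)

# Residual sum of squares: errors left over after fitting the line.
ss_res = sum((y - yp) ** 2 for y, yp in zip(Y, y_pred))
# Total sum of squares: errors of a model that always predicts the mean of Y.
ss_tot = sum((y - mean_y) ** 2 for y in Y)

r_squared = 1 - ss_res / ss_tot
print(f"SS_res = {ss_res:.1f}, SS_tot = {ss_tot:.1f}, R squared = {r_squared:.2f}")
```

A value close to 1 means the line explains most of the variation in Y, while a value close to 0 means it does little better than predicting the mean.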
Use case summary
We successfully trained our model with certain predictors and estimated the profit of companies using linear regression.
Key Takeaways