Implement Principal Component Analysis (PCA) in Python
Presented by Eshan Agarwal
Given a classification problem, how do we choose the right features?
Introduction to PCA
• PCA is a method for reducing the dimensionality of data.
• It can be thought of as a projection method where data with m columns (features) is projected into a subspace with k ≤ m columns, while retaining the essence of the original data.
[Diagram: data matrix X (n × m) → PCA → projected data (n × k)]
In this presentation, we will discover the PCA method for dimensionality reduction and how to implement it from scratch in Python.
• Before going deep into PCA, let us understand some of its key concepts.
Geometric Rationale of PCA
• Variance
• The variance of each variable is the average squared deviation of its n values around the mean of that variable. It can also be thought of as the spread of the data points.
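As a minimal illustration (the numbers below are made up, not from the slides), the variance of a single variable can be computed directly in NumPy:

    import numpy as np

    # One variable (column) with n = 5 observations.
    x = np.array([2.0, 4.0, 4.0, 4.0, 6.0])

    # Average squared deviation around the mean (population form, dividing
    # by n; pass ddof=1 to np.var for the sample form dividing by n - 1).
    variance = np.mean((x - x.mean()) ** 2)

    print(variance)    # 1.6
    print(np.var(x))   # 1.6, same result via NumPy's built-in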
• Covariance
• The degree to which the variables are linearly correlated is represented by their covariances. The covariance of variables i and j, summed over all n objects, is

    c_ij = (1 / (n − 1)) * Σ_{m=1}^{n} (x_mi − x̄_i)(x_mj − x̄_j)

where x_mi is the value of variable i in object m and x̄_i is the mean of variable i (and likewise for j).
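A small sketch of this formula with two made-up variables, checked against NumPy's built-in covariance (both use the n − 1 divisor):

    import numpy as np

    # Two variables observed on the same n = 4 objects (illustrative values).
    x_i = np.array([2.1, 2.5, 3.6, 4.0])
    x_j = np.array([8.0, 10.0, 12.0, 14.0])
    n = len(x_i)

    # Sum of products of deviations from the means, divided by n - 1.
    cov_ij = np.sum((x_i - x_i.mean()) * (x_j - x_j.mean())) / (n - 1)

    print(cov_ij)                  # 2.2667 (approximately)
    print(np.cov(x_i, x_j)[0, 1])  # same value from NumPy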
Objective of PCA
• The objective of PCA is to rigidly rotate the axes of this m-dimensional space to new positions (principal axes).
• The principal axes are ordered such that axis 1 has the highest variance, axis 2 has the next highest variance, ..., and axis m has the lowest variance.
Implement PCA in Python (from Scratch)
• Load the dataset:
• We can use the Boston Housing dataset for PCA. The Boston dataset has 13 features, so the question is: how do we visualize the data? We can reduce the dimensions of the data using PCA and then visualize it.
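One possible way to load the data, assuming network access and a recent scikit-learn (load_boston has been removed from current releases, so this sketch fetches the dataset from OpenML instead; the original notebook may load it differently):

    import pandas as pd
    from sklearn.datasets import fetch_openml

    # Boston Housing data from OpenML (an assumption; the original
    # notebook may have used the old sklearn load_boston instead).
    boston = fetch_openml(name="boston", version=1, as_frame=True)

    X = boston.data.astype(float)  # some columns arrive as categoricals
    y = boston.target              # MEDV: median house value

    print(X.shape)  # (506, 13) -- 13 features, too many to plot directly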
• Standardize the data:
• PCA is strongly affected by scale, and different features may have different scales, so it is better to standardize the data before computing the PCA components. Sklearn's StandardScaler scales each feature to zero mean and unit variance.
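A minimal standardization sketch, continuing from the loading step above:

    from sklearn.preprocessing import StandardScaler

    # Scale every feature to zero mean and unit variance so that no single
    # feature dominates the covariance matrix.
    X_std = StandardScaler().fit_transform(X)

    print(X_std.mean(axis=0).round(6))  # ~0 for every column
    print(X_std.std(axis=0).round(6))   # ~1 for every column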
The Algebra of PCA
• Calculating PCA involves the following steps, each shown below:
a. Calculating the covariance matrix.
b. Calculating the eigenvalues and eigenvectors.
c. Forming the principal components.
d. Projecting into the new feature space.
• Calculating the covariance matrix (S):
• The covariance matrix is the matrix of variances and covariances (or correlations) among every pair of the m variables.
• It is a square, symmetric matrix.
• For standardized data, the covariance matrix is S = (X.T * X) / (n − 1); we can compute the matrix product with NumPy's matmul() function in Python.
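A sketch of this step, continuing with X_std from above (the 1/(n − 1) factor makes the result match np.cov; leaving it out, as the slide's S = X.T * X does, only rescales the eigenvalues and does not change the eigenvectors):

    import numpy as np

    n = X_std.shape[0]
    S = np.matmul(X_std.T, X_std) / (n - 1)  # covariance matrix

    print(S.shape)                                      # (13, 13)
    print(np.allclose(S, S.T))                          # True: symmetric
    print(np.allclose(S, np.cov(X_std, rowvar=False)))  # True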
Calculating the eigenvalues and eigenvectors:
• λ is an eigenvalue of the matrix S if it is a solution of the characteristic equation:
det( λ*I − S ) = 0
where I is the identity matrix of the same dimension as S.
• The sum of all m eigenvalues equals the trace of S (the sum of the variances of the original variables).
• For each eigenvalue λ, a corresponding eigenvector v can be found by solving:
( λ*I − S )v = 0
• The eigenvalues λ1, λ2, ..., λm are the variances of the coordinates on each principal component axis.
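A tiny worked check of these definitions on a made-up 2 × 2 symmetric matrix:

    import numpy as np

    A = np.array([[2.0, 1.0],
                  [1.0, 2.0]])

    # eigh is the right routine for symmetric matrices; eigenvalues come
    # back in ascending order, eigenvectors as columns.
    eigenvalues, eigenvectors = np.linalg.eigh(A)

    print(eigenvalues)        # [1. 3.]
    print(eigenvalues.sum())  # 4.0 == trace of A, as stated above
    for lam, v in zip(eigenvalues, eigenvectors.T):
        print(np.allclose(A @ v, lam * v))  # True: (lam*I - A) v = 0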
• We use scipy.linalg, whose eigh function can return just the top eigenvalues and eigenvectors; here we find the top 2 as follows.
Code for finding the eigenvalues and eigenvectors:
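One possible version of that code, using scipy.linalg.eigh's subset_by_index argument (SciPy ≥ 1.5; older releases used the since-removed eigvals argument, which the original notebook may have used). With 13 features, indices 11 and 12 select the two largest eigenvalues:

    from scipy import linalg

    # S is the 13 x 13 covariance matrix from the previous step.  eigh
    # returns eigenvalues in ascending order, so flip both results so that
    # column 0 corresponds to the largest-variance component.
    values, vectors = linalg.eigh(S, subset_by_index=[11, 12])
    values = values[::-1]
    vectors = vectors[:, ::-1]

    print(values)         # top 2 eigenvalues (variances along PC1 and PC2)
    print(vectors.shape)  # (13, 2): one 13-dimensional eigenvector per PC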
Forming the principal components:
• Below is the code for forming the principal components: the data is projected onto the two principal eigenvectors by matrix multiplication.
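A sketch of that projection, continuing from the eigenvectors found above:

    import numpy as np

    # Project the standardized data onto the two principal eigenvectors:
    # (506 x 13) . (13 x 2) -> (506 x 2) new coordinates.
    new_coordinates = np.matmul(X_std, vectors)

    print(new_coordinates.shape)  # (506, 2)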
• Projection into the new feature space:
• Creating a data frame holding the 1st and 2nd principal components.
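One possible version of that step (the column names are illustrative, not from the slides):

    import pandas as pd

    # Data frame with the 1st and 2nd principal components, plus the
    # target so we can colour the plot in the next step.
    pca_df = pd.DataFrame(new_coordinates,
                          columns=["1st_principal", "2nd_principal"])
    pca_df["price"] = y.values

    print(pca_df.head())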
Visualize Data after PCA
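A matplotlib sketch of what such a plot might look like (colouring by price is an assumption, not from the slides):

    import matplotlib.pyplot as plt

    # Scatter plot of the data in the 2-D PCA space, coloured by house price.
    plt.figure(figsize=(8, 6))
    sc = plt.scatter(pca_df["1st_principal"], pca_df["2nd_principal"],
                     c=pca_df["price"], cmap="viridis", s=15)
    plt.colorbar(sc, label="MEDV (median house value)")
    plt.xlabel("1st principal component")
    plt.ylabel("2nd principal component")
    plt.title("Boston Housing data after PCA")
    plt.show()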
Steps for PCA
• Standardize the data.
• Calculate the covariance matrix.
• Find the eigenvalues and eigenvectors of the covariance matrix.
• Plot the eigenvectors / principal components over the scaled data.
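Putting the steps together, a compact end-to-end sketch (the function name and defaults are my own, not from the slides):

    import numpy as np
    from scipy import linalg
    from sklearn.preprocessing import StandardScaler

    def pca_from_scratch(X, n_components=2):
        """Standardize, build the covariance matrix, take the top
        eigenvectors, and project X into the new feature space."""
        X_std = StandardScaler().fit_transform(X)      # 1. standardize
        n, m = X_std.shape
        S = np.matmul(X_std.T, X_std) / (n - 1)        # 2. covariance matrix
        values, vectors = linalg.eigh(                 # 3. top eigenpairs
            S, subset_by_index=[m - n_components, m - 1])
        values, vectors = values[::-1], vectors[:, ::-1]
        return np.matmul(X_std, vectors), values       # 4. projection

    # e.g. projected, variances = pca_from_scratch(X, n_components=2)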
Assessment and Evaluation
1) [True or False] PCA can be used for projecting and visualizing data in lower dimensions.
A. TRUE
B. FALSE
2) [True or False] PCA can be applied to an image dataset.
A. TRUE
B. FALSE
3) [True or False] PCA is based on variance maximization and distance minimization.
A. TRUE
B. FALSE
• Exercise: implement PCA with number of components = 3 and visualize the data; then load the Iris dataset and perform the same task.
Ans: 1-A, 2-A, 3-A
For the full code, see: https://github.com/Eshan2203/PCA-on-Boston-House-price-Data-Set/blob/master/PCA_BOston.ipynb