How to
Standardize Your Data:
An ML Recipe
DAMIAN MINGLE
CHIEF DATA SCIENTIST, WPC Healthcare
@DamianMingle
GET THE FULL STORY
bit.ly/UseSciKitNow
What’s Standardization Anyway?
• A preprocessing step: scikit-learn describes its preprocessing tools as
“functions and transformers that change raw feature vectors into a
representation that is more suitable for the downstream estimator”
• Shifting and rescaling each attribute so that it has a mean of 0
and a standard deviation of 1.
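As a minimal sketch of the definition above, each column can be standardized by hand with z = (x - mean) / std. The toy array and the variable names here are made up for illustration and are not part of the original deck.

```python
import numpy as np

# Hypothetical toy data: two attributes on very different scales.
X = np.array([[1.0, 200.0],
              [2.0, 400.0],
              [3.0, 600.0]])

# z = (x - mean) / std, computed per column.
# NumPy's default std is the population standard deviation (ddof=0).
mean = X.mean(axis=0)
std = X.std(axis=0)
X_standardized = (X - mean) / std

# Each column should now have mean ~0 and standard deviation ~1.
print(X_standardized.mean(axis=0))
print(X_standardized.std(axis=0))
```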
Why Standardization Matters
• It’s a common requirement of many machine learning models
• Models may behave badly when attributes sit on very different scales
• It’s useful for models that rely on the distribution of attributes
such as Gaussian processes.
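One way to see why scale matters is to compare Euclidean distances before and after standardizing. This is only an illustrative sketch: the age and income values and the helper `dist` are invented for the example.

```python
import numpy as np
from sklearn import preprocessing

# Hypothetical samples: [age in years, income in dollars].
X = np.array([[25.0, 50000.0],
              [60.0, 51000.0],
              [26.0, 90000.0]])

def dist(a, b):
    """Euclidean distance between two samples."""
    return float(np.linalg.norm(a - b))

# On raw data the income column dominates: the 35-year age gap
# between samples 0 and 1 barely registers next to income differences.
raw_01 = dist(X[0], X[1])
raw_02 = dist(X[0], X[2])

# After standardizing, both attributes contribute on a comparable scale.
Z = preprocessing.scale(X)
scaled_01 = dist(Z[0], Z[1])
scaled_02 = dist(Z[0], Z[2])

print(raw_01, raw_02)
print(scaled_01, scaled_02)
```

Distance-based models are one place where this effect tends to show up; whether it matters for a given model depends on how that model uses the attributes.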
Power in SciKit Learn
• Preprocessing
• Clustering
• Regression
• Classification
• Dimensionality Reduction
• Model Selection
Let’s Look at an ML Recipe:
Standardization
The Imports
from sklearn.datasets import load_iris
from sklearn import preprocessing
Separate Features from Target
iris = load_iris()
print(iris.data.shape)
X = iris.data
y = iris.target
Standardize the Features
# scale() centers each column to mean 0 and rescales it to unit variance
standardized_X = preprocessing.scale(X)
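A quick sanity check, sketched below, is to confirm that every column of the scaled Iris data ends up with a mean near 0 and a standard deviation near 1. The names `standardized_X`, `col_means`, and `col_stds` are chosen for this example only.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn import preprocessing

# Load the four Iris feature columns and standardize them.
X = load_iris().data
standardized_X = preprocessing.scale(X)

# Per-column statistics after scaling.
col_means = standardized_X.mean(axis=0)
col_stds = standardized_X.std(axis=0)

# Values are rounded because floating-point arithmetic leaves tiny residuals.
print(np.round(col_means, 6))
print(np.round(col_stds, 6))
```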
Standardization Recipe
# Standardize the data attributes for the Iris dataset.
from sklearn.datasets import load_iris
from sklearn import preprocessing
# load the iris dataset
iris = load_iris()
print(iris.data.shape)
# separate the data from the target attributes
X = iris.data
y = iris.target
# standardize the data attributes (mean 0, standard deviation 1 per column)
standardized_X = preprocessing.scale(X)
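The recipe scales the whole dataset in one call. When a model will later be evaluated on held-out data, a common variation is to learn the mean and standard deviation from the training split only and reuse them on the test split. The sketch below uses scikit-learn's StandardScaler for that; the split size and random seed are arbitrary choices for illustration, not something the deck specifies.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

# Arbitrary 75/25 split; random_state fixed only for repeatability.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Learn per-column mean and standard deviation from the training data only.
scaler = StandardScaler()
X_train_std = scaler.fit_transform(X_train)

# Apply the same training statistics to the test data.
X_test_std = scaler.transform(X_test)

print(np.round(scaler.mean_, 3))
print(X_train_std.shape, X_test_std.shape)
```

The test split will usually not have a mean of exactly 0 after this transform, which is expected because its statistics were never used for fitting.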
Resources
• Society of Data Scientists
• SciKit Learn
• Also:
• Scaling features to a range (MinMaxScaler or MaxAbsScaler)
• Scaling sparse data (MaxAbsScaler, or StandardScaler with with_mean=False)
• Scaling data with outliers (RobustScaler)
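The scalers listed above share the same fit/transform interface, so they can be tried side by side. This is a rough sketch on an invented array with one deliberate outlier; the data and variable names are assumptions made for the example.

```python
import numpy as np
from scipy import sparse
from sklearn.preprocessing import (
    MaxAbsScaler,
    MinMaxScaler,
    RobustScaler,
    StandardScaler,
)

# Hypothetical data; 100.0 in the first column acts as an outlier.
X = np.array([[1.0, -10.0],
              [2.0, 0.0],
              [3.0, 10.0],
              [100.0, 20.0]])

# Scale each column into the range [0, 1].
X_minmax = MinMaxScaler().fit_transform(X)

# Divide each column by its maximum absolute value, giving [-1, 1].
X_maxabs = MaxAbsScaler().fit_transform(X)

# Center on the median and scale by the interquartile range,
# which is less sensitive to the outlier than mean and std.
X_robust = RobustScaler().fit_transform(X)

# For sparse input, StandardScaler needs with_mean=False,
# because centering would destroy the sparsity.
X_sparse = sparse.csr_matrix(X)
X_sparse_std = StandardScaler(with_mean=False).fit_transform(X_sparse)

print(X_minmax)
print(X_maxabs)
print(X_robust)
print(X_sparse_std.toarray())
```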
