How to Standardize Data in a Pandas DataFrame?
Last Updated: 14 Jun, 2025
Standardization is an essential step when preparing data for machine learning and analysis. Real-world datasets often contain columns with very different scales. For example, one feature might be age, ranging from 20 to 70, while another might be income, ranging from ₹30,000 to ₹10,00,000. If such unscaled data is fed into a scale-sensitive model (for example, one based on distances or trained with gradient descent), features with large values can dominate the learning process.
To avoid this, standardization (also called z-score normalization) transforms each numeric column so that it has a mean of 0 and a standard deviation of 1. This puts all features on a common scale, so no feature outweighs the others simply because its values are numerically larger.
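The transformation described above is the z-score formula. Written out for a column x with n values, using the population standard deviation (dividing by n), with a worked check on the column [1, 3, 5, 7, 9] that appears in the examples below:

```latex
z_i = \frac{x_i - \mu}{\sigma}, \qquad
\mu = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad
\sigma = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (x_i - \mu)^2}

\text{For } x = [1, 3, 5, 7, 9]:\quad
\mu = 5, \quad
\sigma = \sqrt{\frac{16 + 4 + 0 + 4 + 16}{5}} = \sqrt{8} \approx 2.828427, \quad
z_1 = \frac{1 - 5}{\sqrt{8}} \approx -1.414214
```

Dividing by n - 1 instead of n gives the sample standard deviation, which Pandas' .std() uses by default.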
(Figure: Standardized Data Curve)
Let's explore some effective methods to standardize numeric columns in a Pandas DataFrame.
Using StandardScaler()
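This is the most common method for standardizing data in machine learning workflows. StandardScaler learns each column's mean and standard deviation and uses them to rescale the data to a mean of 0 and a standard deviation of 1. Because the learned statistics are stored on the scaler, the same transformation can later be applied to new data. All columns passed to it must be numeric.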
Python
import pandas as pd
from sklearn.preprocessing import StandardScaler
data = {
    'c1': [1, 3, 5, 7, 9],
    'c2': [7, 4, 35, 14, 56]
}
df = pd.DataFrame(data)
sc = StandardScaler()          # create the scaler
scaled = sc.fit_transform(df)  # learn per-column mean/std, then scale; returns a NumPy array
res = pd.DataFrame(scaled, columns=df.columns)  # wrap back into a DataFrame
print(res)
Output
c1 c2
0 -1.414214 -0.824387
1 -0.707107 -0.977052
2 0.000000 0.600480
3 0.707107 -0.468171
4 1.414214 1.669130
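Explanation: StandardScaler() from sklearn.preprocessing scales c1 and c2 to a mean of 0 and a standard deviation of 1. fit_transform() does two things in one call: it computes each column's mean and standard deviation (fit) and then applies (x - mean) / std (transform). The result is a NumPy array, so it is wrapped back into a DataFrame with the original column names. StandardScaler uses the population standard deviation (dividing by n), which is why c1 becomes ±1.414214.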
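Since StandardScaler keeps the statistics it learned, a common pattern is to fit it once and reuse it on new rows. The sketch below assumes the same toy data; new_rows is a hypothetical DataFrame with the same column names. mean_ and scale_ are the fitted per-column mean and standard deviation.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Same toy data as above
df = pd.DataFrame({'c1': [1, 3, 5, 7, 9], 'c2': [7, 4, 35, 14, 56]})

sc = StandardScaler()
sc.fit(df)  # learn per-column mean and population standard deviation

print(sc.mean_)   # fitted per-column means
print(sc.scale_)  # fitted per-column standard deviations (ddof=0)

# Hypothetical new rows, scaled with the statistics learned from df (no refitting)
new_rows = pd.DataFrame({'c1': [5, 11], 'c2': [23.2, 4]})
scaled_new = pd.DataFrame(sc.transform(new_rows), columns=df.columns)
print(scaled_new)
```

Reusing the fitted scaler, rather than refitting on each new batch, keeps every row on the same scale; in practice the scaler is usually fitted on the training data only.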
Using zscore()
A quick way to standardize a single column. The z-score tells how far each value lies from the mean, measured in standard deviations. Useful for quick checks or small tasks.
Python
import pandas as pd
from scipy.stats import zscore
df = pd.DataFrame({
    'col1': [1, 3, 5, 7, 9],
    'col2': [7, 4, 35, 14, 56]
})
df['col2'] = zscore(df['col2'])  # replace col2 with its z-scores; col1 is unchanged
print(df)
Output
col1 col2
0 1 -0.824387
1 3 -0.977052
2 5 0.600480
3 7 -0.468171
4 9 1.669130
Explanation: zscore() from scipy.stats standardizes col2 to a mean of 0 and a standard deviation of 1, computing the statistics and applying the transformation in a single call; col1 is left unchanged. By default zscore() uses the population standard deviation (ddof=0), so its result matches the c2 column produced by StandardScaler.
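zscore() is not limited to one column: given a 2-D array and axis=0, it standardizes each column independently. A small sketch on the same data, also showing the ddof parameter, which switches between the population (ddof=0, the default) and sample (ddof=1) standard deviation:

```python
import pandas as pd
from scipy.stats import zscore

df = pd.DataFrame({
    'col1': [1, 3, 5, 7, 9],
    'col2': [7, 4, 35, 14, 56]
})

# axis=0 computes mean/std per column; default ddof=0 matches StandardScaler
pop = pd.DataFrame(zscore(df.to_numpy(), axis=0), columns=df.columns)

# ddof=1 divides by n - 1, matching the default of pandas' Series.std()
sample = pd.DataFrame(zscore(df.to_numpy(), axis=0, ddof=1), columns=df.columns)

print(pop)
print(sample)
```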
Using vectorized pandas standardization
This approach applies the standardization formula, (value - mean) / std, directly with Pandas operations. It is useful for learning and needs no library beyond Pandas.
Python
import pandas as pd
df = pd.DataFrame({
    'col1': [1, 3, 5, 7, 9],
    'col2': [7, 4, 35, 14, 56]
})
# subtract the column mean, then divide by the column standard deviation
df['col1'] = (df['col1'] - df['col1'].mean()) / df['col1'].std()
print(df)
Output
col1 col2
0 -1.264911 7
1 -0.632456 4
2 0.000000 35
3 0.632456 14
4 1.264911 56
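Explanation: This code standardizes only col1 by subtracting its mean and dividing by its standard deviation, giving it a mean of 0 and a standard deviation of 1. Note that Series.std() uses the sample standard deviation (ddof=1, dividing by n - 1) by default, so the values here (±1.264911) differ slightly from those produced by StandardScaler and zscore() (±1.414214).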
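If you want the manual formula to reproduce StandardScaler and zscore() exactly, a minimal variation is to pass ddof=0 to Series.std() so it uses the population standard deviation:

```python
import pandas as pd

df = pd.DataFrame({
    'col1': [1, 3, 5, 7, 9],
    'col2': [7, 4, 35, 14, 56]
})

# ddof=0 divides by n (population std) instead of n - 1 (the Series.std() default)
col = df['col1']
df['col1'] = (col - col.mean()) / col.std(ddof=0)
print(df)
```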
Using .apply()
This method applies the standardization formula to every column at once. It is shorter than standardizing each column by hand: the lambda is called once per column, and the arithmetic inside it is vectorized. Every column in the DataFrame must be numeric.
Python
import pandas as pd
df = pd.DataFrame({
    'col1': [1, 3, 5, 7, 9],
    'col2': [7, 4, 35, 14, 56]
})
# x is one column (a Series) at a time
res = df.apply(lambda x: (x - x.mean()) / x.std())
print(res)
Output
col1 col2
0 -1.264911 -0.737355
1 -0.632456 -0.873902
2 0.000000 0.537085
3 0.632456 -0.418745
4 1.264911 1.492915
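Explanation: apply() runs the lambda on each column in turn, standardizing every column to a mean of 0 and a standard deviation of 1. As in the previous example, x.std() uses the sample standard deviation (ddof=1), so col2 here differs slightly from the zscore() and StandardScaler output.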
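Because DataFrame.mean() and DataFrame.std() already return one value per column, the same result can be sketched without apply() using plain DataFrame arithmetic. The 'name' column below is a hypothetical text column added only to show how select_dtypes() keeps non-numeric data out of the calculation (the lambda above would fail with a TypeError on such a column).

```python
import pandas as pd

# Hypothetical frame: 'name' is a text column added for illustration
df = pd.DataFrame({
    'name': ['a', 'b', 'c', 'd', 'e'],
    'col1': [1, 3, 5, 7, 9],
    'col2': [7, 4, 35, 14, 56]
})

# Keep only numeric columns so text columns are not part of the arithmetic
num = df.select_dtypes(include='number')

# num.mean() and num.std() are per-column Series; subtraction and division
# align them with the columns, standardizing every column in one expression
res = (num - num.mean()) / num.std()
print(res)
```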