
How to Standardize Data in a Pandas DataFrame?

Last Updated : 14 Jun, 2025

Standardization is an essential step when preparing data for machine learning and analysis. Real-world datasets often contain features on very different scales. For example, one feature might be age, ranging from 20 to 70, while another might be income, ranging from ₹30,000 to ₹10,00,000. If such unscaled data is fed into a model, especially a distance-based or gradient-based one, features with large values can dominate the learning process.

To avoid this, standardization transforms each numeric column so that it has a mean of 0 and a standard deviation of 1. This puts all features on a common scale, so no feature dominates the model simply because its values are numerically larger.
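Written out, the transformation for a value x in a column with mean μ and standard deviation σ is shown below, worked through for the first value of the column [1, 3, 5, 7, 9] used in the examples. This assumes the population standard deviation (dividing by n), which is what StandardScaler() and zscore() use by default.

```latex
z = \frac{x - \mu}{\sigma}, \qquad
\mu = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad
\sigma = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (x_i - \mu)^2}

% Column [1, 3, 5, 7, 9], n = 5
\mu = \frac{1 + 3 + 5 + 7 + 9}{5} = 5, \qquad
\sigma = \sqrt{\frac{16 + 4 + 0 + 4 + 16}{5}} = \sqrt{8} \approx 2.828427

z_1 = \frac{1 - 5}{\sqrt{8}} \approx -1.414214
```

If the sample standard deviation (dividing by n - 1) is used instead, as pandas' .std() does by default, then σ = √10 ≈ 3.162278 and z₁ ≈ -1.264911.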

[Figure: Standardized Data Curve]

Let’s explore some effective methods to standardize numeric columns in a Pandas DataFrame.

Using StandardScaler()

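StandardScaler() from scikit-learn is the usual choice when preparing data for machine learning. It scales every column it receives to a mean of 0 and a standard deviation of 1. Unlike the pandas-only methods below, it also stores the mean and standard deviation it learned, so the exact same transformation can be applied later to new data. It expects numeric input, so select the numeric columns first if the DataFrame also contains text.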

Python
import pandas as pd
from sklearn.preprocessing import StandardScaler

data = {
    'c1': [1, 3, 5, 7, 9],
    'c2': [7, 4, 35, 14, 56]
}
df = pd.DataFrame(data)

sc = StandardScaler()          # uses the population standard deviation (ddof=0)
scaled = sc.fit_transform(df)  # learn per-column mean/std, then scale; returns a NumPy array

res = pd.DataFrame(scaled, columns=df.columns)  # wrap back into a DataFrame
print(res)

Output

         c1        c2
0 -1.414214 -0.824387
1 -0.707107 -0.977052
2  0.000000  0.600480
3  0.707107 -0.468171
4  1.414214  1.669130

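Explanation: StandardScaler() from sklearn.preprocessing scales each column (c1 and c2) to a mean of 0 and a standard deviation of 1. fit_transform() does two steps in one call: it computes each column's mean and standard deviation (fit) and then applies the scaling (transform). It returns a NumPy array, so the result is wrapped back into a DataFrame with the original column names. StandardScaler() divides by the population standard deviation, so c1 is divided by √8 ≈ 2.828.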
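In machine-learning workflows the usual pattern is to call fit() on the training data only and then reuse the learned statistics with transform() on new data, so that test rows are scaled with the training mean and standard deviation. The sketch below assumes a simple hypothetical split where the first four rows are the training set and the last row plays the role of new data. The learned statistics are exposed as the scaler's mean_ and scale_ attributes, and inverse_transform() maps scaled values back to the original units.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

df = pd.DataFrame({
    'c1': [1, 3, 5, 7, 9],
    'c2': [7, 4, 35, 14, 56]
})

# Hypothetical split: first four rows for training, last row as "new" data
train = df.iloc[:4]
test = df.iloc[4:]

sc = StandardScaler()
sc.fit(train)  # learn per-column mean and std from the training rows only

# Both sets are scaled with the *training* statistics
train_scaled = pd.DataFrame(sc.transform(train), columns=df.columns, index=train.index)
test_scaled = pd.DataFrame(sc.transform(test), columns=df.columns, index=test.index)

print(sc.mean_)   # per-column training means
print(sc.scale_)  # per-column training standard deviations (ddof=0)
print(test_scaled)

# inverse_transform() undoes the scaling and recovers the original values
restored = sc.inverse_transform(test_scaled.to_numpy())
print(restored)
```

Fitting the scaler on the full dataset before splitting would let statistics from the test rows leak into the scaling, which is why fit and transform are kept separate here.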

Using zscore()

zscore() from SciPy is a quick way to standardize a single column (it also accepts whole arrays). Each result says how many standard deviations a value lies above or below the column mean. It is handy for quick checks and small tasks.

Python
import pandas as pd
from scipy.stats import zscore

df = pd.DataFrame({
    'col1': [1, 3, 5, 7, 9],
    'col2': [7, 4, 35, 14, 56]
})
# zscore() uses the population standard deviation (ddof=0) by default
df['col2'] = zscore(df['col2'])
print(df)

Output

   col1      col2
0     1 -0.824387
1     3 -0.977052
2     5  0.600480
3     7 -0.468171
4     9  1.669130

Explanation: zscore() from scipy.stats standardizes col2 to a mean of 0 and a standard deviation of 1, and its result replaces the original column; col1 is left unchanged. Because zscore() also uses the population standard deviation by default, col2 matches the StandardScaler() output above.
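zscore() accepts a ddof parameter. The default ddof=0 divides by the population standard deviation, while ddof=1 divides by the sample standard deviation, which is what pandas' Series.std() uses by default in the manual methods below. A quick check of both settings against the plain pandas formula, using col2 from the example above:

```python
import numpy as np
import pandas as pd
from scipy.stats import zscore

s = pd.Series([7, 4, 35, 14, 56], name='col2')

z_pop = np.asarray(zscore(s))             # ddof=0: same as StandardScaler
z_sample = np.asarray(zscore(s, ddof=1))  # ddof=1: same as pandas' default .std()

manual_pop = ((s - s.mean()) / s.std(ddof=0)).to_numpy()
manual_sample = ((s - s.mean()) / s.std()).to_numpy()  # .std() defaults to ddof=1

print(np.round(z_pop, 6))
print(np.round(z_sample, 6))
```

The two settings give slightly different numbers for the same column, so pick one convention and use it consistently.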

Using vectorized pandas standardization

You can also apply the standardization formula, (value - mean) / std, directly with pandas. It is great for learning, gives you full control, and needs no extra libraries.

Python
import pandas as pd
df = pd.DataFrame({
    'col1': [1, 3, 5, 7, 9],
    'col2': [7, 4, 35, 14, 56]
})
# pandas .std() uses the sample standard deviation (ddof=1) by default
df['col1'] = (df['col1'] - df['col1'].mean()) / df['col1'].std()
print(df)

Output
       col1  col2
0 -1.264911     7
1 -0.632456     4
2  0.000000    35
3  0.632456    14
4  1.264911    56

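Explanation: This code standardizes col1 by subtracting its mean and dividing by its standard deviation; col2 is left unchanged. Note that pandas' .std() computes the sample standard deviation (ddof=1, dividing by n - 1) by default, so col1 is divided by √10 ≈ 3.162 rather than √8. That is why these values (for example -1.264911) differ from the StandardScaler() and zscore() results; pass ddof=0 to .std() to match them exactly.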
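Because pandas arithmetic aligns on column labels, the same formula also works on a whole DataFrame in one vectorized expression, with no per-column code. A minimal sketch, assuming every column is numeric: passing ddof=0 to .std() makes the result match StandardScaler() and zscore(), while the default ddof=1 matches the single-column example above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'col1': [1, 3, 5, 7, 9],
    'col2': [7, 4, 35, 14, 56]
})

# df.mean() and df.std() return one value per column; subtraction and
# division broadcast those Series across the rows, matched by column label.
res_sample = (df - df.mean()) / df.std()     # ddof=1 (pandas default)
res_pop = (df - df.mean()) / df.std(ddof=0)  # ddof=0, matches StandardScaler

print(res_pop)
```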

Using .apply()

This method applies the standardization formula to every column at once: apply() calls the function once per column. It is shorter and cleaner than standardizing each column by hand, and it assumes every column is numeric.

Python
import pandas as pd
df = pd.DataFrame({
    'col1': [1, 3, 5, 7, 9],
    'col2': [7, 4, 35, 14, 56]
})

# apply() passes each column to the lambda as a Series
res = df.apply(lambda x: (x - x.mean()) / x.std())
print(res)

Output
       col1      col2
0 -1.264911 -0.737355
1 -0.632456 -0.873902
2  0.000000  0.537085
3  0.632456 -0.418745
4  1.264911  1.492915

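Explanation: apply() passes each column to the lambda as a Series, so every column is scaled to a mean of 0 and a standard deviation of 1. As in the previous method, x.std() uses ddof=1, so the values match the manual pandas result rather than the StandardScaler() output.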
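Two practical caveats apply to both apply() and the vectorized formula. A column whose values are all identical has a standard deviation of 0, so dividing by it produces NaN, and text columns cannot be standardized at all (calling the lambda on them will typically raise an error). The sketch below is one way to handle both, using a hypothetical helper named standardize_numeric(): it selects numeric columns with select_dtypes() and replaces a zero standard deviation with 1 so constant columns become all zeros, which mirrors how StandardScaler() treats zero-variance features.

```python
import numpy as np
import pandas as pd


def standardize_numeric(df: pd.DataFrame, ddof: int = 1) -> pd.DataFrame:
    """Hypothetical helper: standardize numeric columns, leave the rest untouched."""
    num_cols = df.select_dtypes(include='number').columns
    std = df[num_cols].std(ddof=ddof)
    std = std.where(std != 0, 1.0)  # constant column: avoid 0/0 -> NaN, result becomes 0
    scaled = (df[num_cols] - df[num_cols].mean()) / std

    out = df.copy()
    for col in num_cols:
        out[col] = scaled[col]  # replace each numeric column with its scaled version
    return out


df = pd.DataFrame({
    'name': ['a', 'b', 'c', 'd', 'e'],
    'col1': [1, 3, 5, 7, 9],
    'col2': [7, 4, 35, 14, 56],
    'const': [2, 2, 2, 2, 2]
})

res = standardize_numeric(df)
print(res)
```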

