Data Normalization with Pandas
Last Updated: 12 May, 2025
Data normalization is the process of scaling numeric features to a standard range so that large values do not dominate the learning process in machine learning models. It is an important step in machine learning and data analysis because it puts numerical features on a similar scale. Normalization particularly helps algorithms that rely on distances between data points, such as K-Nearest Neighbors (KNN) and Support Vector Machines (SVM). It is important because it:
- Avoids numerical instability in models
- Speeds up convergence in gradient-based algorithms
- Ensures all features contribute equally to the analysis
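To make the "large values dominate" point concrete, here is a minimal sketch using a small hypothetical dataset (the column names `income` and `age` and their values are illustrative assumptions, not part of this article's data). It compares Euclidean distances before and after a simple min-max rescaling.

```python
import numpy as np
import pandas as pd

# Hypothetical features on very different scales (illustrative values only).
points = pd.DataFrame({
    "income": [50000.0, 51000.0, 50000.0],
    "age": [25.0, 25.0, 65.0],
})

def euclidean(a, b):
    """Euclidean distance between two rows given as 1-D arrays."""
    return float(np.sqrt(((a - b) ** 2).sum()))

raw = points.to_numpy()
# Row 1 differs from row 0 only by a 2% change in income;
# row 2 differs from row 0 only by a 40-year change in age.
d_income_raw = euclidean(raw[0], raw[1])
d_age_raw = euclidean(raw[0], raw[2])
print(d_income_raw, d_age_raw)  # the income difference is 25x larger

# Rescale every column to [0, 1] and measure the same distances again.
scaled = ((points - points.min()) / (points.max() - points.min())).to_numpy()
d_income_scaled = euclidean(scaled[0], scaled[1])
d_age_scaled = euclidean(scaled[0], scaled[2])
print(d_income_scaled, d_age_scaled)  # both features now carry equal weight
```

On the raw data, a small relative change in income outweighs a very large change in age. After rescaling, both differences contribute equally to the distance.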
Steps for Data Normalization in Pandas
In this article we apply several normalization techniques to a small dataset and discuss each one with an example. The overall workflow for data normalization with Pandas is:
- Import the required libraries
- Load or create a dataset
- Apply different normalization techniques
- Visualize the results
Let's create a sample dataset using Pandas and visualize it.
Python
import pandas as pd
import matplotlib.pyplot as plt
df = pd.DataFrame([
[180000, 110, 18.9, 1400],
[360000, 905, 23.4, 1800],
[230000, 230, 14.0, 1300],
[60000, 450, 13.5, 1500]
], columns=['Col A', 'Col B', 'Col C', 'Col D'])
print(df)
df.plot(kind='bar')
plt.show()
Output:


Normalization Techniques in Pandas
1. Maximum Absolute Scaling
This technique rescales each feature to the range [-1, 1] by dividing all values by the maximum absolute value in that column. If a column contains no negative numbers, the result lies in [0, 1]. Because the data is only divided and never shifted, zero entries stay zero, which makes this technique useful when you want to preserve the sparsity of the data. We can apply maximum absolute scaling in Pandas using the .abs() and .max() methods as shown below.
Python
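# Work on a copy so the original DataFrame stays unchanged
max_scaled = df.copy()
# Divide each column by its maximum absolute value
for column in max_scaled.columns:
    max_scaled[column] = max_scaled[column] / max_scaled[column].abs().max()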
print(max_scaled)
max_scaled.plot(kind='bar')
plt.show()
Output :


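As the output above shows, every value is now expressed relative to the largest absolute value in its column. Because all values in this dataset are positive, the scaled values lie between 0 and 1; data with negative values would be mapped into the range from -1 to 1.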
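The sample dataset has no negative values, so the following sketch uses a small hypothetical DataFrame (`demo`, with made-up columns `x` and `y`) to show the full [-1, 1] range. It also uses a vectorized form in place of the column loop, relying on the fact that dividing a DataFrame by a Series aligns the Series index with the column labels.

```python
import pandas as pd

# Hypothetical data with a negative-valued column (not the article's dataset).
demo = pd.DataFrame({
    "x": [-4.0, 0.0, 2.0],
    "y": [0.0, 5.0, 10.0],
})

# Column-wise maximum absolute value, returned as a Series indexed by column name.
max_abs = demo.abs().max()

# DataFrame / Series divides each column by its matching entry in max_abs.
demo_scaled = demo / max_abs

print(demo_scaled)
```

The zeros in the input remain zeros in the output, which is why this method is often preferred for sparse data. Note that a column consisting only of zeros would produce NaN here, since it would be divided by zero.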
2. Min-Max Feature Scaling
The min-max approach, often simply called normalization, rescales each feature to the fixed range [0, 1] by subtracting the feature's minimum value and then dividing by its range (maximum minus minimum). It works well for models like K-Nearest Neighbors (KNN), which compare distances between data points. We can apply min-max scaling in Pandas using the .min() and .max() methods.
Python
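# Work on a copy so the original DataFrame stays unchanged
scaled = df.copy()
# Subtract the column minimum, then divide by the column range
for column in scaled.columns:
    scaled[column] = (scaled[column] - scaled[column].min()) / (scaled[column].max() - scaled[column].min())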
print(scaled)
scaled.plot(kind='bar')
plt.show()
Output :


After scaling, the smallest value in each column becomes 0 and the largest becomes 1, with all other values in between. No feature can then dominate a model simply because it is measured in larger units.
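One edge case the loop above does not handle is a column whose values are all identical: its range is zero, so the division yields NaN. The sketch below is one possible way to guard against that, not the only one. The helper name `min_max_scale` and the `demo` data are hypothetical, and mapping constant columns to 0 is an assumed convention.

```python
import pandas as pd

def min_max_scale(frame: pd.DataFrame) -> pd.DataFrame:
    """Rescale every column to [0, 1]; constant columns are mapped to 0."""
    col_min = frame.min()
    col_range = frame.max() - col_min
    # Replace a zero range with 1 so that constant columns give 0 / 1 = 0
    # instead of 0 / 0 = NaN.
    safe_range = col_range.replace(0, 1)
    return (frame - col_min) / safe_range

# Hypothetical data: one ordinary column and one constant column.
demo = pd.DataFrame({"a": [10.0, 20.0, 30.0], "const": [7.0, 7.0, 7.0]})
result = min_max_scale(demo)
print(result)
```

Dropping constant columns before scaling would be an equally reasonable choice, since they carry no information for most models.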
3. Z-Score Method
The z-score method, often called standardization, rescales the values in each column so that they have a mean of 0 and a standard deviation of 1. It works best when the data roughly follows a normal distribution or when you want to express each value as the number of standard deviations it lies from the column average. Note that the Pandas .std() method computes the sample standard deviation (ddof=1) by default.
Python
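# Work on a copy so the original DataFrame stays unchanged
z_scaled = df.copy()
# Subtract the column mean, then divide by the column standard deviation
for column in z_scaled.columns:
    z_scaled[column] = (z_scaled[column] - z_scaled[column].mean()) / z_scaled[column].std()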
print(z_scaled)
z_scaled.plot(kind='bar')
plt.show()
Output :


After applying this method, each feature is centered around zero and its spread is standardized. This often helps models such as logistic regression, SVM and neural networks perform better.
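The choice of standard deviation matters on small datasets. The following sketch, using a hypothetical one-column DataFrame `demo`, compares the Pandas default (sample standard deviation, `ddof=1`) with the population standard deviation (`ddof=0`), which is the convention used by scikit-learn's `StandardScaler`. It also checks that the result has mean 0 and unit spread.

```python
import pandas as pd

# Hypothetical single-column data (not the article's dataset).
demo = pd.DataFrame({"v": [2.0, 4.0, 6.0, 8.0]})

# Pandas default: sample standard deviation (ddof=1).
z_sample = (demo - demo.mean()) / demo.std()

# Population standard deviation (ddof=0).
z_population = (demo - demo.mean()) / demo.std(ddof=0)

# Each version has mean 0 and a standard deviation of 1
# under its own ddof convention.
print(z_sample["v"].mean(), z_sample["v"].std())
print(z_population["v"].mean(), z_population["v"].std(ddof=0))
```

The two versions differ only by a constant factor per column, so for large datasets the difference is negligible. It is worth being aware of when comparing results against another library.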