Drop Duplicates Ignoring One Column-Pandas
Last Updated: 28 Apr, 2025
Pandas provides several tools for cleaning datasets. One of them is the drop_duplicates function, which removes duplicate rows from a data frame. Sometimes you want to remove duplicates while ignoring one column, that is, two rows count as duplicates when they match in every column except that one. We will explore four approaches to drop duplicates ignoring one column in Pandas.
Drop Duplicates Ignoring One Column-Pandas
- Using the subset parameter
- Using duplicated and boolean indexing
- Using drop_duplicates and columns.difference
- Using groupby and first
Using the subset parameter
The drop_duplicates function takes a subset parameter that restricts the duplicate check to the listed columns. To ignore one column, pass every other column as a list in the subset parameter. Rows that match on those columns are treated as duplicates, and by default only the first occurrence is kept.
Syntax:
dropped_df = df.drop_duplicates(subset=['#column-1', '#column-2'])
Here,
- column-1, column-2: These are the columns that you don't want to ignore.
- column-3: It is the column that you want to ignore. It is simply left out of the subset list.
- df: It is the data frame from which duplicates need to be dropped.
In the example below, we define data with three columns: first_name, last_name, and fees. We then remove the duplicates while ignoring the first_name column by listing the last_name and fees columns in the subset parameter. Rows 1 and 3 share the same last_name and fees, so row 3 is dropped.
Python3
# Import the pandas library
import pandas as pd
# Define the data
data = {'first_name': ['Arun', 'Ishita', 'Ruchir', 'Vinayak'], 'last_name': [
'Kumar', 'Rai', 'Jha', 'Rai'], 'fees': [5000, 6000, 5000, 6000]}
# Convert data to Pandas dataframe
df = pd.DataFrame(data)
# Print the actual data frame
print('Actual DataFrame:\n', df)
# Defining the list of columns that you want to consider
dropped_df = df.drop_duplicates(subset=['last_name', 'fees'])
# Print the dataframe without duplicates
print('DataFrame after removing duplicates:\n', dropped_df)
Output:
Actual DataFrame:
   first_name last_name  fees
0       Arun     Kumar  5000
1     Ishita       Rai  6000
2     Ruchir       Jha  5000
3    Vinayak       Rai  6000
DataFrame after removing duplicates:
   first_name last_name  fees
0       Arun     Kumar  5000
1     Ishita       Rai  6000
2     Ruchir       Jha  5000
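The example above keeps the first row of each duplicate group, which is the default behaviour. The following is a minimal sketch, on the same sample data, of how the keep parameter of drop_duplicates changes which row survives. The variable names (cols, keep_first, keep_last, keep_none) are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'first_name': ['Arun', 'Ishita', 'Ruchir', 'Vinayak'],
    'last_name': ['Kumar', 'Rai', 'Jha', 'Rai'],
    'fees': [5000, 6000, 5000, 6000],
})

# Columns that take part in the duplicate check (first_name is ignored)
cols = ['last_name', 'fees']

# keep='first' (default): keep the first row of each duplicate group
keep_first = df.drop_duplicates(subset=cols)

# keep='last': keep the last row of each duplicate group
keep_last = df.drop_duplicates(subset=cols, keep='last')

# keep=False: drop every row that has a duplicate
keep_none = df.drop_duplicates(subset=cols, keep=False)

print(keep_first['first_name'].tolist())  # ['Arun', 'Ishita', 'Ruchir']
print(keep_last['first_name'].tolist())   # ['Arun', 'Ruchir', 'Vinayak']
print(keep_none['first_name'].tolist())   # ['Arun', 'Ruchir']
```

Which value of keep is appropriate depends on whether the ignored column of the first or the last matching row should be preserved.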
Using duplicated() and Boolean Indexing
The duplicated() function returns a boolean Series that is True for every row that repeats an earlier row and False otherwise. The ~ operator inverts that mask, so indexing the data frame with the inverted mask keeps only the rows that are not duplicates. In this approach, we drop duplicates ignoring one column by passing the remaining columns to duplicated() through the subset parameter.
Syntax:
dropped_df = df[~df.duplicated(subset=['#column-1', '#column-2'])]
Here,
- column-1, column-2: These are the columns that you don't want to ignore.
- df: It is the data frame from which duplicates need to be dropped.
In the below example, we have defined the data with three columns, first_name, last_name, and fees. Then, we removed the duplicates ignoring the first_name column, using duplicated and boolean indexing.
Python3
# Import the pandas library
import pandas as pd
# Define the data
data = {'first_name': ['Arun', 'Ishita', 'Ruchir', 'Vinayak'], 'last_name': [
'Kumar', 'Rai', 'Jha', 'Rai'], 'fees': [5000, 6000, 5000, 6000]}
# Convert data to Pandas dataframe
df = pd.DataFrame(data)
# Print the actual data frame
print('Actual DataFrame:\n', df)
# Dropping the duplicates using duplicated and boolean indexing
dropped_df = df[~df.duplicated(subset=['last_name', 'fees'])]
# Print the dataframe without duplicates
print('DataFrame after removing duplicates:\n', dropped_df)
Output:
Actual DataFrame:
   first_name last_name  fees
0       Arun     Kumar  5000
1     Ishita       Rai  6000
2     Ruchir       Jha  5000
3    Vinayak       Rai  6000
DataFrame after removing duplicates:
   first_name last_name  fees
0       Arun     Kumar  5000
1     Ishita       Rai  6000
2     Ruchir       Jha  5000
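To make the boolean indexing step easier to follow, here is a small sketch on the same sample data that prints the mask returned by duplicated() before and after it is inverted. The names mask and same_result are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'first_name': ['Arun', 'Ishita', 'Ruchir', 'Vinayak'],
    'last_name': ['Kumar', 'Rai', 'Jha', 'Rai'],
    'fees': [5000, 6000, 5000, 6000],
})

# True marks a row whose (last_name, fees) pair already appeared earlier
mask = df.duplicated(subset=['last_name', 'fees'])
print(mask.tolist())     # [False, False, False, True]

# ~ flips the mask, so True now marks the rows to keep
print((~mask).tolist())  # [True, True, True, False]

# Boolean indexing keeps only the rows where the inverted mask is True
dropped_df = df[~mask]

# The result matches drop_duplicates with the same subset
same_result = dropped_df.equals(
    df.drop_duplicates(subset=['last_name', 'fees']))
print(same_result)       # True
```

Keeping the mask in its own variable is handy when the duplicate rows themselves need to be inspected, for example with df[mask].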
Using drop_duplicates and columns.difference()
The columns.difference() method returns the column labels of the data frame that are not in the given list. In this method, we pass the column to be ignored to difference() and use the resulting labels as the subset parameter of drop_duplicates, so the remaining columns do not have to be listed by hand.
Syntax:
dropped_df = df.drop_duplicates(subset=df.columns.difference(['#column-3']))
Here,
- column-1, column-2: These are the columns that you don't want to ignore
- column-3: It is the column that you want to ignore
- df: It is the data frame from which duplicates need to be dropped.
In the below example, we have defined the data with three columns, first_name, last_name, and fees. Then, we removed the duplicates ignoring the first_name column, by stating the first_name column in the difference function.
Python3
# Import the pandas library
import pandas as pd
# Define the data
data = {'first_name': ['Arun', 'Ishita', 'Ruchir', 'Vinayak'], 'last_name': [
'Kumar', 'Rai', 'Jha', 'Rai'], 'fees': [5000, 6000, 5000, 6000]}
# Convert data to Pandas dataframe
df = pd.DataFrame(data)
# Print the actual data frame
print('Actual DataFrame:\n', df)
# Stating the column that you want to ignore
dropped_df = df.drop_duplicates(subset=df.columns.difference(['first_name']))
# Print the dataframe without duplicates
print('DataFrame after removing duplicates:\n', dropped_df)
Output:
Actual DataFrame:
   first_name last_name  fees
0       Arun     Kumar  5000
1     Ishita       Rai  6000
2     Ruchir       Jha  5000
3    Vinayak       Rai  6000
DataFrame after removing duplicates:
   first_name last_name  fees
0       Arun     Kumar  5000
1     Ishita       Rai  6000
2     Ruchir       Jha  5000
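It can help to look at what columns.difference() actually returns before it is passed to subset. The sketch below, on the same sample data, prints the labels and also shows a plain list comprehension as an alternative that keeps the original column order. The names considered and considered_ordered are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'first_name': ['Arun', 'Ishita', 'Ruchir', 'Vinayak'],
    'last_name': ['Kumar', 'Rai', 'Jha', 'Rai'],
    'fees': [5000, 6000, 5000, 6000],
})

# difference() returns the remaining labels, sorted by default
considered = df.columns.difference(['first_name']).tolist()
print(considered)          # ['fees', 'last_name']

# Alternative: a list comprehension preserves the original column order
considered_ordered = [c for c in df.columns if c != 'first_name']
print(considered_ordered)  # ['last_name', 'fees']

# The order of the labels in subset does not change which rows are dropped
a = df.drop_duplicates(subset=considered)
b = df.drop_duplicates(subset=considered_ordered)
print(a.equals(b))         # True
```

This approach is most useful for wide data frames, where listing every column except one by hand would be tedious.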
Using groupby() and first() Function
The groupby() function groups rows that share the same values in the given columns, and first() returns the first non-null value of every other column within each group. In this method, we group by the columns we don't want to ignore and take the first entry of each group. Unlike drop_duplicates, the grouping columns become the index of the result and the groups are sorted by their keys.
Syntax:
dropped_df = df.groupby(['#column-1', '#column-2']).first()
Here,
- column-1, column-2: These are the columns that you don't want to ignore.
- df: It is the data frame from which duplicates need to be dropped.
In the below example, we have defined the data with three columns, first_name, last_name, and fees. Then, we removed the duplicates ignoring the first_name column, using group by and first.
Python3
# Import the pandas library
import pandas as pd
# Define the data
data = {'first_name': ['Arun', 'Ishita', 'Ruchir', 'Vinayak'], 'last_name': [
'Kumar', 'Rai', 'Jha', 'Rai'], 'fees': [5000, 6000, 5000, 6000]}
# Convert data to Pandas dataframe
df = pd.DataFrame(data)
# Print the actual data frame
print('Actual DataFrame:\n', df)
# Dropping the duplicates using groupby and first
dropped_df = df.groupby(['last_name', 'fees']).first()
# Print the dataframe without duplicates
print('DataFrame after removing duplicates:\n', dropped_df)
Output:
Actual DataFrame:
   first_name last_name  fees
0       Arun     Kumar  5000
1     Ishita       Rai  6000
2     Ruchir       Jha  5000
3    Vinayak       Rai  6000
DataFrame after removing duplicates:
                first_name
last_name fees
Jha       5000     Ruchir
Kumar     5000       Arun
Rai       6000     Ishita
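By default, groupby() moves the grouping columns into the index and sorts the groups, so the result is laid out differently from the output of drop_duplicates. The following is a minimal sketch, on the same sample data, of one way to get a flat result in the original row and column order by using the as_index and sort parameters of groupby(). The variable name grouped is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'first_name': ['Arun', 'Ishita', 'Ruchir', 'Vinayak'],
    'last_name': ['Kumar', 'Rai', 'Jha', 'Rai'],
    'fees': [5000, 6000, 5000, 6000],
})

# as_index=False keeps last_name and fees as ordinary columns,
# sort=False keeps the groups in order of first appearance
grouped = df.groupby(['last_name', 'fees'],
                     as_index=False, sort=False).first()

# Restore the original column order (the group keys come first otherwise)
grouped = grouped[df.columns.tolist()]

print(grouped['first_name'].tolist())  # ['Arun', 'Ishita', 'Ruchir']
print(grouped.columns.tolist())        # ['first_name', 'last_name', 'fees']
```

Note that first() picks the first non-null value per column, so if the ignored column contains missing values, the value kept for a group may come from a later row than the one drop_duplicates would keep.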