Python | Pandas DataFrame.set_index()
Last Updated: 11 Jul, 2025
The Pandas set_index() method sets one or more columns of a DataFrame as its index. This is useful when we want to replace the default integer index with meaningful labels. A labelled index makes data retrieval, label-based selection and merging easier. It is especially helpful when a column holds values that can act as identifiers, such as employee names, IDs or dates.
Let's see a basic example:
Here we use an Employee dataset, which you can download from here. Let's first load the Employee dataset to see how to use set_index().
Python
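import pandas as pd

# Load the Employee dataset (adjust the path to where you saved the CSV)
data = pd.read_csv("/content/employees.csv")
print("Employee Dataset:")
# display() only exists in Jupyter/Colab; print() works in any Python session
print(data.head(5))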
Output:
Employee Dataset
Now we use Pandas DataFrame.set_index() to set a single column as the index.
Python
# Make "First Name" the index; inplace=True modifies data directly
data.set_index("First Name", inplace=True)
print("\nEmployee Dataset with 'First Name' as Index:")
print(data.head(5))
Output:
Index is replaced with the "First Name" column
We set the "First Name" column as the index, which makes it easier to access data by the employee's first name.
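The practical gain is label-based lookup with .loc. The Employee CSV may not be at hand, so here is a minimal self-contained sketch. It uses a few hypothetical rows shaped like that dataset; the names and salaries are made up. If a name appeared more than once in the index, .loc would return a DataFrame of all matching rows instead of a single row.

```python
import pandas as pd

# Hypothetical rows shaped like the Employee dataset (not its real contents)
data = pd.DataFrame({
    "First Name": ["Asha", "Ben", "Chen"],
    "Gender": ["Female", "Male", "Male"],
    "Salary": [97308, 61933, 130590],
})

# Without inplace=True, set_index() returns a new DataFrame indexed by first name
by_name = data.set_index("First Name")

# Label-based lookup: .loc with a name returns that employee's row
print(by_name.loc["Chen"])

# Row label plus column label selects a single value
print(by_name.loc["Chen", "Salary"])  # 130590
```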
Syntax:
DataFrame.set_index(keys, drop=True, append=False, inplace=False, verify_integrity=False)
Parameters:
- keys: A single column label or a list of column labels to set as the index (arrays of matching length are also accepted).
- drop: Boolean (default: True). If True, the columns used as the index are removed from the DataFrame's columns. If False, they are also kept as regular columns.
- append: Boolean (default: False). If True, the new column(s) are added to the existing index instead of replacing it, creating a multi-level index (MultiIndex).
- inplace: Boolean (default: False). If True, modifies the original DataFrame and returns None instead of a new DataFrame.
- verify_integrity: Boolean (default: False). If True, checks the new index for duplicate values and raises a ValueError if any are found. Leaving it False makes the method faster.
Return: A new DataFrame with the specified index, or None if inplace=True (in which case the original DataFrame is modified directly).
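The return-value and verify_integrity behaviour described above can be checked on a tiny frame. This is a minimal sketch. The column names id and score and their values are hypothetical, and the id 102 is duplicated on purpose.

```python
import pandas as pd

# Small hypothetical frame with a duplicated id (102)
df = pd.DataFrame({"id": [101, 102, 102], "score": [7.5, 8.0, 6.5]})

# With the default inplace=False, a new DataFrame is returned
indexed = df.set_index("id")
print(indexed.index.tolist())  # [101, 102, 102]
print(df.index.tolist())       # [0, 1, 2]  (the original is unchanged)

# verify_integrity=True rejects the duplicate label 102 with a ValueError
duplicates_rejected = False
try:
    df.set_index("id", verify_integrity=True)
except ValueError as err:
    duplicates_rejected = True
    print("ValueError:", err)
```

Without verify_integrity, duplicate labels are allowed silently, so enable it when the index is meant to be unique.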
Now let's look at some practical examples to better understand how to use the Pandas set_index() function.
1. Setting Multiple Columns as Index (MultiIndex)
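In this example, we set both First Name and Gender as index levels using set_index(). Because append=True, the two columns are added to the existing default integer index rather than replacing it. The result is a three-level MultiIndex, and drop=False keeps both columns as regular columns as well. This is useful when we want to organize data by multiple columns.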
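The effect of append=True and drop=False can be checked without the CSV. This minimal sketch uses hypothetical rows shaped like the Employee dataset; the Team column is invented for illustration.

```python
import pandas as pd

# Hypothetical rows shaped like the Employee dataset
data = pd.DataFrame({
    "First Name": ["Asha", "Ben", "Chen"],
    "Gender": ["Female", "Male", "Male"],
    "Team": ["Finance", "Sales", "Legal"],
})

multi = data.set_index(["First Name", "Gender"], append=True, drop=False)

# append=True keeps the old unnamed RangeIndex as the outermost level
print(list(multi.index.names))  # [None, 'First Name', 'Gender']
# drop=False leaves both columns in place as ordinary columns
print(multi.columns.tolist())   # ['First Name', 'Gender', 'Team']

# Without append, the two columns replace the RangeIndex entirely
two_level = data.set_index(["First Name", "Gender"])
print(two_level.index.nlevels)  # 2
```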
Python
import pandas as pd

data = pd.read_csv("employees.csv")
# append=True keeps the existing index as the outer level; drop=False keeps the columns
data.set_index(["First Name", "Gender"], inplace=True, append=True, drop=False)
print(data.head())
Output:
Set Multiple Columns as MultiIndex
2. Setting a Float Column as Index
In some cases, we may want to use a numeric or float column as the index, for example when scores or other numeric values identify each row. Here, we set Agg_Marks (a float column) as the index for a DataFrame containing student data.
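Float labels are matched exactly, so a value produced by arithmetic can miss a label it looks equal to. This minimal sketch reuses two of the student rows. The 1e-9 offset is a hypothetical stand-in for rounding noise. Using numpy.isclose for a tolerant lookup is one possible workaround, not a set_index() feature.

```python
import numpy as np
import pandas as pd

students = pd.DataFrame({"Name": ["Riti", "Vansh"], "Agg_Marks": [95.20, 85.25]})
by_marks = students.set_index("Agg_Marks")

# An exact float label works with .loc
print(by_marks.loc[85.25, "Name"])  # Vansh

# A value that is only approximately 95.20 is a different label
target = 95.20 + 1e-9  # hypothetical rounding noise
print(target in by_marks.index)  # False

# Tolerant lookup: compare the float index against the target with np.isclose
matches = by_marks.loc[np.isclose(by_marks.index, target), "Name"].tolist()
print(matches)  # ['Riti']
```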
Python
import pandas as pd

students = [['jack', 34, 'Sydney', 'Australia', 85.96],
            ['Riti', 30, 'Delhi', 'India', 95.20],
            ['Vansh', 31, 'Delhi', 'India', 85.25],
            ['Nanyu', 32, 'Tokyo', 'Japan', 74.21],
            ['Maychan', 16, 'New York', 'US', 99.63],
            ['Mike', 17, 'Las Vegas', 'US', 47.28]]
df = pd.DataFrame(students, columns=['Name', 'Age', 'City', 'Country', 'Agg_Marks'])
# Use the float column Agg_Marks as the index
df.set_index('Agg_Marks', inplace=True)
print(df)
Output:
Setting a Float Column as Index
3. Setting a Column as Index While Keeping It (drop=False)
By default, set_index() removes the column used as the index. However, if we want to keep the column after it’s set as the index, we can use the drop=False parameter.
Python
import pandas as pd
data = pd.read_csv("/content/employees.csv")
# drop=False keeps "First Name" as a regular column in addition to the index
data.set_index("First Name", drop=False, inplace=True)
print(data.head())
Output:
Using drop=False
Using drop=False ensures that the "First Name" column is retained even after it is set as the index.
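The difference between the default drop=True and drop=False shows up directly in the resulting columns. This is a minimal side-by-side sketch using two hypothetical rows.

```python
import pandas as pd

df = pd.DataFrame({"First Name": ["Asha", "Ben"], "Age": [29, 41]})

dropped = df.set_index("First Name")           # drop=True (default)
kept = df.set_index("First Name", drop=False)  # column retained

print(dropped.columns.tolist())  # ['Age']
print(kept.columns.tolist())     # ['First Name', 'Age']

# With drop=False the name is available both as index label and as column value
print(kept.loc["Ben", "First Name"])  # Ben
```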
4. Setting Index Using inplace=True
When we want to modify the original DataFrame directly rather than creating a new DataFrame, we can use inplace=True.
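Because inplace=True returns None, assigning its result back to the variable is a common mistake. This minimal sketch, with hypothetical data, shows the in-place call next to the equivalent assignment form.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Geek1", "Geek2"], "Age": [25, 30]})

# inplace=True changes df itself and returns None ...
returned = df.set_index("Name", inplace=True)
print(returned)       # None
print(df.index.name)  # Name

# ... so writing df = df.set_index("Name", inplace=True) would leave df as None.

# The assignment form gives the same result without that pitfall
df2 = pd.DataFrame({"Name": ["Geek1", "Geek2"], "Age": [25, 30]})
df2 = df2.set_index("Name")
print(df2.index.name)  # Name
```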
Python
import pandas as pd

data = {'Name': ['Geek1', 'Geek2', 'Geek3'],
        'Age': [25, 30, 35],
        'City': ['New York', 'San Francisco', 'Los Angeles']}
df = pd.DataFrame(data)
# Modify df in place; set_index() returns None here
df.set_index('Name', inplace=True)
print(df)
Output:
Setting Index Using inplace=True
With set_index(), we can organize data under meaningful labels. This makes rows simpler to select, align and merge.