Python | Pandas DataFrame.set_index()

Last Updated : 11 Jul, 2025

The pandas set_index() method sets one or more columns of a DataFrame as its index. This is useful when the default integer index carries no meaning and a column such as employee names, IDs or dates can act as a row identifier instead. A meaningful index makes label-based retrieval, alignment and merging tasks more convenient.

Let's see a basic example:

Here we are using an Employee Dataset, which you can download from here. Let’s first load the Employee Dataset to see how to use set_index().

Python
import pandas as pd

data = pd.read_csv("/content/employees.csv")
print("Employee Dataset:")
display(data.head(5))

Output:

Employee Dataset

Now we use DataFrame.set_index() to set a single column as the index.

Python
data.set_index("First Name", inplace=True)
print("\nEmployee Dataset with 'First Name' as Index:")
display(data.head(5))

Output:

Index is replaced with the "First Name" column

We set the "First Name" column as the index which makes it easier to access data by the employee's first name.
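As a minimal sketch of what access by first name looks like, the snippet below builds a small hand-made DataFrame and selects rows with .loc after setting the index. The names, teams and salaries are made-up illustration values, not rows from the Employee Dataset.

```python
import pandas as pd

# Hypothetical sample data standing in for the Employee Dataset
employees = pd.DataFrame({
    "First Name": ["Asha", "Ben", "Carla"],
    "Team": ["Finance", "Marketing", "Finance"],
    "Salary": [61000, 52000, 58000],
})

# Without inplace=True, set_index() returns a new DataFrame
# and leaves the original unchanged
by_name = employees.set_index("First Name")

# Rows can now be selected by label with .loc
ben = by_name.loc["Ben"]
print(ben["Team"])                      # Marketing
print(by_name.loc["Carla", "Salary"])   # 58000
```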

Syntax:

DataFrame.set_index(keys, drop=True, append=False, inplace=False, verify_integrity=False)

Parameters:

  1. keys: A single column name or a list of column names to set as the index. An array of the same length as the DataFrame (such as a Series or an Index) is also accepted.
  2. drop: Boolean (default: True). If True, the columns used as the index are removed from the DataFrame's columns. If False, they are retained as regular columns.
  3. append: Boolean (default: False). If True, the new levels are appended to the existing index, creating a multi-level index. If False, the existing index is replaced.
  4. inplace: Boolean (default: False). If True, modifies the original DataFrame and returns None.
  5. verify_integrity: Boolean (default: False). If True, checks the new index for duplicate values and raises a ValueError if any are found.

Return: A new DataFrame with the specified index. If inplace=True, the original DataFrame is modified directly and the method returns None.
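The verify_integrity parameter is the easiest one to overlook, so here is a small sketch of its effect. The DataFrame is a hypothetical example with a deliberately repeated name.

```python
import pandas as pd

# Hypothetical data in which the name "Ana" appears twice
df = pd.DataFrame({"Name": ["Ana", "Raj", "Ana"], "Age": [28, 35, 41]})

# By default, duplicate index labels are accepted silently
loose = df.set_index("Name")
print(loose.index.is_unique)   # False

# With verify_integrity=True, duplicates raise a ValueError
try:
    df.set_index("Name", verify_integrity=True)
    raised = False
except ValueError as err:
    raised = True
    print("Rejected:", err)
```

Enabling the check is a reasonable choice when the index is meant to act as a unique identifier.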

Now let's see some practical examples to better understand how to use the pandas set_index() method.

1. Setting Multiple Columns as Index (MultiIndex)

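In this example, we add both First Name and Gender to the index using the set_index() method with the append and drop parameters. Because append=True, the existing default integer index is kept as the first level, so the result is a three-level MultiIndex. Because drop=False, both columns also remain as regular columns. This is useful when we want to organize data by multiple columns.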

Python
import pandas as pd
data = pd.read_csv("employees.csv")

data.set_index(["First Name", "Gender"], inplace=True, append=True, drop=False)
data.head()

Output:

Set Multiple Columns as MultiIndex
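To make the role of append concrete, the following sketch compares the number of index levels with and without it. It uses a tiny hypothetical DataFrame (made-up rows) instead of employees.csv so that it runs on its own.

```python
import pandas as pd

# Hypothetical rows; the column names mirror the Employee Dataset
df = pd.DataFrame({
    "First Name": ["Asha", "Ben", "Carla"],
    "Gender": ["Female", "Male", "Female"],
    "Salary": [61000, 52000, 58000],
})

# append=True keeps the existing integer index as the first level
appended = df.set_index(["First Name", "Gender"], append=True)
print(appended.index.nlevels)        # 3
print(list(appended.index.names))    # [None, 'First Name', 'Gender']

# The default (append=False) replaces the existing index
replaced = df.set_index(["First Name", "Gender"])
print(replaced.index.nlevels)        # 2

# A full key is passed to .loc as a tuple, one value per level
print(replaced.loc[("Ben", "Male"), "Salary"])   # 52000
```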

2. Setting a Float Column as Index

In some cases, we may want to use a numeric or float column as the index, for example in datasets where a score identifies each row. Here, we set Agg_Marks (a float column) as the index for a DataFrame containing student data. Keep in mind that label lookups on a float index need an exact match with the stored value.

Python
import pandas as pd

students = [['Jack', 34, 'Sydney', 'Australia', 85.96],
            ['Riti', 30, 'Delhi', 'India', 95.20],
            ['Vansh', 31, 'Delhi', 'India', 85.25],
            ['Nanyu', 32, 'Tokyo', 'Japan', 74.21],
            ['Maychan', 16, 'New York', 'US', 99.63],
            ['Mike', 17, 'Las Vegas', 'US', 47.28]]

df = pd.DataFrame(students, columns=['Name', 'Age', 'City', 'Country', 'Agg_Marks'])

df.set_index('Agg_Marks', inplace=True)
display(df)

Output:

Setting a Float Column as Index
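One thing a float index allows is selecting rows by value or by value range. The sketch below reuses three of the students from the example and is only an illustration of the idea, using exact-match lookup and a label slice on a sorted index.

```python
import pandas as pd

# Three rows taken from the student data above
marks = pd.DataFrame({
    "Name": ["Riti", "Vansh", "Nanyu"],
    "Agg_Marks": [95.20, 85.25, 74.21],
}).set_index("Agg_Marks")

# Exact-label lookup: the value must match the stored float exactly
print(marks.loc[85.25, "Name"])    # Vansh

# After sorting, label slices select a range of marks;
# both endpoints of the slice are inclusive
ordered = marks.sort_index()
top = ordered.loc[80.0:100.0]
print(list(top["Name"]))           # ['Vansh', 'Riti']
```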

3. Setting Index of Specific Column (with drop=False)

By default, set_index() removes the column used as the index. However, if we want to keep the column after it’s set as the index, we can use the drop=False parameter.

Python
import pandas as pd

data = pd.read_csv("/content/employees.csv")

data.set_index("First Name", drop=False, inplace=True)

print(data.head())

Output:

Using drop=False

Using drop=False ensures that the "First Name" column is retained even after it is set as the index.
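The difference between the two settings can be checked by comparing the resulting columns. This is a small sketch on hypothetical rows, and it also shows one side effect of drop=False: a plain reset_index() fails because the column already exists.

```python
import pandas as pd

# Hypothetical rows; the column names mirror the Employee Dataset
df = pd.DataFrame({"First Name": ["Asha", "Ben"],
                   "Team": ["Finance", "Marketing"]})

dropped = df.set_index("First Name")              # default drop=True
kept = df.set_index("First Name", drop=False)     # column is retained

print(list(dropped.columns))   # ['Team']
print(list(kept.columns))      # ['First Name', 'Team']

# With drop=False the name exists as both index and column, so moving
# the index back into the columns would create a duplicate
try:
    kept.reset_index()
    conflict = False
except ValueError:
    conflict = True
print(conflict)                # True

# Discarding the index instead of re-inserting it avoids the clash
restored = kept.reset_index(drop=True)
```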

4. Setting Index Using inplace=True

When we want to modify the original DataFrame directly rather than creating a new DataFrame, we can use inplace=True. In that case set_index() returns None, so its result should not be assigned back to a variable.

Python
import pandas as pd

data = {'Name': ['Geek1', 'Geek2', 'Geek3'],
        'Age': [25, 30, 35],
        'City': ['New York', 'San Francisco', 'Los Angeles']}

df = pd.DataFrame(data)

df.set_index('Name', inplace=True)
display(df)

Output:

Setting Index Using inplace=True
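For comparison, here is a short sketch of the two calling styles side by side, using the same Geek data. It also uses reset_index(), which is not covered above, to move the index back into a regular column.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Geek1", "Geek2", "Geek3"],
                   "Age": [25, 30, 35]})

# Style 1: reassignment. df itself keeps its integer index.
indexed = df.set_index("Name")
print(list(df.columns))         # ['Name', 'Age']

# Style 2: inplace=True mutates df and returns None
result = df.set_index("Name", inplace=True)
print(result)                   # None
print(df.index.name)            # Name

# reset_index() turns the index back into a regular column
restored = df.reset_index()
print(list(restored.columns))   # ['Name', 'Age']
```

Writing df = df.set_index("Name", inplace=True) would replace df with None, which is a common mistake with the in-place style.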

With set_index(), we can turn meaningful columns into row labels, which makes the data simpler to access and analyze.
