When working with large datasets, you often encounter columns that are irrelevant or redundant. Pandas provides an efficient way to remove these unnecessary columns with the `drop()` method. In this article, we will cover several ways to drop columns from a DataFrame.
Python
import pandas as pd
data = {'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]}
df = pd.DataFrame(data)
df = df.drop('B', axis=1)
print(df)
Output:
   A  C
0  1  7
1  2  8
2  3  9
In this example, drop() returns a new DataFrame without column 'B', and reassigning that result to df leaves it with only columns 'A' and 'C'.
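The axis=1 argument is not the only way to tell drop() you mean columns. The sketch below shows three equivalent calls that all appear in the pandas drop() signature (axis=1, axis='columns', and the columns= keyword), and confirms that the original DataFrame is left untouched when the result is not reassigned.

```python
import pandas as pd

data = {'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]}
df = pd.DataFrame(data)

# Three equivalent ways to drop column 'B'; each returns a new DataFrame
by_axis = df.drop('B', axis=1)
by_axis_name = df.drop('B', axis='columns')
by_keyword = df.drop(columns='B')

print(by_keyword.columns.tolist())  # ['A', 'C']

# df itself is unchanged because drop() returns a new object by default
print(df.columns.tolist())  # ['A', 'B', 'C']
```

The columns= keyword is often easier to read because it states the intent directly instead of relying on the reader remembering what axis=1 means.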
For the remaining examples, we load a sample dataset into a Pandas DataFrame and explore different ways to drop columns. Dataset: data.csv
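The contents of data.csv are not reproduced in this article, so the sketch below builds a small hypothetical stand-in to make the following examples runnable. The Age, Gender and Salary columns are taken from the examples later in the article; the Name column and all values are illustrative assumptions. With the real file, you would call pd.read_csv('data.csv') directly.

```python
import io
import pandas as pd

# Hypothetical stand-in for data.csv: column names partly assumed, values made up
csv_text = """Name,Age,Gender,Salary
Alice,25,F,50000
Bob,32,M,62000
Carol,41,F,71000
Dan,29,M,48000
Eve,35,F,66000
Frank,38,M,59000
"""

# With the real file: df = pd.read_csv('data.csv')
df = pd.read_csv(io.StringIO(csv_text))

# Display the first five rows
print(df.head())
```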
[Image: Displaying the first five rows of the dataset]
Method 1: Dropping Columns by Name
The most common way to remove columns is to specify them by name. The drop() method lets you remove one or more columns in a single call.
A) Call the drop() method with the column name to drop.
Python
# Drop the 'Gender' column
df = df.drop('Gender', axis=1)
print(df)
Output:
[Image: Dropping by Names]
In this example, the Gender column is dropped by specifying its name. The axis=1 argument tells Pandas to drop a column (axis=0, the default, refers to rows).
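When dropping by name, every label you pass must exist, otherwise drop() raises a KeyError. The sketch below shows that behavior and the errors='ignore' option, which skips missing labels instead. The 'Height' column is hypothetical and deliberately absent.

```python
import pandas as pd

df = pd.DataFrame({'Age': [25, 32], 'Gender': ['F', 'M']})

# Dropping a column that does not exist raises KeyError by default
try:
    df.drop('Height', axis=1)
except KeyError as exc:
    print('KeyError:', exc)

# errors='ignore' drops the labels that exist and silently skips the rest
result = df.drop(['Gender', 'Height'], axis=1, errors='ignore')
print(result.columns.tolist())  # ['Age']
```

errors='ignore' is convenient in scripts that may run more than once on the same DataFrame, but it can also hide typos in column names, so the default strict behavior is usually the safer choice.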
B) If we need to drop multiple columns, we can pass a list of column names to the drop() function.
Python
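# Drop both 'Age' and 'Gender' columns
# (run this on the freshly loaded dataset: if 'Gender' was already dropped above, this raises a KeyError)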
df = df.drop(['Age', 'Gender'], axis=1)
print(df)
Output:
[Image: Dropping multiple columns]
This method removes both the Age and Gender columns from the DataFrame. By passing a list of column names, you can drop multiple columns at once.
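Because drop() accepts any list of names, the list can also be built programmatically. The sketch below drops every column whose name starts with a given prefix; the 'tmp_' prefix and the columns are hypothetical examples, not part of the article's dataset.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob'],
    'tmp_id': [1, 2],          # hypothetical helper column
    'tmp_flag': [True, False], # hypothetical helper column
    'Salary': [50000, 62000],
})

# Collect the names of all columns that start with 'tmp_'
to_drop = [col for col in df.columns if col.startswith('tmp_')]

# Pass the list to drop() exactly as with a hand-written list
df = df.drop(to_drop, axis=1)
print(df.columns.tolist())  # ['Name', 'Salary']
```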
For a more detailed explanation, refer to this article: How to drop one or multiple columns in pandas dataframe
Method 2: Dropping Columns Inplace
If you want to modify the original DataFrame directly, pass the inplace=True argument. drop() then changes the existing DataFrame and returns None instead of a new DataFrame, so there is no result to reassign.
- Use inplace=True to remove columns directly.
Python
import pandas as pd
data = {'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]}
df = pd.DataFrame(data)
# Drop column 'B' inplace
df.drop('B', axis=1, inplace=True)
print(df)
Output:
   A  C
0  1  7
1  2  8
2  3  9
Using inplace=True removes column 'B' from the DataFrame without creating a new variable. The change is applied directly to the original DataFrame.
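A common pitfall with inplace=True is assigning its result back to the variable. The sketch below shows that the call returns None, so writing df = df.drop(..., inplace=True) would replace the DataFrame with None.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]})

# With inplace=True, drop() modifies df itself and returns None
returned = df.drop('B', axis=1, inplace=True)
print(returned)             # None
print(df.columns.tolist())  # ['A', 'C']

# Mistake to avoid: this would set df to None
# df = df.drop('C', axis=1, inplace=True)
```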
Method 3: Dropping Columns Using del
Another way to drop a column from a DataFrame is by using the `del` statement. This method directly deletes the column from the DataFrame.
- Use del to remove a column.
Python
import pandas as pd
data = {'A': [10, 20, 30], 'B': [40, 50, 60], 'C': [70, 80, 90]}
df = pd.DataFrame(data)
# Drop column 'C' using del
del df['C']
print(df)
Output:
    A   B
0  10  40
1  20  50
2  30  60
The del statement is a simple, direct way to remove a single column. It modifies the DataFrame in place, so no new DataFrame is created.
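Like drop() with its default settings, del is strict about column names: deleting a column that is not present raises a KeyError, and there is no option to ignore it. The sketch below demonstrates this by deleting the same column twice.

```python
import pandas as pd

df = pd.DataFrame({'A': [10, 20, 30], 'B': [40, 50, 60], 'C': [70, 80, 90]})

# del removes the column from df itself; there is no return value
del df['C']

# Deleting a column that is no longer present raises KeyError
try:
    del df['C']
except KeyError:
    print("'C' is already gone")

print(df.columns.tolist())  # ['A', 'B']
```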
Method 4: Dropping Columns with Missing Values
You can also drop columns that contain missing values using the dropna() method. This is useful when you want to clean the dataset by removing columns with NaN values.
- Use dropna(axis=1) to remove columns with missing values.
Python
import pandas as pd
data = {'A': [10, None, 30], 'B': [None, 50, 60], 'C': [70, 80, 90]}
df = pd.DataFrame(data)
# Drop columns with missing values
df.dropna(axis=1, inplace=True)
print(df)
Output:
    C
0  70
1  80
2  90
This removes every column that contains at least one missing value (NaN), so only column 'C' remains. You can also pass thresh to keep only the columns that have at least that many non-null values.
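The default behavior (how='any') drops a column if it has even one missing value, which can be too aggressive. The sketch below compares two gentler options of dropna(): how='all', which drops only completely empty columns, and thresh, which keeps columns with at least the given number of non-null values. The column D is an illustrative addition that contains only NaN.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': [10, None, 30],            # 1 missing value
    'B': [None, None, 60],          # 2 missing values
    'C': [70, 80, 90],              # no missing values
    'D': [np.nan, np.nan, np.nan],  # entirely empty
})

# how='all' drops only columns in which every value is missing
print(df.dropna(axis=1, how='all').columns.tolist())  # ['A', 'B', 'C']

# thresh=2 keeps only columns with at least 2 non-null values
print(df.dropna(axis=1, thresh=2).columns.tolist())   # ['A', 'C']
```

Recent pandas versions do not allow how and thresh in the same call, so pick whichever rule matches your cleaning policy.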
For a more detailed explanation, refer to this article: Working with Missing Data in Pandas
Recommendation: For general column removal, using drop() is the most straightforward method. For cleaning data with missing values, dropna() is ideal. If you prefer to modify the original DataFrame without creating a new one, use inplace=True.