Select Rows From List of Values in Pandas DataFrame
Last Updated: 10 Dec, 2024
Let's learn how to select rows from a list of values in a Pandas DataFrame, primarily using the isin() method.
Using isin() to Select Rows from a List of Values
The isin() function is one of the most commonly used methods for filtering data based on a list of values. Let’s walk through a simple example to illustrate this.
Python
import pandas as pd

# Sample DataFrame with employee names and departments
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Department': ['HR', 'IT', 'IT', 'Finance']
}
df = pd.DataFrame(data)

# Keep only the rows whose 'Department' appears in the list
departments = ['HR', 'Finance']
filtered_df = df[df['Department'].isin(departments)]
print(filtered_df)
Output
    Name  Department
0  Alice          HR
3  David     Finance
In this example, the isin() function filters rows where the Department column matches 'HR' or 'Finance'. This is a simple and effective way to filter data in Pandas.
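Under the hood, isin() returns a boolean Series (a mask) that is then used to index the DataFrame. Here is a minimal sketch, reusing the DataFrame above, that prints the intermediate mask:
Python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Department': ['HR', 'IT', 'IT', 'Finance']
})

# isin() produces a boolean Series: True where the value is in the list
mask = df['Department'].isin(['HR', 'Finance'])
print(mask)      # 0 True, 1 False, 2 False, 3 True
print(df[mask])  # same rows as df[df['Department'].isin([...])]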
Using isin() to Filter Based on Multiple Conditions
You can also apply multiple conditions by combining isin() with logical operators such as & (AND) or | (OR). Consider the following DataFrame of orders; first we use isin() on its own to keep the rows whose 'Status' column is either 'Pending' or 'Completed', and then combine it with a second condition.
Python
import pandas as pd
# Sample DataFrame with order data
data = {
    'OrderID': [1011, 1021, 1052],
    'Customer': ['Alice', 'Bob', 'Eve'],
    'Status': ['Completed', 'Pending', 'Canceled']
}
df = pd.DataFrame(data)

# Keep rows whose 'Status' is 'Completed' or 'Pending'
status = ['Completed', 'Pending']
filtered_df = df[df['Status'].isin(status)]
print(filtered_df)
Output
   OrderID  Customer     Status
0     1011     Alice  Completed
1     1021       Bob    Pending
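Each isin() call (or comparison) produces such a boolean mask, and the masks can be combined with & or |, with each condition wrapped in parentheses. Here is a minimal sketch on the same order data that combines the status filter with an OrderID threshold using OR; the threshold value 1050 is chosen purely for illustration:
Python
import pandas as pd

df = pd.DataFrame({
    'OrderID': [1011, 1021, 1052],
    'Customer': ['Alice', 'Bob', 'Eve'],
    'Status': ['Completed', 'Pending', 'Canceled']
})

# Keep rows whose Status is in the list OR whose OrderID exceeds 1050
# (the 1050 cut-off is an illustrative assumption, not from the original example)
filtered_df = df[df['Status'].isin(['Completed', 'Pending']) | (df['OrderID'] > 1050)]
print(filtered_df)  # Eve's canceled order is kept because of the OrderID condition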
Using Negation with isin() to Exclude Rows
You can exclude rows from the DataFrame that match certain values using the negation operator ~ with isin(). Let us consider the following example:
Python
import pandas as pd
# Sample DataFrame
data = {
    'Item': ['Carrot', 'Banana', 'Broccoli', 'Orange'],
    'Category': ['Vegetable', 'Fruit', 'Vegetable', 'Fruit']
}
df = pd.DataFrame(data)
# Exclude fruits
filtered_df = df[~df['Category'].isin(['Fruit'])]
print(filtered_df)
Output
       Item   Category
0    Carrot  Vegetable
2  Broccoli  Vegetable
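The exclusion list can contain more than one value. A minimal sketch on the same data, excluding two specific items (chosen here purely for illustration) instead of a whole category:
Python
import pandas as pd

df = pd.DataFrame({
    'Item': ['Carrot', 'Banana', 'Broccoli', 'Orange'],
    'Category': ['Vegetable', 'Fruit', 'Vegetable', 'Fruit']
})

# Exclude the listed items; every row not in the list is kept
# ('Banana' and 'Broccoli' are an arbitrary choice for this sketch)
filtered_df = df[~df['Item'].isin(['Banana', 'Broccoli'])]
print(filtered_df)  # keeps Carrot and Orange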
Using Multiple isin() Calls with Logical Operators
You can filter rows by combining multiple isin() calls with the AND (&) operator. This is especially helpful when you want to check multiple columns against different lists of values.
Python
import pandas as pd
# Sample DataFrame
data = {
    'Fruit': ['Apple', 'Carrot', 'Banana', 'Grapes'],
    'Category': ['Fruit', 'Vegetable', 'Fruit', 'Fruit'],
}
df = pd.DataFrame(data)
# Conditional filtering with AND
filtered_df = df[df['Fruit'].isin(['Apple', 'Banana', 'Carrot']) & df['Category'].isin(['Fruit'])]
print(filtered_df)
Output
    Fruit  Category
0   Apple     Fruit
2  Banana     Fruit
In this example, the AND condition ensures that a row is kept only when its Fruit value is in the list ['Apple', 'Banana', 'Carrot'] and its Category is 'Fruit'. Each isin() call checks whether the value in its column belongs to the specified list, and the logical AND (&) requires both conditions to be true for a row to be included.
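When every column you want to check has its own list of allowed values, DataFrame.isin() can also take a dictionary that maps column names to those lists; combined with all(axis=1), this gives the same result as chaining the two isin() calls above (note that every column of the DataFrame must appear in the dictionary for all(axis=1) to work). A minimal sketch of that alternative:
Python
import pandas as pd

df = pd.DataFrame({
    'Fruit': ['Apple', 'Carrot', 'Banana', 'Grapes'],
    'Category': ['Fruit', 'Vegetable', 'Fruit', 'Fruit']
})

# Dict form of isin(): each key is a column, each value is its list of allowed entries
allowed = {'Fruit': ['Apple', 'Banana', 'Carrot'], 'Category': ['Fruit']}
mask = df.isin(allowed).all(axis=1)  # True only when every column matches its list
print(df[mask])  # Apple and Banana, same result as the chained isin() version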
Using query() for Advanced Filtering
The query() method in Pandas lets you filter rows with an SQL-like expression. It supports an in operator, which makes it well suited to filtering based on a list of values. Let's consider the following example, where we filter rows based on specific grades.
Python
import pandas as pd
# Sample DataFrame with student data
data = {
    'Student': ['John', 'Emily', 'Sarah', 'Michael'],
    'Age': [18, 19, 17, 20],
    'Grade': ['A', 'B', 'A', 'C'],
}
df = pd.DataFrame(data)
# Select rows where 'Grade' is one of the values in the tuple
filtered_df = df.query('Grade in ("A", "C")')
print(filtered_df)
Output
   Student  Age  Grade
0     John   18      A
2    Sarah   17      A
3  Michael   20      C
In this example, query() filters the DataFrame to only include students with grades 'A' or 'C'. This approach is clean and efficient, especially when dealing with more complex queries involving multiple columns.
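query() can also reference a Python list defined outside the query string by prefixing its name with @, which keeps the list of values in a single variable. A short sketch of the same filter written that way:
Python
import pandas as pd

df = pd.DataFrame({
    'Student': ['John', 'Emily', 'Sarah', 'Michael'],
    'Age': [18, 19, 17, 20],
    'Grade': ['A', 'B', 'A', 'C']
})

# The @ prefix lets the query string refer to a local Python variable
grades = ['A', 'C']
filtered_df = df.query('Grade in @grades')
print(filtered_df)  # John, Sarah and Michael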
Using apply() and Lambda Functions for Complex Filters
For more complex filtering, apply() combined with lambda functions is a versatile option. This method is especially useful when you need to check conditions that cannot be easily captured by isin().
Python
import pandas as pd
# Sample DataFrame with employee data
data = {
    'Employee': ['John', 'Michael', 'David', 'Sophia'],
    'Department': ['HR', 'HR', 'Finance', 'IT'],
    'Salary': [50000, 65000, 70000, 62000]
}
df = pd.DataFrame(data)
# Select rows where 'Department' is either 'HR' or 'Finance' using apply
filtered_df = df[df['Department'].apply(lambda x: x in ['HR', 'Finance'])]
print(filtered_df)
Output
  Employee  Department  Salary
0     John          HR   50000
1  Michael          HR   65000
2    David     Finance   70000
In this case, apply() along with a lambda function checks if each value in the Department column matches either 'HR' or 'Finance'. This approach is particularly useful when the filtering condition is complex or involves custom logic that cannot be directly captured by simple methods like isin().
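One case where this flexibility pays off is a membership test that isin() cannot express directly, such as a case-insensitive match. A minimal sketch, assuming the target department names may arrive in mixed case (the mixed-case list below is an assumption for illustration):
Python
import pandas as pd

df = pd.DataFrame({
    'Employee': ['John', 'Michael', 'David', 'Sophia'],
    'Department': ['HR', 'HR', 'Finance', 'IT'],
    'Salary': [50000, 65000, 70000, 62000]
})

# Case-insensitive membership test; the mixed-case targets are illustrative
targets = ['hr', 'FINANCE']
allowed = {t.lower() for t in targets}
filtered_df = df[df['Department'].apply(lambda x: x.lower() in allowed)]
print(filtered_df)  # John, Michael and David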
Conclusion
In this article, we explored several techniques for filtering rows in a Pandas DataFrame using a list of values. The most common and efficient approach is the isin() method, which works well for basic filtering. For more complex conditions, methods like query() and apply() with lambda functions provide greater flexibility.