How to filter R dataframe by multiple conditions?
Last Updated :
23 May, 2021
In R programming Language, dataframe columns can be subjected to constraints, and produce smaller subsets. However, while the conditions are applied, the following properties are maintained :
- Rows are considered to be a subset of the input.
- Rows in the subset appear in the same order as the original data frame.
- Columns remain unmodified.
- The number of groups may be reduced, based on conditions.
- Data frame attributes are preserved during the data filter.
- Row numbers may not be retained in the final output
The data frame rows can be subjected to multiple conditions by combining them using logical operators, like AND (&) , OR (|). The rows returning TRUE are retained in the final output.
Method 1: Using indexing method and which() function
Any data frame column in R can be referenced either through its name df$col-name or using its index position in the data frame df[col-index]. The cell values of this column can then be subjected to constraints, logical or comparative conditions, and then data frame subset can be obtained. These conditions are applied to the row index of the data frame so that the satisfied rows are returned. Multiple conditions can also be combined using which() method in R. The which() function in R returns the position of the value which satisfies the given condition.
Syntax: which( vec, arr.ind = F)
Parameter :
vec - The vector to be subjected to conditions
The %in% operator is used to check a value in the vector specified.
Syntax:
val %in% vec
Example:
R
# declaring a data frame
data_frame = data.frame(col1 = c("b","b","d","e","d") ,
col2 = c(0,2,1,4,5),
col3= c(TRUE,FALSE,FALSE,TRUE, TRUE))
print ("Original dataframe")
print (data_frame)
# checking which values of col1 are
# equivalent to b or e or the col2
# value is greater than 4
data_frame_mod <- data_frame[which(data_frame$col1 %in% c("b","e")
| data_frame$col2 > 4),]
print ("Modified dataframe")
print (data_frame_mod)
Output
[1] "Original dataframe"
col1 col2 col3
1 b 0 TRUE
2 b 2 FALSE
3 d 1 FALSE
4 e 4 TRUE
5 d 5 TRUE
[1] "Modified dataframe"
col1 col2 col3
1 b 0 TRUE
2 b 2 FALSE
4 e 4 TRUE
5 d 5 TRUE
The conditions can be aggregated together, without the use of which method also.
Example:
R
# declaring a data frame
data_frame = data.frame(col1 = c("b","b","d","e","d") ,
col2 = c(0,2,1,4,5),
col3= c(TRUE,FALSE,FALSE,TRUE, TRUE))
print ("Original dataframe")
print (data_frame)
# checking which values of col1
# are equivalent to b or e
data_frame_mod <- data_frame[data_frame$col1 %in% c("b","e")
& data_frame$col2 > 4,]
print ("Modified dataframe")
print (data_frame_mod)
Output
[1] "Original dataframe"
col1 col2 col3
1 b 0 TRUE
2 b 2 FALSE
3 d 1 FALSE
4 e 4 TRUE
5 d 5 TRUE
[1] "Modified dataframe"
[1] col1 col2 col3
<0 rows> (or 0-length row.names)
Method 2: Using dplyr package
The dplyr library can be installed and loaded into the working space which is used to perform data manipulation. The filter() function is used to produce a subset of the data frame, retaining all rows that satisfy the specified conditions. The filter() method in R can be applied to both grouped and ungrouped data. The expressions include comparison operators (==, >, >= ) , logical operators (&, |, !, xor()) , range operators (between(), near()) as well as NA value check against the column values. The subset data frame has to be retained in a separate variable.
Syntax: filter(df , cond)
Parameter :
df - The data frame object
cond - The condition to filter the data upon
The difference in the application of this approach is that it doesn't retain the original row numbers of the data frame.
Example:
R
library ("dplyr")
# declaring a data frame
data_frame = data.frame(col1 = c("b","b","d","e","e") ,
col2 = c(0,2,1,4,5),
col3= c(TRUE,FALSE,FALSE,TRUE, TRUE))
print ("Original dataframe")
print (data_frame)
# checking which values of col1 are
# equivalent to b and col3 is not
# TRUE
data_frame_mod <- filter(
data_frame,col1 == "b" & col3!=TRUE)
print ("Modified dataframe")
print (data_frame_mod)
Output
[1] "Original dataframe"
col1 col2 col3
1 b 0 TRUE
2 b 2 FALSE
3 d 1 FALSE
4 e 4 TRUE
5 d 5 TRUE
[1] "Modified dataframe"
col1 col2 col3
1 b 2 FALSE
Method 3: Using subset method
The subset() method in base R is used to return subsets of vectors, matrices, or data frames which satisfy the applied conditions. The subset() method is concerned with the rows. The row numbers are retained while applying this method.
Syntax: subset(df , cond)
Arguments :
df - The data frame object
cond - The condition to filter the data upon
Example:
R
# declaring a data frame
data_frame = data.frame(col1 = c("b","b","d","e","d") ,
col2 = c(0,2,1,4,5),
col3= c(TRUE,FALSE,FALSE,TRUE, TRUE))
print ("Original dataframe")
print (data_frame)
# checking which values of col1 are
# equivalent to b or col2 value is
# greater than 4
data_frame_mod <- subset(data_frame, col1=="b" | col2 > 4)
print ("Modified dataframe")
print (data_frame_mod)
Output
[1] "Original dataframe"
col1 col2 col3
1 b 0 TRUE
2 b 2 FALSE
3 d 1 FALSE
4 e 4 TRUE
5 d 5 TRUE
[1] "Modified dataframe"
col1 col2 col3
1 b 0 TRUE
2 b 2 FALSE
5 d 5 TRUE
Similar Reads
Non-linear Components
In electrical circuits, Non-linear Components are electronic devices that need an external power source to operate actively. Non-Linear Components are those that are changed with respect to the voltage and current. Elements that do not follow ohm's law are called Non-linear Components. Non-linear Co
11 min read
Spring Boot Tutorial
Spring Boot is a Java framework that makes it easier to create and run Java applications. It simplifies the configuration and setup process, allowing developers to focus more on writing code for their applications. This Spring Boot Tutorial is a comprehensive guide that covers both basic and advance
10 min read
Class Diagram | Unified Modeling Language (UML)
A UML class diagram is a visual tool that represents the structure of a system by showing its classes, attributes, methods, and the relationships between them. It helps everyone involved in a projectâlike developers and designersâunderstand how the system is organized and how its components interact
12 min read
Steady State Response
In this article, we are going to discuss the steady-state response. We will see what is steady state response in Time domain analysis. We will then discuss some of the standard test signals used in finding the response of a response. We also discuss the first-order response for different signals. We
9 min read
Backpropagation in Neural Network
Back Propagation is also known as "Backward Propagation of Errors" is a method used to train neural network . Its goal is to reduce the difference between the modelâs predicted output and the actual output by adjusting the weights and biases in the network.It works iteratively to adjust weights and
9 min read
Polymorphism in Java
Polymorphism in Java is one of the core concepts in object-oriented programming (OOP) that allows objects to behave differently based on their specific class type. The word polymorphism means having many forms, and it comes from the Greek words poly (many) and morph (forms), this means one entity ca
7 min read
3-Phase Inverter
An inverter is a fundamental electrical device designed primarily for the conversion of direct current into alternating current . This versatile device , also known as a variable frequency drive , plays a vital role in a wide range of applications , including variable frequency drives and high power
13 min read
What is Vacuum Circuit Breaker?
A vacuum circuit breaker is a type of breaker that utilizes a vacuum as the medium to extinguish electrical arcs. Within this circuit breaker, there is a vacuum interrupter that houses the stationary and mobile contacts in a permanently sealed enclosure. When the contacts are separated in a high vac
13 min read
AVL Tree Data Structure
An AVL tree defined as a self-balancing Binary Search Tree (BST) where the difference between heights of left and right subtrees for any node cannot be more than one. The absolute difference between the heights of the left subtree and the right subtree for any node is known as the balance factor of
4 min read
What is a Neural Network?
Neural networks are machine learning models that mimic the complex functions of the human brain. These models consist of interconnected nodes or neurons that process data, learn patterns, and enable tasks such as pattern recognition and decision-making.In this article, we will explore the fundamenta
14 min read