How to Extract a Random Sample of Rows from an R DataFrame with a Nested Condition
Last Updated :
24 Jun, 2021
In this article, we will learn how to extract a random sample of rows from a DataFrame in the R programming language when the rows must also satisfy a nested condition.
Method 1: Using sample()
We will use the sample() function for this task. sample() draws a random sample of a given size from its first argument, which is either a vector of elements to choose from or a single positive integer n (in which case the sample is drawn from 1:n).
We will also use which(), which lets us restrict the sample to rows that meet a condition. which() takes a logical vector and returns the indices of its TRUE elements; positions where the condition is FALSE or NA are left out.
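As a small illustration of how the two functions fit together, the sketch below applies which() to two plain vectors (the same values as the year and length columns used later) and then samples from the returned indices. The variable names and the seed are illustrative choices only.

```r
year <- c(10, 51, 19, 126, 99)
len  <- c(40, NA, NA, 100, 95)

# which() returns the positions where the condition is TRUE
idx_year <- which(year > 20)   # positions 2, 4 and 5

# A comparison with NA is NA, not TRUE, so which() skips those positions
idx_len <- which(len > 50)     # positions 4 and 5

# sample() then draws positions at random from that set
set.seed(1)                    # arbitrary seed, only to make the draw repeatable
picked <- sample(idx_year, 2)
picked
```

Because which() drops NA results, rows with a missing value in the tested column never enter the sample.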
Syntax: df[ sample(which ( conditions ) ,n), ]
Parameters:
- df: DataFrame
- n: number of samples to be generated
- conditions: samples are extracted according to this condition. Ex: df$year > 5
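A nested condition is built by combining simple comparisons with & (and), | (or) and parentheses. The following is a minimal sketch of that idea using the same sample data as this article; the particular thresholds and the seed are assumptions chosen only for illustration.

```r
df <- data.frame(
  name = c("Welcome", "to", "Geeks", "for", "Geeks"),
  year = c(10, 51, 19, 126, 99),
  length = c(40, NA, NA, 100, 95),
  education = c("yes", "yes", "no", "no", "yes")
)

# Nested condition: year above 15 AND (education is "yes" OR length above 90)
cond <- df$year > 15 & (df$education == "yes" | df$length > 90)

# Row 3 evaluates to NA (length is missing and education is "no"),
# so which() leaves it out; rows 2, 4 and 5 remain
matching <- which(cond)

set.seed(42)   # arbitrary seed so the draw can be repeated
result <- df[sample(matching, 2), ]
result
```

The parentheses matter: without them, & binds more tightly than |, which would change which rows qualify.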
DataFrame in Use:
| | name | year | length | education |
|---|---|---|---|---|
| 1 | Welcome | 10 | 40 | yes |
| 2 | to | 51 | NA | yes |
| 3 | Geeks | 19 | NA | no |
| 4 | for | 126 | 100 | no |
| 5 | Geeks | 99 | 95 | yes |
To apply this approach, we first create the data frame, then use which() to find the indices of the rows that satisfy the condition, pass those indices to sample(), and use the sampled indices to subset the data frame. The examples below illustrate this with the data frame above. Because the rows are drawn at random, the sampled rows can differ from run to run.
Example 1:
R
# Create the data frame
df <- data.frame(name = c("Welcome", "to", "Geeks", "for", "Geeks"),
                 year = c(10, 51, 19, 126, 99),
                 length = c(40, NA, NA, 100, 95),
                 education = c("yes", "yes", "no", "no", "yes"))
df

# Print 2 randomly chosen rows where year is greater than 5
print("2 samples")
df[sample(which(df$year > 5), 2), ]
Output:
name year length education
1 Welcome 10 40 yes
2 to 51 NA yes
3 Geeks 19 NA no
4 for 126 100 no
5 Geeks 99 95 yes
[1] "2 samples"
name year length education
1 Welcome 10 40 yes
2 to 51 NA yes
Example 2:
R
# Create the data frame
df <- data.frame(name = c("Welcome", "to", "Geeks", "for", "Geeks"),
                 year = c(10, 51, 19, 126, 99),
                 length = c(40, NA, NA, 100, 95),
                 education = c("yes", "yes", "no", "no", "yes"))
df

# Print 3 randomly chosen rows where education is not "no"
print("3 samples")
df[sample(which(df$education != "no"), 3), ]
Output:
name year length education
1 Welcome 10 40 yes
2 to 51 NA yes
3 Geeks 19 NA no
4 for 126 100 no
5 Geeks 99 95 yes
[1] "3 samples"
name year length education
5 Geeks 99 95 yes
1 Welcome 10 40 yes
2 to 51 NA yes
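One caveat with the sample(which(...), n) pattern: when exactly one row matches, which() returns a single number k, and sample() then draws from 1:k rather than from that one index. When fewer than n rows match, sample() stops with an error. The sketch below guards against both cases; sample_rows is a hypothetical helper name introduced here, not a base R function.

```r
df <- data.frame(
  name = c("Welcome", "to", "Geeks", "for", "Geeks"),
  year = c(10, 51, 19, 126, 99),
  length = c(40, NA, NA, 100, 95),
  education = c("yes", "yes", "no", "no", "yes")
)

# Hypothetical helper: sample up to n rows of `data` where `cond` is TRUE
sample_rows <- function(data, cond, n) {
  idx <- which(cond)
  if (length(idx) == 0) {
    return(data[0, , drop = FALSE])   # nothing matches: return an empty data frame
  }
  n <- min(n, length(idx))            # never ask for more rows than match
  # Sample positions within idx, so a single match is never expanded to 1:k
  data[idx[sample.int(length(idx), n)], , drop = FALSE]
}

# Only row 4 has year > 100; sample(4, 1) would draw from 1:4 instead
one_match <- sample_rows(df, df$year > 100, 1)
one_match

# Only 3 rows have education == "yes"; asking for 5 returns those 3
capped <- sample_rows(df, df$education == "yes", 5)
capped
```

Capping n silently is a design choice; raising an error instead may be preferable when the requested sample size must be met exactly.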
Method 2: Using sample_n() function
The sample_n() function from the dplyr package selects n rows at random from a data frame.
Syntax: sample_n(x, n)
Parameters:
- x: Data Frame
- n: size/number of items to select
Along with sample_n(), we also use the filter() function. filter() keeps the rows of a data frame for which the filtering expression is TRUE and drops all other rows.
Syntax: filter(x, expr)
Parameters:
- x: Object to be filtered
- expr: expression as a base for filtering
We load the dplyr package because it provides both filter() and sample_n(). We pass the data frame df and the nested condition to filter(), then pipe the filtered rows into sample_n() to draw n of them at random.
Syntax: filter(df, condition) %>% sample_n(., n)
Parameters:
- df: Dataframe Object
- condition: the nested condition, written in terms of the columns of df. Ex: name != "to"
- n: Number of samples
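With dplyr, a nested condition can be written directly inside filter() using bare column names; conditions separated by commas are combined with "and". The sketch below assumes dplyr is installed and uses illustrative thresholds and an arbitrary seed.

```r
library(dplyr)

df <- data.frame(
  name = c("Welcome", "to", "Geeks", "for", "Geeks"),
  year = c(10, 51, 19, 126, 99),
  length = c(40, NA, NA, 100, 95),
  education = c("yes", "yes", "no", "no", "yes")
)

set.seed(7)   # arbitrary seed so the draw can be repeated

# Keep rows where year > 15 AND (education is "yes" OR length > 90),
# then draw 2 of the remaining rows at random
result <- df %>%
  filter(year > 15, education == "yes" | length > 90) %>%
  sample_n(2)
result
```

Like which(), filter() drops rows for which the condition evaluates to NA, so the row with year 19 and a missing length is excluded here.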
Example 1:
R
library(dplyr)

# Create the data frame
df <- data.frame(name = c("Welcome", "to", "Geeks", "for", "Geeks"),
                 year = c(10, 51, 19, 126, 99),
                 length = c(40, NA, NA, 100, 95),
                 education = c("yes", "yes", "no", "no", "yes"))
df

# Print 2 randomly chosen rows where name is not "to"
print("2 samples")
filter(df, name != "to") %>% sample_n(2)
Output:
name year length education
1 Welcome 10 40 yes
2 to 51 NA yes
3 Geeks 19 NA no
4 for 126 100 no
5 Geeks 99 95 yes
[1] "2 samples"
name year length education
1 Welcome 10 40 yes
2 Geeks 99 95 yes
Example 2:
R
library(dplyr)

# Create the data frame
df <- data.frame(name = c("Welcome", "to", "Geeks", "for", "Geeks"),
                 year = c(10, 51, 19, 126, 99),
                 length = c(40, NA, NA, 100, 95),
                 education = c("yes", "yes", "no", "no", "yes"))
df

# Print 2 randomly chosen rows where year is greater than 20
print("2 samples")
filter(df, year > 20) %>% sample_n(2)
Output:
name year length education
1 Welcome 10 40 yes
2 to 51 NA yes
3 Geeks 19 NA no
4 for 126 100 no
5 Geeks 99 95 yes
[1] "2 samples"
name year length education
1 for 126 100 no
2 to 51 NA yes
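In dplyr 1.0.0 and later, sample_n() is marked as superseded in favor of slice_sample(). The same filter-then-sample step can be sketched with it as follows, assuming a dplyr version that provides slice_sample(); the seed is arbitrary.

```r
library(dplyr)

df <- data.frame(
  name = c("Welcome", "to", "Geeks", "for", "Geeks"),
  year = c(10, 51, 19, 126, 99),
  length = c(40, NA, NA, 100, 95),
  education = c("yes", "yes", "no", "no", "yes")
)

set.seed(123)   # arbitrary seed so the draw can be repeated

# Keep rows where year > 20, then draw 2 of them at random
result <- df %>%
  filter(year > 20) %>%
  slice_sample(n = 2)
result
```

sample_n() continues to work, so this is an alternative rather than a required change.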