How to select row with maximum value in each group in R Language?
Last Updated :
01 Apr, 2021
In the R programming language, we can select the row with the maximum value in each group of a data frame using either of the two approaches discussed below.
Consider the following dataset, which has multiple observations for each value of the sub column. It contains three columns: roll, sub, and marks.
Creating Dataset :
Here we create the data frame used for demonstration. The code that builds it is shown in Step 1 of Method 1.
Output:
roll sub marks
1 1 A 2
2 2 A 3
3 3 B 5
4 4 B 2
5 5 B 5
6 6 C 8
7 7 C 17
8 8 A 3
9 9 C 5
10 10 C 5
Here, roll and marks are numeric and sub is a categorical (character) column with the categories A, B, and C. A, B, and C represent different subjects, and marks holds the marks obtained in the corresponding subject.
As we can see, subjects A, B, and C have maximum marks of 3, 5, and 17 respectively. We can select the row with the maximum value in each group using the following two approaches.
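As a quick check of these maxima, the per-group maximum can be computed directly with base R's aggregate(). This is a minimal sketch that rebuilds the same data frame; the name max_marks is chosen here for illustration and does not appear in the article's own code.

```r
# Rebuild the demonstration data frame (same values as in the article).
group <- data.frame(roll = 1:10,
                    sub = c('A', 'A', 'B', 'B', 'B',
                            'C', 'C', 'A', 'C', 'C'),
                    marks = c(2, 3, 5, 2, 5, 8, 17, 3, 5, 5))

# Maximum of marks within each value of sub.
# max_marks is an illustrative name.
max_marks <- aggregate(marks ~ sub, data = group, FUN = max)
max_marks
```

Note that aggregate() returns only the group label and its maximum, not the full row (roll is lost), which is why the row-selection methods in this article are needed.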
Method 1: Using base R
Step 1: Load the dataset into a variable (group).
R
# Creating a dataset.
no <- c( 1 : 10)
subject <- c('A', 'A', 'B', 'B', 'B',
'C', 'C', 'A', 'C', 'C')
mark <- c(2, 3, 5, 2, 5, 8, 17, 3, 5, 5)
group <- data.frame(roll = no, sub = subject,
marks = mark )
group
Output:
roll sub marks
1 1 A 2
2 2 A 3
3 3 B 5
4 4 B 2
5 5 B 5
6 6 C 8
7 7 C 17
8 8 A 3
9 9 C 5
10 10 C 5
Step 2: Sort the rows by subject and, within each subject (A, B, C), by marks in descending order.
R
# Creating a dataset.
no <- c( 1 : 10)
subject <- c('A', 'A', 'B', 'B', 'B',
'C', 'C', 'A', 'C', 'C')
mark <- c(2, 3, 5, 2, 5, 8, 17, 3, 5, 5)
group <- data.frame(roll = no, sub = subject,
marks = mark )
# Sort by sub (ascending), then by marks (descending) within each sub.
sorted_group <- group[order(group$sub, -group$marks),]
sorted_group
Output:
roll sub marks
2 2 A 3
8 8 A 3
1 1 A 2
3 3 B 5
5 5 B 5
4 4 B 2
7 7 C 17
6 6 C 8
9 9 C 5
10 10 C 5
The rows are now ordered by sub, with the highest marks first within each group (A, B, C), so the first row of each group is a row with the maximum value.
Step 3: Keep only the first row for each subject by removing every row whose sub value has already appeared in the sorted data.
R
# Creating a dataset.
no <- c( 1 : 10)
subject <- c('A', 'A', 'B', 'B', 'B',
'C', 'C', 'A', 'C', 'C')
mark <- c(2, 3, 5, 2, 5, 8, 17, 3, 5, 5)
group <- data.frame(roll = no, sub = subject,
marks = mark )
# sorting the sub and marks.
sorted_group <- group[order(group$sub, -group$marks),]
# removing duplicates from the sorted sub column
ans <- sorted_group[!duplicated(sorted_group$sub),]
ans
Output:
roll sub marks
2 2 A 3
3 3 B 5
7 7 C 17
These are the rows with the maximum value in each group.
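Step 3 works because duplicated() flags every repeat of a value after its first occurrence, so negating it keeps only the first row of each subject, which after sorting is a row with the highest marks. A small sketch of that behaviour, using an illustrative vector x that is not part of the dataset:

```r
# Illustrative vector, already grouped as the sub column is after sorting.
x <- c('A', 'A', 'A', 'B', 'B', 'C')

# TRUE marks a value that has already appeared earlier in the vector.
dup <- duplicated(x)
dup

# Negating the flags keeps only the first occurrence of each value.
first_of_each <- x[!dup]
first_of_each
```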
Method 2: Using the dplyr package
dplyr is an R package widely used for manipulating data frames. It provides verbs (functions) for data manipulation such as filter, arrange, select, rename, and mutate.
To install the dplyr package, run the following command in the R console:
install.packages("dplyr")
Step 1: Load the dataset and library.
R
# Creating a dataset.
no <- c( 1 : 10)
subject <- c('A', 'A', 'B', 'B', 'B',
'C', 'C', 'A', 'C', 'C')
mark <- c(2, 3, 5, 2, 5, 8, 17, 3, 5, 5)
group <- data.frame(roll = no, sub = subject,
marks = mark )
# loading library
library("dplyr")
Step 2: Group the data frame by sub using the group_by() verb (function) and select the row with the maximum marks in each group using slice() with which.max().
R
# Creating a dataset.
no <- c( 1 : 10)
subject <- c('A', 'A', 'B', 'B', 'B',
'C', 'C', 'A', 'C', 'C')
mark <- c(2, 3, 5, 2, 5,
8, 17, 3, 5, 5)
group <- data.frame(roll = no, sub = subject,
marks = mark )
# loading library
library("dplyr")
group %>% group_by(sub) %>% slice(which.max(marks))
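Output:
The result is a grouped tibble with one row per subject: roll 2 (A, marks 3), roll 3 (B, marks 5), and roll 7 (C, marks 17). These are the rows with the maximum value in each group.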
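Both methods return a single row per subject even when the maximum is tied: in this dataset subject A has marks 3 at roll 2 and roll 8, and subject B has marks 5 at roll 3 and roll 5, and only the first of each pair is kept. If all tied rows are wanted, one possible approach (a sketch, not part of the original methods) is to compare each mark with its group maximum using base R's ave(). The names group_max and all_max are illustrative.

```r
# Rebuild the demonstration data frame (same values as in the article).
group <- data.frame(roll = 1:10,
                    sub = c('A', 'A', 'B', 'B', 'B',
                            'C', 'C', 'A', 'C', 'C'),
                    marks = c(2, 3, 5, 2, 5, 8, 17, 3, 5, 5))

# ave() returns, for every row, the maximum marks of that row's subject.
group_max <- ave(group$marks, group$sub, FUN = max)

# Keep every row whose marks equal its group maximum (ties included).
all_max <- group[group$marks == group_max, ]
all_max
```

With dplyr, the analogous pattern is group %>% group_by(sub) %>% filter(marks == max(marks)), which likewise keeps every tied row.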