Count number of rows within each group in R DataFrame
Last Updated: 30 May, 2021
A dataframe in the R programming language may contain columns whose values are not all unique. Rows that share the same value in a column, or the same combination of values across several columns, can be collected into one group. The number of rows in each group can then be counted with packages such as dplyr and data.table, or with base R's aggregate() function.
Method 1 : Using dplyr package
The "dplyr" package in R is used to transform and manipulate data. Several functions from this package can be combined to count the rows in each group.
- Using tally() and group_by() method
The group_by() method in R splits data into groups based on either a single column or several columns. Rows that share the same combination of values in the input columns are placed in the same group; only combinations that actually occur in the data form a group.
Syntax:
group_by(args .. )
Where args is the sequence of columns to group the data by
The tally() method in R summarizes the data by counting the number of rows in each group. Applying group_by() followed by tally() returns a table in which the grouping columns appear in the order they were passed to group_by(), followed by a column 'n' containing the row count for each group.
A convenient side effect of this approach is that the result is a tibble, whose printed form also shows the class of each column (for example <int>).
Example:
R
library("dplyr")
# creating a dataframe
data_frame <- data.frame(col1 = rep(c(1:3), each = 3),
                         col2 = letters[1:3])
print("Original DataFrame")
print(data_frame)
# group by col1 values and count
# the rows in each group
data_frame %>% group_by(col1) %>% tally()
Output:
[1] "Original DataFrame"
  col1 col2
1    1    a
2    1    b
3    1    c
4    2    a
5    2    b
6    2    c
7    3    a
8    3    b
9    3    c
# A tibble: 3 x 2
   col1     n
  <int> <int>
1     1     3
2     2     3
3     3     3
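As a further sketch of the same idea, group_by() can be given more than one column, and tally() accepts sort = TRUE to list the largest groups first. The col3 column used here is an assumption added for illustration and is not part of the example above.

```r
library(dplyr)

# illustrative data (col3 is assumed for this sketch)
data_frame <- data.frame(col1 = rep(1:3, each = 3),
                         col2 = rep(letters[1:3], times = 3),
                         col3 = c(1, 4, 1, 2, 2, 3, 1, 2, 2))

# each distinct (col1, col3) pair that occurs in the data
# becomes one group; sort = TRUE orders by descending count
counts <- data_frame %>%
  group_by(col1, col3) %>%
  tally(sort = TRUE)

print(counts)
```

The counts in column n always add up to the number of rows in the input, which is a quick way to check that no rows were lost.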
- Using dplyr::count() method
The count() method can be applied to an input dataframe with one or more columns and returns the number of rows in each group. The result contains only the columns passed to count(), in that order, followed by a column 'n' holding the counts; the other columns of the original dataframe are dropped.
Syntax:
count(args .. )
Where args is the sequence of columns to group the data by
Example:
R
library("dplyr")
# creating a dataframe
data_frame <- data.frame(col1 = rep(c(1:3), each = 3),
                         col2 = letters[1:3],
                         col3 = c(1, 4, 1, 2, 2, 3, 1, 2, 2))
print("Original DataFrame")
print(data_frame)
print("Modified DataFrame")
# count rows by col1 and col3 group
data_frame %>% dplyr::count(col1, col3)
Output:
[1] "Original DataFrame"
  col1 col2 col3
1    1    a    1
2    1    b    4
3    1    c    1
4    2    a    2
5    2    b    2
6    2    c    3
7    3    a    1
8    3    b    2
9    3    c    2
[1] "Modified DataFrame"
  col1 col3 n
1    1    1 2
2    1    4 1
3    2    2 2
4    2    3 1
5    3    1 1
6    3    2 2
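The relationship between count() and the group_by() plus tally() pipeline can be sketched as follows. This is a minimal illustration on the same data, not a statement about how dplyr implements count() internally; the column name "rows" is an arbitrary choice for this sketch.

```r
library(dplyr)

data_frame <- data.frame(col1 = rep(1:3, each = 3),
                         col2 = rep(letters[1:3], times = 3),
                         col3 = c(1, 4, 1, 2, 2, 3, 1, 2, 2))

# count() in a single step
by_count <- data_frame %>% count(col1, col3)

# the same counts built from group_by() and tally()
by_tally <- data_frame %>%
  group_by(col1, col3) %>%
  tally() %>%
  ungroup()

# compare the values only, ignoring tibble vs. data.frame attributes
same <- isTRUE(all.equal(as.data.frame(by_count),
                         as.data.frame(by_tally),
                         check.attributes = FALSE))
print(same)

# optional arguments: a custom name for the count column
# and descending order of counts
renamed <- data_frame %>% count(col1, col3, name = "rows", sort = TRUE)
print(renamed)
```

count() is the shorter form when only the counts are needed; the explicit group_by() form is useful when further grouped operations follow.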
Method 2 : Using data.table package
The data.table package in R stores and manipulates data in an organized tabular structure. Inside the square-bracket indexing of a data.table, the special symbol .N holds the number of rows in the current group, so it can be used to count how often each combination of the specified columns occurs. The grouping columns are passed to the "by" argument, here wrapped in list(), which plays the same role as group_by() in dplyr.
Syntax:
data_table[, .N, by = list(cols..)]
Example:
R
library(data.table)
# creating a dataframe
data_frame <- data.frame(col1 = rep(c(1:3), each = 3),
                         col2 = letters[1:3],
                         col3 = c(1, 4, 1, 2, 2, 3, 1, 2, 2))
print("Original DataFrame")
print(data_frame)
print("Modified DataFrame")
# convert to a data.table and count rows
# for each (col1, col3) group
data_table <- data.table(data_frame)
data_table[, .N, by = list(col1, col3)]
Output:
[1] "Original DataFrame"
  col1 col2 col3
1    1    a    1
2    1    b    4
3    1    c    1
4    2    a    2
5    2    b    2
6    2    c    3
7    3    a    1
8    3    b    2
9    3    c    2
[1] "Modified DataFrame"
   col1 col3 N
1:    1    1 2
2:    1    4 1
3:    2    2 2
4:    2    3 1
5:    3    1 1
6:    3    2 2
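The "by" argument can be written in a few equivalent ways. The sketch below, on the same data, shows the .() shorthand for list(), a character vector of column names, and a renamed count column; the name "count" is an arbitrary choice for illustration.

```r
library(data.table)

data_table <- data.table(col1 = rep(1:3, each = 3),
                         col2 = rep(letters[1:3], times = 3),
                         col3 = c(1, 4, 1, 2, 2, 3, 1, 2, 2))

# .() is data.table's shorthand for list()
counts_dot <- data_table[, .N, by = .(col1, col3)]

# by also accepts a character vector of column names
counts_chr <- data_table[, .N, by = c("col1", "col3")]

# give the count column a custom name instead of N
counts_named <- data_table[, .(count = .N), by = .(col1, col3)]

print(counts_named)
```

Groups are returned in the order in which they first appear in the data, which matches the output shown above.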
Method 3 : Using aggregate() method
The aggregate() method in the R programming language is a generic function used to compute summary statistics for subsets of both time series and dataframes.
Syntax:
aggregate(formula, data, FUN)
Parameters :
- formula : such as y ~ x, where the y variables are numeric data to be split into groups according to the grouping x variables
- data : the dataframe containing the variables named in the formula
- FUN : function to be applied to each group
The function applied here is length, which counts the values of the left-hand-side variable in each group and therefore gives the number of rows in that group. aggregate() finds every combination of the grouping columns on the right-hand side of the formula that occurs in the data and displays each one with its count. Note that the count column in the result keeps the name of the left-hand-side variable (col3 in the example below).
Example:
R
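# creating a dataframe; col1 is drawn at random, so its
# values (and the counts) can differ between runs
data_frame <- data.frame(col1 = sample(1:2, 9, replace = TRUE),
                         col2 = letters[1:3],
                         col3 = c(1, 4, 1, 2, 2, 3, 1, 2, 2))
print("Original DataFrame")
print(data_frame)
print("keeping a count of all groups")
# count the rows for each (col1, col2) combination
data_mod <- aggregate(col3 ~ col1 + col2,
                      data = data_frame,
                      FUN = length)
print(data_mod)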
Output:
[1] "Original DataFrame"
  col1 col2 col3
1    2    a    1
2    2    b    4
3    1    c    1
4    1    a    2
5    1    b    2
6    2    c    3
7    2    a    1
8    2    b    2
9    1    c    2
[1] "keeping a count of all groups"
  col1 col2 col3
1    1    a    1
2    2    a    2
3    1    b    1
4    2    b    2
5    1    c    2
6    2    c    1
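Two details of the formula interface are worth a small sketch: the count column can be given a clearer name with cbind(), and rows with a missing value are dropped by default before counting. The data below is fixed rather than random, and the NA in col3 and the column name "count" are assumptions added for illustration.

```r
# fixed illustrative data; the last col3 value is deliberately NA
data_frame <- data.frame(col1 = c(2, 2, 1, 1, 1, 2, 2, 2, 1),
                         col2 = rep(letters[1:3], times = 3),
                         col3 = c(1, 4, 1, 2, 2, 3, 1, 2, NA))

# cbind(count = col3) labels the result column "count";
# the default na.action omits the row where col3 is NA
counts <- aggregate(cbind(count = col3) ~ col1 + col2,
                    data = data_frame,
                    FUN = length)
print(counts)

# na.action = na.pass keeps that row, so every row is counted
counts_all <- aggregate(cbind(count = col3) ~ col1 + col2,
                        data = data_frame,
                        FUN = length,
                        na.action = na.pass)
print(counts_all)
```

If the counts do not add up to the number of rows in the dataframe, missing values in the formula variables are a likely cause.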