Frequency count of multiple variables in R Dataframe
Last Updated: 30 May, 2021
A data frame may contain repeated or missing values, and each column may hold any number of repeated instances of the same value. Much of data analysis relies on computing the frequency, or count, of each distinct value within a column. In the R programming language there are multiple ways to do so.
Method 1: Using apply() method
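The apply() method in base R applies a function over the margins (rows or columns) of an array or matrix and returns a vector, array, or list of the results. A data frame passed to apply() is first coerced to a matrix. It has the following syntax:
apply(X, MARGIN, FUN)
where X is the array, matrix, or data frame, MARGIN is 1 for rows or 2 for columns, and FUN is the function to apply.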
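To make the role of the margin argument concrete, here is a minimal sketch on a small made-up matrix (the name m and its values are purely illustrative):

```r
# Hypothetical 2 x 3 matrix, filled column by column:
#   1 3 5
#   2 4 6
m <- matrix(1:6, nrow = 2)

# MARGIN = 1 applies the function to each row
row_totals <- apply(m, 1, sum)
print(row_totals)

# MARGIN = 2 applies the function to each column
col_totals <- apply(m, 2, sum)
print(col_totals)
```

The frequency examples in this article use MARGIN = 2, because the counts are wanted per column.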
The table() method uses the cross-classifying factors supplied to it to build a contingency table of the counts at each combination of factor levels. A contingency table is a tabulation of the counts and/or percentages for one or more variables. By default, missing values are excluded from the counts. The output is returned as an object of class table. This method can be used for cross-tabulation and statistical analysis.
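As a small illustration of how table() treats missing values, the sketch below uses a made-up vector x (not part of the article's data). The useNA argument is the base R way to ask for NA to be counted as well:

```r
# Hypothetical vector with one missing value
x <- c("a", "b", "a", NA, "c", "a")

# Default behaviour: NA is left out of the counts
default_counts <- table(x)
print(default_counts)

# useNA = "ifany" adds a cell for NA when missing values are present
with_na <- table(x, useNA = "ifany")
print(with_na)
```

Comparing the two totals is a quick way to check whether a column contains missing values.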
Example 1: Here we compute, for every column of the data frame, the frequency of each value occurring in that column. Because the columns do not all contain the same set of values, the result is returned as a list with one table per column.
R
set.seed(1)
# creating a data frame
data_frame <- data.frame(
  col1 = sample(letters[1:3], 8, replace = TRUE),
  col2 = sample(letters[1:3], 8, replace = TRUE),
  col3 = sample(letters[1:3], 8, replace = TRUE),
  col4 = sample(letters[1:3], 8, replace = TRUE)
)
print("Original DataFrame")
print(data_frame)
# calculating the frequency of each value, column by column
mod_frame <- apply(data_frame, 2, table)
print("Frequencies")
print(mod_frame)
Output:
[1] "Original DataFrame"
col1 col2 col3 col4
1 a b b a
2 c c b b
3 a c c a
4 b a a a
5 a a c b
6 c a a b
7 c b a b
8 b b a a
[1] "Frequencies"
$col1
a b c
3 2 3
$col2
a b c
3 3 2
$col3
a b c
4 2 2
$col4
a b
4 4
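In the output above, col4 has no "c" values, so its table is shorter than the others and the result comes back as a list. The following is a sketch of two alternatives, using a small hand-written data frame (df and its values are hypothetical, not the sampled data above): lapply() always returns a list, while fixing a shared set of levels gives every column the same cells so that the result can be simplified to a matrix.

```r
# Small hand-written data frame (hypothetical values)
df <- data.frame(
  col1 = c("a", "c", "a", "b"),
  col2 = c("a", "a", "b", "b"),
  stringsAsFactors = FALSE
)

# lapply() returns a list with one frequency table per column
freq_list <- lapply(df, table)
print(freq_list)

# Collect every value that appears anywhere in the data frame
all_levels <- sort(unique(unlist(df)))

# Counting against the shared levels yields tables of equal length,
# so sapply() can simplify them to a matrix (absent values count as 0)
freq_matrix <- sapply(df, function(col) table(factor(col, levels = all_levels)))
print(freq_matrix)
```

The matrix form reports a zero for values that a column never contains, which can make columns easier to compare side by side.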
Example 2: The frequencies can also be computed for specific columns only, by storing the desired column names in a vector and selecting them with data frame indexing, df[cols]. Because both selected columns here contain the same set of values, apply() simplifies the result to a matrix whose column headings are the selected column names and whose row headings are the distinct values found.
R
set.seed(1)
# creating a data frame
data_frame <- data.frame(
  col1 = sample(letters[1:3], 8, replace = TRUE),
  col2 = sample(letters[1:3], 8, replace = TRUE),
  col3 = sample(letters[1:3], 8, replace = TRUE),
  col4 = sample(letters[1:3], 8, replace = TRUE)
)
print("Original DataFrame")
print(data_frame)
# names of the columns to count
sel_col <- c("col1", "col3")
# calculating the frequency of each value in the selected columns
mod_frame <- apply(data_frame[sel_col], 2, table)
print("Frequencies")
print(mod_frame)
Output:
[1] "Original DataFrame"
col1 col2 col3 col4
1 a b b a
2 c c b b
3 a c c a
4 b a a a
5 a a c b
6 c a a b
7 c b a b
8 b b a a
[1] "Frequencies"
col1 col3
a 3 4
b 2 2
c 3 2
Method 2: Using plyr package
The plyr package provides tools for manipulating data, such as creating, modifying, and deleting columns of a data frame subject to conditions and user-defined functions. It can be installed using the following command (it is loaded into the workspace with library("plyr"), as shown in the example below):
install.packages("plyr")
The count() method of this package returns a frequency count of the values contained in the specified columns. When several columns are given, each unique combination of values that occurs in the data is returned along with its count.
count(df, vars), where vars is a character vector of column names
The output contains only the columns specified in the count() method, together with a freq column holding the counts.
R
library("plyr")
set.seed(1)
# creating a data frame
data_frame <- data.frame(
  col1 = sample(letters[1:3], 8, replace = TRUE),
  col2 = sample(letters[1:3], 8, replace = TRUE),
  col3 = sample(letters[1:3], 8, replace = TRUE),
  col4 = sample(letters[1:3], 8, replace = TRUE)
)
print("Original DataFrame")
print(data_frame)
# name of the column to count (more names can be added to the vector)
sel_col <- c("col1")
# calculating the frequency of each value in the selected column
mod_frame <- count(data_frame, sel_col)
print("Frequencies")
print(mod_frame)
Output:
[1] "Original DataFrame"
col1 col2 col3 col4
1 a b b a
2 c c b b
3 a c c a
4 b a a a
5 a a c b
6 c a a b
7 c b a b
8 b b a a
[1] "Frequencies"
col1 freq
1 a 3
2 b 2
3 c 3
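The example above counts a single column. To illustrate what counting combinations of two columns looks like, here is a sketch in base R, so that it runs without any package installed; the data frame df and its values are made up for illustration, and the corresponding plyr call is shown only as a comment.

```r
# Small hand-written data frame (hypothetical values)
df <- data.frame(
  col1 = c("a", "a", "b", "a", "b"),
  col2 = c("x", "x", "y", "y", "y"),
  stringsAsFactors = FALSE
)

# Cross-tabulate the two columns, then flatten to one row per combination
combo_counts <- as.data.frame(table(df[c("col1", "col2")]),
                              responseName = "freq",
                              stringsAsFactors = FALSE)

# table() also lists combinations that never occur; keep the observed ones
combo_counts <- combo_counts[combo_counts$freq > 0, ]
print(combo_counts)

# With plyr loaded, the corresponding call would be:
# count(df, c("col1", "col2"))
```

One difference worth noting: the base R route first builds every possible combination and needs the zero-count rows filtered out, whereas count() reports only combinations present in the data.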