Create Lagged Variable by Group in R DataFrame
Last Updated: 29 Jul, 2022
A lagged variable holds, for each row, the value that another variable had in the previous row. The first row has no previous value, so its lagged value is missing (NA). In the R programming language, data can be split into groups and the lag computed separately within each group, so that the first row of every group gets NA instead of a value carried over from the preceding group.
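As a minimal sketch of the idea without any grouping or packages, a lag of one position can be built in base R by dropping the last element of a vector and putting NA in front. The names `lag_one` and `x` are illustrative only and are not used in the examples that follow.

```r
# Hypothetical helper: shift a vector down by one position.
lag_one <- function(x) {
  # an empty vector has nothing to shift
  if (length(x) == 0) return(x)
  # head(x, -1) drops the last element; NA fills the vacated first slot
  c(NA, head(x, -1))
}

x <- c(10, 20, 30, 40)
lagged <- lag_one(x)

# show the original values next to their lagged counterparts
print(data.frame(x = x, lagged = lagged))
```

Each row of the printed data frame pairs a value with the one that preceded it, and the first row is paired with NA.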
Method 1: Using the dplyr package
The "dplyr" package provides functions for manipulating and transforming data frames. It has to be installed once with install.packages("dplyr") and loaded with library("dplyr") before its functions can be used.
The group_by() function groups the rows of a data frame by a single column or by several columns. When several columns are given, each distinct combination of their values that occurs in the data forms one group.
Syntax:
group_by(.data, ...)
where .data is the data frame (supplied by the pipe in the examples below) and ... is one or more columns to group the data by.
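The following sketch shows how the number of groups depends on the grouping columns. It assumes dplyr is installed; the data frame `df` and its columns `g1` and `g2` are made up for illustration, and n_groups() is used only to count the groups.

```r
library(dplyr)

# illustrative data: the combination (g1 = 2, g2 = "y") never occurs
df <- data.frame(g1 = c(1, 1, 2, 2, 2),
                 g2 = c("x", "y", "x", "x", "x"),
                 stringsAsFactors = FALSE)

by_one <- group_by(df, g1)       # groups: g1 = 1 and g1 = 2
by_two <- group_by(df, g1, g2)   # groups: (1, x), (1, y), (2, x)

print(n_groups(by_one))
print(n_groups(by_two))
```

Here grouping by `g1` alone gives two groups, while grouping by both columns gives three, one for each combination of values that is present in the data.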
mutate() is then applied to the grouped data frame. It adds new columns or modifies existing ones while keeping all other columns. Inside mutate(), a call to lag() computes the lagged values of the specified column. Because the data frame is grouped, the lag is computed separately within each group.
Syntax:
lag(col, n = 1L, default = NA)
Parameters :
- col - The column of the data frame to introduce lagged values in.
- n - (Default : 1) The number of positions to lag by.
- default - (Default : NA) The value used to fill positions that have no earlier row to take a value from.
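To see what the parameters do in isolation, the sketch below calls lag() on a plain vector rather than a data frame column. It assumes dplyr is installed; the vector `x` and the result names are illustrative. The function is written as dplyr::lag() so that it cannot be confused with stats::lag() from base R.

```r
library(dplyr)

x <- c(5, 6, 7, 8)

lag1 <- dplyr::lag(x)                      # shift by one position
lag2 <- dplyr::lag(x, n = 2)               # shift by two positions
lag0 <- dplyr::lag(x, n = 1, default = 0)  # fill the gap with 0 instead of NA

print(lag1)   # NA  5  6  7
print(lag2)   # NA NA  5  6
print(lag0)   #  0  5  6  7
```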
Within each group, the first row has no previous value, so its lagged value is NA. Every following row receives the value of the previous row in the same group.
The result is a grouped tibble, a table-like structure whose printed form also reports the grouping columns, the number of groups and the class of each column. The output below shows <fct> because it was produced with a version of R older than 4.0.0, where data.frame() converts strings to factors by default; in R 4.0.0 and later the same columns are shown as <chr>.
Example 1:
R
library("dplyr")
# creating a data frame: col1 is the grouping column, col2 holds the values
data_frame <- data.frame(col1 = rep(c(1:3), each = 3),
                         col2 = letters[1:3]
)
print("Original DataFrame")
print(data_frame)
# group by col1, then lag col2 by one position within each group
data_mod <- data_frame %>%
  group_by(col1) %>%
  dplyr::mutate(laggedval = dplyr::lag(col2, n = 1, default = NA))
print("Modified Data")
print(data_mod)
Output
[1] "Original DataFrame"
  col1 col2
1    1    a
2    1    b
3    1    c
4    2    a
5    2    b
6    2    c
7    3    a
8    3    b
9    3    c
[1] "Modified Data"
# A tibble: 9 x 3
# Groups:   col1 [3]
   col1 col2  laggedval
  <int> <fct> <fct>
1     1 a     NA
2     1 b     a
3     1 c     b
4     2 a     NA
5     2 b     a
6     2 c     b
7     3 a     NA
8     3 b     a
9     3 c     b
Grouping can also be based on multiple columns. In that case, each distinct combination of values in those columns that occurs in the data forms its own group, and the lag is computed within each of these groups.
Example 2:
R
library("dplyr")
# creating a data frame: col1 and col3 are the grouping columns
data_frame <- data.frame(col1 = rep(c(1:3), each = 3),
                         col2 = letters[1:3],
                         col3 = c(1, 4, 1, 2, 2, 2, 1, 2, 2))
print("Original DataFrame")
print(data_frame)
# group by col1 and col3, then lag col2 within each group
data_mod <- data_frame %>%
  group_by(col1, col3) %>%
  dplyr::mutate(laggedval = dplyr::lag(col2, n = 1, default = NA))
print("Modified Data")
print(data_mod)
Output
[1] "Original DataFrame"
  col1 col2 col3
1    1    a    1
2    1    b    4
3    1    c    1
4    2    a    2
5    2    b    2
6    2    c    2
7    3    a    1
8    3    b    2
9    3    c    2
[1] "Modified Data"
# A tibble: 9 x 4
# Groups:   col1, col3 [5]
   col1 col2   col3 laggedval
  <int> <fct> <dbl> <fct>
1     1 a         1 NA
2     1 b         4 NA
3     1 c         1 a
4     2 a         2 NA
5     2 b         2 a
6     2 c         2 b
7     3 a         1 NA
8     3 b         2 NA
9     3 c         2 b
Method 2: Using duplicated()
First, the number of rows in the data frame is obtained with nrow(). The values of the column to be lagged are then extracted without the last row, and NA is placed in front of them. The result is a vector that starts with one missing value (for the first row) followed by the column's values shifted down by one position, so each row holds the value from the row above it.
The first row of each group is then identified by applying !duplicated() to the grouping column, and its lagged value is set to NA so that no value carries over from the previous group. The result is stored in a new column of the data frame. This approach assumes that the rows of each group are adjacent, as they are in the example below.
Example:
R
# creating a data frame
data_frame <- data.frame(col1 = rep(c(1:3), each = 3),
                         col2 = letters[1:3]
)
print("Original DataFrame")
print(data_frame)
# negative index that drops the last row
last_row <- -nrow(data_frame)
# col2 values without the last row, converted to character
excl_last_row <- as.character(data_frame$col2[last_row])
# prepend NA so that every value moves down by one row
data_frame$lag_value <- c(NA, excl_last_row)
# set the first row of each group to NA
data_frame$lag_value[which(!duplicated(data_frame$col1))] <- NA
print("Modified Data")
print(data_frame)
Output
[1] "Original DataFrame"
  col1 col2
1    1    a
2    1    b
3    1    c
4    2    a
5    2    b
6    2    c
7    3    a
8    3    b
9    3    c
[1] "Modified Data"
  col1 col2 lag_value
1    1    a      <NA>
2    1    b         a
3    1    c         b
4    2    a      <NA>
5    2    b         a
6    2    c         b
7    3    a      <NA>
8    3    b         a
9    3    c         b
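The shifted vector in Method 2 takes each value from the row directly above, so it gives a per-group lag only when the rows of a group sit next to each other. As an alternative sketch for data where groups are interleaved, base R's ave() can apply a shift within each group. This is a different technique from the duplicated() approach; the data frame `df` and the helper `shift_one` are illustrative names, and a numeric value column is assumed.

```r
# Hypothetical helper: shift a vector down by one position.
shift_one <- function(v) {
  if (length(v) == 0) return(v)
  c(NA, head(v, -1))
}

# illustrative data with interleaved groups
df <- data.frame(group = c(1, 2, 1, 2, 1),
                 value = c(10, 20, 30, 40, 50))

# ave() applies shift_one to the values of each group and
# returns the results in the original row order
df$lag_value <- ave(df$value, df$group, FUN = shift_one)
print(df)
```

In this data, group 1 occupies rows 1, 3 and 5, so its lagged values are NA, 10 and 30, while group 2 in rows 2 and 4 gets NA and 20.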