How to merge data in R using R merge, dplyr or data.table

Last Updated : 25 Apr, 2025

Merging data is a common task in data analysis and data manipulation. It enables to combine information from different sources based on shared keys, creating richer datasets for exploration and modeling. Choosing the right merge method lets one balance speed, flexibility and ease of use.

Different Methods to Merge Data

We will explore three most common methods used in R programming language to merge data.

1. Using merge() function

The merge() function in R helps us to combine two or more data frames based on common columns. It performs various types of joins such as inner join, left join, right join and full join.

Syntax:
merged_df <- merge(x,y,by = "common_column",..)
'x' and 'y' are the data frames that you want to merge.
'by' specifies the common columns on which the merge will be performed.
Additional arguments like 'all.x',all.y' and 'all' control the type of join that is to be performed.

Example:

Consider two data frames df1 and df2

df1 <- data.frame(ID = c(1, 2, 3, 4),
                  Name = c("A", "B", "C", "D"),
                  Age = c(25, 30, 35, 40))

df2 <- data.frame(ID = c(2, 3, 4, 5),
                  Occupation = c("Engineer", "Teacher", "Doctor", "Lawyer"),
                  Salary = c(5000, 4000, 6000, 7000))

Various types of joins using the 'merge()' function

There are four types of Joins that can be done using the merge() function.

1. Inner join (default behavior):

inner_join <- merge(df1, df2, by = "ID")
print(inner_join)

Output:

ID Name Age Occupation Salary
1 2 B 30 Engineer 5000
2 3 C 35 Teacher 4000
3 4 D 40 Doctor 6000

The resulting inner_join dataframe will only include the common rows where 'ID' is present in both df1 and df2.

2. Left join('all.x=TRUE'):

left_join <- merge(df1, df2, by = "ID", all.x = TRUE)
print (left_join)

Output:

ID Name Age Occupation Salary
1 1 A 25 <NA> NA
2 2 B 30 Engineer 5000
3 3 C 35 Teacher 4000
4 4 D 40 Doctor 6000

The resulting left_join data frame will include all rows from df1 and the matching rows from df2. Non matching rows from df2 will have an NA value

3. Right join ('all.y=TRUE'):

right_join <- merge(df1, df2, by = "ID", all.y = TRUE)
print(right_join)

Output:

ID Name Age Occupation Salary
1 2 B 30 Engineer 5000
2 3 C 35 Teacher 4000
3 4 D 40 Doctor 6000
4 5 <NA> NA Lawyer 7000

The resulting right_join data frame will include all rows from df2 and the matching rows from df1. Non matching rows from df1 will have NA values.

4. Full outer join('all =TRUE')

full_join <- merge(df1, df2, by = "ID", all = TRUE)
print(full_join)

Output:

ID Name Age Occupation Salary
1 1 A 25 <NA> NA
2 2 B 30 Engineer 5000
3 3 C 35 Teacher 4000
4 4 D 40 Doctor 6000
5 5 <NA> NA Lawyer 7000

The resulting full_join data frame will include all rows from both df1 and df2. Non matching values will have NA values.

2. Using 'dplyr' package:

The 'dplyr' package provides a set of functions for data manipulation, including merging data frames. The primary function for merging in dplyr is join() function, which supports various types of joins.

Syntax:
merged_df<- join(x,y,by="common_column",type="type_of_join")
'x' and 'y' are the data frames to be merged.
'by' specifies the common columns on which the merge is to be performed
'type_of_join' can be 'inner', 'left',' right' or 'full' to specify the type of join.

Example:

Install the dplyr() package and create two data frames, df1 and df2.

install.packages("dplyr")
library(dplyr)

df1 <- data.frame(ID = c(1, 2, 3, 4),
                  Name = c("A", "B", "C", "D"),
                  Age = c(20, 30, 40, 50))

df2 <- data.frame(ID = c(2, 3, 4, 5),
                  Occupation = c("Engineer", "Teacher", "Doctor", "Lawyer"),
                  Salary = c(2000, 4000, 6000, 7000))

Various types of joins using the join() function from dplyr

We can perform four types of join , using the join() function from dplyr.

1. Inner join:

inner_join <- inner_join(df1, df2, by = "ID")
print(inner_join)

Output:

ID Name Age Occupation Salary
1 2 B 30 Engineer 2000
2 3 C 40 Teacher 4000
3 4 D 50 Doctor 6000

The resulting inner_join data frame will only include the common rows where 'ID' is present in both df1 and df2.

2. Left join:

left_join <- left_join(df1, df2, by = "ID")
print(left_join)

Output:

ID Name Age Occupation Salary
1 1 A 20 <NA> NA
2 2 B 30 Engineer 2000
3 3 C 40 Teacher 4000
4 4 D 50 Doctor 6000

The resulting left_join data frame will include all rows from df1 and the matching rows from df2. Non matching rows from df2 will have NA values.

3. Right join:

right_join <- right_join(df1, df2, by = "ID")
print(right_join)

Output:

ID Name Age Occupation Salary
1 2 B 30 Engineer 2000
2 3 C 40 Teacher 4000
3 4 D 50 Doctor 6000
4 5 <NA> NA Lawyer 7000

The resulting right_join dataframe will include all rows from df2 and the matching rows from df1. Non matching rows of df1 will have NA values.

4. Full outer join:

full_join <- full_join(df1, df2, by = "ID")
print(full_join)

Output:

ID Name Age Occupation Salary
1 1 A 20 <NA> NA
2 2 B 30 Engineer 2000
3 3 C 40 Teacher 4000
4 4 D 50 Doctor 6000
5 5 <NA> NA Lawyer 7000

The resulting full_join data frame will include all rows from both df1 and df2. Non matching rows will have NA values.

3. Using data.table package:

The data.table package offers an efficient and fast approach to data manipulation. It provides the merge()' function. It is similar to the one in R but optimized for speed.

Syntax:
merged_dt <- merge(x, y, by = "common_column", ...)
'x' and 'y' are the data frames that are to be merged.
'by' specifies the common columns on which the merge will be performed.
Additional arguments like 'all.x', 'all.y' and 'all' that controls the type of join.

Example:

Install the data.table library and create two data tables, df1 and df2.

install.packages("data.table")
library(data.table)

df1 <- data.table(ID = c(1, 2, 3, 4),
                  Name = c("A", "B", "C", "D"),
                  Age = c(25, 30, 35, 40))
df2 <- data.table(ID = c(2, 3, 4, 5),
                  Occupation = c("Engineer", "Teacher", "Doctor", "Lawyer"),
                  Salary = c(5000, 4000, 6000, 7000))

Various types of merges using the merge() function from data.table package

We can perform four types of join , using the merge() function from data.table.

1. Inner join( default behaviour):

inner_join <- merge(df1, df2, by = "ID")
print(inner_join)

Output:

ID Name Age Occupation Salary
1 2 B 30 Engineer 2000
2 3 C 40 Teacher 4000
3 4 D 50 Doctor 6000

The resulting inner_join data frame will only include the common rows where 'ID' is present in both df1 and df2.

2. Left join( 'all.x = TRUE'):

left_join <- merge(df1, df2, by = "ID", all.x = TRUE)
print(left_join)

Output:

ID Name Age Occupation Salary
1 1 A 20 <NA> NA
2 2 B 30 Engineer 2000
3 3 C 40 Teacher 4000
4 4 D 50 Doctor 6000

The resulting left_join data frame will include all Non matching from df1 and the matching rows from df2. Non matching rows from df2 will have NA values.

3. Right join( 'all.y = TRUE'):

right_join <- merge(df1, df2, by = "ID", all.y = TRUE)
print(right_join)

Output:

ID Name Age Occupation Salary
1 2 B 30 Engineer 2000
2 3 C 40 Teacher 4000
3 4 D 50 Doctor 6000
4 5 <NA> NA Lawyer 7000

The resulting right_join data frame will include all Non matching rows from df2 and the matching rows from df1. Non matching rows from df1 will have NA values.

3. Full outer join( 'all = TRUE'):

full_join <- merge(df1, df2, by = "ID", all = TRUE)
print(full_join)

Output:

ID Name Age Occupation Salary
1 1 A 20 <NA> NA
2 2 B 30 Engineer 2000
3 3 C 40 Teacher 4000
4 4 D 50 Doctor 6000
5 5 <NA> NA Lawyer 7000

The resulting full_join data frame will include all Non matching rows from both df1 and df2. Non matching rows will have NA values.

In this article, we explored three approaches to merging data frames in R: base R’s merge(), dplyr’s join functions and data.table’s fast join.

How to merge data in R using R merge, dplyr or data.table

roshnichoppala6

Improve

Article Tags :

How to merge data in R using R merge, dplyr or data.table

Different Methods to Merge Data

1. Using merge() function

Example:

Various types of joins using the 'merge()' function

1. Inner join (default behavior):

2. Left join('all.x=TRUE'):

3. Right join ('all.y=TRUE'):

4. Full outer join('all =TRUE')

2. Using 'dplyr' package:

Example:

Various types of joins using the join() function from dplyr

1. Inner join:

2. Left join:

3. Right join:

4. Full outer join:

3. Using data.table package:

Example:

Various types of merges using the merge() function from data.table package

1. Inner join( default behaviour):

2. Left join( 'all.x = TRUE'):

3. Right join( 'all.y = TRUE'):

3. Full outer join( 'all = TRUE'):

Similar Reads

Thank You!

What kind of Experience do you want to share?