Identify and Remove Duplicate Data in R

Last Updated : 23 Apr, 2025

A dataset can contain duplicate values. To keep it accurate and free of redundancy, duplicates need to be identified and removed. In this article, we will see how to identify and remove duplicate data in R. We first check whether duplicates are present and, if they are, remove them.

Identifying Duplicate Data in a vector

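We can use the duplicated() function to find duplicate values in a vector. It returns a logical vector that is TRUE for every element that repeats an earlier element, so the first occurrence of a value is not flagged. Because TRUE counts as 1, wrapping the result in sum() gives the number of duplicate values.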

vec <- c(1, 2, 3, 4, 4, 5)

duplicated(vec)

sum(duplicated(vec))

Output:

[1] FALSE FALSE FALSE FALSE  TRUE FALSE

[1] 1
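Beyond counting, it can help to see which values and positions are flagged. The following is a minimal sketch using only base R. The variable names dup_values, dup_positions and all_copies are illustrative and not part of the original example.

```r
vec <- c(1, 2, 3, 4, 4, 5)

# Values flagged as duplicates (second and later occurrences only)
dup_values <- vec[duplicated(vec)]

# Positions of the flagged elements
dup_positions <- which(duplicated(vec))

# Flag every copy of a repeated value, including the first occurrence,
# by also scanning from the end with fromLast = TRUE
all_copies <- duplicated(vec) | duplicated(vec, fromLast = TRUE)

print(dup_values)
print(dup_positions)
print(vec[all_copies])
```

Combining the forward and backward scans is one way to inspect all copies of a repeated value before deciding which one to keep.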

Removing Duplicate Data in a vector

We can remove duplicate data from a vector with the unique() function, which returns each value only once, in order of first appearance.

vec <- c(1, 2, 3, 4, 4, 5)

unique(vec)

Output:

[1] 1 2 3 4 5
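For a plain vector, the same result can also be obtained by dropping the elements that duplicated() flags. This is a small sketch of that equivalence; the names deduped and same_result are illustrative only.

```r
vec <- c(1, 2, 3, 4, 4, 5)

# Keep only elements not flagged as duplicates (first occurrences survive)
deduped <- vec[!duplicated(vec)]

# Check that this agrees with unique() for this vector
same_result <- identical(deduped, unique(vec))

print(deduped)
print(same_result)
```

The logical-index form is mostly useful when the same mask also has to be applied to another object of the same length.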

Identifying Duplicate Data in a data frame

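To identify duplicates, we use the duplicated() function. For a data frame it returns a logical vector with one element per row, which is TRUE for every row that repeats an earlier row. Wrapping it in sum() gives the count of duplicate rows.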

Syntax: duplicated(dataframe)

Example:

res <- data.frame(name = c("Ram", "Geeta", "John", "Paul",
                           "Cassie", "Geeta", "Paul"),
                  maths = c(7, 8, 8, 9, 10, 8, 9),
                  science = c(5, 7, 6, 8, 9, 7, 8),
                  history = c(7, 7, 7, 7, 7, 7, 7))

res
duplicated(res)
sum(duplicated(res))

Output:

duplicated(res)
[1] FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE

sum(duplicated(res))
[1] 2
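The logical vector can also be used to pull out the duplicated rows themselves. The sketch below assumes the same res data frame as in the example; repeated_rows and all_copies are hypothetical names chosen for illustration.

```r
res <- data.frame(name = c("Ram", "Geeta", "John", "Paul",
                           "Cassie", "Geeta", "Paul"),
                  maths = c(7, 8, 8, 9, 10, 8, 9),
                  science = c(5, 7, 6, 8, 9, 7, 8),
                  history = c(7, 7, 7, 7, 7, 7, 7))

# Rows flagged by duplicated(): the repeats, not the first occurrences
repeated_rows <- res[duplicated(res), ]

# Every row that has at least one identical copy, first occurrences included
all_copies <- res[duplicated(res) | duplicated(res, fromLast = TRUE), ]

print(repeated_rows)
print(all_copies)
```

Here repeated_rows should hold rows 6 and 7, while all_copies should also include rows 2 and 4, the originals they repeat.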

Removing Duplicate Data in a data frame

There are two common ways to remove duplicate rows from a data frame: unique() from base R and distinct() from dplyr.

Method 1: Using unique()

We use unique() to get rows having unique values in our data.

Syntax: unique(dataframe)

Example:

res <- data.frame(name = c("Ram", "Geeta", "John", "Paul",
                           "Cassie", "Geeta", "Paul"),
                  maths = c(7, 8, 8, 9, 10, 8, 9),
                  science = c(5, 7, 6, 8, 9, 7, 8),
                  history = c(7, 7, 7, 7, 7, 7, 7))

res
unique(res)

Output:

    name maths science history
1    Ram     7       5       7
2  Geeta     8       7       7
3   John     8       6       7
4   Paul     9       8       7
5 Cassie    10       9       7
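In base R, duplicated() can also be used for removal, which makes it possible to judge uniqueness on selected columns only. The sketch below assumes the same res data frame; keying on the maths column is a hypothetical choice made for illustration, and dedup_all and dedup_maths are illustrative names.

```r
res <- data.frame(name = c("Ram", "Geeta", "John", "Paul",
                           "Cassie", "Geeta", "Paul"),
                  maths = c(7, 8, 8, 9, 10, 8, 9),
                  science = c(5, 7, 6, 8, 9, 7, 8),
                  history = c(7, 7, 7, 7, 7, 7, 7))

# Drop rows flagged as full-row duplicates; same rows as unique(res)
dedup_all <- res[!duplicated(res), ]

# Keep the first row for each maths score, ignoring the other columns
dedup_maths <- res[!duplicated(res$maths), ]

print(dedup_all)
print(dedup_maths)
```

Note that keying on a single column discards rows that differ elsewhere (John is dropped here because Geeta already has a maths score of 8), so the key columns should be chosen deliberately.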

Method 2: Using distinct()

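The distinct() function comes from the dplyr package, which is part of the tidyverse. Install tidyverse (or dplyr alone) and load it with library() before calling distinct(). The function keeps only the rows with distinct values.

Syntax: distinct(dataframe, ..., .keep_all = FALSE)

Parameter:

dataframe: the data frame in use
...: optional columns used to decide uniqueness; if none are given, all columns are used
.keep_all: if TRUE, keeps all columns, taking the first row for each distinct combination; if FALSE (the default), returns only the columns named in ...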

Example 1: Using distinct function

library(tidyverse)

res <- data.frame(name = c("Ram", "Geeta", "John", "Paul",
                           "Cassie", "Geeta", "Paul"),
                  maths = c(7, 8, 8, 9, 10, 8, 9),
                  science = c(5, 7, 6, 8, 9, 7, 8),
                  history = c(7, 7, 7, 7, 7, 7, 7))

res
distinct(res)

Output:

    name maths science history
1    Ram     7       5       7
2  Geeta     8       7       7
3   John     8       6       7
4   Paul     9       8       7
5 Cassie    10       9       7

Example 2: Printing unique rows in terms of maths column

res <- data.frame(name = c("Ram", "Geeta", "John", "Paul",
                           "Cassie", "Geeta", "Paul"),
                  maths = c(7, 8, 8, 9, 10, 8, 9),
                  science = c(5, 7, 6, 8, 9, 7, 8),
                  history = c(7, 7, 7, 7, 7, 7, 7))

res
distinct(res,maths,.keep_all = TRUE)

Output:

    name maths science history
1    Ram     7       5       7
2  Geeta     8       7       7
3   Paul     9       8       7
4 Cassie    10       9       7
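To illustrate what .keep_all changes, the sketch below calls distinct() without it and then with two key columns. It assumes dplyr is installed and uses the same res data frame; maths_only and name_maths are illustrative names, and the choice of name and maths as key columns is hypothetical.

```r
library(dplyr)

res <- data.frame(name = c("Ram", "Geeta", "John", "Paul",
                           "Cassie", "Geeta", "Paul"),
                  maths = c(7, 8, 8, 9, 10, 8, 9),
                  science = c(5, 7, 6, 8, 9, 7, 8),
                  history = c(7, 7, 7, 7, 7, 7, 7))

# Without .keep_all, only the columns named in distinct() are returned
maths_only <- distinct(res, maths)

# Several columns can be combined to define uniqueness;
# .keep_all = TRUE retains the remaining columns of the first matching row
name_maths <- distinct(res, name, maths, .keep_all = TRUE)

print(maths_only)
print(name_maths)
```

With this data, maths_only should contain a single column with four rows, while name_maths should keep all four columns and five rows, since John and Geeta share a maths score but differ in name.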

In this article, we learned how to identify and remove duplicate values in vectors and data frames in R, using duplicated(), unique() and dplyr's distinct().

devangj9689