Identify and Remove Duplicate Data in R
Last Updated :
23 Apr, 2025
Duplicate values make a dataset redundant and can distort results, so duplicate rows need to be identified and removed. This article shows how to identify and remove duplicate data in R: first we check whether duplicates are present, and if they are, we remove them.
Identifying Duplicate Data in a vector
The duplicated() function returns a logical vector that is TRUE for every element that repeats an earlier element. Passing that result to sum() gives the number of duplicate values, because each TRUE counts as 1.
R
vec <- c(1, 2, 3, 4, 4, 5)
duplicated(vec)
sum(duplicated(vec))
Output:
[1] FALSE FALSE FALSE FALSE TRUE FALSE
[1] 1
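The logical vector from duplicated() can also be used as an index to see which values repeat and where. The following is a minimal sketch using only base R; the variable names (dup_values, dup_positions, first_dup, from_end) are chosen here for illustration.

```r
vec <- c(1, 2, 3, 4, 4, 5)

# Values that repeat an earlier element
dup_values <- vec[duplicated(vec)]

# Positions of those repeated elements
dup_positions <- which(duplicated(vec))

# anyDuplicated() gives the index of the first repeat, or 0 if there is none
first_dup <- anyDuplicated(vec)

# fromLast = TRUE scans from the end, so the earlier copy is flagged instead
from_end <- duplicated(vec, fromLast = TRUE)

dup_values
dup_positions
first_dup
from_end
```

Here the repeated value is 4, found at position 5. With fromLast = TRUE, position 4 is flagged instead, which is useful when the last copy is the one to keep.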
Removing Duplicate Data in a vector
We can remove duplicate data from a vector with the unique() function, which returns each distinct value once, in order of first appearance.
R
vec <- c(1, 2, 3, 4, 4, 5)
unique(vec)
Output:
[1] 1 2 3 4 5
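For a plain vector, the same result can be obtained by negating duplicated() and using it as an index. The sketch below uses a second, unsorted vector (an illustrative value, not taken from the example above) to show that the first occurrence of each value is kept and the original order is preserved.

```r
vec2 <- c(3, 1, 3, 2, 1)

# unique() keeps the first occurrence of each value, in original order
deduped <- unique(vec2)

# Equivalent for a plain vector: drop every element flagged as a repeat
by_index <- vec2[!duplicated(vec2)]

deduped
identical(deduped, by_index)
```

The indexing form is handy when the same logical mask has to be applied to another object of the same length.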
Identifying Duplicate Data in a data frame
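To identify duplicates, we use the duplicated() function, which returns a logical vector with one entry per row: TRUE for each row that repeats an earlier row. Wrapping it in sum() gives the count of duplicate rows.
Syntax: duplicated(dataframe)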
Example:
R
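res <- data.frame(
  name    = c("Ram", "Geeta", "John", "Paul", "Cassie", "Geeta", "Paul"),
  maths   = c(7, 8, 8, 9, 10, 8, 9),
  science = c(5, 7, 6, 8, 9, 7, 8),
  history = c(7, 7, 7, 7, 7, 7, 7)
)
res                    # print the full data frame
duplicated(res)        # TRUE for rows that repeat an earlier row
sum(duplicated(res))   # number of duplicate rows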
Output:
duplicated(res)
[1] FALSE FALSE FALSE FALSE FALSE TRUE TRUE
sum(duplicated(res))
[1] 2
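Before removing anything, it often helps to look at the duplicated rows themselves. The sketch below, in base R only, extracts the repeated rows and then every copy of a duplicated row, including the first occurrence. The names dup_rows and all_copies are illustrative.

```r
res <- data.frame(
  name    = c("Ram", "Geeta", "John", "Paul", "Cassie", "Geeta", "Paul"),
  maths   = c(7, 8, 8, 9, 10, 8, 9),
  science = c(5, 7, 6, 8, 9, 7, 8),
  history = c(7, 7, 7, 7, 7, 7, 7)
)

# Rows that repeat an earlier row (here: rows 6 and 7)
dup_rows <- res[duplicated(res), ]

# Every copy of a duplicated row, including the first occurrence:
# combine a forward scan with a scan from the end
all_copies <- res[duplicated(res) | duplicated(res, fromLast = TRUE), ]

dup_rows
all_copies
```

For this data, all_copies contains rows 2, 4, 6 and 7, that is, both entries for Geeta and both entries for Paul.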
Removing Duplicate Data in a data frame
The following methods remove duplicate rows from a data frame.
Method 1: Using unique()
unique() returns the data frame with duplicate rows removed, keeping the first occurrence of each row.
Syntax: unique(dataframe)
Example:
R
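res <- data.frame(
  name    = c("Ram", "Geeta", "John", "Paul", "Cassie", "Geeta", "Paul"),
  maths   = c(7, 8, 8, 9, 10, 8, 9),
  science = c(5, 7, 6, 8, 9, 7, 8),
  history = c(7, 7, 7, 7, 7, 7, 7)
)
res           # print the full data frame
unique(res)   # keep only the first occurrence of each row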
Output:
name maths science history
1 Ram 7 5 7
2 Geeta 8 7 7
3 John 8 6 7
4 Paul 9 8 7
5 Cassie 10 9 7
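As with vectors, unique() on a data frame gives the same result as indexing with the negated duplicated() mask. The following sketch, in base R only, checks that equivalence and counts how many rows were dropped. The names deduped, by_index and n_removed are illustrative.

```r
res <- data.frame(
  name    = c("Ram", "Geeta", "John", "Paul", "Cassie", "Geeta", "Paul"),
  maths   = c(7, 8, 8, 9, 10, 8, 9),
  science = c(5, 7, 6, 8, 9, 7, 8),
  history = c(7, 7, 7, 7, 7, 7, 7)
)

deduped  <- unique(res)              # duplicate rows removed
by_index <- res[!duplicated(res), ]  # same rows, selected with a logical mask

identical(deduped, by_index)

# Number of rows dropped as duplicates
n_removed <- nrow(res) - nrow(deduped)
n_removed
```

Comparing nrow() before and after is a quick sanity check that the number of removed rows matches sum(duplicated(res)).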
Method 2: Using distinct()
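distinct() comes from the "dplyr" package, which is part of the "tidyverse". Install dplyr (or the whole tidyverse) and load it with library(dplyr) or library(tidyverse) before calling distinct(). We use distinct() to get rows having distinct values in our data.
Syntax: distinct(dataframe, ..., .keep_all = FALSE)
Parameter:
- dataframe: data in use
- ...: optional columns used to decide uniqueness; if omitted, all columns are used
- .keep_all: if TRUE, keeps all columns of the data frame instead of only the columns named in ...; the default is FALSE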
Example 1: Using distinct function
R
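library(tidyverse)   # attaches dplyr, which provides distinct()
res <- data.frame(
  name    = c("Ram", "Geeta", "John", "Paul", "Cassie", "Geeta", "Paul"),
  maths   = c(7, 8, 8, 9, 10, 8, 9),
  science = c(5, 7, 6, 8, 9, 7, 8),
  history = c(7, 7, 7, 7, 7, 7, 7)
)
res             # print the full data frame
distinct(res)   # keep rows that are distinct across all columns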
Output:
name maths science history
1 Ram 7 5 7
2 Geeta 8 7 7
3 John 8 6 7
4 Paul 9 8 7
5 Cassie 10 9 7
Example 2: Keeping rows that are unique in the maths column
R
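library(dplyr)   # provides distinct()
res <- data.frame(
  name    = c("Ram", "Geeta", "John", "Paul", "Cassie", "Geeta", "Paul"),
  maths   = c(7, 8, 8, 9, 10, 8, 9),
  science = c(5, 7, 6, 8, 9, 7, 8),
  history = c(7, 7, 7, 7, 7, 7, 7)
)
res   # print the full data frame
# One row per distinct maths value; .keep_all = TRUE retains the other columns
distinct(res, maths, .keep_all = TRUE)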
Output:
name maths science history
1 Ram 7 5 7
2 Geeta 8 7 7
3 Paul 9 8 7
4 Cassie 10 9 7
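When dplyr is not available, column-based de-duplication can be approximated in base R by applying duplicated() to the key column or columns only. This is a sketch of that alternative, not how distinct() is implemented. The names by_maths and by_pair are illustrative.

```r
res <- data.frame(
  name    = c("Ram", "Geeta", "John", "Paul", "Cassie", "Geeta", "Paul"),
  maths   = c(7, 8, 8, 9, 10, 8, 9),
  science = c(5, 7, 6, 8, 9, 7, 8),
  history = c(7, 7, 7, 7, 7, 7, 7)
)

# Keep the first row for each distinct maths value
by_maths <- res[!duplicated(res$maths), ]
rownames(by_maths) <- NULL   # renumber rows 1..n

# Keep the first row for each distinct (maths, science) pair
by_pair <- res[!duplicated(res[c("maths", "science")]), ]

by_maths
by_pair
```

Both approaches keep the first row seen for each key, so John (maths 8) is dropped in favour of Geeta when only maths is used, but kept when science is added to the key.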
In this article, we learned how to identify and remove duplicate values in vectors and data frames using different approaches in the R programming language.