How to Remove Duplicates by using $unionWith in MongoDB?
Last Updated: 07 May, 2024
Duplicate documents in a MongoDB collection can often lead to inefficiencies and inconsistencies in data management. However, MongoDB provides powerful aggregation features to help us solve such issues effectively.
In this article, we'll explore how to remove duplicates using the $unionWith aggregation stage in MongoDB. We'll cover the concepts, syntax, and practical examples to demonstrate its usage and effectiveness.
Understanding $unionWith
- The $unionWith aggregation stage, available since MongoDB 4.4, combines documents from multiple collections or aggregation pipelines into a single stream of documents.
- It lets us merge the results of different data sources, which is useful for many data processing tasks. $unionWith itself keeps every document, including duplicates, so removing duplicates requires a later stage such as $group.
Syntax of $unionWith:
The syntax of $unionWith is straightforward. Here's how it looks:
{
  $unionWith: {
    coll: "<collection_name>"
  }
}
- $unionWith: The aggregation stage to combine documents from different collections.
- coll: The name of the collection to union documents with.
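Besides coll, the stage also accepts an optional pipeline field, whose stages are applied to the documents of the named collection before they are appended. The sketch below builds such a stage as a plain JavaScript object. The collection name otherUsers and the age filter are assumptions made for illustration only.

```javascript
// Hypothetical example: "otherUsers" and the age filter are assumed, not from the article.
const unionStage = {
  $unionWith: {
    coll: "otherUsers",
    // Optional: stages applied to "otherUsers" before its documents are appended.
    pipeline: [
      { $match: { age: { $gte: 18 } } },
      { $project: { name: 1, email: 1, age: 1 } }
    ]
  }
};

// In mongosh the stage would be passed inside a pipeline, for example:
// db.someCollection.aggregate([unionStage]);
console.log(JSON.stringify(unionStage, null, 2));
```

Filtering or projecting inside pipeline keeps the combined stream smaller than appending the whole collection and trimming it afterwards.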
Example of Removing Duplicates with $unionWith
To see how to remove duplicates with $unionWith, we need two collections with some documents to query. We will use a collection called users and a second collection called collection2, which contain the documents shown below:
users:
[
{
"_id": ObjectId("60f3727c81c1b4e14f252d12"),
"name": "Alice",
"email": "[email protected]",
"age": 30
},
{
"_id": ObjectId("60f3727c81c1b4e14f252d13"),
"name": "Bob",
"email": "[email protected]",
"age": 35
},
{
"_id": ObjectId("60f3727c81c1b4e14f252d14"),
"name": "Charlie",
"email": "[email protected]",
"age": 40
},
{
"_id": ObjectId("60f3727c81c1b4e14f252d15"),
"name": "David",
"email": "[email protected]",
"age": 45
},
{
"_id": ObjectId("60f3727c81c1b4e14f252d16"),
"name": "Eve",
"email": "[email protected]",
"age": 50
},
{
"_id": ObjectId("60f3727c81c1b4e14f252d17"),
"name": "Frank",
"email": "[email protected]",
"age": 55
},
{
"_id": ObjectId("60f3727c81c1b4e14f252d18"),
"name": "Alice",
"email": "[email protected]",
"age": 60
}
]
collection2:
// collection2
[
{
"_id": ObjectId("60f3727c81c1b4e14f252d19"),
"name": "Alice",
"email": "[email protected]",
"age": 65
},
{
"_id": ObjectId("60f3727c81c1b4e14f252d20"),
"name": "Bob",
"email": "[email protected]",
"age": 70
}
]
Example 1: Remove duplicates based on the "name" field
db.users.aggregate([
  { $unionWith: { coll: "collection2" } },
  {
    $group: {
      _id: "$name",
      doc: { $first: "$$ROOT" }
    }
  },
  { $replaceRoot: { newRoot: "$doc" } }
])
Output:
[
{
"_id": ObjectId("60f3727c81c1b4e14f252d12"),
"name": "Alice",
"email": "[email protected]",
"age": 30
},
{
"_id": ObjectId("60f3727c81c1b4e14f252d13"),
"name": "Bob",
"email": "[email protected]",
"age": 35
},
{
"_id": ObjectId("60f3727c81c1b4e14f252d14"),
"name": "Charlie",
"email": "[email protected]",
"age": 40
},
{
"_id": ObjectId("60f3727c81c1b4e14f252d15"),
"name": "David",
"email": "[email protected]",
"age": 45
},
{
"_id": ObjectId("60f3727c81c1b4e14f252d16"),
"name": "Eve",
"email": "[email protected]",
"age": 50
},
{
"_id": ObjectId("60f3727c81c1b4e14f252d17"),
"name": "Frank",
"email": "[email protected]",
"age": 55
}
]
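Explanation: This aggregation pipeline combines the documents from the users collection with those from collection2, groups them by the "name" field, and keeps only the first document encountered for each name. The $replaceRoot stage then promotes the kept document to the top level, so the result contains one document per name. The pipeline only returns a deduplicated result; it does not delete anything from either collection. Because $first depends on the order in which documents reach $group, add a $sort stage before $group if a specific document must be kept, and note that $group does not guarantee the order of its output.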
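The three stages can be mimicked in plain JavaScript to make their roles easier to follow. This is only an in-memory model under simplifying assumptions: it uses a shortened copy of the sample data with just name and age, and it never talks to MongoDB.

```javascript
// Plain-JavaScript model of the pipeline; no MongoDB connection is used.
const users = [
  { name: "Alice", age: 30 },
  { name: "Bob", age: 35 },
  { name: "Charlie", age: 40 },
  { name: "Alice", age: 60 }
];
const collection2 = [
  { name: "Alice", age: 65 },
  { name: "Bob", age: 70 }
];

// $unionWith: append the second collection; duplicates are still present here.
const combined = users.concat(collection2);

// $group with $first: remember only the first document seen for each name.
const firstByName = new Map();
for (const doc of combined) {
  if (!firstByName.has(doc.name)) {
    firstByName.set(doc.name, doc);
  }
}

// $replaceRoot: emit the remembered documents themselves.
const deduplicated = Array.from(firstByName.values());

console.log(combined.length); // 6
console.log(deduplicated);    // Alice (30), Bob (35), Charlie (40)
```

A Map keeps insertion order, so this model always returns the names in the order they were first seen. MongoDB's $group makes no such promise about output order.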
Example 2: Remove duplicates based on the "email" field
To remove duplicates based on the "email" field, modify the $group stage in the aggregation pipeline:
db.users.aggregate([
  { $unionWith: { coll: "collection2" } },
  {
    $group: {
      _id: "$email",
      doc: { $first: "$$ROOT" }
    }
  },
  { $replaceRoot: { newRoot: "$doc" } }
])
Output:
[
{
"_id": ObjectId("60f3727c81c1b4e14f252d12"),
"name": "Alice",
"email": "[email protected]",
"age": 30
},
{
"_id": ObjectId("60f3727c81c1b4e14f252d13"),
"name": "Bob",
"email": "[email protected]",
"age": 35
},
{
"_id": ObjectId("60f3727c81c1b4e14f252d14"),
"name": "Charlie",
"email": "[email protected]",
"age": 40
},
{
"_id": ObjectId("60f3727c81c1b4e14f252d15"),
"name": "David",
"email": "[email protected]",
"age": 45
},
{
"_id": ObjectId("60f3727c81c1b4e14f252d16"),
"name": "Eve",
"email": "[email protected]",
"age": 50
},
{
"_id": ObjectId("60f3727c81c1b4e14f252d17"),
"name": "Frank",
"email": "[email protected]",
"age": 55
}
]
Explanation: This aggregation pipeline merges the documents from the users collection with those from collection2, groups them by the "email" field, and keeps only the first document encountered for each email. The $replaceRoot stage then promotes the kept document to the top level, so the result contains one document per email. As in Example 1, the source collections are not modified.
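Since the two examples differ only in the grouping field, the pipeline can be generated by a small helper. The sketch below is one possible shape, not a prescribed approach: the function name buildDedupePipeline, its parameters, and the target collection users_deduplicated are all hypothetical. It also adds two things the examples above do not use: a $sort stage so that $first picks a predictable document, and an optional $out stage that writes the result to a collection.

```javascript
// Hypothetical helper; function and parameter names are illustrative only.
function buildDedupePipeline(otherCollection, field, outputCollection) {
  const pipeline = [
    { $unionWith: { coll: otherCollection } },
    // Sort first so that $first keeps a predictable document (lowest _id here).
    { $sort: { _id: 1 } },
    { $group: { _id: "$" + field, doc: { $first: "$$ROOT" } } },
    { $replaceRoot: { newRoot: "$doc" } }
  ];
  if (outputCollection) {
    // $out writes the result to a collection, replacing it if it already exists.
    pipeline.push({ $out: outputCollection });
  }
  return pipeline;
}

const byEmail = buildDedupePipeline("collection2", "email");
const byNameSaved = buildDedupePipeline("collection2", "name", "users_deduplicated");

// In mongosh: db.users.aggregate(byEmail);
console.log(JSON.stringify(byEmail, null, 2));
```

Writing to a separate collection with $out leaves users and collection2 untouched, which makes it possible to inspect the deduplicated result before replacing any original data.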
Conclusion
Overall, we explored how to produce a duplicate-free result from MongoDB collections using the $unionWith aggregation stage together with $group and $replaceRoot. We discussed the syntax and walked through two examples, deduplicating by name and by email. $unionWith combines the collections, $group performs the actual deduplication, and the source collections stay unchanged unless the result is written back. As you continue to work with MongoDB, mastering aggregation pipelines and their stages will prove invaluable for various data processing tasks.