I’m working with a MongoDB aggregation pipeline that involves joining two large collections—mainCollection and relatedCollection—using $lookup. The goal is to filter documents based on various fields, including some within the joined data, and then paginate the results.rrawat.com+1medium.com+1
However, the query is experiencing performance issues, taking several seconds to return results. The explain output indicates a collection scan (COLLSCAN) and a blocking sort operation. I’m seeking advice on how to optimize this pipeline to improve performance.
I did try adding the $sort: { createdAt: -1 } stage after the $lookup and $match stages, but it noticeably increased the aggregate execution time by around 4–5 seconds.
I was thinking of creating a temporary collection to store the results after the $lookup, then indexing it to improve aggregate performance — but I’m not sure if storing duplicate data like that is a good approach.
Mainly i need the 6-7 fields from lookup and 4-5 fields from MainCollection, i tried with adding pipeline in lookup to get specific fields but it has increased my Overall Aggegrate run time.
yes i required all $or check because then it will easy for end User to fetch data as ref to multiple fields.
In your case, since it looks like it is an email address field and email addresses are case-insensitive you could spare that extra computing by permanently storing emails in lowercase.
You might need to have searchValue as entered by the user for the other comparison but search with searchValueInLowercase for …fieldD.
Before going to the extent of
I think we can do better but we might need a little more brainstorming.
Most likely because the $addFields and $lookup modifies the documents and $sort cannot determine that whatever index you have with createdAt prefix can be use. Please share the indexes you have on mainCollection.
The $addFields after the $limit seems purely cosmetic, I would try to remove them while your invesitigate. And I am pretty sure the $first is not really needed because foreignField is _id so you should only get 1 document per refId.
We still miss a few details. What are your indexes?
As mentioned by steevej, field fieldD seems to be an email address. In this case it would make sense to convert them to lower case and use equal condition. { $regex: "searchValue", $options: "i" } might be wrong anyway. Mail [email protected] is different to [email protected] but they both match to { $regex: "[email protected]", $options: "i" }
You could try to limit the number of documents in $lookup, for example like this:
Note that if you really want to match sales@… only rather that pre-sales@… you may anchor the regex at the front with ^ and $ at the end. Anchoring at the front might improve performance.