Working with huge datasets - optimizing aggregates with date fields

Hello guys,

I am working with a very big collection: 38m entries in total.

Those entries include a date field which I have indexed in my Mongoose model.

Selecting the last 30 days of data returns 4.7m documents - the request takes 2 minutes to load.

I have tried to optimize my aggregate but can't get it any faster.

The aggregate only includes:

        {
            $match: {
                $and: [
                    { date: { $lte: endDate } },
                    { date: { $gte: startDate } }
                ]
            }
        },

The date field is indexed.

What can I do to improve the performance of queries on that collection?

Best regards,
Damian

Your query is equivalent to
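
        {
            $match: {
                date: { $gte: startDate, $lte: endDate }
            }
        },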

Maybe, just maybe, the fact that you are using the explicit $and rather than the implicit version above causes the optimizer to ignore the index.

Share your indices and the explain plan for further help.


Recreating the aggregate in Compass, the performance is way faster.

Running the aggregate in my Node backend always takes 2 minutes.


Then the bottleneck seems to be data transfer, which is understandable since you transfer 4.7m documents.

You have 2 choices from here:

  1. use $project to weed out fields you do not need in your backend (see the sketch after this list)
  2. leverage the power of aggregation by moving what you do in the backend into a more complex pipeline
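
For option 1, a sketch of what that could look like, assuming the backend only needs the date and a hypothetical amount field:

        db.collection.aggregate([
            { $match: { date: { $gte: startDate, $lte: endDate } } },
            // keep only what the backend actually uses; everything else stays on the server
            { $project: { _id: 0, date: 1, amount: 1 } }
        ])

For option 2, the idea is to push the grouping or summing itself into the pipeline (for example with $group), so only the aggregated result crosses the wire.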

You are right - my mistake.

I have changed the date index to descending to try and squeeze more speed out of the query, but it didn't do anything.
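
For reference, the index is now essentially this (collection name assumed):

        db.collection.createIndex({ date: -1 })

Here is the explain output for the date-range query with that index: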

        "executionStats": {
            "executionSuccess": true,
            "nReturned": 5376753,
            "executionTimeMillis": 13878,
            "totalKeysExamined": 5376753,
            "totalDocsExamined": 5376753,
            "executionStages": {
                "isCached": false,
                "stage": "FETCH",
                "nReturned": 5376753,
                "executionTimeMillisEstimate": 13428,
                "works": 5376754,
                "advanced": 5376753,
                "needTime": 0,
                "needYield": 0,
                "saveState": 741,
                "restoreState": 741,
                "isEOF": 1,
                "docsExamined": 5376753,
                "alreadyHasObj": 0,
                "inputStage": {
                    "stage": "IXSCAN",
                    "nReturned": 5376753,
                    "executionTimeMillisEstimate": 3647,
                    "works": 5376754,
                    "advanced": 5376753,
                    "needTime": 0,
                    "needYield": 0,
                    "saveState": 741,
                    "restoreState": 741,
                    "isEOF": 1,
                    "keyPattern": {
                        "date": -1
                    },
                    "indexName": "date_-1",
                    "isMultiKey": false,
                    "multiKeyPaths": {
                        "date": []
                    },
                    "isUnique": false,
                    "isSparse": false,
                    "isPartial": false,
                    "indexVersion": 2,
                    "direction": "forward",
                    "indexBounds": {
                        "date": [
                            "[new Date(1748303999999), new Date(1745837512346)]"
                        ]
                    },
                    "keysExamined": 5376753,
                    "seeks": 1,
                    "dupsTested": 0,
                    "dupsDropped": 0
                }
            },
            "allPlansExecution": []
        },

The FETCH stage reports about 13.4s in total, of which the IXSCAN itself accounts for about 3.6s.

Is there any chance to improve the speed of queries selecting data by date ranges?

If FETCH takes 13 seconds, it means your working set does not fit in RAM.

It means you are consuming too much data for your use-case.

For the $project approach suggested above to really pay off, you would need an index that makes your query a covered query.
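
A sketch of what that could look like, again assuming a hypothetical amount field is all the backend needs: the compound index contains every field the $project returns, and _id is excluded, so the plan can skip the FETCH stage entirely.

        db.collection.createIndex({ date: -1, amount: 1 })

        db.collection.aggregate([
            { $match: { date: { $gte: startDate, $lte: endDate } } },
            // _id must be excluded, otherwise the query is no longer covered by the index
            { $project: { _id: 0, date: 1, amount: 1 } }
        ])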

For date-based use-cases, especially if past data is not mutable, the Computed Pattern is often a good solution.
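
As a sketch of that pattern, assuming again a hypothetical amount field and a summary collection named daily_totals: a job runs once per finished day and pre-aggregates it, so the 30-day report reads roughly 30 small documents instead of millions ($dateTrunc requires MongoDB 5.0+).

        db.collection.aggregate([
            { $match: { date: { $gte: dayStart, $lt: dayEnd } } },
            // one rollup document per day
            { $group: {
                _id: { $dateTrunc: { date: "$date", unit: "day" } },
                total: { $sum: "$amount" },
                count: { $sum: 1 }
            } },
            // upsert the rollup into the summary collection
            { $merge: { into: "daily_totals", on: "_id", whenMatched: "replace", whenNotMatched: "insert" } }
        ])

The 30-day report then becomes a simple find on daily_totals rather than a scan over millions of raw documents.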

I am removing this thread from my bookmarks due to the lack of follow-ups.