Fix nbtree cleanup-only VACUUM stats inaccuracies.
author      Peter Geoghegan <[email protected]>
            Thu, 5 Nov 2020 02:42:27 +0000 (18:42 -0800)
committer   Peter Geoghegan <[email protected]>
            Thu, 5 Nov 2020 02:42:27 +0000 (18:42 -0800)
Logic for counting heap TIDs from posting list tuples (added by commit
0d861bbb) was faulty.  It didn't count any TIDs/index tuples when no
callback was set.  This meant that we incorrectly counted zero index
tuples in cleanup-only VACUUMs, which could lead to pg_class.reltuples
being spuriously set to 0 in affected indexes.

To fix, go back to counting items from the page in cases where there is
no callback.  This approach isn't very accurate, but it works well
enough in practice while avoiding the expense of accessing every index
tuple during cleanup-only VACUUMs.

Author: Peter Geoghegan <[email protected]>
Reported-By: Jehan-Guillaume de Rorthais <[email protected]>
Discussion: https://postgr.es/m/20201023174451.69e358f1@firost
Backpatch: 13-, where nbtree deduplication was introduced
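
For illustration only, here is a minimal standalone C sketch of the counting
rule the fix restores.  PageItem, count_page, and the per-item callback are
simplified stand-ins invented for this sketch; they are not nbtree's
IndexTuple, btvacuumpage accounting, or IndexBulkDeleteCallback (which is
invoked per heap TID, not per index tuple).

#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

typedef struct PageItem
{
    int     nhtids;     /* heap TIDs in this index tuple; > 1 means posting list */
} PageItem;

/* stand-in for the bulk-delete callback; returns true if the item is deletable */
typedef bool (*DeletableCallback) (const PageItem *item);

/*
 * What one leaf page contributes to num_index_tuples.  With a callback
 * (regular VACUUM) we count surviving heap TIDs; without one (cleanup-only
 * scan) we simply count the items on the page, which is cheap but inexact
 * for posting list tuples -- and still far better than counting nothing,
 * which is what the bug did.
 */
static double
count_page(const PageItem *items, size_t nitems, DeletableCallback callback)
{
    double      ntuples = 0;

    if (callback == NULL)
        return (double) nitems;     /* the cleanup-only rule the fix restores */

    for (size_t i = 0; i < nitems; i++)
    {
        if (!callback(&items[i]))
            ntuples += items[i].nhtids;
    }
    return ntuples;
}

static bool
nothing_deletable(const PageItem *item)
{
    (void) item;
    return false;
}

int
main(void)
{
    PageItem    page[] = {{1}, {3}, {1}};   /* one posting list with 3 TIDs */

    printf("cleanup-only scan counts %.0f\n",
           count_page(page, 3, NULL));              /* 3 items */
    printf("regular VACUUM counts %.0f\n",
           count_page(page, 3, nothing_deletable)); /* 5 heap TIDs */
    return 0;
}

The tradeoff is the one described above: the no-callback path accepts some
inaccuracy for posting list tuples in exchange for not having to examine
every tuple on every page during cleanup-only VACUUMs.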

src/backend/access/nbtree/nbtree.c

index c822b49a71022d8b6b7b5e5baaa58a9ec1640a8b..c4f22f1c694ff306f3f3071228a00d61b4ac69d0 100644
@@ -933,6 +933,12 @@ btvacuumcleanup(IndexVacuumInfo *info, IndexBulkDeleteResult *stats)
     * double-counting some index tuples, so disbelieve any total that exceeds
     * the underlying heap's count ... if we know that accurately.  Otherwise
     * this might just make matters worse.
+    *
+    * Posting list tuples are another source of inaccuracy.  Cleanup-only
+    * btvacuumscan calls assume that the number of index tuples can be used
+    * as num_index_tuples, even though num_index_tuples is supposed to
+    * represent the number of TIDs in the index.  This naive approach can
+    * underestimate the number of tuples in the index.
     */
    if (!info->estimated_count)
    {
@@ -1394,11 +1400,18 @@ backtrack:
         * separate live tuples).  We don't delete when backtracking, though,
         * since that would require teaching _bt_pagedel() about backtracking
         * (doesn't seem worth adding more complexity to deal with that).
+        *
+        * We don't count the number of live TIDs during cleanup-only calls to
+        * btvacuumscan (i.e. when callback is not set).  We count the number
+        * of index tuples directly instead.  This avoids the expense of
+        * directly examining all of the tuples on each page.
         */
        if (minoff > maxoff)
            attempt_pagedel = (blkno == scanblkno);
-       else
+       else if (callback)
            stats->num_index_tuples += nhtidslive;
+       else
+           stats->num_index_tuples += maxoff - minoff + 1;
 
        Assert(!attempt_pagedel || nhtidslive == 0);
    }
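
The underestimate mentioned in the new btvacuumcleanup comment can be made
concrete with a toy, self-contained C example (the page contents are invented
for illustration and are not taken from the commit): once deduplication has
merged duplicate entries into posting list tuples, a page's item count can sit
well below its heap TID count, and that smaller number is what a cleanup-only
VACUUM now reports.

#include <stdio.h>

int
main(void)
{
    /* heap TIDs carried by each index tuple on one hypothetical leaf page */
    int     nhtids[] = {1, 1, 5, 1, 4, 1, 1, 6, 1, 1};
    int     nitems = (int) (sizeof(nhtids) / sizeof(nhtids[0]));
    int     ntids = 0;

    for (int i = 0; i < nitems; i++)
        ntids += nhtids[i];

    /* the cleanup-only rule credits the page with nitems, not ntids */
    printf("items counted: %d, heap TIDs present: %d\n", nitems, ntids);    /* 10 vs 22 */
    return 0;
}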