Fix incorrect row estimates used for Memoize costing
author David Rowley
Mon, 16 May 2022 04:07:56 +0000 (16:07 +1200)
committer David Rowley
Mon, 16 May 2022 04:07:56 +0000 (16:07 +1200)
In order to estimate the cache hit ratio of a Memoize node, one of the
inputs we require is the estimated number of times the Memoize node will
be rescanned.  The higher this number, the larger the cache hit ratio is
likely to become.  Unfortunately, the value being passed as the number of
"calls" to the Memoize was incorrectly using the Nested Loop's
outer_path->parent->rows instead of outer_path->rows.  This failed to
account for the fact that the outer_path might be parameterized by some
upper-level Nested Loop.
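
To illustrate why the "calls" input matters, here is a minimal sketch of a
hit-ratio estimate.  It is a simplified model written for this explanation,
not the planner's costing code; the function name sketch_hit_ratio and its
parameters are hypothetical.  It assumes the first lookup of each distinct
key is a miss and that only a fraction of the keys may fit in the cache.

```c
#include <stdio.h>

/*
 * Hypothetical helper, not planner code.
 *
 * calls             expected number of rescans of the Memoize node
 * ndistinct         expected number of distinct cache keys
 * est_cache_entries number of entries assumed to fit in memory
 */
static double
sketch_hit_ratio(double calls, double ndistinct, double est_cache_entries)
{
    double cached_fraction;
    double hit_ratio;

    if (calls <= 0.0 || ndistinct <= 0.0)
        return 0.0;

    /* Fraction of the distinct keys that can be cached at once. */
    cached_fraction = (est_cache_entries < ndistinct)
        ? est_cache_entries / ndistinct
        : 1.0;

    /* The first lookup of each distinct key can never be a hit. */
    hit_ratio = ((calls - ndistinct) / calls) * cached_fraction;

    return (hit_ratio < 0.0) ? 0.0 : hit_ratio;
}
```

In this model, with 100 distinct keys that all fit in the cache, 200 calls
give an estimated hit ratio of 0.5 while 10000 calls give 0.99, so an
inflated "calls" estimate makes the cache look far more effective.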

This problem could make Memoize plans appear more favorable than they
actually were.  With large work_mem values it could also lengthen executor
startup, because the planner set an overly large MemoizePath->est_entries
and the Memoize hash table was initially built much larger than required.
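
The effect on the hash table size can be sketched as follows.  This is a
simplified model under stated assumptions, not the planner's code: the
function sketch_est_entries and its parameters are hypothetical, and
clamping the distinct-key estimate to the number of calls stands in for the
planner's distinct-value estimation, which cannot exceed its input row
count.

```c
#include <stdio.h>

/*
 * Hypothetical helper, not planner code.
 *
 * calls           expected number of rescans of the Memoize node
 * ndistinct_keys  distinct cache key values in the underlying data
 * hash_mem_bytes  memory budget for the cache
 * est_entry_bytes estimated size of one cache entry
 */
static double
sketch_est_entries(double calls, double ndistinct_keys,
                   double hash_mem_bytes, double est_entry_bytes)
{
    double est_cache_entries;
    double ndistinct;

    if (est_entry_bytes <= 0.0)
        return 0.0;

    /* Upper limit on entries that fit in memory (truncated to a whole number). */
    est_cache_entries = (double) (long long) (hash_mem_bytes / est_entry_bytes);

    /* Assumption: we cannot see more distinct keys than we have calls. */
    ndistinct = (ndistinct_keys < calls) ? ndistinct_keys : calls;

    /* Size the table for the smaller of the two estimates. */
    return (ndistinct < est_cache_entries) ? ndistinct : est_cache_entries;
}
```

With a large memory budget the memory limit rarely binds, so in this model
the entry estimate is driven by the "calls" input: 10 calls yield 10
entries, whereas an overestimate of 10000 calls against 1000 distinct keys
yields 1000.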

Fix this simply by passing outer_path->rows rather than
outer_path->parent->rows.  Also, adjust the expected regression test
output for a plan change.

Reported-by: Pavel Stehule
Author: David Rowley
Discussion: https://postgr.es/m/CAFj8pRAMp%3DQsMi6sPQJ4W3hczoFJRvyXHJV3AZAZaMyTVM312Q%40mail.gmail.com
Backpatch-through: 14, where Memoize was introduced

src/backend/optimizer/path/joinpath.c
src/test/regress/expected/join.out

index 55206ec54d2bd43f4027671e52d7f1cf00b37dd6..2a3f0ab7bfc8f46f9d0b82db8f09b2d217abf125 100644
@@ -610,7 +610,7 @@ get_memoize_path(PlannerInfo *root, RelOptInfo *innerrel,
                                            hash_operators,
                                            extra->inner_unique,
                                            binary_mode,
-                                           outer_path->parent->rows);
+                                           outer_path->rows);
    }
 
    return NULL;
index bf1a2db2cf08e0b31ec447f9686af5dfe0543179..bd3375f2bae0858380198190f69fd0a161ff1763 100644
@@ -3673,8 +3673,8 @@ select * from tenk1 t1 left join
   (tenk1 t2 join tenk1 t3 on t2.thousand = t3.unique2)
   on t1.hundred = t2.hundred and t1.ten = t3.ten
 where t1.unique1 = 1;
-                          QUERY PLAN                          
---------------------------------------------------------------
+                       QUERY PLAN                       
+--------------------------------------------------------
  Nested Loop Left Join
    ->  Index Scan using tenk1_unique1 on tenk1 t1
          Index Cond: (unique1 = 1)
@@ -3684,20 +3684,17 @@ where t1.unique1 = 1;
                Recheck Cond: (t1.hundred = hundred)
                ->  Bitmap Index Scan on tenk1_hundred
                      Index Cond: (hundred = t1.hundred)
-         ->  Memoize
-               Cache Key: t2.thousand
-               Cache Mode: logical
-               ->  Index Scan using tenk1_unique2 on tenk1 t3
-                     Index Cond: (unique2 = t2.thousand)
-(14 rows)
+         ->  Index Scan using tenk1_unique2 on tenk1 t3
+               Index Cond: (unique2 = t2.thousand)
+(11 rows)
 
 explain (costs off)
 select * from tenk1 t1 left join
   (tenk1 t2 join tenk1 t3 on t2.thousand = t3.unique2)
   on t1.hundred = t2.hundred and t1.ten + t2.ten = t3.ten
 where t1.unique1 = 1;
-                          QUERY PLAN                          
---------------------------------------------------------------
+                       QUERY PLAN                       
+--------------------------------------------------------
  Nested Loop Left Join
    ->  Index Scan using tenk1_unique1 on tenk1 t1
          Index Cond: (unique1 = 1)
@@ -3707,12 +3704,9 @@ where t1.unique1 = 1;
                Recheck Cond: (t1.hundred = hundred)
                ->  Bitmap Index Scan on tenk1_hundred
                      Index Cond: (hundred = t1.hundred)
-         ->  Memoize
-               Cache Key: t2.thousand
-               Cache Mode: logical
-               ->  Index Scan using tenk1_unique2 on tenk1 t3
-                     Index Cond: (unique2 = t2.thousand)
-(14 rows)
+         ->  Index Scan using tenk1_unique2 on tenk1 t3
+               Index Cond: (unique2 = t2.thousand)
+(11 rows)
 
 explain (costs off)
 select count(*) from