Re: Partial aggregates pushdown - Mailing list pgsql-hackers
From | Alexander Pyhalov |
---|---|
Subject | Re: Partial aggregates pushdown |
Date | |
Msg-id | [email protected] Whole thread Raw |
In response to | RE: Partial aggregates pushdown ("[email protected]" <[email protected]>) |
Responses |
Re: Partial aggregates pushdown
|
List | pgsql-hackers |
[email protected] писал 2023-07-19 03:43: > Hi Mr.Pyhalov, hackers. > 3) > I modified the patch to safely do a partial aggregate pushdown for > queries which contain having clauses. > Hi. Sorry, but I don't see how it could work. For example, the attached test returns wrong result: CREATE FUNCTION f() RETURNS INT AS $$ begin return 10; end $$ LANGUAGE PLPGSQL; SELECT b, sum(a) FROM pagg_tab GROUP BY b HAVING sum(a) < f() ORDER BY 1; b | sum ----+----- 0 | 0 10 | 0 20 | 0 30 | 0 40 | 0 +(5 rows) In fact the above query should have returned 0 rows, as SELECT b, sum(a) FROM pagg_tab GROUP BY b ORDER BY 1; b | sum ----+------ 0 | 600 1 | 660 2 | 720 3 | 780 4 | 840 5 | 900 6 | 960 7 | 1020 8 | 1080 9 | 1140 10 | 600 11 | 660 12 | 720 .... shows no such rows. Or, on the same data SELECT b, sum(a) FROM pagg_tab GROUP BY b HAVING sum(a) > 660 ORDER BY 1; You'll get 0 rows. But SELECT b, sum(a) FROM pagg_tab GROUP BY b; b | sum ----+------ 42 | 720 29 | 1140 4 | 840 34 | 840 41 | 660 0 | 600 40 | 600 gives. The issue is that you can't calculate "partial" having. You should compare full aggregate in filter, but it's not possible on the level of one partition. And you have this in plans Finalize GroupAggregate Output: pagg_tab.b, avg(pagg_tab.a), max(pagg_tab.a), count(*) Group Key: pagg_tab.b Filter: (sum(pagg_tab.a) < 700) -> Sort Output: pagg_tab.b, (PARTIAL avg(pagg_tab.a)), (PARTIAL max(pagg_tab.a)), (PARTIAL count(*)), (PARTIAL sum(pagg_tab.a)) Sort Key: pagg_tab.b -> Append -> Foreign Scan Output: pagg_tab.b, (PARTIAL avg(pagg_tab.a)), (PARTIAL max(pagg_tab.a)), (PARTIAL count(*)), (PARTIAL sum(pagg_tab.a)) Filter: ((PARTIAL sum(pagg_tab.a)) < 700) !!!! <--- here we can't compare anything yet, sum is incomplete. Relations: Aggregate on (public.fpagg_tab_p1 pagg_tab) Remote SQL: SELECT b, avg_p_int4(a), max(a), count(*), sum(a) FROM public.pagg_tab_p1 GROUP BY 1 -> Foreign Scan Output: pagg_tab_1.b, (PARTIAL avg(pagg_tab_1.a)), (PARTIAL max(pagg_tab_1.a)), (PARTIAL count(*)), (PARTIAL sum(pagg_tab_1.a)) Filter: ((PARTIAL sum(pagg_tab_1.a)) < 700) Relations: Aggregate on (public.fpagg_tab_p2 pagg_tab_1) Remote SQL: SELECT b, avg_p_int4(a), max(a), count(*), sum(a) FROM public.pagg_tab_p2 GROUP BY 1 -> Foreign Scan Output: pagg_tab_2.b, (PARTIAL avg(pagg_tab_2.a)), (PARTIAL max(pagg_tab_2.a)), (PARTIAL count(*)), (PARTIAL sum(pagg_tab_2.a)) Filter: ((PARTIAL sum(pagg_tab_2.a)) < 700) Relations: Aggregate on (public.fpagg_tab_p3 pagg_tab_2) Remote SQL: SELECT b, avg_p_int4(a), max(a), count(*), sum(a) FROM public.pagg_tab_p3 GROUP BY 1 In foreign_grouping_ok() 6586 if (IsA(expr, Aggref)) 6587 { 6588 if (partial) 6589 { 6590 mark_partial_aggref((Aggref *) expr, AGGSPLIT_INITIAL_SERIAL); 6591 continue; 6592 } 6593 else if (!is_foreign_expr(root, grouped_rel, expr)) 6594 return false; 6595 6596 tlist = add_to_flat_tlist(tlist, list_make1(expr)); 6597 } at least you shouldn't do anything with expr, if is_foreign_expr() returned false. If we restrict pushing down queries with havingQuals, I'm not quite sure how Aggref can appear in local_conds. As for changes in planner.c (setGroupClausePartial()) I have several questions. 1) Why don't we add non_group_exprs to pathtarget->exprs when partial_target->exprs is not set? 2) We replace extra->partial_target->exprs with partial_target->exprs after processing. Why are we sure that after this tleSortGroupRef is correct? -- Best regards, Alexander Pyhalov, Postgres Professional
Attachment
pgsql-hackers by date: