Re: POC: GROUP BY optimization - Mailing list pgsql-hackers
On 5/29/24 19:53, Alexander Korotkov wrote: > Hi, Andrei! > > Thank you for your feedback. > > On Wed, May 29, 2024 at 11:08 AM Andrei Lepikhov > <[email protected]> wrote: >> On 5/27/24 19:41, Alexander Korotkov wrote: >>> Any thoughts? >> About 0001: >> Having overviewed it, I don't see any issues (but I'm the author), >> except grammatical ones - but I'm not a native to judge it. >> Also, the sentence 'turning GROUP BY clauses into pathkeys' is unclear >> to me. It may be better to write something like: 'building pathkeys by >> the list of grouping clauses'. > > OK, thank you. I'll run once again for the grammar issues. > >> 0002: >> The part under USE_ASSERT_CHECKING looks good to me. But the code in >> group_keys_reorder_by_pathkeys looks suspicious: of course, we do some >> doubtful work without any possible way to reproduce, but if we envision >> some duplicated elements in the group_clauses, we should avoid usage of >> the list_concat_unique_ptr. > > As I understand Tom, there is a risk that clauses list may contain > multiple instances of equivalent SortGroupClause, not duplicate > pointers. Maybe. I just implemented the worst-case scenario with the intention of maximum safety. So, it's up to you. > >> What's more, why do you not exit from >> foreach_ptr immediately after SortGroupClause has been found? I think >> the new_group_clauses should be consistent with the new_group_pathkeys. > > I wanted this to be consistent with preprocess_groupclause(), where > duplicate SortGroupClause'es are grouped together. Otherwise, we > could just delete redundant SortGroupClause'es. Hm, preprocess_groupclause is called before the standard_qp_callback forms the 'canonical form' of group_pathkeys and such behaviour needed. But the code that chooses the reordering strategy uses already processed grouping clauses, where we don't have duplicates in the first num_groupby_pathkeys elements, do we? >> 0004: >> I was also thinking about reintroducing the preprocess_groupclause >> because with the re-arrangement of GROUP-BY clauses according to >> incoming pathkeys, it doesn't make sense to have a user-defined order—at >> least while cost_sort doesn't differ costs for alternative column orderings. >> So, I'm okay with the code. But why don't you use the same approach with >> foreach_ptr as before? > > I restored the function as it was before 0452b461bc with minimal edits > to support the incremental sort. I think it would be more valuable to > keep the difference with pg16 code small rather than refactor to > simplify existing code. Got it -- regards, Andrei Lepikhov Postgres Professional
pgsql-hackers by date: