Home > mailing lists

Re: POC: GROUP BY optimization - Mailing list pgsql-hackers

From	Andrei Lepikhov
Subject	Re: POC: GROUP BY optimization
Date	May 30, 2024 04:22:46
Msg-id	[email protected] Whole thread Raw
In response to	Re: About 0001:,Having overviewed it, I don't see any issues (but I'm the author), except grammatical ones - but I'm not a native to judge it.,Also, the sentence 'turning GROUP BY clauses into pathkeys' is unclear to me. It may be better to write something like: 'building pathkeys by the list of grouping clauses'.,,0002:,The part under USE_ASSERT_CHECKING looks good to me. But the code in group_keys_reorder_by_pathkeys looks suspicious: of course, we do some doubtful work without any possible way to reproduce, but if we envision some duplicated elements in the group_clauses, we should avoid usage of the list_concat_unique_ptr. What's more, why do you not exit from foreach_ptr immediately after SortGroupClause has been found? I think the new_group_clauses should be consistent with the new_group_pathkeys.,,0003:,Looks good,,0004:,I was also thinking about reintroducing the preprocess_groupclause because with the re-arrangement of GROUP-BY clauses according to incoming pathkeys, it d... (Alexander Korotkov <[email protected]>)
Responses	Re: POC: GROUP BY optimization
List	pgsql-hackers

Tree view

On 5/29/24 19:53, Alexander Korotkov wrote:
> Hi, Andrei!
> 
> Thank you for your feedback.
> 
> On Wed, May 29, 2024 at 11:08 AM Andrei Lepikhov
> <[email protected]> wrote:
>> On 5/27/24 19:41, Alexander Korotkov wrote:
>>> Any thoughts?
>> About 0001:
>> Having overviewed it, I don't see any issues (but I'm the author),
>> except grammatical ones - but I'm not a native to judge it.
>> Also, the sentence 'turning GROUP BY clauses  into pathkeys' is unclear
>> to me. It may be better to write something like:  'building pathkeys by
>> the list of grouping clauses'.
> 
> OK, thank you.  I'll run once again for the grammar issues.
> 
>> 0002:
>> The part under USE_ASSERT_CHECKING looks good to me. But the code in
>> group_keys_reorder_by_pathkeys looks suspicious: of course, we do some
>> doubtful work without any possible way to reproduce, but if we envision
>> some duplicated elements in the group_clauses, we should avoid usage of
>> the list_concat_unique_ptr.
> 
> As I understand Tom, there is a risk that clauses list may contain
> multiple instances of equivalent SortGroupClause, not duplicate
> pointers.
Maybe. I just implemented the worst-case scenario with the intention of 
maximum safety. So, it's up to you.
> 
>> What's more, why do you not exit from
>> foreach_ptr immediately after SortGroupClause has been found? I think
>> the new_group_clauses should be consistent with the new_group_pathkeys.
> 
> I wanted this to be consistent with preprocess_groupclause(), where
> duplicate SortGroupClause'es are grouped together.  Otherwise, we
> could just delete redundant SortGroupClause'es.
Hm, preprocess_groupclause is called before the standard_qp_callback 
forms the 'canonical form' of group_pathkeys and such behaviour needed. 
But the code that chooses the reordering strategy uses already processed 
grouping clauses, where we don't have duplicates in the first 
num_groupby_pathkeys elements, do we?
>> 0004:
>> I was also thinking about reintroducing the preprocess_groupclause
>> because with the re-arrangement of GROUP-BY clauses according to
>> incoming pathkeys, it doesn't make sense to have a user-defined order—at
>> least while cost_sort doesn't differ costs for alternative column orderings.
>> So, I'm okay with the code. But why don't you use the same approach with
>> foreach_ptr as before?
> 
> I restored the function as it was before 0452b461bc with minimal edits
> to support the incremental sort.  I think it would be more valuable to
> keep the difference with pg16 code small rather than refactor to
> simplify existing code.
Got it

-- 
regards,
Andrei Lepikhov
Postgres Professional

pgsql-hackers by date:

From: David Rowley
Date: 30 May 2024, 04:02:58
Subject: Re: Avoid an odd undefined behavior with memcmp (src/bin/pg_rewind/pg_rewind.c)

From: Peter Smith
Date: 30 May 2024, 04:48:11
Subject: Re: 001_rep_changes.pl fails due to publisher stuck on shutdown

Re: POC: GROUP BY optimization - Mailing list pgsql-hackers

Previous

Next