Home > mailing lists

Re: POC: GROUP BY optimization - Mailing list pgsql-hackers

From	Andrei Lepikhov
Subject	Re: POC: GROUP BY optimization
Date	December 28, 2023 08:22:47
Msg-id	[email protected] Whole thread Raw
In response to	Re: POC: GROUP BY optimization (Tom Lane <[email protected]>)
Responses	Re: POC: GROUP BY optimization
List	pgsql-hackers

Tree view

On 27/12/2023 12:07, Tom Lane wrote:
> Andrei Lepikhov <[email protected]> writes:
>> To be clear. In [1], I mentioned we can perform micro-benchmarks and
>> structure costs of operators. At least for fixed-length operators, it is
>> relatively easy.
> 
> I repeat what I said: this is a fool's errand.  You will not get
> trustworthy results even for the cases you measured, let alone
> all the rest.  I'd go as far as to say I would not believe your
> microbenchmarks, because they would only apply for one platform,
> compiler, backend build, phase of the moon, etc.

Thanks for the explanation.
I removed all cost-related codes. It still needs to be finished; I will 
smooth the code further and rewrite regression tests - many of them 
without cost-dependent reorderings look silly. Also, remember 
Alexander's remarks, which must be implemented, too.
But already here, it works well. Look:

Preliminaries:
CREATE TABLE t(x int, y int, z text, w int);
INSERT INTO t SELECT gs%100,gs%100, 'abc' || gs%10, gs
   FROM generate_series(1,10000) AS gs;
CREATE INDEX abc ON t(x,y);
ANALYZE t;
SET enable_hashagg = 'off';

This patch eliminates unneeded Sort operation:
explain SELECT x,y FROM t GROUP BY (x,y);
explain SELECT x,y FROM t GROUP BY (y,x);

Engages incremental sort:
explain SELECT x,y FROM t GROUP BY (x,y,z,w);
explain SELECT x,y FROM t GROUP BY (z,y,w,x);
explain SELECT x,y FROM t GROUP BY (w,z,x,y);
explain SELECT x,y FROM t GROUP BY (w,x,z,y);

Works with subqueries:
explain SELECT x,y
FROM (SELECT * FROM t ORDER BY x,y,w,z) AS q1
GROUP BY (w,x,z,y);
explain SELECT x,y
FROM (SELECT * FROM t ORDER BY x,y,w,z LIMIT 100) AS q1
GROUP BY (w,x,z,y);

But arrangement with an ORDER BY clause doesn't work:

DROP INDEX abc;
explain SELECT x,w,z FROM t GROUP BY (w,x,z) ORDER BY (x,z,w);

I think the reason is that the sort_pathkeys and group_pathkeys are 
physically different structures, and we can't just compare pointers here.

-- 
regards,
Andrei Lepikhov
Postgres Professional

Attachment

0001-Explore-alternative-orderings-of-group-by-pathkeys-d-20231228.patch

pgsql-hackers by date:

From: "Anita"
Date: 28 December 2023, 08:12:06
Subject: Pdadmin open on Macbook issue

From: shveta malik
Date: 28 December 2023, 09:28:25
Subject: Re: Track in pg_replication_slots the reason why slots conflict?

Re: POC: GROUP BY optimization - Mailing list pgsql-hackers

Attachment

Previous

Next