[PATCH] Refactor bytea_sortsupport()

Lists: pgsql-hackers
From: Aleksander Alekseev <aleksander(at)timescale(dot)com>
To: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: [PATCH] Refactor bytea_sortsupport()
Date: 2024-10-09 14:39:22
Message-ID: CAJ7c6TO3X88dGd8C4Tb-Eq2ZDPz+9mP+KOwdzK_82BEz_cMPZg@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Lists: pgsql-hackers

Hi,

Tom (cc:'ed) recently pointed out [1] that adt/varlena.c uses common
logic for sorting string types and bytea. There are several problems
with this.

Firstly, this is difficult to reason about. For instance, you have to
keep in mind that when "C" locale is used you might be sorting not
strings but rather bytea for which it is legal to have internal NUL
bytes.

Secondly, this is error-prone. Changing logic for string types may
affect bytea logic and vice versa.

Lastly, the performance and memory consumption could be optimized for
a bytea case. The win is arguably small if noticeable at all, but
still.

Attached is a PoC patch that fixes this. There are some TODOs and
FIXMEs but all in all it works and passes the tests.

The code becomes longer but the new code is simple and it's easier to
understand. If we agree on this refactoring we could decompose
adt/varlena.c into varlena.c and bytea.c - also something proposed by
Tom. IMO it would be a good move but this is not implemented in the
patch.

Thoughts?

[1]: https://postgr.es/m/1502394.1725398354%40sss.pgh.pa.us

--
Best regards,
Aleksander Alekseev

Attachment Content-Type Size
v1-0001-Refactor-bytea_sortsupport.patch application/octet-stream 14.5 KB

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Aleksander Alekseev <aleksander(at)timescale(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: [PATCH] Refactor bytea_sortsupport()
Date: 2024-11-08 04:33:45
Message-ID: [email protected]
Views: Whole Thread | Raw Message | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Oct 09, 2024 at 05:39:22PM +0300, Aleksander Alekseev wrote:
> Attached is a PoC patch that fixes this. There are some TODOs and
> FIXMEs but all in all it works and passes the tests.

The patch has exactly three TODOs, and it is kind of hard to figure
out what your goal here is by only looking at the patch.

> The code becomes longer but the new code is simple and it's easier to
> understand. If we agree on this refactoring we could decompose
> adt/varlena.c into varlena.c and bytea.c - also something proposed by
> Tom. IMO it would be a good move but this is not implemented in the
> patch.
>
> Thoughts?

The point is that moving all the logic related to bytea is going to
reduce the risk of potential bugs if the varstring paths are
manipulated so as they accidentally impact the former, even if it
comes at a cost of duplicating some of it.

It seems to me that the patch should be more ambitious than just
trying to move the sort support functions and structures. How about
moving anything related to bytea to a new bytea.c, including the
in/out and receive/send functions. An idea would be to split the
patch into roughly two pieces, to prove the benefits that this can
have in the long-term:
1) Extract all the bytea-related code from varlena.c into its own
file, even if it comes copy-paste some of the structures and routines
shared by bytea and varstring. You would get the file structure done
first.
2) Maximize the simplifications in the new bytea.c and the former
varlena.c. This would show how this makes the code better for one,
for the other, or for both.
--
Michael


From: Aleksander Alekseev <aleksander(at)timescale(dot)com>
To: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Cc: Michael Paquier <michael(at)paquier(dot)xyz>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: [PATCH] Refactor bytea_sortsupport()
Date: 2024-12-05 08:11:55
Message-ID: CAJ7c6TN2O1eWjspccEuTeWEfO4YCRHo6VqcOWJWSu+SDZvPc-w@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Lists: pgsql-hackers

Hi Michael,

> On Wed, Oct 09, 2024 at 05:39:22PM +0300, Aleksander Alekseev wrote:
> > Attached is a PoC patch that fixes this. There are some TODOs and
> > FIXMEs but all in all it works and passes the tests.
>
> The patch has exactly three TODOs, and it is kind of hard to figure
> out what your goal here is by only looking at the patch.
>
> > The code becomes longer but the new code is simple and it's easier to
> > understand. If we agree on this refactoring we could decompose
> > adt/varlena.c into varlena.c and bytea.c - also something proposed by
> > Tom. IMO it would be a good move but this is not implemented in the
> > patch.
> >
> > Thoughts?
>
> The point is that moving all the logic related to bytea is going to
> reduce the risk of potential bugs if the varstring paths are
> manipulated so as they accidentally impact the former, even if it
> comes at a cost of duplicating some of it.
>
> It seems to me that the patch should be more ambitious than just
> trying to move the sort support functions and structures. How about
> moving anything related to bytea to a new bytea.c, including the
> in/out and receive/send functions. An idea would be to split the
> patch into roughly two pieces, to prove the benefits that this can
> have in the long-term:
> 1) Extract all the bytea-related code from varlena.c into its own
> file, even if it comes copy-paste some of the structures and routines
> shared by bytea and varstring. You would get the file structure done
> first.
> 2) Maximize the simplifications in the new bytea.c and the former
> varlena.c. This would show how this makes the code better for one,
> for the other, or for both.

Many thanks for your feedback. Unfortunately I had no opportunity to
work on this during the November commitfest, but I didn't forget about
the patch and am going to submit an updated version, probably
somewhere in January.

--
Best regards,
Aleksander Alekseev