git.postgresql.org Git - postgresql.git/log

Base LWLock limits directly on MAX_BACKENDS

Jacob reported that comments for LW_SHARED_MASK referenced a MAX_BACKENDS
limit of 2^23-1, but that MAX_BACKENDS is actually limited to 2^18-1. The
limit was lowered in 48354581a49c, but the comment in lwlock.c wasn't updated.

Instead of just fixing the comment, it seems better to directly base the
lwlock defines on MAX_BACKENDS and add static assertions to ensure that there
is enough space. That way there's no comment that can go out of sync in the
future.

As part of that change I noticed that for some reason the high bit wasn't used
for flags, which seems somewhat odd. Redefine the flag values to start at the
highest bit.

Reported-by: Jacob Brazeal <[email protected]>
Reviewed-by: Jacob Brazeal <[email protected]>
Discussion: https://postgr.es/m/CA+COZaBO_s3LfALq=b+HcBHFSOEGiApVjrRacCe4VP9m7CJsNQ@mail.gmail.com

Move MAX_BACKENDS to procnumber.h

MAX_BACKENDS influences many things besides postmaster. I e.g. noticed that we
don't have static assertions ensuring BUF_REFCOUNT_MASK is big enough for
MAX_BACKENDS, adding them would require including postmaster.h in
buf_internals.h which doesn't seem right.

While at that, add MAX_BACKENDS_BITS, as that's useful in various places for
static assertions (to be added in subsequent commits).

Reviewed-by: Thomas Munro <[email protected]>
Discussion: https://postgr.es/m/wptizm4qt6yikgm2pt52xzyv6ycmqiutloyvypvmagn7xvqkce@d4xuv3mylpg4

Silence warning in older versions of Valgrind

Due to misunderstanding on my part, commit 235328ee4 did not go far
enough to silence older versions of Valgrind. For those, it was the bit
scan that was problematic, not the subsequent bit-masking operation. To
fix, use the unaligned path for the trailing bytes. Since we don't have
a bit scan here anymore, also remove some comments and endian-specific
coding around that.

Reported-by: Anton A. Melnikov <[email protected]>
Discussion: https://postgr.es/m/f3aa2d45-3b28-41c5-9499-a1bc30e0f8ec@postgrespro.ru
Backpatch-through: 17

Remove read/sync fields from pg_stat_wal and GUC track_wal_io_timing

The four following attributes are removed from pg_stat_wal:
* wal_write
* wal_sync
* wal_write_time
* wal_sync_time

a051e71e28a1 has added an equivalent of this information in pg_stat_io
with more granularity as this now spreads across the backend types, IO
context and IO objects. So, keeping the same information in pg_stat_wal
has little benefits.

Another benefit of this commit is the removal of PendingWalStats,
simplifying an upcoming patch to add per-backend WAL statistics, which
already support IO statistics and which have access to the write/sync
stats data of WAL.

The GUC track_wal_io_timing, that was used to enable or disable the
aggregation of the write and sync timings for WAL, is also removed.
pgstat_prepare_io_time() is simplified.

Bump catalog version.
Bump PGSTAT_FILE_FORMAT_ID, due to the update of PgStat_WalStats.

Author: Bertrand Drouvot <[email protected]>
Discussion: https://postgr.es/m/Z7RkQ0EfYaqqjgz/@ip-10-97-1-34.eu-west-3.compute.internal

Ignore hash's relallvisible when checking pg_upgrade from pre-v10.

Our cross-version upgrade tests have been failing for some pre-v10
source versions since commit 1fd1bd871. This turns out to be
because relallvisible may change for tables that have hash indexes,
because the upgrade process forcibly reindexes such indexes to
deal with the changes made in v10.

Fortunately, the set of tables that have such indexes is small
and won't change anymore in those branches. So just hack up
AdjustUpgrade.pm to not compare the relallvisible values of
those specific tables.

While here, also tighten the regex that suppresses comparison
of version fields.

Discussion: https://postgr.es/m/812817.1740277228@sss.pgh.pa.us

backend libpq void * argument for binary data

Change some backend libpq functions to take void * for binary data
instead of char *. This removes the need for numerous casts.

Reviewed-by: Dagfinn Ilmari Mannsåker <[email protected]>
Discussion: https://www.postgresql.org/message-id/flat/fd1fcedb-3492-4fc8-9e3e-74b97f2db6c7%40eisentraut.org

SnapBuildRestoreContents() void * argument for binary data

Change internal snapbuild API function to take void * for binary data
instead of char *. This removes the need for numerous casts.

Reviewed-by: Dagfinn Ilmari Mannsåker <[email protected]>
Discussion: https://www.postgresql.org/message-id/flat/fd1fcedb-3492-4fc8-9e3e-74b97f2db6c7%40eisentraut.org

Add more tests for utility commands in pipelines

This commit checks interactions with pipelines and implicit transaction
blocks for the following commands that have their own behaviors when
used in pipelines depending on their order in a pipeline and sync
requests:
- SET LOCAL
- REINDEX CONCURRENTLY
- VACUUM
- Subtransactions (SAVEPOINT, ROLLBACK TO SAVEPOINT)

These scenarios could be tested only with pgbench previously. The
meta-commands of psql controlling pipelines make these easier to
implement, debug, and they can be run in a SQL script.

Author: Anthonin Bonnefoy <[email protected]>
Discussion: https://postgr.es/m/CAO6_XqroE7JuMEm1sWz55rp9fAYX2JwmcP_3m_v51vnOFdsLiQ@mail.gmail.com

jsonb internal API void * argument for binary data

Change some internal jsonb API functions to take void * for binary
data instead of char *. This removes the need for numerous casts.

Reviewed-by: Dagfinn Ilmari Mannsåker <[email protected]>
Discussion: https://www.postgresql.org/message-id/flat/fd1fcedb-3492-4fc8-9e3e-74b97f2db6c7%40eisentraut.org

Documentation fixups for dumping statistics.

Reported-by: Hayato Kuroda (Fujitsu) <[email protected]>
Reported-by: Andrew Dunstan <[email protected]>
Discussion: https://postgr.es/m/OSCPR01MB149665630030E7F54FDA8B27BF5C72@OSCPR01MB14966.jpnprd01.prod.outlook.com
Discussion: https://postgr.es/m/25d26774-25fa-46f2-9888-c6a707d1fef7@dunslane.net

Change \conninfo to use tabular format

(Initially the proposal was to keep \conninfo alone and add this feature
as \conninfo+, but we decided against keeping the original.)

Also display more fields than before, though not as many as were
suggested during the discussion. In particular, we don't show 'role'
nor 'session authorization', for both which a case can probably be made.
These can be added as followup commits, if we agree to it.

Some (most?) reviewers actually reviewed rather different versions of
the patch and do not necessarily endorse the current one.

Co-authored-by: Maiquel Grassi <[email protected]>
Co-authored-by: Hunaid Sohail <[email protected]>
Reviewed-by: Nathan Bossart <[email protected]>
Reviewed-by: Peter Eisentraut <[email protected]>
Reviewed-by: Sami Imseih <[email protected]>
Reviewed-by: David G. Johnston <[email protected]>
Reviewed-by: Jim Jones <[email protected]>
Reviewed-by: Tom Lane <[email protected]>
Reviewed-by: Pavel Luzanov <[email protected]>
Reviewed-by: Dean Rasheed <[email protected]>
Reviewed-by: Erik Wienhold <[email protected]>
Discussion: https://postgr.es/m/CP8P284MB24965CB63DAC00FC0EA4A475EC462@CP8P284MB2496.BRAP284.PROD.OUTLOOK.COM

Remove unstable test suite added by 525392d57

The 'cached-plan-inval' test suite, introduced in 525392d57 under
src/test/modules/delay_execution, aimed to verify that cached plan
invalidation triggers replanning after deferred locks are taken.
However, its ExecutorStart_hook-based approach relies on lock timing
assumptions that, in retrospect, are fragile. This instability was
exposed by failures on BF animal trilobite, which builds with
CLOBBER_CACHE_ALWAYS.

One option was to dynamically disable the cache behavior that causes
the test suite to fail by setting "debug_discard_caches = 0", but it
seems better to remove the suite. The risk of future failures due to
other cache flush hazards outweighs the benefit of catching real
breakage in the backend behavior it tests.

Reported-by: Tom Lane <[email protected]>
Discussion: https://postgr.es/m/2990641.1740117879@sss.pgh.pa.us

Allow lwlocks to be disowned

To implement AIO writes, the backend initiating writes needs to transfer the
lock ownership to the AIO subsystem, so the lock held during the write can be
released in another backend.

Other backends need to be able to "complete" an asynchronously started IO to
avoid deadlocks (consider e.g. one backend starting IO for a buffer and then
waiting for a heavyweight lock held by another relation followed by the
current holder of the heavyweight lock waiting for the IO to complete).

To that end, this commit adds LWLockDisown() and LWLockReleaseDisowned(). If
code uses LWLockDisown() it's the code's responsibility to ensure that the
lock is released in case of errors.

Reviewed-by: Heikki Linnakangas <[email protected]>
Discussion: https://postgr.es/m/1f6b50a7-38ef-4d87-8246-786d39f46ab9@iki.fi

Adjust EXPLAIN test case to filter out "Actual Rows" values.

Per the buildfarm, these tests appear to be unstable in the wake of
commit ddb17e387aa28d61521227377b00f997756b8a27. I'm not sure that
just hiding this output is the right way forward, because I think
there may be other test cases that will fail with lower probability
even after this fix. However, it's hard to tell right now, because
this is failing on a number of buildfarm animals. So let's try this
for now to either get a clearer picture of what else is broken, or
as a stopgap until we decide what the permanent fix should be, or
perhaps this will be the permanent fix after all.

Avoid race condition between "GRANT role" and "DROP ROLE".

Concurrently dropping either the granted role or the grantee
does not stop GRANT from completing, instead resulting in a
dangling role reference in pg_auth_members.  That's relatively
harmless in the short run, but inconsistent catalog entries
are not a good thing.

This patch solves the problem by adding the granted and grantee
roles as explicit shared dependencies of the pg_auth_members entry.
That's a bit indirect, but it works because the pg_shdepend code
applies the necessary locking and rechecking.

Commit 6566133c5 previously established similar handling for
the grantor column of pg_auth_members; it's not clear why it
didn't cover the other two role OID columns.

A side-effect of this approach is that DROP OWNED BY will now drop
pg_auth_members entries that mention the target role as either the
granted or grantee role.  That's clearly appropriate for the
grantee, since we'll drop its other privileges too.  It doesn't
seem too far out of line for the granted role, since we're
presumably about to drop it and besides we're removing all reasons
why it'd matter to be a member of it.  (One could argue that this
makes DropRole's code to auto-drop pg_auth_members entries
unnecessary, but I chose to leave it in place since perhaps some
people's workflows expect that to work without a DROP OWNED BY.)

Note to patch readers: CreateRole's first CommandCounterIncrement
call is now unconditional, because this change creates another
case in which it's needed, and it seemed to be more trouble than
it's worth to preserve that micro-optimization.

Arguably this is a bug fix, but the fact that it changes the
expected contents of pg_shdepend seems like not a great thing
to do in the stable branches, and perhaps we don't want the
change in DROP OWNED BY semantics there either.  On the other
hand, I opted not to force a catversion bump in HEAD, because
the presence or absence of these entries doesn't matter for
most purposes.

Reported-by: Virender Singla <[email protected]>
Reviewed-by: Laurenz Albe <[email protected]>
Discussion: https://postgr.es/m/CAM6Zo8woa62ZFHtMKox6a4jb8qQ=w87R2L0K8347iE-juQL2EA@mail.gmail.com

Allow EXPLAIN to indicate fractional rows.

When nloops > 1, we now display two digits after the decimal point,
rather than none. This is important because what we print is actually
planstate->instrument->ntuples / nloops, and sometimes what you want
to know is planstate->instrument->ntuples. You can estimate that by
multiplying the displayed row count by the displayed nloops value, but
the fact that the displayed value is rounded makes that inexact. It's
still inexact even if we show these two extra decimal places, but less
so. Perhaps we will agree on a way to further improve this output later,
but for now this seems better than doing nothing.

Author: Ibrar Ahmed <[email protected]>
Author: Ilia Evdokimov <[email protected]>
Reviewed-by: David G. Johnston <[email protected]>
Reviewed-by: Amit Kapila <[email protected]>
Reviewed-by: Vignesh C <[email protected]>
Reviewed-by: Greg Stark <[email protected]>
Reviewed-by: Naeem Akhter <[email protected]>
Reviewed-by: Hamid Akhtar <[email protected]>
Reviewed-by: Tom Lane <[email protected]>
Reviewed-by: Andrei Lepikhov <[email protected]>
Reviewed-by: Guillaume Lelarge <[email protected]>
Reviewed-by: Matheus Alcantara <[email protected]>
Reviewed-by: Alena Rybakina <[email protected]>
Discussion: http://postgr.es/m/603c8f070905281830g2e5419c4xad2946d149e21f9d%40mail.gmail.com

Add test 005_char_signedness.pl to meson.build.

Oversight in a8238f87f98 where the test has been added.

Discussion: https://postgr.es/m/CB11ADBC-0C3F-4FE0-A678-666EE80CBB07%40amazon.com

Fix pg_dumpall to cope with dangling OIDs in pg_auth_members.

There is a race condition between "GRANT role" and "DROP ROLE",
which allows GRANT to install pg_auth_members entries that refer to
dropped roles.  (Commit 6566133c5 prevented that for the grantor
field, but not for the granted or grantee roles.)  We'll soon fix
that, at least in HEAD, but pg_dumpall needs to cope with the
situation in case of pre-existing inconsistency.  As pg_dumpall
stands, it will emit invalid commands like 'GRANT foo TO ""',
which causes pg_upgrade to fail.  Fix it to emit warnings and skip
those GRANTs, instead.

There was some discussion of removing the problem by changing
dumpRoleMembership's query to use JOIN not LEFT JOIN, but that
would result in silently ignoring such entries.  It seems better
to produce a warning.

Pre-v16 branches already coped with dangling grantor OIDs by simply
omitting the GRANTED BY clause.  I left that behavior as-is, although
it's somewhat inconsistent with the behavior of later branches.

Reported-by: Virender Singla <[email protected]>
Discussion: https://postgr.es/m/CAM6Zo8woa62ZFHtMKox6a4jb8qQ=w87R2L0K8347iE-juQL2EA@mail.gmail.com
Backpatch-through: 13

Fix an issue with index scan using pg_trgm due to char signedness on different architectures.

GIN and GiST indexes utilizing pg_trgm's opclasses store sorted
trigrams within index tuples. When comparing and sorting each trigram,
pg_trgm treats each character as a 'char[3]' type in C. However, the
char type in C can be interpreted as either signed char or unsigned
char, depending on the platform, if the signedness is not explicitly
specified. Consequently, during replication between different CPU
architectures, there was an issue where index scans on standby servers
could not locate matching index tuples due to the differing treatment
of character signedness.

This change introduces comparison functions for trgm that explicitly
handle signed char and unsigned char. The appropriate comparison
function will be dynamically selected based on the character
signedness stored in the control file. Therefore, upgraded clusters
can utilize the indexes without rebuilding, provided the cluster
upgrade occurs on platforms with the same character signedness as the
original cluster initialization.

The default char signedness information was introduced in 44fe30fdab6,
so no backpatch.

Reviewed-by: Noah Misch <[email protected]>
Discussion: https://postgr.es/m/CB11ADBC-0C3F-4FE0-A678-666EE80CBB07%40amazon.com

pg_upgrade: Add --set-char-signedness to set the default char signedness of new cluster.

This change adds a new option --set-char-signedness to pg_upgrade. It
enables user to set arbitrary signedness during pg_upgrade. This helps
cases where user who knew they copied the v17 source cluster from
x86 (signedness=true) to ARM (signedness=false) can pg_upgrade
properly without the prerequisite of acquiring an x86 VM.

Reviewed-by: Noah Misch <[email protected]>
Discussion: https://postgr.es/m/CB11ADBC-0C3F-4FE0-A678-666EE80CBB07%40amazon.com

pg_upgrade: Preserve default char signedness value from old cluster.

Commit 44fe30fdab6 introduced the 'default_char_signedness' field in
controlfile. Newly created database clusters always set this field to
'signed'.

This change ensures that pg_upgrade updates the
'default_char_signedness' to 'unsigned' if the source database cluster
has signedness=false. For source clusters from v17 or earlier, which
lack the 'default_char_signedness' information, pg_upgrade assumes the
source cluster was initialized on the same platform where pg_upgrade
is running. It then sets the 'default_char_signedness' value according
to the current platform's default character signedness.

Reviewed-by: Noah Misch <[email protected]>
Discussion: https://postgr.es/m/CB11ADBC-0C3F-4FE0-A678-666EE80CBB07%40amazon.com

pg_resetwal: Add --char-signedness option to change the default char signedness.

With the newly added option --char-signedness, pg_resetwal updates the
default char signedness flag in the controlfile. This option is
primarily intended for an upcoming patch that pg_upgrade supports
preserving the default char signedness during upgrades, and is not
meant for manual operation.

Reviewed-by: Noah Misch <[email protected]>
Discussion: https://postgr.es/m/CB11ADBC-0C3F-4FE0-A678-666EE80CBB07%40amazon.com

Add default_char_signedness field to ControlFileData.

The signedness of the 'char' type in C is
implementation-dependent. For instance, 'signed char' is used by
default on x86 CPUs, while 'unsigned char' is used on aarch
CPUs. Previously, we accidentally let C implementation signedness
affect persistent data. This led to inconsistent results when
comparing char data across different platforms.

This commit introduces a new 'default_char_signedness' field in
ControlFileData to store the signedness of the 'char' type. While this
change does not encourage the use of 'char' without explicitly
specifying its signedness, this field can be used as a hint to ensure
consistent behavior for pre-v18 data files that store data sorted by
the 'char' type on disk (e.g., GIN and GiST indexes), especially in
cross-platform replication scenarios.

Newly created database clusters unconditionally set the default char
signedness to true. pg_upgrade (with an upcoming commit) changes this
flag for clusters if the source database cluster has
signedness=false. As a result, signedness=false setting will become
rare over time. If we had known about the problem during the last
development cycle that forced initdb (v8.3), we would have made all
clusters signed or all clusters unsigned. Making pg_upgrade the only
source of signedness=false will cause the population of database
clusters to converge toward that retrospective ideal.

Bump catalog version (for the catalog changes) and PG_CONTROL_VERSION
(for the additions in ControlFileData).

Reviewed-by: Noah Misch <[email protected]>
Discussion: https://postgr.es/m/CB11ADBC-0C3F-4FE0-A678-666EE80CBB07%40amazon.com

doc: clarify default checksum behavior in non-master branches

Also simplify and correct data checksum wording in master now that it is
the default. PG 13 did not have the awkward wording.

Reported-by: Felix <[email protected]>
Reviewed-by: Laurenz Albe
Discussion: https://postgr.es/m/173928241056.707.3989867022954178032@wrigleys.postgresql.org

Backpatch-through: 14

doc: remove non-breaking space in SGML files, causes make error

Make test portlock logic work with meson

Previously the portlock logic, added in 9b4eafcaf41, didn't actually work
properly when the tests were run via meson. 9b4eafcaf41 used the
MESON_BUILD_ROOT environment variable to determine the directory for the port
lock directory, but that's never set for running the tests. That meant that
each test used its own portlock dir, unless the PG_TEST_PORT_DIR environment
variable was set.

Fix the problem by setting top_builddir for the environment. That's also used
for the autoconf/make build.

Backpatch back to 16, where meson support was added.

Reported-by: Zharkov Roman <[email protected]>
Reviewed-by: Andrew Dunstan <[email protected]>
Backpatch-through: 16

Fix cross-version upgrades with XMLSERIALIZE(NO INDENT)

Dumps from versions older than v16 do not know about NO INDENT in a
XMLSERIALIZE() clause.  This commit adjusts AdjustUpgrade.pm so as NO
INDENT is discarded in the contents of the new dump adjusted for
comparison when the old version is v15 or older.  This should be enough
to make the cross-version upgrade tests pass.

Per report from buildfarm member crake.  Oversight in 984410b92326.

Reviewed-by: Andrew Dunstan <[email protected]>
Discussion: https://postgr.es/m/88b183f1-ebf9-4f51-9144-3704380ccae7@dunslane.net
Backpatch-through: 16

Support text position search functions with nondeterministic collations

This allows using text position search functions with nondeterministic
collations.  These functions are

- position, strpos
- replace
- split_part
- string_to_array
- string_to_table

which all use common internal infrastructure.

There was previously no internal implementation of this, so it was met
with a not-supported error.  This adds the internal implementation and
removes the error.

Unlike with deterministic collations, the search cannot use any
byte-by-byte optimized techniques but has to go substring by
substring.  We also need to consider that the found match could have a
different length than the needle and that there could be substrings of
different length matching at a position.  In most cases, we need to
find the longest such substring (greedy semantics), but this can be
configured by each caller.

Reviewed-by: Euler Taveira <[email protected]>
Discussion: https://www.postgresql.org/message-id/flat/582b2613-0900-48ca-8b0d-340c06f4d400@eisentraut.org

doc: Add links to olsen93 and ong90 in bibliography

The bibliography entries for olsen93 and ong90 lacked links to
online copies. While ong90 is available in digital form, the
olsen93 thesis is only available as a physical copy in the UCB
library. To save people from searching for it, we still link
to it via the UCB library page.

Reported-by: jian he <[email protected]>
Discussion: https://postgr.es/m/CACJufxFcJYdRvzgt59N26XjFp2tFFUXu+VN+x8Uo0NbDUCMCbw@mail.gmail.com

Fix a WARNING for data origin discrepancies.

Previously, a WARNING was issued at the time of defining a subscription
with origin=NONE only when the publisher subscribed to the same table from
other publishers, indicating potential data origination from different
origins. However, the publisher can subscribe to the partition ancestors
or partition children of the table from other publishers, which could also
result in mixed-origin data inclusion. So, give a WARNING in those cases
as well.

Reported-by: Sergey Tatarintsev <[email protected]>
Author: Hou Zhijie <[email protected]>
Author: Shlok Kyal <[email protected]>
Reviewed-by: Vignesh C <[email protected]>
Reviewed-by: Amit Kapila <[email protected]>
Backpatch-through: 16, where it was introduced
Discussion: https://postgr.es/m/5eda6a9c-63cf-404d-8a49-8dcb116a29f3@postgrespro.ru

Add missing deparsing of [NO] IDENT to XMLSERIALIZE()

NO INDENT is the default, and is added if no explicit indentation
flag was provided with XMLSERIALIZE().

Oversight in 483bdb2afec9.

Author: Jim Jones <[email protected]>
Discussion: https://postgr.es/m/bebd457e-5b43-46b3-8fc6-f6a6509483ba@uni-muenster.de
Backpatch-through: 16

Drop opcintype from index AM strategy translation API

The type argument wasn't actually really necessary.  It was a remnant
of converting the API of the gist strategy translation from using
opclass to using opfamily+opcintype (commits c09e5a6a016,
622f678c102).  For looking up the gist translation function, we used
the convention "amproclefttype = amprocrighttype = opclass's
opcintype" (see pg_amproc.h).  But each operator family should only
have one translation function, and getting the right type for the
lookup is sometimes cumbersome and fragile, so this is all
unnecessarily complicated.

To simplify this, change the gist stategy support procedure to take
"any", "any" as argument.  (This is arbitrary but seems intuitive.
The alternative of using InvalidOid as argument(s) upsets various DDL
commands, so it's not practical.)  Then we don't need opcintype for
the lookup, and we can remove it from all the API layers introduced by
commit c09e5a6a016.

This also adds some more documentation about the correct signature of
the gist support function and adds more checks in gistvalidate().
This was previously underspecified.  (It relied implicitly on
convention mentioned above.)

Discussion: https://www.postgresql.org/message-id/flat/E72EAA49-354D-4C2E-8EB9-255197F55330@enterprisedb.com

backend launchers void * arguments for binary data

Change backend launcher functions to take void * for binary data
instead of char *. This removes the need for numerous casts.

Reviewed-by: Dagfinn Ilmari Mannsåker <[email protected]>
Discussion: https://www.postgresql.org/message-id/flat/fd1fcedb-3492-4fc8-9e3e-74b97f2db6c7%40eisentraut.org

Fix for pg_restore_attribute_stats().

Use RelationGetIndexExpressions() rather than rd_indexprs directly.

Author: Corey Huinker <[email protected]>

psql: Add support for pipelines

With \bind, \parse, \bind_named and \close, it is possible to issue
queries from psql using the extended protocol.  However, it was not
possible to send these queries using libpq's pipeline mode.  This
feature has two advantages:
- Testing.  Pipeline tests were only possible with pgbench, using TAP
tests.  It now becomes possible to have more SQL tests that are able to
stress the backend with pipelines and extended queries.  More tests will
be added in a follow-up commit that were discussed on some other
threads.  Some external projects in the community had to implement their
own facility to work around this limitation.
- Emulation of custom workloads, with more control over the actions
taken by a client with libpq APIs.  It is possible to emulate more
workload patterns to bottleneck the backend with the extended query
protocol.

This patch adds six new meta-commands to be able to control pipelines:
* \startpipeline starts a new pipeline.  All extended queries are queued
until the end of the pipeline are reached or a sync request is sent and
processed.
* \endpipeline ends an existing pipeline.  All queued commands are sent
to the server and all responses are processed by psql.
* \syncpipeline queues a synchronisation request, without flushing the
commands to the server, equivalent of PQsendPipelineSync().
* \flush, equivalent of PQflush().
* \flushrequest, equivalent of PQsendFlushRequest()
* \getresults reads the server's results for the queries in a pipeline.
Unsent data is automatically pushed when \getresults is called.  It is
possible to control the number of results read in a single meta-command
execution with an optional parameter, 0 means that all the results
should be read.

Author: Anthonin Bonnefoy <[email protected]>
Reviewed-by: Jelte Fennema-Nio <[email protected]>
Reviewed-by: Kirill Reshke <[email protected]>
Discussion: https://postgr.es/m/CAO6_XqroE7JuMEm1sWz55rp9fAYX2JwmcP_3m_v51vnOFdsLiQ@mail.gmail.com

Add braces for if block with large comment in psql's common.c

A patch touching this area of the code is under review, and this format
makes the readability of the code slightly harder to parse.

Extracted from a larger patch by the same author.

Author: Anthonin Bonnefoy <[email protected]>
Discussion: https://postgr.es/m/CAO6_XqroE7JuMEm1sWz55rp9fAYX2JwmcP_3m_v51vnOFdsLiQ@mail.gmail.com

Add missing entry to oauth_validator test .gitignore

Commit b3f0be788 accidentally missed adding the oauth client test
binary to the relevant .gitignore.

Reported-by: Tom Lane <[email protected]>
Discussion: https://postgr.es/m/2839306.1740082041@sss.pgh.pa.us

Remove various unnecessary (char *) casts

Remove a number of (char *) casts that are unnecessary. Or in some
cases, rewrite the code to make the purpose of the cast clearer.

Reviewed-by: Dagfinn Ilmari Mannsåker <[email protected]>
Discussion: https://www.postgresql.org/message-id/flat/fd1fcedb-3492-4fc8-9e3e-74b97f2db6c7%40eisentraut.org

Trial fix for old cross-version upgrades.

Per buildfarm and reports, it seems that 9.X to 18 upgrades were
failing after commit 1fd1bd8710 due to an incorrect regex. Loosen the
regex to accommodate older versions.

Reported-by: vignesh C <[email protected]>
Reported-by: Andrew Dunstan <[email protected]>
Discussion: https://postgr.es/m/CALDaNm3GUs+U8Nt4S=V5zmb+K8-RfAc03vRENS0teeoq0Lc6Tw@mail.gmail.com
Discussion: https://postgr.es/m/ea4cbbc1-c5a5-43d1-9618-8ff3f2155bfe@dunslane.net

Ignore blank lines in pgindent exclude files

Currently a blank line matches everything, which is almost never what
someone would want. If they really want that they can use a wildcard
regex to do it.

Author: Zsolt Parragi <[email protected]>

Discussion: https://postgr.es/m/CAN4CZFNka+2q3=-Dithr4w65RJfwPaV92T62spEzLn+T4MgcMg@mail.gmail.com

cirrus: Temporarily fix libcurl link error

On FreeBSD the ftp/curl port appears to be missing a minimum
version dependency on libssh2, so the following starts showing
up after upgrading to curl 8.11.1_1:

libcurl.so.4: Undefined symbol "libssh2_session_callback_set2"

Awaiting an upgrade of the FreeBSD CI images to version 14, work
around the issue.

Author: Jacob Champion <[email protected]>
Reviewed-by: Daniel Gustafsson <[email protected]>
Discussion: https://postgr.es/m/CAOYmi+kZAka0sdxCOBxsQc2ozEZGZKHWU_9nrPXg3sG1NJ-zJw@mail.gmail.com

Add support for OAUTHBEARER SASL mechanism

This commit implements OAUTHBEARER, RFC 7628, and OAuth 2.0 Device
Authorization Grants, RFC 8628.  In order to use this there is a
new pg_hba auth method called oauth.  When speaking to a OAuth-
enabled server, it looks a bit like this:

  $ psql 'host=example.org oauth_issuer=... oauth_client_id=...'
  Visit https://oauth.example.org/login and enter the code: FPQ2-M4BG

Device authorization is currently the only supported flow so the
OAuth issuer must support that in order for users to authenticate.
Third-party clients may however extend this and provide their own
flows.  The built-in device authorization flow is currently not
supported on Windows.

In order for validation to happen server side a new framework for
plugging in OAuth validation modules is added.  As validation is
implementation specific, with no default specified in the standard,
PostgreSQL does not ship with one built-in.  Each pg_hba entry can
specify a specific validator or be left blank for the validator
installed as default.

This adds a requirement on libcurl for the client side support,
which is optional to build, but the server side has no additional
build requirements.  In order to run the tests, Python is required
as this adds a https server written in Python.  Tests are gated
behind PG_TEST_EXTRA as they open ports.

This patch has been a multi-year project with many contributors
involved with reviews and in-depth discussions:  Michael Paquier,
Heikki Linnakangas, Zhihong Yu, Mahendrakar Srinivasarao, Andrey
Chudnovsky and Stephen Frost to name a few.  While Jacob Champion
is the main author there have been some levels of hacking by others.
Daniel Gustafsson contributed the validation module and various bits
and pieces; Thomas Munro wrote the client side support for kqueue.

Author: Jacob Champion <[email protected]>
Co-authored-by: Daniel Gustafsson <[email protected]>
Co-authored-by: Thomas Munro <[email protected]>
Reviewed-by: Daniel Gustafsson <[email protected]>
Reviewed-by: Peter Eisentraut <[email protected]>
Reviewed-by: Antonin Houska <[email protected]>
Reviewed-by: Kashif Zeeshan <[email protected]>
Discussion: https://postgr.es/m/d1b467a78e0e36ed85a09adf979d04cf124a9d4b [email protected]

Transfer statistics during pg_upgrade.

Add support to pg_dump for dumping stats, and use that during
pg_upgrade so that statistics are transferred during upgrade. In most
cases this removes the need for a costly re-analyze after upgrade.

Some statistics are not transferred, such as extended statistics or
statistics with a custom stakind.

Now pg_dump accepts the options --schema-only, --no-schema,
--data-only, --no-data, --statistics-only, and --no-statistics; which
allow all combinations of schema, data, and/or stats. The options are
named this way to preserve compatibility with the previous
--schema-only and --data-only options.

Statistics are in SECTION_DATA, unless the object itself is in
SECTION_POST_DATA.

The stats are represented as calls to pg_restore_relation_stats() and
pg_restore_attribute_stats().

Author: Corey Huinker, Jeff Davis
Reviewed-by: Jian He
Discussion: https://postgr.es/m/CADkLM=fzX7QX6r78fShWDjNN3Vcr4PVAnvXxQ4DiGy6V=0bCUA@mail.gmail.com
Discussion: https://postgr.es/m/CADkLM%3DcB0rF3p_FuWRTMSV0983ihTRpsH%2BOCpNyiqE7Wk0vUWA%40mail.gmail.com

Improve errdetail message added by ac0e33136a.

Make it consistent with other similar messages.

Author: Kyotaro Horiguchi <[email protected]>
Reviewed-by: Hayato Kuroda <[email protected]>
Discussion: https://postgr.es/m/20250220.140839.1444694904721968348 [email protected]

Don't lock partitions pruned by initial pruning

Before executing a cached generic plan, AcquireExecutorLocks() in
plancache.c locks all relations in a plan's range table to ensure the
plan is safe for execution. However, this locks runtime-prunable
relations that will later be pruned during "initial" runtime pruning,
introducing unnecessary overhead.

This commit defers locking for such relations to executor startup and
ensures that if the CachedPlan is invalidated due to concurrent DDL
during this window, replanning is triggered. Deferring these locks
avoids unnecessary locking overhead for pruned partitions, resulting
in significant speedup, particularly when many partitions are pruned
during initial runtime pruning.

* Changes to locking when executing generic plans:

AcquireExecutorLocks() now locks only unprunable relations, that is,
those found in PlannedStmt.unprunableRelids (introduced in commit
cbc127917e), to avoid locking runtime-prunable partitions
unnecessarily.  The remaining locks are taken by
ExecDoInitialPruning(), which acquires them only for partitions that
survive pruning.

This deferral does not affect the locks required for permission
checking in InitPlan(), which takes place before initial pruning.
ExecCheckPermissions() now includes an Assert to verify that all
relations undergoing permission checks, none of which can be in the
set of runtime-prunable relations, are properly locked.

* Plan invalidation handling:

Deferring locks introduces a window where prunable relations may be
altered by concurrent DDL, invalidating the plan. A new function,
ExecutorStartCachedPlan(), wraps ExecutorStart() to detect and handle
invalidation caused by deferred locking. If invalidation occurs,
ExecutorStartCachedPlan() updates CachedPlan using the new
UpdateCachedPlan() function and retries execution with the updated
plan. To ensure all code paths that may be affected by this handle
invalidation properly, all callers of ExecutorStart that may execute a
PlannedStmt from a CachedPlan have been updated to use
ExecutorStartCachedPlan() instead.

UpdateCachedPlan() replaces stale plans in CachedPlan.stmt_list. A new
CachedPlan.stmt_context, created as a child of CachedPlan.context,
allows freeing old PlannedStmts while preserving the CachedPlan
structure and its statement list. This ensures that loops over
statements in upstream callers of ExecutorStartCachedPlan() remain
intact.

ExecutorStart() and ExecutorStart_hook implementations now return a
boolean value indicating whether plan initialization succeeded with a
valid PlanState tree in QueryDesc.planstate, or false otherwise, in
which case QueryDesc.planstate is NULL. Hook implementations are
required to call standard_ExecutorStart() at the beginning, and if it
returns false, they should do the same without proceeding.

* Testing:

To verify these changes, the delay_execution module tests scenarios
where cached plans become invalid due to changes in prunable relations
after deferred locks.

* Note to extension authors:

ExecutorStart_hook implementations must verify plan validity after
calling standard_ExecutorStart(), as explained earlier. For example:

    if (prev_ExecutorStart)
        plan_valid = prev_ExecutorStart(queryDesc, eflags);
    else
        plan_valid = standard_ExecutorStart(queryDesc, eflags);

    if (!plan_valid)
        return false;

    <extension-code>

    return true;

Extensions accessing child relations, especially prunable partitions,
via ExecGetRangeTableRelation() must now ensure their RT indexes are
present in es_unpruned_relids (introduced in commit cbc127917e), or
they will encounter an error. This is a strict requirement after this
change, as only relations in that set are locked.

The idea of deferring some locks to executor startup, allowing locks
for prunable partitions to be skipped, was first proposed by Tom Lane.

Reviewed-by: Robert Haas <[email protected]> (earlier versions)
Reviewed-by: David Rowley <[email protected]> (earlier versions)
Reviewed-by: Tom Lane <[email protected]> (earlier versions)
Reviewed-by: Tomas Vondra <[email protected]>
Reviewed-by: Junwang Zhao <[email protected]>
Discussion: https://postgr.es/m/CA+HiwqFGkMSge6TgC9KQzde0ohpAycLQuV7ooitEEpbKB0O_mg@mail.gmail.com

Include schema/table publications even with exclude options in dump.

The current implementation inconsistently includes public schema but not
information_schema when those are specified in FOR TABLES IN SCHMEA ...
Apart from that, the current behavior for publications w.r.t exclude table
and schema (--exclude-table, --exclude-schema) option differs from what we
do at other places. We try to avoid including publications for
corresponding tables or schemas when an exclude-table or exclude-schema
option is given, unlike what we do for views using functions defined in a
particular schema or a subscription pointing to publications with their
corresponding exclude options.

I decided not to backpatch this as it leads to a behavior change and we don't
see any field report for current behavior.

Reported-by: Tom Lane <[email protected]>
Author: Vignesh C <[email protected]>
Reviewed-by: Amit Kapila <[email protected]>
Discussion: https://postgr.es/m/1270733.1734134272@sss.pgh.pa.us

doc: Fix typo in section "WAL configuration"

pg_stat_io has an attribute named fsync_time, not sync_time.

Oversight in 2f70871c2bc1.

Discussion: https://postgr.es/m/Z7RkQ0EfYaqqjgz/@ip-10-97-1-34.eu-west-3.compute.internal

doc: Add details about object "wal" in pg_stat_io

This commit adds a short description of what kind of activity is tracked
in pg_stat_io for the object "wal", with a link pointing to the section
"WAL configuration" that has a lot of details on the matter.

This should perhaps have been added in a051e71e28a1, but things are what
they are.

Author: Bertrand Drouvot <[email protected]>
Discussion: https://postgr.es/m/Z7RkQ0EfYaqqjgz/@ip-10-97-1-34.eu-west-3.compute.internal

doc: Recommend pg_stat_io rather than pg_stat_wal in WAL configuration

Since a051e71e28a1, pg_stat_io is able to track statistics for the WAL
activity, providing an equivalent of pg_stat_wal with more granularity
for the fsyncs/writes counts and timings, as the data is split across
backend types.

This commit now recommends pg_stat_io rather than pg_stat_wal in the
section "WAL configuration", some of the latter's attributes being
candidate for removal in a follow-up commit.

Extracted from a larger patch by the same author.

Author: Bertrand Drouvot <[email protected]>
Discussion: https://postgr.es/m/Z7RkQ0EfYaqqjgz/@ip-10-97-1-34.eu-west-3.compute.internal

Fix FATAL message for invalid recovery timeline at beginning of recovery

If the requested recovery timeline is not reachable, the logged
checkpoint and timeline should to be the values read from the
backup_label when it is defined. The message generated used the values
from the control file in this case, which is fine when recovering from
the control file without a backup_label, but not if there is a
backup_label.

Issue introduced in ee994272ca50. v15 has introduced xlogrecovery.c and
more simplifications in this area (4a92a1c3d1c3, a27048cbcb58), making
this change a bit simpler to think about, so backpatch only down to this
version.

Author: David Steele <[email protected]>
Reviewed-by: Andrey M. Borodin <[email protected]>
Reviewed-by: Benoit Lobréau <[email protected]>
Discussion: https://postgr.es/m/c3d617d4-1696-4aa7-8a4d-5a7d19cc5618@pgbackrest.org
Backpatch-through: 15

pgbench: Increase RLIMIT_NOFILE if necessary

pgbench already had code to check if the soft rlimit is too low for the
specified number of connections. If too low, it errored out, telling the user
to increase the limit.

However, we can do better: If the hard limit allows, increase the soft limit
to be sufficiently for the number of connections.

It is common for the soft limit to be considerably lower than the hard limit,
due to the danger of soft limits > 1024 breaking programs that use the
select(2), as explained in [1].

[1]: https://0pointer.net/blog/file-descriptor-limits.html

Author: Jelte Fennema-Nio <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Tom Lane <[email protected]>
Discussion: https://postgr.es/m/CAGECzQQh6VSy3KG4pN1d%3Dh9J%3DD1rStFCMR%2Bt7yh_Kwj-g87aLQ%40mail.gmail.com

test_escape: Fix output of --help

The short option name -f was not listed, only its long option name
--force-unsupported.

Author: Japin Li
Discussion: https://postgr.es/m/ME0P300MB04452BD1FB1B277D4C1C20B9B6C52@ME0P300MB0445.AUSP300.PROD.OUTLOOK.COM
Backpatch-through: 13

Correct relation size estimate with low fillfactor

Since commit 29cf61ade3, table_block_relation_estimate_size() considers
fillfactor when estimating number of rows in a relation before the first
ANALYZE. The formula however did not consider tuples may be larger than
available space determined by fillfactor, ending with density 0. This
ultimately means the relation was estimated to contain a single row.

The executor however places at least one tuple per page, even with very
low fillfactor values, so the density should be at least 1. Fixed by
clamping the density estimate using clamp_row_est().

Reported by Heikki Linnakangas. Fix by me, with regression test inspired
by example provided by Heikki.

Backpatch to 17, where the issue was introduced.

Reported-by: Heikki Linnakangas
Backpatch-through: 17
Discussion: https://postgr.es/m/2bf9d973-7789-4937-a7ca-0af9fb49c71e@iki.fi

Assert that ExecOpenIndices and ExecCloseIndices are not repeated.

These functions should be called at most once per ResultRelInfo;
it's wasteful to do otherwise, and certainly the pattern of
opening twice and then closing twice is a bad idea.  Moreover,
aminsertcleanup functions might not be prepared to be called twice,
as the just-hardened code in BRIN demonstrates.

This amounts to an API change, since such coding patterns were
safe even if wasteful before v17.  Hence, apply to HEAD only.
(Extension code violating this new rule faces some risk in v17,
but we just fixed brininsertcleanup and there are probably few
other aminsertcleanup functions as yet.  So the odds of breaking
usable code seem higher than the odds of doing something useful
with a back-patch.)

Bug: #18815
Reported-by: Sergey Belyashov <[email protected]>
Discussion: https://postgr.es/m/18815-2a0407cc7f40b327@postgresql.org

Fix crash in brininsertcleanup during logical replication.

Logical replication crashes if the subscriber's partitioned table
has a BRIN index.  There are two independently blamable causes,
and this patch fixes both:

1. brininsertcleanup fails if called twice for the same IndexInfo,
because it half-destroys its BrinInsertState but leaves it still
linked from ii_AmCache.  brininsert would also fail in that state,
so it's pretty hard to see any advantage to this coding.  Fully
remove the BrinInsertState, instead, so that a new brininsert
call would create a new cache.

2. A logical replication subscriber sometimes does ExecOpenIndices
twice on the same ResultRelInfo, followed by doing ExecCloseIndices
twice; the second call reaches the brininsertcleanup bug.  Quite
aside from tickling unexpected cases in aminsertcleanup methods,
this seems very wasteful, because the IndexInfos built in the
first ExecOpenIndices call are just lost during the second call,
and have to be rebuilt at possibly-nontrivial cost.  We should
establish a coding rule that you don't do that.

The problematic coding is that when the target table is partitioned,
apply_handle_tuple_routing calls ExecFindPartition which does
ExecOpenIndices (and expects that ExecCleanupTupleRouting will
close the indexes again).  Using the ResultRelInfo made by
ExecFindPartition, it calls apply_handle_delete_internal or
apply_handle_insert_internal, both of which think they need to do
ExecOpenIndices/ExecCloseIndices for themselves.  They do in the main
non-partitioned code paths, but not here.  The simplest fix is to pull
their ExecOpenIndices/ExecCloseIndices calls out and put them in the
call sites for the non-partitioned cases.  (We could have refactored
apply_handle_update_internal similarly, but I did not do so today
because there's no bug there: the partitioned code path doesn't
call it.)

Also, remove the always-duplicative open/close calls within
apply_handle_tuple_routing itself.

Since brininsertcleanup and indeed the whole aminsertcleanup mechanism
are new in v17, there's no observable bug in older branches.  A case
could be made for trying to avoid these duplicative open/close calls
in the older branches, but for now it seems not worth the trouble and
risk of new bugs.

Bug: #18815
Reported-by: Sergey Belyashov <[email protected]>
Discussion: https://postgr.es/m/18815-2a0407cc7f40b327@postgresql.org
Backpatch-through: 17

Consider BufFiles when adjusting hashjoin parameters

Until now ExecChooseHashTableSize() considered only the size of the
in-memory hash table, and ignored the memory needed for the batch files.
Which can be a significant amount, because each batch needs two BufFiles
(each with a BLCKSZ buffer). The same issue applies to increasing the
number of batches during execution.

It's also possible to trigger a "batch explosion", e.g. due to duplicate
values or skew. We've seen reports of joins with hundreds of thousands
(or even millions) of batches, consuming gigabytes of memory, triggering
OOM errors. These cases may be fairly rare, but it's clearly possible to
hit them.

These issues can't be prevented during planning. Even if we improve
that, it does not help with execution-time batch explosion. We can
however reduce the impact and use as little memory as possible.

This patch improves the behavior by adjusting how the memory is divided
between the hash table and batch files. It may be better to use fewer
batch files, even if it means the hash table will exceed the limit.

The capacity of the hash node may be increased either by doubling he
number of batches, or doubling the size of the in-memory hash table. The
outcome is the same, but the memory usage may be very different. For low
nbatch values it's better to add batches, for high nbatch values it's
better to allow a larger hash table.

The patch considers both options, both during the initial sizing and
then during execution, to minimize how much the limit gets exceeded.

It might seem this patch is relaxing the memory limit - allowing it to
be exceeded. But that's not really the case. It has always been like
that, except the memory used by batches was ignored.

Allowing the hash table to grow may also prevent the batch explosion.
If there's a large batch that can't be split (due to hash collisions or
duplicate values), at some point the memory limit will increase enough
for the batch to fit into the hash table.

This patch was in the works for a long time. The early versions were
posted in 2019, and revived every year or two when we happened to get
the next report of OOM due to a hashjoin batch explosion. Each of those
patch versions were reviewed by a couple people. I'm mentioning only
Melanie Plageman and Robert Haas, because they reviewed the last
version, and the older patches are very different.

Reviewed-by: Melanie Plageman, Robert Haas
Discussion: https://postgr.es/m/7bed6c08-72a0-4ab9-a79c-e01fcdd0940f@vondra.me
Discussion: https://postgr.es/m/20190504003414.bulcbnge3rhwhcsh%40development
Discussion: https://postgr.es/m/20190428141901.5dsbge2ka3rxmpk6%40development

tests: BackgroundPsql: Fix potential for lost errors on windows

This addresses various corner cases in BackgroundPsql:

- On windows stdout and stderr may arrive out of order, leading to errors not
  being reported, or attributed to the wrong statement.

  To fix, emit the "query-separation banner" on both stdout and stderr and
  wait for both.

- Very occasionally the "query-separation banner" would not get removed, because
  we waited until the banner arrived, but then replaced the banner plus
  newline.

  To fix, wait for banner and newline.

- For interactive psql replacing $banner\n is not sufficient, interactive psql
  outputs \r\n.

- For interactive psql, where commands are echoed to stdout, the \echo
  command, rather than its output, would be matched.

  This would sometimes lead to output from the prior query, or wait_connect(),
  being returned in the next command.

  This also affected wait_connect(), leading to sometimes sending queries to
  psql before the connection actually was established.

While debugging these issues I also found that it's hard to know whether a
query separation banner was attributed to the right query. Make that easier by
counting the queries each BackgroundPsql instance has emitted and include the
number in the banner.

Also emit psql stdout/stderr in query() and wait_connect() as Test::More
notes, without that it's rather hard to debug some issues in CI and buildfarm.

As this can cause issues not just to-be-added tests, but also existing ones,
backpatch the fix to all supported versions.

Reviewed-by: Daniel Gustafsson <[email protected]>
Reviewed-by: Noah Misch <[email protected]>
Discussion: https://postgr.es/m/wmovm6xcbwh7twdtymxuboaoarbvwj2haasd3sikzlb3dkgz76@n45rzycluzft
Backpatch-through: 13

Add ATAlterConstraint struct for ALTER .. CONSTRAINT

Replace the use of Constraint with a new ATAlterConstraint struct, which
allows us to pass additional information.  No functionality is added by
this commit.  This is necessary for future work that allows altering
constraints in other ways.

I (Álvaro) took the liberty of restructuring the code for ALTER
CONSTRAINT beyond what Amul did.  The original coding before Amul's
patch was unnecessarily baroque, and this change makes things simpler
by removing one level of subroutine.  Also, partly remove the assumption
that only partitioned tables are relevant (by passing sensible 'recurse'
arguments) and no longer ignore whether ONLY was specified.  I say
'partly' because the current coding only walks down via the 'conparentid'
relationship, which is only used for partitioned tables; but future
patches could handle ONLY or not for other types of constraint changes
for legacy inheritance trees too.

Author: Amul Sul <[email protected]>
Author: Álvaro Herrera <[email protected]>
Discussion: https://postgr.es/m/CAAJ_b94bfgPV-8Mw_HwSBeheVwaK9=5s+7+KbBj_NpwXQFgDGg@mail.gmail.com

Improve statistics estimation for single-column GROUP BY in sub-queries

This commit follows the idea of the 4767bc8ff2. If sub-query has only one
GROUP BY column, we can consider its output variable as being unique. We can
employ this fact in the statistics to make more precise estimations in the
upper query block.

Author: Andrei Lepikhov <[email protected]>
Reviewed-by: Heikki Linnakangas <[email protected]>
Reviewed-by: Alexander Korotkov <[email protected]>

Add a test for commit ac0e33136a using the injection point.

This test uses an injection point to bypass the time overhead caused by
the idle_replication_slot_timeout GUC, which has a minimum value of one
minute.

Author: Hayato Kuroda <[email protected]>
Author: Nisha Moond <[email protected]>
Reviewed-by: Peter Smith <[email protected]>
Reviewed-by: Vignesh C <[email protected]>
Reviewed-by: Amit Kapila <[email protected]>
Discussion: https://postgr.es/m/CALj2ACW4aUe-_uFQOjdWCEN-xXoLGhmvRFnL8SNw_TZ5nJe+aw@mail.gmail.com

Add support for LIKE in CREATE FOREIGN TABLE

LIKE enables the creation of foreign tables based on the column
definitions, constraints and objects of the defined source relation(s).

This feature mirrors the behavior of CREATE TABLE LIKE, but ignores
the INCLUDING sub-options that do not make sense for foreign tables:
INDEXES, COMPRESSION, IDENTITY and STORAGE. The supported sub-options
are COMMENTS, CONSTRAINTS, DEFAULTS, GENERATED and STATISTICS, mapping
with the clauses already supported by the command.

Note that the restriction with LIKE in CREATE FOREIGN TABLE was added in
a0c6dfeecfcc.

Author: Zhang Mingli
Reviewed-by: Álvaro Herrera, Sami Imseih, Michael Paquier
Discussion: https://postgr.es/m/42d3f855-2275-4361-a42a-826172ca2dc4@Spark

doc: Fix some issues with JSON_TABLE() examples

1. Remove an unused PASSING variable.

2. Adjust formatting of JSON data used in an example to be valid
under strict mode

Reported-by: Miłosz Chmura <[email protected]>
Author: Robert Treat <[email protected]>
Discussion: https://postgr.es/m/173859550337.1071.4748984213168572913@wrigleys.postgresql.org

Invalidate inactive replication slots.

This commit introduces idle_replication_slot_timeout GUC that allows
inactive slots to be invalidated at the time of checkpoint. Because
checkpoints happen checkpoint_timeout intervals, there can be some lag
between when the idle_replication_slot_timeout was exceeded and when the
slot invalidation is triggered at the next checkpoint. To avoid such lags,
users can force a checkpoint to promptly invalidate inactive slots.

Note that the idle timeout invalidation mechanism is not applicable for
slots that do not reserve WAL or for slots on the standby server that are
synced from the primary server (i.e., standby slots having 'synced' field
'true'). Synced slots are always considered to be inactive because they
don't perform logical decoding to produce changes.

The slots can become inactive for a long period if a subscriber is down
due to a system error or inaccessible because of network issues. If such a
situation persists, it might be more practical to recreate the subscriber
rather than attempt to recover the node and wait for it to catch up which
could be time-consuming.

Then, external tools could create replication slots (e.g., for migrations
or upgrades) that may fail to remove them if an error occurs, leaving
behind unused slots that take up space and resources. Manually cleaning
them up can be tedious and error-prone, and without intervention, these
lingering slots can cause unnecessary WAL retention and system bloat.

As the duration of idle_replication_slot_timeout is in minutes, any test
using that would be time-consuming. We are planning to commit a follow up
patch for tests by using the injection point framework.

Author: Nisha Moond <[email protected]>
Author: Bharath Rupireddy <[email protected]>
Reviewed-by: Peter Smith <[email protected]>
Reviewed-by: Hayato Kuroda <[email protected]>
Reviewed-by: Vignesh C <[email protected]>
Reviewed-by: Amit Kapila <[email protected]>
Reviewed-by: Hou Zhijie <[email protected]>
Reviewed-by: Bertrand Drouvot <[email protected]>
Discussion: https://postgr.es/m/CALj2ACW4aUe-_uFQOjdWCEN-xXoLGhmvRFnL8SNw_TZ5nJe+aw@mail.gmail.com
Discussion: https://postgr.es/m/OS0PR01MB5716C131A7D80DAE8CB9E88794FC2@OS0PR01MB5716.jpnprd01.prod.outlook.com

Update to latest Snowball sources.

It's been some time since we did this, partly because the upstream
snowball project hasn't formally tagged a new release since 2021.
The main motivation for doing it now is to absorb a bug fix
(their commit e322673a841d9abd69994ae8cd20e191090b6ef4), which
prevents a null pointer dereference crash if SN_create_env() gets
a malloc failure at just the wrong point.  We'll patch the back
branches with only that change, but we might as well do the full
sync dance on HEAD.

Aside from a bunch of mostly-minor tweaks to existing stemmers, this
update adds a new stemmer for Estonian.  It also removes the existing
stemmer for Romanian using ISO-8859-2 encoding.  Upstream apparently
concluded that ISO-8859-2 doesn't provide an adequate representation
of some Romanian characters, and the UTF-8 implementation should be
used instead.

While at it, update the README's instructions for doing a sync,
which have not been adjusted during the addition of meson tooling.

Thanks to Maksim Korotkov for discovering the null-pointer
bug and submitting the fix to upstream snowball.

Reported-by: Maksim Korotkov <[email protected]>
Discussion: https://postgr.es/m/1d1a46-67ab1000-21-80c451@83151435

Fix unsafe access to BufferDescriptors

When considering a local buffer, the GetBufferDescriptor() call in
BufferGetLSNAtomic() would be retrieving a shared buffer with a bad
buffer ID. Since the code checks whether the buffer is shared before
using the retrieved BufferDesc, this issue did not lead to any
malfunction. Nonetheless this seems like trouble waiting to happen,
so fix it by ensuring that GetBufferDescriptor() is only called when
we know the buffer is shared.

Author: Tender Wang <[email protected]>
Reviewed-by: Xuneng Zhou <[email protected]>
Reviewed-by: Richard Guo <[email protected]>
Discussion: https://postgr.es/m/CAHewXNku-o46-9cmUgyv6LkSZ25doDrWq32p=oz9kfD8ovVJMg@mail.gmail.com
Backpatch-through: 13

Fix freeing a child join's SpecialJoinInfo

In try_partitionwise_join, we try to break down the join between two
partitioned relations into joins between matching partitions.  To
achieve this, we iterate through each pair of partitions from the two
joining relations and create child join relations for them.  To reduce
memory accumulation during each iteration, one step we take is freeing
the SpecialJoinInfos created for the child joins.

A child join's SpecialJoinInfo is a copy of the parent join's
SpecialJoinInfo, with some members being translated copies of their
counterparts in the parent.  However, when freeing the bitmapset
members in a child join's SpecialJoinInfo, we failed to check whether
they were translated copies.  As a result, we inadvertently freed the
members that were still in use by the parent SpecialJoinInfo, leading
to crashes when those freed members were accessed.

To fix, check if each member of the child join's SpecialJoinInfo is a
translated copy and free it only if that's the case.  This requires
passing the parent join's SpecialJoinInfo as a parameter to
free_child_join_sjinfo.

Back-patch to v17 where this bug crept in.

Bug: #18806
Reported-by: 孟令彬 <[email protected]>
Diagnosed-by: Tender Wang <[email protected]>
Author: Richard Guo <[email protected]>
Reviewed-by: Amit Langote <[email protected]>
Reviewed-by: Ashutosh Bapat <[email protected]>
Discussion: https://postgr.es/m/18806-d70b0c9fdf63dcbf@postgresql.org
Backpatch-through: 17

test_escape: Fix handling of short options in getopt_long()

This addresses two errors in the module, based on the set of options
supported:
- '-c', for --conninfo, was not listed.
- '-f', for --force-unsupported, was not listed.

While on it, these are now listed in an alphabetical order.

Author: Japin Li
Discussion: https://postgr.es/m/ME0P300MB04451FB20CE0346A59C25CADB6FA2@ME0P300MB0445.AUSP300.PROD.OUTLOOK.COM
Backpatch-through: 13

Make the description of some GUCs more consistent

This commit improves the description of a couple of GUCs, to be more
consistent with the style of their surroundings:
* array_nulls
* enable_self_join_elimination
* optimize_bounded_sort
* row_security
* synchronize_seqscans

Author: Kyotaro Horiguchi
Discussion: https://postgr.es/m/20250218.103240.1422205966404509831 [email protected]

doc: add example of sign mismatch with POSIX/ISO-8601 time zones

Author: Laurenz Albe

Discussion: https://postgr.es/m/eb4d1e15c6822c1937be1491118500dd9201492f [email protected]

Update outdated comments in nodeAgg.c.

Author: Zhang Mingli
Reviewed-by: Richard Guo
Discussion: https://postgr.es/m/198a8d1e-0792-4e7f-828e-902aa342f36e@Spark

Reduce scope of heap vacuum per_buffer_data

Move lazy_scan_heap()'s per_buffer_data variable into a tighter scope.
In lazy_scan_heap()'s phase I heap vacuuming, the read stream API
returns a pointer to the next block number to vacuum. As long as
read_stream_next_buffer() returns a valid buffer, per_buffer_data should
always be valid.

Move per_buffer_data into a tighter scope and make sure it is reset to
NULL on each iteration so that we get a core dump instead of bogus data
from a previous block if something goes wrong in the read stream API.

Suggested-by: Tom Lane <[email protected]>
Discussion: https://postgr.es/m/626104.1739729538%40sss.pgh.pa.us

Add PGErrorVerbosity to typedefs.list

PGErrorVerbosity was missing which resulted in incorrect whitespace
alignment going back all the way to e3860ffa4dd0. No backpatch for
this though since we don't pgindent backbranches.

Author: Jelte Fennema-Nio <[email protected]>
Reviewed-by: Daniel Gustafsson <[email protected]>
Discussion: https://postgr.es/m/CAGECzQTVi8n-HW4Q27je-b9ckQk7zf6bS_it42gNvQu+DX0NCQ@mail.gmail.com

Fix poorly written regression test

bd10ec529 added code to allow redundant functionally dependent GROUP BY
columns to be removed using unique indexes and NOT NULL constraints as
proofs of functional dependency. In that commit, I (David) added a test
to ensure that when there are multiple indexes available to remove columns
that we pick the index that allows us to remove the most columns. This
test was faulty as it assumed the t3 table's primary key index was valid
to use as functional dependency proof, but that's not the case since
that's defined as deferrable.

Here we adjust the tests added by that commit to use the t2 table instead.
That's defined with a non-deferrable primary key.

Author: songjinzhou <[email protected]>
Author: David Rowley <[email protected]>
Reviewed-by: Japin Li <[email protected]>
Discussion: https://postgr.es/m/[email protected]

Raise a WARNING for max_slot_wal_keep_size in pg_createsubscriber.

During the pg_createsubscriber execution, it is possible that the required
WAL is removed from the primary/publisher node due to
'max_slot_wal_keep_size'.

This patch raises a WARNING during the '--dry-run' mode if the
'max_slot_wal_keep_size' is set to a non-default value on the
primary/publisher node.

Author: Shubham Khanna <[email protected]>
Reviewed-by: Peter Smith <[email protected]>
Reviewed-by: Hayato Kuroda <[email protected]>
Reviewed-by: Vignesh C <[email protected]>
Discussion: https://postgr.es/m/CAHv8Rj+deqsQXOMa7Tck8CBQUbsua=+4AuMVQ2=MPM0f-ZHbjA@mail.gmail.com

Specialize intarray sorting

There is at least one report in the field of storing millions of
integers in arrays, so it seems like a good time to specialize
intarray's qsort function. In doing so, streamline the comparators:
Previously there were three, two for each direction for sorting
and one passed to qunique_arg. To preserve the early exit in the
case of descending input, pass the direction as an argument to
the comparator. This requires giving up duplicate detection, which
previously allowed skipping the qunique_arg() call. Testing showed
no regressions this way.

In passing, get rid of nearby checks that the input has at least
two elements, since preserving them would make some macros less
readable. These are not necessary for correctness, and seem like
premature optimizations.

Author: Andrey M. Borodin <[email protected]>
Discussion: https://postgr.es/m/098A3E67-E4A6-4086-9C66-B1EAEB1DFE1C@yandex-team.ru

Doc: Improve pg_replication_slots.inactive_since description.

Author: Peter Smith <[email protected]>
Reviewed-by: Amit Kapila <[email protected]>
Discussion: https://postgr.es/m/CAHut+PssvVMTWVtUPto6HbPO8pgVsvtzndt_FdBomA_Oq4zf3w@mail.gmail.com

Fix typo in 2a8a0067.

Builds configured with Valgrind but without assertions would fail due to
a typo in the recent change. This should be included when back-patching
2a8a0067 into v17.

Fix translator notes in comments

The translator comments detailing what a %s inclusion refers to were
accidentally including too many address types. In practice this is
not a problem since it's not a translated string, but to minimize any
risk of confusion let's fix them anwyays. Even though this exists in
backbranches there is little use for backpatch as the translation work
has already happened there, so let's avoid the churn.

Author: Japin Li <[email protected]>
Reviewed-by: Daniel Gustafsson <[email protected]>
Discussion: https://postgr.es/m/ME0P300MB04458DE627480614ABE639D2B6FB2@ME0P300MB0445.AUSP300.PROD.OUTLOOK.COM

Add tab completion for ALTER USER/ROLE RESET

Currently tab completion for ALTER USER RESET shows a list of all
configuration parameters that may be set on a role, irrespectively of
which parameters are actually set. This patch improves tab completion to
offer only parameters that are set.

Author: Robins Tharakan
Reviewed-By: Tomas Vondra
Discussion: https://postgr.es/m/CAEP4nAzqiT6VbVC5r3nq5byLTnPzjniVGzEMpYcnAHQyNzEuaw%40mail.gmail.com

Add tab completion for ALTER DATABASE RESET

Currently tab completion for ALTER DATABASE RESET shows a list of all
configuration parameters that may be set on a database, irrespectively
of which parameters are actually set. This patch improves tab completion
to offer only parameters that are set.

Author: Robins Tharakan
Reviewed-By: Tomas Vondra
Discussion: https://postgr.es/m/CAEP4nAzqiT6VbVC5r3nq5byLTnPzjniVGzEMpYcnAHQyNzEuaw%40mail.gmail.com

Implement Self-Join Elimination

The Self-Join Elimination (SJE) feature removes an inner join of a plain
table to itself in the query tree if it is proven that the join can be
replaced with a scan without impacting the query result.  Self-join and
inner relation get replaced with the outer in query, equivalence classes,
and planner info structures.  Also, the inner restrictlist moves to the
outer one with the removal of duplicated clauses.  Thus, this optimization
reduces the length of the range table list (this especially makes sense for
partitioned relations), reduces the number of restriction clauses and,
in turn, selectivity estimations, and potentially improves total planner
prediction for the query.

This feature is dedicated to avoiding redundancy, which can appear after
pull-up transformations or the creation of an EquivalenceClass-derived clause
like the below.

  SELECT * FROM t1 WHERE x IN (SELECT t3.x FROM t1 t3);
  SELECT * FROM t1 WHERE EXISTS (SELECT t3.x FROM t1 t3 WHERE t3.x = t1.x);
  SELECT * FROM t1,t2, t1 t3 WHERE t1.x = t2.x AND t2.x = t3.x;

In the future, we could also reduce redundancy caused by subquery pull-up
after unnecessary outer join removal in cases like the one below.

  SELECT * FROM t1 WHERE x IN
    (SELECT t3.x FROM t1 t3 LEFT JOIN t2 ON t2.x = t1.x);

Also, it can drastically help to join partitioned tables, removing entries
even before their expansion.

The SJE proof is based on innerrel_is_unique() machinery.

We can remove a self-join when for each outer row:

1. At most, one inner row matches the join clause;
2. Each matched inner row must be (physically) the same as the outer one;
3. Inner and outer rows have the same row mark.

In this patch, we use the next approach to identify a self-join:

1. Collect all merge-joinable join quals which look like a.x = b.x;
2. Add to the list above the baseretrictinfo of the inner table;
3. Check innerrel_is_unique() for the qual list.  If it returns false, skip
    this pair of joining tables;
4. Check uniqueness, proved by the baserestrictinfo clauses. To prove the
    possibility of self-join elimination, the inner and outer clauses must
    match exactly.

The relation replacement procedure is not trivial and is partly combined
with the one used to remove useless left joins.  Tests covering this feature
were added to join.sql.  Some of the existing regression tests changed due
to self-join removal logic.

Discussion: https://postgr.es/m/flat/64486b0b-0404-e39e-322d-0801154901f3%40postgrespro.ru
Author: Andrey Lepikhov <[email protected]>
Author: Alexander Kuzmenkov <[email protected]>
Co-authored-by: Alexander Korotkov <[email protected]>
Co-authored-by: Alena Rybakina <[email protected]>
Reviewed-by: Tom Lane <[email protected]>
Reviewed-by: Robert Haas <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Simon Riggs <[email protected]>
Reviewed-by: Jonathan S. Katz <[email protected]>
Reviewed-by: David Rowley <[email protected]>
Reviewed-by: Thomas Munro <[email protected]>
Reviewed-by: Konstantin Knizhnik <[email protected]>
Reviewed-by: Heikki Linnakangas <[email protected]>
Reviewed-by: Hywel Carver <[email protected]>
Reviewed-by: Laurenz Albe <[email protected]>
Reviewed-by: Ronan Dunklau <[email protected]>
Reviewed-by: vignesh C <[email protected]>
Reviewed-by: Zhihong Yu <[email protected]>
Reviewed-by: Greg Stark <[email protected]>
Reviewed-by: Jaime Casanova <[email protected]>
Reviewed-by: Michał Kłeczek <[email protected]>
Reviewed-by: Alena Rybakina <[email protected]>
Reviewed-by: Alexander Korotkov <[email protected]>

Revert: Get rid of WALBufMappingLock

This commit reverts 6a2275b895. Buildfarm failure on batta spots some
concurrency issue, which requires further investigation.

Fix an oversight in cbc127917 to handle MERGE correctly

ExecInitModifyTable() forgot to trim MERGE-related lists to exclude
entries for result relations pruned during initial pruning, so fix
that.

While at it, make the function's use of the pruned resultRelations
list, rather than ModifyTable.resultRelations, more consistent.

Reported-by: Alexander Lakhin <[email protected]> (via sqlsmith)
Reviewed-by: Junwang Zhao <[email protected]>
Discussion: https://postgr.es/m/e72c94d9-e5f9-4753-9bc1-69d72bd54b8a@gmail.com

Add information about WAL buffers full to VACUUM/ANALYZE (VERBOSE)

This commit adds the information about the number of times WAL buffers
have been full to the logs generated by VACUUM/ANALYZE (VERBOSE) and in
the logs generated by autovacuum, complementing the existing information
stored by WalUsage.

This is the last part of the backend code where the value of
wal_buffers_full can be reported, similarly to all the other fields of
WalUsage. 320545bfcfee and ce5bcc4a9f26 have done the same for EXPLAIN
and pgss.

Author: Bertrand Drouvot
Reviewed-by: Ilia Evdokimov
Discussion: https://postgr.es/m/[email protected]

Add information about WAL buffers being full to EXPLAIN (WAL)

This is similar to ce5bcc4a9f26, relying on the addition of
wal_buffers_full to WalUsage. This time, the information is added to
the output generated by EXPLAIN (WAL).

Author: Bertrand Drouvot
Reviewed-by: Ilia Evdokimov
Discussion: https://postgr.es/m/[email protected]

pg_stat_statements: Add wal_buffers_full

wal_buffers_full tracks the number of times WAL buffers become full,
giving hints to be able to tune the GUC wal_buffers.

Up to now, this information was only available in pg_stat_wal. With
this field available in WalUsage since eaf502747bac, exposing it in
pg_stat_statements is straight-forward, and it offers more granularity
at query level.

pg_stat_statements does not need a version bump as one has been done in
commit cf54a2c00254 for this development cycle.

Author: Bertrand Drouvot
Reviewed-by: Ilia Evdokimov
Discussion: https://postgr.es/m/[email protected]

Move wal_buffers_full from PgStat_PendingWalStats to WalUsage

wal_buffers_full has been introduced in pg_stat_wal in 8d9a935965f, as
some information providing metrics for the tuning of the GUC
wal_buffers.  WalUsage has been introduced before that in df3b181499.

Moving this field is proving to be beneficial for several reasons:
- This information can now be made available in more layers, providing
more granularity than just pg_stat_wal, on a per-query basis: EXPLAIN,
pgss and VACUUM/ANALYZE logs.
- A patch is under discussion to provide statistics for WAL at backend
level, and this move simplifies a bit the handling of pending
statistics.  The remaining data in PgStat_PendingWalStats now relates to
write/sync counters and times, with equivalents present in pg_stat_io,
that backend statistics are able to already track.  So this should cut
all the dependencies between PgStat_PendingWalStats and WAL stats at
backend level.

As of this change, wal_buffers_full only shows in pg_stat_wal.

Author: Bertrand Drouvot
Reviewed-by: Ilia Evdokimov
Discussion: https://postgr.es/m/[email protected]

Get rid of WALBufMappingLock

Allow multiple backends to initialize WAL buffers concurrently.  This way
`MemSet((char *) NewPage, 0, XLOG_BLCKSZ);` can run in parallel without
taking a single LWLock in exclusive mode.

The new algorithm works as follows:
* reserve a page for initialization using XLogCtl->InitializeReserved,
* ensure the page is written out,
* once the page is initialized, try to advance XLogCtl->InitializedUpTo and
   signal to waiters using XLogCtl->InitializedUpToCondVar condition
   variable,
* repeat previous steps until we reserve initialization up to the target
   WAL position,
* wait until concurrent initialization finishes using a
   XLogCtl->InitializedUpToCondVar.

Now, multiple backends can, in parallel, concurrently reserve pages,
initialize them, and advance XLogCtl->InitializedUpTo to point to the latest
initialized page.

Author: Yura Sokolov <[email protected]>
Co-authored-by: Alexander Korotkov <[email protected]>
Reviewed-by: Pavel Borisov <[email protected]>

Adjust tuples estimate for appendrels

In set_append_rel_size(), we currently set rel->tuples to rel->rows
for an appendrel.  Generally, rel->tuples is the raw number of tuples
in the relation and rel->rows is the estimated number of tuples after
the relation's restriction clauses have been applied.  Although an
appendrel itself doesn't directly enforce any quals today, its child
relations may.  Therefore, setting rel->tuples equal to rel->rows for
an appendrel isn't always appropriate.

Doing so can lead to issues in cost estimates in some cases.  For
instance, when estimating the number of distinct values from an
appendrel, we would not be able to adjust the estimate based on the
restriction selectivity.

This patch addresses this by setting an appendrel's tuples to the
total number of tuples accumulated from each live child, which better
aligns with reality.

This is arguably a bug, but nobody has complained about that until
now, so no back-patch.

Author: Richard Guo <[email protected]>
Reviewed-by: Tender Wang <[email protected]>
Reviewed-by: Alena Rybakina <[email protected]>
Discussion: https://postgr.es/m/CAMbWs4_TG_+kVn6fjG-5GYzzukrNK57=g9eUo4gsrUG26OFawg@mail.gmail.com

In fmtIdEnc(), handle failure of enlargePQExpBuffer().

Coverity complained that we weren't doing that, and it's right.

This fix just makes fmtIdEnc() honor the general convention that OOM
causes a PQExpBuffer to become marked "broken", without any immediate
error. In the pretty-unlikely case that we actually did hit OOM here,
the end result would be to return an empty string to the caller,
probably resulting in invalid SQL syntax in an issued command (if
nothing else went wrong, which is even more unlikely). It's tempting
to throw an "out of memory" error if the buffer becomes broken, but
there's not a lot of point in doing that only here and not in hundreds
of other PQExpBuffer-using places in pg_dump and similar callers.
The whole issue could do with some non-time-crunched redesign, perhaps.

This is a followup to the fixes for CVE-2025-1094, and should be
included if cherry-picking those fixes.

Make escaping functions retain trailing bytes of an invalid character.

Instead of dropping the trailing byte(s) of an invalid or incomplete
multibyte character, replace only the first byte with a known-invalid
sequence, and process the rest normally. This seems less likely to
confuse incautious callers than the behavior adopted in 5dc1e42b4.

While we're at it, adjust PQescapeStringInternal to produce at most
one bleat about invalid multibyte characters per string. This
matches the behavior of PQescapeInternal, and avoids the risk of
producing tons of repetitive junk if a long string is simply given
in the wrong encoding.

This is a followup to the fixes for CVE-2025-1094, and should be
included if cherry-picking those fixes.

Author: Andres Freund <[email protected]>
Co-authored-by: Tom Lane <[email protected]>
Reported-by: Jeff Davis <[email protected]>
Discussion: https://postgr.es/m/20250215012712 [email protected]
Backpatch-through: 13

Fix explicit valgrind interaction in read_stream.c.

By calling wipe_mem() on per-buffer data memory that has been released,
we are also telling Valgrind that the memory is "noaccess". We need to
set it to "undefined" before giving it to the registered callback to
fill in, when a slot is reused.

As discovered by build farm animal skink when the VACUUM streamification
patches landed (the first users of per-buffer data).

Pushing to master only for now, to clear the error on skink. It's also
possible that external code might discover the per-buffer data feature
in v17, and reasonable to expect Valgrind not to produce spurious
memcheck reports, but the back-patch is deferred until after the
imminent minor release is out of the way.

Reviewed-by: Melanie Plageman <[email protected]>
Reviewed-by: Tom Lane <[email protected]>
Tested-by: Melanie Plageman <[email protected]>
Discussion: https://postgr.es/m/CA%2BhUKG%2Bg6aXpi2FEHqeLOzE%2BxYw%3DOV%2B-N5jhOEnnV%2BF0USM9xA%40mail.gmail.com

Fix PQescapeLiteral()/PQescapeIdentifier() length handling

In 5dc1e42b4fa I fixed bugs in various escape functions, unfortunately as part
of that I introduced a new bug in PQescapeLiteral()/PQescapeIdentifier(). The
bug is that I made PQescapeInternal() just use strlen(), rather than taking
the specified input length into account.

That's bad, because it can lead to including input that wasn't intended to be
included (in case len is shorter than null termination of the string) and
because it can lead to reading invalid memory if the input string is not null
terminated.

Expand test_escape to this kind of bug:

a) for escape functions with length support, append data that should not be
escaped and check that it is not

b) add valgrind requests to detect access of bytes that should not be touched

Author: Tom Lane <[email protected]>
Author: Andres Freund <[email protected]
Reviewed-by: Tom Lane <[email protected]>
Reviewed-by: Noah Misch <[email protected]>
Discussion: https://postgr.es/m/Z64jD3u46gObCo1p@pryzbyj2023
Backpatch: 13

Add delay time to VACUUM/ANALYZE (VERBOSE) and autovacuum logs.

Commit bb8dff9995 added this information to the
pg_stat_progress_vacuum and pg_stat_progress_analyze system views.
This commit adds the same information to the output of VACUUM and
ANALYZE with the VERBOSE option and to the autovacuum logs.

Suggested-by: Masahiro Ikeda <[email protected]>
Author: Bertrand Drouvot <[email protected]>
Discussion: https://postgr.es/m/ZmaXmWDL829fzAVX%40ip-10-97-1-34.eu-west-3.compute.internal

pgcrypto: Add support for CFB mode in AES encryption

Cipher Feedback Mode, CFB, is a self-synchronizing stream cipher which
is very similar to CBC performed in reverse. Since OpenSSL supports it,
we can easily plug it into the existing cipher selection code without
any need for infrastructure changes.

This patch was simultaneously submitted by Umar Hayat and Vladyslav
Nebozhyn, the latter whom suggested the feauture. The committed patch
is Umar's version.

Author: Umar Hayat <[email protected]>
Reviewed-by: Daniel Gustafsson <[email protected]>
Reviewed-by: Álvaro Herrera <[email protected]>
Discussion: https://postgr.es/m/CAPBGcbxo9ASzq14VTpQp3mnUJ5omdgTWUJOvWV0L6nNigWE5jw@mail.gmail.com

Use PqMsg_Progress macro in HandleParallelMessage().

Commit a99cc6c6b4 introduced the PqMsg_Progress macro but missed
updating HandleParallelMessage() accordingly.

Backpatch-through: 17

Use streaming read I/O in VACUUM's third phase

Make vacuum's third phase (its second pass over the heap), which reaps
dead items collected in the first phase and marks them as reusable, use
the read stream API. This commit adds a new read stream callback,
vacuum_reap_lp_read_stream_next(), that looks ahead in the TidStore and
returns the next block number to read for vacuum.

Author: Melanie Plageman <[email protected]>
Co-authored-by: Thomas Munro <[email protected]>
Reviewed-by: Masahiko Sawada <[email protected]>
Discussion: https://postgr.es/m/CA%2BhUKGKN3oy0bN_3yv8hd78a4%2BM1tJC9z7mD8%2Bf%2ByA%2BGeoFUwQ%40mail.gmail.com

Use streaming read I/O in VACUUM's first phase

Make vacuum's first phase, which prunes and freezes tuples and records
dead TIDs, use the read stream API by by converting
heap_vac_scan_next_block() to a read stream callback.

Reviewed-by: Masahiko Sawada <[email protected]>
Reviewed-by: Thomas Munro <[email protected]>
Discussion: https://postgr.es/m/CAAKRu_aLwANZpxHc0tC-6OT0OQT4TftDGkKAO5yigMUOv_Tcsw%40mail.gmail.com

Convert heap_vac_scan_next_block() boolean parameters to flags

The read stream API only allows one piece of extra per block state to be
passed back to the API user (per_buffer_data). lazy_scan_heap() needs
two pieces of per-buffer data: whether or not the block was all-visible
in the visibility map and whether or not it was eagerly scanned.

Convert these two pieces of information to flags so that they can be
populated by heap_vac_scan_next_block() and returned to
lazy_scan_heap(). A future commit will turn heap_vac_scan_next_block()
into the read stream callback for heap phase I vacuuming.

Reviewed-by: Masahiko Sawada <[email protected]>
Reviewed-by: Thomas Munro <[email protected]>
Discussion: https://postgr.es/m/CAAKRu_bmx33jTqATP5GKNFYwAg02a9dDtk4U_ciEjgBHZSVkOQ%40mail.gmail.com

Describe special values in GUC descriptions more consistently.

Many GUCs accept special values like -1 or an empty string to
disable the feature, use a system default, etc.  While the
documentation consistently lists these special values, the GUC
descriptions do not.  Many such descriptions fail to mention the
special values, and those that do vary in phrasing and placement.
This commit aims to bring some consistency to this area by applying
the following rules:

* Special values should be listed at the end of the long
  description.
* Descriptions should use numerals (e.g., "0") instead of words
  (e.g., "zero").
* Special value mentions should be concise and direct (e.g., "0
  disables the timeout.", "An empty string means use the operating
  system setting.").
* Multiple special values should be listed in ascending order.

Of course, there are exceptions, such as
max_pred_locks_per_relation and search_path, whose special values
are too complex to include.  And there are cases like
listen_addresses, where the meaning of an empty string is arguably
too obvious to include.  In those cases, I've refrained from adding
special value information to the GUC description.

Reviewed-by: Peter Smith <[email protected]>
Reviewed-by: "David G. Johnston" <[email protected]>
Reviewed-by: Daniel Gustafsson <[email protected]>
Discussion: https://postgr.es/m/Z6aIy4aywxUZHAo6%40nathan