Andres Freund [Tue, 7 Apr 2020 04:28:55 +0000 (21:28 -0700)]
WIP: pgbench
Andres Freund [Tue, 7 Apr 2020 04:28:55 +0000 (21:28 -0700)]
wip: commit sequence number based snapshot caching
Andres Freund [Tue, 7 Apr 2020 04:28:55 +0000 (21:28 -0700)]
Remove now unused PGXACT.
Andres Freund [Tue, 7 Apr 2020 04:28:55 +0000 (21:28 -0700)]
Improve GetSnapshotData() performance by avoiding indirection for subxid access.
Andres Freund [Tue, 7 Apr 2020 04:28:55 +0000 (21:28 -0700)]
Improve GetSnapshotData() performance by avoiding indirection for vacuumFlags access.
Andres Freund [Tue, 7 Apr 2020 04:28:55 +0000 (21:28 -0700)]
Improve GetSnapshotData() performance by avoiding indirection for xid access.
Andres Freund [Tue, 7 Apr 2020 04:28:55 +0000 (21:28 -0700)]
Move PGXACT->xmin back to PGPROC.
Now that xmin isn't needed for GetSnapshotData() anymore, it just
leads to unnecessary cacheline conflicts.
Andres Freund [Tue, 7 Apr 2020 04:28:55 +0000 (21:28 -0700)]
Change the way backends perform tuple-is-invisible-to-everyone tests.
Instead of using RecentGlobal[Data]Xmin the tests are now done via
InvisibleToEveryone* APIs.
A following commit will take advantage of that to make GetSnapshotData()
more scalable.
Note: This contains a workaround in heap_page_prune_opt() to keep the
snapshot_too_old tests working. While that workaround is ugly, the
tests currently are not meaningful, and it seems best to address them
separately.
Andres Freund [Tue, 7 Apr 2020 04:28:55 +0000 (21:28 -0700)]
Move delayChkpt from PGXACT to PGPROC; it's rarely checked & frequently modified.
The goal of PGXACT is to make foreign accesses faster. Keeping rarely
accessed & frequently modified fields in it reduces the cache hit ratio
for other CPU cores.
Andres Freund [Tue, 7 Apr 2020 04:28:55 +0000 (21:28 -0700)]
Improve and extend asserts for a snapshot being set.
Andres Freund [Tue, 7 Apr 2020 04:28:55 +0000 (21:28 -0700)]
TMP: work around missing snapshot registrations.
This is just what's hit by the tests. It's not an actual fix.
Andres Freund [Tue, 7 Apr 2020 04:28:55 +0000 (21:28 -0700)]
Fix xlogreader fd leak encountered with twophase commit.
This perhaps is not the best fix, but it's better than the current
situation of failing after a few commits.
This issue appeared after 0dc8ead46, but only because before that
change fd leakage was limited to a single file descriptor.
Discussion: https://postgr.es/m/20200406025651[email protected]
Tom Lane [Tue, 7 Apr 2020 02:22:13 +0000 (22:22 -0400)]
Fix representation of SORT_TYPE_STILL_IN_PROGRESS.
It turns out that the code did indeed rely on a zeroed
TuplesortInstrumentation.sortMethod field to indicate
"this worker never did anything", although it seems the
issue only comes up during certain race-condition-y cases.
Hence, rearrange the TuplesortMethod enum to restore
SORT_TYPE_STILL_IN_PROGRESS to having the value zero,
and add some comments reinforcing that that isn't optional.
Also future-proof a loop over the possible values of the enum.
sizeof(bits32) happened to be the correct limit value,
but only by purest coincidence.
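A rough sketch of why the zero value matters, written in Python rather than the C of the actual code. Apart from SORT_TYPE_STILL_IN_PROGRESS, the member names and the NUM_SORT_METHODS constant are illustrative assumptions, not quoted from the patch. The point is that zero-filled memory must decode as "still in progress", and that the loop bound should come from the number of methods, not from an unrelated sizeof.

```python
from enum import IntFlag


class SortMethod(IntFlag):
    # Must stay zero: a worker that never ran leaves zeroed memory behind.
    STILL_IN_PROGRESS = 0
    # Illustrative bit flags (names are assumptions for this sketch).
    TOP_N_HEAPSORT = 1 << 0
    QUICKSORT = 1 << 1
    EXTERNAL_SORT = 1 << 2
    EXTERNAL_MERGE = 1 << 3


# Explicit count of the non-zero methods; used as the loop limit instead of
# a value that only matches by coincidence.
NUM_SORT_METHODS = 4


def methods_used(mask):
    """Return the names of all sort methods whose bit is set in mask."""
    return [
        SortMethod(1 << bit).name
        for bit in range(NUM_SORT_METHODS)
        if mask & (1 << bit)
    ]
```

A zero-initialized field then reads back as "this worker never did anything", and methods_used() of that value is an empty list.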
Per buildfarm and local investigation.
Discussion: https://postgr.es/m/12222.1586223974@sss.pgh.pa.us
Thomas Munro [Mon, 6 Apr 2020 23:33:56 +0000 (11:33 +1200)]
Introduce xid8-based functions to replace txid_XXX.
The txid_XXX family of fmgr functions exposes 64 bit transaction IDs to
users as int8. Now that we have an SQL type xid8 for FullTransactionId,
define a new set of functions including pg_current_xact_id() and
pg_current_snapshot() based on that. Keep the old functions around too,
for now.
It's a bit sneaky to use the same C functions for both, but since the
binary representation is identical except for the signedness of the
type, and since older functions are the ones using the wrong signedness,
and since we'll presumably drop the older ones after a reasonable period
of time, it seems reasonable to switch to FullTransactionId internally
and share the code for both.
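To illustrate the signedness point only, here is a small Python sketch (not the server's C code). The helper names are hypothetical. It shows that one 64-bit pattern can be read either as a signed int8 or as an unsigned 64-bit value, which is why sharing one implementation for both function families is plausible.

```python
import struct


def as_signed_int8(full_xid):
    """Reinterpret an unsigned 64-bit value as a signed 64-bit integer."""
    return struct.unpack("<q", struct.pack("<Q", full_xid))[0]


def as_unsigned_xid8(value):
    """Reinterpret a signed 64-bit integer as an unsigned 64-bit value."""
    return struct.unpack("<Q", struct.pack("<q", value))[0]
```

Values below 2^63 read the same under both interpretations; only values with the top bit set differ, and those would show up as negative numbers through the int8-based functions.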
Reviewed-by: Fujii Masao <[email protected]>
Reviewed-by: Takao Fujii <[email protected]>
Reviewed-by: Yoshikazu Imai <[email protected]>
Reviewed-by: Mark Dilger <[email protected]>
Discussion: https://postgr.es/m/20190725000636.666m5mad25wfbrri%40alap3.anarazel.de
Thomas Munro [Mon, 6 Apr 2020 23:08:14 +0000 (11:08 +1200)]
Add SQL type xid8 to expose FullTransactionId to users.
Similar to xid, but 64 bits wide. This new type is suitable for use in
various system views and administration functions.
Reviewed-by: Fujii Masao <[email protected]>
Reviewed-by: Takao Fujii <[email protected]>
Reviewed-by: Yoshikazu Imai <[email protected]>
Reviewed-by: Mark Dilger <[email protected]>
Discussion: https://postgr.es/m/20190725000636.666m5mad25wfbrri%40alap3.anarazel.de
Tomas Vondra [Mon, 6 Apr 2020 23:16:57 +0000 (01:16 +0200)]
Use INT64_FORMAT when formatting int64 values in explain
Per report from lapwing.
Tomas Vondra [Mon, 6 Apr 2020 21:58:10 +0000 (23:58 +0200)]
Fix failures in incremental_sort due to number of workers
The last test in incremental_sort suite prints a parallel plan, but some
of the buildfarm animals have custom max_parallel_workers_per_gather
values, causing failures. Fixed by setting the GUC to an explicit value.
Discussion: https://postgr.es/m/CAPpHfds1waRZ=NOmueYq0sx1ZSCnt+5QJvizT8ndT2=etZEeAQ@mail.gmail.com
Peter Geoghegan [Mon, 6 Apr 2020 21:46:33 +0000 (14:46 -0700)]
Fix nbtree kill_prior_tuple posting list assert.
An assertion added by commit 0d861bbb checked that _bt_killitems() only
processes a BTScanPosItem whose heap TID is contained in a posting list
tuple when its page offset number still matches what is on the page
(i.e. when it matches the posting list tuple's current offset number).
This was only correct in the common case where the page can't have
changed since we first read it. It was not correct in cases where we
don't drop the buffer pin (and don't need to verify the page hasn't
changed using its LSN). The latter category includes scans involving
unlogged tables, and scans that use a non-MVCC snapshot, per the logic
originally introduced by commit 2ed5b87f.
The assertion still seems helpful. Fix it by taking cases where the
page may have been concurrently modified into account.
Reported-By: Anastasia Lubennikova, Alexander Lakhin
Discussion: https://postgr.es/m/c4e38e9a-0f9c-8e53-e639-adf343f94472@postgrespro.ru
Tomas Vondra [Mon, 6 Apr 2020 21:19:13 +0000 (23:19 +0200)]
Fix show_incremental_sort_info with force_parallel_mode
When executed with force_parallel_mode=regress, the function was exiting
too early and thus failed to print the worker stats. Fixed by making it
more like show_sort_info.
Discussion: https://postgr.es/m/CAPpHfds1waRZ=NOmueYq0sx1ZSCnt+5QJvizT8ndT2=etZEeAQ@mail.gmail.com
Tomas Vondra [Mon, 6 Apr 2020 19:33:28 +0000 (21:33 +0200)]
Implement Incremental Sort
Incremental Sort is an optimized variant of multikey sort for cases when
the input is already sorted by a prefix of the requested sort keys. For
example when the relation is already sorted by (key1, key2) and we need
to sort it by (key1, key2, key3) we can simply split the input rows into
groups having equal values in (key1, key2), and only sort/compare the
remaining column key3.
This has a number of benefits:
- Reduced memory consumption, because only a single group (determined by
values in the sorted prefix) needs to be kept in memory. This may also
eliminate the need to spill to disk.
- Lower startup cost, because Incremental Sort produces results after each
prefix group, which is beneficial for plans where startup cost matters
(for example, queries with a LIMIT clause).
We consider both Sort and Incremental Sort, and decide based on costing.
The implemented algorithm operates in two different modes:
- Fetching a minimum number of tuples without checking equality on the
prefix keys, and sorting on all columns when safe.
- Fetching all tuples for a single prefix group and then sorting by
comparing only the remaining (non-prefix) keys.
We always start in the first mode, and employ a heuristic to switch into
the second mode if we believe it's beneficial - the goal is to minimize
the number of unnecessary comparisons while keeping memory consumption
below work_mem.
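As a simplified illustration of the second mode only (collect one prefix group, then sort it by the remaining keys), here is a Python sketch. It is not the executor code: the function name and the tuple-based row layout are assumptions, and it models neither the first mode, the switching heuristic, nor work_mem.

```python
from itertools import groupby


def incremental_sort(rows, prefix_len):
    """Yield rows fully sorted, given input already sorted by the first
    prefix_len columns. Each prefix group is sorted on its own, so output
    can start before the whole input has been read."""
    for _prefix, group in groupby(rows, key=lambda row: row[:prefix_len]):
        # Only the non-prefix columns need comparing inside a group.
        yield from sorted(group, key=lambda row: row[prefix_len:])
```

Because this is a generator, a consumer that needs only the first few rows (as with LIMIT) stops after the first group or two, which mirrors the lower startup cost described above.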
This is a very old patch series. The idea was originally proposed by
Alexander Korotkov back in 2013, and then revived in 2017. In 2018 the
patch was taken over by James Coleman, who wrote and rewrote most of the
current code.
There were many reviewers/contributors since 2013 - I've done my best to
pick the most active ones, and listed them in this commit message.
Author: James Coleman, Alexander Korotkov
Reviewed-by: Tomas Vondra, Andreas Karlsson, Marti Raudsepp, Peter Geoghegan, Robert Haas, Thomas Munro, Antonin Houska, Andres Freund, Alexander Kuzmenkov
Discussion: https://postgr.es/m/CAPpHfdscOX5an71nHd8WSUH6GNOCf=V7wgDaTXdDd9=goN-gfA@mail.gmail.com
Discussion: https://postgr.es/m/CAPpHfds1waRZ=NOmueYq0sx1ZSCnt+5QJvizT8ndT2=etZEeAQ@mail.gmail.com
Tom Lane [Mon, 6 Apr 2020 16:00:37 +0000 (12:00 -0400)]
Re-stabilize infinite_recurse() test case.
Since commit 8f59f6b9c0, CLOBBER_CACHE_ALWAYS buildfarm members have
been failing this test case because the error message now sometimes
includes an error cursor position. It seems largely just luck that
that never happened before, and there are likely to be more ways it
could happen in future. Hence, rather than trying to prevent it,
adjust the test script to suppress that component of the report.
At some point we might need to back-patch this, but refrain until
there's a demonstrated need. (We'd need a different fix before v12,
anyway, since VERBOSITY=sqlstate is a recent thing.)
Tom Lane and Andres Freund
Discussion: https://postgr.es/m/30675.1586111599@sss.pgh.pa.us
Peter Eisentraut [Mon, 6 Apr 2020 13:15:52 +0000 (15:15 +0200)]
Add logical replication support to replicate into partitioned tables
Mainly, this adds support code in logical/worker.c for applying
replicated operations whose target is a partitioned table to its
relevant partitions.
Author: Amit Langote <[email protected]>
Reviewed-by: Rafia Sabih <[email protected]>
Reviewed-by: Peter Eisentraut <[email protected]>
Reviewed-by: Petr Jelinek <[email protected]>
Discussion: https://www.postgresql.org/message-id/flat/CA+HiwqH=Y85vRK3mOdjEkqFK+E=ST=eQiHdpj43L=_eJMOOznQ@mail.gmail.com
Amit Kapila [Mon, 6 Apr 2020 10:54:51 +0000 (16:24 +0530)]
Allow autovacuum to log WAL usage statistics.
This commit allows autovacuum to log WAL usage statistics added by commit
df3b181499.
Author: Julien Rouhaud
Reviewed-by: Dilip Kumar and Amit Kapila
Discussion: https://postgr.es/m/CAB-hujrP8ZfUkvL5OYETipQwA=e3n7oqHFU=4ZLxWS_Cza3kQQ@mail.gmail.com
Michael Paquier [Mon, 6 Apr 2020 02:44:23 +0000 (11:44 +0900)]
Refactor cluster.c to use new routine get_index_isclustered()
This new cache lookup routine has been introduced in a40caf5, and more
code paths can directly use it.
Note that in cluster_rel(), the code was returning immediately if the
tuple's entry in pg_index for the clustered index was not valid. This
commit changes the code so that a lookup error is raised instead,
something that could not happen from the start as we check for the
existence of the index beforehand, while holding an exclusive lock on
the parent table.
Author: Justin Pryzby
Reviewed-by: Álvaro Herrera, Michael Paquier
Discussion: https://postgr.es/m/20200202161718[email protected]
Amit Kapila [Mon, 6 Apr 2020 02:32:15 +0000 (08:02 +0530)]
Add the option to report WAL usage in EXPLAIN and auto_explain.
This commit adds a new option WAL similar to the existing option BUFFERS in the
EXPLAIN command. This option allows including information on WAL record
generation added by commit df3b181499 in EXPLAIN output.
This also allows the WAL usage information to be displayed via
the auto_explain module. A new parameter auto_explain.log_wal controls
whether WAL usage statistics are printed when an execution plan is logged.
This parameter has no effect unless auto_explain.log_analyze is enabled.
Author: Julien Rouhaud
Reviewed-by: Dilip Kumar and Amit Kapila
Discussion: https://postgr.es/m/CAB-hujrP8ZfUkvL5OYETipQwA=e3n7oqHFU=4ZLxWS_Cza3kQQ@mail.gmail.com
Michael Paquier [Mon, 6 Apr 2020 02:03:49 +0000 (11:03 +0900)]
Preserve clustered index after rewrites with ALTER TABLE
A table rewritten by ALTER TABLE would lose tracking of an index usable
for CLUSTER. This setting is tracked by pg_index.indisclustered and is
controlled by ALTER TABLE, so some extra work was needed to restore it
properly. Note that ALTER TABLE only marks the index that can be used
for clustering, and does not do the actual operation.
Author: Amit Langote, Justin Pryzby
Reviewed-by: Ibrar Ahmed, Michael Paquier
Discussion: https://postgr.es/m/20200202161718[email protected]
Backpatch-through: 9.5
Andres Freund [Mon, 6 Apr 2020 01:23:30 +0000 (18:23 -0700)]
Recompute stack base in forked postmaster children.
This is for the benefit of running postgres under the rr
debugger. When using rr, signal handlers running while a syscall is
active use an alternative stack. As e.g. bgworkers are started from
within signal handlers, the forked backend then has a different stack
base than postmaster. Previously that led to those
processes triggering spurious "stack depth limit exceeded" errors.
Discussion: https://postgr.es/m/20200327182217[email protected]
Andres Freund [Mon, 6 Apr 2020 00:47:30 +0000 (17:47 -0700)]
Use TransactionXmin instead of RecentGlobalXmin in heap_abort_speculative().
There's a very low risk that RecentGlobalXmin could be far enough in
the past to be older than relfrozenxid, or even wrapped
around. Luckily the consequences of that having happened wouldn't be
too bad - the page wouldn't be pruned for a while.
Avoid that risk by using TransactionXmin instead. As that's announced
via MyPgXact->xmin, it is protected against wrapping around (see code
comments for details around relfrozenxid).
Author: Andres Freund
Discussion: https://postgr.es/m/20200328213023[email protected]
Backpatch: 9.5-
Andres Freund [Sun, 5 Apr 2020 19:03:09 +0000 (12:03 -0700)]
Fix recently introduced typo.
Reported-By: David Rowley
Peter Eisentraut [Sun, 5 Apr 2020 08:02:00 +0000 (10:02 +0200)]
Save errno across LWLockRelease() calls
Fixup for "Drop slot's LWLock before returning from SaveSlotToPath()"
Reported-by: Michael Paquier <[email protected]>
Tom Lane [Sun, 5 Apr 2020 04:53:25 +0000 (00:53 -0400)]
Further improve stability fix for partition_aggregate test.
Commit 7cb0a423f overlooked that the multi-level partition test table
pagg_tab_ml still had an exactly even row split at its upper level of
partitioning, so that some of the sub-aggregation plan steps still had
exactly equal costs, leading to plan instability. Tweak the partition
boundaries some more to make the row distribution unequal at both
levels. This leads to more changes in the "expected" plan order than
the previous round, but it seems fine. (Actually, I'm surprised that
this didn't affect even more plans in this test: looking at the
underlying costs shows that some of the parallel plan groups are
*not* getting sorted by cost. Bug?)
Per buildfarm member lousyjack,
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=lousyjack&dt=2020-04-04%2021%3A03%3A04
Discussion: https://postgr.es/m/24467.1585838693@sss.pgh.pa.us
Amit Kapila [Sun, 5 Apr 2020 02:04:04 +0000 (07:34 +0530)]
Allow pg_stat_statements to track WAL usage statistics.
This commit adds three new columns in pg_stat_statements output to
display WAL usage statistics added by commit df3b181499.
This commit doesn't bump the version of pg_stat_statements as the
same is done for this release in commit 17e0328224.
Author: Kirill Bychik and Julien Rouhaud
Reviewed-by: Julien Rouhaud, Fujii Masao, Dilip Kumar and Amit Kapila
Discussion: https://postgr.es/m/CAB-hujrP8ZfUkvL5OYETipQwA=e3n7oqHFU=4ZLxWS_Cza3kQQ@mail.gmail.com
Noah Misch [Sat, 4 Apr 2020 22:45:45 +0000 (15:45 -0700)]
Add perl2host call missing from a new test file.
Oversight in today's commit c6b92041d38512a4176ed76ad06f713d2e6c01a8.
Per buildfarm member jacana.
Discussion: http://postgr.es/m/20200404223212[email protected]
Tom Lane [Sat, 4 Apr 2020 22:03:30 +0000 (18:03 -0400)]
Remove bogus Assert, add some regression test cases showing why.
Commit 77ec5affb added an assertion to enforce_generic_type_consistency
that boils down to "if the function result is polymorphic, there must be
at least one polymorphic argument". This should be true for user-created
functions, but there are built-in functions for which it's not true, as
pointed out by Jaime Casanova. Hence, go back to the old behavior of
leaving the return type alone. There's only a limited amount of stuff
you can do with such a function result, but it does work to some extent;
add some regression test cases to ensure we don't break that again.
Discussion: https://postgr.es/m/CAJGNTeMbhtsCUZgJJ8h8XxAJbK7U2ipsX8wkHRtZRz-NieT8RA@mail.gmail.com
Noah Misch [Sat, 4 Apr 2020 19:25:34 +0000 (12:25 -0700)]
Skip WAL for new relfilenodes, under wal_level=minimal.
Until now, only selected bulk operations (e.g. COPY) did this. If a
given relfilenode received both a WAL-skipping COPY and a WAL-logged
operation (e.g. INSERT), recovery could lose tuples from the COPY. See
src/backend/access/transam/README section "Skipping WAL for New
RelFileNode" for the new coding rules. Maintainers of table access
methods should examine that section.
To maintain data durability, just before commit, we choose between an
fsync of the relfilenode and copying its contents to WAL. A new GUC,
wal_skip_threshold, guides that choice. If this change slows a workload
that creates small, permanent relfilenodes under wal_level=minimal, try
adjusting wal_skip_threshold. Users setting a timeout on COMMIT may
need to adjust that timeout, and log_min_duration_statement analysis
will reflect time consumption moving to COMMIT from commands like COPY.
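The commit-time choice might be pictured roughly as follows. This is a Python sketch, not the server logic: the function name is hypothetical, and both the comparison direction (small relfilenodes are copied into WAL, larger ones are fsync'ed) and the 2048 kB default are assumptions that the text above does not spell out.

```python
def choose_commit_action(relfile_size_kb, wal_skip_threshold_kb=2048):
    """Pick how to make a WAL-skipped relfilenode durable at commit.

    Assumption: below the threshold, copying the contents into WAL is
    cheaper than an fsync; at or above it, fsync the file instead.
    """
    if relfile_size_kb < wal_skip_threshold_kb:
        return "copy-to-wal"
    return "fsync"
```

Under that assumption, a workload creating many small permanent relations would shift between the two strategies as the threshold is adjusted.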
Internally, this requires a reliable determination of whether
RollbackAndReleaseCurrentSubTransaction() would unlink a relation's
current relfilenode. Introduce rd_firstRelfilenodeSubid. Amend the
specification of rd_createSubid such that the field is zero when a new
rel has an old rd_node. Make relcache.c retain entries for certain
dropped relations until end of transaction.
Bump XLOG_PAGE_MAGIC, since this introduces XLOG_GIST_ASSIGN_LSN.
Future servers accept older WAL, so this bump is discretionary.
Kyotaro Horiguchi, reviewed (in earlier, similar versions) by Robert
Haas. Heikki Linnakangas and Michael Paquier implemented earlier
designs that materially clarified the problem. Reviewed, in earlier
designs, by Andrew Dunstan, Andres Freund, Alvaro Herrera, Tom Lane,
Fujii Masao, and Simon Riggs. Reported by Martijn van Oosterhout.
Discussion: https://postgr.es/m/20150702220524[email protected]
Peter Eisentraut [Sat, 4 Apr 2020 07:08:12 +0000 (09:08 +0200)]
Revert "Improve handling of parameter differences in physical replication"
This reverts commit 246f136e76ecd26844840f2b2057e2c87ec9868d.
That patch wasn't quite complete enough.
Discussion: https://www.postgresql.org/message-id/flat/E1jIpJu-0007Ql-CL%40gemulon.postgresql.org
Amit Kapila [Sat, 4 Apr 2020 04:32:08 +0000 (10:02 +0530)]
Add infrastructure to track WAL usage.
This allows gathering the WAL generation statistics for each statement
execution. The three statistics that we collect are the number of WAL
records, the number of full page writes and the amount of WAL bytes
generated.
This helps users who have write-intensive workloads to see the impact
of I/O due to WAL. This further enables us to see approximately what
percentage of overall WAL is due to full page writes.
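One way such an approximation could be computed from the three counters is sketched below in Python. The function and parameter names are hypothetical, and the estimate assumes each full page write contributes about one 8192-byte block, which overstates the share when page images are compressed or have their unused space removed.

```python
def approx_fpw_share(full_page_writes, wal_bytes, block_size=8192):
    """Estimate the fraction of WAL bytes attributable to full page writes.

    Assumption: every full page write is roughly block_size bytes.
    The result is capped at 1.0 because the estimate can overshoot.
    """
    if wal_bytes <= 0:
        return 0.0
    return min(1.0, full_page_writes * block_size / wal_bytes)
```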
In the future, we can extend this functionality to allow us to compute
the exact amount of WAL data due to full page writes.
This patch in itself is just an infrastructure to compute WAL usage data.
The upcoming patches will expose this data via explain, auto_explain,
pg_stat_statements and verbose (auto)vacuum output.
Author: Kirill Bychik, Julien Rouhaud
Reviewed-by: Dilip Kumar, Fujii Masao and Amit Kapila
Discussion: https://postgr.es/m/CAB-hujrP8ZfUkvL5OYETipQwA=e3n7oqHFU=4ZLxWS_Cza3kQQ@mail.gmail.com
Jeff Davis [Sat, 4 Apr 2020 02:52:16 +0000 (19:52 -0700)]
Include chunk overhead in hash table entry size estimate.
Don't try to be precise about it, just use a constant 16 bytes of
chunk overhead. Being smarter would require knowing the memory context
where the chunk will be allocated, which is not known by all callers.
Discussion: https://postgr.es/m/20200325220936[email protected]
Robert Haas [Sat, 4 Apr 2020 02:28:37 +0000 (22:28 -0400)]
Fix resource management bug with replication=database.
Commit 0d8c9c1210c44b36ec2efcb223a1dfbe897a3661 allowed BASE_BACKUP to
acquire a ResourceOwner without a transaction so that the backup
manifest functionality could use a BufFile, but it overlooked the fact
that when a walsender is used with replication=database, it might have
a transaction in progress, because in that mode, SQL and replication
commands can be mixed. Try to fix things up so that the two cleanup
mechanisms don't conflict.
Per buildfarm member serinus, which triggered the problem when
CREATE_REPLICATION_SLOT failed from inside a transaction. It passed
on the subsequent run, so evidently the failure doesn't happen every
time.
Robert Haas [Sat, 4 Apr 2020 00:15:27 +0000 (20:15 -0400)]
Be more careful about time_t vs. pg_time_t in basebackup.c.
lapwing is complaining about a call to pg_gmtime, saying that
it "expected 'const pg_time_t *' but argument is of type 'time_t *'".
I at first thought that the problem had something to do with const,
but Thomas Munro suggested that it might be just because time_t
and pg_time_t are different identifiers. lapwing is i686 rather than
x86_64, and pg_time_t is always int64, so that seems like a good
guess.
There is other code that just casts time_t to pg_time_t without
any conversion function, so try that approach here.
Introduced in commit 0d8c9c1210c44b36ec2efcb223a1dfbe897a3661.
Robert Haas [Fri, 3 Apr 2020 23:51:18 +0000 (19:51 -0400)]
pg_validatebackup: Fix 'make clean' to remove tmp_check.
Report by Tom Lane.
Discussion: http://postgr.es/m/22394.1585951968@sss.pgh.pa.us
Robert Haas [Fri, 3 Apr 2020 23:01:59 +0000 (19:01 -0400)]
pg_validatebackup: Adjust TAP tests to undo permissions change.
It may be necessary to go further and remove this test altogether,
but I'm going to try this fix first. It's not clear, at least to
me, exactly how this is breaking buildfarm members, but it appears
to be doing so.
Robert Haas [Fri, 3 Apr 2020 21:16:31 +0000 (17:16 -0400)]
pg_validatebackup: Also use perl2host in TAP tests.
Second try at getting the buildfarm to be happy with 003_corruption.pl
as added by commit 0d8c9c1210c44b36ec2efcb223a1dfbe897a3661.
Per suggestion from Álvaro Herrera.
Discussion: http://postgr.es/m/20200403205412[email protected]
Tom Lane [Fri, 3 Apr 2020 21:00:25 +0000 (17:00 -0400)]
Cosmetic improvements for code related to partitionwise join.
Move have_partkey_equi_join and match_expr_to_partition_keys to
relnode.c, since they're used only there. Refactor
build_joinrel_partition_info to split out the code that fills the
joinrel's partition key lists; this doesn't have any non-cosmetic
impact, but it seems like a useful separation of concerns.
Improve assorted nearby comments.
Amit Langote, with a little further editorialization by me
Discussion: https://postgr.es/m/CA+HiwqG2WVUGmLJqtR0tPFhniO=H=9qQ+Z3L_ZC+Y3-EVQHFGg@mail.gmail.com
Robert Haas [Fri, 3 Apr 2020 19:40:35 +0000 (15:40 -0400)]
pg_validatebackup: Use tempdir_short in TAP tests.
The buildfarm is very unhappy right now because TAP test
003_corruption.pl uses TestLib::tempdir to generate the name of
a temporary directory that is used as a tablespace name, and
this results in a 'symbolic link target too long' error message
on many of the buildfarm machines, but not on my machine.
It appears that other people have run into similar problems in
the past and that TestLib::tempdir_short was the solution, so
let's try using that instead.
Robert Haas [Fri, 3 Apr 2020 19:28:59 +0000 (15:28 -0400)]
pg_validatebackup: Adjust TAP tests to placate perlcritic.
It seems that we have a policy that every Perl subroutine should
end with an explicit "return", so add explicit "return"
statements to all the new subroutines added by my prior
commit 0d8c9c1210c44b36ec2efcb223a1dfbe897a3661.
Per buildfarm.
Robert Haas [Fri, 3 Apr 2020 18:59:47 +0000 (14:59 -0400)]
Generate backup manifests for base backups, and validate them.
A manifest is a JSON document which includes (1) the file name, size,
last modification time, and an optional checksum for each file backed
up, (2) timelines and LSNs for whatever WAL will need to be replayed
to make the backup consistent, and (3) a checksum for the manifest
itself. By default, we use CRC-32C when checksumming data files,
because we are trying to detect corruption and user error, not foil an
adversary. However, pg_basebackup and the server-side BASE_BACKUP
command now have options to select a different algorithm, so users
wanting a cryptographic hash function can select SHA-224, SHA-256,
SHA-384, or SHA-512. Users not wanting file checksums at all can
disable them, or disable generating of the backup manifest altogether.
Using a cryptographic hash function in place of CRC-32C consumes
significantly more CPU cycles, which may slow down backups in some
cases.
A new tool called pg_validatebackup can validate a backup against the
manifest. If no checksums are present, it can still check that the
right files exist and that they have the expected sizes. If checksums
are present, it can also verify that each file has the expected
checksum. Additionally, it calls pg_waldump to verify that the
expected WAL files are present and parseable. Only plain format
backups can be validated directly, but tar format backups can be
validated after extracting them.
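The validation idea can be sketched in a few lines of Python. This is not pg_validatebackup and not the real manifest format: the dictionary layout, key names, and function names are hypothetical, and SHA-256 from the standard library stands in for the checksum because CRC-32C is not available there. The WAL check done via pg_waldump is not modeled.

```python
import hashlib
import os


def build_manifest(root):
    """Record size and SHA-256 for every file under root (illustrative layout)."""
    entries = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            with open(path, "rb") as fh:
                data = fh.read()
            entries[os.path.relpath(path, root)] = {
                "size": len(data),
                "sha256": hashlib.sha256(data).hexdigest(),
            }
    return entries


def validate(root, manifest):
    """Return a list of problems: missing files, wrong sizes, bad checksums."""
    problems = []
    for rel, expected in sorted(manifest.items()):
        path = os.path.join(root, rel)
        if not os.path.isfile(path):
            problems.append(f"{rel}: missing")
            continue
        with open(path, "rb") as fh:
            data = fh.read()
        if len(data) != expected["size"]:
            problems.append(f"{rel}: size mismatch")
        elif "sha256" in expected and hashlib.sha256(data).hexdigest() != expected["sha256"]:
            # The checksum is optional: without it only existence and size are checked.
            problems.append(f"{rel}: checksum mismatch")
    return problems
```

As in the description above, an entry without a checksum still allows the existence and size checks; the checksum comparison runs only when one was recorded.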
Robert Haas, with help, ideas, review, and testing from David Steele,
Stephen Frost, Andrew Dunstan, Rushabh Lathia, Suraj Kharage, Tushar
Ahuja, Rajkumar Raghuwanshi, Mark Dilger, Davinder Singh, Jeevan
Chalke, Amit Kapila, Andres Freund, and Noah Misch.
Discussion: http://postgr.es/m/CA+TgmoZV8dw1H2bzZ9xkKwdrk8+XYa+DC9H=F7heO2zna5T6qg@mail.gmail.com
Fujii Masao [Fri, 3 Apr 2020 18:13:17 +0000 (03:13 +0900)]
Include information on buffer usage during planning phase, in EXPLAIN output, take two.
When the BUFFERS option is enabled, the EXPLAIN command includes information
on buffer usage for each plan node in its output. In addition,
this commit makes EXPLAIN also include information on
buffer usage during the planning phase. This makes it
easier to discern cases where lots of buffer accesses happen during
planning.
This commit revives the original commit ed7a509571 that was reverted by
commit 19db23bcbd. The original commit had to be reverted because
it caused a regression test failure on the buildfarm members prion and
dory. But since commit c0885c4c30 got rid of the cause of the test failure,
the original commit can be safely introduced again.
Author: Julien Rouhaud, slightly revised by Fujii Masao
Reviewed-by: Justin Pryzby
Discussion: https://postgr.es/m/16109-26a1a88651e90608@postgresql.org
Tom Lane [Fri, 3 Apr 2020 17:15:30 +0000 (13:15 -0400)]
Fix bugs in gin_fuzzy_search_limit processing.
entryGetItem()'s three code paths each contained bugs associated
with filtering the entries for gin_fuzzy_search_limit.
The posting-tree path failed to advance "advancePast" after having
decided to filter an item. If we ran out of items on the current
page and needed to advance to the next, what would actually happen
is that entryLoadMoreItems() would re-load the same page. Eventually,
the random dropItem() test would accept one of the same items it'd
previously rejected, and we'd move on --- but it could take a while
with small gin_fuzzy_search_limit. To add insult to injury, this
case would inevitably cause entryLoadMoreItems() to decide it needed
to re-descend from the root, making things even slower.
The posting-list path failed to implement gin_fuzzy_search_limit
filtering at all, so that all entries in the posting list would
be returned.
The bitmap-result path used a "gotitem" variable that it failed to
update in the one place where it'd actually make a difference, ie
at the one "continue" statement. I think this was unreachable in
practice, because if we'd looped around then it shouldn't be the
case that the entries on the new page are before advancePast.
Still, the "gotitem" variable was contributing nothing to either
clarity or correctness, so get rid of it.
Refactor all three loops so that the termination conditions are
more alike and less unreadable.
The code coverage report showed that we had no coverage at all for
the re-descend-from-root code path in entryLoadMoreItems(), which
seems like a very bad thing, so add a test case that exercises it.
We also had exactly no coverage for gin_fuzzy_search_limit, so add a
simplistic test case that at least hits those code paths a little bit.
Back-patch to all supported branches.
Adé Heyward and Tom Lane
Discussion: https://postgr.es/m/CAEknJCdS-dE1Heddptm7ay2xTbSeADbkaQ8bU2AXRCVC2LdtKQ@mail.gmail.com
Fujii Masao [Fri, 3 Apr 2020 16:25:44 +0000 (01:25 +0900)]
Improve stability of explain regression test.
The explain regression test runs EXPLAIN commands via the function
that filters unstable outputs. To produce more stable test output,
this commit improves the function so that it also filters out text-mode
Buffers lines. This is necessary because text-mode Buffers lines vary
depending on the system state.
This improvement will get rid of the regression test failure that
commit ed7a509571 caused on the buildfarm members prion and
dory because of the instability of Buffers lines.
Author: Fujii Masao
Reviewed-by: Tom Lane
Discussion: https://postgr.es/m/20200403025751[email protected]
Alvaro Herrera [Fri, 3 Apr 2020 16:23:20 +0000 (13:23 -0300)]
Add a glossary to the documentation
More work is still needed, but this is a good start.
Co-authored-by: Corey Huinker <[email protected]>
Co-authored-by: Jürgen Purtz <[email protected]>
Co-authored-by: Roger Harkavy <[email protected]>
Co-authored-by: Álvaro Herrera <[email protected]>
Reviewed-by: Justin Pryzby <[email protected]>
Discussion: https://postgr.es/m/CADkLM=eP6HOeqDjn0FdXuGRusQu4oWH_LFsKjjafmhvWD=aSpQ@mail.gmail.com
Robert Haas [Fri, 3 Apr 2020 15:58:58 +0000 (11:58 -0400)]
pg_waldump: Don't call XLogDumpDisplayStats() if -q is specified.
Commit ac44367efbef198c57a18b96dbc6a39191720994 introduced this
problem.
Report and fix by Fujii Masao.
Discussion: http://postgr.es/m/d332b8f0-0c72-3cd6-6945-7a86a503662a@oss.nttdata.com
Robert Haas [Fri, 3 Apr 2020 15:50:38 +0000 (11:50 -0400)]
Add checksum helper functions.
These functions make it easier to write code that wants to compute a
checksum for some data while allowing the user to configure the type
of checksum that gets used.
This is another piece of infrastructure for the upcoming patch to add
backup manifests.
Patch written from scratch by me, but it is similar to previous work
by Rushabh Lathia and Suraj Kharage. Suraj also reviewed this version
off-list. Advice on how not to break Windows from Davinder Singh.
Discussion: http://postgr.es/m/CA+TgmoZV8dw1H2bzZ9xkKwdrk8+XYa+DC9H=F7heO2zna5T6qg@mail.gmail.com
Discussion: http://postgr.es/m/CA+TgmoZRTBiPyvQEwV79PU1ePTtSEo2UeVncrkJMbn1sU1gnRA@mail.gmail.com
Tom Lane [Fri, 3 Apr 2020 15:24:56 +0000 (11:24 -0400)]
Fix bogus CALLED_AS_TRIGGER() defenses.
contrib/lo's lo_manage() thought it could use
trigdata->tg_trigger->tgname in its error message about
not being called as a trigger. That naturally led to a core dump.
unique_key_recheck() figured it could Assert that fcinfo->context
is a TriggerData node in advance of having checked that it's
being called as a trigger. That's harmless in production builds,
and perhaps not that easy to reach in any case, but it's logically
wrong.
The first of these per bug #16340 from William Crowell;
the second from manual inspection of other CALLED_AS_TRIGGER
call sites.
Back-patch the lo.c change to all supported branches, the
other to v10 where the thinko crept in.
Discussion: https://postgr.es/m/16340-591c7449dc7c8c47@postgresql.org
Fujii Masao [Fri, 3 Apr 2020 03:20:42 +0000 (12:20 +0900)]
Revert "Include information on buffer usage during planning phase, in EXPLAIN output."
This reverts commit ed7a5095716ee498ecc406e1b8d5ab92c7662d10.
Per buildfarm member prion.
Fujii Masao [Fri, 3 Apr 2020 03:15:56 +0000 (12:15 +0900)]
Add wait events for recovery conflicts.
This commit introduces new wait events RecoveryConflictSnapshot and
RecoveryConflictTablespace. The former is reported while waiting for
recovery conflict resolution on a vacuum cleanup. The latter is reported
while waiting for recovery conflict resolution on dropping tablespace.
Also this commit changes the code so that the wait event Lock is reported
while waiting in ResolveRecoveryConflictWithVirtualXIDs() for recovery
conflict resolution on a lock. The wait event Lock was already reported
for such waits in general, but not when the wait
happened in ResolveRecoveryConflictWithVirtualXIDs().
Author: Masahiko Sawada
Reviewed-by: Fujii Masao
Discussion: https://postgr.es/m/CA+fd4k4mXWTwfQLS3RPwGr4xnfAEs1ysFfgYHvmmoUgv6Zxvmg@mail.gmail.com
Michael Paquier [Fri, 3 Apr 2020 02:45:15 +0000 (11:45 +0900)]
Add support for \aset in pgbench
This option is similar to \gset, except that it is able to store all
results from combined SQL queries into separate variables. If a query
returns multiple rows, the last result is stored and if a query returns
no rows, nothing is stored.
While on it, add a TAP test for \gset to check for a failure when a
query returns multiple rows.
Author: Fabien Coelho
Reviewed-by: Ibrar Ahmed, Michael Paquier
Discussion: https://postgr.es/m/alpine.DEB.2.21.1904081914200.2529@lancre
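The storage rule described above for \aset can be sketched in Python. This is an illustrative model only, not pgbench code: the function name aset_store, the representation of query results as lists of dicts, and the use of column names (plus an optional prefix) as variable names are assumptions made for the example.

```python
def aset_store(results, prefix=""):
    """Hypothetical model of the \\aset rule described in the commit message."""
    variables = {}
    for rows in results:        # one entry per combined SQL query
        if not rows:            # query returned no rows: nothing is stored
            continue
        last = rows[-1]         # query returned several rows: the last one is stored
        for column, value in last.items():
            variables[prefix + column] = value
    return variables


# Three combined queries: one row, no rows, two rows.
combined = [
    [{"one": 1}],
    [],
    [{"two": 2}, {"two": 3}],
]
print(aset_store(combined))
```

Under these assumptions the second query contributes nothing and the third contributes only its last row.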
Fujii Masao [Fri, 3 Apr 2020 02:27:09 +0000 (11:27 +0900)]
Include information on buffer usage during planning phase, in EXPLAIN output.
When the BUFFERS option is enabled, the EXPLAIN command includes information
on buffer usage for each plan node in its output. In addition to that,
this commit makes the EXPLAIN command also include information on
buffer usage during the planning phase. This feature makes it
easier to discern cases where a lot of buffer access happens during
planning.
Author: Julien Rouhaud, slightly revised by Fujii Masao
Reviewed-by: Justin Pryzby
Discussion: https://postgr.es/m/16109-26a1a88651e90608@postgresql.org
Robert Haas [Fri, 3 Apr 2020 00:25:04 +0000 (20:25 -0400)]
pg_waldump: Add a --quiet option.
The primary motivation for this change is that it will be used by the
upcoming patch to add backup manifests, but it also seems to have some
potential for more general use.
Andres Freund and Robert Haas
Discussion: http://postgr.es/m/20200330020814[email protected]
Tom Lane [Thu, 2 Apr 2020 23:43:48 +0000 (19:43 -0400)]
Improve stability fix for partition_aggregate test.
Instead of disabling autovacuum on these test tables, adjust the
partition boundaries so that the child partitions are not all the
same size. That should cause the planner to use a predictable
ordering of the per-partition scan nodes even in cases where
autovacuum causes the rowcount estimates to be off a bit.
Moreover, this also lets these tests show that the planner does
properly order the tables in descending size order, something
that wasn't being proven before.
The pagg_tab1 and pagg_tab2 partitions are still all the same
size, but that should be fine, because those tables are so small
that (1) autovacuum won't fire on them, and (2) even if it did,
it couldn't change the reltuples value --- with only one page,
it can't see just part of the relation.
Discussion: https://postgr.es/m/24467.1585838693@sss.pgh.pa.us
Bruce Momjian [Thu, 2 Apr 2020 21:42:09 +0000 (17:42 -0400)]
doc: remove unnecessary INNER keyword
A join that was added in commit 9b2009c4cf did not use the INNER
keyword, but the existing query used it. It was cleaner to remove the
existing INNER keyword.
Reported-by: Peter Eisentraut
Discussion: https://postgr.es/m/a1ffbfda-59d2-5732-e5fb-3df8582b6434@2ndquadrant.com
Backpatch-through: 9.5
Bruce Momjian [Thu, 2 Apr 2020 21:27:43 +0000 (17:27 -0400)]
doc: remove comma, related to commit 92d31085e9
Reported-by: Peter Eisentraut
Discussion: https://postgr.es/m/750b8832-d123-7f9b-931e-43ce8321b2d7@2ndquadrant.com
Backpatch-through: 9.5
Tom Lane [Thu, 2 Apr 2020 19:04:51 +0000 (15:04 -0400)]
Improve user control over truncation of logged bind-parameter values.
This patch replaces the boolean GUC log_parameters_on_error introduced
by commit ba79cb5dc with an integer log_parameter_max_length_on_error,
adding the ability to specify how many bytes to trim each logged
parameter value to. (The previous coding hard-wired that choice at
64 bytes.)
In addition, add a new parameter log_parameter_max_length that provides
similar control over truncation of query parameters that are logged in
response to statement-logging options, as opposed to errors. Previous
releases always logged such parameters in full, possibly causing log
bloat.
For backwards compatibility with prior releases,
log_parameter_max_length defaults to -1 (log in full), while
log_parameter_max_length_on_error defaults to 0 (no logging).
Per discussion, log_parameter_max_length is SUSET since the DBA should
control routine logging behavior, but log_parameter_max_length_on_error
is USERSET because it also affects errcontext data sent back to the
client.
Alexey Bashtanov, editorialized a little by me
Discussion: https://postgr.es/m/b10493cc-a399-a03a-67c7-068f2791ee50@imap.cc
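The three kinds of setting described above (-1 to log in full, 0 for no logging, a positive number of bytes to trim to) can be modeled with a small Python sketch. It is not the server's code: the helper name clip_param, the "..." marker appended to trimmed values, and the handling of a multibyte character cut at the limit are all assumptions made for illustration.

```python
def clip_param(value, max_length):
    """Hypothetical model of a log_parameter_max_length-style limit (in bytes)."""
    if max_length == 0:
        return None                      # 0: the parameter is not logged at all
    if max_length < 0:
        return value                     # -1: log the value in full
    raw = value.encode("utf-8")
    if len(raw) <= max_length:
        return value                     # short enough: nothing to trim
    # Trim to max_length bytes; errors="ignore" drops a character that was
    # cut in half at the limit. The trailing "..." is an assumed marker.
    return raw[:max_length].decode("utf-8", errors="ignore") + "..."


print(clip_param("abcdef", 3))
print(clip_param("abcdef", -1))
print(clip_param("abcdef", 0))
```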
Tomas Vondra [Thu, 2 Apr 2020 12:26:27 +0000 (14:26 +0200)]
Fix typo in SLRU stats documentation
Author: Noriyoshi Shinoda
Discussion: https://www.postgresql.org/message-id/flat/20200119143707.gyinppnigokesjok@development
David Rowley [Thu, 2 Apr 2020 08:26:54 +0000 (21:26 +1300)]
Attempt to stabilize partitionwise_aggregate test
In b07642dbc, we added code to trigger autovacuums based on the number of
INSERTs into a table. This seems to have caused some destabilization of
the regression tests. Likely this is due to an autovacuum triggering
mid-test and (per theory from Tom Lane) one of the test's queries causes
autovacuum to skip some number of pages, resulting in the reltuples
estimate changing.
The failure that this is attempting to fix is around the order of subnodes
in an Append. Since the planner orders these according to the subnode
cost, then it's possible that a small change in the reltuples value changes
the subnode's cost enough that it swaps position with one of its fellow
subnodes.
The failure here only seems to occur on slower buildfarm machines. In this
case it was lousyjack, which seems to have taken over 8 minutes to run just
the partitionwise_aggregate test. Such a slow run would increase the
chances that the autovacuum launcher would trigger a vacuum mid-test.
Faster machines run this test in sub-second time, so have a much smaller
window for an autovacuum to trigger.
Here we fix this by disabling autovacuum on all tables created in the test.
Additionally, this reverts the change made in the
partitionwise_aggregate test in 2dc16efed.
Discussion: https://postgr.es/m/22297.1585797192@sss.pgh.pa.us
Peter Eisentraut [Thu, 26 Mar 2020 07:14:00 +0000 (08:14 +0100)]
Add SQL functions for Unicode normalization
This adds SQL expressions NORMALIZE() and IS NORMALIZED to convert and
check Unicode normal forms, per SQL standard.
To support fast IS NORMALIZED tests, we pull in a new data file
DerivedNormalizationProps.txt from Unicode and build a lookup table
from that, using techniques similar to ones already used for other
Unicode data. make update-unicode will keep it up to date. We only
build and use these tables for the NFC and NFKC forms, because they
are too big for NFD and NFKD and the improvement is not significant
enough there.
Reviewed-by: Daniel Verite <[email protected]>
Reviewed-by: Andreas Karlsson <[email protected]>
Discussion: https://www.postgresql.org/message-id/flat/c1909f27-c269-2ed9-12f8-3ab72c8caf7a@2ndquadrant.com
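What converting to and checking a Unicode normal form means can be illustrated outside the database with Python's standard unicodedata module (is_normalized requires Python 3.8 or later). This only demonstrates the Unicode concepts behind NORMALIZE() and IS NORMALIZED; it says nothing about how the server implements them.

```python
import unicodedata

# 'e' followed by COMBINING ACUTE ACCENT: two code points, not in NFC form.
decomposed = "e\u0301"

# NFC composes the pair into the single code point U+00E9.
composed = unicodedata.normalize("NFC", decomposed)

print(len(decomposed), len(composed))
print(unicodedata.is_normalized("NFC", decomposed))
print(unicodedata.is_normalized("NFC", composed))
```

The check is the cheaper operation of the two, which is the case the lookup table mentioned above is meant to speed up.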
Peter Eisentraut [Thu, 2 Apr 2020 06:56:12 +0000 (08:56 +0200)]
Fix whitespace
Peter Eisentraut [Thu, 2 Apr 2020 06:01:30 +0000 (08:01 +0200)]
doc: Update for Unix-domain sockets on Windows
Update the documentation to reflect that Unix-domain sockets are now
usable on Windows.
Peter Eisentraut [Thu, 2 Apr 2020 05:52:20 +0000 (07:52 +0200)]
Add some comments to some SQL features
Otherwise, it could be confusing to a reader that some of these
well-publicized features are simply listed as unsupported without
further explanation.
Thomas Munro [Thu, 2 Apr 2020 03:44:11 +0000 (16:44 +1300)]
Add maintenance_io_concurrency to postgresql.conf.sample.
New GUC from commit fc34b0d9.
Amit Kapila [Thu, 2 Apr 2020 02:34:58 +0000 (08:04 +0530)]
Allow parallel vacuum to accumulate buffer usage.
Commit 40d964ec99 allowed the vacuum command to process indexes in parallel but
forgot to accumulate the buffer usage stats of parallel workers. This
commit allows the leader backend to accumulate buffer usage stats of all the
parallel workers.
Reported-by: Julien Rouhaud
Author: Sawada Masahiko
Reviewed-by: Dilip Kumar, Amit Kapila and Julien Rouhaud
Discussion: https://postgr.es/m/20200328151721.GB12854@nol
Fujii Masao [Thu, 2 Apr 2020 02:20:19 +0000 (11:20 +0900)]
Allow pg_stat_statements to track planning statistics.
This commit makes pg_stat_statements support new GUC
pg_stat_statements.track_planning. If this option is enabled,
pg_stat_statements tracks the planning statistics of the statements,
e.g., the number of times the statement was planned, the total time
spent planning the statement, etc. This feature is useful for finding
statements that take a long time to plan. Previously, since
pg_stat_statements tracked only the execution statistics, we could
not use it for that purpose.
The planning and execution statistics are stored at the end of
each phase separately. So there is not always a one-to-one relationship
between them. For example, if the statement is successfully planned
but fails in the execution phase, only its planning statistics are stored.
This may cause users to see pg_stat_statements results that differ
from those of the previous version. To avoid this,
pg_stat_statements.track_planning needs to be disabled.
This commit bumps the version of pg_stat_statements to 1.8
since it changes the definition of pg_stat_statements function.
Author: Julien Rouhaud, Pascal Legrand, Thomas Munro, Fujii Masao
Reviewed-by: Sergei Kornilov, Tomas Vondra, Yoshikazu Imai, Haribabu Kommi, Tom Lane
Discussion: https://postgr.es/m/CAHGQGwFx_=DO-Gu-MfPW3VQ4qC7TfVdH2zHmvZfrGv6fQ3D-Tw@mail.gmail.com
Discussion: https://postgr.es/m/CAEepm=0e59Y_6Q_YXYCTHZkqOc6H2pJ54C_Xe=VFu50Aqqp_sA@mail.gmail.com
Discussion: https://postgr.es/m/DB6PR0301MB21352F6210E3B11934B0DCC790B00@DB6PR0301MB2135.eurprd03.prod.outlook.com
Tomas Vondra [Thu, 2 Apr 2020 00:11:38 +0000 (02:11 +0200)]
Collect statistics about SLRU caches
There's a number of SLRU caches used to access important data like clog,
commit timestamps, multixact, asynchronous notifications, etc. Until now
we had no easy way to monitor these shared caches, compute hit ratios,
number of reads/writes etc.
This commit extends the statistics collector to track this information
for a predefined list of SLRUs, and also introduces a new system view
pg_stat_slru displaying the data.
The list of built-in SLRUs is fixed, but additional SLRUs may be defined
in extensions. Unfortunately, there's no suitable registry of SLRUs, so
this patch simply defines a fixed list of SLRUs with entries for the
built-in ones and one entry for all additional SLRUs. Extensions adding
their own SLRU are fairly rare, so this seems acceptable.
This patch only allows monitoring of SLRUs, not tuning. The SLRU sizes
are still fixed (hard-coded in the code) and it's not entirely clear
which of the SLRUs might need a GUC to tune size. In a way, allowing us
to determine that is one of the goals of this patch.
Bump catversion as the patch introduces new functions and system view.
Author: Tomas Vondra
Reviewed-by: Alvaro Herrera
Discussion: https://www.postgresql.org/message-id/flat/20200119143707.gyinppnigokesjok@development
Tom Lane [Wed, 1 Apr 2020 23:44:17 +0000 (19:44 -0400)]
Clean up parsing of ltree and lquery some more.
Fix lquery parsing to handle repeated flag characters correctly,
and to enforce the max label length correctly in some cases where
it did not before, and to detect empty labels in some cases where
it did not before.
In a more cosmetic vein, use a switch rather than if-then chains to
handle the different states, and avoid unnecessary checks on charlen
when looking for ASCII characters, and factor out multiple copies of
the label length checking code.
Tom Lane and Dmitry Belyavsky
Discussion: https://postgr.es/m/CADqLbzLVkBuPX0812o+z=c3i6honszsZZ6VQOSKR3VPbB56P3w@mail.gmail.com
Tom Lane [Wed, 1 Apr 2020 21:31:29 +0000 (17:31 -0400)]
Add support for binary I/O of ltree, lquery, and ltxtquery types.
Not much to say here --- does what it says on the tin. The "binary"
representation in each case is really just the same as the text format,
though we prefix a version-number byte in case anyone ever feels
motivated to change that. Thus, there's not any expectation of improved
speed or reduced space; the point here is just to allow clients to use
binary format for all columns of a query result or COPY data.
This makes use of the recently added ALTER TYPE support to add binary
I/O functions to an existing data type. As in commit a80818605,
we can piggy-back on there already being a new-for-v13 version of the
ltree extension, so we don't need a new update script file.
Nino Floris, reviewed by Alexander Korotkov and myself
Discussion: https://postgr.es/m/CANmj9Vxx50jOo1L7iSRxd142NyTz6Bdcgg7u9P3Z8o0=HGkYyQ@mail.gmail.com
Tom Lane [Wed, 1 Apr 2020 18:49:49 +0000 (14:49 -0400)]
Check equality semantics for unique indexes on partitioned tables.
We require the partition key to be a subset of the set of columns
being made unique, so that physically-separate indexes on the different
partitions are sufficient to enforce the uniqueness constraint.
The existing code checked that the listed columns appear, but did not
inquire into the index semantics, which is a serious oversight given
that different index opclasses might enforce completely different
notions of uniqueness.
Ideally, perhaps, we'd just match the partition key opfamily to the
index opfamily. But hash partitioning uses hash opfamilies which we
can't directly match to btree opfamilies. Hence, look up the equality
operator in each family, and accept if it's the same operator. This
should be okay in a fairly general sense, since the equality operator
ought to precisely represent the opfamily's notion of uniqueness.
A remaining weak spot is that we don't have a cross-index-AM notion of
which opfamily member is "equality". But we know which one to use for
hash and btree AMs, and those are the only two that are relevant here
at present. (Any non-core AMs that know how to enforce equality are
out of luck, for now.)
Back-patch to v11 where this feature was introduced.
Guancheng Luo, revised a bit by me
Discussion: https://postgr.es/m/D9C3CEF7-04E8-47A1-8300-CA1DCD5ED40D@gmail.com
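The check described above (every partition key column must appear in the unique index, and the index must use the same equality operator for it) can be sketched as follows. This is a simplified Python model, not the server's logic: the function name unique_index_ok, the dict representation, and the operator labels are hypothetical.

```python
def unique_index_ok(partition_key, index_columns):
    """Both arguments map a column name to the name of its equality operator.

    The operator names are placeholders; in the server they would be looked
    up in the partition key's and the index's operator families.
    """
    for column, partition_eq in partition_key.items():
        if column not in index_columns:
            return False    # partition key is not a subset of the index columns
        if index_columns[column] != partition_eq:
            return False    # same column, but a different notion of equality
    return True


partition_key = {"id": "int4_eq"}
print(unique_index_ok(partition_key, {"id": "int4_eq", "ts": "timestamp_eq"}))
print(unique_index_ok(partition_key, {"id": "other_eq", "ts": "timestamp_eq"}))
print(unique_index_ok(partition_key, {"ts": "timestamp_eq"}))
```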
Tom Lane [Wed, 1 Apr 2020 14:32:33 +0000 (10:32 -0400)]
Improve selectivity estimation for assorted match-style operators.
Quite a few matching operators such as JSONB's @> used "contsel" and
"contjoinsel" as their selectivity estimators. That was a bad idea,
because (a) contsel is only a stub, yielding a fixed default estimate,
and (b) that default is 0.001, meaning we estimate these operators as
five times more selective than equality, which is surely pretty silly.
There's a good model for improving this in ltree's ltreeparentsel():
for any "var OP constant" query, we can try applying the operator
to all of the column's MCV and histogram values, taking the latter
as being a random sample of the non-MCV values. That code is
actually 100% generic, except for the question of exactly what
default selectivity ought to be plugged in when we don't have stats.
Hence, migrate the guts of ltreeparentsel() into the core code, provide
wrappers "matchingsel" and "matchingjoinsel" with a more-appropriate
default estimate, and use those for the non-geometric operators that
formerly used contsel (mostly JSONB containment operators and tsquery
matching).
Also apply this code to some match-like operators in hstore, ltree, and
pg_trgm, including the former users of ltreeparentsel as well as ones
that improperly used contsel. Since commit 911e70207 just created new
versions of those extensions that we haven't released yet, we can sneak
this change into those new versions instead of having to create an
additional generation of update scripts.
Patch by me, reviewed by Alexey Bashtanov
Discussion: https://postgr.es/m/12237.1582833074@sss.pgh.pa.us
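The estimation idea described above, applying the operator to the column's MCV and histogram values and treating the histogram as a random sample of the non-MCV values, can be sketched in Python. This is a rough model under stated assumptions: the function name, the data layout, and the default of 0.01 are placeholders, and null fractions are ignored for brevity.

```python
def matching_selectivity(op, constant, mcv, histogram, default=0.01):
    """Estimate the selectivity of "var OP constant".

    mcv:       list of (value, frequency) pairs for the most common values
    histogram: list of values taken as a random sample of the non-MCV rows
    default:   placeholder estimate used when no statistics are available
    """
    if not mcv and not histogram:
        return default
    mcv_total = sum(freq for _, freq in mcv)
    # Fraction of all rows covered by MCVs that satisfy the operator.
    mcv_match = sum(freq for value, freq in mcv if op(value, constant))
    if histogram:
        hits = sum(1 for value in histogram if op(value, constant))
        hist_fraction = hits / len(histogram)
    else:
        hist_fraction = default
    # Scale the histogram match rate by the share of rows not covered by MCVs.
    return mcv_match + (1.0 - mcv_total) * hist_fraction


contains = lambda value, needle: needle in value
mcv = [("ab", 0.5), ("cd", 0.25)]
histogram = ["ax", "bx", "cx", "dx"]
print(matching_selectivity(contains, "a", mcv, histogram))
```

In this example half of the rows match through the MCV list, and one quarter of the remaining quarter match through the histogram sample.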
Peter Eisentraut [Wed, 1 Apr 2020 13:31:47 +0000 (15:31 +0200)]
Refactor code to look up local replication tuple
This unifies some duplicate code.
Author: Amit Langote <[email protected]>
Discussion: https://www.postgresql.org/message-id/CA+HiwqFjYE5anArxvkjr37AQMd52L-LZtz9Ld2QrLQ3YfcYhTw@mail.gmail.com
Peter Eisentraut [Wed, 1 Apr 2020 12:43:45 +0000 (14:43 +0200)]
Update SQL features count
The previously listed total of 179 does not appear to be correct for
SQL:2016 anymore. (Previous SQL versions had slightly different
feature sets, so it's plausible that it was once correct.) The
currently correct count is the number of rows in the respective tables
in appendix F in SQL parts 2 and 11, minus 2 features that are listed
twice. Thus the correct count is currently 177. This also matches
the number of Core entries the built documentation currently shows, so
it's internally consistent.
Alexander Korotkov [Wed, 1 Apr 2020 12:07:53 +0000 (15:07 +0300)]
Fix typo in contrib/intarray documentation
Reported-by: Erik Rijkers
Discussion: https://postgr.es/m/82529ecf9bcc58d5b5cf9f3ffb699881%40xs4all.nl
Alexander Korotkov [Wed, 1 Apr 2020 12:01:26 +0000 (15:01 +0300)]
Correct CREATE INDEX documentation for opclass parameters
Old versions of the opclass parameters patch supported the ability to specify DEFAULT
as the opclass name in the CREATE INDEX command. This ability was removed in the
final version, but 911e702077 still mentions that in the documentation.
Alexander Korotkov [Wed, 1 Apr 2020 11:42:17 +0000 (14:42 +0300)]
Documentation corrections for opclass parameters
Discussion: https://postgr.es/m/20200331024419.GB14618%40telsasoft.com
Author: Justin Pryzby
Michael Paquier [Wed, 1 Apr 2020 05:45:45 +0000 (14:45 +0900)]
Fix crash in psql when attempting to reuse old connection
In a psql session, if the connection to the server is abruptly cut, the
referenced connection would become NULL as of CheckConnection(). This
could cause a hard crash with psql if attempting to connect by reusing
the past connection's data because of a null-pointer dereference with
either PQhost() or PQdb(). This issue is fixed by making sure that no
reuse of the past connection is done if it does not exist.
The issue was introduced by 6e5f8d4, so backpatch down to 12.
Reported-by: Hugh Wang
Author: Michael Paquier
Reviewed-by: Álvaro Herrera, Tom Lane
Discussion: https://postgr.es/m/16330-b34835d83619e25d@postgresql.org
Backpatch-through: 12
Amit Kapila [Wed, 1 Apr 2020 03:58:13 +0000 (09:28 +0530)]
Fix coverity complaint about commit 40d964ec99.
Coverity complained that dividing integer expressions and then
converting the integer quotient to type "double" would lose the fractional
part. Casting one of the operands of the expression to double should
fix the report.
Author: Mahendra Singh Thalor
Reviewed-by: Amit Kapila
Discussion: https://postgr.es/m/20200329224818[email protected]
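The pattern behind the complaint can be reproduced in a few lines of Python, where the floor-division operator // stands in for C's integer division (the two agree for the non-negative operands used here). The variable names are invented for the illustration and do not come from the patched code.

```python
total_blocks = 7    # hypothetical integer operands
workers = 2

# Integer division first, conversion afterwards: the fraction is already gone.
truncated = float(total_blocks // workers)

# Converting one operand first makes the division itself floating-point.
exact = float(total_blocks) / workers

print(truncated, exact)
```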
Bruce Momjian [Wed, 1 Apr 2020 03:01:34 +0000 (23:01 -0400)]
psql: do file completion for \gx
This was missed when the feature was added.
Reported-by: Vik Fearing
Discussion: https://postgr.es/m/eca20529-0b06-b493-ee38-f071a75dcd5b@postgresfriends.org
Backpatch-through: 10
Michael Paquier [Wed, 1 Apr 2020 01:57:03 +0000 (10:57 +0900)]
Add -c/--restore-target-wal to pg_rewind
pg_rewind needs to copy from the source cluster to the target cluster a
set of relation blocks changed from the previous checkpoint where WAL
forked up to the end of WAL on the target. Building this list of
relation blocks requires a range of WAL segments that may not be present
anymore on the target's pg_wal, causing pg_rewind to fail. It is
possible to work around this issue by manually copying the WAL segments
needed, but this may lead to some extra and actually useless work.
This commit introduces a new option allowing pg_rewind to use a
restore_command while doing the rewind by grabbing the parameter value
of restore_command from the target cluster configuration. This makes
the rewind operation more reliable, as only the WAL segments
needed by the rewind are restored from the archives.
In order to be able to do that, a new routine is added to src/common/ to
allow frontend tools to restore files from archives using an
already-built restore command. This version is simpler than the
backend equivalent as there is no need to handle the non-recovery case.
Author: Alexey Kondratov
Reviewed-by: Andrey Borodin, Andres Freund, Alvaro Herrera, Alexander
Korotkov, Michael Paquier
Discussion: https://postgr.es/m/a3acff50-5a0d-9a2c-b3b2-ee36168955c1@postgrespro.ru
Bruce Momjian [Tue, 31 Mar 2020 22:44:29 +0000 (18:44 -0400)]
doc: remove mention of bitwise operators as solely type-limited
There are other operators that have limited number data type support, so
just remove the sentence.
Reported-by: Sergei Agalakov
Discussion: https://postgr.es/m/158032651854.19851.16261832706661813796@wrigleys.postgresql.org
Backpatch-through: 9.5
Bruce Momjian [Tue, 31 Mar 2020 22:10:39 +0000 (18:10 -0400)]
doc: clarify hierarchy of objects: global, db, schema, etc.
The previous wording was confusing because it wasn't in decreasing order
and had to backtrack. Also clarify role/user wording.
Reported-by: [email protected]
Discussion: https://postgr.es/m/158057750885.1123.2806779262588618988@wrigleys.postgresql.org
Backpatch-through: 9.5
Bruce Momjian [Tue, 31 Mar 2020 21:57:44 +0000 (17:57 -0400)]
doc: clarify when row-level locks are released
They are released just like table-level locks. Also clean up wording.
(Uses wording "rolled back to".)
Reported-by: [email protected]
Discussion: https://postgr.es/m/158074944048.1095.4309647363871637715@wrigleys.postgresql.org
Backpatch-through: 9.5
Peter Geoghegan [Tue, 31 Mar 2020 21:38:39 +0000 (14:38 -0700)]
Add CREATE INDEX deduplication assertions.
Add two assertions that verify the assumptions about posting list tuple
space accounting and suffix truncation made within nbtsort.c.
Bruce Momjian [Tue, 31 Mar 2020 21:32:00 +0000 (17:32 -0400)]
Revert erroneous commit 834b80464d; my apologies
Backpatch-through: master
Bruce Momjian [Thu, 12 Mar 2020 19:42:35 +0000 (15:42 -0400)]
dummy commit
Bruce Momjian [Tue, 31 Mar 2020 21:16:33 +0000 (17:16 -0400)]
doc: add namespace column to pg_buffercache example query
Without the namespace, the table name could be ambiguous.
Reported-by: [email protected]
Discussion: https://postgr.es/m/158155175140.23798.2189464781144503491@wrigleys.postgresql.org
Backpatch-through: 9.5
Bruce Momjian [Tue, 31 Mar 2020 21:07:43 +0000 (17:07 -0400)]
doc: clarify which table creation is used for inheritance part.
Previously people might assume that the partition syntax version of
CREATE TABLE is to be used for the inheritance partition table example;
mention that the non-partitioned version should be used.
Reported-by: [email protected]
Discussion: https://postgr.es/m/158089540905.1098.15071165437284409576@wrigleys.postgresql.org
Backpatch-through: 10
Tom Lane [Tue, 31 Mar 2020 21:06:22 +0000 (17:06 -0400)]
Fix race condition in statext_store().
Must hold some lock on the pg_statistic_ext_data catalog *before*
we look up the tuple we aim to replace. Otherwise a concurrent
VACUUM FULL or similar operation could move it to a different TID,
leaving us trying to replace the wrong tuple.
Back-patch to v12 where this got broken.
Credit goes to Dean Rasheed; I'm just doing the clerical work.
Discussion: https://postgr.es/m/CAEZATCU0zHMDiQV0g8P2U+YSP9C1idUPrn79DajsbonwkN0xvQ@mail.gmail.com
Bruce Momjian [Tue, 31 Mar 2020 20:31:44 +0000 (16:31 -0400)]
doc: adjust UPDATE/DELETE's FROM/USING to match SELECT's FROM
Previously the syntax and wording were unclear.
Reported-by: Alexey Bashtanov
Discussion: https://postgr.es/m/968d4724-8e58-788f-7c45-f7b1813824cc@imap.cc
Backpatch-through: 9.5
Tom Lane [Tue, 31 Mar 2020 20:09:17 +0000 (16:09 -0400)]
Still another try at stabilizing stats_ext test results.
The stats_ext test is not expecting that autovacuum will touch
any of its tables; an expectation falsified by commit b07642dbc.
Although I'm suspicious that there's something else going on that
makes extended stats estimates not 100% reproducible, it's pretty
easy to demonstrate that there are places in this test that fail
if an autovacuum updates the table's stats unexpectedly.
Hence, revert the band-aid changes made by 2dc16efed and 24566b359
in favor of summarily disabling autovacuum for all the tables that
this test checks estimated rowcounts for.
Also remove an evidently obsolete comment at the head of the test.
Discussion: https://postgr.es/m/15012.1585623298@sss.pgh.pa.us
Alvaro Herrera [Tue, 31 Mar 2020 19:37:24 +0000 (16:37 -0300)]
Remove header noise from test_decoding test
Use psql's expanded output to avoid a pointless header.
Kyotaro Horiguchi, after an idea of Michael Paquier
Discussion: https://postgr.es/m/20181120050744[email protected]
Fujii Masao [Tue, 31 Mar 2020 18:35:13 +0000 (03:35 +0900)]
Improve the message logged when recovery is paused.
When recovery target is reached and recovery is paused because of
recovery_target_action=pause, executing pg_wal_replay_resume() causes
the standby to promote, i.e., the recovery to end. So, in this case,
the previously logged message "Execute pg_wal_replay_resume() to continue"
was confusing because pg_wal_replay_resume() doesn't cause
the recovery to continue.
This commit improves the message logged when recovery is paused,
and the proper message is output based on what (pg_wal_replay_pause
or recovery_target_action) causes recovery to be paused.
Author: Sergei Kornilov, revised by Fujii Masao
Reviewed-by: Robert Haas
Discussion: https://postgr.es/m/19168211580382043@myt5-b646bde4b8f3.qloud-c.yandex.net
Bruce Momjian [Tue, 31 Mar 2020 18:17:32 +0000 (14:17 -0400)]
Allow ecpg to be built stand-alone, allow parallel libpq make
This change defines SHLIB_PREREQS for the libpgport dependency, rather
than using a makefile rule. This was broken in PG 12.
Reported-by: Filip Janus
Discussion: https://postgr.es/m/[email protected]
Author: Dagfinn Ilmari Mannsåker (for libpq)
Backpatch-through: 12