Fix "missing continuation record" after standby promotion
author    Alvaro Herrera <[email protected]>
          Wed, 23 Mar 2022 17:22:10 +0000 (18:22 +0100)
committer Alvaro Herrera <[email protected]>
          Wed, 23 Mar 2022 17:22:10 +0000 (18:22 +0100)
Invalidate abortedRecPtr and missingContrecPtr after a missing
continuation record is successfully skipped on a standby. This fixes a
PANIC caused when a recently promoted standby attempts to write an
OVERWRITE_RECORD with an LSN of the previously read aborted record.

Backpatch to 10 (all stable versions).
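
The state handling described above can be illustrated with a small standalone
model. This is only a sketch, not the code in xlog.c: the two variables mirror
the names touched by the patch, while note_aborted_record(),
replay_overwrite_contrecord() and promotion_would_write_stale_overwrite() are
hypothetical helpers invented for illustration, and the typedef is a simplified
stand-in for the real XLogRecPtr definition.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-ins for the real definitions (assumption). */
typedef uint64_t XLogRecPtr;
#define InvalidXLogRecPtr ((XLogRecPtr) 0)

/* Same names as the variables reset by the patch. */
static XLogRecPtr abortedRecPtr = InvalidXLogRecPtr;
static XLogRecPtr missingContrecPtr = InvalidXLogRecPtr;

/*
 * Hypothetical helper: recovery read a record whose continuation is
 * missing and remembers where it started and where the gap begins.
 */
static void
note_aborted_record(XLogRecPtr aborted, XLogRecPtr missing)
{
	assert(aborted != InvalidXLogRecPtr);
	abortedRecPtr = aborted;
	missingContrecPtr = missing;
}

/*
 * Hypothetical helper: the standby replays the overwrite record and
 * skips the aborted record.  With apply_fix set, it clears the
 * remembered pointers, which is what the patch adds.
 */
static void
replay_overwrite_contrecord(bool apply_fix)
{
	if (apply_fix)
	{
		/* We have safely skipped the aborted record */
		abortedRecPtr = InvalidXLogRecPtr;
		missingContrecPtr = InvalidXLogRecPtr;
	}
}

/*
 * Hypothetical helper: at promotion, a still-valid abortedRecPtr would
 * lead to an attempt to write an overwrite record for an LSN that was
 * already dealt with.
 */
static bool
promotion_would_write_stale_overwrite(void)
{
	return abortedRecPtr != InvalidXLogRecPtr;
}
```

In this model, calling replay_overwrite_contrecord(false) after
note_aborted_record() leaves the stale pointer in place, so the promotion
check reports a problem; with replay_overwrite_contrecord(true) both pointers
are invalid again and promotion has nothing stale to act on.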

Author: Sami Imseih <[email protected]>
Reviewed-by: Kyotaro Horiguchi <[email protected]>
Reviewed-by: Álvaro Herrera <[email protected]>
Discussion: https://postgr.es/m/44D259DE-7542-49C4-8A52-2AB01534DCA9@amazon.com

src/backend/access/transam/xlog.c
src/test/recovery/t/026_overwrite_contrecord.pl

index 8e8bdde7646f728326363da21c0b8721dde925e9..6772e248222165cecadb7c76cb963e7cf2b2a71f 100644
@@ -10261,6 +10261,10 @@ VerifyOverwriteContrecord(xl_overwrite_contrecord *xlrec, XLogReaderState *state
             (uint32) (state->overwrittenRecPtr >> 32),
             (uint32) state->overwrittenRecPtr);
 
+   /* We have safely skipped the aborted record */
+   abortedRecPtr = InvalidXLogRecPtr;
+   missingContrecPtr = InvalidXLogRecPtr;
+
    ereport(LOG,
            (errmsg("successfully skipped missing contrecord at %X/%X, overwritten at %s",
                    (uint32) (xlrec->overwritten_lsn >> 32),
index 57b2a6b7fb92aff5e4e36e1c26f6ae6a34f3ffc5..50902e59a56b4c5940a8eec1a3cd4fef920ef01e 100644
@@ -15,7 +15,7 @@ plan tests => 3;
 # Test: Create a physical replica that's missing the last WAL file,
 # then restart the primary to create a divergent WAL file and observe
 # that the replica replays the "overwrite contrecord" from that new
-# file.
+# file and the standby promotes successfully.
 
 my $node = PostgresNode->get_new_node('primary');
 $node->init(allows_streaming => 1);
@@ -105,5 +105,8 @@ like(
    qr[successfully skipped missing contrecord at],
    "found log line in standby");
 
+# Verify promotion is successful
+$node_standby->promote;
+
 $node->stop;
 $node_standby->stop;