Fix data loss on crash after sorted GiST index build.
author Heikki Linnakangas <[email protected]>
Thu, 24 Feb 2022 14:15:12 +0000 (16:15 +0200)
committer Heikki Linnakangas <[email protected]>
Thu, 24 Feb 2022 14:15:12 +0000 (16:15 +0200)
If a checkpoint happens during a sorted GiST index build, and the system
crashes after the checkpoint and after the index build has finished,
the data written to the index before the checkpoint started could be
lost. The checkpoint won't fsync it, and it won't be replayed at crash
recovery either. Fix by calling smgrimmedsync() after the index build,
just like in B-tree index build.
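
As a rough illustration of the pattern (not PostgreSQL code), the sketch below
writes pages straight to a file, bypassing any buffer cache that a checkpoint
would know about, and then calls fsync() before reporting success. The names
build_file_durably and SKETCH_PAGE_SIZE are hypothetical; plain POSIX fsync()
stands in for smgrimmedsync() under that assumption.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define SKETCH_PAGE_SIZE 8192	/* hypothetical page size for this sketch */

/*
 * Hypothetical helper: write npages pages directly to 'path', then fsync
 * the file before returning.  Returns 0 on success, -1 on any error.
 *
 * Nothing else tracks these writes, so the writer itself must force them
 * to disk; that is the role smgrimmedsync() plays in the fix above.
 */
static int
build_file_durably(const char *path, int npages)
{
	char		page[SKETCH_PAGE_SIZE];
	int			fd;

	fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
	if (fd < 0)
		return -1;

	for (int i = 0; i < npages; i++)
	{
		ssize_t		n;

		/* Fill the page with a recognizable dummy pattern. */
		memset(page, 'A' + (i % 26), sizeof(page));

		/* For simplicity, a short write is treated as an error. */
		n = write(fd, page, sizeof(page));
		if (n != (ssize_t) sizeof(page))
		{
			close(fd);
			return -1;
		}
	}

	/* Without this, the data may still be only in the OS cache. */
	if (fsync(fd) != 0)
	{
		close(fd);
		return -1;
	}

	return close(fd);
}
```

A caller would treat the build as complete only when this returns 0, for
example build_file_durably("/tmp/sketch_index.bin", 4).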

Backpatch to v14 where the sorted GiST index build was introduced.

Reported-by: Melanie Plageman
Discussion: https://www.postgresql.org/message-id/CAAKRu_ZJJynimxKj5xYBSziL62-iEtPE+fx-B=JzR=jUtP92mw@mail.gmail.com

src/backend/access/gist/gistbuild.c

index 4db896a533daa8e0004d67a0b29be3ccf9da91cd..e081e6571a4a2361fe2632df15b61b6bf713946e 100644
@@ -467,6 +467,18 @@ gist_indexsortbuild(GISTBuildState *state)
 
    pfree(levelstate->pages[0]);
    pfree(levelstate);
+
+   /*
+    * When we WAL-logged index pages, we must nonetheless fsync index files.
+    * Since we're building outside shared buffers, a CHECKPOINT occurring
+    * during the build has no way to flush the previously written data to
+    * disk (indeed it won't know the index even exists).  A crash later on
+    * would replay WAL from the checkpoint, therefore it wouldn't replay our
+    * earlier WAL entries. If we do not fsync those pages here, they might
+    * still not be on disk when the crash occurs.
+    */
+   if (RelationNeedsWAL(state->indexrel))
+       smgrimmedsync(RelationGetSmgr(state->indexrel), MAIN_FORKNUM);
 }
 
 /*