Be more careful about barriers when releasing BackgroundWorkerSlots.
authorTom Lane <[email protected]>
Sat, 15 May 2021 16:21:06 +0000 (12:21 -0400)
committerTom Lane <[email protected]>
Sat, 15 May 2021 16:21:06 +0000 (12:21 -0400)
ForgetBackgroundWorker lacked any memory barrier at all, while
BackgroundWorkerStateChange had one but unaccountably did
additional manipulation of the slot after the barrier.  AFAICS,
the rule must be that the barrier is immediately before setting
or clearing slot->in_use.

It looks like back in 9.6 when ForgetBackgroundWorker was first
written, there might have been some case for not needing a
barrier there, but I'm not very convinced of that --- the fact
that the load of bgw_notify_pid is in the caller doesn't seem
to guarantee no memory ordering problem.  So patch 9.6 too.

It's likely that this doesn't fix any observable bug on Intel
hardware, but machines with weaker memory ordering rules could
have problems here.

Discussion: https://postgr.es/m/4046084.1620244003@sss.pgh.pa.us

src/backend/postmaster/bgworker.c

index 9754c370fc219dd7d39a9c061c76068eb12c2859..cacd8fe7a9bd62cf21e08b93576dc64d941bccfd 100644 (file)
@@ -290,9 +290,11 @@ BackgroundWorkerStateChange(void)
             * bgw_notify_pid completes before the store to in_use.
             */
            notify_pid = slot->worker.bgw_notify_pid;
-           pg_memory_barrier();
            slot->pid = 0;
+
+           pg_memory_barrier();
            slot->in_use = false;
+
            if (notify_pid != 0)
                kill(notify_pid, SIGUSR1);
 
@@ -378,6 +380,8 @@ BackgroundWorkerStateChange(void)
  * points to it.  This convention allows deletion of workers during
  * searches of the worker list, and saves having to search the list again.
  *
+ * Caller is responsible for notifying bgw_notify_pid, if appropriate.
+ *
  * This function must be invoked only in the postmaster.
  */
 void
@@ -390,6 +394,13 @@ ForgetBackgroundWorker(slist_mutable_iter *cur)
 
    Assert(rw->rw_shmem_slot < max_worker_processes);
    slot = &BackgroundWorkerData->slot[rw->rw_shmem_slot];
+   Assert(slot->in_use);
+
+   /*
+    * This memory barrier might not be completely necessary, but let's be
+    * sure.
+    */
+   pg_memory_barrier();
    slot->in_use = false;
 
    ereport(DEBUG1,