On 2020-06-09 20:19, Andres Freund wrote:
> Hi,
>
> On 2020-01-28 17:01:14 +0900, Michael Paquier wrote:
>> So attached is an updated patch which addresses the problem just by
>> marking a physical slot as dirty if any advancing is done. Some
>> documentation is added about the fact that an advance is persistent
>> only at the follow-up checkpoint. And the tests are fixed to not use
>> a fake LSN but instead advance to the latest LSN position produced.
>
>>
>> - /* Update the on disk state when lsn was updated. */
>> - if (XLogRecPtrIsInvalid(endlsn))
>> - {
>> - ReplicationSlotMarkDirty();
>> - ReplicationSlotsComputeRequiredXmin(false);
>> - ReplicationSlotsComputeRequiredLSN();
>> - ReplicationSlotSave();
>> - }
>> -
>
> I am quite confused by the wholesale removal of these lines. That
> wasn't
> in previous versions of the patch. As far as I can tell not calling
> ReplicationSlotsComputeRequiredLSN() for the physical slot leads to the
> global minimum LSN never beeing advanced, and thus WAL reserved by the
> slot not being removable. Only if there's some independent call to
> ReplicationSlotsComputeRequiredLSN() XLogSetReplicationSlotMinimumLSN()
> will be called, allowing for slots to advance.
>
> I realize this stuff has been broken since the introduction in
> 9c7d06d6068 (due to the above being if (XLogRecPtrIsInvalid()) rather
> than if (!XLogRecPtrIsInvalid()) , but this seems to make it even
> worse?
>
Yes, there was a ReplicationSlotsComputeRequiredLSN() call inside
pg_physical_replication_slot_advance() in the v5 of the patch:
@@ -370,6 +370,11 @@ pg_physical_replication_slot_advance(XLogRecPtr
moveto)
MyReplicationSlot->data.restart_lsn = moveto;
SpinLockRelease(&MyReplicationSlot->mutex);
retlsn = moveto;
+
+ ReplicationSlotMarkDirty();
+
+ /* We moved retart_lsn, update the global value. */
+ ReplicationSlotsComputeRequiredLSN();
But later it has been missed and we have not noticed that.
I think that adding it back as per attached will be enough.
>
> I find it really depressing how much obviously untested stuff gets
> added in this area.
>
Prior to this patch pg_replication_slot_advance was not being tested at
all. Unfortunately, added tests appeared to be not enough to cover all
cases. It seems that the whole machinery of WAL holding and trimming is
worth to be tested more thoroughly.
Regards
--
Alexey Kondratov
Postgres Professional https://www.postgrespro.com
Russian Postgres Company