Home > mailing lists

Re: [PATCH] Allow to specify restart_lsn inpg_create_physical_replication_slot() - Mailing list pgsql-hackers

From	Alexey Kondratov
Subject	Re: [PATCH] Allow to specify restart_lsn inpg_create_physical_replication_slot()
Date	June 19, 2020 14:20:11
Msg-id	[email protected] Whole thread Raw
In response to	Re: [PATCH] Allow to specify restart_lsn inpg_create_physical_replication_slot() (Michael Paquier <[email protected]>)
Responses	Re: [PATCH] Allow to specify restart_lsn inpg_create_physical_replication_slot()
List	pgsql-hackers

Tree view

On 2020-06-19 03:59, Michael Paquier wrote:
> On Thu, Jun 18, 2020 at 03:39:09PM +0300, Vyacheslav Makarov wrote:
>> If the WAL segment for the specified restart_lsn (STOP_LSN of the 
>> backup)
>> exists, then the function will create a physical replication slot and 
>> will
>> keep all the WAL segments required by the replica to catch up with the
>> primary. Otherwise, it returns error, which means that the required 
>> WAL
>> segments have been already utilised, so we do need to take a new 
>> backup.
>> Without passing this newly added parameter
>> pg_create_physical_replication_slot() works as before.
>> 
>> What do you think about this?
> 
> I think that this was discussed in the past (perhaps one of the
> threads related to WAL advancing actually?),
> 

I have searched through the archives a bit and found one thread related 
to slots advancing [1]. It was dedicated to a problem of advancing slots 
which do not reserve WAL yet, if I get it correctly. Although it is 
somehow related to the topic, it was a slightly different issue, IMO.

> 
> and this stuff is full of
> holes when it comes to think about error handling with checkpoints
> running in parallel, potentially doing recycling of segments you would
> expect to be around based on your input value for restart_lsn *while*
> pg_create_physical_replication_slot() is still running and
> manipulating the on-disk slot information. I suspect that this also
> breaks a couple of assumptions behind concurrent calls of the minimum
> LSN calculated across slots when a caller sees fit to recompute the
> thresholds (WAL senders mainly here, depending on the replication
> activity).
> 

These are the right concerns, but all of them should be applicable to 
the pg_create_physical_replication_slot() + immediately_reserve == true 
in the same way, doesn't it? I think so, since in that case we are doing 
a pretty similar thing — trying to reserve some WAL segment that may be 
concurrently deleted.

And this is exactly the reason why ReplicationSlotReserveWal() does it 
in several steps in a loop:

1. Creates a slot with some restart_lsn.
2. Does ReplicationSlotsComputeRequiredLSN() to prevent removal of the 
WAL segment with this restart_lsn.
3. Checks that required WAL segment is still there.
4. Repeat if this attempt to prevent WAL removal has failed.

I guess that the only difference in the case of proposed scenario is 
that we do not have a chance for step 4, since we do need some specific 
restart_lsn, not any recent restart_lsn, i.e. in this case we have to:

1. Create a slot with restart_lsn specified by user.
2. Do ReplicationSlotsComputeRequiredLSN() to prevent WAL removal.
3. Check that required WAL segment is still there and report ERROR to 
the user if it is not.

I have eyeballed the attached patch and it looks like doing exactly the 
same, so issues with concurrent deletion are not obvious for me. Or, 
there are should be the same issues for 
pg_create_physical_replication_slot() + immediately_reserve == true with 
current master implementation.

[1] 
https://www.postgresql.org/message-id/flat/20180626071305.GH31353%40paquier.xyz

Regards
-- 
Alexey Kondratov

Postgres Professional https://www.postgrespro.com
Russian Postgres Company

pgsql-hackers by date:

From: Tom Lane
Date: 19 June 2020, 14:00:51
Subject: Re: doing something about the broken dynloader.h symlink

From: Etsuro Fujita
Date: 19 June 2020, 14:58:02
Subject: Re: [POC] Fast COPY FROM command for the table with foreign partitions

Re: [PATCH] Allow to specify restart_lsn inpg_create_physical_replication_slot() - Mailing list pgsql-hackers

Previous

Next