Home > mailing lists

Re: [HACKERS] make async slave to wait for lsn to be replayed - Mailing list pgsql-hackers

From	Kartyshov Ivan
Subject	Re: [HACKERS] make async slave to wait for lsn to be replayed
Date	March 22, 2024 13:45:40
Msg-id	[email protected] Whole thread Raw
Responses	Re: [HACKERS] make async slave to wait for lsn to be replayed Re: [HACKERS] make async slave to wait for lsn to be replayed
List	pgsql-hackers

Tree view

On 2024-03-20 12:11, Alexander Korotkov wrote:
> On Wed, Mar 20, 2024 at 12:34 AM Kartyshov Ivan
> <[email protected]> wrote:
>> > 4.2 With an unreasonably high future LSN, BEGIN command waits
>> > unboundedly, shouldn't we check if the specified LSN is more than
>> > pg_last_wal_receive_lsn() error out?
> 
> I think limiting wait lsn by current received lsn would destroy the
> whole value of this feature.  The value is to wait till given LSN is
> replayed, whether it's already received or not.

Ok sounds reasonable, I`ll rollback the changes.

> But I don't see a problem here. On the replica, it's out of our
> control to check which lsn is good and which is not.  We can't check
> whether the lsn, which is in future for the replica, is already issued
> by primary.
> 
> For the case of wrong lsn, which could cause potentially infinite
> wait, there is the timeout and the manual query cancel.

Fully agree with this take.

>> > 4.3 With an unreasonably high wait time, BEGIN command waits
>> > unboundedly, shouldn't we restrict the wait time to some max
> value,
>> > say a day or so?
>> > SELECT pg_last_wal_receive_lsn() + 1 AS future_receive_lsn \gset
>> > BEGIN AFTER :'future_receive_lsn' WITHIN 100000;
>> 
>> Good idea, I put it 1 day. But this limit we should to discuss.
> 
> Do you think that specifying timeout in milliseconds is suitable?  I
> would prefer to switch to seconds (with ability to specify fraction of
> second).  This was expressed before by Alexander Lakhin.

It sounds like an interesting idea. Please review the result.

>> > https://github.com/macdice/redo-bench or similar tools?
> 
> Ivan, could you do this?

Yes, test redo-bench/crash-recovery.sh
This patch on master
91.327, 1.973
105.907, 3.338
98.412, 4.579
95.818, 4.19

REL_13-STABLE
116.645, 3.005
113.212, 2.568
117.644, 3.183
111.411, 2.782

master
124.712, 2.047
117.012, 1.736
116.328, 2.035
115.662, 1.797

Strange behavior, patched version is faster then REL_13-STABLE and 
master.

> I don't see this change in the patch.  Normally if a process gets a
> signal, that causes WaitLatch() to exit immediately.  It also exists
> immediately on query cancel.  IIRC, this 1 minute timeout is needed to
> handle some extreme cases when an interrupt is missing.  Other places
> have it equal to 1 minute. I don't see why we should have it
> different.

Ok, I`ll rollback my changes.

>> 4) added and expanded sections in the documentation
> 
> I don't see this in the patch.  I see only a short description in
> func.sgml, which is definitely not sufficient.  We need at least
> everything we have in the docs before to be adjusted with the current
> approach of procedure.

I didn't find another section where to add the description of 
pg_wait_lsn().
So I extend description on the bottom of the table.

>> 5) add default variant of timeout
>> pg_wait_lsn(trg_lsn pg_lsn, delay int8 DEFAULT 0)
>> example: pg_wait_lsn('0/31B1B60') equal pg_wait_lsn('0/31B1B60', 0)
> 
> Does zero here mean no timeout?  I think this should be documented.
> Also, I would prefer to see the timeout by default.  Probably one
> minute would be good for default.

Lets discuss this point. Loop in function WaitForLSN is made that way,
if we choose delay=0, only then we can wait infinitely to wait LSN
without timeout. So default must be 0.

Please take one more look on the patch.

-- 
Ivan Kartyshov
Postgres Professional: www.postgrespro.com

Attachment

v14_0001-Procedure-wait-lsn.patch

pgsql-hackers by date:

From: Sergey Prokhorenko
Date: 22 March 2024, 13:42:20
Subject: Re: UUID v7

From: Bertrand Drouvot
Date: 22 March 2024, 13:47:34
Subject: Re: Introduce XID age and inactive timeout based replication slot invalidation

Re: [HACKERS] make async slave to wait for lsn to be replayed - Mailing list pgsql-hackers

Attachment

Previous

Next