SKIP LOCKED assert triggered

Lists: pgsql-hackers
From: Simon Riggs <simon(dot)riggs(at)enterprisedb(dot)com>
To: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Subject: SKIP LOCKED assert triggered
Date: 2021-11-12 16:55:17
Message-ID: CANbhV-FeEwMnN8yuMyss7if1ZKjOKfjcgqB26n8pqu1e=q0ebg@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Lists: pgsql-hackers

The combination of these two statements in a transaction hits an
Assert in heapam.c at line 4770 on REL_14_STABLE

BEGIN;
SELECT * FROM queue LIMIT 1 FOR UPDATE SKIP LOCKED;
...
UPDATE queue SET status = 'UPDATED' WHERE id = :id;
COMMIT;

pgbench reliably finds this, running from inside a PL/pgSQL function.

Alvaro suggests we just ignore the Assert, which on testing appears to
be the right approach. Patch attached, which solves it for me.

There is no formal test that does lock then update, so I have
attempted to create one, but this has not successfully reproduced it
(attached anyway), but this code is different from the pgbench test.

--
Simon Riggs http://www.EnterpriseDB.com/

Attachment Content-Type Size
skip_locked_assert.v1.patch application/octet-stream 629 bytes
skip_locked_then_update_test.v1.patch application/octet-stream 54.5 KB

From: "Bossart, Nathan" <bossartn(at)amazon(dot)com>
To: Simon Riggs <simon(dot)riggs(at)enterprisedb(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Subject: Re: SKIP LOCKED assert triggered
Date: 2021-12-01 19:33:05
Message-ID: [email protected]
Views: Whole Thread | Raw Message | Download mbox | Resend email
Lists: pgsql-hackers

On 11/12/21, 8:56 AM, "Simon Riggs" <simon(dot)riggs(at)enterprisedb(dot)com> wrote:
> The combination of these two statements in a transaction hits an
> Assert in heapam.c at line 4770 on REL_14_STABLE

I've been unable to reproduce this. Do you have any tips for how to
do so? Does there need to be some sort of concurrent workload?

Nathan


From: Simon Riggs <simon(dot)riggs(at)enterprisedb(dot)com>
To: "Bossart, Nathan" <bossartn(at)amazon(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Subject: Re: SKIP LOCKED assert triggered
Date: 2021-12-01 19:38:22
Message-ID: CANbhV-Ek8nbeyOxGVUgpnj6TFwj8aZDJsQf-8z-FP1Sx3vK6FA@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, 1 Dec 2021 at 14:33, Bossart, Nathan <bossartn(at)amazon(dot)com> wrote:
>
> On 11/12/21, 8:56 AM, "Simon Riggs" <simon(dot)riggs(at)enterprisedb(dot)com> wrote:
> > The combination of these two statements in a transaction hits an
> > Assert in heapam.c at line 4770 on REL_14_STABLE
>
> I've been unable to reproduce this. Do you have any tips for how to
> do so? Does there need to be some sort of concurrent workload?

That path is only ever taken when there are multiple sessions, and as
I said, pgbench finds this reliably. I guess I didn't say "use -c 2"

--
Simon Riggs http://www.EnterpriseDB.com/


From: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
To: Simon Riggs <simon(dot)riggs(at)enterprisedb(dot)com>
Cc: "Bossart, Nathan" <bossartn(at)amazon(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: SKIP LOCKED assert triggered
Date: 2022-01-03 22:27:42
Message-ID: [email protected]
Views: Whole Thread | Raw Message | Download mbox | Resend email
Lists: pgsql-hackers

On 2021-Dec-01, Simon Riggs wrote:

> On Wed, 1 Dec 2021 at 14:33, Bossart, Nathan <bossartn(at)amazon(dot)com> wrote:
> >
> > On 11/12/21, 8:56 AM, "Simon Riggs" <simon(dot)riggs(at)enterprisedb(dot)com> wrote:
> > > The combination of these two statements in a transaction hits an
> > > Assert in heapam.c at line 4770 on REL_14_STABLE
> >
> > I've been unable to reproduce this. Do you have any tips for how to
> > do so? Does there need to be some sort of concurrent workload?
>
> That path is only ever taken when there are multiple sessions, and as
> I said, pgbench finds this reliably. I guess I didn't say "use -c 2"

Simon had sent me the pgbench scripts earlier, so I attach them here.
I don't actually get a crash with -c2 or -c3, but I do get almost
immediate crashes with -c4 and above. If I run it under "rr", it
doesn't occur either. I suspect the rr framework kills concurrency in
some way that hides the problem. I didn't find a way to reproduce it
with isolationtester. (If somebody wants to play with a debugger, I
find that it's much easier to reproduce by adding a short sleep after
the UpdateXmaxHintBits() call in line 4735; but that sleep occurs in a
session *other* than the one that dies. And under rr I still don't see
a crash with a sleep there; in fact the sleep doesn't seem to occur at
all, which is weird.)

The patch does fix the crasher under pgbench, and I think it makes sense
that you can get WouldBlock and yet have the tuple marked with
XMAX_INVALID: if transaction A is writing the tuple, and transaction B
is acquiring the tuple lock, then transaction C also tries to acquire
the tuple lock but that returns nay (because of B), then transaction A
completes, then transaction B could set the XMAX_INVALID flag in time
for C to have a seizure in its way out. So patching the assertion to
allow the case is correct.

What I don't understand is why hasn't this been reported already: this
bug is pretty old. My only explanation is that nobody runs sufficiently-
concurrent load with SKIP LOCKED in assert-enabled builds.

[1] https://www.postgresql.org/message-id/flat/CADLWmXUvd5Z%2BcFczi6Zj1WcTrXzipgP-wj0pZOWSaRUy%3DF0omQ%40mail.gmail.com

--
Álvaro Herrera Valdivia, Chile — https://www.EnterpriseDB.com/

Attachment Content-Type Size
simon_setup.sql application/sql 498 bytes
simon_bench.sql application/sql 24 bytes

From: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
To: Simon Riggs <simon(dot)riggs(at)enterprisedb(dot)com>
Cc: "Bossart, Nathan" <bossartn(at)amazon(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: SKIP LOCKED assert triggered
Date: 2022-01-04 16:03:38
Message-ID: [email protected]
Views: Whole Thread | Raw Message | Download mbox | Resend email
Lists: pgsql-hackers

On 2022-Jan-03, Alvaro Herrera wrote:

> What I don't understand is why hasn't this been reported already: this
> bug is pretty old. My only explanation is that nobody runs sufficiently-
> concurrent load with SKIP LOCKED in assert-enabled builds.

Pushed, thanks Simon for reporting this problem.

> [1] https://www.postgresql.org/message-id/flat/CADLWmXUvd5Z%2BcFczi6Zj1WcTrXzipgP-wj0pZOWSaRUy%3DF0omQ%40mail.gmail.com

Heh, I deleted a paragraph from my previous email and forgot to remove
the footnote that it referenced.

--
Álvaro Herrera Valdivia, Chile — https://www.EnterpriseDB.com/
"I love the Postgres community. It's all about doing things _properly_. :-)"
(David Garamond)


From: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
To: Simon Riggs <simon(dot)riggs(at)enterprisedb(dot)com>
Cc: "Bossart, Nathan" <bossartn(at)amazon(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: SKIP LOCKED assert triggered
Date: 2022-01-04 16:13:23
Message-ID: [email protected]
Views: Whole Thread | Raw Message | Download mbox | Resend email
Lists: pgsql-hackers

On 2022-Jan-04, Alvaro Herrera wrote:

> On 2022-Jan-03, Alvaro Herrera wrote:
>
> > What I don't understand is why hasn't this been reported already: this
> > bug is pretty old. My only explanation is that nobody runs sufficiently-
> > concurrent load with SKIP LOCKED in assert-enabled builds.
>
> Pushed, thanks Simon for reporting this problem.

Wow, what an embarrassing problem this fix has.

--
Álvaro Herrera 39°49'30"S 73°17'W — https://www.EnterpriseDB.com/
"Pensar que el espectro que vemos es ilusorio no lo despoja de espanto,
sólo le suma el nuevo terror de la locura" (Perelandra, C.S. Lewis)


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Cc: Simon Riggs <simon(dot)riggs(at)enterprisedb(dot)com>, "Bossart, Nathan" <bossartn(at)amazon(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: SKIP LOCKED assert triggered
Date: 2022-01-04 16:15:30
Message-ID: [email protected]
Views: Whole Thread | Raw Message | Download mbox | Resend email
Lists: pgsql-hackers

Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> writes:
> Pushed, thanks Simon for reporting this problem.

Umm ...

Assert(TM_WouldBlock || !(tuple->t_data->t_infomask & HEAP_XMAX_INVALID));

AFAICS, this assertion condition is constant-true,
because TM_WouldBlock is a nonzero constant. Perhaps you meant

Assert(result == TM_WouldBlock || !(tuple->t_data->t_infomask & HEAP_XMAX_INVALID));

?

I'd be inclined to format it more like the adjacent Assert, too.

regards, tom lane


From: Justin Pryzby <pryzby(at)telsasoft(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Simon Riggs <simon(dot)riggs(at)enterprisedb(dot)com>, "Bossart, Nathan" <bossartn(at)amazon(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: SKIP LOCKED assert triggered
Date: 2022-01-04 16:55:59
Message-ID: [email protected]
Views: Whole Thread | Raw Message | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Jan 04, 2022 at 11:15:30AM -0500, Tom Lane wrote:
> Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> writes:
> > Pushed, thanks Simon for reporting this problem.
>
> Umm ...
>
> Assert(TM_WouldBlock || !(tuple->t_data->t_infomask & HEAP_XMAX_INVALID));
>
> AFAICS, this assertion condition is constant-true,

The new cfbot was failing like this

https://cirrus-ci.com/github/postgresql-cfbot/postgresql/commitfest/36/3423
https://cirrus-ci.com/build/5839382107127808
[22:52:27.978] heapam.c:4754:24: error: converting the enum constant to a boolean [-Werror,-Wint-in-bool-context]


From: Simon Riggs <simon(dot)riggs(at)enterprisedb(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, "Bossart, Nathan" <bossartn(at)amazon(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: SKIP LOCKED assert triggered
Date: 2022-01-04 20:09:37
Message-ID: CANbhV-GzirR680VMMgq+imuAH11UgTY_Q01Byq5j0O2kasc-mg@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, 4 Jan 2022 at 16:15, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
> Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> writes:
> > Pushed, thanks Simon for reporting this problem.

And causing another; my bad, apologies.

> Umm ...
>
> Assert(TM_WouldBlock || !(tuple->t_data->t_infomask & HEAP_XMAX_INVALID));
>
> AFAICS, this assertion condition is constant-true,
> because TM_WouldBlock is a nonzero constant. Perhaps you meant
>
> Assert(result == TM_WouldBlock || !(tuple->t_data->t_infomask & HEAP_XMAX_INVALID));

Yes, I think that's what I meant.

--
Simon Riggs http://www.EnterpriseDB.com/