From: | Nikita Glukhov <n(dot)gluhov(at)postgrespro(dot)ru> |
---|---|
To: | Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org> |
Cc: | Oleg Bartunov <obartunov(at)postgrespro(dot)ru>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Filip Rembiałkowski <filip(dot)rembialkowski(at)gmail(dot)com>, Teodor Sigaev <teodor(at)sigaev(dot)ru> |
Subject: | Re: fix for BUG #3720: wrong results at using ltree |
Date: | 2020-03-27 22:43:35 |
Message-ID: | [email protected] |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On 24.01.2020 21:29, Tomas Vondra wrote:
> Hi Nikita,
>
> This patch seems inactive / stuck in "waiting on author" since November.
> It's marked as bugfix, so it'd be good to get it committed instead of
> just punting it to the next CF.
>
> I did a quick review, and I came mostly with the same two complaints as
> Alexander ...
>
> On Wed, Jul 17, 2019 at 09:33:46PM +0300, Alexander Korotkov wrote:
>> Hi Nikita,
>>
>> On Tue, Jul 16, 2019 at 6:52 PM Nikita Glukhov
>> <n(dot)gluhov(at)postgrespro(dot)ru> wrote:
>>> I looked at "ltree syntax improvement" patch and found two more very
>>> old bugs in ltree/lquery (fixes are attached):
>>
>> Thank you for the fixes. I've couple notes on them.
>>
>> 0001-Fix-max-size-checking-for-ltree-and-lquery.patch
>>
>> +#define LTREE_MAX_LEVELS Min(PG_UINT16_MAX, MaxAllocSize /
>> sizeof(nodeitem))
>> +#define LQUERY_MAX_LEVELS Min(PG_UINT16_MAX, MaxAllocSize / ITEMSIZE)
>>
>> Looks over caution. PG_UINT16_MAX is not even close to MaxAllocSize /
>> sizeof(nodeitem) or MaxAllocSize / ITEMSIZE.
>>
>
> Yeah, I'm also puzzled by the usage of PG_UINT16_MAX here. It's so much
> lower than the other values we could jut use the constant directly, but
> let's say the structs could grow from the ~16B to chnge this.
Ok, LTREE_MAX_LEVELS and LQUERY_MAX_LEVELS are defined simply as PG_UINT16_MAX now.
>
> The main question is why we need PG_UINT16_MAX at all? It kinda implies
> we need to squish the value into a 2B counter or something, but is that
> actually true? I don't see anything like that in ltree_io.c.
ltree.numlevel and lquery.numlevel are uint16 fields.
I also found two places in ltree_concat() where numlevel can overflow.
The first is ltree_concat() (operator ||(ltree, ltree)):
=# SELECT nlevel(('a' || repeat('.a', 65533))::ltree || 'a');
nlevel
--------
65535
(1 row)
=# SELECT nlevel(('a' || repeat('.a', 65534))::ltree || 'a');
nlevel
--------
0
(1 row)
The second is parsing of low and high level limits in lquery_in():
=# SELECT '*{65535}'::lquery;
lquery
----------
*{65535}
(1 row)
=# SELECT '*{65536}'::lquery;
lquery
--------
*{0}
(1 row)
=# SELECT '*{65537}'::lquery;
lquery
--------
*{1}
(1 row)
The both problems are fixed in the new version of the patch.
> So it seems more like an arbitrary value considered "sane" - which is
> fine, but then a comment saying so would be nice, and we could pick a
> value that is "nicer" for humans. Or just use value computed from the
> MaxAllocSize limit, without the Min().
>
>> 0002-Fix-successive-lquery-ops.patch
>>
>> diff --git a/contrib/ltree/lquery_op.c b/contrib/ltree/lquery_op.c
>> index 62172d5..d4f4941 100644
>> --- a/contrib/ltree/lquery_op.c
>> +++ b/contrib/ltree/lquery_op.c
>> @@ -255,8 +255,8 @@ checkCond(lquery_level *curq, int query_numlevel,
>> ltree_level *curt, int tree_nu
>> }
>> else
>> {
>> - low_pos = cur_tpos + curq->low;
>> - high_pos = cur_tpos + curq->high;
>> + low_pos = Min(low_pos + curq->low, PG_UINT16_MAX);
>> + high_pos = Min(high_pos + curq->high, PG_UINT16_MAX);
>> if (ptr && ptr->q)
>> {
>> ptr->nq++;
>> @@ -282,8 +282,8 @@ checkCond(lquery_level *curq, int query_numlevel,
>> ltree_level *curt, int tree_nu
>> }
>> else
>> {
>> - low_pos = cur_tpos + curq->low;
>> - high_pos = cur_tpos + curq->high;
>> + low_pos = Min(low_pos + curq->low, PG_UINT16_MAX);
>> + high_pos = Min(high_pos + curq->high, PG_UINT16_MAX);
>> }
>>
>> curq = LQL_NEXT(curq);
>>
>> I'm not sure what do these checks do. Code around is uncommented and
>> puzzled. But could we guarantee the same invariant on the stage of
>> ltree/lquery parsing?
>>
>
> Unfortunately, the current code is somewhat undercommented :-(
The main problem is that no one really understands how it works now.
low_pos and high_pos seem to be a range of tree levels, from which is allowed
to match the rest of lquery.
For example, when we are matching '.b' in the 'a.*{2,3}.*{4,5}.b'::lquery,
low_pos = 1 + 2 + 4 = 7 and high_pos = 1 + 3 + 5 = 9.
The main goal of the patch is to fix calculation of low_pos and high_pos:
- low_pos = cur_tpos + curq->low;
- high_pos = cur_tpos + curq->high;
+ low_pos = low_pos + curq->low;
+ high_pos = high_pos + curq->high;
>
> Anyway, I don't quite understand why we need these caps. It kinda seems
> like a band-aid for potential overflow.
>
> Why should it be OK for the values to even get past the maximum, with
> sane input data? And isn't there a better upper limit (e.g. based on
> how much space we actually allocated)?
We can compare low_pos to tree_numlevel and return false earlier, if it is
greater. And it seems that high_pos we can also limit to tree_numlevel.
--
Nikita Glukhov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Attachment | Content-Type | Size |
---|---|---|
v2-0001-Fix-max-size-checking-for-ltree-and-lquery.patch | text/x-patch | 7.6 KB |
v2-0002-Fix-successive-lquery-ops.patch | text/x-patch | 2.6 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Fabien COELHO | 2020-03-27 22:53:03 | Re: psql FETCH_COUNT feature does not work with combined queries |
Previous Message | David Steele | 2020-03-27 22:42:49 | Re: pgbench - rework variable management |