Home > mailing lists

Re: WIP: Covering + unique indexes. - Mailing list pgsql-hackers

From	Anastasia Lubennikova
Subject	Re: WIP: Covering + unique indexes.
Date	February 2, 2016 12:50:41
Msg-id	[email protected] Whole thread Raw
In response to	Re: WIP: Covering + unique indexes. (David Rowley <[email protected]>)
Responses	Re: WIP: Covering + unique indexes. Re: WIP: Covering + unique indexes.
List	pgsql-hackers

Tree view

31.01.2016 11:04, David Rowley:
> On 27 January 2016 at 03:35, Anastasia Lubennikova
> <[email protected]> wrote:
>> including_columns_3.0 is the latest version of patch.
>> And changes regarding the previous version are attached in a separate patch.
>> Just to ease the review and debug.
> Hi,
>
> I've made another pass over the patch. There's still a couple of
> things that I think need to be looked at.
Thank you again.
I just write here to say that I do not disappear and I do remember about 
the issue.
But I'm very very busy this week. I'll send an updated patch next week 
as soon as possible.

> Do we need the "b (included)" here? The key is (a) = (1). Having
> irrelevant details might be confusing.
>
> postgres=# create table a (a int not null, b int not null);
> CREATE TABLE
> postgres=# create unique index on a (a) including(b);
> CREATE INDEX
> postgres=# insert into a values(1,1);
> INSERT 0 1
> postgres=# insert into a values(1,1);
> ERROR:  duplicate key value violates unique constraint "a_a_b_idx"
> DETAIL:  Key (a, b (included))=(1, 1) already exists.
I thought that it could be strange if user inserts two values and then 
sees only one of them in error message.
But now I see that you're right. I'll also look at the same functional 
in other DBs and fix it.

> In index_reform_tuple() I find it a bit scary that you change the
> TupleDesc's number of attributes then set it back again once you're
> finished reforming the shortened tuple.
> Maybe it would be better to modify index_form_tuple() to accept a new
> argument with a number of attributes, then you can just Assert that
> this number is never higher than the number of attributes in the
> TupleDesc.
Good point.
I agree that this function is a bit strange. I have to set 
tupdesc->nattrs to support compatibility with index_form_tuple().
I didn't want to add neither a new field to tupledesc nor a new 
parameter to index_form_tuple(), because they are used widely.
> I'm also not that keen on index_reform_tuple() in general. I wonder if
> there's a way we can just keep the Datum/isnull arrays a bit longer,
> and only form the tuple when needed. I've not looked into this in
> detail, but it does look like reforming the tuple is not going to be
> cheap.
It is used in splits, for example. There is no datum array, we just move 
tuple key from a child page to a parent page or something like that.
And according to INCLUDING algorithm we need to truncate nonkey attributes.
> If we do need to keep this function, I think a better name might be
> index_trim_tuple() and I don't think you need to pass the original
> length. It might make sense to Assert() that the trim length is
> smaller than the tuple size

As regards the performance, I don't think that it's a big problem here.
Do you suggest to do it in a following way memcpy(oldtup, newtup, 
newtuplength)?
I will
> in gistrescan() there is some code:
>
> for (attno = 1; attno <= natts; attno++)
> {
> TupleDescInitEntry(so->giststate->fetchTupdesc, attno, NULL,
>    scan->indexRelation->rd_opcintype[attno - 1],
>    -1, 0);
> }
>
> Going by RelationInitIndexAccessInfo() rd_opcintype[] is allocated to
> be sized by the number of key columns, but this loop goes over the
> number of attribute columns.
> Perhaps this is not a big problem since GIST does not support
> INCLUDING columns, but it does seem wrong still.

GiST doesn't support INCLUDING clause, so natts and nkeyatts are always 
equal. I don't see any problem here.
And I think that it's an extra work to this patch. Maybe I or someone 
else would add this feature to other access methods later.
> Which brings me to the fact that I've spent a bit of time trying to
> look for places where you've forgotten to change natts to nkeyatts. I
> did find this one, but I don't have much confidence that there's not
> lots more places that have been forgotten. Apart from this one, how
> confident are you that you've found all the places? I'm getting
> towards being happy with the code that I see that's been changed, but
> I'm hesitant to mark as "Ready for committer" due to not being all
> that comfortable that all the code that needs to be updated has been
> updated. I'm not quite sure of a good way to find all these places.
I found all mentions of natts and other related variables with grep, and 
replaced (or expand) them with nkeyatts where it was necessary.
As mentioned before, I didn't change other AMs.
I strongly agree that any changes related to btree require thorough 
inspection, so I'll recheck it again. But I'm almost sure that it's okay.

> I wondering if hacking the code so that each btree index which is
> created with > 1 column puts all but the first column into the
> INCLUDING columns, then run the regression tests to see if there are
> any crashes. I'm really not that sure of how else to increase the
> confidence levels on this. Do you have ideas?

Do I understand correctly that you suggest to replace all multicolumn 
indexes with (1key column) + included?
I don't think it's a good idea. INCLUDING clause brings some 
disadvantages. For example, included columns must be filtered after the 
search, while key columns could be used in scan key directly. I already 
mentioned this in test example:

explain analyze select c1, c2 from tbl where c1<10000 and c3<20;
If columns' opclasses are used, new query plan uses them in  Index Cond: 
((c1 < 10000) AND (c3 < 20))
Otherwise, new query can not use included column in Index Cond and uses 
filter instead:
Index Cond: (c1 < 10000)
Filter: (c3 < 20)
Rows Removed by Filter: 9993
It slows down the query significantly.

And besides that, we still want to have multicolumn unique indexes. 
CREATE UNIQUE INDEX on tbl (a, b, c) INCLUDING (d);

-- 
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

pgsql-hackers by date:

From: Artur Zakirov
Date: 02 February 2016, 12:45:20
Subject: Re: Fuzzy substring searching with the pg_trgm extension

From: Alvaro Herrera
Date: 02 February 2016, 13:02:59
Subject: Re: WIP: Covering + unique indexes.

Re: WIP: Covering + unique indexes. - Mailing list pgsql-hackers

Previous

Next