Re: WIP: Covering + unique indexes. - Mailing list pgsql-hackers
From | Anastasia Lubennikova
---|---
Subject | Re: WIP: Covering + unique indexes.
Date |
Msg-id | [email protected]
In response to | Re: WIP: Covering + unique indexes. (Anastasia Lubennikova <[email protected]>)
Responses | Re: WIP: Covering + unique indexes.
List | pgsql-hackers
02.02.2016 15:50, Anastasia Lubennikova:
> 31.01.2016 11:04, David Rowley:
>> On 27 January 2016 at 03:35, Anastasia Lubennikova
>> <[email protected]> wrote:
>>> including_columns_3.0 is the latest version of the patch.
>>> The changes relative to the previous version are attached in a
>>> separate patch, just to ease review and debugging.
>> Hi,
>>
>> I've made another pass over the patch. There's still a couple of
>> things that I think need to be looked at.
> Thank you again.
> I'm just writing to say that I haven't disappeared and I do remember
> about the issue.
> But I'm very busy this week. I'll send an updated patch next week
> as soon as possible.

As promised, here's the new version of the patch, "including_columns_4.0".
I fixed all the issues except some points mentioned below. Besides that,
I did some refactoring:
- use the macros IndexRelationGetNumberOfAttributes,
  IndexRelationGetNumberOfKeyAttributes, and RelationGetNumberOfAttributes
  where possible. These changes may be a bit unrelated to the patch, but
  they'll make future development much easier.
- rename the related variables to indnatts and indnkeyatts.

>> I'm also not that keen on index_reform_tuple() in general. I wonder if
>> there's a way we can just keep the Datum/isnull arrays a bit longer,
>> and only form the tuple when needed. I've not looked into this in
>> detail, but it does look like reforming the tuple is not going to be
>> cheap.
> It is used in splits, for example. There is no Datum array there; we
> just move the tuple key from a child page to a parent page, or
> something like that. And according to the INCLUDING algorithm we need
> to truncate the non-key attributes.
>> If we do need to keep this function, I think a better name might be
>> index_trim_tuple(), and I don't think you need to pass the original
>> length. It might make sense to Assert() that the trim length is
>> smaller than the tuple size.
>
> As regards performance, I don't think it's a big problem here.
> Do you suggest doing it the following way: memcpy(oldtup, newtup,
> newtuplength)?

I've tested it some more and still haven't found any performance issues.

>> in gistrescan() there is some code:
>>
>>     for (attno = 1; attno <= natts; attno++)
>>     {
>>         TupleDescInitEntry(so->giststate->fetchTupdesc, attno, NULL,
>>                            scan->indexRelation->rd_opcintype[attno - 1],
>>                            -1, 0);
>>     }
>>
>> Going by RelationInitIndexAccessInfo(), rd_opcintype[] is allocated to
>> be sized by the number of key columns, but this loop goes over the
>> number of attribute columns.
>> Perhaps this is not a big problem since GiST does not support
>> INCLUDING columns, but it does seem wrong still.
>
> GiST doesn't support the INCLUDING clause, so natts and nkeyatts are
> always equal. I don't see any problem here.
> And I think that it's extra work for this patch. Maybe I or someone
> else will add this feature to the other access methods later.

Still the same.

>> Which brings me to the fact that I've spent a bit of time trying to
>> look for places where you've forgotten to change natts to nkeyatts. I
>> did find this one, but I don't have much confidence that there aren't
>> lots more places that have been forgotten. Apart from this one, how
>> confident are you that you've found all the places? I'm getting
>> towards being happy with the code that I see has been changed, but
>> I'm hesitant to mark this as "Ready for committer" due to not being
>> all that comfortable that all the code that needs to be updated has
>> been updated. I'm not quite sure of a good way to find all these
>> places.
> I found all mentions of natts and other related variables with grep,
> and replaced (or expanded) them with nkeyatts where necessary.
> As mentioned before, I didn't change the other AMs.
> I strongly agree that any changes related to btree require thorough
> inspection, so I'll recheck it again. But I'm almost sure that it's
> okay.

I rechecked everything again and fixed a couple of omissions. Thank you
for being such an exacting reviewer :) I don't know how to prove that
everything is okay, and I have no idea what else I could do.

>> I'm wondering whether hacking the code so that each btree index
>> created with > 1 column puts all but the first column into the
>> INCLUDING columns, and then running the regression tests, would show
>> up any crashes. I'm really not that sure how else to increase the
>> confidence levels on this. Do you have ideas?
>
> Do I understand correctly that you suggest replacing all multicolumn
> indexes with (1 key column) + included columns?
> I don't think it's a good idea. The INCLUDING clause brings some
> disadvantages. For example, included columns must be filtered after
> the search, while key columns can be used in the scan key directly. I
> already mentioned this in the test example:
>
>     explain analyze select c1, c2 from tbl where c1<10000 and c3<20;
>
> If the columns' operator classes are used, the new query plan uses
> them in the index condition:
>     Index Cond: ((c1 < 10000) AND (c3 < 20))
> Otherwise, the query cannot use the included column in the index
> condition and uses a filter instead:
>     Index Cond: (c1 < 10000)
>     Filter: (c3 < 20)
>     Rows Removed by Filter: 9993
> This slows down the query significantly.
>
> And besides that, we still want to have multicolumn unique indexes:
>     CREATE UNIQUE INDEX on tbl (a, b, c) INCLUDING (d);

I started a new thread about the related refactoring, because I think it
should be a separate patch:
http://www.postgresql.org/message-id/[email protected]

--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
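A minimal sketch of the trade-off described above, assuming a hypothetical
table tbl(c1, c2, c3, c4) with made-up contents and index names (none of
this is taken from the actual test case), and using the INCLUDING syntax
proposed by this patch, so it only runs with the patch applied:

-- Hypothetical table roughly matching the quoted example.
CREATE TABLE tbl (c1 int, c2 int, c3 int, c4 int);
INSERT INTO tbl SELECT g, g % 100, g % 30, g FROM generate_series(1, 100000) g;

-- Covering index as proposed by the patch: only c1 is a key column;
-- c2 and c3 are stored in the index solely to enable index-only scans.
CREATE UNIQUE INDEX tbl_c1_incl ON tbl (c1) INCLUDING (c2, c3);

-- c1 < 10000 can go into the Index Cond, but c3 has no operator class in
-- this index, so c3 < 20 can only be applied as a Filter on fetched tuples.
EXPLAIN ANALYZE SELECT c1, c2 FROM tbl WHERE c1 < 10000 AND c3 < 20;

-- A regular multicolumn key index, by contrast, can use both predicates
-- in the Index Cond, which is the point of the reply above.
DROP INDEX tbl_c1_incl;
CREATE INDEX tbl_c1_c3 ON tbl (c1, c3);
EXPLAIN ANALYZE SELECT c1, c2 FROM tbl WHERE c1 < 10000 AND c3 < 20;

In short: moving a column from the key part of the index into the
INCLUDING list keeps it available for index-only scans and removes it from
the uniqueness constraint, but it can no longer be used as a scan key, so
predicates on it degrade from an Index Cond to a Filter.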