Re: Ideas about a better API for postgres_fdw remote estimates - Mailing list pgsql-hackers
From | Andrey Lepikhov |
---|---|
Subject | Re: Ideas about a better API for postgres_fdw remote estimates |
Date | |
Msg-id | [email protected] Whole thread Raw |
In response to | Re: Ideas about a better API for postgres_fdw remote estimates (Ashutosh Bapat <[email protected]>) |
List | pgsql-hackers |
On 8/31/20 6:19 PM, Ashutosh Bapat wrote: > On Mon, Aug 31, 2020 at 3:36 PM Andrey V. Lepikhov > <[email protected]> wrote: >> I agreed that this task needs to implement an API for >> serialization/deserialization of statistics: >> pg_load_relation_statistics(json_string text); >> pg_get_relation_statistics(relname text); >> We can use a version number for resolving conflicts with different >> statistics implementations. >> "Load" function will validate the values[] anyarray while deserializing >> the input json string to the datatype of the relation column. > > This is a valuable feature. Analysing a foreign table by fetching rows > from the foreign server isn't very efficient. In fact the current FDW > API for doing that forges that in-efficiency by requiring the FDW to > return a sample of rows that will be analysed by the core. That's why > I see that your patch introduces a new API to get foreign rel stat. I > don't think there's any point in maintaining these two APIs just for > ANALYSING table. Instead we should have only one FDW API which will do > whatever it wants and return statistics that can be understood by the > core and let core install it in the catalogs. I believe that's doable. I think the same. > > In case of PostgreSQL it could get the stats available as is from the > foreign server, convert it into a form that the core understands and > returns. The patch introduces a new function postgres_fdw_stat() which > will be available only from version 14 onwards. Can we use > row_to_json(), which is available in all the supported versions, > instead? I started from here. But we need to convert starelid, staop[] stacoll[] oids into portable format. Also we need to explicitly specify the type of each values[] array. And no one guaranteed that anyarray values[] can't contained an array of complex type values, containing oids, that can't be correctly converted to database objects on another server... These considerations required me to add new postgres_fdw_stat() routine that can be moved into the core. > > In case of some other foreign server, an FDW will be responsible to > return statistics in a form that the core will understand. It may > fetch rows from the foreign server or be a bit smart and fetch the > statistics and convert. I don't think I fully understood your idea. Please explain in more detail if possible. > > This also means that FDWs will have to deal with the statistics format > that the core understands and thus will need changes in their code > with every version in the worst case. But AFAIR, PostgreSQL supports > different forms of statistics so the problem may not remain that > severe if FDWs and core agree on some bare minimum format that the > core supports for long. I don't think FDW needs to know anything about the internals of statistics. It only need to execute query like "SELECT extract_statistics(namespace.relation);" and apply the text representation by the function call like this: store_statistics(const char *stat); All validation and update pg_statistic operations will be performed into the core. > > I think the patch has some other problems like it works only for > regular tables on foreign server but a foreign table can be pointing > to any relation like a materialized view, partitioned table or a > foreign table on the foreign server all of which have statistics > associated with them. Ok. It was implemented for discussion, test and as a base of development. > I didn't look closely but it does not consider > that the foreign table may not have all the columns from the relation > on the foreign server or may have different names. Here we get full statistics from remote server and extract statistics only for columns, included into the tuple descriptor of foreign table. > But I think those > problems are kind of secondary. We have to agree on the design first. +1. I only want to point out the following. In previous threads statistics was converted row-by-row. I want to suggest to serialize all statistics tuples for the relation into single json string. On apply phase we can filter unneeded attributes. -- regards, Andrey Lepikhov Postgres Professional
pgsql-hackers by date: