Inserting more than 50,000 rows at once. ERROR: bogus varno: 65001 #112


Closed

sgrinko opened this issue Aug 10, 2017 · 15 comments

sgrinko commented Aug 10, 2017

I am running the following versions:
select get_pathman_lib_version()
10402

select version()
PostgreSQL 9.6.3 on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 5.4.0-6ubuntu1~16.04.4) 5.4.0 20160609, 64-bit

I am using RANGE partitioning with a 1-month interval. I run the following code:

INSERT INTO base.fact_events(id, rgu_service_id, rgu_authority_id, status, review_status, event_date, created_at, updated_at, email)
  SELECT
    ev.id, srv.id, au.id, ev.status, ev.review_status, ev.date, ev.created_at, ev.updated_at, ev.email::varchar(128)
  FROM public.hershel_events AS ev
    INNER JOIN base.rgu_services AS srv ON srv.foreign_code = ev.service_reg_id
    INNER JOIN base.rgu_authorities AS au ON au.foreign_code = ev.authority_reg_id
  WHERE ev.id BETWEEN 100001 AND 1000000;

ERROR: bogus varno: 65001

If I split the range into smaller intervals, everything succeeds.
So it appears the error occurs when the interval is large:
batches of 50 thousand rows work, but batches of 100 thousand do not.

I need to load data into this table in large batches.
With small batches the load time grows considerably.
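Until the cause is fixed, the large INSERT could be split into fixed-size id ranges inside a single loop. This is only an illustrative sketch: the 50,000-row step and the DO block are my assumptions, not part of the original report.

```sql
-- Hypothetical workaround: load in 50,000-row id chunks instead of one big INSERT.
DO $$
DECLARE
  lo   bigint := 100001;
  hi   bigint := 1000000;
  step bigint := 50000;  -- batch size that was reported to work
BEGIN
  WHILE lo <= hi LOOP
    INSERT INTO base.fact_events(id, rgu_service_id, rgu_authority_id, status,
                                 review_status, event_date, created_at, updated_at, email)
    SELECT ev.id, srv.id, au.id, ev.status, ev.review_status, ev.date,
           ev.created_at, ev.updated_at, ev.email::varchar(128)
    FROM public.hershel_events AS ev
      INNER JOIN base.rgu_services AS srv ON srv.foreign_code = ev.service_reg_id
      INNER JOIN base.rgu_authorities AS au ON au.foreign_code = ev.authority_reg_id
    WHERE ev.id BETWEEN lo AND lo + step - 1;
    lo := lo + step;
  END LOOP;
END $$;
```

Each iteration stays under the batch size that was observed to work, at the cost of scanning the joins once per chunk.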

funbringer (Collaborator) commented:

Good afternoon, Sergey!

Could you give us a (simplified) table schema or some script, so that we can reproduce the problem faster?

funbringer added the bug label Aug 10, 2017
ildus self-assigned this Aug 10, 2017

ildus commented Aug 10, 2017

An EXPLAIN of both variants would not hurt either.


sgrinko commented Aug 10, 2017

I will prepare everything tomorrow...


sgrinko commented Aug 11, 2017

I managed to reproduce this error :)
Here is the code to reproduce it in a test environment.

My environment:

select pgpro_version()
PostgresPro 9.6.3.3 on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 5.4.0-6ubuntu1~16.04.4) 5.4.0 20160609, 64-bit

--
-- new database test_src

CREATE DATABASE test_src
  WITH OWNER = postgres
       ENCODING = 'UTF8'
       TABLESPACE = pg_default
       LC_COLLATE = 'en_US.UTF-8'
       LC_CTYPE = 'en_US.UTF-8'
       CONNECTION LIMIT = -1;
       
CREATE TABLE public.hershel_events
(
  id serial NOT NULL,
  packet_id integer,
  vendor_id integer,
  foreign_id character varying(510) DEFAULT NULL::character varying,
  authority_reg_id character varying(510) DEFAULT NULL::character varying,
  authority_name character varying(510) DEFAULT NULL::character varying,
  service_reg_id character varying(510) DEFAULT NULL::character varying,
  service_name character varying(510) DEFAULT NULL::character varying,
  user_id integer,
  user_foreign_id character varying(510) DEFAULT NULL::character varying,
  email character varying(510) DEFAULT NULL::character varying,
  mobile character varying(510) DEFAULT '70000000000'::character varying,
  date date,
  monitoring_id integer DEFAULT 1,
  created_at timestamp with time zone NOT NULL,
  updated_at timestamp with time zone NOT NULL,
  okato character varying(510) DEFAULT NULL::character varying,
  procedure_reg_id character varying(510) DEFAULT NULL::character varying,
  procedure_name character varying(510) DEFAULT NULL::character varying,
  status integer DEFAULT 0,
  authority_id integer,
  service_id integer,
  commentary text,
  store text,
  form_id integer,
  spot_id integer,
  cached_votes_total integer DEFAULT 0,
  cached_votes_score integer DEFAULT 0,
  cached_votes_up integer DEFAULT 0,
  cached_votes_down integer DEFAULT 0,
  cashed_rate double precision DEFAULT '0'::double precision,
  conversed_at date,
  authority_foreign_id character varying(510) DEFAULT NULL::character varying,
  review_status integer DEFAULT '-1'::integer,
  contact_method character varying(510) DEFAULT NULL::character varying,
  CONSTRAINT hershel_events_pkey PRIMARY KEY (id)
);

-- truncate table public.hershel_events;

insert into public.hershel_events (packet_id, vendor_id, user_id, date, created_at, updated_at, authority_id, service_id, commentary,
                                   store, form_id, spot_id, conversed_at)
select 1000, 2000, 3000, '2008-01-01', dt, dt+'1 day'::interval, 4000, 5000, dt::varchar(50),
       (dt+'1 day'::interval)::varchar(50), 6000, 7000, dt::date 
from generate_series('2008-01-01 00:00'::timestamp,'2016-12-31 23:59:00', '10 minutes') as t(dt);

-- select count(*) from public.hershel_events

--
-- new database test_dst

CREATE DATABASE test_dst
  WITH OWNER = postgres
       ENCODING = 'UTF8'
       TABLESPACE = pg_default
       LC_COLLATE = 'en_US.UTF-8'
       LC_CTYPE = 'en_US.UTF-8'
       CONNECTION LIMIT = -1;

CREATE EXTENSION pg_pathman;
CREATE EXTENSION postgres_fdw;

CREATE SERVER foreign_test_src
   FOREIGN DATA WRAPPER postgres_fdw
  OPTIONS (host '127.0.0.1',dbname 'test_src',port '5432');
  
CREATE USER MAPPING 
   FOR postgres
   SERVER foreign_test_src
  OPTIONS (user 'postgres', password '<your password>');
  
CREATE SCHEMA base AUTHORIZATION postgres;

IMPORT FOREIGN SCHEMA public LIMIT TO (hershel_events)
    FROM SERVER foreign_test_src INTO public;
    
-- select * from public.hershel_events limit 10;

CREATE TABLE base.rgu_services
(
  id serial NOT NULL,
  rgu_authority_id integer,
  ssn integer,
  created_at timestamp without time zone NOT NULL DEFAULT now(),
  updated_at timestamp without time zone NOT NULL,
  deleted_at timestamp without time zone,
  name character varying(4600),
  short_name character varying(1700),
  foreign_code character varying(30),
  rgu_status character varying(40),
  CONSTRAINT rgu_services_pkey PRIMARY KEY (id)
);

CREATE TABLE base.rgu_authorities
(
  id serial NOT NULL,
  rgu_authority_group_id smallint,
  rgu_authority_level_id smallint,
  ssn integer,
  created_at timestamp without time zone NOT NULL DEFAULT now(),
  updated_at timestamp without time zone NOT NULL,
  deleted_at timestamp without time zone,
  name character varying(800),
  short_name character varying(500),
  foreign_code character varying(30),
  parent_code character varying(30),
  head_name character varying(600),
  website character varying(255),
  email character varying(200),
  phone_number character varying(500),
  rgu_status character varying(40),
  CONSTRAINT rgu_authorities_pkey PRIMARY KEY (id)
);

insert into base.rgu_services (id, rgu_authority_id, ssn, created_at, updated_at, deleted_at, 
            name, short_name, foreign_code, rgu_status)
select id, 1000, 2000, '2008-01-01', '2008-01-01', NULL, 'Проба пера', 'Проба пера', '1120400010000004433', NULL
from generate_series(1, 10000) as t(id);

INSERT INTO base.rgu_authorities(id, rgu_authority_group_id, rgu_authority_level_id, ssn, created_at, 
            updated_at, deleted_at, name, short_name, foreign_code, parent_code, 
            head_name, website, email, phone_number, rgu_status)
select id, 10, 200, 0, '2008-01-01', '2008-01-01', NULL, 'Проба пера', 'Проба пера', '112040001000000', '112040001000000',
        '7', '8', '9', '10', NULL
from generate_series(1, 10000) as t(id);

-- select * from public.hershel_events limit 10;
-- drop TABLE base.fact_events cascade;
CREATE TABLE base.fact_events
(
  id bigserial NOT NULL,
  rgu_service_id integer,
  rgu_authority_id integer,
  status integer,
  review_status integer,
  created_at timestamp without time zone NOT NULL DEFAULT now(),
  updated_at timestamp without time zone NOT NULL,
  deleted_at timestamp without time zone,
  event_date date,
  token character varying(50),
  mobile character varying(25),
  email character varying(128),
  CONSTRAINT fact_events_pkey PRIMARY KEY (id)
);

SELECT create_range_partitions('base.fact_events', 'created_at', '2008-01-01'::date, '1 month'::interval, 1);

select count(*) from base.fact_events;

Now run the following in the test_dst database:

explain (COSTS)
INSERT INTO base.fact_events(id, rgu_service_id, rgu_authority_id, status, review_status, event_date, created_at, updated_at, email)
  SELECT
    ev.id, srv.id, au.id, ev.status, ev.review_status, ev.date, ev.created_at, ev.updated_at, ev.email::varchar(128)
  FROM public.hershel_events AS ev
    LEFT JOIN base.rgu_services AS srv ON srv.foreign_code = ev.service_reg_id
    LEFT JOIN base.rgu_authorities AS au ON au.foreign_code = ev.authority_reg_id

Look at the plan:

QUERY PLAN
Insert on fact_events  (cost=638.67..42622.35 rows=1960784 width=390)
  ->  Custom Scan (PartitionFilter)  (cost=638.67..42622.35 rows=1960784 width=390)
        ->  Hash Right Join  (cost=638.67..42622.35 rows=1960784 width=390)
              Hash Cond: ((au.foreign_code)::text = (ev.authority_reg_id)::text)
              ->  Seq Scan on rgu_authorities au  (cost=0.00..293.00 rows=10000 width=20)
              ->  Hash  (cost=513.67..513.67 rows=10000 width=1068)
                    ->  Hash Right Join  (cost=112.17..513.67 rows=10000 width=1068)
                          Hash Cond: ((srv.foreign_code)::text = (ev.service_reg_id)::text)
                          ->  Seq Scan on rgu_services srv  (cost=0.00..264.00 rows=10000 width=24)
                          ->  Hash  (cost=111.53..111.53 rows=51 width=1580)
                                ->  Foreign Scan on hershel_events ev  (cost=100.00..111.53 rows=51 width=1580)

Requesting the plan with other options (BUFFERS, TIMING, ANALYZE, VERBOSE) leads to the error...
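Based on that description, the failing EXPLAIN variant would look roughly like this (a sketch; only plain COSTS was reported to work, and the exact option combination shown here is my assumption):

```sql
-- Hypothetical failing variant: any of VERBOSE, ANALYZE, BUFFERS or TIMING
-- reportedly triggers the error, while EXPLAIN (COSTS) succeeds.
EXPLAIN (VERBOSE, COSTS)
INSERT INTO base.fact_events(id, rgu_service_id, rgu_authority_id, status,
                             review_status, event_date, created_at, updated_at, email)
  SELECT
    ev.id, srv.id, au.id, ev.status, ev.review_status, ev.date,
    ev.created_at, ev.updated_at, ev.email::varchar(128)
  FROM public.hershel_events AS ev
    LEFT JOIN base.rgu_services AS srv ON srv.foreign_code = ev.service_reg_id
    LEFT JOIN base.rgu_authorities AS au ON au.foreign_code = ev.authority_reg_id;
-- reportedly fails with: ERROR: bogus varno: 65000
```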

And now, if we execute the query:

INSERT INTO base.fact_events(id, rgu_service_id, rgu_authority_id, status, review_status, event_date, created_at, updated_at, email)
  SELECT
    ev.id, srv.id, au.id, ev.status, ev.review_status, ev.date, ev.created_at, ev.updated_at, ev.email::varchar(128)
  FROM public.hershel_events AS ev
    LEFT JOIN base.rgu_services AS srv ON srv.foreign_code = ev.service_reg_id
    LEFT JOIN base.rgu_authorities AS au ON au.foreign_code = ev.authority_reg_id

we get the error:

ERROR:  bogus varno: 65000
********** Error **********

ERROR: bogus varno: 65000
SQL state: XX000

Importantly:

  • adding LIMIT makes the error go away in this case;
  • the error also disappears once a certain number of partitions already exists.
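The first observation can be checked with a sketch like the following (the LIMIT value is my assumption; the report only says that adding a LIMIT avoids the error):

```sql
-- Hypothetical check: the same INSERT, but capped with LIMIT,
-- was reported to succeed where the unbounded variant failed.
INSERT INTO base.fact_events(id, rgu_service_id, rgu_authority_id, status,
                             review_status, event_date, created_at, updated_at, email)
  SELECT
    ev.id, srv.id, au.id, ev.status, ev.review_status, ev.date,
    ev.created_at, ev.updated_at, ev.email::varchar(128)
  FROM public.hershel_events AS ev
    LEFT JOIN base.rgu_services AS srv ON srv.foreign_code = ev.service_reg_id
    LEFT JOIN base.rgu_authorities AS au ON au.foreign_code = ev.authority_reg_id
  LIMIT 100000;
```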

funbringer (Collaborator) commented:

The problem does not reproduce on the following configurations:

  • PostgreSQL 10 beta2 + pg_pathman 1.4.2
  • PostgreSQL 9.6.3 + pg_pathman 1.4.2
  • PostgresPro 9.6.3.3 + built-in pg_pathman (1.4.2)

At the very start there is only one partition:

issue_112=# table pathman_partition_list ;
      parent      |     partition      | parttype |    expr    |      range_min      |      range_max
------------------+--------------------+----------+------------+---------------------+---------------------
 base.fact_events | base.fact_events_1 |        2 | created_at | 2008-01-01 00:00:00 | 2008-02-01 00:00:00
(1 row)

I run exactly the same query, and the plan is no different:

explain (COSTS)
INSERT INTO base.fact_events(id, rgu_service_id, rgu_authority_id, status, review_status, event_date, created_at, updated_at, email)
  SELECT
    ev.id, srv.id, au.id, ev.status, ev.review_status, ev.date, ev.created_at, ev.updated_at, ev.email::varchar(128)
  FROM public.hershel_events AS ev
    LEFT JOIN base.rgu_services AS srv ON srv.foreign_code = ev.service_reg_id
    LEFT JOIN base.rgu_authorities AS au ON au.foreign_code = ev.authority_reg_id;
                                                   QUERY PLAN
-----------------------------------------------------------------------------------------------------------------
 Insert on fact_events  (cost=1976.67..45416.35 rows=1960784 width=390)
   ->  Custom Scan (PartitionFilter)  (cost=1976.67..45416.35 rows=1960784 width=390)
         ->  Hash Right Join  (cost=1976.67..45416.35 rows=1960784 width=390)
               Hash Cond: ((au.foreign_code)::text = (ev.authority_reg_id)::text)
               ->  Seq Scan on rgu_authorities au  (cost=0.00..293.00 rows=10000 width=20)
               ->  Hash  (cost=513.67..513.67 rows=10000 width=1068)
                     ->  Hash Right Join  (cost=112.17..513.67 rows=10000 width=1068)
                           Hash Cond: ((srv.foreign_code)::text = (ev.service_reg_id)::text)
                           ->  Seq Scan on rgu_services srv  (cost=0.00..264.00 rows=10000 width=24)
                           ->  Hash  (cost=111.53..111.53 rows=51 width=1580)
                                 ->  Foreign Scan on hershel_events ev  (cost=100.00..111.53 rows=51 width=1580)
(11 rows)

But the result is quite different:

INSERT INTO base.fact_events(id, rgu_service_id, rgu_authority_id, status, review_status, event_date, created_at, updated_at, email)
  SELECT
    ev.id, srv.id, au.id, ev.status, ev.review_status, ev.date, ev.created_at, ev.updated_at, ev.email::varchar(128)
  FROM public.hershel_events AS ev
    LEFT JOIN base.rgu_services AS srv ON srv.foreign_code = ev.service_reg_id
    LEFT JOIN base.rgu_authorities AS au ON au.foreign_code = ev.authority_reg_id;
INSERT 0 473472

select count(*) from base.fact_events;
 count
--------
 473472
(1 row)

I think these conditions are not sufficient to reproduce the problem.

funbringer (Collaborator) commented:

Could it be that you are using some other extensions?


sgrinko commented Aug 11, 2017

I will try to make a backup of the cluster.


sgrinko commented Aug 14, 2017

Hi,
I have prepared a backup of the cluster.
I did it very simply: I stopped the server and compressed the entire DATA directory into an archive:
test_cluster.tgz
The backup contains 2 databases with data (500 MB):
test_dst - the destination database that the query tries to insert into
test_src - the database with the source data
The cluster is prepared so that right after restoring it you can run the query and (hopefully) get the error:

INSERT INTO base.fact_events(id, rgu_service_id, rgu_authority_id, status, review_status, event_date, created_at, updated_at, email)
  SELECT
    ev.id, srv.id, au.id, ev.status, ev.review_status, ev.date, ev.created_at, ev.updated_at, ev.email::varchar(128)
  FROM public.hershel_events AS ev
    LEFT JOIN base.rgu_services AS srv ON srv.foreign_code = ev.service_reg_id
    LEFT JOIN base.rgu_authorities AS au ON au.foreign_code = ev.authority_reg_id

To reset the test_dst database to its initial state, it is enough to run:

drop TABLE base.fact_events cascade;

CREATE TABLE base.fact_events
(
  id bigserial NOT NULL,
  rgu_service_id integer,
  rgu_authority_id integer,
  status integer,
  review_status integer,
  created_at timestamp without time zone NOT NULL DEFAULT now(),
  updated_at timestamp without time zone NOT NULL,
  deleted_at timestamp without time zone,
  event_date date,
  token character varying(50),
  mobile character varying(25),
  email character varying(128),
  CONSTRAINT fact_events_pkey PRIMARY KEY (id)
);

SELECT create_range_partitions('base.fact_events', 'created_at', '2008-01-01'::date, '1 month'::interval, 1);

Let me know as soon as the backup on cloud.mail.ru is no longer needed... I will delete it.

funbringer (Collaborator) commented:

I have found the problem: we construct the PartitionFilter node not quite correctly. We will fix this in the next update.

The problem did not reproduce for me because I had not installed auto_explain (which you evidently have). If I had known about all the extensions in use from the start, it would have saved quite a lot of time :(

funbringer (Collaborator) commented:

Actually, the problem lies precisely in using VERBOSE etc. when running EXPLAIN, but I had not paid enough attention to that. In your configuration auto_explain runs EXPLAIN with VERBOSE, hence the problem with PartitionFilter.
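An auto_explain setup that would exercise this path looks roughly like the following postgresql.conf fragment (a hedged sketch: the duration threshold is an assumption, but auto_explain.log_verbose is the standard parameter that makes auto_explain produce VERBOSE plans):

```sql
-- postgresql.conf (sketch): load auto_explain and make it log VERBOSE plans,
-- which is what exercised the broken PartitionFilter targetlist.
-- shared_preload_libraries = 'pg_pathman, auto_explain'
-- auto_explain.log_min_duration = '1s'   -- assumed threshold; any logged query triggers it
-- auto_explain.log_verbose = on          -- EXPLAIN VERBOSE is what provoked the error
```

With log_verbose enabled, every query slow enough to be logged is re-explained with VERBOSE, so the broken INSERT fails even though a plain EXPLAIN (COSTS) of it succeeds.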

funbringer added this to the Release 1.4.3 milestone Aug 14, 2017
funbringer added a commit to funbringer/pg_pathman that referenced this issue Aug 14, 2017

sgrinko commented Aug 15, 2017

For the future I will know that the environment description should include the full list of all installed extensions.
Looking forward to the update, thank you!

funbringer self-assigned this Aug 15, 2017

sgrinko commented Aug 15, 2017

May I delete the backup on cloud.mail.ru now?

funbringer (Collaborator) commented:

@sgrinko Yes, of course.

funbringer (Collaborator) commented:

Version 1.4.3 with the fix has been released.
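Verifying the upgrade on an affected cluster could look roughly like this (a sketch; it assumes the 1.4.3 binaries are installed and the server has been restarted with the new shared library, and whether an ALTER EXTENSION step is needed depends on whether the SQL part of the extension changed):

```sql
-- Check the loaded library version; earlier in this thread it was
-- reported as the integer 10402 for 1.4.2, so 1.4.3 would presumably
-- report a correspondingly higher value.
SELECT get_pathman_lib_version();

-- If the SQL part of the extension also changed, update it as well
-- (standard PostgreSQL extension-upgrade mechanism):
ALTER EXTENSION pg_pathman UPDATE;
```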


sgrinko commented Aug 24, 2017

Thank you very much!
We will try it tomorrow.

antamel added a commit that referenced this issue May 4, 2023
  Caused by:
  - 2489d76c4906f4461a364ca8ad7e0751ead8aa0d (PostgreSQL)
    Make Vars be outer-join-aware.
  - 8109191 (pg_pathman)
    fix bogus varno in PartitionFilter's targetlist (issue #112)
  - 92f0734 (pg_pathman)
    [PGPRO-7630] Post-processing for nodes added in plan tree by pathman
  PGPRO-5628: Port pg_proaudit to STD15
  Tags: pg_pathman

3 participants