Home > mailing lists

Re: Parallel sec scan in plpgsql - Mailing list pgsql-hackers

From	Alex Ignatov
Subject	Re: Parallel sec scan in plpgsql
Date	September 20, 2016 11:15:54
Msg-id	[email protected] Whole thread Raw
In response to	Re: Parallel sec scan in plpgsql (Amit Kapila <[email protected]>)
List	pgsql-hackers

Tree view

On 18.09.2016 06:54, Amit Kapila wrote:
> On Fri, Sep 16, 2016 at 8:48 PM, Alex Ignatov <[email protected]> wrote:
>>
>> On 16.09.2016 16:50, Amit Kapila wrote:
>>>
>>>
>>> Can you try by setting force_parallel_mode = off;?  I think it is
>>> sending the whole function execution to worker due to
>>> force_parallel_mode.
>>>
>>>
>>
>> No changes:
>>
>
> Okay, it just skipped from my mind that we don't support parallel
> queries for SQL statement execution (or statements executed via
> exec_stmt_execsql) from plpgsql.  For detailed explanation of why that
> is not feasible you can refer one of my earlier e-mails [1] on similar
> topic.  I think if we can somehow get the results via Perform
> statement, then it could be possible to use parallelism via plpgsql.
>
> However, you can use it via SQL functions, an example is below:
>
> set min_parallel_relation_size =0;
> set parallel_tuple_cost=0;
> set parallel_setup_cost=0;
>
> Load 'auto_explain';
> set auto_explain.log_min_duration = 0;
> set auto_explain.log_analyze = true;
> set auto_explain.log_nested_statements = true;
>
> create table test_plpgsql(c1 int, c2 char(1000));
> insert into test_plpgsql values(generate_series(1,100000),'aaa');
>
> create or replace function parallel_test_set_sql() returns
> setof bigint as $$
> select count(*) from test_plpgsql;
> $$language sql PARALLEL SAFE STRICT STABLE;
>
> Then execute function as: select * from parallel_test_set_sql();  You
> can see below plan if auto_explain module is loaded.
>
>         Finalize Aggregate  (cost=14806.85..14806.86 rows=1 width=8) (actual tim
> e=1094.966..1094.967 rows=1 loops=1)
>           ->  Gather  (cost=14806.83..14806.84 rows=2 width=8) (actual time=472.
> 216..1094.943 rows=3 loops=1)
>                 Workers Planned: 2
>                 Workers Launched: 2
>                 ->  Partial Aggregate  (cost=14806.83..14806.84 rows=1 width=8)
> (actual time=177.867..177.868 rows=1 loops=3)
>                       ->  Parallel Seq Scan on test_plpgsql  (cost=0.00..14702.6
> 7 rows=41667 width=0) (actual time=0.384..142.565 rows=33333 loops=3)
> CONTEXT:  SQL function "parallel_test_set_sql" statement 1
> LOG:  duration: 2965.040 ms  plan:
>         Query Text: select * from parallel_test_set_sql();
>         Function Scan on parallel_test_set_sql  (cost=0.25..10.25 rows=1000 widt
> h=8) (actual time=2538.620..2776.955 rows=1 loops=1)
>
>
> In general, I think we should support the cases as required (or
> written) by you from plpgsql or sql functions.  We need more work to
> support such cases. There are probably two ways of supporting such
> cases, we can build some intelligence in plpgsql execution such that
> it can recognise such queries and allow to use parallelism or we need
> to think of enabling parallelism for cases where we don't run the plan
> to completion.  Most of the use cases from plpgsql or sql function
> fall into later category as they don't generally run the plan to
> completion.
>
>
> [1] - https://www.postgresql.org/message-id/CAA4eK1K8kaO_jRk42-o2rmhSRbKV-3mR%2BiNVcONLdbcSXW5TfQ%40mail.gmail.com
>

Thank you for you sugestion! That works.

But what  we can do with this function:
create or replace function parallel_test_sql(t int) returns setof bigint as
$$   select count(*) from (select a,b,c,d,e,sum(bytes) from test where a>= $1 group by a,b,c,d,e)t;
$$ language sql PARALLEL SAFE STRICT STABLE;

explain (analyze,buffers) select  * from  parallel_test_sql(2);

"Function Scan on parallel_test_sql  (cost=0.25..10.25 rows=1000 width=8) (actual time=2410.789..2410.790 rows=1
loops=1)"
"  Buffers: shared hit=63696"
"Planning time: 0.082 ms"
"Execution time: 2410.841 ms"

2016-09-20 14:09:04 MSK [13037]: [75-1] user=ipdr,db=ipdr,app=pgAdmin III - Query Tool,client=127.0.0.1 LOG:  duration:
2410.135ms  plan:        Query Text:           select count(*) from (select a,b,c,d,e,sum(bytes) from test where a>= $1
groupby a,b,c,d,e)t;
 
        Aggregate  (cost=230701.42..230701.43 rows=1 width=8)          ->  HashAggregate  (cost=230363.59..230513.74
rows=15015width=28)                Group Key: test.a, test.b, test.c, test.d, test.e                ->  Seq Scan on
test (cost=0.00..188696.44 rows=3333372 width=20)                      Filter: (a >= $1)
 


No parallelism again. Looks like that Filter: (a >= $1)  breaks parallelism



Alex Ignatov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

pgsql-hackers by date:

From: valeriof
Date: 20 September 2016, 11:02:20
Subject: Re: Error running custom plugin: “output plugins have to declare the _PG_output_plugin_init symbol”

From: Matteo Beccati
Date: 20 September 2016, 12:19:46
Subject: Re: kqueue

Re: Parallel sec scan in plpgsql - Mailing list pgsql-hackers

Previous

Next