Quick Links

auto_explain sample rate

Lists:	pgsql-hackers

From:	Craig Ringer <craig(at)2ndquadrant(dot)com>
To:	PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	auto_explain sample rate
Date:	2015-05-29 01:39:00
Message-ID:	CAMsr+YHMZC6COFE4q+246Q_X_6HEdswh=Pu9RROhuNW-=LmgPA@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

Hi all

It's sometimes desirable to collect auto_explain data with ANALYZE in order
to track down hard-to-reproduce issues, but the performance impacts can be
pretty hefty on the DB.

I'm inclined to add a sample rate to auto_explain so that it can trigger
only on x percent of queries, and also add a sample test hook that can be
used to target statements of interest more narrowly (using a C hook
function).

Sound like a reasonable approach?

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Craig Ringer <craig(at)2ndquadrant(dot)com>
Cc:	PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: auto_explain sample rate
Date:	2015-05-29 03:35:17
Message-ID:	[email protected]
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

Craig Ringer <craig(at)2ndquadrant(dot)com> writes:
> It's sometimes desirable to collect auto_explain data with ANALYZE in order
> to track down hard-to-reproduce issues, but the performance impacts can be
> pretty hefty on the DB.

> I'm inclined to add a sample rate to auto_explain so that it can trigger
> only on x percent of queries,

That sounds reasonable ...

> and also add a sample test hook that can be
> used to target statements of interest more narrowly (using a C hook
> function).

You'd have to be pretty desperate, *and* knowledgeable, to write a
C function for that. Can't we invent something a bit more user-friendly
for the purpose? No idea what it should look like though.

regards, tom lane

From:	Craig Ringer <craig(at)2ndquadrant(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: auto_explain sample rate
Date:	2015-06-02 07:07:06
Message-ID:	CAMsr+YHvWTECkTxEpvfpj0cMOMtYngOvEHuupkZDbpbf4RsJbw@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 29 May 2015 at 11:35, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> Craig Ringer <craig(at)2ndquadrant(dot)com> writes:
> > It's sometimes desirable to collect auto_explain data with ANALYZE in
> order
> > to track down hard-to-reproduce issues, but the performance impacts can
> be
> > pretty hefty on the DB.
>
> > I'm inclined to add a sample rate to auto_explain so that it can trigger
> > only on x percent of queries,
>
> That sounds reasonable ...
>

Cool, I'll cook that up then. Thanks for the sanity check.

> > and also add a sample test hook that can be
> > used to target statements of interest more narrowly (using a C hook
> > function).
>
> You'd have to be pretty desperate, *and* knowledgeable, to write a
> C function for that. Can't we invent something a bit more user-friendly
> for the purpose? No idea what it should look like though.
>

I've been that desperate.

For the majority of users I'm sure it's sufficient to just have a sample
rate.

Anything that's trying to match individual queries could be interested in
all sorts of different things. Queries that touch a particular table being
one of the more obvious things, or queries that mention a particular
literal. Rather than try to design something complicated in advance that
anticipates all needs, I'm thinking it makes sense to just throw a hook in
there. If some patterns start to emerge in terms of useful real world
filtering criteria then that'd better inform any more user accessible
design down the track.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To:	Craig Ringer <craig(at)2ndquadrant(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: auto_explain sample rate
Date:	2015-06-02 07:11:12
Message-ID:	CAFj8pRBFgrBBQVV+RUAjS-5TRCdxq_2v4KC0jc3PJgnMddhfHA@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

2015-06-02 9:07 GMT+02:00 Craig Ringer <craig(at)2ndquadrant(dot)com>:

> On 29 May 2015 at 11:35, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
>> Craig Ringer <craig(at)2ndquadrant(dot)com> writes:
>> > It's sometimes desirable to collect auto_explain data with ANALYZE in
>> order
>> > to track down hard-to-reproduce issues, but the performance impacts can
>> be
>> > pretty hefty on the DB.
>>
>> > I'm inclined to add a sample rate to auto_explain so that it can trigger
>> > only on x percent of queries,
>>
>> That sounds reasonable ...
>>
>
> Cool, I'll cook that up then. Thanks for the sanity check.
>
>
>> > and also add a sample test hook that can be
>> > used to target statements of interest more narrowly (using a C hook
>> > function).
>>
>> You'd have to be pretty desperate, *and* knowledgeable, to write a
>> C function for that. Can't we invent something a bit more user-friendly
>> for the purpose? No idea what it should look like though.
>>
>
> I've been that desperate.
>
> For the majority of users I'm sure it's sufficient to just have a sample
> rate.
>
> Anything that's trying to match individual queries could be interested in
> all sorts of different things. Queries that touch a particular table being
> one of the more obvious things, or queries that mention a particular
> literal. Rather than try to design something complicated in advance that
> anticipates all needs, I'm thinking it makes sense to just throw a hook in
> there. If some patterns start to emerge in terms of useful real world
> filtering criteria then that'd better inform any more user accessible
> design down the track.
>

same method can be interesting for interactive EXPLAIN ANALYZE too. TIMING
has about 20%-30% overhead and usually we don't need a perfectly exact
numbers

Regards

Pavel

>
>
> --
> Craig Ringer http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Training & Services
>

From:	Craig Ringer <craig(at)2ndquadrant(dot)com>
To:	Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: auto_explain sample rate
Date:	2015-06-03 07:17:10
Message-ID:	CAMsr+YE5vWM+Eft8KpVk5594hbbEVnux7iBjHSv9wsOeCACj6w@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 2 June 2015 at 15:11, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> wrote:

>
>
> 2015-06-02 9:07 GMT+02:00 Craig Ringer <craig(at)2ndquadrant(dot)com>:
>
>>
>> For the majority of users I'm sure it's sufficient to just have a sample
>> rate.
>>
>> Anything that's trying to match individual queries could be interested in
>> all sorts of different things. Queries that touch a particular table being
>> one of the more obvious things, or queries that mention a particular
>> literal. Rather than try to design something complicated in advance that
>> anticipates all needs, I'm thinking it makes sense to just throw a hook in
>> there. If some patterns start to emerge in terms of useful real world
>> filtering criteria then that'd better inform any more user accessible
>> design down the track.
>>
>
> same method can be interesting for interactive EXPLAIN ANALYZE too. TIMING
> has about 20%-30% overhead and usually we don't need a perfectly exact
> numbers
>

I don't understand what you are suggesting here.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To:	Craig Ringer <craig(at)2ndquadrant(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: auto_explain sample rate
Date:	2015-06-03 07:22:07
Message-ID:	CAFj8pRABvcJ3K2tEav017=uqS=QfhpRsE6PqnWcUgX-hv42ShQ@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

2015-06-03 9:17 GMT+02:00 Craig Ringer <craig(at)2ndquadrant(dot)com>:

>
>
> On 2 June 2015 at 15:11, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> wrote:
>
>>
>>
>> 2015-06-02 9:07 GMT+02:00 Craig Ringer <craig(at)2ndquadrant(dot)com>:
>>
>>>
>>> For the majority of users I'm sure it's sufficient to just have a sample
>>> rate.
>>>
>>> Anything that's trying to match individual queries could be interested
>>> in all sorts of different things. Queries that touch a particular table
>>> being one of the more obvious things, or queries that mention a particular
>>> literal. Rather than try to design something complicated in advance that
>>> anticipates all needs, I'm thinking it makes sense to just throw a hook in
>>> there. If some patterns start to emerge in terms of useful real world
>>> filtering criteria then that'd better inform any more user accessible
>>> design down the track.
>>>
>>
>> same method can be interesting for interactive EXPLAIN ANALYZE too.
>> TIMING has about 20%-30% overhead and usually we don't need a perfectly
>> exact numbers
>>
>
> I don't understand what you are suggesting here.
>

using some sampling for EXPLAIN ANALYZE statement

Pavel

>
>
>
> --
> Craig Ringer http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Training & Services
>

From:	Craig Ringer <craig(at)2ndquadrant(dot)com>
To:	Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: auto_explain sample rate
Date:	2015-06-03 07:46:37
Message-ID:	CAMsr+YHBmBvjNDrp30mK2xAVkmJY4nHxL_B3FK4b_-e2VXzraw@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 3 June 2015 at 15:22, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> wrote:

>
>
> 2015-06-03 9:17 GMT+02:00 Craig Ringer <craig(at)2ndquadrant(dot)com>:
>
>>
>>
>> On 2 June 2015 at 15:11, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> wrote:
>>
>>>
>>>
>>> 2015-06-02 9:07 GMT+02:00 Craig Ringer <craig(at)2ndquadrant(dot)com>:
>>>
>>>>
>>>> For the majority of users I'm sure it's sufficient to just have a
>>>> sample rate.
>>>>
>>>> Anything that's trying to match individual queries could be interested
>>>> in all sorts of different things. Queries that touch a particular table
>>>> being one of the more obvious things, or queries that mention a particular
>>>> literal. Rather than try to design something complicated in advance that
>>>> anticipates all needs, I'm thinking it makes sense to just throw a hook in
>>>> there. If some patterns start to emerge in terms of useful real world
>>>> filtering criteria then that'd better inform any more user accessible
>>>> design down the track.
>>>>
>>>
>>> same method can be interesting for interactive EXPLAIN ANALYZE too.
>>> TIMING has about 20%-30% overhead and usually we don't need a perfectly
>>> exact numbers
>>>
>>
>> I don't understand what you are suggesting here.
>>
>
> using some sampling for EXPLAIN ANALYZE statement
>
>
Do you mean that you'd like to be able to set a fraction of queries on
which auto_explain does ANALYZE, so most of the time it just outputs an
ordinary EXPLAIN?

Or maybe we're talking about different things re the original proposal? I
don't see how this would work. If you run EXPLAIN ANALYZE interactively
like you said above. You'd surely want it to report costs and timings, or
whatever it is that you ask for, all the time. Not just some of the time
based on some background setting.

Are you advocating a profiling-based approach for EXPLAIN ANALYZE timing
where we sample which executor node we're under at regular intervals,
instead of timing everything? Or suggesting a way to filter out sub-trees
so you only get timing data on some sub-portion of a query?

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To:	Craig Ringer <craig(at)2ndquadrant(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: auto_explain sample rate
Date:	2015-06-03 07:59:41
Message-ID:	CAFj8pRDenJ6XgkPaKGgBi0FkAZzcyL0evpi+3HzVgcawZ2i5ig@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

2015-06-03 9:46 GMT+02:00 Craig Ringer <craig(at)2ndquadrant(dot)com>:

>
> On 3 June 2015 at 15:22, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> wrote:
>
>>
>>
>> 2015-06-03 9:17 GMT+02:00 Craig Ringer <craig(at)2ndquadrant(dot)com>:
>>
>>>
>>>
>>> On 2 June 2015 at 15:11, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> wrote:
>>>
>>>>
>>>>
>>>> 2015-06-02 9:07 GMT+02:00 Craig Ringer <craig(at)2ndquadrant(dot)com>:
>>>>
>>>>>
>>>>> For the majority of users I'm sure it's sufficient to just have a
>>>>> sample rate.
>>>>>
>>>>> Anything that's trying to match individual queries could be interested
>>>>> in all sorts of different things. Queries that touch a particular table
>>>>> being one of the more obvious things, or queries that mention a particular
>>>>> literal. Rather than try to design something complicated in advance that
>>>>> anticipates all needs, I'm thinking it makes sense to just throw a hook in
>>>>> there. If some patterns start to emerge in terms of useful real world
>>>>> filtering criteria then that'd better inform any more user accessible
>>>>> design down the track.
>>>>>
>>>>
>>>> same method can be interesting for interactive EXPLAIN ANALYZE too.
>>>> TIMING has about 20%-30% overhead and usually we don't need a perfectly
>>>> exact numbers
>>>>
>>>
>>> I don't understand what you are suggesting here.
>>>
>>
>> using some sampling for EXPLAIN ANALYZE statement
>>
>>
> Do you mean that you'd like to be able to set a fraction of queries on
> which auto_explain does ANALYZE, so most of the time it just outputs an
> ordinary EXPLAIN?
>
> Or maybe we're talking about different things re the original proposal? I
> don't see how this would work. If you run EXPLAIN ANALYZE interactively
> like you said above. You'd surely want it to report costs and timings, or
> whatever it is that you ask for, all the time. Not just some of the time
> based on some background setting.
>
> Are you advocating a profiling-based approach for EXPLAIN ANALYZE timing
> where we sample which executor node we're under at regular intervals,
> instead of timing everything? Or suggesting a way to filter out sub-trees
> so you only get timing data on some sub-portion of a query?
>
>
lot of variants - I would to see cost and times for EXPLAIN ANALYZE every
time - but the precision of time can be reduced to 1ms. It is question if
we can significantly reduce the cost (or number of calls) of getting system
time.

Pavel

>
> --
> Craig Ringer http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Training & Services
>

From:	Craig Ringer <craig(at)2ndquadrant(dot)com>
To:	Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: auto_explain sample rate
Date:	2015-06-03 08:20:57
Message-ID:	CAMsr+YHQpsAdrzaPg0xsJv5=WG5gdqHx71p-xe-wiqxb4B7=4A@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

>
>
>> lot of variants - I would to see cost and times for EXPLAIN ANALYZE every
> time - but the precision of time can be reduced to 1ms. It is question if
> we can significantly reduce the cost (or number of calls) of getting system
> time.
>
> Pavel
>
>
>
OK, so you're suggesting a sampling-based EXPLAIN.

That'd be interesting, but is totally unrelated to this work on
auto_explain.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	Craig Ringer <craig(at)2ndquadrant(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: auto_explain sample rate
Date:	2015-06-03 10:54:24
Message-ID:	CAMsr+YE11CUsa_8bV7odghKKPM3L+kWMyr-HZyrhGpk16OsX1g@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 2 June 2015 at 15:07, Craig Ringer <craig(at)2ndquadrant(dot)com> wrote:

OK, here we go.

To make sure it doesn't trigger on all backends at once, and to ensure it
doesn't rely on a shared point of contention in shmem, this sets up a
counter with a random value on each backend start.

Because it needs to either always run both the Start and End hooks, or run
neither, this doesn't count nested statements for sampling purposes. So if
you run my_huge_plpgsql_function() then either all its statements will be
explained or none of them will. This only applies if nested statement
explain is enabled. It's possible to get around this by adding a separate
nested statement counter that's reset at each top level End hook, but it
doesn't seem worthwhile.

The sample rate has no effect on ANALYZE, which remains enabled or disabled
for all queries. I don't see any point adding a separate sample rate
control to ANALYZE only some sub-proportion of EXPLAINed statements.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Tra

Attachment	Content-Type	Size
0001-Allow-sampling-of-only-some-queries-by-auto_explain.patch	text/x-patch	4.9 KB

From:	Andres Freund <andres(at)anarazel(dot)de>
To:	Craig Ringer <craig(at)2ndquadrant(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: auto_explain sample rate
Date:	2015-06-03 12:04:34
Message-ID:	[email protected]
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 2015-06-03 18:54:24 +0800, Craig Ringer wrote:
> OK, here we go.

Hm. Wouldn't random sampling be better than what you do? If your queries
have a pattern to them (e.g. you always issue the same 10 queries in
succession), this will possibly only show a subset of the queries.

I think a formulation in fraction (i.e. a float between 0 and 1) will
also be easier to understand.

Andres

From:	Craig Ringer <craig(at)2ndquadrant(dot)com>
To:	Andres Freund <andres(at)anarazel(dot)de>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: auto_explain sample rate
Date:	2015-06-03 13:00:25
Message-ID:	CAMsr+YFXQ+G_9MHfoRvVo83knPbcaCCS7+P+dft_qJY45cFeEQ@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 3 June 2015 at 20:04, Andres Freund <andres(at)anarazel(dot)de> wrote:

> On 2015-06-03 18:54:24 +0800, Craig Ringer wrote:
> > OK, here we go.
>
> Hm. Wouldn't random sampling be better than what you do? If your queries
> have a pattern to them (e.g. you always issue the same 10 queries in
> succession), this will possibly only show a subset of the queries.
>
> I think a formulation in fraction (i.e. a float between 0 and 1) will
> also be easier to understand.

Could be, yeah. I was thinking about the cost of generating a random each
time, but it's going to vanish in the noise compared to the rest of the
costs in query execution.

---
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	Julien Rouhaud <julien(dot)rouhaud(at)dalibo(dot)com>
To:	Craig Ringer <craig(at)2ndquadrant(dot)com>, Andres Freund <andres(at)anarazel(dot)de>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: auto_explain sample rate
Date:	2015-07-05 16:22:33
Message-ID:	[email protected]
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 03/06/2015 15:00, Craig Ringer wrote:
>
>
> On 3 June 2015 at 20:04, Andres Freund <andres(at)anarazel(dot)de
> <mailto:andres(at)anarazel(dot)de>> wrote:
>
> On 2015-06-03 18:54:24 +0800, Craig Ringer wrote:
> > OK, here we go.
>
> Hm. Wouldn't random sampling be better than what you do? If your queries
> have a pattern to them (e.g. you always issue the same 10 queries in
> succession), this will possibly only show a subset of the queries.
>
> I think a formulation in fraction (i.e. a float between 0 and 1) will
> also be easier to understand.
>
>
> Could be, yeah. I was thinking about the cost of generating a random
> each time, but it's going to vanish in the noise compared to the rest of
> the costs in query execution.
>

Hello, I've just reviewed the patch.

I'm not sure if there's a consensus on the sample rate format. FWIW, I
also think a fraction would be easier to understand. Any news about
generating a random at each call to avoid the query pattern problem ?

The patch applies without error. I wonder if there's any reason for
using pg_lrand48() instead of random(), as there's a port for random()
if the system lacks it.

After some quick checks, I found that auto_explain_sample_counter is
always initialized with the same value. After some digging, it seems
that pg_lrand48() always returns the same values in the same order, at
least on my computer. Have I missed something?

Otherwise, after replacing the pg_lrand48() call with a random(), it
works just fine.

> ---
> Craig Ringer http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Training & Services

--
Julien Rouhaud
http://dalibo.com - http://dalibo.org

From:	Julien Rouhaud <julien(dot)rouhaud(at)dalibo(dot)com>
To:	Craig Ringer <craig(at)2ndquadrant(dot)com>, Andres Freund <andres(at)anarazel(dot)de>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: auto_explain sample rate
Date:	2015-07-07 13:37:07
Message-ID:	[email protected]
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 05/07/2015 18:22, Julien Rouhaud wrote:
> On 03/06/2015 15:00, Craig Ringer wrote:
>>
>>
>> On 3 June 2015 at 20:04, Andres Freund <andres(at)anarazel(dot)de
>> <mailto:andres(at)anarazel(dot)de>> wrote:
>>
>> On 2015-06-03 18:54:24 +0800, Craig Ringer wrote:
>> > OK, here we go.
>>
>> Hm. Wouldn't random sampling be better than what you do? If your queries
>> have a pattern to them (e.g. you always issue the same 10 queries in
>> succession), this will possibly only show a subset of the queries.
>>
>> I think a formulation in fraction (i.e. a float between 0 and 1) will
>> also be easier to understand.
>>
>>
>> Could be, yeah. I was thinking about the cost of generating a random
>> each time, but it's going to vanish in the noise compared to the rest of
>> the costs in query execution.
>>
>
> Hello, I've just reviewed the patch.
>
> I'm not sure if there's a consensus on the sample rate format. FWIW, I
> also think a fraction would be easier to understand. Any news about
> generating a random at each call to avoid the query pattern problem ?
>
> The patch applies without error. I wonder if there's any reason for
> using pg_lrand48() instead of random(), as there's a port for random()
> if the system lacks it.
>
> After some quick checks, I found that auto_explain_sample_counter is
> always initialized with the same value. After some digging, it seems
> that pg_lrand48() always returns the same values in the same order, at
> least on my computer. Have I missed something?
>

Well, I obviously missed that pg_srand48() is only used if the system
lacks random/srandom, sorry for the noise. So yes, random() must be
used instead of pg_lrand48().

I'm attaching a new version of the patch fixing this issue just in case.

> Otherwise, after replacing the pg_lrand48() call with a random(), it
> works just fine.
>
>> ---
>> Craig Ringer http://www.2ndQuadrant.com/
>> PostgreSQL Development, 24x7 Support, Training & Services
>
>

--
Julien Rouhaud
http://dalibo.com - http://dalibo.org

Attachment	Content-Type	Size
auto_explain_sample_rate-v2.patch	text/x-patch	4.3 KB

From:	Craig Ringer <craig(at)2ndquadrant(dot)com>
To:	Julien Rouhaud <julien(dot)rouhaud(at)dalibo(dot)com>
Cc:	Andres Freund <andres(at)anarazel(dot)de>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: auto_explain sample rate
Date:	2015-07-17 05:53:26
Message-ID:	CAMsr+YGH8jhOhi9uWya6qB6KC1ZGSUmyiNAQzZTnepzpyP4W=Q@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 7 July 2015 at 21:37, Julien Rouhaud <julien(dot)rouhaud(at)dalibo(dot)com> wrote:

> Well, I obviously missed that pg_srand48() is only used if the system
> lacks random/srandom, sorry for the noise. So yes, random() must be
> used instead of pg_lrand48().
>
> I'm attaching a new version of the patch fixing this issue just in case.

Thanks for picking this up. I've been trying to find time to come back
to it but been swamped in priority work.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To:	Craig Ringer <craig(at)2ndquadrant(dot)com>
Cc:	Julien Rouhaud <julien(dot)rouhaud(at)dalibo(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: auto_explain sample rate
Date:	2015-08-25 12:45:28
Message-ID:	CAB7nPqStA9fSvJFSv=E_211RvskMyof86bSPf+hbzq6Q4-2F6Q@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Fri, Jul 17, 2015 at 2:53 PM, Craig Ringer <craig(at)2ndquadrant(dot)com> wrote:
> On 7 July 2015 at 21:37, Julien Rouhaud <julien(dot)rouhaud(at)dalibo(dot)com> wrote:
>
>> Well, I obviously missed that pg_srand48() is only used if the system
>> lacks random/srandom, sorry for the noise. So yes, random() must be
>> used instead of pg_lrand48().
>>
>> I'm attaching a new version of the patch fixing this issue just in case.
>
> Thanks for picking this up. I've been trying to find time to come back
> to it but been swamped in priority work.

For now I am marking that as returned with feedback.
--
Michael

From:	Julien Rouhaud <julien(dot)rouhaud(at)dalibo(dot)com>
To:	Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Craig Ringer <craig(at)2ndquadrant(dot)com>
Cc:	Andres Freund <andres(at)anarazel(dot)de>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: auto_explain sample rate
Date:	2016-02-16 21:24:59
Message-ID:	[email protected]
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 25/08/2015 14:45, Michael Paquier wrote:
> On Fri, Jul 17, 2015 at 2:53 PM, Craig Ringer <craig(at)2ndquadrant(dot)com> wrote:
>> On 7 July 2015 at 21:37, Julien Rouhaud <julien(dot)rouhaud(at)dalibo(dot)com> wrote:
>>
>>> Well, I obviously missed that pg_srand48() is only used if the system
>>> lacks random/srandom, sorry for the noise. So yes, random() must be
>>> used instead of pg_lrand48().
>>>
>>> I'm attaching a new version of the patch fixing this issue just in case.
>>
>> Thanks for picking this up. I've been trying to find time to come back
>> to it but been swamped in priority work.
>
> For now I am marking that as returned with feedback.
>

PFA v3 of the patch, rebased on current head. It fixes the last issue
(sample a percentage of queries).

I'm adding it to the next commitfest.

--
Julien Rouhaud
http://dalibo.com - http://dalibo.org

Attachment	Content-Type	Size
auto_explain_sample_rate-v3.patch	text/x-patch	3.0 KB

From:	Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To:	Julien Rouhaud <julien(dot)rouhaud(at)dalibo(dot)com>
Cc:	Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Craig Ringer <craig(at)2ndquadrant(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: auto_explain sample rate
Date:	2016-02-16 21:51:38
Message-ID:	[email protected]
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

Julien Rouhaud wrote:

Hijacking this macro is just too obscure:

> #define auto_explain_enabled() \
> (auto_explain_log_min_duration >= 0 && \
> - (nesting_level == 0 || auto_explain_log_nested_statements))
> + (nesting_level == 0 || auto_explain_log_nested_statements) && \
> + current_query_sampled)

because it then becomes hard to figure out that assigning to _sampled is
what makes the enabled() check pass or not depending on sampling:

> @@ -191,6 +211,14 @@ _PG_fini(void)
> static void
> explain_ExecutorStart(QueryDesc *queryDesc, int eflags)
> {
> + /*
> + * For ratio sampling, randomly choose top-level statement. Either
> + * all nested statements will be explained or none will.
> + */
> + if (auto_explain_log_min_duration >= 0 && nesting_level == 0)
> + current_query_sampled = (random() < auto_explain_sample_ratio *
> + MAX_RANDOM_VALUE);
> +
> if (auto_explain_enabled())
> {

I think it's better to keep the "enabled" macro unmodified, and just add
another conditional to the "if" test there.

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

From:	Julien Rouhaud <julien(dot)rouhaud(at)dalibo(dot)com>
To:	Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc:	Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Craig Ringer <craig(at)2ndquadrant(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: auto_explain sample rate
Date:	2016-02-17 00:17:35
Message-ID:	[email protected]
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 16/02/2016 22:51, Alvaro Herrera wrote:
> Julien Rouhaud wrote:
>
> Hijacking this macro is just too obscure:
>
>> #define auto_explain_enabled() \
>> (auto_explain_log_min_duration >= 0 && \
>> - (nesting_level == 0 || auto_explain_log_nested_statements))
>> + (nesting_level == 0 || auto_explain_log_nested_statements) && \
>> + current_query_sampled)
>
> because it then becomes hard to figure out that assigning to _sampled is
> what makes the enabled() check pass or not depending on sampling:
>
>> @@ -191,6 +211,14 @@ _PG_fini(void)
>> static void
>> explain_ExecutorStart(QueryDesc *queryDesc, int eflags)
>> {
>> + /*
>> + * For ratio sampling, randomly choose top-level statement. Either
>> + * all nested statements will be explained or none will.
>> + */
>> + if (auto_explain_log_min_duration >= 0 && nesting_level == 0)
>> + current_query_sampled = (random() < auto_explain_sample_ratio *
>> + MAX_RANDOM_VALUE);
>> +
>> if (auto_explain_enabled())
>> {
>
> I think it's better to keep the "enabled" macro unmodified, and just add
> another conditional to the "if" test there.
>

Thanks for looking at this!

Agreed, it's too obscure. Attached v4 fixes as you said.

--
Julien Rouhaud
http://dalibo.com - http://dalibo.org

Attachment	Content-Type	Size
auto_explain_sample_rate-v4.patch	text/x-patch	3.6 KB

From:	Petr Jelinek <petr(at)2ndquadrant(dot)com>
To:	Julien Rouhaud <julien(dot)rouhaud(at)dalibo(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc:	Craig Ringer <craig(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: auto_explain sample rate
Date:	2016-03-10 03:37:39
Message-ID:	[email protected]
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 17/02/16 01:17, Julien Rouhaud wrote:
>
> Agreed, it's too obscure. Attached v4 fixes as you said.
>

Seems to be simple enough patch and works. However I would like
documentation to say that the range is 0 to 1 and represents fraction of
the queries sampled, because right now both the GUC description and the
documentation say it's in percent but that's not really true as percent
is 0 to 100.

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	Julien Rouhaud <julien(dot)rouhaud(at)dalibo(dot)com>
To:	Petr Jelinek <petr(at)2ndquadrant(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc:	Craig Ringer <craig(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: auto_explain sample rate
Date:	2016-03-10 19:59:28
Message-ID:	[email protected]
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 10/03/2016 04:37, Petr Jelinek wrote:
> On 17/02/16 01:17, Julien Rouhaud wrote:
>>
>> Agreed, it's too obscure. Attached v4 fixes as you said.
>>
>
> Seems to be simple enough patch and works. However I would like
> documentation to say that the range is 0 to 1 and represents fraction of
> the queries sampled, because right now both the GUC description and the
> documentation say it's in percent but that's not really true as percent
> is 0 to 100.
>

Agreed. v5 attached fixes that.

--
Julien Rouhaud
http://dalibo.com - http://dalibo.org

Attachment	Content-Type	Size
auto_explain_sample_rate-v5.diff	text/x-patch	3.7 KB

From:	Petr Jelinek <petr(at)2ndquadrant(dot)com>
To:	Julien Rouhaud <julien(dot)rouhaud(at)dalibo(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc:	Craig Ringer <craig(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: auto_explain sample rate
Date:	2016-03-10 21:07:55
Message-ID:	[email protected]
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 10/03/16 20:59, Julien Rouhaud wrote:
> On 10/03/2016 04:37, Petr Jelinek wrote:
>> On 17/02/16 01:17, Julien Rouhaud wrote:
>>>
>>> Agreed, it's too obscure. Attached v4 fixes as you said.
>>>
>>
>> Seems to be simple enough patch and works. However I would like
>> documentation to say that the range is 0 to 1 and represents fraction of
>> the queries sampled, because right now both the GUC description and the
>> documentation say it's in percent but that's not really true as percent
>> is 0 to 100.
>>
>
> Agreed. v5 attached fixes that.
>

Great, I will test it once more (just because when I don't bugs suddenly
appear out of nowhere) and mark it ready for committer.

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	Magnus Hagander <magnus(at)hagander(dot)net>
To:	Petr Jelinek <petr(at)2ndquadrant(dot)com>
Cc:	Julien Rouhaud <julien(dot)rouhaud(at)dalibo(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Craig Ringer <craig(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: auto_explain sample rate
Date:	2016-03-11 10:45:25
Message-ID:	CABUevEwh0zqLEwFwZo5M9dLjcfvq2yP4cSc0nXeJLqLs3DeiKA@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Thu, Mar 10, 2016 at 10:07 PM, Petr Jelinek <petr(at)2ndquadrant(dot)com> wrote:

> On 10/03/16 20:59, Julien Rouhaud wrote:
>
>> On 10/03/2016 04:37, Petr Jelinek wrote:
>>
>>> On 17/02/16 01:17, Julien Rouhaud wrote:
>>>
>>>>
>>>> Agreed, it's too obscure. Attached v4 fixes as you said.
>>>>
>>>>
>>> Seems to be simple enough patch and works. However I would like
>>> documentation to say that the range is 0 to 1 and represents fraction of
>>> the queries sampled, because right now both the GUC description and the
>>> documentation say it's in percent but that's not really true as percent
>>> is 0 to 100.
>>>
>>>
>> Agreed. v5 attached fixes that.
>>
>>
> Great, I will test it once more (just because when I don't bugs suddenly
> appear out of nowhere) and mark it ready for committer.
>
>
Coming back to the previous discussions about random() - AFAICT this patch
will introduce the random() call always (in explain_ExecutorStart):

+ if (auto_explain_log_min_duration >= 0 && nesting_level == 0)
+ current_query_sampled = (random() < auto_explain_sample_ratio *
+ MAX_RANDOM_VALUE);

Not sure what the overhead is, but wouldn't it be better to include a check
for current_query_sampled>0 in the if part of that statement? Regardless
of performance, that feels cleaner to me. Or am I missing something?

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/

From:	Julien Rouhaud <julien(dot)rouhaud(at)dalibo(dot)com>
To:	Magnus Hagander <magnus(at)hagander(dot)net>, Petr Jelinek <petr(at)2ndquadrant(dot)com>
Cc:	Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Craig Ringer <craig(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: auto_explain sample rate
Date:	2016-03-11 13:51:00
Message-ID:	[email protected]
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 11/03/2016 11:45, Magnus Hagander wrote:
>
> Coming back to the previous discussions about random() - AFAICT this
> patch will introduce the random() call always (in explain_ExecutorStart):
>
> +if (auto_explain_log_min_duration >= 0 && nesting_level == 0)
> +current_query_sampled = (random() < auto_explain_sample_ratio *
> +MAX_RANDOM_VALUE);
>
>
> Not sure what the overhead is, but wouldn't it be better to include a
> check for current_query_sampled>0 in the if part of that statement?
> Regardless of performance, that feels cleaner to me. Or am I missing
> something?

You mean check for auto_explain_sample_ratio > 0 ?

There would be a corner case if a query is sampled
(current_query_sampled is true) and then auto_explain_sample_ratio is
set to 0, all subsequent queries in this backend would be processed.

We could add a specific test for this case to spare a random() call, but
I fear it'd be overkill. Maybe it's better to document that the good way
to deactivate auto_explain is to set auto_explain.log_min_duration to -1.

--
Julien Rouhaud
http://dalibo.com - http://dalibo.org

From:	Magnus Hagander <magnus(at)hagander(dot)net>
To:	Julien Rouhaud <julien(dot)rouhaud(at)dalibo(dot)com>
Cc:	Petr Jelinek <petr(at)2ndquadrant(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Craig Ringer <craig(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: auto_explain sample rate
Date:	2016-03-11 14:03:01
Message-ID:	CABUevEz4hCtNSXxwugRKVna-J9dAmxwEu2wOR04ncLP731vpNg@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Fri, Mar 11, 2016 at 2:51 PM, Julien Rouhaud <julien(dot)rouhaud(at)dalibo(dot)com>
wrote:

> On 11/03/2016 11:45, Magnus Hagander wrote:
> >
> > Coming back to the previous discussions about random() - AFAICT this
> > patch will introduce the random() call always (in explain_ExecutorStart):
> >
> > +if (auto_explain_log_min_duration >= 0 && nesting_level == 0)
> > +current_query_sampled = (random() < auto_explain_sample_ratio *
> > +MAX_RANDOM_VALUE);
> >
> >
> > Not sure what the overhead is, but wouldn't it be better to include a
> > check for current_query_sampled>0 in the if part of that statement?
> > Regardless of performance, that feels cleaner to me. Or am I missing
> > something?
>
> You mean check for auto_explain_sample_ratio > 0 ?
>

I did, but I think what I should have meant is auto_explain_sample_ratio <
1.

> There would be a corner case if a query is sampled
> (current_query_sampled is true) and then auto_explain_sample_ratio is
> set to 0, all subsequent queries in this backend would be processed.
>

There would have to be an else block as well of course, that set it back.

> We could add a specific test for this case to spare a random() call, but
> I fear it'd be overkill. Maybe it's better to document that the good way
> to deactivate auto_explain is to set auto_explain.log_min_duration to -1.
>

I guess in the end it probably is - a general random() call is pretty cheap
compared to all the things we're doing. I think my mind was stuck in
crypto-random which can be more expensive :)

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/

From:	Petr Jelinek <petr(at)2ndquadrant(dot)com>
To:	Magnus Hagander <magnus(at)hagander(dot)net>
Cc:	Julien Rouhaud <julien(dot)rouhaud(at)dalibo(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Craig Ringer <craig(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: auto_explain sample rate
Date:	2016-03-11 14:06:16
Message-ID:	[email protected]
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 11/03/16 11:45, Magnus Hagander wrote:
> On Thu, Mar 10, 2016 at 10:07 PM, Petr Jelinek <petr(at)2ndquadrant(dot)com
> <mailto:petr(at)2ndquadrant(dot)com>> wrote:
>
> On 10/03/16 20:59, Julien Rouhaud wrote:
>
> On 10/03/2016 04:37, Petr Jelinek wrote:
>
> On 17/02/16 01:17, Julien Rouhaud wrote:
>
>
> Agreed, it's too obscure. Attached v4 fixes as you said.
>
>
> Seems to be simple enough patch and works. However I would like
> documentation to say that the range is 0 to 1 and represents
> fraction of
> the queries sampled, because right now both the GUC
> description and the
> documentation say it's in percent but that's not really true
> as percent
> is 0 to 100.
>
>
> Agreed. v5 attached fixes that.
>
>
> Great, I will test it once more (just because when I don't bugs
> suddenly appear out of nowhere) and mark it ready for committer.
>
>
> Coming back to the previous discussions about random() - AFAICT this
> patch will introduce the random() call always (in explain_ExecutorStart):
>
> +if (auto_explain_log_min_duration >= 0 && nesting_level == 0)
> +current_query_sampled = (random() < auto_explain_sample_ratio *
> +MAX_RANDOM_VALUE);
>

No it doesn't as the documented way to turn off auto_explain is to set
auto_explain_log_min_duration to -1 which is also the default.

In any case the code for that would have to be something like
if (auto_explain_sample_ratio == 0)
current_query_sampled = false
else if <the original if>

Not sure if I consider that cleaner but it would definitely remove the
call to random() in case user has set auto_explain_log_min_duration to
something else than -1 and "turned off" the auto_explain by setting
auto_explain_sample_ratio to 0.

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	Magnus Hagander <magnus(at)hagander(dot)net>
To:	Julien Rouhaud <julien(dot)rouhaud(at)dalibo(dot)com>
Cc:	Petr Jelinek <petr(at)2ndquadrant(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Craig Ringer <craig(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: auto_explain sample rate
Date:	2016-03-11 14:11:11
Message-ID:	CABUevEzLUa+FzYZFiZBGKMT5+5Gth=yQ3YRyZqfF87pvwoXCVw@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Fri, Mar 11, 2016 at 3:03 PM, Magnus Hagander <magnus(at)hagander(dot)net>
wrote:

>
>
> On Fri, Mar 11, 2016 at 2:51 PM, Julien Rouhaud <julien(dot)rouhaud(at)dalibo(dot)com
> > wrote:
>
>> On 11/03/2016 11:45, Magnus Hagander wrote:
>> >
>> > Coming back to the previous discussions about random() - AFAICT this
>> > patch will introduce the random() call always (in
>> explain_ExecutorStart):
>> >
>> > +if (auto_explain_log_min_duration >= 0 && nesting_level == 0)
>> > +current_query_sampled = (random() < auto_explain_sample_ratio *
>> > +MAX_RANDOM_VALUE);
>> >
>> >
>> > Not sure what the overhead is, but wouldn't it be better to include a
>> > check for current_query_sampled>0 in the if part of that statement?
>> > Regardless of performance, that feels cleaner to me. Or am I missing
>> > something?
>>
>> You mean check for auto_explain_sample_ratio > 0 ?
>>
>
> I did, but I think what I should have meant is auto_explain_sample_ratio <
> 1.
>
>
>
>> There would be a corner case if a query is sampled
>> (current_query_sampled is true) and then auto_explain_sample_ratio is
>> set to 0, all subsequent queries in this backend would be processed.
>>
>
> There would have to be an else block as well of course, that set it back.
>
>
>
>> We could add a specific test for this case to spare a random() call, but
>> I fear it'd be overkill. Maybe it's better to document that the good way
>> to deactivate auto_explain is to set auto_explain.log_min_duration to -1.
>>
>
> I guess in the end it probably is - a general random() call is pretty
> cheap compared to all the things we're doing. I think my mind was stuck in
> crypto-random which can be more expensive :)
>
>
Applied with a minor word-fix in the documentation and removal of some
unrelated whitespace changes.

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/

From:	Julien Rouhaud <julien(dot)rouhaud(at)dalibo(dot)com>
To:	Magnus Hagander <magnus(at)hagander(dot)net>
Cc:	Petr Jelinek <petr(at)2ndquadrant(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Craig Ringer <craig(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: auto_explain sample rate
Date:	2016-03-11 14:16:19
Message-ID:	[email protected]
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 11/03/2016 15:11, Magnus Hagander wrote:
>
>
> On Fri, Mar 11, 2016 at 3:03 PM, Magnus Hagander <magnus(at)hagander(dot)net
> <mailto:magnus(at)hagander(dot)net>> wrote:
>
>
>
> On Fri, Mar 11, 2016 at 2:51 PM, Julien Rouhaud
> <julien(dot)rouhaud(at)dalibo(dot)com <mailto:julien(dot)rouhaud(at)dalibo(dot)com>> wrote:
>
> On 11/03/2016 11:45, Magnus Hagander wrote:
> >
> > Coming back to the previous discussions about random() - AFAICT this
> > patch will introduce the random() call always (in explain_ExecutorStart):
> >
> > +if (auto_explain_log_min_duration >= 0 && nesting_level == 0)
> > +current_query_sampled = (random() < auto_explain_sample_ratio *
> > +MAX_RANDOM_VALUE);
> >
> >
> > Not sure what the overhead is, but wouldn't it be better to include a
> > check for current_query_sampled>0 in the if part of that statement?
> > Regardless of performance, that feels cleaner to me. Or am I missing
> > something?
>
> You mean check for auto_explain_sample_ratio > 0 ?
>
>
> I did, but I think what I should have meant is
> auto_explain_sample_ratio < 1.
>
>
>
> There would be a corner case if a query is sampled
> (current_query_sampled is true) and then
> auto_explain_sample_ratio is
> set to 0, all subsequent queries in this backend would be processed.
>
>
> There would have to be an else block as well of course, that set it
> back.
>
>
>
> We could add a specific test for this case to spare a random()
> call, but
> I fear it'd be overkill. Maybe it's better to document that the
> good way
> to deactivate auto_explain is to set
> auto_explain.log_min_duration to -1.
>
>
> I guess in the end it probably is - a general random() call is
> pretty cheap compared to all the things we're doing. I think my mind
> was stuck in crypto-random which can be more expensive :)
>
>
> Applied with a minor word-fix in the documentation and removal of some
> unrelated whitespace changes.
>

Thanks!

--
Julien Rouhaud
http://dalibo.com - http://dalibo.org

From:	Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To:	Magnus Hagander <magnus(at)hagander(dot)net>, Julien Rouhaud <julien(dot)rouhaud(at)dalibo(dot)com>
Cc:	Petr Jelinek <petr(at)2ndquadrant(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Craig Ringer <craig(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: auto_explain sample rate
Date:	2016-03-11 16:33:46
Message-ID:	[email protected]
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

Hi,

On Fri, 2016-03-11 at 15:11 +0100, Magnus Hagander wrote:
>
> >
> Applied with a minor word-fix in the documentation and removal of
> some unrelated whitespace changes.

A bit late, but I think we should rename the GUC variable to
"sampling_rate" (instead of sample_ratio) as that's what pgbench uses
for the same thing. That'd be more consistent.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc:	Magnus Hagander <magnus(at)hagander(dot)net>, Julien Rouhaud <julien(dot)rouhaud(at)dalibo(dot)com>, Petr Jelinek <petr(at)2ndquadrant(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Craig Ringer <craig(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: auto_explain sample rate
Date:	2016-03-11 16:55:19
Message-ID:	CA+TgmoZoRunWtv8zHiKUgmv193h0m40JXKY4i-SALgCk91MSUg@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Fri, Mar 11, 2016 at 11:33 AM, Tomas Vondra
<tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
> A bit late, but I think we should rename the GUC variable to
> "sampling_rate" (instead of sample_ratio) as that's what pgbench uses
> for the same thing. That'd be more consistent.

I like that idea. It seems like slightly better terminology.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From:	Julien Rouhaud <julien(dot)rouhaud(at)dalibo(dot)com>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc:	Magnus Hagander <magnus(at)hagander(dot)net>, Petr Jelinek <petr(at)2ndquadrant(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Craig Ringer <craig(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: auto_explain sample rate
Date:	2016-03-12 14:20:20
Message-ID:	[email protected]
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 11/03/2016 17:55, Robert Haas wrote:
> On Fri, Mar 11, 2016 at 11:33 AM, Tomas Vondra
> <tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
>> A bit late, but I think we should rename the GUC variable to
>> "sampling_rate" (instead of sample_ratio) as that's what pgbench uses
>> for the same thing. That'd be more consistent.
>
> I like that idea. It seems like slightly better terminology.
>

I like it too. I also just noticed that I duplicated the var type by
mistake in the documentation :/

Attached patch fixes both.

--
Julien Rouhaud
http://dalibo.com - http://dalibo.org

Attachment	Content-Type	Size
auto_explain_rate.diff	text/plain	2.9 KB

From:	Magnus Hagander <magnus(at)hagander(dot)net>
To:	Julien Rouhaud <julien(dot)rouhaud(at)dalibo(dot)com>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Petr Jelinek <petr(at)2ndquadrant(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Craig Ringer <craig(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: auto_explain sample rate
Date:	2016-03-13 12:21:14
Message-ID:	CABUevEzTR2twjHEQvP_xV+oD_9odwVDsAd2zSb0YKo-cVma21g@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Sat, Mar 12, 2016 at 3:20 PM, Julien Rouhaud <julien(dot)rouhaud(at)dalibo(dot)com>
wrote:

> On 11/03/2016 17:55, Robert Haas wrote:
> > On Fri, Mar 11, 2016 at 11:33 AM, Tomas Vondra
> > <tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
> >> A bit late, but I think we should rename the GUC variable to
> >> "sampling_rate" (instead of sample_ratio) as that's what pgbench uses
> >> for the same thing. That'd be more consistent.
> >
> > I like that idea. It seems like slightly better terminology.
> >
>
> I like it too. I also just noticed that I duplicated the var type by
> mistake in the documentation :/
>
> Attached patch fixes both.
>
>
I also agree, and thanks for doing the work :) Applied!

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/