Lists: | pgsql-hackers |
---|
From: | Robins Tharakan <tharakan(at)gmail(dot)com> |
---|---|
To: | pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Add pg_accept_connections_start_time() for better uptime calculation |
Date: | 2025-02-16 07:05:18 |
Message-ID: | CAEP4nAx-Mp52k=tyAEJrD5SS4nOJm60uz4J3KgtKfTt2qPRyzA@mail.gmail.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-hackers |
Hi,
This patch introduces a new function pg_accept_connections_start_time().
Currently, pg_postmaster_start_time() is used to determine when the
database started. However, this is not accurate since the postmaster
process can sometimes be up whereas the database is not accepting
connections (for e.g. during child process crash [1],
long crash-recovery etc.)
This can lead to inaccurate database uptime calculations.
The new function, pg_accept_connections_start_time(), returns the
time when the database became ready to accept connections. This is
helpful, since in both of the above cases (quick crash-recovery on
child process crash, long crash-recovery on startup), this timestamp
would get reset - an example scenario given below [3].
This function can be used to tell:
1. Whether the database did a quick crash-recovery (without
a postmaster restart) in a production setup. In particular, this would
help a long-running client confirm whether a connection blip was a
server restart, or a session-abort / network / client-side issue [2].
2. Calculate database uptime (more accurately)
The patch passes `make check`, adds a brief function description
in func.sgml, works in single-user mode and applies cleanly on
master as of 9e17ac997 (14th Feb).
Look forward to feedback, but in particular:
- Good to have a second opinion on a better position to capture
timestamp during startup in single-user mode.
- Function name - I think it is too verbose, but it felt most unambiguous.
-
Thanks
Robins
1. pg_postmaster_start_time() doesn't tell when db became available:
https://www.postgresql.org/message-id/598d4a75-57d9-b41a-a927-7584be6278b2%40rblst.info
2. IIUC knowing that a crash-recovery happened may have helped here?
https://www.postgresql.org/message-id/954419.1623092217%40sss.pgh.pa.us
3. Sample usage of the function - Kill 'walwriter' to force postmaster
to do a quick crash-recovery - where pg_postmaster_start_time() does
not change, pg_accept_connections_start_time() does get updated
to the time when database (once again) became available for connections:
```
robins(at)camry:~/proj/postgres$ psql postgres -c "select
pg_accept_connections_start_time(), pg_postmaster_start_time();"
pg_accept_connections_start_time | pg_postmaster_start_time
----------------------------------+----------------------------------
2025-02-16 11:40:37.355906+10:30 | 2025-02-16 11:40:37.351776+10:30
(1 row)
robins(at)camry:~/proj/postgres$ ps -ef | grep postgres
robins 2935044 1 0 11:40 ? 00:00:00
/home/robins/proj/localpg/bin/postgres -D data
robins 2935045 2935044 0 11:40 ? 00:00:00 postgres: checkpointer
robins 2935046 2935044 0 11:40 ? 00:00:00 postgres: background
writer
robins 2935048 2935044 0 11:40 ? 00:00:00 postgres: walwriter
robins 2935049 2935044 0 11:40 ? 00:00:00 postgres: autovacuum
launcher
robins 2935050 2935044 0 11:40 ? 00:00:00 postgres: logical
replication launcher
robins 2937754 1769260 0 13:57 pts/1 00:00:00 grep --color=auto
postgres
robins(at)camry:~/proj/postgres$ kill -9 `ps -ef | grep postgres | grep
walwriter | awk '{print $2}'`
robins(at)camry:~/proj/postgres$ ps -ef | grep postgres
robins 2935044 1 0 11:40 ? 00:00:00
/home/robins/proj/localpg/bin/postgres -D data
robins 2937761 2935044 0 13:57 ? 00:00:00 postgres: checkpointer
robins 2937762 2935044 0 13:57 ? 00:00:00 postgres: background
writer
robins 2937763 2935044 0 13:57 ? 00:00:00 postgres: walwriter
robins 2937764 2935044 0 13:57 ? 00:00:00 postgres: autovacuum
launcher
robins 2937766 1769260 0 13:57 pts/1 00:00:00 grep --color=auto
postgres
robins(at)camry:~/proj/postgres$ psql postgres -c "select
pg_accept_connections_start_time(), pg_postmaster_start_time();"
pg_accept_connections_start_time | pg_postmaster_start_time
----------------------------------+----------------------------------
2025-02-16 13:57:52.914587+10:30 | 2025-02-16 11:40:37.351776+10:30
(1 row)
```
Attachment | Content-Type | Size |
---|---|---|
v1-0001-Add-support-for-pg_accept_connections_start_time.patch | application/x-patch | 6.0 KB |
From: | Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at> |
---|---|
To: | Robins Tharakan <tharakan(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Add pg_accept_connections_start_time() for better uptime calculation |
Date: | 2025-02-16 10:47:46 |
Message-ID: | [email protected] |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-hackers |
On Sun, 2025-02-16 at 17:35 +1030, Robins Tharakan wrote:
> This patch introduces a new function pg_accept_connections_start_time().
>
> Currently, pg_postmaster_start_time() is used to determine when the
> database started. However, this is not accurate since the postmaster
> process can sometimes be up whereas the database is not accepting
> connections (for e.g. during child process crash [1],
> long crash-recovery etc.)
>
> This can lead to inaccurate database uptime calculations.
>
> The new function, pg_accept_connections_start_time(), returns the
> time when the database became ready to accept connections. This is
> helpful, since in both of the above cases (quick crash-recovery on
> child process crash, long crash-recovery on startup), this timestamp
> would get reset - an example scenario given below [3].
>
> This function can be used to tell:
> 1. Whether the database did a quick crash-recovery (without
> a postmaster restart) in a production setup. In particular, this would
> help a long-running client confirm whether a connection blip was a
> server restart, or a session-abort / network / client-side issue [2].
> 2. Calculate database uptime (more accurately)
>
> The patch passes `make check`, adds a brief function description
> in func.sgml, works in single-user mode and applies cleanly on
> master as of 9e17ac997 (14th Feb).
>
> Look forward to feedback, but in particular:
> - Good to have a second opinion on a better position to capture
> timestamp during startup in single-user mode.
> - Function name - I think it is too verbose, but it felt most unambiguous.
I myself have never felt the need for such a function - but perhaps it
can be useful in these times of hosted database services, when accessing
the log file might be difficult.
I would probably have called the function pg_uptime(), yet maybe that
is too Unix-centric.
Would it make sense to add that information to the output of
"pg_ctl status" as well? Perhaps as a new option, so that default output
format doesn't change.
Yours,
Laurenz Albe
--
*E-Mail Disclaimer*
Der Inhalt dieser E-Mail ist ausschliesslich fuer den
bezeichneten Adressaten bestimmt. Wenn Sie nicht der vorgesehene Adressat
dieser E-Mail oder dessen Vertreter sein sollten, so beachten Sie bitte,
dass jede Form der Kenntnisnahme, Veroeffentlichung, Vervielfaeltigung oder
Weitergabe des Inhalts dieser E-Mail unzulaessig ist. Wir bitten Sie, sich
in diesem Fall mit dem Absender der E-Mail in Verbindung zu setzen.
*CONFIDENTIALITY NOTICE & DISCLAIMER
*This message and any attachment are
confidential and may be privileged or otherwise protected from disclosure
and solely for the use of the person(s) or entity to whom it is intended.
If you have received this message in error and are not the intended
recipient, please notify the sender immediately and delete this message and
any attachment from your system. If you are not the intended recipient, be
advised that any use of this message is prohibited and may be unlawful, and
you must not copy this message or attachment or disclose the contents to
any other person.
From: | Michael Paquier <michael(at)paquier(dot)xyz> |
---|---|
To: | Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at> |
Cc: | Robins Tharakan <tharakan(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Add pg_accept_connections_start_time() for better uptime calculation |
Date: | 2025-02-17 00:14:09 |
Message-ID: | [email protected] |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-hackers |
On Sun, Feb 16, 2025 at 11:47:46AM +0100, Laurenz Albe wrote:
> I myself have never felt the need for such a function - but perhaps it
> can be useful in these times of hosted database services, when accessing
> the log file might be difficult.
Yeah. I'm not surprised by that.
> I would probably have called the function pg_uptime(), yet maybe that
> is too Unix-centric.
pg_accept_connections_start_time() is very verbose. As it is about
the postmaster being open to connections, perhaps
pg_postmaster_open_time() for consistency with the existing start
function?
> Would it make sense to add that information to the output of
> "pg_ctl status" as well? Perhaps as a new option, so that default output
> format doesn't change.
Hmm. Sounds to me that we could just have a SQL function that's able
to parse postmaster.pid and returns its data in a format that makes
its post-processing easier if we're OK to live with the fact that this
could only be queried when the postmaster is able to accept
connections, like a JSON object with dedicated key/value pairs.
A separate function that only returns the open-for-connection time has
benefits on its own because its execution can be granted to separate
users, without knowing about the full contents of postmaster.pid.
That last part should matter for cloud vendors. And this file only
knows about MyStartTime.
--
Michael
From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | Michael Paquier <michael(at)paquier(dot)xyz> |
Cc: | Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>, Robins Tharakan <tharakan(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Add pg_accept_connections_start_time() for better uptime calculation |
Date: | 2025-02-17 00:53:06 |
Message-ID: | [email protected] |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-hackers |
Michael Paquier <michael(at)paquier(dot)xyz> writes:
> On Sun, Feb 16, 2025 at 11:47:46AM +0100, Laurenz Albe wrote:
>> Would it make sense to add that information to the output of
>> "pg_ctl status" as well? Perhaps as a new option, so that default output
>> format doesn't change.
> A separate function that only returns the open-for-connection time has
> benefits on its own because its execution can be granted to separate
> users, without knowing about the full contents of postmaster.pid.
> That last part should matter for cloud vendors. And this file only
> knows about MyStartTime.
Yeah. Making that happen would require extending the contents of
postmaster.pid, which is likely to break assorted peoples' tooling.
I doubt that this feature clears the bar for justifying that.
regards, tom lane
From: | Michael Paquier <michael(at)paquier(dot)xyz> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>, Robins Tharakan <tharakan(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Add pg_accept_connections_start_time() for better uptime calculation |
Date: | 2025-02-17 03:05:28 |
Message-ID: | [email protected] |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-hackers |
On Sun, Feb 16, 2025 at 07:53:06PM -0500, Tom Lane wrote:
> Yeah. Making that happen would require extending the contents of
> postmaster.pid, which is likely to break assorted peoples' tooling.
> I doubt that this feature clears the bar for justifying that.
Sure, agreed to not touch postmaster.pid.
Now my point is also that I would not object to a patch that wants to
show the information of postmaster.pid in a nicer way than it is now
through SQL, as one tuple with one attribute per field written, or
something like a JSON object. With the format of postmaster.pid being
very stable across releases, perhaps one attribute per field is
better.
--
Michael
From: | Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com> |
---|---|
To: | Robins Tharakan <tharakan(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Add pg_accept_connections_start_time() for better uptime calculation |
Date: | 2025-03-04 16:52:33 |
Message-ID: | [email protected] |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-hackers |
On 2025/02/16 16:05, Robins Tharakan wrote:
> Hi,
>
> This patch introduces a new function pg_accept_connections_start_time().
>
> Currently, pg_postmaster_start_time() is used to determine when the
> database started. However, this is not accurate since the postmaster
> process can sometimes be up whereas the database is not accepting
> connections (for e.g. during child process crash [1],
> long crash-recovery etc.)
>
> This can lead to inaccurate database uptime calculations.
>
> The new function, pg_accept_connections_start_time(), returns the
> time when the database became ready to accept connections.
Shouldn't this function also handle the time when the postmaster
starts accepting read-only connections? With the patch, it doesn’t
seem to cover that case, and it looks like an unexpected timestamp
is returned when run on a standby server. Maybe the function should
return a record with two columns — one for when the postmaster
starts accepting read-only connections and another for normal
connections?
Regards,
--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
From: | Robins Tharakan <tharakan(at)gmail(dot)com> |
---|---|
To: | Fujii Masao <fujii(at)postgresql(dot)org>, Michael Paquier <michael(at)paquier(dot)xyz> |
Cc: | pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Add pg_accept_connections_start_time() for better uptime calculation |
Date: | 2025-03-06 12:55:18 |
Message-ID: | CAEP4nAxEYDsAEx1N8nfC6MOAPNkfypiwS3ScH34JXOoW40k+Nw@mail.gmail.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-hackers |
Hi,
Thanks for taking a look at the patch, and for your feedback.
On Wed, 5 Mar 2025 at 03:22, Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>
wrote:
> On 2025/02/16 16:05, Robins Tharakan wrote:
> > This patch introduces a new function pg_accept_connections_start_time().
>
Shouldn't this function also handle the time when the postmaster
> starts accepting read-only connections? With the patch, it doesn’t
> seem to cover that case, and it looks like an unexpected timestamp
> is returned when run on a standby server. Maybe the function should
> return a record with two columns — one for when the postmaster
> starts accepting read-only connections and another for normal
> connections?
>
I am not sure if I understand the question. For a given (re)start, a
database user would either be looking for a read-only or a read-write
start time (based on whether the server is a standby or not). Are you
saying that for a given instance of start, a database user would be
interested in two timestamps (once when the database became
ready to accept read-only connections, and then quickly thereafter
also began accepting read-writes?) Even if possible, that feels
unnecessary - but I may be misunderstanding here.
But you bring up a good point around standbys. Attached is v2 of
the patch that returns a more accurate time on a standby (ie. it
captures the time just after emitting a message that it's ready for
read-only connections).
Also, while at it, I also implemented Michael's suggestion [1] for
a better name pg_postmaster_open_time() which is in line with
the existing pg_postmaster_start_time().
Also, updated the documentation to reflect the above, patch
passes `make check` and applies cleanly on HEAD as of
588acf6d0ec1 (6th Mar).
-
robins
Reference:
1. https://www.postgresql.org/message-id/Z7J_UZYfvtPiNMSy%40paquier.xyz
Attachment | Content-Type | Size |
---|---|---|
v2-0001-Add-support-for-pg_postmaster_open_time.patch | application/x-patch | 6.6 KB |
From: | Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com> |
---|---|
To: | Robins Tharakan <tharakan(at)gmail(dot)com>, Fujii Masao <fujii(at)postgresql(dot)org>, Michael Paquier <michael(at)paquier(dot)xyz> |
Cc: | pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Add pg_accept_connections_start_time() for better uptime calculation |
Date: | 2025-03-06 13:34:35 |
Message-ID: | [email protected] |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-hackers |
On 2025/03/06 21:55, Robins Tharakan wrote:
> Hi,
>
> Thanks for taking a look at the patch, and for your feedback.
>
> On Wed, 5 Mar 2025 at 03:22, Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com <mailto:masao(dot)fujii(at)oss(dot)nttdata(dot)com>> wrote:
>
> On 2025/02/16 16:05, Robins Tharakan wrote:
> > This patch introduces a new function pg_accept_connections_start_time().
>
> Shouldn't this function also handle the time when the postmaster
> starts accepting read-only connections? With the patch, it doesn’t
> seem to cover that case, and it looks like an unexpected timestamp
> is returned when run on a standby server. Maybe the function should
> return a record with two columns — one for when the postmaster
> starts accepting read-only connections and another for normal
> connections?
>
>
>
> I am not sure if I understand the question. For a given (re)start, a
> database user would either be looking for a read-only or a read-write
> start time (based on whether the server is a standby or not). Are you
> saying that for a given instance of start, a database user would be
> interested in two timestamps (once when the database became
> ready to accept read-only connections, and then quickly thereafter
> also began accepting read-writes?) Even if possible, that feels
> unnecessary - but I may be misunderstanding here.
With the v1 patch, running pg_accept_connections_start_time() on
a standby returned an unexpected timestamp:
=# select * from pg_accept_connections_start_time();
pg_accept_connections_start_time
----------------------------------
2000-01-01 09:00:00+09
So my comment meant that this seems odd and should be fixed.
Since I've not fully understood how this function is used,
I'm not sure what timestamp should be returned in the standby.
But I just thought it seems intuitive to return the timestamp
when the standby started accepting read-only connections, in that case.
> But you bring up a good point around standbys. Attached is v2 of
> the patch that returns a more accurate time on a standby (ie. it
> captures the time just after emitting a message that it's ready for
> read-only connections).
Thanks for updating the patch!
With v2 patch. when the standby is promoted to primary,
the result of pg_postmaster_open_time() appears to switch to
the time when the primary began accepting normal connections.
If this is intentional, it's better to clarify this behavior
in the documentation.
Regards,
--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION