
Commit 8b0b630

Try to ensure that stats collector's receive buffer size is at least 100KB.
Since commit 4e37b3e, buildfarm member frogmouth has been failing occasionally with symptoms indicating that some expected stats data is getting dropped. The reason that commit changed the behavior is probably that more data is now sent to the collector in a short span of time. In current sources, the stats test's first session sends about 9KB of data while exiting, which is probably the same as what was sent just before wait_for_stats() in the previous test design. But now, the test's second session is starting up concurrently, and it sends another 2KB (presumably reflecting its initial catalog accesses). Since frogmouth is running on Windows XP, which reputedly has a default socket receive buffer size of only 8KB, it is not very surprising that roughly 11KB arriving at once has pushed us over the threshold where the receive buffer can overflow and drop messages.

The same mechanism could very easily explain the intermittent stats test failures we've been seeing for years, since background processes such as the bgwriter will sometimes send data concurrently with all this, and could thus cause occasional buffer overflows.

Hence, insert some code into pgstat_init() to increase the stats socket's receive buffer size to 100KB if it's less than that. (On failure, emit a LOG message, but keep going.) Modern systems seem to have default sizes in the range of 100KB-250KB, but older platforms don't. I couldn't find any platforms that wouldn't accept 100KB, so in theory this won't cause any portability problems.

If this is successful at reducing the buildfarm failure rate in HEAD, we should back-patch it, because it's certain that similar buffer overflows happen in the field on platforms with small buffer sizes. Going forward, there might be an argument for trying to increase the buffer size even more, but let's take a baby step first.

Discussion: https://postgr.es/m/[email protected]
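To see the failure mode described above in isolation, the following sketch shrinks a loopback UDP socket's receive buffer to 4KB, sends it a burst of datagrams before reading anything, and then counts what survived. It is a hypothetical demo, not code from this commit; `burst_and_count` is an illustrative name, and it assumes a POSIX system with a working loopback interface. Exact counts depend on the kernel's per-datagram accounting (Linux, for instance, doubles the requested SO_RCVBUF value), so only the fact that datagrams go missing should be relied on.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <poll.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical demo, not PostgreSQL code: send `count` datagrams of
 * `payload` bytes to a loopback UDP socket whose receive buffer was set
 * to `rcvbuf` bytes, reading nothing until the burst is over, then count
 * how many datagrams actually arrived.
 * Returns the number received, or -1 if setup fails.
 */
static int burst_and_count(int rcvbuf, int count, size_t payload)
{
    char buf[2048];
    struct sockaddr_in addr;
    socklen_t addrlen = sizeof(addr);
    struct pollfd pfd;
    int received = -1;
    int rx = socket(AF_INET, SOCK_DGRAM, 0);
    int tx = socket(AF_INET, SOCK_DGRAM, 0);

    if (rx < 0 || tx < 0 || payload > sizeof(buf))
        goto done;

    /* Shrink the receiver's buffer before any data can arrive. */
    if (setsockopt(rx, SOL_SOCKET, SO_RCVBUF, &rcvbuf, sizeof(rcvbuf)) < 0)
        goto done;

    /* Bind to 127.0.0.1 on a kernel-chosen port, then learn that port. */
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;
    if (bind(rx, (struct sockaddr *) &addr, sizeof(addr)) < 0 ||
        getsockname(rx, (struct sockaddr *) &addr, &addrlen) < 0)
        goto done;

    /* The burst: sendto() normally still succeeds when the receiver drops. */
    memset(buf, 'x', payload);
    for (int i = 0; i < count; i++)
        (void) sendto(tx, buf, payload, 0,
                      (struct sockaddr *) &addr, sizeof(addr));

    /* Drain whatever fit; stop once nothing is readable for 100 ms. */
    received = 0;
    pfd.fd = rx;
    pfd.events = POLLIN;
    while (poll(&pfd, 1, 100) > 0 && recv(rx, buf, sizeof(buf), 0) >= 0)
        received++;

done:
    if (rx >= 0)
        close(rx);
    if (tx >= 0)
        close(tx);
    return received;
}
```

For example, `burst_and_count(4096, 200, 1000)` would typically return only a handful of datagrams rather than 200. The rest are discarded without any error reaching the sender, which matches the symptom of stats data silently going missing.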
1 parent 59f4056 commit 8b0b630


1 file changed, +32 -0 lines changed


src/backend/postmaster/pgstat.c

Lines changed: 32 additions & 0 deletions
```diff
@@ -93,6 +93,9 @@
 #define PGSTAT_POLL_LOOP_COUNT (PGSTAT_MAX_WAIT_TIME / PGSTAT_RETRY_DELAY)
 #define PGSTAT_INQ_LOOP_COUNT (PGSTAT_INQ_INTERVAL / PGSTAT_RETRY_DELAY)
 
+/* Minimum receive buffer size for the collector's socket. */
+#define PGSTAT_MIN_RCVBUF (100 * 1024)
+
 
 /* ----------
  * The initial size hints for the hash tables used in the collector.
```
```diff
@@ -574,6 +577,35 @@ pgstat_init(void)
         goto startup_failed;
     }
 
+    /*
+     * Try to ensure that the socket's receive buffer is at least
+     * PGSTAT_MIN_RCVBUF bytes, so that it won't easily overflow and lose
+     * data. Use of UDP protocol means that we are willing to lose data under
+     * heavy load, but we don't want it to happen just because of ridiculously
+     * small default buffer sizes (such as 8KB on older Windows versions).
+     */
+    {
+        int old_rcvbuf;
+        int new_rcvbuf;
+        ACCEPT_TYPE_ARG3 rcvbufsize = sizeof(old_rcvbuf);
+
+        if (getsockopt(pgStatSock, SOL_SOCKET, SO_RCVBUF,
+                       (char *) &old_rcvbuf, &rcvbufsize) < 0)
+        {
+            elog(LOG, "getsockopt(SO_RCVBUF) failed: %m");
+            /* if we can't get existing size, always try to set it */
+            old_rcvbuf = 0;
+        }
+
+        new_rcvbuf = PGSTAT_MIN_RCVBUF;
+        if (old_rcvbuf < new_rcvbuf)
+        {
+            if (setsockopt(pgStatSock, SOL_SOCKET, SO_RCVBUF,
+                           (char *) &new_rcvbuf, sizeof(new_rcvbuf)) < 0)
+                elog(LOG, "setsockopt(SO_RCVBUF) failed: %m");
+        }
+    }
+
     pg_freeaddrinfo_all(hints.ai_family, addrs);
 
     return;
```
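The logic added to pgstat_init() is not specific to PostgreSQL. Below is a minimal standalone sketch of it for a POSIX socket, under these assumptions: `ensure_min_rcvbuf` and `MIN_RCVBUF_BYTES` are hypothetical names, `fprintf(stderr, ...)` stands in for `elog(LOG, ...)`, and `socklen_t` stands in for the configure-determined `ACCEPT_TYPE_ARG3`. Unlike the committed code, it reads the size back after setting it, because the kernel may adjust the request. On Linux, for example, the value is capped by `net.core.rmem_max` and then doubled to cover bookkeeping overhead.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical counterpart of PGSTAT_MIN_RCVBUF. */
#define MIN_RCVBUF_BYTES (100 * 1024)

/*
 * Raise the receive buffer of `sock` to at least `min_bytes` if the
 * current size is smaller; never shrink it. Failures are reported on
 * stderr but are not fatal. Returns the size reported by getsockopt()
 * afterwards, or -1 if it cannot be read.
 */
static int ensure_min_rcvbuf(int sock, int min_bytes)
{
    int old_rcvbuf;
    int new_rcvbuf = min_bytes;
    socklen_t len = sizeof(old_rcvbuf);

    if (getsockopt(sock, SOL_SOCKET, SO_RCVBUF, &old_rcvbuf, &len) < 0)
    {
        fprintf(stderr, "getsockopt(SO_RCVBUF) failed: %s\n", strerror(errno));
        old_rcvbuf = 0;         /* size unknown: always try to set it */
    }

    if (old_rcvbuf < new_rcvbuf &&
        setsockopt(sock, SOL_SOCKET, SO_RCVBUF,
                   &new_rcvbuf, sizeof(new_rcvbuf)) < 0)
        fprintf(stderr, "setsockopt(SO_RCVBUF) failed: %s\n", strerror(errno));

    /* Report what the kernel actually granted, which may differ from the request. */
    len = sizeof(new_rcvbuf);
    if (getsockopt(sock, SOL_SOCKET, SO_RCVBUF, &new_rcvbuf, &len) < 0)
        return -1;
    return new_rcvbuf;
}
```

A caller would create its UDP socket and then invoke `ensure_min_rcvbuf(sock, MIN_RCVBUF_BYTES)` once, before any data arrives. The `(char *)` casts in the committed code are there for Winsock, whose `getsockopt()`/`setsockopt()` prototypes take `char *`. A POSIX-only sketch can pass the `int` pointers directly.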
