Commit 7247e24

Try to read data from the socket in pqSendSome's write_failed paths.

Even when we've concluded that we have a hard write failure on the socket, we should continue to try to read data. This gives us an opportunity to collect any final error message that the backend might have sent before closing the connection; moreover it is the job of pqReadData not pqSendSome to close the socket once EOF is detected.

Due to an oversight in 1f39a1c, pqSendSome failed to try to collect data in the case where we'd already set write_failed. The problem was masked for ordinary query operations (which really only make one write attempt anyway), but COPY to the server would continue to send data indefinitely after a mid-COPY connection loss.

Hence, add pqReadData calls into the paths where pqSendSome drops data because of write_failed. If we've lost the connection, this will eventually result in closing the socket and setting CONNECTION_BAD, which will cause PQputline and siblings to report failure, allowing the application to terminate the COPY sooner. (Basically this restores what happened before 1f39a1c.)

There are related issues that this does not solve; for example, if the backend sends an error but doesn't drop the connection, we did and still will keep pumping COPY data as long as the application sends it. Fixing that will require application-visible behavior changes though, and anyway it's an ancient behavior that we've had few complaints about. For now I'm just trying to fix the regression from 1f39a1c.

Per a complaint from Andres Freund. Back-patch into v12 where 1f39a1c came in.

Discussion: https://postgr.es/m/[email protected]
1 parent 92f33bb commit 7247e24

1 file changed, with 20 additions and 1 deletion
src/interfaces/libpq/fe-misc.c

Lines changed: 20 additions & 1 deletion
@@ -823,6 +823,10 @@ pqReadData(PGconn *conn)
  * Return 0 on success, -1 on failure and 1 when not all data could be sent
  * because the socket would block and the connection is non-blocking.
  *
+ * Note that this is also responsible for consuming data from the socket
+ * (putting it in conn->inBuffer) in any situation where we can't send
+ * all the specified data immediately.
+ *
  * Upon write failure, conn->write_failed is set and the error message is
  * saved in conn->write_err_msg, but we clear the output buffer and return
  * zero anyway; this is because callers should soldier on until it's possible
@@ -842,12 +846,20 @@ pqSendSome(PGconn *conn, int len)
 	 * on that connection. Even if the kernel would let us, we've probably
 	 * lost message boundary sync with the server. conn->write_failed
 	 * therefore persists until the connection is reset, and we just discard
-	 * all data presented to be written.
+	 * all data presented to be written. However, as long as we still have a
+	 * valid socket, we should continue to absorb data from the backend, so
+	 * that we can collect any final error messages.
 	 */
 	if (conn->write_failed)
 	{
 		/* conn->write_err_msg should be set up already */
 		conn->outCount = 0;
+		/* Absorb input data if any, and detect socket closure */
+		if (conn->sock != PGINVALID_SOCKET)
+		{
+			if (pqReadData(conn) < 0)
+				return -1;
+		}
 		return 0;
 	}
 
@@ -917,6 +929,13 @@ pqSendSome(PGconn *conn, int len)
 
 				/* Discard queued data; no chance it'll ever be sent */
 				conn->outCount = 0;
+
+				/* Absorb input data if any, and detect socket closure */
+				if (conn->sock != PGINVALID_SOCKET)
+				{
+					if (pqReadData(conn) < 0)
+						return -1;
+				}
 				return 0;
 			}
 		}
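
The effect of the added read on a COPY-style send loop can be illustrated with a small stand-alone model. This is only a sketch under stated assumptions, not libpq code: `MockConn`, `mock_read_data`, `mock_send_some`, and `demo_copy_loop` are hypothetical names invented here, and the mock reader simply reports EOF after a fixed number of calls. The point is that a send path that keeps reading while it discards data eventually reports failure to the caller, instead of returning zero forever.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for this sketch; none of these are libpq definitions. */
#define MOCK_INVALID_SOCKET (-1)

typedef struct MockConn
{
	int		sock;				/* MOCK_INVALID_SOCKET once closed */
	bool	write_failed;		/* sticky after a hard write failure */
	bool	bad;				/* models the connection being marked bad */
	int		out_count;			/* bytes queued for sending */
	int		reads_before_eof;	/* reads that return before EOF shows up */
} MockConn;

/* Models the reader: on EOF it closes the socket and marks the connection bad. */
static int
mock_read_data(MockConn *conn)
{
	if (conn->reads_before_eof > 0)
	{
		conn->reads_before_eof--;
		return 0;				/* no EOF seen yet */
	}
	conn->sock = MOCK_INVALID_SOCKET;
	conn->bad = true;
	return -1;
}

/* Models only the write_failed early-exit path, including the added read. */
static int
mock_send_some(MockConn *conn)
{
	if (conn->write_failed)
	{
		conn->out_count = 0;	/* discard data; it can never be sent */
		if (conn->sock != MOCK_INVALID_SOCKET)
		{
			if (mock_read_data(conn) < 0)
				return -1;
		}
		return 0;
	}
	conn->out_count = 0;		/* assumption: a healthy socket accepts everything */
	return 0;
}

/*
 * A COPY-like loop on a connection whose writes have already failed.
 * Returns how many chunks were offered before the loop stopped, either
 * because failure was reported or because max_chunks was reached.
 */
static int
demo_copy_loop(int reads_before_eof, int max_chunks)
{
	MockConn	conn = {3, true, false, 0, reads_before_eof};
	int			chunks = 0;

	while (chunks < max_chunks)
	{
		conn.out_count = 100;	/* queue one chunk */
		chunks++;
		if (mock_send_some(&conn) < 0)
			break;				/* failure surfaced; stop sending */
		assert(conn.out_count == 0);
	}
	return chunks;
}
```

In this model, `demo_copy_loop(2, 10)` stops after the third chunk because the third read hits EOF. Without the read inside the `write_failed` branch, the loop would have no way to notice the closed connection and would run until `max_chunks`.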
