Bug report
Bug description:
PR incoming! It's a 10 second fix.
TLDR
BaseSelectorEventLoop._accept_connection incorrectly returns early from its for _ in range(backlog) loop when accept(2) fails with ECONNABORTED (raised in Python as ConnectionAbortedError), whereas it should continue to the next iteration.
This was introduced in #27906 by this commit, which, whilst great, had a slight oversight in not separating ConnectionAbortedError from (BlockingIOError and InterruptedError) when putting them inside a loop ;) Ironically, the commit was introduced to give a more contiguous timeslot for accepting sockets in an event loop, and now with the fix to this issue it'll be even more contiguous on OpenBSD, continuing past the aborted connections instead of the event loop having to re-poll the server socket and call _accept_connection again. All is good! :D
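The difference between returning and continuing can be sketched outside asyncio. This is only an illustration of the loop shape described above, not the actual asyncio source: ScriptedListener, accept_batch and the skip_aborted flag are hypothetical names invented for this example, and the fake listener simply replays a scripted sequence of accept() outcomes.

```python
class ScriptedListener:
    """Hypothetical stand-in for a non-blocking listening socket."""

    def __init__(self, events):
        self._events = list(events)

    def accept(self):
        if not self._events:
            # An empty accept queue on a non-blocking socket.
            raise BlockingIOError
        event = self._events.pop(0)
        if isinstance(event, Exception):
            raise event
        return event, ("127.0.0.1", 0)


def accept_batch(sock, backlog=100, *, skip_aborted=True):
    """Accept up to `backlog` connections in one pass (illustrative only)."""
    accepted = []
    for _ in range(backlog):
        try:
            conn, _addr = sock.accept()
        except ConnectionAbortedError:
            if skip_aborted:
                continue  # proposed behaviour: discard it, keep accepting
            return accepted  # reported behaviour: give up until the next poll
        except (BlockingIOError, InterruptedError):
            return accepted  # queue drained, or interrupted by a signal
        accepted.append(conn)
    return accepted


# Same pattern as the reproduction: clients 1 and 4 abort before accept().
EVENTS = [ConnectionAbortedError(), "c2", "c3", ConnectionAbortedError(), "c5"]

print(accept_batch(ScriptedListener(EVENTS)))  # one pass takes c2, c3, c5

old_style = ScriptedListener(EVENTS)
print([accept_batch(old_style, skip_aborted=False) for _ in range(3)])
```

With skip_aborted=False the same five queued connections need three passes, which mirrors the three _accept_connection calls reported on OpenBSD.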
A brief explanation / reproduction of ECONNABORTED from accept(2), for AF_INET on OpenBSD
It's worth writing this up as there is not much documentation online about ECONNABORTED's occurrences from accept(2), and I have been intermittently in pursuit of this errno for over 2 years!
Some OS kernels, including OpenBSD and Linux (tested and confirmed), continue queueing connections that were aborted before accept(2) is called. However, the behaviour of accept's return value differs between OpenBSD and Linux!
Suppose the following sequence of TCP packets occurs when a client connects to a server, with the client's kernel and the server's kernel communicating over TCP/IP, and this happens before the server's userspace program calls accept on its listening socket: >SYN, <SYNACK, >ACK, >RST, i.e. a standard TCP 3WHS followed by the client sending an RST.
- On OpenBSD, when the server's userspace program calls accept on the listening socket, it receives -1, with errno == ECONNABORTED.
- On Linux, when the server's userspace program calls accept on the listening socket, the call succeeds and returns a new socket descriptor, with no errno set, i.e. everything looks fine. But of course when trying to send on the socket, EPIPE is either set as errno or delivered as SIGPIPE.
One can test this with the following script:

#!/usr/bin/env python3
import socket
import time
import struct

ADDR = ("127.0.0.1", 3156)


def connect_disconnect_client(*, enable_rst: bool):
    client = socket.socket()
    if enable_rst:
        # send an RST when we call close()
        client.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER, struct.pack("ii", 1, 0))
    client.connect(ADDR)
    client.close()
    time.sleep(0.1)  # let the FIN/RST reach the kernel's TCP/IP machinery


def main() -> None:
    server_server = socket.socket()
    server_server.bind(ADDR)
    server_server.listen(64)
    connect_disconnect_client(enable_rst=True)
    connect_disconnect_client(enable_rst=False)
    connect_disconnect_client(enable_rst=False)
    connect_disconnect_client(enable_rst=True)
    connect_disconnect_client(enable_rst=False)
    for _ in range(5):
        try:
            server_client, server_client_addr = server_server.accept()
            print("Okay")
        except ConnectionAbortedError as e:
            print(f"{e.strerror}")


if __name__ == "__main__":
    main()
On Linux the output is
Okay
Okay
Okay
Okay
Okay
On OpenBSD the output is
Software caused connection abort
Okay
Okay
Software caused connection abort
Okay
Observe that both kernels kept the aborted connections queued. I used OpenBSD 7.4 on Instant Workstation to test this.
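The SO_LINGER setting that makes close() send an RST in the script above can be inspected in isolation. The following is a minimal sketch, assuming the platform's struct linger is two C ints (which is what the "ii" packing in the script already assumes); enable_abortive_close and read_linger are hypothetical helper names, not part of the socket module.

```python
import socket
import struct

LINGER_FORMAT = "ii"  # struct linger { int l_onoff; int l_linger; }


def enable_abortive_close(sock: socket.socket) -> None:
    # l_onoff=1, l_linger=0: close() discards unsent data and resets the
    # connection instead of performing the normal FIN shutdown.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER,
                    struct.pack(LINGER_FORMAT, 1, 0))


def read_linger(sock: socket.socket) -> tuple:
    # Passing a buffer length makes getsockopt() return raw bytes.
    raw = sock.getsockopt(socket.SOL_SOCKET, socket.SO_LINGER,
                          struct.calcsize(LINGER_FORMAT))
    return struct.unpack(LINGER_FORMAT, raw)


with socket.socket() as probe:
    before = read_linger(probe)
    enable_abortive_close(probe)
    after = read_linger(probe)

print("before:", before)
print("after: ", after)
```

The exact value reported for l_onoff may differ between kernels, so the sketch only relies on it being non-zero once the option is enabled.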
BaseSelectorEventLoop._accept_connection's fix
To demonstrate asyncio's issue, we create the following test script to connect five clients to a base_events.Server being served in a selector_events.BaseSelectorEventLoop. Two of the clients are going to be naughty and send an RST to abort their connection before it is accepted into userspace. We monkey patch in a print() statement just to let us know when BaseSelectorEventLoop._accept_connection is called. Ideally this should be once, since the server's default backlog of 100 is sufficient, but as we will see, OpenBSD's raising of ConnectionAbortedError changes this:
#!/usr/bin/env python3
import socket
import asyncio
import time
import struct

ADDR = ("127.0.0.1", 31415)


def connect_disconnect_client(*, enable_rst: bool):
    client = socket.socket()
    if enable_rst:
        # send an RST when we call close()
        client.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER, struct.pack("ii", 1, 0))
    client.connect(ADDR)
    client.close()
    time.sleep(0.1)  # let the FIN/RST reach the kernel's TCP/IP machinery


async def handler(reader: asyncio.StreamReader, writer: asyncio.StreamWriter):
    try:
        print("connected handler")
    finally:
        writer.close()


# monkey patch in a print() statement just for debugging sake
import asyncio.selector_events

_accept_connection_old = asyncio.selector_events.BaseSelectorEventLoop._accept_connection


def _accept_connection_new(*args, **kwargs):
    print("_accept_connection called")
    return _accept_connection_old(*args, **kwargs)


asyncio.selector_events.BaseSelectorEventLoop._accept_connection = _accept_connection_new


async def amain() -> None:
    server = await asyncio.start_server(handler, *ADDR)
    connect_disconnect_client(enable_rst=True)
    connect_disconnect_client(enable_rst=False)
    connect_disconnect_client(enable_rst=False)
    connect_disconnect_client(enable_rst=True)
    connect_disconnect_client(enable_rst=False)
    await server.start_serving()  # listen(3)
    await server.serve_forever()


def main() -> None:
    asyncio.run(amain())


if __name__ == "__main__":
    main()
On Linux the output is
_accept_connection called
connected handler
connected handler
connected handler
connected handler
connected handler
On OpenBSD the output is
_accept_connection called
_accept_connection called
_accept_connection called
connected handler
connected handler
connected handler
The first _accept_connection returns immediately because of client 1's ECONNABORTED. The second _accept_connection brings in clients 2 and 3, then returns because of client 4's ECONNABORTED. The third _accept_connection brings in client 5 (which closed normally, without an RST) and returns once the accept queue is empty.
With the incoming PR's patch applied, the OpenBSD behaviour / output is corrected to
_accept_connection called
connected handler
connected handler
connected handler
All connections are accepted in one single stroke of _accept_connection.
The Odyssey for ECONNABORTED on Linux
This is just a personal addendum for the record.
I use Linux and I like collecting all the signal(7)s and errno(3)s; it reminds me in a way of Lego Star Wars: it's nice to have a complete collection. Part of Python's exception hierarchy is
ConnectionError
├── BrokenPipeError
├── ConnectionAbortedError
├── ConnectionRefusedError
└── ConnectionResetError
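Each of these classes is what Python raises for a particular errno value, and the mapping can be checked without touching a socket because the OSError constructor picks the subclass from the errno it is given. The following is a small sketch of that check; exception_class_for and ERRNO_TO_EXCEPTION are hypothetical names used only here.

```python
import errno
import os

# The four leaves of the ConnectionError subtree and the errno each
# one is most commonly associated with.
ERRNO_TO_EXCEPTION = {
    errno.EPIPE: BrokenPipeError,
    errno.ECONNABORTED: ConnectionAbortedError,
    errno.ECONNREFUSED: ConnectionRefusedError,
    errno.ECONNRESET: ConnectionResetError,
}


def exception_class_for(code: int) -> type:
    # OSError(errno, strerror) returns an instance of the matching subclass.
    return type(OSError(code, os.strerror(code)))


for code, expected in ERRNO_TO_EXCEPTION.items():
    cls = exception_class_for(code)
    print(f"{errno.errorcode[code]} -> {cls.__name__}")
    assert cls is expected and issubclass(cls, ConnectionError)
```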
In the past two years of doing socket programming on Linux, for AF_INET and AF_UNIX, I have easily been able to produce ConnectionRefusedError, ConnectionResetError, and BrokenPipeError, but I have still never been able to produce ConnectionAbortedError with accept(). Looking at the Linux kernel's source code for net/socket.c and net/ipv4/ implementing sockets and TCP/IP, I can only conclude that ECONNABORTED could possibly occur as a race condition between ops->accept() and ops->getname(), where there is a nanosecond when the socket is not protected by a spinlock.
I've tried various TCP situations including TCP_FASTOPEN, TCP_NODELAY, O_NONBLOCK connect()s, combined with SO_LINGER, trying to create the most disgusting TCP handshakes, all to no avail. SYN,SYNACK,RST gets dropped and does not get accept()ed.
So to any similarly eclectically minded programmers out there who wish to know for the record how to get accept(2) to produce ECONNABORTED: just try the scripts above on OpenBSD and save your time lol!
This one's for you, OpenBSD friends, thanks for OpenSSH!
CPython versions tested on:
CPython main branch
Operating systems tested on:
Other