asyncio's BaseSelectorEventLoop._accept_connection returns when it should continue on ConnectionAbortedError #127529

Closed
@jb2170


Bug report

Bug description:

PR incoming! It's a 10-second fix.

TLDR

BaseSelectorEventLoop._accept_connection incorrectly returns early from its for _ in range(backlog) loop when accept(2) fails with ECONNABORTED (raised in Python as ConnectionAbortedError), whereas it should continue on to the next queued connection. This was introduced in #27906 by this commit, which, whilst great, had a slight oversight: it did not separate ConnectionAbortedError from BlockingIOError and InterruptedError when putting them inside a loop ;) Ironically, the commit was introduced to give a more contiguous timeslot for accepting sockets in an event loop, and with the fix to this issue it'll be even more contiguous on OpenBSD: the loop continues past aborted connections instead of the event loop having to re-poll the server socket and call _accept_connection again. All is good! :D
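To make the intended behaviour concrete, here is a minimal sketch of the loop shape, assuming a non-blocking listening socket. accept_batch is a hypothetical helper written for illustration only; it is not the code from the PR or from asyncio/selector_events.py, which does more work per accepted connection.

```python
def accept_batch(sock, backlog):
    """Accept up to `backlog` queued connections from a non-blocking
    listening socket (hypothetical helper, for illustration only)."""
    accepted = []
    for _ in range(backlog):
        try:
            conn, addr = sock.accept()
        except (BlockingIOError, InterruptedError):
            # Accept queue drained (EAGAIN) or interrupted (EINTR): stop and
            # let the event loop poll the listening socket again later.
            break
        except ConnectionAbortedError:
            # This one queued connection was aborted by the peer (OpenBSD
            # reports ECONNABORTED); skip it and carry on with the rest.
            continue
        accepted.append((conn, addr))
    return accepted
```

The key design point is that the two kinds of exception mean different things: BlockingIOError says the whole queue is empty, while ConnectionAbortedError only says something about a single entry in it.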

A brief explanation / reproduction of ECONNABORTED from accept(2), for AF_INET on OpenBSD

It's worth writing this up, as there is not much documentation online about when accept(2) returns ECONNABORTED, and I have been intermittently in pursuit of this errno for over 2 years!

Some OS kernels, including OpenBSD and Linux (tested and confirmed), continue to queue connections that were aborted before the server calls accept(2). However, accept's return value for such connections differs between OpenBSD and Linux!

Suppose the following sequence of TCP packets occurs when a client connects to a server (the client's and server's kernels communicating over TCP/IP), and that this happens before the server's userspace program calls accept on its listening socket:

>SYN, <SYNACK, >ACK, >RST (> is client to server, < is server to client), i.e. a standard TCP three-way handshake (3WHS) followed by the client sending an RST.

  • On OpenBSD, when the server's userspace program calls accept on the listening socket, it receives -1 with errno == ECONNABORTED.
  • On Linux, when the server's userspace program calls accept on the listening socket, it receives a new file descriptor (a non-negative return value) with no errno set, i.e. everything appears fine. But of course, when trying to send on the socket, EPIPE is either set as errno or delivered as SIGPIPE.
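In Python, the OpenBSD case surfaces as an exception because CPython's OSError constructor picks a subclass based on the errno value (PEP 3151). A small sketch showing how the errnos accept(2) can report on a non-blocking listening socket map onto the exceptions _accept_connection distinguishes; exception_for is a hypothetical helper name.

```python
import errno
import os

def exception_for(code):
    """Return the OSError subclass CPython creates for a given errno value."""
    return type(OSError(code, os.strerror(code)))

for name in ("EAGAIN", "EINTR", "ECONNABORTED"):
    print(name, "->", exception_for(getattr(errno, name)).__name__)
# EAGAIN -> BlockingIOError
# EINTR -> InterruptedError
# ECONNABORTED -> ConnectionAbortedError
```

The first two mean "nothing more to accept right now", whereas ECONNABORTED concerns just one queued connection.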

One can test this with the following script

#!/usr/bin/env python3

import socket
import time
import struct

ADDR = ("127.0.0.1", 3156)

def connect_disconnect_client(*, enable_rst: bool):
    client = socket.socket()
    if enable_rst:
        # send an RST when we call close()
        client.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER, struct.pack("ii", 1, 0))
    client.connect(ADDR)
    client.close()
    time.sleep(0.1) # let the FIN/RST reach the kernel's TCP/IP machinery

def main() -> None:
    server_server = socket.socket()
    server_server.bind(ADDR)
    server_server.listen(64)

    connect_disconnect_client(enable_rst=True)
    connect_disconnect_client(enable_rst=False)
    connect_disconnect_client(enable_rst=False)
    connect_disconnect_client(enable_rst=True)
    connect_disconnect_client(enable_rst=False)

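    for _ in range(5):
        try:
            server_client, server_client_addr = server_server.accept()
            print("Okay")
            server_client.close()
        except ConnectionAbortedError as e:
            # OpenBSD reports each RST-aborted, still-queued connection here
            print(e.strerror)

    server_server.close()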

if __name__ == "__main__":
    main()

On Linux the output is

Okay
Okay
Okay
Okay
Okay

On OpenBSD the output is

Software caused connection abort
Okay
Okay
Software caused connection abort
Okay

Observe that both kernels kept the aborted connections queued. I used OpenBSD 7.4 on Instant Workstation to test this.
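The client-side RST in these experiments comes from SO_LINGER: enabling lingering with a zero timeout makes close() abort the connection with an RST instead of the usual FIN. A minimal sketch, assuming the POSIX struct linger layout of two C ints (as on Linux and OpenBSD, but not Windows); make_rst_on_close_socket and read_linger are hypothetical helper names.

```python
import socket
import struct

# struct linger { int l_onoff; int l_linger; } on POSIX systems (assumption)
LINGER_FMT = "ii"

def make_rst_on_close_socket():
    """Create a TCP socket whose close() sends an RST instead of a FIN."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # l_onoff=1 enables lingering; l_linger=0 means abort immediately on close()
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER, struct.pack(LINGER_FMT, 1, 0))
    return sock

def read_linger(sock):
    """Read back the SO_LINGER setting as an (l_onoff, l_linger) tuple."""
    raw = sock.getsockopt(socket.SOL_SOCKET, socket.SO_LINGER, struct.calcsize(LINGER_FMT))
    return struct.unpack(LINGER_FMT, raw)
```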

BaseSelectorEventLoop._accept_connection's fix

To demonstrate asyncio's issue, the following test script connects five clients to a base_events.Server served by a selector_events.BaseSelectorEventLoop. Two of the clients are naughty and send an RST to abort their connection before it is accepted into userspace. We monkey patch in a print() statement just to let us know when BaseSelectorEventLoop._accept_connection is called. Ideally it should be called once, since the server's default backlog of 100 is sufficient, but as we will see, OpenBSD's raising of ConnectionAbortedError changes this:

#!/usr/bin/env python3

import socket
import asyncio
import time
import struct

ADDR = ("127.0.0.1", 31415)

def connect_disconnect_client(*, enable_rst: bool):
    client = socket.socket()
    if enable_rst:
        # send an RST when we call close()
        client.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER, struct.pack("ii", 1, 0))
    client.connect(ADDR)
    client.close()
    time.sleep(0.1) # let the FIN/RST reach the kernel's TCP/IP machinery

async def handler(reader: asyncio.StreamReader, writer: asyncio.StreamWriter):
    try:
        print("connected handler")
    finally:
        writer.close()

# monkey patch in a print() statement just for debugging sake
import asyncio.selector_events
_accept_connection_old = asyncio.selector_events.BaseSelectorEventLoop._accept_connection
def _accept_connection_new(*args, **kwargs):
    print("_accept_connection called")
    return _accept_connection_old(*args, **kwargs)
asyncio.selector_events.BaseSelectorEventLoop._accept_connection = _accept_connection_new

async def amain() -> None:
    server = await asyncio.start_server(handler, *ADDR)

    connect_disconnect_client(enable_rst=True)
    connect_disconnect_client(enable_rst=False)
    connect_disconnect_client(enable_rst=False)
    connect_disconnect_client(enable_rst=True)
    connect_disconnect_client(enable_rst=False)

    await server.start_serving()  # no-op: start_server() already called listen(); this just yields to the loop
    await server.serve_forever()

def main() -> None:
    asyncio.run(amain())

if __name__ == "__main__":
    main()

On Linux the output is

_accept_connection called
connected handler
connected handler
connected handler
connected handler
connected handler

On OpenBSD the output is

_accept_connection called
_accept_connection called
_accept_connection called
connected handler
connected handler
connected handler

The first _accept_connection call returns immediately because of client 1's ECONNABORTED. The second call brings in clients 2 and 3, then returns because of client 4's ECONNABORTED. The third call brings in client 5 and then finishes normally once the queue is empty.
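The call counting above can be checked with a small model of the accept queue. A sketch under simplifying assumptions: the selector keeps reporting the listening socket readable while anything is queued, an empty queue stands in for BlockingIOError, and count_accept_calls is a hypothetical function rather than asyncio code.

```python
def count_accept_calls(queue, continue_on_abort, backlog=100):
    """Return (handler_calls, accepted) needed to drain `queue`,
    a sequence of "ok" / "abort" entries in kernel accept-queue order."""
    pending = list(queue)
    calls = accepted = 0
    while pending:  # the selector reports the listening socket readable
        calls += 1
        for _ in range(backlog):
            if not pending:
                break  # accept() would raise BlockingIOError
            entry = pending.pop(0)
            if entry == "abort":  # accept() raises ConnectionAbortedError
                if continue_on_abort:
                    continue  # fixed behaviour: skip this connection
                break  # old behaviour: return from the handler
            accepted += 1
    return calls, accepted

clients = ["abort", "ok", "ok", "abort", "ok"]  # the five clients in the script
print(count_accept_calls(clients, continue_on_abort=False))  # (3, 3)
print(count_accept_calls(clients, continue_on_abort=True))   # (1, 3)
```

The old policy reproduces the three calls seen on OpenBSD, and the fixed one the single call.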

With the incoming PR's patch, the OpenBSD behaviour / output is corrected to

_accept_connection called
connected handler
connected handler
connected handler

All non-aborted connections are accepted in a single call to _accept_connection.
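If you'd rather check the call count programmatically than by reading print() output, the print() monkey patch in the asyncio script can be swapped for a counter. A sketch; count_calls is a hypothetical helper that only handles plain instance methods, and patching a private method such as _accept_connection is for debugging only. The patch has to be applied before the server starts serving, since the event loop registers the bound method with its selector at that point.

```python
import functools

def count_calls(cls, name):
    """Wrap the plain method `name` on `cls` so each call bumps a counter.
    Returns (counter_dict, restore_function)."""
    original = getattr(cls, name)
    counter = {"calls": 0}

    @functools.wraps(original)
    def wrapper(*args, **kwargs):
        counter["calls"] += 1
        return original(*args, **kwargs)

    setattr(cls, name, wrapper)

    def restore():
        setattr(cls, name, original)

    return counter, restore
```

For example, counter, restore = count_calls(asyncio.selector_events.BaseSelectorEventLoop, "_accept_connection") at the top of the script, then inspect counter["calls"] once the clients have been served.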

The Odyssey for ECONNABORTED on Linux

This is just a personal addendum for the record.

I use Linux, and I like collecting all the signal(7)s and errno(3)s; it reminds me in a way of Lego Star Wars: it's nice to have a complete collection. Part of Python's exception hierarchy is:

ConnectionError
├── BrokenPipeError
├── ConnectionAbortedError
├── ConnectionRefusedError
└── ConnectionResetError
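The same hierarchy can be read back from a running interpreter. A quick sketch; builtin_connection_errors is a hypothetical helper, and the __module__ filter simply skips any subclasses other modules might define.

```python
def builtin_connection_errors():
    """Names of the built-in exceptions that directly subclass ConnectionError."""
    return sorted(
        cls.__name__
        for cls in ConnectionError.__subclasses__()
        if cls.__module__ == "builtins"
    )

print(builtin_connection_errors())
# ['BrokenPipeError', 'ConnectionAbortedError', 'ConnectionRefusedError', 'ConnectionResetError']
print(ConnectionError.__mro__[1].__name__)  # OSError
```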

In my past two years of socket programming on Linux, for AF_INET and AF_UNIX, I have easily been able to produce ConnectionRefusedError, ConnectionResetError, and BrokenPipeError, but I have never been able to produce ConnectionAbortedError from accept(). Looking at the Linux kernel's source code in net/socket.c and net/ipv4/, which implement sockets and TCP/IP, I can only conclude that ECONNABORTED could possibly occur as a race condition between ops->accept() and ops->getname(), in the nanosecond-wide window when the socket is not protected by a spinlock.

I've tried various TCP situations, including TCP_FASTOPEN, TCP_NODELAY and O_NONBLOCK connect()s combined with SO_LINGER, trying to create the most disgusting TCP handshakes, all to no avail. A SYN, SYNACK, RST sequence gets dropped and is never accept()ed.

So, to any similarly eclectically minded programmers out there who wish to know, for the record, how to get accept(2) to produce ECONNABORTED: just try the scripts above on OpenBSD and save your time lol!

This one's for you, OpenBSD friends, thanks for OpenSSH!

CPython versions tested on:

CPython main branch

Operating systems tested on:

Other

Metadata

Assignees: No one assigned
Labels: 3.12 (only security fixes), 3.13 (bugs and security fixes), 3.14 (bugs and security fixes), stdlib (Python modules in the Lib dir), topic-asyncio, type-bug (An unexpected behavior, bug, or error)
Status: Done
Milestone: No milestone