Skip to content

subprocess.Popen: Performance regression on Linux since 124af17b6e [CVE-2023-6507] #112334

Closed
@vain

Description

@vain

Bug report

Bug description:

Apologies if this is a duplicate. I couldn’t find a similar report, though.

The issue and how to reproduce

We’re seeing a performance regression since 124af17. The best way to reproduce it is to spawn lots of processes from a ThreadPoolExecutor. For example:

#!/usr/bin/env python3


from concurrent.futures import ThreadPoolExecutor, wait
from subprocess import Popen


def target(i):
    p = Popen(['touch', f'/tmp/reproc-{i}'])
    p.communicate()


executor = ThreadPoolExecutor(max_workers=64)
futures = set()

for i in range(10_000):
    futures.add(executor.submit(target, i))

wait(futures)

Before, on 49cae39, it’s roughly this:

real    0m2.419s
user    0m4.524s
sys     0m0.976s

Since 124af17, it’s roughly this:

real    0m11.772s
user    0m10.287s
sys     0m14.409s

An attempt at an analysis and possible fix

strace shows that the new code doesn’t use vfork() anymore but clone(). I believe that the reason for this is an incorrect check of num_groups (or extra_group_size, as it is now called on main).

124af17 checks if extra_group_size is less than zero to determine if we can use vfork(). Is that correct? Maybe this should be equal to zero? I’m talking about these two locations (diff relative to main/9e56eedd018e1a46):

diff --git a/Modules/_posixsubprocess.c b/Modules/_posixsubprocess.c
index 2898eedc3e..fb6c235901 100644
--- a/Modules/_posixsubprocess.c
+++ b/Modules/_posixsubprocess.c
@@ -889,7 +889,7 @@ do_fork_exec(char *const exec_array[],
         /* These are checked by our caller; verify them in debug builds. */
         assert(uid == (uid_t)-1);
         assert(gid == (gid_t)-1);
-        assert(extra_group_size < 0);
+        assert(extra_group_size == 0);
         assert(preexec_fn == Py_None);

         /* Drop the GIL so that other threads can continue execution while this
@@ -1208,7 +1208,7 @@ subprocess_fork_exec_impl(PyObject *module, PyObject *process_args,
     /* Use vfork() only if it's safe. See the comment above child_exec(). */
     sigset_t old_sigs;
     if (preexec_fn == Py_None && allow_vfork &&
-        uid == (uid_t)-1 && gid == (gid_t)-1 && extra_group_size < 0) {
+        uid == (uid_t)-1 && gid == (gid_t)-1 && extra_group_size == 0) {
         /* Block all signals to ensure that no signal handlers are run in the
          * child process while it shares memory with us. Note that signals
          * used internally by C libraries won't be blocked by

extra_group_size is the result of the call to PySequence_Size(extra_groups_packed). If I understand the docs correctly, then this function only returns negative values to indicate errors. This error condition is already checked, right after the call itself:

extra_group_size = PySequence_Size(extra_groups_packed);

if (extra_group_size < 0)
    goto cleanup;

Later in the code, extra_group_size can never be less than zero. It can, however, be equal to zero if extra_groups is an empty list. I believe this is what was meant to be checked here.

I’ll happily open a PR for this if you agree that this is the way to go.

CPython versions tested on:

3.11, 3.12, 3.13, CPython main branch

Operating systems tested on:

Linux

Linked PRs

Metadata

Metadata

Labels

3.12only security fixes3.13bugs and security fixesextension-modulesC modules in the Modules dirperformancePerformance or resource usagerelease-blockertype-bugAn unexpected behavior, bug, or errortype-securityA security issue

Projects

Status

Done

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions