Description
Bug report
Bug description:
Apologies if this is a duplicate. I couldn’t find a similar report, though.
The issue and how to reproduce
We’re seeing a performance regression since 124af17. The best way to reproduce it is to spawn lots of processes from a ThreadPoolExecutor
. For example:
#!/usr/bin/env python3
from concurrent.futures import ThreadPoolExecutor, wait
from subprocess import Popen
def target(i):
p = Popen(['touch', f'/tmp/reproc-{i}'])
p.communicate()
executor = ThreadPoolExecutor(max_workers=64)
futures = set()
for i in range(10_000):
futures.add(executor.submit(target, i))
wait(futures)
Before, on 49cae39, it’s roughly this:
real 0m2.419s
user 0m4.524s
sys 0m0.976s
Since 124af17, it’s roughly this:
real 0m11.772s
user 0m10.287s
sys 0m14.409s
An attempt at an analysis and possible fix
strace
shows that the new code doesn’t use vfork()
anymore but clone()
. I believe that the reason for this is an incorrect check of num_groups
(or extra_group_size
, as it is now called on main
).
124af17 checks if extra_group_size
is less than zero to determine if we can use vfork()
. Is that correct? Maybe this should be equal to zero? I’m talking about these two locations (diff relative to main
/9e56eedd018e1a46):
diff --git a/Modules/_posixsubprocess.c b/Modules/_posixsubprocess.c
index 2898eedc3e..fb6c235901 100644
--- a/Modules/_posixsubprocess.c
+++ b/Modules/_posixsubprocess.c
@@ -889,7 +889,7 @@ do_fork_exec(char *const exec_array[],
/* These are checked by our caller; verify them in debug builds. */
assert(uid == (uid_t)-1);
assert(gid == (gid_t)-1);
- assert(extra_group_size < 0);
+ assert(extra_group_size == 0);
assert(preexec_fn == Py_None);
/* Drop the GIL so that other threads can continue execution while this
@@ -1208,7 +1208,7 @@ subprocess_fork_exec_impl(PyObject *module, PyObject *process_args,
/* Use vfork() only if it's safe. See the comment above child_exec(). */
sigset_t old_sigs;
if (preexec_fn == Py_None && allow_vfork &&
- uid == (uid_t)-1 && gid == (gid_t)-1 && extra_group_size < 0) {
+ uid == (uid_t)-1 && gid == (gid_t)-1 && extra_group_size == 0) {
/* Block all signals to ensure that no signal handlers are run in the
* child process while it shares memory with us. Note that signals
* used internally by C libraries won't be blocked by
extra_group_size
is the result of the call to PySequence_Size(extra_groups_packed)
. If I understand the docs correctly, then this function only returns negative values to indicate errors. This error condition is already checked, right after the call itself:
extra_group_size = PySequence_Size(extra_groups_packed);
if (extra_group_size < 0)
goto cleanup;
Later in the code, extra_group_size
can never be less than zero. It can, however, be equal to zero if extra_groups
is an empty list. I believe this is what was meant to be checked here.
I’ll happily open a PR for this if you agree that this is the way to go.
CPython versions tested on:
3.11, 3.12, 3.13, CPython main branch
Operating systems tested on:
Linux
Linked PRs
Metadata
Metadata
Assignees
Labels
Projects
Status