Description
Bug report
When using a ProcessPoolExecutor with forked child processes, if one of the child processes suddenly dies (a segmentation fault, not a Python exception) while data is simultaneously being sent into the call queue, the parent process hangs forever.
Reproduction
import ctypes
from concurrent.futures import ProcessPoolExecutor


def segfault():
    ctypes.string_at(0)


def func(i, data):
    print(f"Start {i}.")
    if i == 1:
        segfault()
    print(f"Done {i}.")
    return i


data = list(range(100_000_000))
count = 10
with ProcessPoolExecutor(2) as pool:
    list(pool.map(func, range(count), [data] * count))
print(f"OK")
In Python 3.8.10 this raises a BrokenProcessPool exception, whereas in 3.9.13 and 3.10.5 it hangs.
Analysis
When a crash happens in a child process, all workers are terminated and they stop reading from the communication pipes. However, if data is being sent into the call queue at that moment, the call queue's feeder thread, which writes data from the buffer to the pipe (multiprocessing.queues.Queue._feed), can get stuck in send_bytes(obj) when the Unix pipe it is writing to is full. The _ExecutorManagerThread is then blocked in self.join_executor_internals(), called from self.terminate_broken() (Lib/concurrent/futures/process.py, line 515 at commit da49128). The main thread itself is blocked in the __exit__ method of the Executor (Lib/concurrent/futures/process.py, line 775 at commit da49128).
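The feeder-thread behaviour can be shown in isolation, without any executor or child process. The following standalone sketch (my own illustration, not code from the executor) puts a payload much larger than the OS pipe buffer into a multiprocessing.Queue that nobody ever reads, then tries to shut the queue down; join_thread() hangs because the feeder thread never gets past send_bytes():

import multiprocessing as mp

if __name__ == "__main__":
    q = mp.Queue()
    # ~64 MiB payload, far larger than the OS pipe buffer, and nothing ever
    # reads the other end: the feeder thread blocks inside send_bytes().
    q.put(b"x" * (1 << 26))
    q.close()        # only appends a sentinel to the feeder's in-memory buffer
    q.join_thread()  # hangs forever: the feeder never reaches the sentinel
    print("never reached")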
Proposed solution
Drain the call queue buffer either in the terminate_broken method, before calling join_executor_internals, or in the queue's close method.
I will create a pull request with a possible implementation.
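As a rough illustration of the idea (a minimal sketch on a bare multiprocessing.Queue, not the actual patch; drain here is a hypothetical helper), reading the pipe dry lets the feeder thread finish its pending write, so the queue can then be closed and joined without hanging:

import multiprocessing as mp
import queue

def drain(q):
    # Hypothetical helper: read and discard whatever already reached the pipe
    # so the feeder thread's pending send_bytes() can complete.
    while True:
        try:
            q.get(timeout=0.1)
        except queue.Empty:
            return

if __name__ == "__main__":
    q = mp.Queue()
    q.put(b"x" * (1 << 26))  # feeder thread blocks inside send_bytes()
    drain(q)                 # frees the pipe; the feeder finishes its write
    q.close()                # appends the shutdown sentinel to the buffer
    q.join_thread()          # now returns instead of hanging forever
    print("queue shut down cleanly")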
Your environment
- CPython versions tested on: reproduced in 3.10.5 and 3.9.13 (works as expected in 3.8.10: a BrokenProcessPool exception is raised)
- Operating system and architecture: Linux, x86_64