Description
Bug report
When using a ProcessPoolExecutor with forked child processes, if one of the child processes suddenly dies (a segmentation fault, not a Python exception) while data is simultaneously being sent into the call queue, the parent process hangs forever.
Reproduction
import ctypes
from concurrent.futures import ProcessPoolExecutor


def segfault():
    ctypes.string_at(0)


def func(i, data):
    print(f"Start {i}.")
    if i == 1:
        segfault()
    print(f"Done {i}.")
    return i


data = list(range(100_000_000))
count = 10
with ProcessPoolExecutor(2) as pool:
    list(pool.map(func, range(count), [data] * count))
print(f"OK")
In Python 3.8.10 this raises a BrokenProcessPool exception, whereas in 3.9.13 and 3.10.5 it hangs.
Analysis
When a crash happens in a child process, all workers are terminated and they stop reading from the communication pipes. However, if data is being sent into the call queue at that moment, the call queue's feeder thread, which writes data from the buffer to the pipe (multiprocessing.queues.Queue._feed), can get stuck in send_bytes(obj) when the Unix pipe it is writing to is full. The _ExecutorManagerThread is then blocked in self.join_executor_internals(), called from self.terminate_broken() (Lib/concurrent/futures/process.py, line 515 at commit da49128). The main thread itself is blocked in the __exit__ method of the Executor (Lib/concurrent/futures/process.py, line 775 at commit da49128).
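The feeder-thread behaviour can be shown in isolation, without any executor or child process. The following standalone sketch (my own illustration, not code from the executor) puts a payload much larger than the OS pipe buffer into a multiprocessing.Queue that nobody ever reads, then tries to shut the queue down; join_thread() hangs because the feeder thread never gets past send_bytes():

import multiprocessing as mp

if __name__ == "__main__":
    q = mp.Queue()
    # ~64 MiB payload, far larger than the OS pipe buffer, and nothing ever
    # reads the other end: the feeder thread blocks inside send_bytes().
    q.put(b"x" * (1 << 26))
    q.close()        # only appends a sentinel to the feeder's in-memory buffer
    q.join_thread()  # hangs forever: the feeder never reaches the sentinel
    print("never reached")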
Proposed solution
Drain the call queue buffer either in the terminate_broken method, before calling join_executor_internals, or in the queue's close method.
I will create a pull request with a possible implementation.
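As a rough illustration of the idea (a minimal sketch on a bare multiprocessing.Queue, not the actual patch; drain here is a hypothetical helper), reading the pipe dry lets the feeder thread finish its pending write, so the queue can then be closed and joined without hanging:

import multiprocessing as mp
import queue

def drain(q):
    # Hypothetical helper: read and discard whatever already reached the pipe
    # so the feeder thread's pending send_bytes() can complete.
    while True:
        try:
            q.get(timeout=0.1)
        except queue.Empty:
            return

if __name__ == "__main__":
    q = mp.Queue()
    q.put(b"x" * (1 << 26))  # feeder thread blocks inside send_bytes()
    drain(q)                 # frees the pipe; the feeder finishes its write
    q.close()                # appends the shutdown sentinel to the buffer
    q.join_thread()          # now returns instead of hanging forever
    print("queue shut down cleanly")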
Your environment
- CPython versions tested on: reproduced in 3.10.5 and 3.9.13 (works as expected in 3.8.10: a BrokenProcessPool exception is raised)
- Operating system and architecture: Linux, x86_64