Bug report
I've seen this in the free-threaded build, but I think the problem can theoretically occur in the default build as well.
The problem is that after a fork(), an already dead ThreadHandle may be deallocated before it's marked as not joinable. The ThreadHandle_dealloc() function can crash in PyThread_detach_thread():
cpython/Modules/_threadmodule.c, lines 66 to 70 in bcccf1f
The steps leading to the crash are:
- A thread T2 starts and finishes, but is not joined. The ThreadHandle is not immediately deallocated, either because it's part of a larger reference cycle or due to biased reference counting (in the free-threaded build).
- The main thread calls fork().
- In the child process, during PyOS_AfterFork_Child(), the ThreadHandle is deallocated. I've seen this happen in the free-threaded build due to biased reference counting merging the thread states in PyThreadState_Clear(). I believe this can also happen in the default build if, for example, a GC is triggered early on during threading._after_fork() before we get to marking the ThreadHandle as not joinable.
Proposed fix
Early on in PyOS_AfterFork_Child(), we should fix up all ThreadHandle objects from C (without executing Python code) -- we should mark the dead ones as not joinable and update the remaining active thread.
I think it's important to do this without executing Python code. Once we start executing Python code, almost anything can happen, such as GC collections, destructors, etc.
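To make the ordering argument concrete, here is a toy model in Python. It is not CPython's implementation: the real fix would live in C, and ToyHandle, fix_up_handles, and all field names are hypothetical. It also assumes that "updating" the surviving thread means refreshing its identity, and that a still-joinable handle tries to detach its thread when deallocated.

```python
from dataclasses import dataclass


@dataclass
class ToyHandle:
    """Hypothetical stand-in for a ThreadHandle; not CPython's real layout."""

    ident: int
    joinable: bool = True
    detached_stale_thread: bool = False

    def dealloc(self, live_idents):
        # Modeled hazard: a joinable handle detaches its thread on dealloc.
        # If that thread does not exist in this process, record the misuse.
        if self.joinable and self.ident not in live_idents:
            self.detached_stale_thread = True


def fix_up_handles(handles, old_ident, new_ident):
    """Fix up every handle in one pass, before anything else can run."""
    for handle in handles:
        if handle.ident == old_ident:
            # The one thread that survived fork(): refresh its identity.
            handle.ident = new_ident
        else:
            # Threads that did not survive: never detach or join them.
            handle.joinable = False
```

In this model, a dead handle that is deallocated after fix_up_handles() has run is harmless, while one deallocated before the pass sets detached_stale_thread. That is the window the proposal aims to close by doing the fix-up before any Python code runs.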
cc @pitrou @gpshead @ericsnowcurrently