Bug report
tl;dr Switching between interpreters while finalizing causes the main thread to exit. The fix should be simple.
We use PyThreadState_Swap() to switch between interpreters. That function almost immediately calls _PyEval_AcquireLock(). During finalization, _PyEval_AcquireLock() immediately causes the thread to exit if the current thread state doesn't match the one that was active when Py_FinalizeEx() was called.
Thus, if we switch interpreters during finalization then the thread will exit. If we do this in the finalizing (main) thread then the process immediately exits with an exit code of 0.
One notable consequence is that a Python process with an unhandled exception will print the traceback like normal but can end up with an exit code of 0 instead of 1 (and some of the runtime finalization code never gets executed).[1]
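To make the failure mode concrete, here is a toy Python model of the check. It is not CPython code. Runtime, swap(), acquire_lock() and must_exit() are hypothetical stand-ins for PyThreadState_Swap(), _PyEval_AcquireLock() and _PyThreadState_MustExit(), and the ThreadExit exception stands in for PyThread_exit_thread(). The model shows why comparing tstate identity treats "the finalizing thread switched to a subinterpreter's tstate" exactly like "some other thread tried to run".

```python
class ThreadState:
    """Toy tstate: one per (interpreter, OS thread) pair."""

    def __init__(self, interp, thread_id):
        self.interp = interp
        self.thread_id = thread_id


class ThreadExit(BaseException):
    """Hypothetical stand-in for PyThread_exit_thread(): the thread just stops."""


class Runtime:
    def __init__(self):
        self.current = None     # the tstate active in this (single) thread
        self.finalizing = None  # tstate recorded when finalization starts

    def must_exit(self, tstate):
        # Pre-fix logic: any tstate other than the recorded one must exit,
        # even if it belongs to the finalizing thread itself.
        return self.finalizing is not None and tstate is not self.finalizing

    def acquire_lock(self, tstate):
        if self.must_exit(tstate):
            raise ThreadExit
        # (a real runtime would take the GIL here)

    def swap(self, tstate):
        # Like PyThreadState_Swap(): acquiring the lock happens first.
        self.acquire_lock(tstate)
        old, self.current = self.current, tstate
        return old


def finalize_after_unhandled_exception(rt, main_ts, sub_ts):
    """Return the exit code the modeled process ends up with."""
    rt.current = main_ts
    rt.finalizing = main_ts  # "mark the runtime as finalizing"
    try:
        # Dropping "interpid" tears down the subinterpreter, which first
        # switches to one of the subinterpreter's tstates...
        rt.swap(sub_ts)
        rt.swap(main_ts)
    except ThreadExit:
        # ...so the main thread exits; per this report, the process then
        # ends with status 0.
        return 0
    return 1  # what main() would return for an unhandled exception


rt = Runtime()
main_ts = ThreadState("main", thread_id=1)
sub_ts = ThreadState("sub", thread_id=1)  # same OS thread, other interpreter
print(finalize_after_unhandled_exception(rt, main_ts, sub_ts))  # prints 0
```

Outside finalization (rt.finalizing is None), the same swap() calls succeed, which matches the report that the problem only shows up once Py_FinalizeEx() has started.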
Reproducer
```console
$ cat > script.py << EOF
import _xxsubinterpreters as _interpreters
interpid = _interpreters.create()
raise Exception
EOF
$ ./python script.py
Traceback (most recent call last):
  File ".../check-swapped-exitcode.py", line 3, in <module>
    raise Exception
Exception
$ echo $?
0
```
In this case, "interpid" is a PyInterpreterIDObject bound to the __main__ module (of the main interpreter). It is still bound there when the script ends and the executable starts finalizing the runtime by calling Py_FinalizeEx().[2]
Here's what happens in Py_FinalizeEx():
- wait for non-daemon threads to finish[3]
- run any remaining pending calls belonging to the main interpreter
- run atexit hooks
- mark the runtime as finalizing (storing the pointer to the current tstate, which belongs to the main interpreter)
- delete all other tstates belonging to the main interpreter (i.e. all daemon threads)
- remove our custom signal handlers
- finalize the import state
- clean up sys.modules of the main interpreter (finalize_modules() in Python/pylifecycle.c)
At that point, the following happens:
- the __main__ module is dealloc'ed
- "interpid" is dealloc'ed (PyInterpreterID_Type.tp_dealloc)
- _PyInterpreterState_IDDecref() is called, which finalizes the corresponding interpreter state[4]
- before Py_EndInterpreter() is called, we call _PyThreadState_Swap() to switch to a tstate belonging to the subinterpreter
- that calls _PyEval_AcquireLock()
- that basically calls _PyThreadState_MustExit(), which sees that the current tstate pointer isn't the one we stored as "finalizing"
- it then calls PyThread_exit_thread(), which kills the main thread
- the process exits with an exit code of 0
Notably, the rest of Py_FinalizeEx() (and Py_Main(), etc.) never executes, so main() never gets a chance to return an exit code of 1.
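One way to turn the reproducer into an automated check is to run it in a fresh process and look only at the exit code. The sketch below is a hypothetical harness (exit_code_of() and REPRODUCER are illustrative names, not part of CPython's test suite). It assumes the interpreter under test ships the private _xxsubinterpreters module; since that module is private, its name and availability can differ between versions, so the script first checks that it can be found.

```python
import importlib.util
import subprocess
import sys
import tempfile
from pathlib import Path

# The reproducer from this report; _xxsubinterpreters is a private module,
# so whether it exists depends on the CPython version being tested.
REPRODUCER = """\
import _xxsubinterpreters as _interpreters
interpid = _interpreters.create()
raise Exception
"""


def exit_code_of(source, python=sys.executable):
    """Run `source` as a script in a fresh process and return its exit code."""
    with tempfile.TemporaryDirectory() as tmp:
        script = Path(tmp) / "script.py"
        script.write_text(source)
        proc = subprocess.run([python, str(script)], capture_output=True)
        return proc.returncode


if __name__ == "__main__":
    if importlib.util.find_spec("_xxsubinterpreters") is None:
        # An ImportError would also exit with 1 and hide the bug.
        print("_xxsubinterpreters not available; skipping")
    else:
        # An unhandled exception should give 1; the bug described here gives 0.
        print("exit code:", exit_code_of(REPRODUCER))
```

The find_spec() guard matters because a failed import also exits with status 1, which would make an affected build look fine.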
Background
Runtime finalization happens in whichever thread called Py_FinalizeEx() and happens relative to whichever PyThreadState is active there. This is typically the main thread and the main interpreter.
Other threads may still be running when we start finalization, whether daemon threads or not, and each of those threads has a thread state corresponding to the interpreter that is active in that thread.[4] One of the first things we do during finalization is to wait for all non-daemon threads to finish running. Daemon threads are a different story. They must die!
Back in 2011 we identified that daemon threads were interfering with finalization, sometimes causing crashes or making the Python executable hang.[5] At the time, we applied a best-effort solution where we kill the current thread if it isn't the one where Py_FinalizeEx() was called.
However, that solution checked the tstate pointer rather than the thread ID, so swapping interpreters in the finalizing thread was broken, and here we are.
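A small illustration of the difference between the two checks. must_exit_by_tstate() and must_exit_by_thread() are hypothetical predicates, not CPython functions, and the thread IDs are made-up integers. The sketch evaluates both predicates for the finalizing tstate, for a subinterpreter's tstate in the same OS thread, and for a daemon thread's tstate. It is only meant to show why comparing thread identity would still stop daemon threads while letting the finalizing thread switch interpreters; it is not necessarily how the linked PRs implement the fix.

```python
from dataclasses import dataclass


@dataclass(eq=False)  # eq=False: tstates compare by identity, like pointers
class ThreadState:
    interp: str
    thread_id: int  # hypothetical OS thread ID


def must_exit_by_tstate(finalizing, tstate):
    """Pre-fix style: exit unless this is the exact tstate that started finalizing."""
    return finalizing is not None and tstate is not finalizing


def must_exit_by_thread(finalizing, tstate):
    """Thread-based alternative: exit only when running in some other OS thread."""
    return finalizing is not None and tstate.thread_id != finalizing.thread_id


MAIN_THREAD, DAEMON_THREAD = 1, 2
finalizing = ThreadState("main", MAIN_THREAD)
cases = {
    "finalizing tstate": finalizing,
    "subinterpreter, same thread": ThreadState("sub", MAIN_THREAD),
    "daemon thread": ThreadState("main", DAEMON_THREAD),
}
for name, ts in cases.items():
    print(name, must_exit_by_tstate(finalizing, ts), must_exit_by_thread(finalizing, ts))
# finalizing tstate False False
# subinterpreter, same thread True False
# daemon thread True True
```

The thread-based variant leans on footnote [4]: each tstate (mostly) corresponds to exactly one OS thread, so the thread ID identifies "the finalizing thread" regardless of which interpreter is active in it.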
History:
- shutdown (exit) can hang or segfault with daemon threads running #46164 (2011; commit 0d5e52d)
  - exit thread during finalization in PyEval_RestoreThread() (also add _Py_Finalizing)
- gh-??? (2014; commit 17548dd)
  - do same in _PyEval_EvalFrameDefault() (eval loop, right after re-acquiring GIL when handling eval breaker)
- PyEval_AcquireLock() and PyEval_AcquireThread() do not handle runtime finalization properly. #80656 (2019; PR: bpo-36475: Finalize PyEval_AcquireLock() and PyEval_AcquireThread() properly #12667)
  - do same in PyEval_AcquireLock() and PyEval_AcquireThread() (also add exit_thread_if_finalizing())
- Daemon thread is crashing in PyEval_RestoreThread() while the main thread is exiting the process #84058 (2020; PR: bpo-39877: Fix PyEval_RestoreThread() for daemon threads #18811)
  - use _PyRuntime directly
- Daemon thread is crashing in PyEval_RestoreThread() while the main thread is exiting the process #84058 (2020; PR: bpo-39877: Refactor take_gil() function #18885)
  - move all the checks to take_gil()
Related: gh-87135 (PRs: gh-105805, gh-28525)
Linked PRs
- gh-109793: Allow Switching Interpreters During Finalization #109794
- [3.12] gh-109793: Allow Switching Interpreters During Finalization (gh-109794) #110705
Footnotes
[1] This may help explain why, when we re-run some tests in subprocesses, they aren't marked as failures even when they actually fail.
[2] Note that we did not create any extra threads; we stayed exclusively in the main thread. We also didn't even run any code in the subinterpreter.
[3] FYI, IIRC we used to abort right before this point if there were any subinterpreters still around.
[4] In any given OS thread, each interpreter has a distinct tstate. Each tstate (mostly) corresponds to exactly one OS thread.
[5] If a daemon thread keeps running and tries to access any objects or other runtime state, then there's a decent chance of a crash.