PyThreadState_Swap() During Finalization Causes Immediate Exit (AKA Daemon Threads Are Still the Worst!)

# Bug report

tl;dr Switching between interpreters while finalizing causes the main thread to exit.  The fix should be simple.

We use `PyThreadState_Swap()` to switch between interpreters.  That function almost immediately calls `_PyEval_AcquireLock()`.  During finalization, `_PyEval_AcquireLock()` immediately causes the thread to exit if the current thread state doesn't match the one that was active when `Py_FinalizeEx()` was called.

Thus, if we switch interpreters during finalization then the thread will exit.  If we do this in the finalizing (main) thread then the process immediately exits with an exit code of 0.

One notable consequence is that a Python process with an unhandled exception will print the traceback like normal but can end up with an exit code of 0 instead of 1 (and some of the runtime finalization code never gets executed). [^1]

[^1]: This may help explain why, when we re-run some tests in subprocesses, they aren't marked as failures even when they actually fail.
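
One way to check for this symptom from a parent process is to compare a child's exit code against what an unhandled exception should produce. The following is only a sketch: the helper name `exit_code_of` is hypothetical, and the child shown here is a plain `raise Exception` with no subinterpreters, so it is unaffected by the bug and serves as the baseline.

```python
import subprocess
import sys


def exit_code_of(source: str) -> int:
    """Run `source` in a fresh interpreter and return its exit code.

    Hypothetical helper for illustration; not part of CPython's test suite.
    """
    proc = subprocess.run(
        [sys.executable, "-c", source],
        capture_output=True,  # keep the child's traceback off our own stderr
        text=True,
    )
    return proc.returncode


# Baseline: with no subinterpreters involved, an unhandled exception
# makes the interpreter exit with a non-zero code (1).
baseline = exit_code_of("raise Exception")
print(baseline)
```

Passing the reproducer's source to the same helper on an affected build would be expected to return 0 instead, which is the mismatch described above.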


## Reproducer

```shell
$ cat > script.py << EOF
import _xxsubinterpreters as _interpreters
interpid = _interpreters.create()
raise Exception
EOF
$ ./python script.py
Traceback (most recent call last):
  File ".../check-swapped-exitcode.py", line 3, in <module>
    raise Exception
Exception
$ echo $?
0
```

In this case, "interpid" is a `PyInterpreterIDObject` bound to the `__main__` module (of the main interpreter).  It is still bound there when the script ends and the executable starts finalizing the runtime by calling `Py_FinalizeEx()`. [^2]

[^2]: Note that we did not create any extra threads; we stayed exclusively in the main thread.  We also didn't even run any code in the subinterpreter.

Here's what happens in `Py_FinalizeEx()`:

1. wait for non-daemon threads to finish [^3]
2. run any remaining pending calls belonging to the main interpreter
3. run atexit hooks
4. mark the runtime as finalizing (storing the pointer to the current tstate, which belongs to the main interpreter)
5. delete all other tstates belonging to the main interpreter (i.e. all daemon threads)
6. remove our custom signal handlers
7. finalize the import state
8. clean up `sys.modules` of the main interpreter (`finalize_modules()` in Python/pylifecycle.c)

[^3]: FYI, IIRC we used to abort right before this point if there were any subinterpreters around still.
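
The relative order of steps 1 and 3 can be observed from pure Python. This is a small sketch, assuming a regular CPython build; the `CHILD` script and the `finalization_order` helper are made up for illustration, and the later steps are not visible at this level.

```python
import subprocess
import sys
import textwrap

# Child script: a non-daemon thread outlives the main module, and an
# atexit hook is registered before the module finishes.
CHILD = textwrap.dedent("""
    import atexit
    import threading
    import time

    def worker():
        time.sleep(0.2)
        print("worker finished", flush=True)

    atexit.register(lambda: print("atexit hook ran", flush=True))
    threading.Thread(target=worker).start()
    print("main module done", flush=True)
""")


def finalization_order() -> list[str]:
    """Run the child and return the lines it printed, in order."""
    proc = subprocess.run(
        [sys.executable, "-c", CHILD],
        capture_output=True,
        text=True,
        check=True,
        timeout=30,
    )
    return proc.stdout.splitlines()


order = finalization_order()
print(order)
```

The non-daemon worker is waited for first (step 1), so its line appears before the atexit hook's line (step 3).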

At that point the following happens:

1. the `__main__` module is dealloc'ed
2. "interpid" is dealloc'ed (`PyInterpreterID_Type.tp_dealloc`)
3. `_PyInterpreterState_IDDecref()` is called, which finalizes the corresponding interpreter state
4. before `Py_EndInterpreter()` is called, we call `_PyThreadState_Swap()` to switch to a tstate belonging to the subinterpreter
5. that calls `_PyEval_AcquireLock()`
6. that basically calls `_PyThreadState_MustExit()`, which sees that the current tstate pointer isn't the one we stored as "finalizing"
7. it then calls `PyThread_exit_thread()`, which kills the main thread
8. the process exits with an exitcode of 0

Notably, the rest of `Py_FinalizeEx()` (and `Py_Main()`, etc.) does *not* execute.  `main()` never gets a chance to return an exitcode of 1.


## Background

Runtime finalization happens in whichever thread called `Py_FinalizeEx()` and happens relative to whichever `PyThreadState` is active there.  This is typically the main thread and the main interpreter.

Other threads may still be running when we start finalization, whether daemon threads or not, and each of those threads has a thread state corresponding to the interpreter that is active in that thread. [^4]  One of the first things we do during finalization is to wait for all non-daemon threads to finish running.  Daemon threads are a different story.  They must die!

[^4]: In any given OS thread, each interpreter has a distinct tstate.  Each tstate (mostly) corresponds to exactly one OS thread.
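
The difference in how daemon threads are treated is easy to see from Python. A minimal sketch, with a made-up `CHILD` script and helper name: the daemon thread is still sleeping when the main module ends, finalization does not wait for it, and its final `print` never runs.

```python
import subprocess
import sys
import textwrap

# Child script: the daemon thread would need 30 seconds to finish,
# but the main module ends right away.
CHILD = textwrap.dedent("""
    import threading
    import time

    def background():
        time.sleep(30)
        print("daemon finished", flush=True)

    threading.Thread(target=background, daemon=True).start()
    print("main module done", flush=True)
""")


def run_with_daemon_thread() -> subprocess.CompletedProcess:
    """Run the child; the timeout guards against waiting on the daemon."""
    return subprocess.run(
        [sys.executable, "-c", CHILD],
        capture_output=True,
        text=True,
        timeout=20,
    )


result = run_with_daemon_thread()
print(result.stdout.splitlines())
```

The child returns well before the 30 second sleep elapses, which shows that the daemon thread was abandoned rather than joined.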

Back in 2011 we identified that daemon threads were interfering with finalization, sometimes causing crashes or making the Python executable hang.  [^5]  At the time, we applied a best-effort solution where we kill the current thread if it isn't the one where `Py_FinalizeEx()` was called.

[^5]: If a daemon thread keeps running and tries to access any objects or other runtime state then there's a decent chance of a crash.

However, that solution checked the tstate pointer rather than the thread ID, so swapping interpreters in the finalizing thread was broken, and here we are.
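
The distinction can be illustrated with a toy model. This is not CPython code: the classes and function names below are invented for the example, and the thread-ID variant is only one plausible shape for a fix, which the linked PRs may or may not follow.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(eq=False)  # eq=False: instances compare by identity, like C pointers
class ThreadState:
    interp: str      # which interpreter this tstate belongs to
    thread_id: int   # which OS thread this tstate is bound to


@dataclass
class Runtime:
    finalizing: Optional[ThreadState] = None  # tstate stored when finalization starts


def must_exit_by_tstate(runtime: Runtime, tstate: ThreadState) -> bool:
    """Models the check described above: compare tstate pointers."""
    return runtime.finalizing is not None and runtime.finalizing is not tstate


def must_exit_by_thread_id(runtime: Runtime, tstate: ThreadState) -> bool:
    """Assumed alternative: compare the owning thread instead."""
    return (
        runtime.finalizing is not None
        and runtime.finalizing.thread_id != tstate.thread_id
    )


MAIN_THREAD, DAEMON_THREAD = 1, 2
main_tstate = ThreadState("main", MAIN_THREAD)
sub_tstate = ThreadState("subinterpreter", MAIN_THREAD)  # same thread, other interpreter
daemon_tstate = ThreadState("main", DAEMON_THREAD)

runtime = Runtime(finalizing=main_tstate)  # finalization started in the main thread

for name, tstate in [("main", main_tstate), ("sub", sub_tstate), ("daemon", daemon_tstate)]:
    print(name, must_exit_by_tstate(runtime, tstate), must_exit_by_thread_id(runtime, tstate))
```

In this model both checks still stop the daemon thread, but only the pointer comparison treats the subinterpreter's tstate in the finalizing thread as something that must exit.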

History:

* gh-46164 (2011; commit 0d5e52d3469) - exit thread during finalization in `PyEval_RestoreThread()` (also add `_Py_Finalizing`)
* gh-???  (2014; commit 17548dda51d) - do same in `_PyEval_EvalFrameDefault()` (eval loop, right after re-acquiring GIL when handling eval breaker)
* gh-80656 (2019; PR: gh-12667) - do same in `PyEval_AcquireLock()` and `PyEval_AcquireThread()` (also add `exit_thread_if_finalizing()`)
* gh-84058 (2020; PR: gh-18811) - use `_PyRuntime` directly
* gh-84058 (2020; PR: gh-18885) - move all the checks to `take_gil()`

Related:  gh-87135 (PRs: gh-105805, gh-28525)


### Linked PRs
* gh-109794
* gh-110705
