Skip to content

gh-83856: Honor atexit for all multiprocessing start methods #114279

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
May 3, 2024

Conversation

gaogaotiantian
Copy link
Member

@gaogaotiantian gaogaotiantian commented Jan 19, 2024

Currently, atexit does not work for fork and forkserver methods for multiprocessing, because multiprocessing kills the forked process with os._exit(). This causes a couple of issues:

  1. Obviously, atexit does not work, which is unexpected and undocumented.
  2. fork and forkserver has a different behavior than spawn, which is pretty bad
  3. Even for spawn, the cleanup order will be messed up, because util._exit_function is explicitly called in Process.bootstrap, and atexit callbacks will all be called after it. (See Process _bootstrap calls _exit_function #114220)

I'll have to admit that this is not the safest patch, it slightly changes the cleanup routine for all multiprocessing code. However, I still believe the benefit is worth the risk because this results in a consistent behavior.

@gaogaotiantian
Copy link
Member Author

@pablogsal as you are the original assignee for the issue, could you take the first look at this? Thanks.

@pablogsal
Copy link
Member

I think that first we need some agreement among core devs if this is a good idea at all. I agree that this is indeed more consistent and desirable in a lot of cases but the change in behaviour may unexpectedly affect many systems that rely on the fact that os._exit() is called. There are a bunch of examples where for instance process pools rely on calling os._exit on the workers directly without executing any cleanup on purpose and changing this may have unexpected consequences.

So before we can weight on the patch we need to collect our thoughts regarding this in the issue. I will ping some other core devs there to see what they think

@pitrou
Copy link
Member

pitrou commented Jan 19, 2024

I'm afraid of changing this as there's probably some software out there that relies on the current behavior. Common Unix "wisdom" has always recommended to call a light-weight exit function such as _exit in forked process, not the full-blown exit, and Python has always honored that. See https://stackoverflow.com/questions/2329640/how-to-exit-a-child-process-exit-vs-exit

What we could probably do is add a new atexit API that registers an atexit handler that will also be inherited in child processes (ideally this would be an option to atexit.register, but its current signature doesn't seem to allow this).

We could also be more selective and distinguish fork and forkserver. Since forkserver is an optimization that mostly emulates spawn behavior, it would make more sense to change its semantics.

Another thing: regardless of whether atexit handlers are inherited from the parent, atexit handlers registered in the child should certainly be executed.

@gpshead

@gaogaotiantian
Copy link
Member Author

I definitely agree that we should have more discussion about this behavior, but we should be clear about what's the new behavior first.

First of all, the child process will STILL use os._exit() to exit the process. The only thing changed is that it will call atexit handlers before os._exit(). It's NOT a full-blown exit().

The atexit handlers from parent is NOT inherited. They are cleaned up immediately in launch(). Only the handlers explicitly registered in child process is executed - that's the major fix, any callback explicitly registered in child process will be ignored.

So, the only major thing changed, is the callbacks registered by users with atexit. Is it possible that some specific handlers cause a dead lock? Yes. However, it's always possible to use util.Finalize() to do that (okay it's not documented, but it's there and people are using it. I am at least :) ) and we are all adults. It's not CPython's responsibility to guarantee a dead-lock free multiprocessing code. If the user does not use atexit in the child process, nothing major should change (the cleanup order is slightly changed).

@gpshead gpshead self-assigned this Jan 19, 2024
@gpshead
Copy link
Member

gpshead commented Jan 20, 2024

My gut feeling is that a change like this is desirable. The inconsistency and odd behavioral corners this changes feel good to make consistent. It seems awkward that any user code executed within a child process that calls an atexit.register API would not see it called when the process it ran in exits. Especially based on the multiprocessing start method (which today also happens to means "based on the platform's default").

That said, I don't expect most code run via multiprocessing to be registering atexit handlers. Which is presumably why the status quo behavior effectively went unnoticed for so long. Similarly to the other issue linked to regarding the multiprocessing.util._exit_function() vs atexit function call order being different - I expect the majority of atexit code to be rather independent and just does not depend on that ordering.

That we have two behaviors today, that are both regularly seen by cross platform code, suggests that it is infrequent for code to specifically care. So simplifying our story to make this consistent across start methods feels good.

While we could offer yet another API flag to let users control which atexit behavior their multiprocessing using application wants, I don't think that is something we should be happy to provide and maintain. The same goes for extending atexit to have a concept of if the actions happen in forked children or not. We've lived so long without anyone requesting that, I'd rather us not add and maintain such a thing for the future.

For user code that calls os._exit() directly itself, I don't believe this change has any impact: that was already preventing atexit stuff and other cleanup from running and will continue to do so.

@pitrou
Copy link
Member

pitrou commented Jan 22, 2024

I definitely agree that we should have more discussion about this behavior, but we should be clear about what's the new behavior first.

Okay, thanks for the explanation. I agree that this would be an improvement.

@gaogaotiantian
Copy link
Member Author

Hi @gpshead, I think we had a discussion above involving a couple of core devs. Should we try to proceed on this? To be honest, code wise, this is not even a very huge change. The structure stays almost the same, the only thing added is the atexit handler.

@gaogaotiantian
Copy link
Member Author

A friendly ping for @gpshead , as this PR has been stuck for a month while the code is ready for review.

@gpshead gpshead self-requested a review April 15, 2024 19:32
@@ -46,6 +48,43 @@ def test_atexit_instances(self):
self.assertEqual(res.out.decode().splitlines(), ["atexit2", "atexit1"])
self.assertFalse(res.err)

def test_multiprocessing(self):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is a little awkward to have logic specific to multiprocessing start methods on different platforms in test_atexit.py. I think this test would be better off within something like _test_multprocessing.py which is turned into multiple separate test suites per start method based upon what the platform supports.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I moved the test to _test_multiprocessing.py

@gaogaotiantian
Copy link
Member Author

Hi @gpshead , I updated the tests as requested. Is there anything I should do on this PR? Thanks!

@gpshead
Copy link
Member

gpshead commented May 3, 2024

Thanks! I think this is a good change, as captured by the discussion above. Merging. If anyone runs into issues with it during the 3.13 beta period we can reconsider.

@gpshead gpshead merged commit 998c385 into python:main May 3, 2024
@gaogaotiantian gaogaotiantian deleted the atexit-with-multiprocessing branch May 3, 2024 19:03
SonicField pushed a commit to SonicField/cpython that referenced this pull request May 8, 2024
…ythonGH-114279)

Use atexit for all multiprocessing start methods to cleanup.
See the pythonGH-114279 PR discussion and related issue for details as to why.
tornaria added a commit to tornaria/sage that referenced this pull request Dec 24, 2024
For some reason, a new behaviour of python 3.13 [0] causes the
`TemporaryDirectory()` in `sage.misc.temporary_file.spyx_tmp()`
to be deleted on child exit, which causes trouble with parallel
doctesting [1].

We rewrite `spyx_tmp()` using `tmp_dir()`, which doesn't have this
problem, see [2].

[0] python/cpython#114279
[1] sagemath#39188 (comment)
[2] sagemath#39188 (comment)
tornaria added a commit to tornaria/sage that referenced this pull request Dec 24, 2024
For some reason, a new behaviour of python 3.13 [0] causes the
`TemporaryDirectory()` in `sage.misc.temporary_file.spyx_tmp()`
to be deleted on child exit, which causes trouble with parallel
doctesting [1].

We rewrite `spyx_tmp()` using `tmp_dir()`, which doesn't have this
problem, see [2].

[0] python/cpython#114279
[1] sagemath#39188 (comment)
[2] sagemath#39188 (comment)
tornaria added a commit to tornaria/sage that referenced this pull request Dec 26, 2024
For some reason, a new behaviour of python 3.13 [0] causes the
`TemporaryDirectory()` in `sage.misc.temporary_file.spyx_tmp()`
to be deleted on child exit, which causes trouble with parallel
doctesting [1].

We rewrite `spyx_tmp()` using `tmp_dir()`, which doesn't have this
problem, see [2].

[0] python/cpython#114279
[1] sagemath#39188 (comment)
[2] sagemath#39188 (comment)
vbraun pushed a commit to vbraun/sage that referenced this pull request Jan 7, 2025
sagemathgh-39201: Fix `spyx_tmp()` cleanup.
    
For some reason, a new behaviour of python 3.13 [0] causes the
`TemporaryDirectory()` in `sage.misc.temporary_file.spyx_tmp()`
to be deleted on child exit, which causes trouble with parallel
doctesting [1].

We rewrite `spyx_tmp()` using `tmp_dir()`, which doesn't have this
problem, see [2].

[0] python/cpython#114279
[1] sagemath#39188 (comment)
[2] sagemath#39188 (comment)

In addition, use `sage_` prefix for the main temporary directory.

### 📝 Checklist

<!-- Put an `x` in all the boxes that apply. -->

- [x] The title is concise and informative.
- [x] The description explains in detail what this PR is about.
- [x] I have linked a relevant issue or discussion.
    
URL: sagemath#39201
Reported by: Gonzalo Tornaría
Reviewer(s): Antonio Rojas, Gonzalo Tornaría
vbraun pushed a commit to vbraun/sage that referenced this pull request Jan 9, 2025
sagemathgh-39201: Fix `spyx_tmp()` cleanup.
    
For some reason, a new behaviour of python 3.13 [0] causes the
`TemporaryDirectory()` in `sage.misc.temporary_file.spyx_tmp()`
to be deleted on child exit, which causes trouble with parallel
doctesting [1].

We rewrite `spyx_tmp()` using `tmp_dir()`, which doesn't have this
problem, see [2].

[0] python/cpython#114279
[1] sagemath#39188 (comment)
[2] sagemath#39188 (comment)

In addition, use `sage_` prefix for the main temporary directory.

### 📝 Checklist

<!-- Put an `x` in all the boxes that apply. -->

- [x] The title is concise and informative.
- [x] The description explains in detail what this PR is about.
- [x] I have linked a relevant issue or discussion.
    
URL: sagemath#39201
Reported by: Gonzalo Tornaría
Reviewer(s): Antonio Rojas, Gonzalo Tornaría
vbraun pushed a commit to vbraun/sage that referenced this pull request Jan 10, 2025
sagemathgh-39201: Fix `spyx_tmp()` cleanup.
    
For some reason, a new behaviour of python 3.13 [0] causes the
`TemporaryDirectory()` in `sage.misc.temporary_file.spyx_tmp()`
to be deleted on child exit, which causes trouble with parallel
doctesting [1].

We rewrite `spyx_tmp()` using `tmp_dir()`, which doesn't have this
problem, see [2].

[0] python/cpython#114279
[1] sagemath#39188 (comment)
[2] sagemath#39188 (comment)

In addition, use `sage_` prefix for the main temporary directory.

### 📝 Checklist

<!-- Put an `x` in all the boxes that apply. -->

- [x] The title is concise and informative.
- [x] The description explains in detail what this PR is about.
- [x] I have linked a relevant issue or discussion.
    
URL: sagemath#39201
Reported by: Gonzalo Tornaría
Reviewer(s): Antonio Rojas, Gonzalo Tornaría
vbraun pushed a commit to vbraun/sage that referenced this pull request Jan 12, 2025
sagemathgh-39201: Fix `spyx_tmp()` cleanup.
    
For some reason, a new behaviour of python 3.13 [0] causes the
`TemporaryDirectory()` in `sage.misc.temporary_file.spyx_tmp()`
to be deleted on child exit, which causes trouble with parallel
doctesting [1].

We rewrite `spyx_tmp()` using `tmp_dir()`, which doesn't have this
problem, see [2].

[0] python/cpython#114279
[1] sagemath#39188 (comment)
[2] sagemath#39188 (comment)

In addition, use `sage_` prefix for the main temporary directory.

### 📝 Checklist

<!-- Put an `x` in all the boxes that apply. -->

- [x] The title is concise and informative.
- [x] The description explains in detail what this PR is about.
- [x] I have linked a relevant issue or discussion.
    
URL: sagemath#39201
Reported by: Gonzalo Tornaría
Reviewer(s): Antonio Rojas, Gonzalo Tornaría
vbraun pushed a commit to vbraun/sage that referenced this pull request Jan 16, 2025
sagemathgh-39201: Fix `spyx_tmp()` cleanup.
    
For some reason, a new behaviour of python 3.13 [0] causes the
`TemporaryDirectory()` in `sage.misc.temporary_file.spyx_tmp()`
to be deleted on child exit, which causes trouble with parallel
doctesting [1].

We rewrite `spyx_tmp()` using `tmp_dir()`, which doesn't have this
problem, see [2].

[0] python/cpython#114279
[1] sagemath#39188 (comment)
[2] sagemath#39188 (comment)

In addition, use `sage_` prefix for the main temporary directory.

### 📝 Checklist

<!-- Put an `x` in all the boxes that apply. -->

- [x] The title is concise and informative.
- [x] The description explains in detail what this PR is about.
- [x] I have linked a relevant issue or discussion.
    
URL: sagemath#39201
Reported by: Gonzalo Tornaría
Reviewer(s): Antonio Rojas, Gonzalo Tornaría
vbraun pushed a commit to vbraun/sage that referenced this pull request Jan 17, 2025
sagemathgh-39201: Fix `spyx_tmp()` cleanup.
    
For some reason, a new behaviour of python 3.13 [0] causes the
`TemporaryDirectory()` in `sage.misc.temporary_file.spyx_tmp()`
to be deleted on child exit, which causes trouble with parallel
doctesting [1].

We rewrite `spyx_tmp()` using `tmp_dir()`, which doesn't have this
problem, see [2].

[0] python/cpython#114279
[1] sagemath#39188 (comment)
[2] sagemath#39188 (comment)

In addition, use `sage_` prefix for the main temporary directory.

### 📝 Checklist

<!-- Put an `x` in all the boxes that apply. -->

- [x] The title is concise and informative.
- [x] The description explains in detail what this PR is about.
- [x] I have linked a relevant issue or discussion.
    
URL: sagemath#39201
Reported by: Gonzalo Tornaría
Reviewer(s): Antonio Rojas, Gonzalo Tornaría
@camlee
Copy link

camlee commented Mar 13, 2025

Now that Python 3.13 is released, did this fully make it in? I don't see a reference in the docs (ex. a "Changed in version 3.13" here) and when I run the example shared provided in #83856, I only see one cleanup, not 2 as expected.

I've found one other person who has looked for this fix in Python 3.13 but is not seeing it on their machine: pydantic/logfire#779

I'm using Python 3.13 (and also tried 3.14 alpha 5) on Ubuntu 24.04, in a VM with a Windows host if that matters.

$ python -VV
Python 3.13.2 (main, Feb  5 2025, 08:49:06) [GCC 13.3.0]

$ uname -a
Linux ubuntu 6.11.0-19-generic #19~24.04.1-Ubuntu SMP PREEMPT_DYNAMIC Mon Feb 17 11:51:52 UTC 2 x86_64 x86_64 x86_64 GNU/Linux

$ lsb_release -a
No LSB modules are available.
Distributor ID:	Ubuntu
Description:	Ubuntu 24.04.2 LTS
Release:	24.04
Codename:	noble

$ python example.py 
process done 15821
app finished
cleanup 15820

@gaogaotiantian
Copy link
Member Author

@camlee yes this made it to 3.13. Not everything will be announced in whatsnew, this is a relatively small fix rather than a new feature so it's not mentioned. The expected behavior of the example given in #83856 is not two cleanups, it's one. You can read the discussion above - the child process will not inherit the registered atexit callback from their parents - that would be a huge change and potentially break a lot of code out there. The only thing this PR changes, is that the atexit functions registered in the child process will be executed normally, instead of being ignored.

@camlee
Copy link

camlee commented Mar 13, 2025

Thanks @gaogaotiantian! That makes perfect sense now. I figured I was probably looking for the change the wrong way. For those confused like I was, ex. @alexmojaki, here's the same example but modified to register the atexit function in each process individually and this produces the desired two cleanups on Python 3.13 but just one on Python 3.12 showing that the issue was indeed fixed.

from multiprocessing import Process, set_start_method
import time
import os
import atexit

def cleanup():
    print(f"cleanup {os.getpid()}")

def run():
    atexit.register(cleanup) # Registering cleanup for the child process

    time.sleep(0.1)
    print(f"process done {os.getpid()}")
    # atexit._run_exitfuncs()

if __name__ == "__main__":
    atexit.register(cleanup) # Registering cleanup for the parent process

    set_start_method("fork")
    process = Process(target=run)
    process.start()
    process.join()
    print("app finished")

In terms of docs, I agree this is pretty small and doesn't need mention in something like what's new. What I was thinking of is something on the atexit page clarifying some of this behavior related to fork, roughly like this: camlee@8b0e922
I'm not sure yet what's involved in properly proposing and making a docs change (particularly with the two Python versions involved now), or if it would be worth the effort. So the above commit is just illustrative. I can learn the process and submit a ticket and PR (or two, for 3.13 and 3.14?) as appropriate if desired.

Thanks again @gaogaotiantian for your very clear and prompt reply! :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants