PEP 787: Safer subprocess usage using t-strings

steve.dower · April 15, 2025, 7:57pm

Because of these earlier examples:

layday · April 15, 2025, 8:09pm

Ah right, I didn’t consider the second scenario. Thank you.

bwoodsend · April 15, 2025, 9:09pm

Incidentally, that also only works on Windows. It’s equivalent to just run("echo", shell=True) everywhere else.

steve.dower · April 15, 2025, 9:44pm

I am more and more shocked that any of my subprocess-using code has ever worked on POSIX at all

gerardw · April 15, 2025, 9:56pm

shell=True is in the fourth example of the PEP-787:

# Safe with t-strings and POSIX-compliant shell quoting:
subprocess.run(t"echo {message_from_user}", shell=True)

How is / would:
subprocess.run(t"/usr/bin/cp file with space destination with space") possibly know how to split the string?

subprocess.run('/usr/bin/cp','file with space',"destination with space") works just fine. Add it’s Python so you don’t have to worry whatever / however given shell uses quotes.

gerardw · April 15, 2025, 9:59pm

I don’t even use shells for pipes. Just connect the subprocesses with:

class BinaryPiper(Thread):
    """Pipe binary data from source to dest"""

    def __init__(self, name: str, source:typing.BinaryIO, dest:typing.BinaryIO,chunk:int=512):
        super().__init__(name=name, daemon=False)
        self.source = source
        self.dest = dest
        self.chunk = chunk

    def run(self):
        mlogger.debug(f"BinaryPiper thread {self.name} starting")
        while True:
            data = self.source.read(self.chunk)
            if data:
                r = self.dest.write(data)
                if r < self.chunk:
                    mlogger.warning(f"{self.name} short write {r} < {self.chunk}")
            else:
                return

pitrou · April 16, 2025, 8:34am

Judging by the comments on this topic, it seems that “safer” in the PEP title is a misnomer. This PEP might actually make subprocess usage less safe because of encouraging people to pass their subprocess command as a single string, rather than a list which is the currently recommended idiom.

merwok · April 16, 2025, 3:19pm

You are misunderstanding. The idea is that this code is wrong:

source = "file with space"  # imagine this comes from user input
dest = "spaces again"

subprocess.run(f'cp {source} {dest}')  # 💣

But this would be as safe as passing a list, but possibly look nicer in the eyes of people:

subprocess.run(shlex.sh(t'cp {source} {dest}'))

It feels that we are rehashing the whole discussion about t-strings here. Please read the PEPs (Template strings + this one) for motivating examples.

merwok · April 16, 2025, 3:24pm

For this example, the idea again is that whereas f-strings would not catch the vulnerability of message_from_user = "ok; rm --no-preserve-root -rf /', t-strings + shlex.sh (or subprocess.run itself) would, because the final string passed to the shell would be something like "echo 'ok; rm --no-preserve-root -rf /'".

Jelle · April 16, 2025, 5:04pm

Though it would only be safe as long as shlex.quote correctly quotes the argument. As the PEP acknowledges, this is not true on Windows. On Unix it’s probably usually true, though it’s not clear to me how well-tested this is: there are many different shells, and it’s not clear to me that we’re confident that the escaping shlex does is always sufficient. I personally would still feel much safer writing this example as ["echo", message_from_user] without shell=True.

oscarbenjamin · April 16, 2025, 6:56pm

Would it not be better if the sh function made a list of strings rather than trying to quote anything?

Nineteendo · April 16, 2025, 6:59pm

How about just splitting the string on interpolations?
And splitting the strings using shlex.split()? Making these commands equivalent:

subprocess.run(t"echo {message_from_user}")
subprocess.run(shlex.split("echo ") + [str(message_from_user)])
subprocess.run(["echo", str(message_from_user)])

This would at least be as safe as passing it as a list, but less versatile.

Edit: looks Oscar posted the same idea right before me.

steve.dower · April 16, 2025, 7:26pm

It’s not necessary, we just say “subprocess.Popen handles t-string as if it’s a list of strings”^[1] rather than defining it in terms of the new sh function.

If people want a quoted command (to e.g. write into a script file), they’re welcome to it. The issue is defining our internal subprocess behaviour in terms of the sh function.

I realise this is vague enough for people to argue that I’m wrong about the specifics. I know, and don’t care. If you want to argue with me, I’m just going to ignore you. The PEP will have more precise text in it, but if you’ve made it this far and still want to argue about pointless aspects rather than letting the PEP author update the PEP, you’re welcome to waste your time but you’re not wasting more of mine. ↩︎

rrolls · April 17, 2025, 5:17pm

I can’t see that it’s been mentioned yet, so I want to raise that even though

subprocess.run(t"some_command {message_from_user}")

is an excellent use of t-strings, it is inadvisable to claim that it is “safe on all platforms” without qualifying what “safe” means.

In particular, there are all sorts of exploits that take the form of setting message_from_user to something beginning with a hyphen. Depending on the exact command, you can do more or less with this. The usual solution to this is to write

subprocess.run(t"some_command -- {message_from_user}")

instead, but this can’t be done automatically, as whether this is correct or not will depend on what some_command is.

At the very least, this should be noted in the Security Implications of this PEP.

pf_moore · April 17, 2025, 6:23pm

It’s certainly a security consideration, but it exists even in subprocess.run(["some_command", message_from_user]) (IMO the canonical form of a “safe approach”) , so I don’t think it’s particularly relevant to this PEP.

nedbat · April 17, 2025, 10:53pm

I think it could be appropriate here because the main improvement of using t-strings is that they are safer than other ways of building strings. It would be good to point out ways that they don’t prevent all the dangers.

BrenBarn · April 18, 2025, 3:29am

We also don’t want future users coming from search engines to a PEP that says “safe” and then thinking they are safe if they use that mechanism. Might be better to give the actual PEP a less tantalizing title.

ncoghlan · April 18, 2025, 6:07pm

Collating some notes on areas where the comments so far suggest we need improvements in the PEP:

addressing @steve.dower’s request to decouple the semantic definition of the subprocess changes from the proposed shlex changes (to avoid the problem in the initially published versions where POSIX-specific assumptions are used in defining the subprocess part of the proposal)
noting that “safer” is an arguable phrasing, since we’re not adding any safety that the list input format doesn’t already provide. (We, the PEP authors, think it’s a net safety gain, because t-strings are just as syntactically convenient for subprocess invocation as f-strings, while offering the same level of safety in user input processing as the list input format. However, folks that only run simple external invocations that are easy to directly format as a list, or only complex ones that they build programmatically as a list anyway, aren’t going to see any benefits from this proposal).
rework the proposal structure to emphasise that folks currently use shell=True on POSIX when they don’t actually need it, just because they hate the list input format so much (and convincing them to put a t in front of their input strings seems like an easier sell than asking them to rewrite those strings as lists). So while the t-string input format will work with shell=True, the real security pay-off comes when adopting the t-string format leads to users not invoking the shell at all (relying on the t-string parsing instead).
noting that there is an existing challenge that passing user input to external commands may allow that input to be interpreted as command options instead of purely positional inputs (shell quoting doesn’t prevent app-level security issues like that, and the t-string format will be just as vulnerable to those problems as the existing list format)
noting that the cleanest way to handle spaces that you don’t want to split on in the t-string format is to make them part of inline interpolation fields (so they’re implicitly quoted): subprocess.run(t"/usr/bin/cp {"file with space"} {"destination with space"}") (rather than embedding quotes directly in the fixed part of the string, which is harder to do in a cross-platform compatible way)
adding an open question around whether adopting mslex/oslex style functionality in the standard library is effectively a pre-requisite for the subprocess integration parts of the proposal
adding more complex quoting/splitting examples that illustrate why the initial version of the PEP defined the subprocess changes in terms of the shlex.sh proposal (and why I’m not sure we’re going to be able to give @steve.dower the platform independent subprocess semantics he’s after without agreeing on what subprocess.run(t"echo {'} {"} {"'single quoted arg'"} {'"double quoted arg"'}") should display).

Examples of the latter using the list format on the five shells I have easy access to (bash, dash, zsh, cmd, powershell)

# POSIX shells
>>> cmd = shlex.join(["echo", "'", '"', "'single quoted arg'", '"double quoted arg"'])
>>> subprocess.run(cmd, shell=True, executable="/usr/bin/bash") and None
' " 'single quoted arg' "double quoted arg"
>>> subprocess.run(cmd, shell=True, executable="/usr/bin/dash") and None
' " 'single quoted arg' "double quoted arg"
>>> subprocess.run(cmd, shell=True, executable="/usr/bin/zsh") and None
' " 'single quoted arg' "double quoted arg"
>>>

# Windows shells
>>> os.environ["ComSpec"]
'C:\\WINDOWS\\system32\\cmd.exe'
>>> subprocess.run(["echo", "'", '"', "'single quoted arg'", '"double quoted arg"'], shell=True) and None
' \" "'single quoted arg'" "\"double quoted arg\""
>>> shutil.which("powershell")
'C:\\WINDOWS\\System32\\WindowsPowerShell\\v1.0\\powershell.EXE'
>>> os.environ["ComSpec"] = shutil.which("powershell")
>>> subprocess.run(["echo", "'", '"', "'single quoted arg'", '"double quoted arg"'], shell=True) and None
The string is missing the terminator: '.
    + CategoryInfo          : ParserError: (:) [], ParentContainsErrorRecordException
    + FullyQualifiedErrorId : TerminatorExpectedAtEndOfString

>>> cmd = shlex.join(["echo", "'", '"', "'single quoted arg'", '"double quoted arg"'])
>>> subprocess.run(cmd, shell=True) and None
The string is missing the terminator: '.
    + CategoryInfo          : ParserError: (:) [], ParentContainsErrorRecordException
    + FullyQualifiedErrorId : TerminatorExpectedAtEndOfString

>>>

(shlex.join required on POSIX due to the “list elements after the first are ignored for shell=True” behaviour mentioned above. Apparently powershell doesn’t like either the implicit-on-Windows subprocess.list2cmdline or shlex.join)

steve.dower · April 18, 2025, 6:24pm

No, the shell escaping we use is for cmd.exe,^[1] which is the default for shell=True. Is it a bad default? Yeah. Can we make a set of rules that work for both? Not a chance. But at least with t-strings we can define compatible splitting rules for all shells/platforms, which allow us to treat each argument individually while we do substitutions, and then quoting rules can be applied based on the best information we have (which includes the path to the shell executable, so they could work out to be pretty good, really).

The non-shell escaping should be able to handle those examples fine. It’s entirely handled by applications on Windows, so there’s no real spec, but “whatever the standard C runtime does to generate argv” is good enough, and conveniently is what CPython uses, so we can at least be assured of quoting things properly for ourselves.

To the very limited extent that we do anything here. I think we mostly just say “you’re on your own, good luck” when you pass shell=True. ↩︎

ncoghlan · April 18, 2025, 11:59pm

Since the cross-platform concerns in subprocess turned out to be even trickier than we realised, @nhumrich and I have decided not to push for those changes in the remaining (short) 3.14 time frame.

We’ll still propose shlex.sh (along with t-string support for shlex.split) as a simple feature request for 3.14, since that part seems much less controversial, but the PEP as a whole will be targeting 3.15.