PEP 787: Safer subprocess usage using t-strings

Because of these earlier examples:

Ah right, I didn’t consider the second scenario. Thank you.

1 Like

Incidentally, that also only works on Windows. It’s equivalent to just run("echo", shell=True) everywhere else.

1 Like

I am more and more shocked that any of my subprocess-using code has ever worked on POSIX at all :laughing:

8 Likes

shell=True is in the fourth example of the PEP-787:

# Safe with t-strings and POSIX-compliant shell quoting:
subprocess.run(t"echo {message_from_user}", shell=True)

How is / would:
subprocess.run(t"/usr/bin/cp file with space destination with space") possibly know how to split the string?

subprocess.run('/usr/bin/cp','file with space',"destination with space") works just fine. Add it’s Python so you don’t have to worry whatever / however given shell uses quotes.

4 Likes

I don’t even use shells for pipes. Just connect the subprocesses with:

class BinaryPiper(Thread):
    """Pipe binary data from source to dest"""

    def __init__(self, name: str, source:typing.BinaryIO, dest:typing.BinaryIO,chunk:int=512):
        super().__init__(name=name, daemon=False)
        self.source = source
        self.dest = dest
        self.chunk = chunk

    def run(self):
        mlogger.debug(f"BinaryPiper thread {self.name} starting")
        while True:
            data = self.source.read(self.chunk)
            if data:
                r = self.dest.write(data)
                if r < self.chunk:
                    mlogger.warning(f"{self.name} short write {r} < {self.chunk}")
            else:
                return

Judging by the comments on this topic, it seems that “safer” in the PEP title is a misnomer. This PEP might actually make subprocess usage less safe because of encouraging people to pass their subprocess command as a single string, rather than a list which is the currently recommended idiom.

16 Likes

You are misunderstanding. The idea is that this code is wrong:

source = "file with space"  # imagine this comes from user input
dest = "spaces again"

subprocess.run(f'cp {source} {dest}')  # 💣

But this would be as safe as passing a list, but possibly look nicer in the eyes of people:

subprocess.run(shlex.sh(t'cp {source} {dest}'))

It feels that we are rehashing the whole discussion about t-strings here. Please read the PEPs (Template strings + this one) for motivating examples.

4 Likes

For this example, the idea again is that whereas f-strings would not catch the vulnerability of message_from_user = "ok; rm --no-preserve-root -rf /', t-strings + shlex.sh (or subprocess.run itself) would, because the final string passed to the shell would be something like "echo 'ok; rm --no-preserve-root -rf /'".

1 Like

Though it would only be safe as long as shlex.quote correctly quotes the argument. As the PEP acknowledges, this is not true on Windows. On Unix it’s probably usually true, though it’s not clear to me how well-tested this is: there are many different shells, and it’s not clear to me that we’re confident that the escaping shlex does is always sufficient. I personally would still feel much safer writing this example as ["echo", message_from_user] without shell=True.

14 Likes

Would it not be better if the sh function made a list of strings rather than trying to quote anything?

1 Like

How about just splitting the string on interpolations?
And splitting the strings using shlex.split()? Making these commands equivalent:

subprocess.run(t"echo {message_from_user}")
subprocess.run(shlex.split("echo ") + [str(message_from_user)])
subprocess.run(["echo", str(message_from_user)])

This would at least be as safe as passing it as a list, but less versatile.

Edit: looks Oscar posted the same idea right before me.

1 Like

It’s not necessary, we just say “subprocess.Popen handles t-string as if it’s a list of strings”[1] rather than defining it in terms of the new sh function.

If people want a quoted command (to e.g. write into a script file), they’re welcome to it. The issue is defining our internal subprocess behaviour in terms of the sh function.


  1. I realise this is vague enough for people to argue that I’m wrong about the specifics. I know, and don’t care. If you want to argue with me, I’m just going to ignore you. The PEP will have more precise text in it, but if you’ve made it this far and still want to argue about pointless aspects rather than letting the PEP author update the PEP, you’re welcome to waste your time but you’re not wasting more of mine. ↩

2 Likes

I can’t see that it’s been mentioned yet, so I want to raise that even though

subprocess.run(t"some_command {message_from_user}")

is an excellent use of t-strings, it is inadvisable to claim that it is “safe on all platforms” without qualifying what “safe” means.

In particular, there are all sorts of exploits that take the form of setting message_from_user to something beginning with a hyphen. Depending on the exact command, you can do more or less with this. The usual solution to this is to write

subprocess.run(t"some_command -- {message_from_user}")

instead, but this can’t be done automatically, as whether this is correct or not will depend on what some_command is.

At the very least, this should be noted in the Security Implications of this PEP.

4 Likes

It’s certainly a security consideration, but it exists even in subprocess.run(["some_command", message_from_user]) (IMO the canonical form of a “safe approach”) , so I don’t think it’s particularly relevant to this PEP.

2 Likes

I think it could be appropriate here because the main improvement of using t-strings is that they are safer than other ways of building strings. It would be good to point out ways that they don’t prevent all the dangers.

9 Likes

We also don’t want future users coming from search engines to a PEP that says “safe” and then thinking they are safe if they use that mechanism. Might be better to give the actual PEP a less tantalizing title. :slight_smile:

6 Likes

Collating some notes on areas where the comments so far suggest we need improvements in the PEP:

  • addressing @steve.dower’s request to decouple the semantic definition of the subprocess changes from the proposed shlex changes (to avoid the problem in the initially published versions where POSIX-specific assumptions are used in defining the subprocess part of the proposal)
  • noting that “safer” is an arguable phrasing, since we’re not adding any safety that the list input format doesn’t already provide. (We, the PEP authors, think it’s a net safety gain, because t-strings are just as syntactically convenient for subprocess invocation as f-strings, while offering the same level of safety in user input processing as the list input format. However, folks that only run simple external invocations that are easy to directly format as a list, or only complex ones that they build programmatically as a list anyway, aren’t going to see any benefits from this proposal).
  • rework the proposal structure to emphasise that folks currently use shell=True on POSIX when they don’t actually need it, just because they hate the list input format so much (and convincing them to put a t in front of their input strings seems like an easier sell than asking them to rewrite those strings as lists). So while the t-string input format will work with shell=True, the real security pay-off comes when adopting the t-string format leads to users not invoking the shell at all (relying on the t-string parsing instead).
  • noting that there is an existing challenge that passing user input to external commands may allow that input to be interpreted as command options instead of purely positional inputs (shell quoting doesn’t prevent app-level security issues like that, and the t-string format will be just as vulnerable to those problems as the existing list format)
  • noting that the cleanest way to handle spaces that you don’t want to split on in the t-string format is to make them part of inline interpolation fields (so they’re implicitly quoted): subprocess.run(t"/usr/bin/cp {"file with space"} {"destination with space"}") (rather than embedding quotes directly in the fixed part of the string, which is harder to do in a cross-platform compatible way)
  • adding an open question around whether adopting mslex/oslex style functionality in the standard library is effectively a pre-requisite for the subprocess integration parts of the proposal
  • adding more complex quoting/splitting examples that illustrate why the initial version of the PEP defined the subprocess changes in terms of the shlex.sh proposal (and why I’m not sure we’re going to be able to give @steve.dower the platform independent subprocess semantics he’s after without agreeing on what subprocess.run(t"echo {'} {"} {"'single quoted arg'"} {'"double quoted arg"'}") should display).

Examples of the latter using the list format on the five shells I have easy access to (bash, dash, zsh, cmd, powershell)

# POSIX shells
>>> cmd = shlex.join(["echo", "'", '"', "'single quoted arg'", '"double quoted arg"'])
>>> subprocess.run(cmd, shell=True, executable="/usr/bin/bash") and None
' " 'single quoted arg' "double quoted arg"
>>> subprocess.run(cmd, shell=True, executable="/usr/bin/dash") and None
' " 'single quoted arg' "double quoted arg"
>>> subprocess.run(cmd, shell=True, executable="/usr/bin/zsh") and None
' " 'single quoted arg' "double quoted arg"
>>>
# Windows shells
>>> os.environ["ComSpec"]
'C:\\WINDOWS\\system32\\cmd.exe'
>>> subprocess.run(["echo", "'", '"', "'single quoted arg'", '"double quoted arg"'], shell=True) and None
' \" "'single quoted arg'" "\"double quoted arg\""
>>> shutil.which("powershell")
'C:\\WINDOWS\\System32\\WindowsPowerShell\\v1.0\\powershell.EXE'
>>> os.environ["ComSpec"] = shutil.which("powershell")
>>> subprocess.run(["echo", "'", '"', "'single quoted arg'", '"double quoted arg"'], shell=True) and None
The string is missing the terminator: '.
    + CategoryInfo          : ParserError: (:) [], ParentContainsErrorRecordException
    + FullyQualifiedErrorId : TerminatorExpectedAtEndOfString

>>> cmd = shlex.join(["echo", "'", '"', "'single quoted arg'", '"double quoted arg"'])
>>> subprocess.run(cmd, shell=True) and None
The string is missing the terminator: '.
    + CategoryInfo          : ParserError: (:) [], ParentContainsErrorRecordException
    + FullyQualifiedErrorId : TerminatorExpectedAtEndOfString

>>>

(shlex.join required on POSIX due to the “list elements after the first are ignored for shell=True” behaviour mentioned above. Apparently powershell doesn’t like either the implicit-on-Windows subprocess.list2cmdline or shlex.join)

1 Like

No, the shell escaping we use is for cmd.exe,[1] which is the default for shell=True. Is it a bad default? Yeah. Can we make a set of rules that work for both? Not a chance. But at least with t-strings we can define compatible splitting rules for all shells/platforms, which allow us to treat each argument individually while we do substitutions, and then quoting rules can be applied based on the best information we have (which includes the path to the shell executable, so they could work out to be pretty good, really).

The non-shell escaping should be able to handle those examples fine. It’s entirely handled by applications on Windows, so there’s no real spec, but “whatever the standard C runtime does to generate argv” is good enough, and conveniently is what CPython uses, so we can at least be assured of quoting things properly for ourselves.


  1. To the very limited extent that we do anything here. I think we mostly just say “you’re on your own, good luck” when you pass shell=True. ↩

2 Likes

Since the cross-platform concerns in subprocess turned out to be even trickier than we realised, @nhumrich and I have decided not to push for those changes in the remaining (short) 3.14 time frame.

We’ll still propose shlex.sh (along with t-string support for shlex.split) as a simple feature request for 3.14, since that part seems much less controversial, but the PEP as a whole will be targeting 3.15.

6 Likes