Because of these earlier examples:
Ah right, I didnât consider the second scenario. Thank you.
Incidentally, that also only works on Windows. Itâs equivalent to just run("echo", shell=True)
everywhere else.
I am more and more shocked that any of my subprocess-using code has ever worked on POSIX at all
shell=True is in the fourth example of the PEP-787:
# Safe with t-strings and POSIX-compliant shell quoting:
subprocess.run(t"echo {message_from_user}", shell=True)
How is / would:
subprocess.run(t"/usr/bin/cp file with space destination with space")
possibly know how to split the string?
subprocess.run('/usr/bin/cp','file with space',"destination with space")
works just fine. Add itâs Python so you donât have to worry whatever / however given shell uses quotes.
I donât even use shells for pipes. Just connect the subprocesses with:
class BinaryPiper(Thread):
"""Pipe binary data from source to dest"""
def __init__(self, name: str, source:typing.BinaryIO, dest:typing.BinaryIO,chunk:int=512):
super().__init__(name=name, daemon=False)
self.source = source
self.dest = dest
self.chunk = chunk
def run(self):
mlogger.debug(f"BinaryPiper thread {self.name} starting")
while True:
data = self.source.read(self.chunk)
if data:
r = self.dest.write(data)
if r < self.chunk:
mlogger.warning(f"{self.name} short write {r} < {self.chunk}")
else:
return
Judging by the comments on this topic, it seems that âsaferâ in the PEP title is a misnomer. This PEP might actually make subprocess
usage less safe because of encouraging people to pass their subprocess
command as a single string, rather than a list which is the currently recommended idiom.
You are misunderstanding. The idea is that this code is wrong:
source = "file with space" # imagine this comes from user input
dest = "spaces again"
subprocess.run(f'cp {source} {dest}') # đŁ
But this would be as safe as passing a list, but possibly look nicer in the eyes of people:
subprocess.run(shlex.sh(t'cp {source} {dest}'))
It feels that we are rehashing the whole discussion about t-strings here. Please read the PEPs (Template strings + this one) for motivating examples.
For this example, the idea again is that whereas f-strings would not catch the vulnerability of message_from_user = "ok; rm --no-preserve-root -rf /
', t-strings + shlex.sh (or subprocess.run itself) would, because the final string passed to the shell would be something like "echo 'ok; rm --no-preserve-root -rf /'"
.
Though it would only be safe as long as shlex.quote
correctly quotes the argument. As the PEP acknowledges, this is not true on Windows. On Unix itâs probably usually true, though itâs not clear to me how well-tested this is: there are many different shells, and itâs not clear to me that weâre confident that the escaping shlex
does is always sufficient. I personally would still feel much safer writing this example as ["echo", message_from_user]
without shell=True
.
Would it not be better if the sh
function made a list of strings rather than trying to quote anything?
How about just splitting the string on interpolations?
And splitting the strings using shlex.split()
? Making these commands equivalent:
subprocess.run(t"echo {message_from_user}")
subprocess.run(shlex.split("echo ") + [str(message_from_user)])
subprocess.run(["echo", str(message_from_user)])
This would at least be as safe as passing it as a list, but less versatile.
Edit: looks Oscar posted the same idea right before me.
Itâs not necessary, we just say âsubprocess.Popen
handles t-string as if itâs a list of stringsâ[1] rather than defining it in terms of the new sh
function.
If people want a quoted command (to e.g. write into a script file), theyâre welcome to it. The issue is defining our internal subprocess
behaviour in terms of the sh
function.
I realise this is vague enough for people to argue that Iâm wrong about the specifics. I know, and donât care. If you want to argue with me, Iâm just going to ignore you. The PEP will have more precise text in it, but if youâve made it this far and still want to argue about pointless aspects rather than letting the PEP author update the PEP, youâre welcome to waste your time but youâre not wasting more of mine. â©ïž
I canât see that itâs been mentioned yet, so I want to raise that even though
subprocess.run(t"some_command {message_from_user}")
is an excellent use of t-strings, it is inadvisable to claim that it is âsafe on all platformsâ without qualifying what âsafeâ means.
In particular, there are all sorts of exploits that take the form of setting message_from_user
to something beginning with a hyphen. Depending on the exact command, you can do more or less with this. The usual solution to this is to write
subprocess.run(t"some_command -- {message_from_user}")
instead, but this canât be done automatically, as whether this is correct or not will depend on what some_command
is.
At the very least, this should be noted in the Security Implications of this PEP.
Itâs certainly a security consideration, but it exists even in subprocess.run(["some_command", message_from_user])
(IMO the canonical form of a âsafe approachâ) , so I donât think itâs particularly relevant to this PEP.
I think it could be appropriate here because the main improvement of using t-strings is that they are safer than other ways of building strings. It would be good to point out ways that they donât prevent all the dangers.
We also donât want future users coming from search engines to a PEP that says âsafeâ and then thinking they are safe if they use that mechanism. Might be better to give the actual PEP a less tantalizing title.
Collating some notes on areas where the comments so far suggest we need improvements in the PEP:
- addressing @steve.dowerâs request to decouple the semantic definition of the subprocess changes from the proposed shlex changes (to avoid the problem in the initially published versions where POSIX-specific assumptions are used in defining the subprocess part of the proposal)
- noting that âsaferâ is an arguable phrasing, since weâre not adding any safety that the list input format doesnât already provide. (We, the PEP authors, think itâs a net safety gain, because t-strings are just as syntactically convenient for subprocess invocation as f-strings, while offering the same level of safety in user input processing as the list input format. However, folks that only run simple external invocations that are easy to directly format as a list, or only complex ones that they build programmatically as a list anyway, arenât going to see any benefits from this proposal).
- rework the proposal structure to emphasise that folks currently use
shell=True
on POSIX when they donât actually need it, just because they hate the list input format so much (and convincing them to put at
in front of their input strings seems like an easier sell than asking them to rewrite those strings as lists). So while the t-string input format will work withshell=True
, the real security pay-off comes when adopting the t-string format leads to users not invoking the shell at all (relying on the t-string parsing instead). - noting that there is an existing challenge that passing user input to external commands may allow that input to be interpreted as command options instead of purely positional inputs (shell quoting doesnât prevent app-level security issues like that, and the t-string format will be just as vulnerable to those problems as the existing list format)
- noting that the cleanest way to handle spaces that you donât want to split on in the t-string format is to make them part of inline interpolation fields (so theyâre implicitly quoted):
subprocess.run(t"/usr/bin/cp {"file with space"} {"destination with space"}")
(rather than embedding quotes directly in the fixed part of the string, which is harder to do in a cross-platform compatible way) - adding an open question around whether adopting
mslex
/oslex
style functionality in the standard library is effectively a pre-requisite for thesubprocess
integration parts of the proposal - adding more complex quoting/splitting examples that illustrate why the initial version of the PEP defined the subprocess changes in terms of the
shlex.sh
proposal (and why Iâm not sure weâre going to be able to give @steve.dower the platform independent subprocess semantics heâs after without agreeing on whatsubprocess.run(t"echo {'} {"} {"'single quoted arg'"} {'"double quoted arg"'}")
should display).
Examples of the latter using the list format on the five shells I have easy access to (bash, dash, zsh, cmd, powershell)
# POSIX shells
>>> cmd = shlex.join(["echo", "'", '"', "'single quoted arg'", '"double quoted arg"'])
>>> subprocess.run(cmd, shell=True, executable="/usr/bin/bash") and None
' " 'single quoted arg' "double quoted arg"
>>> subprocess.run(cmd, shell=True, executable="/usr/bin/dash") and None
' " 'single quoted arg' "double quoted arg"
>>> subprocess.run(cmd, shell=True, executable="/usr/bin/zsh") and None
' " 'single quoted arg' "double quoted arg"
>>>
# Windows shells
>>> os.environ["ComSpec"]
'C:\\WINDOWS\\system32\\cmd.exe'
>>> subprocess.run(["echo", "'", '"', "'single quoted arg'", '"double quoted arg"'], shell=True) and None
' \" "'single quoted arg'" "\"double quoted arg\""
>>> shutil.which("powershell")
'C:\\WINDOWS\\System32\\WindowsPowerShell\\v1.0\\powershell.EXE'
>>> os.environ["ComSpec"] = shutil.which("powershell")
>>> subprocess.run(["echo", "'", '"', "'single quoted arg'", '"double quoted arg"'], shell=True) and None
The string is missing the terminator: '.
+ CategoryInfo : ParserError: (:) [], ParentContainsErrorRecordException
+ FullyQualifiedErrorId : TerminatorExpectedAtEndOfString
>>> cmd = shlex.join(["echo", "'", '"', "'single quoted arg'", '"double quoted arg"'])
>>> subprocess.run(cmd, shell=True) and None
The string is missing the terminator: '.
+ CategoryInfo : ParserError: (:) [], ParentContainsErrorRecordException
+ FullyQualifiedErrorId : TerminatorExpectedAtEndOfString
>>>
(shlex.join
required on POSIX due to the âlist elements after the first are ignored for shell=True
â behaviour mentioned above. Apparently powershell doesnât like either the implicit-on-Windows subprocess.list2cmdline
or shlex.join
)
No, the shell escaping we use is for cmd.exe,[1] which is the default for shell=True
. Is it a bad default? Yeah. Can we make a set of rules that work for both? Not a chance. But at least with t-strings we can define compatible splitting rules for all shells/platforms, which allow us to treat each argument individually while we do substitutions, and then quoting rules can be applied based on the best information we have (which includes the path to the shell executable, so they could work out to be pretty good, really).
The non-shell escaping should be able to handle those examples fine. Itâs entirely handled by applications on Windows, so thereâs no real spec, but âwhatever the standard C runtime does to generate argv
â is good enough, and conveniently is what CPython uses, so we can at least be assured of quoting things properly for ourselves.
To the very limited extent that we do anything here. I think we mostly just say âyouâre on your own, good luckâ when you pass
shell=True
. â©ïž
Since the cross-platform concerns in subprocess
turned out to be even trickier than we realised, @nhumrich and I have decided not to push for those changes in the remaining (short) 3.14 time frame.
Weâll still propose shlex.sh
(along with t-string support for shlex.split
) as a simple feature request for 3.14, since that part seems much less controversial, but the PEP as a whole will be targeting 3.15.