PEP 787: Safer subprocess usage using t-strings

If the goal here is to be safer, I agree with much of the prior discussion, and think that adding it to the existing subprocess apis likely won’t help. New apis that exclusively accept t-strings, even if they are just abstractions around existing apis, seems like something that could reasonably live somewhere in the standard library as the encouraged use for new code (without any plan of breaking existing code by removing the existing apis on any short timeline)

7 Likes

I wonder if we’re just putting lipstick on a pig. In other forums, I’ve seen recommendations for plumbum which @henryiii is a maintainer of. I’ve never used it, but the API looks interesting, and it makes me think there’s gotta be better ways to shell out from Python than the subprocess API.

10 Likes

But the goal here isn’t simply to be safer, but to be more readable at the same time, or we’d all be just sticking to a list, which is unquestionably the safest but also the clunkiest. I shrug every time I read code that puts a long list of arguments and options into quotes separated by commas, making the command line that much noisier to read.

I think it’s more reasonable at this point to narrow the scope of the proposal to making the existing subprocess APIs accept a t-string only when shell=False.

3 Likes

That’s certainly a matter of perception. Personally I’d rather read a list of arguments, because I know that there will be no splitting issues if some arguments have whitespace or other special characters in them.

Given that t-strings don’t seem like they would be able to eliminate all issues with splitting, I’m not sure I would like to read them in that context. Also, as I pointed above, it’s easy to misread a leading “f” as a “t” or the reverse.

4 Likes

True. A t-string can’t unambiguously split t"echo 'Hello, World!'" across platforms even with shell=False unless we document the splitting to always follow the shlex/bash logics, which may be counter-intuitive to some Windows users unfortunately.

In many real-world use cases where spaces and special characters are not part of the literal arguments of a command line though, I think t-string can still be a more readable option, and we may just have to document the limitations like we already do for shell=True in the Security Considerations section.

This sounds like an argument against PEP750: Template Strings (new updates) rather than this proposal though.

You can also take advantage of the list to show the intent of the command more clearly, at the expense of vertical space. I think that it’s even easier to understand at a glance since the brain doesn’t have to “parse” the line:

run([
    sys.executable,
    "-m", "pymsbuild",
    "wheel"
])
run([
    sys.executable,
    "scripts/generate-nuget-index.py",
    LAYOUT / "bundled" / "index.json"
])
run([
    "py-manager.exe",
    "install",
    "-v",
    "-f",
    "--download", TEMP / "bundle",
    "default"
])
5 Likes

Yes and no :slight_smile: As long as t-strings and regular strings cannot be used interchangeably, then it’s not really a problem. Just as suggested above.

Not all brains work the same way. I’m much more comfortable reading horizontally than vertically (I’m also skeptical that you’re able to “read at a glance” 18 lines of text as easily as 3, but I’m not going to rule out the possibility… I’d want to hear a lot of people say that it’s just as easy/easier to glance at 18 lines as 3 before I generalise though).


The greater point is that my proposal is to allow either approach, while the counter-proposal is to force people into one particular format. I’d rather let people choose the style that works best for their case.

1 Like

Except then you have a need to use t-strings without any substitutions (which I bet overly aggressive linters will complain about), or else rewrite a perfectly good string into a list just because you made a minor edit (possibly a temporary edit, though I guess you’d ignore the linter in that case).

Arguing “allowing t-strings will cause people to mess up regular strings” is a strange argument, in my opinion. We ought to give our users a bit more credit.

1 Like

Of the three examples posted, I find the wrapped-sequence easiest to read. This might be due to frequency of exposure though, as for cross-platform and cross-shell support it is hard to use strings (as discussed). I think it’s marginally easier to scan around a ‘squarer’ block, but this is clearly very subjective.

Personally, I’m not sure I’d use a t-string API, as I have more confidence when using a sequence that every argument is passed/quoted as I expect, but there may be usability benefits for throwaway scripts, beginners, or users coming from other languages. I agree with the concerns about mixing up f- and t-strings (I’ve done it several times when working on the PEP 750 implementation) / inadvertently promoting more usage of plain strings if linters are overly agressive.

A

1 Like

I don’t think we should consider linters doing the wrong thing when deciding about features. Linters cannot know if the function accepts non-template string (which is after all perfectly allowed) and therefore shouldn’t assume as such. And if they start special casing subprocess.run, I would hope they also detect f-strings being used here, meaning they get an error message.

2 Likes

We should consider it just as much as we consider users doing the wrong thing. I’m okay with assuming either perfection or fallibility, provided we’re consistent. (And this discussion is the time and place to discuss things that wouldn’t necessarily make it into the PEP, so no problem with exploring both here to help us decide which assumption is most appropriate.)

No. Linters are large projects that worst case can have mistakes corrected based on user reports or tuned by config files. I can’t exactly fix a user making a mistake, they will do that and continue to do it again and again. (maybe not this exact user, but others might)

Worst case, CPython can explicitly call out linters and explain why this linting rule (forcing conversion of non-substrition template strings to flat strings) is incorrect, and promotes bad practices in the places where it doesn’t break stuff.

1 Like

Overly aggressive linters should just be fixed then.

Is “a simple and easy to overlook typo can turn into a security hole” a strange argument too?

Well, I don’t know about you, but I’m part of “our users” and I don’t want to see this.

Aye, I’ve definitely been persuaded by the discussion that this is the place to start.

I was previously thinking we could at least pursue that part for 3.14b1, but honestly, the implementation is short enough that having it written up as a recipe in the docs, or even just in the draft PEP, should be sufficiently useful in the near term, without having to make the long term support commitment associated with a published standard library API.

For APIs that accept plain strings for historical reasons, the most readily available type hinting solution for improved safety is often to replace str with typing.LiteralString: typing — Support for type hints — Python 3.13.3 documentation

That way folks that haven’t considered the possibility of injection attacks get a static type checking error by default, while folks that have done everything correctly can always make the error go away with cast(LiteralString, ...), even if the typechecker can’t work out on its own that the string has been correctly escaped.

Given that typeshed doesn’t currently do that for the subprocess API stubs (it accepts plain strings, which means typecheckers don’t complain about f-strings either), the PEP should at least note that it may help smooth the path to typeshed eventually adopting a stricter parameter signature more like _CMD: TypeAlias = LiteralString | PathLike[str] | PathLike[bytes] | Template | Sequence[StrOrBytesPath] (instead of its current _CMD: TypeAlias = StrOrBytesPath | Sequence[StrOrBytesPath]).

The proposal is to force the Documentation team to to expand the already bloated subprocess Documentation. As noted by CAM - Gelach in 2022

Take an example, the API reference for subprocess.run or subprocess.Popen. The former is over a full page, and the latter six pages of interleaved prose paragraphs, warnings, notes, “new/changed in”, “availability” and more, without much clear organization.

That’s not the fault of the Documentation writers, it’s an artifact of the historial development and expansions of subprocess. (universal_newlines and text? stdout=subprocess.PIPE and capture_output?)

The counter proposal to have pity on new Python developers trying to figure out subprocess by not adding additional complexity to an already overly complex module by not expanding the module with a brand new language feature that offers no benefits over the existing list implementation except aesthetics. (A manner of opinion).

If this is really important to the community deprecate the whole subprocess module and write a new module (e.g.) “launch” from scratch. lists or t-strings. No duplicate arguments, et. al.

1 Like

For compatibility reasons - if you dig into the history then you’ll find that the unusual or problematic one existed first, and the more normal/consistent option was added later. But because we don’t like breaking people, we didn’t actively remove the old one. Arguably, their documentation could be pushed a long way down, so that they’re only found by people wondering what the old code they’re reading is doing.

I mean, broadly speaking I’m okay with not adding things just for the sake of adding them. But do be aware that there are lots of people with lots of little problems that they’re constantly wanting to solve by making the language just a little bit more complex - this argument covers all of those, and is effectively just telling people to stop contributing to Python (which, as I said, I’m not totally against - just be clear on what you’re proposing here).

I think it was clear:

Others have expressed similar sentiment. Any new t-string consuming API is a new API that depends on new syntax. The API can have new functions and does not need to be encumbered by historical compatibility concerns.

Of course if it is not possible to make a new API that is significantly better than the current API then it is not worth the churn and maybe just adding t-strings to run is reasonable. I think most people would say that significantly better APIs are possible though.

One obvious shortcoming of the subprocess API is precisely the shell boolean parameter where the make-a-separate-function rule definitely applies. Since t-strings are intended precisely for the string-argument and shell=True case which is already sort of the not-recommended use of run why not make a new function that accepts t-strings and has a generally improved interface?

2 Likes

I meant at the meta-level - the proposal that existing APIs shouldn’t grow to support new protocols, but entirely new APIs should be added. As far as I’m aware, it’s never been an approach that Python specifically followed, and we have plenty of examples of existing APIs expanding to support new protocols that make adoption easier without leaving cruft behind or forcing learning from scratch.

Proposing to deprecate and replace all of subprocess is definitely straying too far from this thread’s purpose to be helpful. Though if there was a serious proposal in flight, I’d expect this one to be held up until it was resolved.

I don’t think that the reasons for any points made here are related to meta-level considerations. The discussion is very much about the specifics of the subprocess module, its current API and how that gets used and the specific proposal on the table to change that API as compared to alternatives such as adding a new function.

When Barry talks about putting lipstick on a pig I’m confident that his reasons for doing so lie somewhere below the meta-level.

1 Like