Skip to content

gh-120057: Add os.get_user_default_environ() function #120494

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 8 commits into from

Conversation

vstinner
Copy link
Member

@vstinner vstinner commented Jun 14, 2024

@vstinner
Copy link
Member Author

vstinner commented Jun 14, 2024

Example:

  • Run Python REPL.
  • Open the environment GUI editor in Windows Parameters. Add an user variable "TEST" equal to "value".
  • In Python, get os.environ["TEST"]: you should get an error.
  • In Python, run os.environ.update(os.get_user_default_environ())
  • In Python, get os.environ["TEST"]: you should "value" as expected.

Example:

vstinner@WIN C:\victor\python\main>python
>>> # Add TEST variable in the GUI environment editor
>>> import os
>>> os.environ['TEST']
KeyError: 'TEST'
>>> os.environ.update(os.get_user_default_environ())
>>> os.environ['TEST']
'value'

@vstinner
Copy link
Member Author

os.get_user_default_environ() gets environment variables which can be managed by this GUI:

Environment Variables

(The screenshot is truncated, there are more variables after Path :-) But it should give you an idea of what we are talking about.)

@vstinner
Copy link
Member Author

cc @eryksun

Lib/os.py Outdated
Comment on lines 841 to 846
parts = entry.split('=', 1)
if len(parts) != 2:
# Skip names that begin with '='
continue
name, value = parts
env[name] = value
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't work for an entry such as "=C:=C:\Temp". For a single split on "=", it yields the incorrect result ['', 'C:=C:\\Temp'] instead of ['=C:', 'C:\\Temp']. Since variable names that start with '=' are strictly illegal in POSIX environ (but not in a Windows environment block), I'd prefer to just test for and skip such entries.

Suggested change
parts = entry.split('=', 1)
if len(parts) != 2:
# Skip names that begin with '='
continue
name, value = parts
env[name] = value
# Skip conventionally hidden variables that begin with '='.
if entry[0] != '=':
name, value = entry.split('=', 1)
env[name] = value

If entry is empty, then the environment block is corrupt, and it's okay for entry[0] to raise IndexError. If the split doesn't create a 2-item list, then the environment block is corrupt, and it's okay for the unpack to raise ValueError.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I modified the code to be closer to convertenviron() and I added comments.

Copy link
Member Author

@vstinner vstinner Jun 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

=C:=C:\Temp creates an environment variable with an empty name and C:=C:\Temp value. Same as convertenviron().

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

convertenviron() will never see a variable name that starts with "=" because UCRT strictly forbids such names in _[w]environ. Adding a conventionally hidden variable as an empty name is broken nonsense. Please just skip conventionally hidden names.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, please do not mask a corrupt environment block by skipping entries that fail to split into exactly a name-value pair. This should not be possible unless there was memory corruption. A corruption in the environment block should raise an exception, not be silently ignored.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please just skip conventionally hidden names.

What do you mean by "conventionally hidden"?

Copy link
Member Author

@vstinner vstinner Jun 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I updated the PR to ignore variables with empty names and I added tests.

But I prefer to silently ignore variables with no name since the user cannot fix it and it would prevent using the function if it would raise an exception. And I don't want to emit a warning for that. putenv() raises an exception if the input is invalid. But getenv() doesn't raise an exception if the environment is invalid. (On Linux, I don't see how the environment can be invalid.)

Copy link
Contributor

@eryksun eryksun Jun 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you mean by "conventionally hidden"?

CMD and PowerShell go out of their way to hide variable names that start with "=" and disallow setting them, though CMD expands them using its % syntax, and CMD's set command has a bug in which set " mistakenly displays variables that begin with "=". The C runtime excludes them from _[w]environ and forbids setting such a variable name via _[w]putenv(). Applications can set and get such variables via WinAPI SetEnvironmentVariableW() and GetEnvironmentVariableW(), or in an environment block that's passed to CreateProcessW(). However, the API documentation in changing environment variables stresses that "[b]ecause the equal sign is a separator, it must not be used in the name of an environment variable".

The one legitimate case in the system is for setting the per-drive working directory on drives "A:" to "Z:" using environment variable names such as "=Z:". These variables get used by WinAPI GetFullPathNameW() when resolving drive-relative paths (e.g. "Z:spam"). Supporting this capability is optional for an application. The working directory on a drive is assumed to be its root directory if the environment variable doesn't exist (except of course if it's the drive of the actual working directory of the process). Python's os.chdir() opts into creating these variables.

I suppose initial per-drive working directories can be set in a user's "HKCU\Environment". They'll appear in the CreateEnvironmentBlock() result, though nothing in the API documentation actually says that such variables are allowed and will be included by CreateEnvironmentBlock().

Since Python uses a C based environ, I'm dubious about keeping these conventionally hidden variables. It's simplest to just skip them. The only place it could be useful is when the environment block is passed to subprocess.Popen, but I wouldn't lose sleep over it.

@vstinner vstinner requested a review from a team as a code owner June 17, 2024 12:14
@zooba
Copy link
Member

zooba commented Jun 17, 2024

As posted on Discourse, I don't like this proposal in general. However, this looks like (almost) the best possible implementation, so if it's decided to go ahead then this looks good to me.

I would prefer the CreateEnvironmentBlock functions be dynamically loaded, rather than statically linked to userenv.dll, as it will avoid a performance penalty on all users at startup.

@eryksun
Copy link
Contributor

eryksun commented Jun 17, 2024

I would prefer the CreateEnvironmentBlock functions be dynamically loaded, rather than statically linked to userenv.dll, as it will avoid a performance penalty on all users at startup.

But there's no perceptible performance penalty for the loader mapping "userenv.dll" into the process address space. I can understand dynamic loading via explicit LoadLibraryExW() and GetProcAddres() for newer APIs that may not be present, but not for functions that have been in the API since Windows NT 4.0 in 1996. That's just adding a lot of ugly noise to the source code. At most I could understand delay-loading a DLL (which is still automatic and transparent in the source) that's expensive to map into a process (e.g. loading "user32.dll" is especially expensive since it converts to a GUI process) or unlikely to already be loaded in the system.

@zooba
Copy link
Member

zooba commented Jun 17, 2024

Agreed that user32 is particularly bad, but userenv chains through a lot of other DLLs (particularly to do with network access) that we currently don't load. I'd prefer not to load any of them until we need to.

@eryksun
Copy link
Contributor

eryksun commented Jun 17, 2024

but userenv chains through a lot of other DLLs (particularly to do with network access) that we currently don't load. I'd prefer not to load any of them until we need to.

I just checked. Loading "userenv.dll" at startup doesn't load anything else in Windows 11. It has several delay-loaded dependencies. In particular, calling CreateEnvionmentBlock() pulls in "profapi.dll" and "SspiCli.dll".

@zooba
Copy link
Member

zooba commented Jun 17, 2024

Thanks for checking. In that case, I'm still opposed to the function entirely, but the proposed code here is fine.

Comment on lines +851 to +852
# If a variable is set twice, use the first value
env.setdefault(name, value)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose this doesn't hurt. But it shouldn't be possible to get duplicates unless there's a serious bug. The new environment is built by assigning values to the environment block, which is handled as a parameter by the same code that implements WinAPI SetEnvironmentVariableW().

Comment on lines +200 to +202
On Windows, :func:`get_user_default_environ` can be used to update
:data:`os.environ` to the latest user and system environment variables, such
as the ``Path`` variable.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Normally, a new environment block gets passed to a spawned child process, such as via the env parameter of subprocess.Popen. I'd emphasize that over updating os.environ, which seems to be a point of controversy.

@vstinner vstinner marked this pull request as draft June 20, 2024 19:17
@vstinner
Copy link
Member Author

@zooba:

In that case, I'm still opposed to the function entirely, but the proposed code here is fine.

I'm no longer sure about the use case of this feature. I prefer to abandon my PR.

@vstinner vstinner closed this Jun 24, 2024
@vstinner vstinner deleted the default_env branch June 24, 2024 13:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants