Skip to content

gh-123803: Support arbitrary code page encodings on Windows #123804

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 10 commits into from
Nov 18, 2024

Conversation

serhiy-storchaka
Copy link
Member

@serhiy-storchaka serhiy-storchaka commented Sep 7, 2024

If the cpXXX encoding is not directly implemented in Python, fall back to use the Windows-specific API codecs.code_page_encode() and codecs.code_page_decode().

If the cpXXX encoding is not directly implemented in Python, fall back
to use the Windows-specific API codecs.code_page_encode() and
codecs.code_page_decode().
Copy link
Member

@zooba zooba left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change looks fine, and I can see now that it's not so trivial to raise a different exception. Maybe we can improve the current "unknown encoding" exception string to say "encoding '{name}' is not registered" or something that suggests it might be possible to fix by adding the encoding? I expect many users would think that encodings are static.

@bedevere-app
Copy link

bedevere-app bot commented Nov 5, 2024

When you're done making the requested changes, leave the comment: I have made the requested changes; please review again.

@serhiy-storchaka
Copy link
Member Author

I have made the requested changes; please review again.

@bedevere-app
Copy link

bedevere-app bot commented Nov 7, 2024

Thanks for making the requested changes!

@malemburg: please review the changes made to this pull request.

@bedevere-app bedevere-app bot requested a review from malemburg November 7, 2024 07:00
Copy link
Member

@malemburg malemburg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, @serhiy-storchaka
This looks good now.

@serhiy-storchaka
Copy link
Member Author

Thank you for your review @zooba and @malemburg. I do not particularly like the change in the error message. Can we do without it?

@zooba
Copy link
Member

zooba commented Nov 12, 2024

I guess if people would rather wait and see if the difference in availability causes confusion, then we can do that. I'd prefer to be more clear in the docs, as I don't think developers or users currently expect codecs to be platform-specific like this.

@malemburg
Copy link
Member

Thank you for your review @zooba and @malemburg. I do not particularly like the change in the error message. Can we do without it?

How about "encoding XYZ not available" ?! I'd also be fine with leaving the current error message in place.

The "not registered" is not quite correct, since it assumes that there is a codec with that name available somewhere, it's just not registered. This is not always the case, though, e.g. if you mistype an encoding name.

@zooba
Copy link
Member

zooba commented Nov 12, 2024

"Not registered" at least implies that there's a potential way to fix it, whereas "not available" suggests the user has to come to us to ask us to make it available. (I'd assume in this case they wouldn't read our docs either.)

"Not available on this platform" would be okay, but feels less accurate overall, since as you say a mistyped codec isn't going to be available on any platform. We can't seem to customise the message for a known-but-not-present codec though, so it's a bit of a tough spot.

@serhiy-storchaka serhiy-storchaka enabled auto-merge (squash) November 18, 2024 17:24
@serhiy-storchaka serhiy-storchaka merged commit f7ef020 into python:main Nov 18, 2024
36 checks passed
@malemburg
Copy link
Member

Thanks, @serhiy-storchaka

@serhiy-storchaka serhiy-storchaka deleted the code-page-codecs branch November 20, 2024 16:48
ebonnal pushed a commit to ebonnal/cpython that referenced this pull request Jan 12, 2025
…thonGH-123804)

If the cpXXX encoding is not directly implemented in Python, fall back
to use the Windows-specific API codecs.code_page_encode() and
codecs.code_page_decode().
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants