Skip to content

Move backslash unescaping to treeprocessor #1272

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Jul 15, 2022
Merged

Conversation

waylan
Copy link
Member

@waylan waylan commented Jul 13, 2022

By unescaping backslash escapes in a treeprocessor, the text is properly
escaped during serialization. Fixes #1131.

As it is recognized that varous third-party extensions may be calling the
old class at postprocessors.UnescapePostprocessor the old class remains
in the codebase, but has been deprecated and will be removed in a future
release. The new class treeprocessors.UnescapeTreeprocessor should be
used instead.

By unescaping backslash escapes in a treeprocessor, the text is properly
escaped during serialization. Fixes Python-Markdown#1131.

As it is recognized that varous third-party extensions may be calling the
old class at `postprocessors.UnescapePostprocessor` the old class remains
in the codebase, but has been deprecated and will be removed in a future
release. The new class `treeprocessors.UnescapeTreeprocessor` should be
used instead.
Copy link
Member Author

@waylan waylan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Below are a few comments and concerns I have about this change. Feedback is welcome.

@@ -9,7 +9,7 @@
<p>Right bracket: ]</p>
<p>Left paren: (</p>
<p>Right paren: )</p>
<p>Greater-than: ></p>
<p>Greater-than: &gt;</p>
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the one and only change in behavior in the existing tests. I'm okay with this, however, as technically this results in valid output. The reason for the change is that the angle bracket gets escaped during serialization. Previously, a placeholder was there during serialization, which was swapped out for the actual character later. The whole point of this change was to better ensure valid HTML output, so this is an acceptable change in behavior.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having unescaped > in HTML was a bug, so good that you fixed it.

""" Loop over all elements and unescape all text. """
for elem in root.iter():
# Unescape text content
if elem.text and not elem.tag == 'code':
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure we actually need to skip code tags, In fact, if I remove the check, the tests all pass. In fact, the previous code did not have a way to distinguish between code and other content. However, there is always a possibility that code could intentionally contain what looks like a placeholder. In that case, the content should not be altered. Therefore, I have left the check in.

@waylan waylan requested review from facelessuser and mitya57 July 13, 2022 20:11
@facelessuser
Copy link
Collaborator

I'll try and look at this soon. I'd like to pull it and see how it impacts some of my things.

@waylan waylan added the needs-review Needs to be reviewed and/or approved. label Jul 14, 2022
Copy link
Collaborator

@mitya57 mitya57 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As all tests pass and the bug is fixed, I am happy with this change.

Copy link
Collaborator

@facelessuser facelessuser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see nothing breaking on my end. Seems good.

@waylan waylan merged commit c0f6e5a into Python-Markdown:master Jul 15, 2022
@waylan waylan deleted the 1131 branch July 15, 2022 12:38
@waylan waylan added approved The pull request is ready to be merged. and removed needs-review Needs to be reviewed and/or approved. labels Jul 15, 2022
andersk added a commit to andersk/zulip that referenced this pull request Feb 4, 2023
andersk added a commit to andersk/typeshed that referenced this pull request Feb 4, 2023
This replaced the deprecated
`markdown.postprocessors.UnescapePostprocessor` in
Python-Markdown/markdown#1272.

Signed-off-by: Anders Kaseorg <[email protected]>
andersk added a commit to andersk/typeshed that referenced this pull request Feb 4, 2023
This replaced the deprecated
`markdown.postprocessors.UnescapePostprocessor` in
Python-Markdown/markdown#1272.

Signed-off-by: Anders Kaseorg <[email protected]>
andersk added a commit to andersk/zulip that referenced this pull request Feb 4, 2023
andersk added a commit to andersk/typeshed that referenced this pull request Feb 4, 2023
This replaced the deprecated
`markdown.postprocessors.UnescapePostprocessor` in
Python-Markdown/markdown#1272.

Signed-off-by: Anders Kaseorg <[email protected]>
andersk added a commit to andersk/typeshed that referenced this pull request Feb 4, 2023
This replaced the deprecated
`markdown.postprocessors.UnescapePostprocessor` in
Python-Markdown/markdown#1272.

Signed-off-by: Anders Kaseorg <[email protected]>
timabbott pushed a commit to zulip/zulip that referenced this pull request Feb 5, 2023
andersk added a commit to andersk/typeshed that referenced this pull request Feb 9, 2023
This replaced the deprecated
`markdown.postprocessors.UnescapePostprocessor` in
Python-Markdown/markdown#1272.

Signed-off-by: Anders Kaseorg <[email protected]>
AlexWaygood pushed a commit to python/typeshed that referenced this pull request Feb 9, 2023
This replaced the deprecated
`markdown.postprocessors.UnescapePostprocessor` in
Python-Markdown/markdown#1272.

Signed-off-by: Anders Kaseorg <[email protected]>
UjjwalAggarwal-1 pushed a commit to UjjwalAggarwal-1/zulip that referenced this pull request Feb 20, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved The pull request is ready to be merged.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Smarty extension causes generation of invalid HTML syntax
3 participants