-
Notifications
You must be signed in to change notification settings - Fork 877
Move backslash unescaping to treeprocessor #1272
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
By unescaping backslash escapes in a treeprocessor, the text is properly escaped during serialization. Fixes Python-Markdown#1131. As it is recognized that varous third-party extensions may be calling the old class at `postprocessors.UnescapePostprocessor` the old class remains in the codebase, but has been deprecated and will be removed in a future release. The new class `treeprocessors.UnescapeTreeprocessor` should be used instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Below are a few comments and concerns I have about this change. Feedback is welcome.
@@ -9,7 +9,7 @@ | |||
<p>Right bracket: ]</p> | |||
<p>Left paren: (</p> | |||
<p>Right paren: )</p> | |||
<p>Greater-than: ></p> | |||
<p>Greater-than: ></p> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the one and only change in behavior in the existing tests. I'm okay with this, however, as technically this results in valid output. The reason for the change is that the angle bracket gets escaped during serialization. Previously, a placeholder was there during serialization, which was swapped out for the actual character later. The whole point of this change was to better ensure valid HTML output, so this is an acceptable change in behavior.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Having unescaped >
in HTML was a bug, so good that you fixed it.
""" Loop over all elements and unescape all text. """ | ||
for elem in root.iter(): | ||
# Unescape text content | ||
if elem.text and not elem.tag == 'code': |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure we actually need to skip code
tags, In fact, if I remove the check, the tests all pass. In fact, the previous code did not have a way to distinguish between code and other content. However, there is always a possibility that code
could intentionally contain what looks like a placeholder. In that case, the content should not be altered. Therefore, I have left the check in.
I'll try and look at this soon. I'd like to pull it and see how it impacts some of my things. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As all tests pass and the bug is fixed, I am happy with this change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see nothing breaking on my end. Seems good.
See Python-Markdown/markdown#1272. Signed-off-by: Anders Kaseorg <[email protected]>
This replaced the deprecated `markdown.postprocessors.UnescapePostprocessor` in Python-Markdown/markdown#1272. Signed-off-by: Anders Kaseorg <[email protected]>
This replaced the deprecated `markdown.postprocessors.UnescapePostprocessor` in Python-Markdown/markdown#1272. Signed-off-by: Anders Kaseorg <[email protected]>
See Python-Markdown/markdown#1272. Signed-off-by: Anders Kaseorg <[email protected]>
This replaced the deprecated `markdown.postprocessors.UnescapePostprocessor` in Python-Markdown/markdown#1272. Signed-off-by: Anders Kaseorg <[email protected]>
This replaced the deprecated `markdown.postprocessors.UnescapePostprocessor` in Python-Markdown/markdown#1272. Signed-off-by: Anders Kaseorg <[email protected]>
See Python-Markdown/markdown#1272. Signed-off-by: Anders Kaseorg <[email protected]>
This replaced the deprecated `markdown.postprocessors.UnescapePostprocessor` in Python-Markdown/markdown#1272. Signed-off-by: Anders Kaseorg <[email protected]>
This replaced the deprecated `markdown.postprocessors.UnescapePostprocessor` in Python-Markdown/markdown#1272. Signed-off-by: Anders Kaseorg <[email protected]>
See Python-Markdown/markdown#1272. Signed-off-by: Anders Kaseorg <[email protected]>
By unescaping backslash escapes in a treeprocessor, the text is properly
escaped during serialization. Fixes #1131.
As it is recognized that varous third-party extensions may be calling the
old class at
postprocessors.UnescapePostprocessor
the old class remainsin the codebase, but has been deprecated and will be removed in a future
release. The new class
treeprocessors.UnescapeTreeprocessor
should beused instead.