Don't emit stashed HTML tag placeholders in `.toc_tokens` #901

jimporter · 2020-01-31T08:09:55Z

This should resolve #899. The fix is a bit subtle: we want to emit HTML entities in `.toc_tokens`, but not HTML tags. We also need to be careful to clean up `data-toc-label` as needed. Note: with this change, raw HTML in a heading is no longer passed through to the HTML for `.toc`.
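
To illustrate the distinction between tags and entities, here is a minimal sketch. It is not the code from this patch: `strip_tags` and `TAG_RE` are hypothetical names, and the regex is a deliberately simple assumption about what counts as a tag.

```python
import re

# Hypothetical helper for illustration only; not the patch's implementation.
# Matches anything that looks like an HTML tag, e.g. <code> or </em>.
TAG_RE = re.compile(r'<[^>]+>')


def strip_tags(html_text):
    """Remove tag-like spans but leave entities such as &amp; untouched."""
    return TAG_RE.sub('', html_text)


heading_html = 'Foo <code>Bar</code> &amp; <em>Baz</em>'
name = strip_tags(heading_html)
print(name)
```

The tags are dropped while `&amp;` survives, which is the behavior described above for `.toc_tokens`.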

I'm not entirely happy with this. In practice, I think the ideal would be for the toc extension to include the (HTML-ized) Markdown from a heading in its TOC entry, and to have .toc_tokens include both html and plaintext fields. This patch gives us a sort of middle ground, where we just use the HTML with tags stripped. The "ideal" is a larger behavioral change though, and I'm not sure it makes sense to do that for a minor release...

If you think it makes sense to implement the "ideal" solution now, I can try and do that. However, this patch is sufficient for what MkDocs needs.

jimporter · 2020-01-31T08:54:00Z

Hmm, one problem with the "ideal" solution mentioned above is that we'd probably want to strip `<a>` tags from the heading: those are obviously going to cause problems with the links generated for navigating from the TOC itself. Given that, maybe what I have here is the best solution for now...

jimporter · 2020-01-31T11:04:58Z

One other option here would be to strip HTML tags from the heading, but if a user really wants HTML tags in their heading, they could explicitly use `{ data-toc-label="Foo <code>Bar</code>" }`. This is more flexible, though attr lists do seem to have some unusual behavior when trying to embed Markdown in them, e.g. `{ data-toc-label="Foo *Bar*" }`. I haven't spent enough time digging through what's happening to understand how to fix that, but I assume it's because attr lists are just looking at the raw text, not the HTML elements generated by the initial Markdown parse.

waylan · 2020-01-31T16:06:18Z

For `data-toc-label` to support raw HTML, the content would need to be stashed in the HTML stash. Otherwise, it would get escaped when the ElementTree is serialized.
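
The escaping mentioned here can be seen with the standard library alone. This is only a sketch: it uses `xml.etree.ElementTree.tostring` rather than Python-Markdown's own serializer, and the element and `href` value are made up for illustration.

```python
import xml.etree.ElementTree as etree

# Build a TOC-style link whose text contains raw HTML as a plain string.
link = etree.Element('a')
link.set('href', '#foo')  # hypothetical anchor id
link.text = 'Foo <code>Bar</code>'

# Serializing escapes the angle brackets, so the tags become literal text.
out = etree.tostring(link, encoding='unicode')
print(out)
```

Because the text is escaped on serialization, raw HTML would have to bypass the tree (for example via a stash placeholder) to reach the output unescaped.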

For `data-toc-label` to support Markdown syntax, it gets complicated. For example, we do not want `<a>` elements. And what about images or block-level elements? I suppose we could allow a whitelist of known specific inline elements only. I would need to explore this more as well. But that is separate from the issue we are trying to address here.
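
A rough sketch of what such a whitelist might look like, purely as an assumption for discussion: `ALLOWED`, `TAG_RE`, and `keep_allowed_tags` are hypothetical names, and nothing like this exists in the toc extension as of this PR.

```python
import re

# Hypothetical whitelist of inline elements allowed in a TOC label.
ALLOWED = {'code', 'em', 'strong'}

# Matches an opening or closing tag and captures its element name.
TAG_RE = re.compile(r'</?([a-zA-Z][a-zA-Z0-9]*)\b[^>]*>')


def keep_allowed_tags(html_text, allowed=ALLOWED):
    """Drop every tag whose name is not whitelisted; keep the text between tags."""
    def _sub(match):
        # Whitelisted tags are kept verbatim (including any attributes).
        return match.group(0) if match.group(1).lower() in allowed else ''
    return TAG_RE.sub(_sub, html_text)


label = keep_allowed_tags('See <a href="#x">Foo</a> and <code>Bar</code>')
print(label)
```

Here the `<a>` wrapper is removed but its text is kept, while `<code>` passes through. A real implementation would also need to decide how to treat attributes and self-closing elements such as images.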

While the fix here is more restrictive than previously, it fixes the obviously wrong output and actually does what we intended to do previously. If we want to add support for formatting TOC entries, then that would be a separate new feature.

jimporter added 2 commits January 30, 2020 23:48

Add (failing) tests for Python-Markdown#899

16394c2

Don't emit stashed HTML tag placeholders in .toc_tokens; resolves P…

d799ee9

…ython-Markdown#899 Note: this slightly changes existing behavior in that raw HTML tags are no longer included in the HTML `.toc`.

waylan merged commit ccf56ed into Python-Markdown:master Jan 31, 2020

waylan mentioned this pull request Feb 7, 2020

Use toc_tokens to generate the TOC mkdocs/mkdocs#1970

Merged

jimporter deleted the toc-strip-html branch April 8, 2020 22:25
