Support fenced_code optional lang trailer: '::' up to } or newline #977

wlupton · 2020-06-06T15:51:53Z

This is to allow sphinx_markdown_parser's AutoStructify Embed reStructuredText option to work. Its second style expects lang tags like this to work as ReST directives:

``` automodule:: module-name
```

sphinx-markdown-parser assumes that the entire automodule:: module-name is parsed as the lang tag (and I can only assume that this used to work). However it no longer works because the lang tag needs to match this RE fragment (which doesn't allow spaces or colons):

(?P<lang>[\w#.+-]*)

The fix in this PR retains the above fragment but adds this optional fragment:

(?P<lang2>::[^}\n]*)?

I realise that the fix in this PR is probably not acceptable, because it doesn't sufficiently consider the impact on other valid syntaxes such as the hl_lines option, and indeed it may go against the spirit of the lang tag (although where is the acceptable lang syntax defined?). However I wanted to test the temperature to see whether people think that a fix at the markdown parser level might be acceptable.

This is to allow sphinx_markdown_parser AutoStructify to work. It expects lang tags like this to work: ``` automodule:: bin.onusim ```

waylan · 2020-06-06T16:59:01Z

First of all, #816 is a mostly complete (only missing docs) refactor of fenced code attributes. I would think any change here needs to be on that, rather than what is in master now. Ignoring the concerns raised below, that refactor might be a good excuse to add support for a new feature like this.

That said, I do have a concern. This change allows syntax that the extension does nothing with. In other words, if a user were to use Python-Markdown directly and use a ReST directive, the directive would be interpreted as the lang of a code block for syntax highlighting. I assume Pygments would just see it as a unrecognized lang and ignore it, but we need to be sure it won't break things. If we are officially supporting the syntax, then perhaps we should check for it an not pass the content to Pygments. More importantly, it could be confusing to users that the ReST directive is ignored if we 'support them.' Users would reasonably expect the directive to do something. However, that it outside the scope of this extension.

I haven't looked at how sphinx-markdown-parser handles this, but my assumption would be that they provide a separate fenced code extension which handles the directives and replaces the built-in extension. Seems to me that that would be a more sensible approach than to simply allow the directives to be passed in with no action taken on them. Whether that extension forks or subclasses the builtin extension, or is built from scratch is simply an implementation detail. However, the extension would be maintained as a third-party extension outside of Python-Markdown. In that case, there would be no reason to make any changes for the built-in extension.

wlupton · 2020-06-08T08:58:43Z

Thanks @waylan. I wasn't aware of #816.

I share the concern of your second paragraph, and I never expected this PR to be merged as is.

I believe that sphinx-markdown-parser simply assumes that the node's language attribute will contain the necessary text. This must have worked at some point.

With the #816 change, would the following be valid? If so, maybe it would be best just to do something like this:

``` {rst="automodule:: module-name"}
```

Alternatively just stick to something that will be parsed as a language (so no change on the markdown parser side), and then transform it on the sphinx-markdown-parser side, e.g.,

``` automodule--module-name
```

BTW, I think that at present, non-matching "language" values can cause the fenced blocks to get out of sync (I could provide an example), so I think that invalid language/class/keyword text needs to match the RE. Is this addressed by #816?

waylan · 2020-06-08T12:54:06Z

``` {rst="automodule:: module-name"}
```

Yes, that should work fine.

Alternatively just stick to something that will be parsed as a language (so no change on the markdown parser side), and then transform it on the sphinx-markdown-parser side, e.g.,
``` automodule--module-name
```

That should work as well.

BTW, I think that at present, non-matching "language" values can cause the fenced blocks to get out of sync (I could provide an example), so I think that invalid language/class/keyword text needs to match the RE.

This has always been the case and is intentional. If you provide bad input, then you get bad output. That is the clue that you did something wrong.

waylan · 2020-06-08T14:42:55Z

If I'm understanding what Autostructify is doing under the hood, I expect that something else that would work with no changes being required to any upstream libs would be this:

``` {class="automodule:: module-name"}
```

Of course, the problem with any of these solutions is that they require a different syntax than what recommonmark requires. I expect that sphinx-markdown-parser's goal is to provide a consistent syntax regardless of which tool is being used under-the-hood.

While spelunking around in sphinx-markdown-parser's code, something else I noticed is that they are not using Python-Markdown's code highlighting, but leaving that for docutils/Sphinx to handle (which seems reasonable to me). Therefore, the fenced_code extension seems like it is overkill for them as it has code highlighting built in. They don't need support for all the extra attributes (as Autostructify provides that via the ReST syntax), but they do need support for the single expression to contain a space. However, we need support for multiple attributes to be defined on one line and single expressions to not contain a space. The two sets of goals and needs are in conflict with one another. Given the above, I would think that sphinx-markdown-parser should provide their own fenced code extension which serves their needs more clearly.

I'm going to close this as something which should be implemented by a third-party extension.

As an aside, I've given a lot of thought over the years to how one might support ReST directives in Markdown and I find the Autostructify library interesting. However, it is not what I would have done. It simply uses ReST syntax wrapped up in code blocks. Instead a Markdownesque syntax would be more preferable. I don't know what that would look like exactly, but it is not Autostructify IMO.

wlupton · 2020-06-08T14:50:49Z

Your class= example is similar to my rst= example isn't it? (Here "class" is just a random keyword, not a CSS class, right?)

I agree with your comments. Note that I only came across all this a couple of weeks ago (I'm a user here) and I have no axe to grind. There appears to be some history, with AutoStructify coming originally from recommonmark and having been reimplemented in python-markdown-parser. There other areas where it manages to be more directly "markdowny", e.g., it makes a valiant attempt at defining the toc tree in pure markdown.

waylan · 2020-06-08T15:04:04Z

Your class= example is similar to my rst= example isn't it? (Here "class" is just a random keyword, not a CSS class, right?)

No, that should be a HTML class attribute. The dot syntax is simply shorthand. However you can define it explicitly, which allows you to use quotes and preserve whitespace.

Although, now that I think about it, maybe that doesn't work. It probably should.

facelessuser · 2020-06-08T15:11:59Z

In thought the refactor didn't allow setting arbitrary HTML attributes, or an I misunderstanding something? I though it wasn't allowed due to conflict with preexisting setting syntax.

wlupton · 2020-06-08T15:19:46Z

@waylan, isn't class special though? What would this result in?

{#mycode .python .numberLines class="something"}

waylan · 2020-06-08T16:57:53Z

Sorry I forgot that key=value pairs do not get used as HTML attributes like they do in the attr_list extension. That being the case neither rst= nor class= would work as they would not be preserved.

wlupton · 2020-06-08T17:30:34Z

Ah, I didn't realise this. I guess hl_lines is special. Is there any way that the attribute handling could be more common across extensions, e.g., I think that pandoc attributes are the same wherever they're supported (and one of the places where they're supported is fenced code blocks). From https://pandoc.org/MANUAL.html:

Optionally, you may attach attributes to fenced or backtick code block using this syntax:

~~~~ {#mycode .haskell .numberLines startFrom="100"}
qsort []     = []
qsort (x:xs) = qsort (filter (< x) xs) ++ [x] ++
               qsort (filter (>= x) xs)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Here mycode is an identifier, haskell and numberLines are classes,
and startFrom is an attribute with value 100. Some output formats can
use this information to do syntax highlighting.

[...] the code block above will appear as follows:

<pre id="mycode" class="haskell numberLines" startFrom="100">
  <code>
  ...
  </code>
</pre>

In fact (and this is probably the last thing you want to hear!) is there any way that the fenced code block extension could depend on the attr_list extension for processing its attribute lists?

waylan · 2020-06-08T18:50:02Z

@wlupton that is a discussion for #816.

_{Sent with GitHawk}

Support fenced_code optional lang trailer: '::' up to } or newline

80a1969

This is to allow sphinx_markdown_parser AutoStructify to work. It expects lang tags like this to work: ``` automodule:: bin.onusim ```

wlupton mentioned this pull request Jun 6, 2020

Miscellaneous improvements (will add more detail later) clayrisser/sphinx-markdown-parser#26

Merged

waylan added the 3rd-party Should be implemented as a third party extension. label Jun 8, 2020

waylan closed this Jun 8, 2020

waylan mentioned this pull request Jun 8, 2020

Refactor fenced code attributes #816

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Support fenced_code optional lang trailer: '::' up to } or newline #977

Support fenced_code optional lang trailer: '::' up to } or newline #977

Uh oh!

wlupton commented Jun 6, 2020

Uh oh!

waylan commented Jun 6, 2020

Uh oh!

wlupton commented Jun 8, 2020

Uh oh!

waylan commented Jun 8, 2020

Uh oh!

waylan commented Jun 8, 2020

Uh oh!

wlupton commented Jun 8, 2020

Uh oh!

waylan commented Jun 8, 2020

Uh oh!

facelessuser commented Jun 8, 2020 •

edited

Loading

Uh oh!

wlupton commented Jun 8, 2020

Uh oh!

waylan commented Jun 8, 2020

Uh oh!

wlupton commented Jun 8, 2020

Uh oh!

waylan commented Jun 8, 2020

Uh oh!

Uh oh!

Support fenced_code optional lang trailer: '::' up to } or newline #977

Support fenced_code optional lang trailer: '::' up to } or newline #977

Uh oh!

Conversation

wlupton commented Jun 6, 2020

Uh oh!

waylan commented Jun 6, 2020

Uh oh!

wlupton commented Jun 8, 2020

Uh oh!

waylan commented Jun 8, 2020

Uh oh!

waylan commented Jun 8, 2020

Uh oh!

wlupton commented Jun 8, 2020

Uh oh!

waylan commented Jun 8, 2020

Uh oh!

facelessuser commented Jun 8, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

wlupton commented Jun 8, 2020

Uh oh!

waylan commented Jun 8, 2020

Uh oh!

wlupton commented Jun 8, 2020

Uh oh!

waylan commented Jun 8, 2020

Uh oh!

Uh oh!

facelessuser commented Jun 8, 2020 •

edited

Loading