Skip to content

Support fenced_code optional lang trailer: '::' up to } or newline #977

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

wlupton
Copy link

@wlupton wlupton commented Jun 6, 2020

This is to allow sphinx_markdown_parser's AutoStructify Embed reStructuredText option to work. Its second style expects lang tags like this to work as ReST directives:

``` automodule:: module-name
```

sphinx-markdown-parser assumes that the entire automodule:: module-name is parsed as the lang tag (and I can only assume that this used to work). However it no longer works because the lang tag needs to match this RE fragment (which doesn't allow spaces or colons):

(?P<lang>[\w#.+-]*)

The fix in this PR retains the above fragment but adds this optional fragment:

(?P<lang2>::[^}\n]*)?

I realise that the fix in this PR is probably not acceptable, because it doesn't sufficiently consider the impact on other valid syntaxes such as the hl_lines option, and indeed it may go against the spirit of the lang tag (although where is the acceptable lang syntax defined?). However I wanted to test the temperature to see whether people think that a fix at the markdown parser level might be acceptable.

This is to allow sphinx_markdown_parser AutoStructify to work. It
expects lang tags like this to work:

``` automodule:: bin.onusim
```
@waylan
Copy link
Member

waylan commented Jun 6, 2020

First of all, #816 is a mostly complete (only missing docs) refactor of fenced code attributes. I would think any change here needs to be on that, rather than what is in master now. Ignoring the concerns raised below, that refactor might be a good excuse to add support for a new feature like this.

That said, I do have a concern. This change allows syntax that the extension does nothing with. In other words, if a user were to use Python-Markdown directly and use a ReST directive, the directive would be interpreted as the lang of a code block for syntax highlighting. I assume Pygments would just see it as a unrecognized lang and ignore it, but we need to be sure it won't break things. If we are officially supporting the syntax, then perhaps we should check for it an not pass the content to Pygments. More importantly, it could be confusing to users that the ReST directive is ignored if we 'support them.' Users would reasonably expect the directive to do something. However, that it outside the scope of this extension.

I haven't looked at how sphinx-markdown-parser handles this, but my assumption would be that they provide a separate fenced code extension which handles the directives and replaces the built-in extension. Seems to me that that would be a more sensible approach than to simply allow the directives to be passed in with no action taken on them. Whether that extension forks or subclasses the builtin extension, or is built from scratch is simply an implementation detail. However, the extension would be maintained as a third-party extension outside of Python-Markdown. In that case, there would be no reason to make any changes for the built-in extension.

@wlupton
Copy link
Author

wlupton commented Jun 8, 2020

Thanks @waylan. I wasn't aware of #816.

I share the concern of your second paragraph, and I never expected this PR to be merged as is.

I believe that sphinx-markdown-parser simply assumes that the node's language attribute will contain the necessary text. This must have worked at some point.

With the #816 change, would the following be valid? If so, maybe it would be best just to do something like this:

``` {rst="automodule:: module-name"}
```

Alternatively just stick to something that will be parsed as a language (so no change on the markdown parser side), and then transform it on the sphinx-markdown-parser side, e.g.,

``` automodule--module-name
```

BTW, I think that at present, non-matching "language" values can cause the fenced blocks to get out of sync (I could provide an example), so I think that invalid language/class/keyword text needs to match the RE. Is this addressed by #816?

@waylan
Copy link
Member

waylan commented Jun 8, 2020

``` {rst="automodule:: module-name"}
```

Yes, that should work fine.

Alternatively just stick to something that will be parsed as a language (so no change on the markdown parser side), and then transform it on the sphinx-markdown-parser side, e.g.,

``` automodule--module-name
```

That should work as well.

BTW, I think that at present, non-matching "language" values can cause the fenced blocks to get out of sync (I could provide an example), so I think that invalid language/class/keyword text needs to match the RE.

This has always been the case and is intentional. If you provide bad input, then you get bad output. That is the clue that you did something wrong.

@waylan
Copy link
Member

waylan commented Jun 8, 2020

If I'm understanding what Autostructify is doing under the hood, I expect that something else that would work with no changes being required to any upstream libs would be this:

``` {class="automodule:: module-name"}
```

Of course, the problem with any of these solutions is that they require a different syntax than what recommonmark requires. I expect that sphinx-markdown-parser's goal is to provide a consistent syntax regardless of which tool is being used under-the-hood.

While spelunking around in sphinx-markdown-parser's code, something else I noticed is that they are not using Python-Markdown's code highlighting, but leaving that for docutils/Sphinx to handle (which seems reasonable to me). Therefore, the fenced_code extension seems like it is overkill for them as it has code highlighting built in. They don't need support for all the extra attributes (as Autostructify provides that via the ReST syntax), but they do need support for the single expression to contain a space. However, we need support for multiple attributes to be defined on one line and single expressions to not contain a space. The two sets of goals and needs are in conflict with one another. Given the above, I would think that sphinx-markdown-parser should provide their own fenced code extension which serves their needs more clearly.

I'm going to close this as something which should be implemented by a third-party extension.

As an aside, I've given a lot of thought over the years to how one might support ReST directives in Markdown and I find the Autostructify library interesting. However, it is not what I would have done. It simply uses ReST syntax wrapped up in code blocks. Instead a Markdownesque syntax would be more preferable. I don't know what that would look like exactly, but it is not Autostructify IMO.

@waylan waylan added the 3rd-party Should be implemented as a third party extension. label Jun 8, 2020
@waylan waylan closed this Jun 8, 2020
@wlupton
Copy link
Author

wlupton commented Jun 8, 2020

Your class= example is similar to my rst= example isn't it? (Here "class" is just a random keyword, not a CSS class, right?)

I agree with your comments. Note that I only came across all this a couple of weeks ago (I'm a user here) and I have no axe to grind. There appears to be some history, with AutoStructify coming originally from recommonmark and having been reimplemented in python-markdown-parser. There other areas where it manages to be more directly "markdowny", e.g., it makes a valiant attempt at defining the toc tree in pure markdown.

@waylan
Copy link
Member

waylan commented Jun 8, 2020

Your class= example is similar to my rst= example isn't it? (Here "class" is just a random keyword, not a CSS class, right?)

No, that should be a HTML class attribute. The dot syntax is simply shorthand. However you can define it explicitly, which allows you to use quotes and preserve whitespace.

Although, now that I think about it, maybe that doesn't work. It probably should.

@facelessuser
Copy link
Collaborator

facelessuser commented Jun 8, 2020

In thought the refactor didn't allow setting arbitrary HTML attributes, or an I misunderstanding something? I though it wasn't allowed due to conflict with preexisting setting syntax.

@wlupton
Copy link
Author

wlupton commented Jun 8, 2020

@waylan, isn't class special though? What would this result in?

{#mycode .python .numberLines class="something"}

@waylan
Copy link
Member

waylan commented Jun 8, 2020

Sorry I forgot that key=value pairs do not get used as HTML attributes like they do in the attr_list extension. That being the case neither rst= nor class= would work as they would not be preserved.

@wlupton
Copy link
Author

wlupton commented Jun 8, 2020

Ah, I didn't realise this. I guess hl_lines is special. Is there any way that the attribute handling could be more common across extensions, e.g., I think that pandoc attributes are the same wherever they're supported (and one of the places where they're supported is fenced code blocks). From https://pandoc.org/MANUAL.html:

Optionally, you may attach attributes to fenced or backtick code block using this syntax:

~~~~ {#mycode .haskell .numberLines startFrom="100"}
qsort []     = []
qsort (x:xs) = qsort (filter (< x) xs) ++ [x] ++
               qsort (filter (>= x) xs)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Here mycode is an identifier, haskell and numberLines are classes,
and startFrom is an attribute with value 100. Some output formats can
use this information to do syntax highlighting.

[...] the code block above will appear as follows:

<pre id="mycode" class="haskell numberLines" startFrom="100">
  <code>
  ...
  </code>
</pre>

In fact (and this is probably the last thing you want to hear!) is there any way that the fenced code block extension could depend on the attr_list extension for processing its attribute lists?

@waylan
Copy link
Member

waylan commented Jun 8, 2020

@wlupton that is a discussion for #816.

Sent with GitHawk

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3rd-party Should be implemented as a third party extension.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants