Skip to content

Re-use compiled regex for block level checks #1169

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Aug 9, 2021

Conversation

HebaruSan
Copy link
Contributor

Problem

This page takes several seconds to load:

Cause

We've hit a somewhat exceptional case in that there are 76+ different markdown documents to render to HTML for that page (the author of the mod has been very prolific with creating releases, see the "changelog" list at the bottom).

With profiling enabled, it turns out that about one second is spent (re-)compiling a regex inside of isblocklevel, once per markdown document:

image

Other ideas

We (site developers) will probably take a number of other actions to address this slowness, such as paginating the releases and pre-rendering the markdown to HTML when saving to SQL rather than at render. But that regex still represents some low-hanging fruit for other users with many documents to render.

Changes

Now that regex is compiled once at startup and re-used in compiled form as needed. Should speed up rendering multiple documents modestly.

FYI to @DasSkelett, going to see how this goes.

@HebaruSan
Copy link
Contributor Author

Which changelog file should I update? I see one recent commit editing index.md and another editing release-3.3.md.

4b20c16

e11cd25

@mitya57
Copy link
Collaborator

mitya57 commented Aug 8, 2021

Please edit docs/change_log/index.md. The second commit is wrong (Cc @facelessuser; and 0e6dc4c was wrong too…)

By the way, thanks for the change. Compiling regular expressions is almost always the right thing to do.

@facelessuser
Copy link
Collaborator

@mitya57 thanks for pinging me, I'll get that fixed. I always forget how the changelogs work.

@waylan waylan merged commit 663a647 into Python-Markdown:master Aug 9, 2021
@HebaruSan HebaruSan deleted the fix/regex-compile branch August 9, 2021 14:22
@remusao
Copy link

remusao commented Nov 17, 2021

By the way, thanks for the change. Compiling regular expressions is almost always the right thing to do.

Sorry for jumping in but this part got me curious. As far as I know CPython will always cache[1] the compiled regexps internally so using compile and storing globally should not be the right thing to do (maybe unless there are enough different regexps in the program that the internal cache is too small to hold all of them). I am a bit surprised that you got a performance improvement from this change. Can you confirm that the slowness initially reported is fixed after this change was applied?

[1] https://stackoverflow.com/a/12514276

@mitya57
Copy link
Collaborator

mitya57 commented Nov 18, 2021

Python-Markdown itself has around 70 various regular expressions. grep -Er 're\.(sub|match|compile|search)' markdown | wc -l returns 79 for me. CPython's re._MAXCACHE is 512 by default.

Small projects are unlikely to reach that limit. But projects may import other modules with regular expressions, not only markdown. And in such case this limit will be reached sooner or later.

So I think even if compiling is not always useful, in some cases it is, and because it causes no harm, why not use it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants