Possible performance improvement in email parsing

PyPy received the following performance bug today: https://foss.heptapod.net/pypy/pypy/-/issues/3961

Somebody who was trying to process a lot of emails from an mbox file was complaining about terrible performance on PyPy. The problem turned out to be that `email.feedparser.FeedParser._parsegen` compiles a new regular expression for every multipart message in the mbox file. On PyPy this is particularly bad, because those regular expressions are jitted and that costs even more time. However, even on CPython, compiling these regular expressions takes a noticeable portion of the benchmark's run time.
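To illustrate the pattern being described, here is a minimal sketch of building one regular expression per boundary. The helper name `make_boundary_re` and the exact pattern are assumptions modeled on the description above, not a verbatim copy of the stdlib code. Because each message typically carries its own boundary string, each call produces a different pattern that has to be compiled separately.

```python
import re

def make_boundary_re(boundary):
    """Hypothetical helper: compile a regex that embeds the boundary itself."""
    separator = '--' + boundary
    # The boundary text is baked into the pattern, so every distinct
    # boundary yields a distinct pattern string to compile.
    return re.compile(
        '(?P<sep>' + re.escape(separator) +
        r')(?P<end>--)?(?P<ws>[ \t]*)(?P<linesep>\r\n|\r|\n)?$')

# Three messages with three different boundaries (made-up values).
boundaries = ['boundary-%d' % i for i in range(3)]
patterns = [make_boundary_re(b) for b in boundaries]

# Each boundary produced its own pattern; nothing is shared between them.
distinct_patterns = {p.pattern for p in patterns}
```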

I [fixed this problem in PyPy](https://foss.heptapod.net/pypy/pypy/-/commit/ac270e3701e29024d4098cd0e674cd1fe30a751f) by first checking each line against the multipart separator with `str.startswith`, and then matching the rest of the line with a single generic regular expression that works for arbitrary boundaries. In PyPy this helps massively, and even in CPython it's still a 20% performance improvement. I will open a PR for it.
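A rough sketch of that approach, under the assumption that the generic expression only needs to cover what follows the separator (an optional closing `--`, trailing whitespace, and a line ending). The names `BOUNDARY_END_RE` and `make_boundary_matcher` are illustrative and may not match the actual commit.

```python
import re

# Compiled once at import time; it does not depend on any boundary string.
BOUNDARY_END_RE = re.compile(
    r'(?P<end>--)?(?P<ws>[ \t]*)(?P<linesep>\r\n|\r|\n)?$')

def make_boundary_matcher(boundary):
    """Hypothetical helper: return a matcher for one multipart boundary."""
    separator = '--' + boundary

    def boundary_match(line):
        # Cheap plain-string check first; no regex is compiled per boundary.
        if not line.startswith(separator):
            return None
        # Match only the remainder of the line with the shared regex.
        return BOUNDARY_END_RE.match(line, len(separator))

    return boundary_match

match = make_boundary_matcher('abc')
closing = match('--abc--\n')     # closing delimiter: 'end' group is '--'
opening = match('--abc\r\n')     # opening delimiter: 'end' group is None
unrelated = match('some body text')  # not a boundary line: None
```

The per-message work is reduced to a string concatenation and a closure, while the regex compilation cost is paid only once.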


### Linked PRs
* gh-106629

Possible performance improvement in email parsing #106628
