
BUG: .extractall() throws AssertionError if capture group length > 1 #13382

Closed
@GeraintDuck

Description

Code to replicate error:

import pandas as pd
s = pd.Series(["a13a23", "b13", "c13"], index=["A", "B", "C"])
s.str.extractall(r"[ab](\d\d)")

Note that the regex [ab](\d) from the documentation page works, whereas [ab](\d\d) above doesn't. It seems that any capture group that matches more than one character causes this error.
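Python's own `re.findall` (standard library, not pandas) shows how the two patterns differ on the first sample value. With exactly one capture group it returns a plain string per match, and only the second pattern produces strings longer than one character. This is only an illustration of standard-library behaviour, not of what pandas does internally.

```python
import re

subject = "a13a23"

# One capture group matching one character: each element is a 1-char string.
one_char = re.findall(r"[ab](\d)", subject)

# One capture group matching two characters: each element is a 2-char string.
two_char = re.findall(r"[ab](\d\d)", subject)

print(one_char)  # ['1', '2']
print(two_char)  # ['13', '23']
```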

However, after experimenting a bit more, I found that the following regexes all seem to work without error:

([ab])(\d\d)
()[ab](\d+)
(a13)(\d\d)
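What these three patterns have in common is that each defines two capture groups, and with two or more groups `re.findall` returns one tuple per match instead of a plain string. The sketch below (standard library only) records the group count and the `findall` result for each pattern on the sample value "a13a23". Note that (a13)(\d\d) finds no match in that value, since "a13" is never followed by two more digits, so it runs cleanly but returns an empty list.

```python
import re

subject = "a13a23"
patterns = [r"([ab])(\d\d)", r"()[ab](\d+)", r"(a13)(\d\d)"]

results = {}
for p in patterns:
    compiled = re.compile(p)
    # .groups is the number of capture groups defined in the pattern.
    results[p] = (compiled.groups, compiled.findall(subject))
    print(p, results[p])
```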

I've reproduced the issue in both versions 0.18.0 and 0.18.1. I'll admit I've not checked against the master branch though.
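On the affected versions, one way to get the same long-format data without calling `extractall` is to iterate over the matches directly. This is a minimal sketch using only the standard library's `re.finditer`; the helper name `extractall_records` is hypothetical. It returns `(index_label, match_number, group...)` tuples that mirror the `(index, match)` row layout of `extractall`'s result, rather than building a DataFrame.

```python
import re

def extractall_records(labels, values, pattern):
    """Hypothetical helper: one record per regex match, extractall-style.

    Each record is (index_label, match_number, group_1, ..., group_n),
    with match_number counting from 0 within each subject string.
    """
    regex = re.compile(pattern)
    records = []
    for label, value in zip(labels, values):
        if not isinstance(value, str):
            continue  # skip missing / non-string values
        for match_number, m in enumerate(regex.finditer(value)):
            # m.groups() is always a tuple, even with a single capture group.
            records.append((label, match_number) + m.groups())
    return records

records = extractall_records(["A", "B", "C"], ["a13a23", "b13", "c13"], r"[ab](\d\d)")
print(records)  # [('A', 0, '13'), ('A', 1, '23'), ('B', 0, '13')]
```

Because `Match.groups()` always returns a tuple, this approach does not depend on how many groups the pattern has. Converting the records into a DataFrame is left out to keep the sketch independent of pandas.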

Note: I posted this to the mailing list but haven't had any responses, so I assume this is a bug.
I'm unsure what the underlying cause is here (maybe it doesn't like the first regex character not being within a capture group?).
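One explanation consistent with the observations above, offered only as a hypothesis and not as a reading of pandas' actual source, is that the results of `re.findall` are iterated as if each one were a tuple of groups. With a single group each result is a plain string, so iterating it yields one item per character: a one-character match still lines up with the single expected column, while a two-character match yields two items. Patterns with two or more groups return real tuples and would be unaffected. The functions `naive_rows` and `safe_rows` are hypothetical illustrations of that mismatch and of one possible fix.

```python
import re

def naive_rows(pattern, subject):
    # Hypothetical: assumes every findall() element is a tuple of groups.
    return [list(m) for m in re.findall(pattern, subject)]

def safe_rows(pattern, subject):
    # Hypothetical fix: wrap single-group results (plain strings) in a 1-tuple.
    rows = []
    for m in re.findall(pattern, subject):
        if isinstance(m, str):
            m = (m,)
        rows.append(list(m))
    return rows

pattern = r"[ab](\d\d)"
n_columns = re.compile(pattern).groups  # 1 capture group -> 1 expected column

bad = naive_rows(pattern, "a13a23")   # each 2-char string split into 2 "groups"
good = safe_rows(pattern, "a13a23")

print(n_columns, bad, good)
# A column-count check like this is where an AssertionError could arise:
print(all(len(row) == n_columns for row in bad))   # False
print(all(len(row) == n_columns for row in good))  # True
```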

Metadata

Labels: Bug, Strings
