Description
What version of regex are you using?
Latest
If it isn't the latest version, then please upgrade and check whether the bug
is still present.
Describe the bug at a high level.
Because regex_syntax lazily relies on char::from_u32,
not all valid Unicode code points are parsed, and this prevents valid regexes from compiling.
Give a brief description of the actual problem you're observing.
Rust defines char as a "Unicode scalar value" and explicitly states that it is similar to, but not the same as, a Unicode code point.
The parser is supposed to extract all code points as documented above the function:
https://github.com/rust-lang/regex/blob/master/regex-syntax/src/ast/parse.rs#L1611
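
To make the distinction concrete, here is a minimal sketch using only the standard library. It shows char::from_u32 rejecting surrogate code points while accepting every other code point up to U+10FFFF. The helper `is_surrogate` is a name made up for this illustration and is not part of regex_syntax.

```rust
// Hypothetical helper for illustration only (not part of regex_syntax).
// Returns true when `cp` lies in the surrogate range U+D800..=U+DFFF.
fn is_surrogate(cp: u32) -> bool {
    (0xD800..=0xDFFF).contains(&cp)
}

fn main() {
    // An ordinary scalar value converts without trouble.
    assert_eq!(char::from_u32(0x61), Some('a'));

    // Surrogates are code points but not scalar values, so they are rejected.
    assert!(is_surrogate(0xD800));
    assert_eq!(char::from_u32(0xD800), None);
    assert_eq!(char::from_u32(0xDFFF), None);

    // The highest code point is a scalar value and is accepted.
    assert_eq!(char::from_u32(0x10FFFF), Some('\u{10FFFF}'));

    // Anything above U+10FFFF is not a code point at all.
    assert_eq!(char::from_u32(0x110000), None);
}
```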
What is the expected behavior?
I expect this crate to include custom logic for validating code points, instead of relying on char::from_u32,
which omits valid code points (the surrogate values) because they aren't considered scalar values.
JavaScript and several other regex engines can handle these fine.
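
As a rough idea of what such custom validation could look like, the sketch below classifies a raw u32 as a scalar value, a surrogate, or out of range. The `CodePoint` enum and `classify_code_point` function are hypothetical names invented for this example. They are not the crate's API, and what the parser would then do with a surrogate is left open here.

```rust
/// Hypothetical classification of a parsed code point (illustration only).
#[derive(Debug, PartialEq, Eq)]
enum CodePoint {
    /// A Unicode scalar value, representable as a Rust `char`.
    Scalar(char),
    /// A surrogate code point (U+D800..=U+DFFF), kept as a raw number.
    Surrogate(u32),
}

/// Accepts every Unicode code point (0..=0x10FFFF), surrogates included,
/// and returns None only for values above U+10FFFF.
fn classify_code_point(cp: u32) -> Option<CodePoint> {
    match char::from_u32(cp) {
        Some(c) => Some(CodePoint::Scalar(c)),
        // char::from_u32 fails for surrogates and for out-of-range values;
        // only the former are valid code points.
        None if (0xD800..=0xDFFF).contains(&cp) => Some(CodePoint::Surrogate(cp)),
        None => None,
    }
}

fn main() {
    assert_eq!(classify_code_point(0x61), Some(CodePoint::Scalar('a')));
    assert_eq!(classify_code_point(0xD800), Some(CodePoint::Surrogate(0xD800)));
    assert_eq!(classify_code_point(0x110000), None);
}
```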