Extra characters erroneously matched when using possessive quantifier with negative lookahead #100061

# Bug report

Regular expressions that combine a possessive quantifier with a negative lookahead match extra erroneous characters in re module 2.2.1 of Python 3.11. (The test was run on Windows 10 using the official distribution of Python 3.11.0.)

For example, the following regular expression aims to match consecutive characters that are not 'C' in string 'ABC'. (There are simpler ways to do this, but this is just an example to illustrate the problem.)

```
import re

text = 'ABC'
print('Possessive quantifier, negative lookahead:',
      re.findall('(((?!C).)++)', text))
```

Output:
```
Possessive quantifier, negative lookahead: [('ABC', 'B')]
```

Each tuple returned by `re.findall` holds the capture groups of one match. Group 1 wraps the whole pattern, so it equals the entire match, while group 2 holds the last character matched by the repeated subpattern. They should be 'AB' and 'B', respectively. While the last matched character is correctly identified as 'B', the complete match is erroneously set to 'ABC'.
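To see where each group starts and ends, the spans can be printed alongside the group text. The following is only a sketch: `group_spans` is a hypothetical helper written for this report, not part of the re module. It runs the greedy form of the same pattern as a baseline, because the expected result there is unambiguous, and then the possessive form. No output is shown for the possessive form, since it depends on the Python version in use.

```python
import re

def group_spans(pattern, text):
    """Return [(group_text, (start, end)), ...] for groups 1..n of the first match.

    Hypothetical helper for illustration only. Returns None if nothing matches.
    """
    m = re.search(pattern, text)
    if m is None:
        return None
    return [(m.group(i), m.span(i)) for i in range(1, m.re.groups + 1)]

text = 'ABC'

# Baseline: greedy quantifier with the same negative lookahead.
print('greedy    :', group_spans(r'(((?!C).)+)', text))
# greedy    : [('AB', (0, 2)), ('B', (1, 2))]

# Possessive quantifier. Python versions before 3.11 do not support the
# syntax and raise re.error when compiling the pattern.
try:
    print('possessive:', group_spans(r'(((?!C).)++)', text))
except re.error as exc:
    print('possessive quantifiers not supported:', exc)
```

If the span of group 1 ends past index 2 in the possessive run, the end of the overall match has been placed beyond the last character that the repeated subpattern actually accepted.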

Replacing the negative lookahead with a positive lookahead eliminates the problem:

```
print('Possessive quantifier, positive lookahead:',
      re.findall('(((?=[^C]).)++)', text))
```

Output:
```
Possessive quantifier, positive lookahead: [('AB', 'B')]
```

Alternatively, keeping the negative lookahead but replacing the possessive quantifier with a greedy quantifier also eliminates the problem:

```
print('Greedy quantifier, negative lookahead:',
      re.findall('(((?!C).)+)', text))
```

Output:
```
Greedy quantifier, negative lookahead: [('AB', 'B')]
```

While this example uses the ++ quantifier, the *+ and ?+ quantifiers exhibit similar behaviour. Also, using a longer pattern in the negative lookahead leads to even more characters being erroneously matched.
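A quick way to check the other quantifiers on a given interpreter is to compare each possessive pattern against its greedy counterpart. This is a minimal sketch under two assumptions. First, nothing follows the quantifier in these patterns, so the possessive and greedy forms should produce identical results. Second, the longer-lookahead case (`(?!CDE)` on the text 'ABCDE') is a hypothetical example chosen for illustration and is not the exact pattern used in the original test.

```python
import re

# (possessive pattern, greedy counterpart, text)
# The last entry is a hypothetical longer-lookahead case.
CASES = [
    (r'(((?!C).)++)', r'(((?!C).)+)', 'ABC'),
    (r'(((?!C).)*+)', r'(((?!C).)*)', 'ABC'),
    (r'(((?!C).)?+)', r'(((?!C).)?)', 'ABC'),
    (r'(((?!CDE).)++)', r'(((?!CDE).)+)', 'ABCDE'),
]

def compare(possessive, greedy, text):
    """Return (possessive_result, greedy_result) from re.findall.

    possessive_result is None when the interpreter does not support
    possessive quantifiers (re.error on Python versions before 3.11).
    """
    expected = re.findall(greedy, text)
    try:
        actual = re.findall(possessive, text)
    except re.error:
        actual = None
    return actual, expected

for possessive, greedy, text in CASES:
    actual, expected = compare(possessive, greedy, text)
    if actual is None:
        status = 'unsupported'
    elif actual == expected:
        status = 'same as greedy'
    else:
        status = 'DIFFERS from greedy'
    print(f'{possessive!r} on {text!r}: {actual} ({status})')
```

Any line reported as differing from greedy points to a pattern affected on the interpreter being tested.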

Thank you for adding possessive quantifiers to the re module! It is a very useful feature!

# Environment

- re module 2.2.1 in standard library
- CPython versions tested on: 3.11.0
- Operating system and architecture: Windows 10


### Linked PRs
* gh-102612
* gh-108003
* gh-108004
