generate_tokens tokenizes $ differently with Python 3.12 than earlier #104802

Closed as not planned
@pekkaklarck

I tested Python 3.12 beta 1 with Robot Framework and noticed that tokenize.generate_tokens() handles expressions containing $ differently than earlier versions did. Previously $ yielded ERRORTOKEN, but now we get OP:

Python 3.11.3 (main, Apr  5 2023, 14:15:06) [GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from tokenize import generate_tokens
>>> from io import StringIO
>>> next(generate_tokens(StringIO('$x').readline))
TokenInfo(type=60 (ERRORTOKEN), string='$', start=(1, 0), end=(1, 1), line='$x')

Python 3.12.0b1 (main, May 22 2023, 23:31:26) [GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from tokenize import generate_tokens
>>> from io import StringIO
>>> next(generate_tokens(StringIO('$x').readline))
TokenInfo(type=55 (OP), string='$', start=(1, 0), end=(1, 1), line='$x\n')

We support Python evaluation with special variables like $var > 1 in Robot Framework data, and this change breaks our tokenizing code. I didn't notice anything related in the release notes, so I decided to report this. If the change is intentional, we can easily update our code to handle these semantics as well.
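
For illustration, a minimal sketch of how tokenizing code could accept both behaviours. It assumes only the two token types shown in the sessions above (ERRORTOKEN on 3.11.3, OP on 3.12.0b1). The helper name find_dollar_positions is hypothetical and is not Robot Framework's actual code.

```python
from io import StringIO
from tokenize import ERRORTOKEN, OP, generate_tokens


def find_dollar_positions(source):
    """Return the (row, col) start of every bare '$' token in source."""
    positions = []
    for tok in generate_tokens(StringIO(source).readline):
        # Match on the token string and accept either type:
        # ERRORTOKEN (as seen on 3.11.3) or OP (as seen on 3.12.0b1).
        if tok.string == '$' and tok.type in (ERRORTOKEN, OP):
            positions.append(tok.start)
    return positions


print(find_dollar_positions('$var > 1'))
```

Matching on the string first keeps the check independent of which type a given Python version reports. A $ inside a string literal is part of a STRING token, so it is not matched.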

Note also a small change in TokenInfo.line above: with Python 3.12 it ends with an additional \n even though the original string didn't contain any newlines.
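
If code compares TokenInfo.line against the original text, one possible workaround is to strip trailing line endings before comparing. This is only a sketch: the helper name first_token_line is hypothetical, and it assumes that dropping a genuine trailing newline is acceptable for the comparison.

```python
from io import StringIO
from tokenize import generate_tokens


def first_token_line(source):
    """Return the first token's line with trailing line endings removed."""
    tok = next(generate_tokens(StringIO(source).readline))
    # 3.11.3 reported line='$x' and 3.12.0b1 reported line='$x\n' above;
    # stripping the line ending makes both compare equal to the input.
    return tok.line.rstrip('\r\n')


print(first_token_line('$x') == '$x')
```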

Labels: interpreter-core (Objects, Python, Grammar, and Parser dirs), type-bug (An unexpected behavior, bug, or error)