The lines in tokens from tokenize.generate_tokens incorrectly indicate multiple lines. #104972

Closed
@nedbat

The line attribute in tokens returned by tokenize.generate_tokens incorrectly indicates multiple lines for tokens that follow a backslash continuation. Tokens should satisfy an invariant: for a token that sits on a single line, slicing the .line attribute with the column offsets from .start and .end produces the .string attribute.

tokbug.py:

import io
import sys
import tokenize

SOURCE = r"""
a + \
b
"""

print(sys.version)
readline = io.StringIO(SOURCE).readline
for tok in tokenize.generate_tokens(readline):
    correct = (tok.string) == (tok.line[tok.start[1]: tok.end[1]])
    print(tok, "" if correct else "<*****!!!")

Run with 3.12.0a7:

% /usr/local/pyenv/pyenv/versions/3.12.0a7/bin/python tokbug.py
3.12.0a7 (main, Apr  5 2023, 05:51:58) [Clang 14.0.3 (clang-1403.0.22.14.1)]
TokenInfo(type=62 (NL), string='\n', start=(1, 0), end=(1, 1), line='\n')
TokenInfo(type=1 (NAME), string='a', start=(2, 0), end=(2, 1), line='a + \\\n')
TokenInfo(type=54 (OP), string='+', start=(2, 2), end=(2, 3), line='a + \\\n')
TokenInfo(type=1 (NAME), string='b', start=(3, 0), end=(3, 1), line='b\n')
TokenInfo(type=4 (NEWLINE), string='\n', start=(3, 1), end=(3, 2), line='b\n')
TokenInfo(type=0 (ENDMARKER), string='', start=(4, 0), end=(4, 0), line='')

Run with 3.12.0b1:

% /usr/local/pyenv/pyenv/versions/3.12.0b1/bin/python tokbug.py
3.12.0b1 (main, May 23 2023, 16:19:59) [Clang 14.0.3 (clang-1403.0.22.14.1)]
TokenInfo(type=65 (NL), string='\n', start=(1, 0), end=(1, 1), line='\n')
TokenInfo(type=1 (NAME), string='a', start=(2, 0), end=(2, 1), line='a + \\\n')
TokenInfo(type=55 (OP), string='+', start=(2, 2), end=(2, 3), line='a + \\\n')
TokenInfo(type=1 (NAME), string='b', start=(3, 0), end=(3, 1), line='a + \\\nb\n') <*****!!!
TokenInfo(type=4 (NEWLINE), string='\n', start=(3, 1), end=(3, 2), line='a + \\\nb\n') <*****!!!
TokenInfo(type=0 (ENDMARKER), string='', start=(4, 0), end=(4, 0), line='')
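
To separate the two symptoms (a wrong .line versus wrong column offsets), the tokens can also be checked against the physical lines of the source itself. The following is a minimal sketch, not part of the original report: the helper names physical_line and check_tokens are hypothetical, and it assumes the source contains only "\n" line endings, so that str.splitlines agrees with the tokenizer's row numbering.

```python
import io
import tokenize

# Same text as SOURCE in tokbug.py, written without a raw string.
SOURCE = "\na + \\\nb\n"


def physical_line(source_lines, row):
    """Return the 1-indexed physical line, or '' past the end (ENDMARKER)."""
    return source_lines[row - 1] if row <= len(source_lines) else ""


def check_tokens(source):
    """Yield (token, line_is_physical, slice_matches) for single-line tokens."""
    # Assumption: only "\n" line endings, so splitlines matches tokenizer rows.
    source_lines = source.splitlines(keepends=True)
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.start[0] != tok.end[0]:
            continue  # a multi-line token cannot be a slice of one line
        expected = physical_line(source_lines, tok.start[0])
        # Does .line hold exactly the physical line the token starts on?
        line_is_physical = tok.line == expected
        # Do the column offsets select .string from that physical line?
        slice_matches = expected[tok.start[1]:tok.end[1]] == tok.string
        yield tok, line_is_physical, slice_matches


for tok, line_ok, slice_ok in check_tokens(SOURCE):
    flag = "" if line_ok else "  <-- .line is not the physical line"
    print(tok.start, repr(tok.string), repr(tok.line), flag)
```

For the tokens shown in both runs above, the column offsets are identical, so only the .line comparison should differ between versions.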

Related to #104825? cc @pablogsal

Labels: type-bug (An unexpected behavior, bug, or error)
