The `line` attribute of tokens returned by `tokenize.generate_tokens` incorrectly spans multiple lines: after a backslash continuation, it contains both physical lines instead of only the line the token is on. Tokens should satisfy an invariant: slicing the `.line` attribute with the column offsets from the `.start` and `.end` attributes produces the `.string` attribute.
tokbug.py:
```python
import io
import sys
import tokenize

SOURCE = r"""
a + \
b
"""

print(sys.version)
readline = io.StringIO(SOURCE).readline
for tok in tokenize.generate_tokens(readline):
    # The token text should equal the slice of .line between the start and end columns.
    correct = tok.string == tok.line[tok.start[1]:tok.end[1]]
    print(tok, "" if correct else "<*****!!!")
```
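A slightly stricter version of the same check, offered as a sketch: it only compares tokens whose start and end rows are equal, since a token spanning several rows (for example a triple-quoted string) has no single column range on one line. The names `invariant_holds` and `find_line_mismatches` are hypothetical helpers, not part of the `tokenize` module.

```python
import io
import tokenize


def invariant_holds(tok: tokenize.TokenInfo) -> bool:
    """Return True if slicing tok.line by the token's columns gives tok.string.

    Tokens that start and end on different rows are not checked and are
    treated as passing, because they have no single-line column range.
    """
    (start_row, start_col), (end_row, end_col) = tok.start, tok.end
    if start_row != end_row:
        return True
    return tok.line[start_col:end_col] == tok.string


def find_line_mismatches(source: str) -> list:
    """Tokenize source and return every token that violates the invariant."""
    readline = io.StringIO(source).readline
    return [tok for tok in tokenize.generate_tokens(readline)
            if not invariant_holds(tok)]


if __name__ == "__main__":
    # Same input as SOURCE in tokbug.py: a blank line, "a + \", then "b".
    source = "\na + \\\nb\n"
    for bad in find_line_mismatches(source):
        print(bad)
```

On a build that shows the behavior reported here, `find_line_mismatches` should return the `b` NAME token and the NEWLINE token flagged in the 3.12.0b1 output; on a build where `.line` holds only the physical line, it should return an empty list.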
Run with 3.12.0a7:
```
% /usr/local/pyenv/pyenv/versions/3.12.0a7/bin/python tokbug.py
3.12.0a7 (main, Apr 5 2023, 05:51:58) [Clang 14.0.3 (clang-1403.0.22.14.1)]
TokenInfo(type=62 (NL), string='\n', start=(1, 0), end=(1, 1), line='\n')
TokenInfo(type=1 (NAME), string='a', start=(2, 0), end=(2, 1), line='a + \\\n')
TokenInfo(type=54 (OP), string='+', start=(2, 2), end=(2, 3), line='a + \\\n')
TokenInfo(type=1 (NAME), string='b', start=(3, 0), end=(3, 1), line='b\n')
TokenInfo(type=4 (NEWLINE), string='\n', start=(3, 1), end=(3, 2), line='b\n')
TokenInfo(type=0 (ENDMARKER), string='', start=(4, 0), end=(4, 0), line='')
```
Run with 3.12.0b1:
```
% /usr/local/pyenv/pyenv/versions/3.12.0b1/bin/python tokbug.py
3.12.0b1 (main, May 23 2023, 16:19:59) [Clang 14.0.3 (clang-1403.0.22.14.1)]
TokenInfo(type=65 (NL), string='\n', start=(1, 0), end=(1, 1), line='\n')
TokenInfo(type=1 (NAME), string='a', start=(2, 0), end=(2, 1), line='a + \\\n')
TokenInfo(type=55 (OP), string='+', start=(2, 2), end=(2, 3), line='a + \\\n')
TokenInfo(type=1 (NAME), string='b', start=(3, 0), end=(3, 1), line='a + \\\nb\n') <*****!!!
TokenInfo(type=4 (NEWLINE), string='\n', start=(3, 1), end=(3, 2), line='a + \\\nb\n') <*****!!!
TokenInfo(type=0 (ENDMARKER), string='', start=(4, 0), end=(4, 0), line='')
```
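The `start` and `end` positions are identical in both runs; only `.line` changed. That suggests a possible workaround for code that needs the physical line on affected builds: ignore `.line` and slice the original source instead, using the 1-based row from `.start` to pick the line. This is a sketch that assumes the whole source is available as a string with `\n` line endings; `source_line_mismatches` is a hypothetical name.

```python
import io
import tokenize


def source_line_mismatches(source: str) -> list:
    """Check single-row tokens against the physical source line, not tok.line."""
    # Split the source the same way the readline given to generate_tokens does,
    # so lines[row - 1] is the physical line for a 1-based token row.
    lines = io.StringIO(source).readlines()
    mismatches = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        (start_row, start_col), (end_row, end_col) = tok.start, tok.end
        if start_row != end_row:
            continue  # multi-row tokens have no single-line slice to compare
        # ENDMARKER sits one row past the last line; treat that row as empty.
        line = lines[start_row - 1] if start_row <= len(lines) else ""
        if line[start_col:end_col] != tok.string:
            mismatches.append(tok)
    return mismatches


if __name__ == "__main__":
    # Same input as SOURCE in tokbug.py.
    print(source_line_mismatches("\na + \\\nb\n"))
```

For the tokbug.py input this should return an empty list on both 3.12.0a7 and 3.12.0b1, since every token's columns line up with its physical line in both runs.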
Related to #104825? cc @pablogsal