
Incorrect optimization in itertools.tee() #123884

Closed
@rhettinger

Bug description:

To save a memory allocation, the code path for a tee-in-a-tee (calling tee() on an iterator that is itself a tee object) incorrectly reuses the outer tee object as the first tee object in the result tuple. All tee objects in the result tuple should behave the same way; the docs promise "n independent iterators". However, the first one is not independent: it is the input object itself, so advancing either one advances the other. This is an unfortunate side effect of an early, incorrect optimization. I've now seen it affect real code. It is surprising, unhelpful, undocumented, and hard to debug.

Demonstration:

from itertools import tee

def demo(i):
    it = iter('abcdefghi')
    [outer_tee] = tee(it, 1)
    inner_tee = tee(outer_tee, 10)[i]
    return next(inner_tee), next(outer_tee)

print('These should all give the same result:')
for i in range(10):
    print(i, demo(i))

This outputs:

These should all give the same result:
0 ('a', 'b')
1 ('a', 'a')
2 ('a', 'a')
3 ('a', 'a')
4 ('a', 'a')
5 ('a', 'a')
6 ('a', 'a')
7 ('a', 'a')
8 ('a', 'a')
9 ('a', 'a')
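For comparison, here is a minimal pure-Python sketch of the behavior the docs describe, written in the spirit of the "roughly equivalent" code in the itertools documentation. The names tee_model and _TeeModel are hypothetical, and the sketch ignores thread safety and the C implementation's buffering details. The key difference from the current C code is the tee-in-a-tee branch: it copies the source's position into a new object instead of returning the source itself. Running the demo against this model gives ('a', 'a') for every i.

```python
# Hypothetical pure-Python model of tee(); NOT the CPython implementation.
# Every returned iterator is a fresh object sharing one linked list of
# [value, next_link] cells.

class _TeeModel:
    """One independent cursor into a shared singly linked list."""

    def __init__(self, iterable):
        it = iter(iterable)
        if isinstance(it, _TeeModel):
            # Tee-in-a-tee: copy the source's position, don't reuse the object.
            self.iterator = it.iterator
            self.link = it.link
        else:
            self.iterator = it
            self.link = [None, None]

    def __iter__(self):
        return self

    def __next__(self):
        link = self.link
        if link[1] is None:
            # First cursor to reach this cell fills it from the source.
            link[0] = next(self.iterator)
            link[1] = [None, None]
        value, self.link = link
        return value


def tee_model(iterable, n=2):
    """Return n independent iterators; none of them is the input object."""
    if n < 0:
        raise ValueError('n must be >= 0')
    if n == 0:
        return ()
    first = _TeeModel(iterable)
    return tuple([first] + [_TeeModel(first) for _ in range(n - 1)])


def demo(i):
    it = iter('abcdefghi')
    [outer_tee] = tee_model(it, 1)
    inner_tee = tee_model(outer_tee, 10)[i]
    return next(inner_tee), next(outer_tee)


for i in range(10):
    print(i, demo(i))  # prints e.g. 0 ('a', 'a'); same pair for every i
```

Because all cursors share the same cells, copying costs one small object rather than a second buffer, which is why the extra allocation the optimization avoided is cheap to pay.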

There is a test for the optimization, so it wasn't an accident. However, the optimization itself contradicts both the specification published in the docs and general expectations:

        a, b = tee('abc')
        c, d = tee(a)
        self.assertTrue(a is c)
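The test above pins down object identity; the practical consequence is that the first result shares its position with the input. A small probe like the following (function names are hypothetical) reports both facts for whichever interpreter runs it. The answers depend on the Python version, so no expected output is given, but the two results should always agree.

```python
from itertools import tee


def tee_reuses_input():
    """True if tee() returns its tee-object argument as the first result."""
    a, _ = tee('abc')
    c, _ = tee(a)
    return c is a


def first_result_moves_input():
    """True if advancing the first result also advances the input tee."""
    a, _ = tee('abc')
    c, _ = tee(a)
    next(c)                  # consume 'a' through the first result
    return next(a) == 'b'    # independent iterators would still yield 'a'


print('reuses input object:', tee_reuses_input())
print('first result shares position:', first_result_moves_input())
```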

Labels: 3.12 (only security fixes), 3.13 (bugs and security fixes), 3.14 (bugs and security fixes), type-bug (An unexpected behavior, bug, or error)
