Generator expressions behavior with non-iterable objects

I’ve started some discussion related to this topic on Oct 2024 here:

There was a crash with these three lines of code:

g = (x for x in range(10))
g.gi_frame.f_locals['.0'] = range(20)
list(g)

There were various suggestions for fixing this crash.
Mark Shannon (@markshannon) comes with this solution (gh-125038: Crash after genexpr.gi_frame.f_locals manipulations is fixed by efimov-mikhail · Pull Request #125178 · python/cpython · GitHub),
(gh-125038: Crash after genexpr.gi_frame.f_locals manipulations is fixed by efimov-mikhail · Pull Request #125178 · python/cpython · GitHub):

Currently, the bytecode compiler assumes that local variables in generator expressions cannot be modified in.
This is no longer true, but it remains true that the evaluation stack cannot be modified.

The bytecode for (x for x in ...) starts with:

      0 RETURN_GENERATOR
      2 POP_TOP
      4 RESUME                   0
      6 LOAD_FAST                0 (.0)
>>    8 FOR_ITER                 6 (to 24)

Compare that with the code for for x in ....:

      0 RESUME                   0
      2 LOAD_FAST                0 (...)
      4 GET_ITER
>>    6 FOR_ITER                 4 (to 18)
      10 STORE_FAST               1 (x)

Note the additional GET_ITER as the bytecode compiler doesn’t know what ... is.

I think adding the GET_ITER before the FOR_ITER would gain the robustness required in this case, without complicating the iteration code.

We also want to avoid emitting GET_ITER twice. It is currently emitted in the scope surrounding the generator expression.
We should move it into the generator expression.

In other words, this function:

def f(s): return (x for x in s)

currently generates the following bytecode:

  1           RESUME                   0

  2           LOAD_CONST               1 (<code object <genexpr> at 0x7fc1c815e6a0, file "<python-input-0>", line 2>)
              MAKE_FUNCTION
              LOAD_FAST_TEMP           0 (s)
              GET_ITER
              CALL                     0
              RETURN_VALUE

Disassembly of <code object <genexpr> at 0x7fc1c815e6a0, file "<python-input-0>", line 2>:
   2           RETURN_GENERATOR
               POP_TOP
       L1:     RESUME                   0
               LOAD_FAST                0 (.0)
       L2:     FOR_ITER                 6 (to L3)
               ...

I think it should generate:

  1           RESUME                   0

  2           LOAD_CONST               1 (<code object <genexpr> at 0x7fc1c815e6a0, file "<python-input-0>", line 2>)
              MAKE_FUNCTION
              LOAD_FAST_TEMP           0 (s)
              CALL                     0
              RETURN_VALUE

Disassembly of <code object <genexpr> at 0x7fc1c815e6a0, file "<python-input-0>", line 2>:
   2           RETURN_GENERATOR
               POP_TOP
       L1:     RESUME                   0
               LOAD_FAST                0 (.0)
               GET_ITER
       L2:     FOR_ITER                 6 (to L3)
               ...

This is a behavior change, but I think a correct one. If f(2) is called, we get a TypeError, but this seems wrong. The error should occur once the generator expression is iterated over.

One key thing that was used in solution for 3.13 is fact that GET_ITER is idempotent (gh-125038: Crash after genexpr.gi_frame.f_locals manipulations is fixed by efimov-mikhail · Pull Request #125178 · python/cpython · GitHub):

As @Jelle points out, GET_ITER is idempotent, so we can drop CHECK_ITER and use GET_ITER instead.

For 3.14, we can then drop the GET_ITER in the caller.

As we can see from this issue, someone can write a code which reacts on GET_ITER as it is NOT idempotent:

There were comment by Mark Shannon (see Regression of 3.13.1 with iterator creation being duplicated · Issue #127682 · python/cpython · GitHub):

The problem is specific to generator expressions which have acted in a subtly different way from generators that (almost) no one noticed.

For a generator, any iteration occurs once the generator is executed. For generator expressions creating an iterator from the iterable happened when the generator expression is created. However there was no check that the iteration variable held an iterator when the generator expression executed, which could lead to a crash in exceptional circumstances.

This can be seen by disassembling this function:

def f(seq): return (x for x in seq)

Up to 3.13

  2           LOAD_CONST               0 (<code object <genexpr> at 0x7f1ad8c25be0, file "<python-input-9>", line 2>)
              MAKE_FUNCTION
              LOAD_FAST                0 (seq)
              GET_ITER
              CALL                     0
              RETURN_VALUE

Disassembly of <code object <genexpr> at 0x7f1ad8c25be0, file "<python-input-9>", line 2>:
   2           RETURN_GENERATOR
               POP_TOP
       L1:     RESUME                   0
               LOAD_FAST                0 (.0)
       L2:     FOR_ITER                 6 (to L3)       # This is unsafe if .0 contains a non-iterator
               ...

Now:

  2           LOAD_CONST               0 (<code object <genexpr> at 0x7f1ad8c25be0, file "<python-input-9>", line 2>)
              MAKE_FUNCTION
              LOAD_FAST                0 (seq)
              GET_ITER
              CALL                     0
              RETURN_VALUE

Disassembly of <code object <genexpr> at 0x7f1ad8c25be0, file "<python-input-9>", line 2>:
   2           RETURN_GENERATOR
               POP_TOP
       L1:     RESUME                   0
               LOAD_FAST                0 (.0)
               GET_ITER
       L2:     FOR_ITER                 6 (to L3)
               ...

What I would like:

  2           LOAD_CONST               0 (<code object <genexpr> at 0x7f1ad8c25be0, file "<python-input-9>", line 2>)
              MAKE_FUNCTION
              LOAD_FAST                0 (seq)
              CALL                     0
              RETURN_VALUE

Disassembly of <code object <genexpr> at 0x7f1ad8c25be0, file "<python-input-9>", line 2>:
   2           RETURN_GENERATOR
               POP_TOP
       L1:     RESUME                   0
               LOAD_FAST                0 (.0)
               GET_ITER
       L2:     FOR_ITER                 6 (to L3)
               ...

I think we should make generator expressions behave exactly like generators. It seems surprising that they would not.

And solution that he was provided:

Another related issues and PRs:

In fact, today we have slightly different behavior for 3.13, 3.14 and main/3.15 branches.
One can find an arguments for some ways from this comment and beyond:

We have to decide, which behavior is best for 3.15+.
Should we raise TypeError instantly for (x for x in None) or it’s better to raise TypeError only when this generator expression will be iterated over?

There was my another atempt to provide better solution here:

But there was not enough contention and that PR was closed by myself.

P.S. I’m not sure about topic organization and hope it’s not very hard for reading.
I’ll be happy to improve it and provide any needed information.

A little joke.

3 Likes