Skip to content

Add unfoldr to Iterators #43203

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 4 commits into
base: master
Choose a base branch
from
Open

Add unfoldr to Iterators #43203

wants to merge 4 commits into from

Conversation

cmcaine
Copy link
Contributor

@cmcaine cmcaine commented Nov 23, 2021

I didn't stop to ask if I should.

Inspired by Tamas_Papp's work here: https://discourse.julialang.org/t/julia-equivalence-of-haskells-unfoldr/17834 and a student on exercism.

Searching github, the string unfoldr is not used in any Julia code on github before this, so the name doesn't clash. We could also use unfold, but that's more likely to clash with something.

In other languages, Elm and Haskell use unfoldr. Elixir, Ocaml and Rust (and this) use unfold. Clojure doesn't have it, but a third-party library uses unfold.

@cmcaine cmcaine force-pushed the unfoldr branch 2 times, most recently from a20a8b8 to edf3baa Compare November 24, 2021 10:33
@cmcaine
Copy link
Contributor Author

cmcaine commented Nov 24, 2021

Forgot to export the function and forgot to import it in the doctest. Force pushed to fix those. Deliberately used using Base.Iterators: unfoldr in the doctest to show a different way of using names from Iterators (other docstrings use a qualified name: e.g. Iterators.accumulate)

@stevengj
Copy link
Member

stevengj commented Nov 24, 2021

What is a good example where this is much cleaner/simpler than a generator expression?

Hmm, I guess this is analogous to a while loop, vs. the for loop of a generator. For example, this is analogous to bitstring and would have to use a completely different algorithm for a for loop:

julia> collect(unfoldr(x -> x==0 ? nothing : (x % 2, x ÷ 2), 23))
5-element Vector{Int64}:
 1
 1
 1
 0
 1

cc @tpapp

@cmcaine
Copy link
Contributor Author

cmcaine commented Nov 24, 2021

Yes, the point of this is to express while loops.

The example in the doctest that replicates digits is kinda motivating. If you want to do that with a while loop then you'd create a vector and then push!, but working out the type of that vector is a bit tricky (convince yourself that typeof(rem(n, d)) == typeof(n) or unroll the first iteration and make a vector of the type of the result of that iteration) whereas collect(unfoldr(... will automatically sort it out for you.

@cmcaine
Copy link
Contributor Author

cmcaine commented Nov 24, 2021

I think it's fun that you picked basically the same problem as I showed in the docstring :)

julia> # Same output as digits(6, base=2):
julia> unfoldr(6) do n
           n  0 ? reverse(divrem(n, 2)) : nothing
       end |> collect
3-element Vector{Int64}:
 0
 1
 1

@MatthijsBlom
Copy link

MatthijsBlom commented Nov 24, 2021

We could also use unfold,

Ceteris paribus, I think unfold is a better fit, as there is no inherent direction to this implementation.

All things aren't quite equal, though. unfold has 373 hits. Most aren't relevant, so it looks like it shouldn't matter much.

Elm and Haskell use unfoldr.

I cannot find the Elm one. In the case of Haskell, the same argument for unfold sort-of applies.

@cmcaine
Copy link
Contributor Author

cmcaine commented Nov 24, 2021

Elm's popular List-Extra module defines unfoldr: https://package.elm-lang.org/packages/elm-community/list-extra/latest/List-Extra#unfoldr

I also think that unfold is a better name. Guess we could try PkgEval to see if anything breaks?

@tpapp
Copy link
Contributor

tpapp commented Nov 24, 2021

I think that since Julia is not a purely functional language like Haskell, functionality like this has a better home in a package.

@cmcaine
Copy link
Contributor Author

cmcaine commented Nov 24, 2021

I wasn't sure to start with, but I think this is a good addition to Iterators. Here are some reasons:

  1. It's a quick and simple way to write one-off or prototype iterators (unfoldr is very close to the iterator protocol, but handles mysterious stuff like IteratorSize for you).
  2. It makes it easier to write e.g. a lazy version of things like digits or split: without unfoldr you have to go through the ceremony of writing a new iterator. Writing the eager version with push! is much more straightforward, so most people will do that, but maybe that's not actually very good for us?
  3. You cannot rewrite an unfoldr expression using a while loop because we don't have generator functions like python.
  4. It's an answer to people asking about an equivalent to python generator functions (something that comes up on slack/discourse etc)

Here's some code comparing while and unfoldr:

# This
collect(unfoldr(f, initialstate))

# Is roughly equivalent to this:
acc = []
state = initialstate
while true
    x = f(state)
    x  nothing && break
    element, state = x
    push!(acc, element)
end

# But the unfoldr version: defines fewer variable bindings; produces a more
# strictly typed vector; and can easily be made lazy by removing the collect()
# You can rewrite eager while loops like this:

acc = []
state = initialstate
while predicate(state)
    # Create an element from state
    element = # ...
    # Rebind or mutate state
    state = # ...
    push!(acc, element)
end

# to a lazy iterator that looks like this:

itr = unfoldr(initialstate) do state
    predicate(state) || return nothing
    # Create an element from state
    element = # ...
    # Rebind or mutate state
    state = # ...
    return element, state
end

I think there is little cost to adding this because the definition is very simple so there's little maintainer effort required. I don't see any other notable risks.

@cmcaine
Copy link
Contributor Author

cmcaine commented Nov 25, 2021

I've renamed this to unfold, added an entry to NEWS.md and expanded the docstring.

This is ready for a PkgEval run, I think.

If that passes, this is good for triage / merge.

If that fails, perhaps we should rename, then triage.


# Extended help

The interface for `f` is very similar to the interface required by `iterate`, but `unfold` is simpler to use because it does not require you to define a type.
Copy link

@MatthijsBlom MatthijsBlom Nov 25, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't the interface exactly the same?

By the implementation, the requirements of unfold are at least as strict as those of iterate, so the question is really whether they are more strict. They aren't, are they?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

iterate receives a struct as the first argument, whereas f just receives the state, so it's a bit different. If you define iterate you have to provide two methods instead of one as well:

iterate(itr::YourType) = # ...
iterate(itr::YourType, state) = # ...

The return type is the same, though.

@Seelengrab
Copy link
Contributor

Seelengrab commented Nov 29, 2021

Isn't this kind of the same pattern as

julia> c = Channel(0) do chan
           x = 23
           while !iszero(x)
               push!(chan, x % 2) # yield point equivalent
               x ÷= 2
           end
       end
Channel{Any}(0) (1 item available)

julia> collect(c)
5-element Vector{Any}:
 1
 1
 1
 0
 1

? I guess the input & loop is hardcoded here, but the Channel approach automatically handles multiple consumers due to its locking & task creation behavior. So if this is merged, I'd be in favor of a very small definition like unfold(f, x) = Channel(c -> f(c, x), 0) or similar instead of a custom iterator.

@cmcaine
Copy link
Contributor Author

cmcaine commented Nov 29, 2021

I had forgotten about Channel. I'm not really in favour of changing the interface of f because right now it matches implementations in other languages quite well and it also matches our iterator spec well.

I could maybe be persuaded, but I'm also doubtful about the benefit of making unfold behave like a channel when there are multiple consumers. I would expect an iterator from Iterators not to be stateful unless documented so (and unfold will not be stateful unless the user chooses to mutate the state), so I'd expect each consumer to get the same sequence of values.

Does that make sense? Do you agree?

@Seelengrab
Copy link
Contributor

Seelengrab commented Nov 29, 2021

The Channel approach seems more flexible to me, but I can also see why an iterator may be helpful/useful 🤷 Parallelization with the Channel is trivial, while the unfold approach here makes that kind of hard, having to deal with locking etc. all over again. In general, I personally prefer things that are easily parallelizable without spending mental effort on how to best do that. If you're certain what the function passed to Channel will return, you can also make the eltype known via Channel{T}, which I don't think you can explicitly do with Unfold as it is now, correct?

I mainly wanted to mention that this pattern seemed familiar and wanted to drill down what the difference/tradeoff is, so I'm not opposed to this in principle.

@cmcaine
Copy link
Contributor Author

cmcaine commented Nov 29, 2021

Ye, unfold doesn't currently have any way of specifying an eltype for the iterator. I thought about adding one, but I wanted to get approval or rejection on the basic version and idea before complicating the API with unfold(f, init; eltype=T) or unfold(f, T, init) or whatever.

You can always wrap an iterator in a Channel straightforwardly with this, tho obviously the ergonomics aren't so great with the nested do block.

Channel(c -> foreach(i -> push!(c, i), itr), 0)

@cmcaine
Copy link
Contributor Author

cmcaine commented Nov 29, 2021

My other concern is that wrapping it in a channel might make it slow. But I don't have any benchmarks, so I don't even know if the present implementation is slow (tho I don't see why it should be).

@JeffBezanson JeffBezanson added the triage This should be discussed on a triage call label Nov 29, 2021
@JeffBezanson
Copy link
Member

I don't see any particular reason to consider Channels here; after all any iterator can be turned into a channel in the same way.

Comment on lines +1498 to +1503
struct Unfold{F, T}
f::F
init::T
end

iterate(uf::Unfold, state = uf.init) = uf.f(state)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The iteration protocol is not normally supposed to have an init state. Perhaps this would work to easily provide both versions (with and without an init argument):

Suggested change
struct Unfold{F, T}
f::F
init::T
end
iterate(uf::Unfold, state = uf.init) = uf.f(state)
struct Unfold{F, T}
f::F
init::T
Unfold{F, T}(f::F, init::T) where {F, T} = new{F, T}(f, init)
Unfold(f::F, init::T) where {F, T} = new{F, T}(f, init)
Unfold{F}(f::F) where {F} = new{F, Union{}}(f)
Unfold(f::F) where {F} = new{F, Union{}}(f)
end
iterate(uf::Unfold) = isdefined(uf, :init) ? uf.f(uf.init) : uf.f()
iterate(uf::Unfold, state) = uf.f(state)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can also be done by removing the init field, and using rest(unfold(f), init) if you want an initial state.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm happy to do this if it's desired, but I'd like to point out that all of the other implementations of unfold that I linked above require a "seed" or "initial" argument. I think it will be quite rare that unfold(f) is wanted rather than unfold(f, init) and I would personally prefer that we only define unfold(f, init).

I think Jeff's suggestion of using rest is good, so this is how I would implement it if you are sure you want it:

struct Unfold{F}
    f::F
end

unfold(f) = Unfold(f)
unfold(f, init) = rest(unfold(f), init)

iterate(uf::Unfold) = uf.f()
iterate(uf::Unfold, state) = uf.f(state)

And with this text added to the docstring:

If init is not provided, f will be called with no arguments on the first iteration, then as normal.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While iterate does expect two methods to be defined per-type, I think unfold is still a useful prototyping tool even if its f always receives an initial value.

I think that because you can move the arguments around to always have a state:

iterate(it::X) => f((it::X,),)
iterate(it::X, state) => f((it::X, state),)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@JeffBezanson Could you decide here if you'd like what you described or if you were swayed by my argument that other implementations of unfold require an init value? Thanks!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think init is nice in that f only needs one method accepting one argument. So I'm ok with that version.

@oscardssmith
Copy link
Member

Triage says this is pretty cool, but probably needs a few minor changes before merge. Also, the name should probably be unfold not unfoldr. We are very unlikely to want the unfoldl version, so the simpler name is better.

@vtjnash vtjnash removed the triage This should be discussed on a triage call label Dec 2, 2021
nlw0 added a commit to nlw0/julia that referenced this pull request Jul 24, 2022
@MatthijsBlom
Copy link

@cmcaine In case you missed it: this PR might be subsumed by #44873.

@nsajko nsajko added the iteration Involves iteration or the iteration protocol label Jan 1, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
iteration Involves iteration or the iteration protocol
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants