-
-
Notifications
You must be signed in to change notification settings - Fork 5.6k
Add unfoldr
to Iterators
#43203
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Add unfoldr
to Iterators
#43203
Conversation
a20a8b8
to
edf3baa
Compare
Forgot to export the function and forgot to import it in the doctest. Force pushed to fix those. Deliberately used |
What is a good example where this is much cleaner/simpler than a generator expression? Hmm, I guess this is analogous to a julia> collect(unfoldr(x -> x==0 ? nothing : (x % 2, x ÷ 2), 23))
5-element Vector{Int64}:
1
1
1
0
1 cc @tpapp |
Yes, the point of this is to express while loops. The example in the doctest that replicates |
I think it's fun that you picked basically the same problem as I showed in the docstring :) julia> # Same output as digits(6, base=2):
julia> unfoldr(6) do n
n ≠ 0 ? reverse(divrem(n, 2)) : nothing
end |> collect
3-element Vector{Int64}:
0
1
1 |
Ceteris paribus, I think All things aren't quite equal, though.
I cannot find the Elm one. In the case of Haskell, the same argument for |
Elm's popular List-Extra module defines I also think that |
I think that since Julia is not a purely functional language like Haskell, functionality like this has a better home in a package. |
I wasn't sure to start with, but I think this is a good addition to Iterators. Here are some reasons:
Here's some code comparing # This
collect(unfoldr(f, initialstate))
# Is roughly equivalent to this:
acc = []
state = initialstate
while true
x = f(state)
x ≡ nothing && break
element, state = x
push!(acc, element)
end
# But the unfoldr version: defines fewer variable bindings; produces a more
# strictly typed vector; and can easily be made lazy by removing the collect() # You can rewrite eager while loops like this:
acc = []
state = initialstate
while predicate(state)
# Create an element from state
element = # ...
# Rebind or mutate state
state = # ...
push!(acc, element)
end
# to a lazy iterator that looks like this:
itr = unfoldr(initialstate) do state
predicate(state) || return nothing
# Create an element from state
element = # ...
# Rebind or mutate state
state = # ...
return element, state
end I think there is little cost to adding this because the definition is very simple so there's little maintainer effort required. I don't see any other notable risks. |
I've renamed this to This is ready for a PkgEval run, I think. If that passes, this is good for triage / merge. If that fails, perhaps we should rename, then triage. |
|
||
# Extended help | ||
|
||
The interface for `f` is very similar to the interface required by `iterate`, but `unfold` is simpler to use because it does not require you to define a type. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't the interface exactly the same?
By the implementation, the requirements of unfold
are at least as strict as those of iterate
, so the question is really whether they are more strict. They aren't, are they?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
iterate
receives a struct as the first argument, whereas f
just receives the state, so it's a bit different. If you define iterate
you have to provide two methods instead of one as well:
iterate(itr::YourType) = # ...
iterate(itr::YourType, state) = # ...
The return type is the same, though.
Isn't this kind of the same pattern as julia> c = Channel(0) do chan
x = 23
while !iszero(x)
push!(chan, x % 2) # yield point equivalent
x ÷= 2
end
end
Channel{Any}(0) (1 item available)
julia> collect(c)
5-element Vector{Any}:
1
1
1
0
1 ? I guess the input & loop is hardcoded here, but the |
I had forgotten about I could maybe be persuaded, but I'm also doubtful about the benefit of making unfold behave like a channel when there are multiple consumers. I would expect an iterator from Iterators not to be stateful unless documented so (and unfold will not be stateful unless the user chooses to mutate the state), so I'd expect each consumer to get the same sequence of values. Does that make sense? Do you agree? |
The I mainly wanted to mention that this pattern seemed familiar and wanted to drill down what the difference/tradeoff is, so I'm not opposed to this in principle. |
Ye, unfold doesn't currently have any way of specifying an eltype for the iterator. I thought about adding one, but I wanted to get approval or rejection on the basic version and idea before complicating the API with You can always wrap an iterator in a Channel straightforwardly with this, tho obviously the ergonomics aren't so great with the nested Channel(c -> foreach(i -> push!(c, i), itr), 0) |
My other concern is that wrapping it in a channel might make it slow. But I don't have any benchmarks, so I don't even know if the present implementation is slow (tho I don't see why it should be). |
I don't see any particular reason to consider Channels here; after all any iterator can be turned into a channel in the same way. |
struct Unfold{F, T} | ||
f::F | ||
init::T | ||
end | ||
|
||
iterate(uf::Unfold, state = uf.init) = uf.f(state) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The iteration protocol is not normally supposed to have an init
state. Perhaps this would work to easily provide both versions (with and without an init
argument):
struct Unfold{F, T} | |
f::F | |
init::T | |
end | |
iterate(uf::Unfold, state = uf.init) = uf.f(state) | |
struct Unfold{F, T} | |
f::F | |
init::T | |
Unfold{F, T}(f::F, init::T) where {F, T} = new{F, T}(f, init) | |
Unfold(f::F, init::T) where {F, T} = new{F, T}(f, init) | |
Unfold{F}(f::F) where {F} = new{F, Union{}}(f) | |
Unfold(f::F) where {F} = new{F, Union{}}(f) | |
end | |
iterate(uf::Unfold) = isdefined(uf, :init) ? uf.f(uf.init) : uf.f() | |
iterate(uf::Unfold, state) = uf.f(state) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can also be done by removing the init
field, and using rest(unfold(f), init)
if you want an initial state.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm happy to do this if it's desired, but I'd like to point out that all of the other implementations of unfold
that I linked above require a "seed" or "initial" argument. I think it will be quite rare that unfold(f)
is wanted rather than unfold(f, init)
and I would personally prefer that we only define unfold(f, init)
.
I think Jeff's suggestion of using rest
is good, so this is how I would implement it if you are sure you want it:
struct Unfold{F}
f::F
end
unfold(f) = Unfold(f)
unfold(f, init) = rest(unfold(f), init)
iterate(uf::Unfold) = uf.f()
iterate(uf::Unfold, state) = uf.f(state)
And with this text added to the docstring:
If
init
is not provided,f
will be called with no arguments on the first iteration, then as normal.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While iterate
does expect two methods to be defined per-type, I think unfold
is still a useful prototyping tool even if its f
always receives an initial value.
I think that because you can move the arguments around to always have a state:
iterate(it::X) => f((it::X,),)
iterate(it::X, state) => f((it::X, state),)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@JeffBezanson Could you decide here if you'd like what you described or if you were swayed by my argument that other implementations of unfold
require an init
value? Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think init
is nice in that f
only needs one method accepting one argument. So I'm ok with that version.
Triage says this is pretty cool, but probably needs a few minor changes before merge. Also, the name should probably be |
I didn't stop to ask if I should.
Inspired by Tamas_Papp's work here: https://discourse.julialang.org/t/julia-equivalence-of-haskells-unfoldr/17834 and a student on exercism.
Searching github, the string
unfoldr
is not used in any Julia code on github before this, so the name doesn't clash. We could also useunfold
, but that's more likely to clash with something.In other languages, Elm and Haskell use
unfoldr
. Elixir, Ocaml and Rust (and this) useunfold
. Clojure doesn't have it, but a third-party library usesunfold
.