Observable is not appropriate for directory listing; async iterable would be better #4
Comments
Correct me if I'm totally wrong on this, but it honestly does seem like both can be used for the primary use cases:
"Observable" lent itself to filtering and mapping. I think the pace at which work is performed is dependent on a few things. Then there's backpressure too, which I'm not 100% clear on :-( But async iterables look totally usable. |
Both can be used, for sure. However, async iterables map more directly (in the sense of giving more control) than observables do. The async iterable to observable transformation is lossy, in the sense that to create an observable from an async iterable you have to pump the async iterable in a loop in order to "emit events" (i.e., call
As proposed to TC39, neither async iterable nor observable has filtering or mapping. But it is equally trivial to add filtering and mapping to either. (I kind of doubt either will happen before we add filtering and mapping to sync iterables though, like maps and sets.)
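To illustrate how little is needed to layer mapping and filtering on top of an async iterable, here is a minimal sketch. The helpers `map`, `filter`, `collect`, and the stand-in source `fakeEntries` are all hypothetical names invented for this example; none of them is part of the proposal.

```javascript
// Hypothetical helpers: neither is part of the async iteration proposal.
async function* map(source, fn) {
  for await (const item of source) {
    yield fn(item);
  }
}

async function* filter(source, predicate) {
  for await (const item of source) {
    if (predicate(item)) yield item;
  }
}

// Stand-in for a directory listing; the file names are made up.
async function* fakeEntries() {
  yield "a.txt";
  yield "b.png";
  yield "c.txt";
}

// Drains an async iterable into an array.
async function collect(source) {
  const out = [];
  for await (const item of source) out.push(item);
  return out;
}
```

Because each stage only pulls from the stage before it when asked, the consumer at the end of the chain still sets the pace for the whole pipeline.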
Unfortunately no. An observable is just a wrapper for a function
An async iterable, on the other hand, consists of an object with a promise-returning My main concern in opening this is that observable is not misused in places where it is a bad fit. It makes sense for events like "click" where the producer drives the production of events ("push"), but it doesn't make sense for things like directory listings where the consumer drives things ("pull"). I know it is the most-evangelized asynchronous-plural type among the trio of observable, async iterable, and behavior, but I wanted to make sure the right tool is used for the job here instead of the most-evangelized one :) |
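The "pump the async iterable in a loop" conversion mentioned earlier in this comment can be sketched roughly as follows. The function `pump` and the observer shape (`next`, `error`, `complete`) are assumptions made for illustration, not an API from either proposal.

```javascript
// Hypothetical adapter: pumps an async iterable into observer callbacks.
// Once started, the loop requests the next item as soon as the previous
// one arrives, so the observer has no way to pace the producer.
function pump(source, observer) {
  let cancelled = false;
  (async () => {
    try {
      for await (const item of source) {
        if (cancelled) return; // leaving the loop also closes the iterator
        observer.next(item);
      }
      if (!cancelled) observer.complete();
    } catch (err) {
      observer.error(err);
    }
  })();
  // The only control left to the consumer is to stop listening entirely.
  return {
    unsubscribe() {
      cancelled = true;
    },
  };
}
```

The information lost here is the "when": the consumer can cancel, but it cannot say "not yet".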
Evangelizer-in-chief here. ;) It's true that an asynchronous iterator gives the consumer control (back pressure). However, it's not clear to me what the motivating use cases are for that control. It is important to spell that out because back pressure support will come at a hefty performance cost given the fine granularity of the proposed API.
Let's say you did a deep enumeration and saw 1000 files (not unrealistic). You would get a minimum of 1000 promise allocations. A single map operation would double the number of promise allocations to 2000. If you were calling next() again before the previous Promise resolved, you would also be growing the job queue as notifications were queued up.
By comparison, an Observable would involve only three allocations at minimum: the observer, the subscription, and the Observable. A map operation would introduce another three allocations, plus one extra allocation for the map closure argument. In other words, the allocations would not grow with the number of files being processed.
Under the circumstances it seems worth enumerating the motivating cases for back pressure here. If there are compelling use cases, an asynchronous iterator might be made more efficient if we made the granularity of the API more coarse. (AsyncIterator of Promise of Array of File) |
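As a rough sketch of the coarser granularity suggested above (an async iterator that resolves to arrays of files), the following counts how many times the consumer calls next() for a given chunk size. `listInChunks` and `countPulls` are hypothetical names, the `names` array stands in for real OS results, and the count is of next() calls only; it says nothing about engine-level allocations or actual cost.

```javascript
// Stand-in producer: yields entries in arrays of up to `chunkSize`.
// `names` substitutes for results a real implementation would read from the OS.
async function* listInChunks(names, chunkSize) {
  for (let i = 0; i < names.length; i += chunkSize) {
    yield names.slice(i, i + chunkSize);
  }
}

// Counts next() calls (including the final one that reports `done`)
// and the total number of entries received.
async function countPulls(iterable) {
  const iterator = iterable[Symbol.asyncIterator]();
  let pulls = 0;
  let files = 0;
  while (true) {
    const { value, done } = await iterator.next();
    pulls++;
    if (done) break;
    files += value.length;
  }
  return { pulls, files };
}
```

With 1000 names, a chunk size of 100 takes 11 calls to next(), while a chunk size of 1 takes 1001. Whether that difference matters in practice is exactly the kind of question that needs measurement.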
We did a pretty hefty real-world performance comparison for this sort of stuff in streams and found that the cost of promise allocations was absolutely trivial (nanoseconds, in aggregate). The actual I/O is where all the work happens. The motivation for backpressure here is quite simply to avoid the extra I/O---which is where the real cost lies. Your program will be much slower if it is constantly being forced to perform A simple example would be that with an async iterator you could notice that you've used up 4 ms of your 16 ms frame budget on readdirs, and decide to stop asking for results until next frame. (Well, the rendering thread actually gets hit by the IPC cost of getting it from the I/O process into the render process, but you know what I mean.) Whereas with an observable, this isn't under your control. |
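The frame-budget idea can be sketched like this. `pullWithinBudget` is a hypothetical helper, and the `now` parameter is an injectable clock added only so the behavior can be demonstrated deterministically. The budget is checked between pulls, so a call to next() that is already in flight is not interrupted.

```javascript
// Hypothetical helper: pulls from an async iterator until `budgetMs` has
// elapsed, then stops asking. The iterator is left open so that a later
// call can resume from where this one stopped.
async function pullWithinBudget(iterator, budgetMs, onItem, now = () => Date.now()) {
  const start = now();
  while (now() - start < budgetMs) {
    const { value, done } = await iterator.next();
    if (done) return true; // listing finished
    onItem(value);
  }
  return false; // budget spent; more entries may remain
}
```

A caller could invoke this once per frame with a budget of a few milliseconds and pick up the remaining entries on the next frame. With an observable there is no equivalent point at which the consumer gets to decline the next result.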
In general, making performance arguments without data has led down very dark paths. Talking in the abstract about allocations is quite disingenuous given generational GCs and the like. Calling a function versus allocating a short-lived object and calling its method are in the abstract very different amounts of work, but in the real world very similar ones. So in particular claims like
are much better phrased as "I would be interested in benchmarking file-wise async iterator versus array-of-files async iterator." (Although that particular idea doesn't really make sense in terms of how it fails to map to readdir etc.; the added indirection loses the very control and reactivity we seek to gain.) |
Inclined to agree I/O will be the likely bottleneck here. Also agree that we should rely on performance data. Probably worth measuring perf in our use case rather than extrapolating from data collected in different use cases. Should be easy to confirm that all those Promise allocations are indeed chewed up efficiently by V8 during a directory walk. Otherwise GCs could detract from framerate as well. Do you think a directory walk in Node with AsyncIterator vs. Observable would be a reasonable test? |
Unfortunately Node doesn't expose the necessary primitives here---it only does array-returning directory traversal, and doesn't give you access to readdir and friends. So testing this would require a native addon. I will open a libuv issue to see if we can get those in libuv at least. |
D: if these issue(s) are set up, I'd love a pointer to them. I've trawled libuv/libuv and come up empty. We're going to have to have an interim API in the places where Observable is used, till platform considerations stabilize. I think I'm coming around to the idea that AsyncIterator might be more useful here. It might be confusing if the platform had both, since the specific messaging would be drowned out by "this makes asynchronous programming better and easier." In particular, here's the MSFT proposal for promises returning sequence parametrized types for Directory iteration: I'm going to rewrite the API a bit in Directory (and I'll |
THANK you. |
Pinging @wycats |
I just want to make sure I understand the constraints here:
Did I miss anything? |
@wycats: in comparing my wishlist with your list above, I'd say 1, 2, and 4 are the main ones. I asked for 3 as a bonus, and perhaps made the mistake of assuming that composition came out of the box (Domenic corrects me above). It's a nice-to-have. Until we get something better, we're using a promise that returns a one-and-done sequence parametrized type for file/directory results. |
@wycats I don't really find reasoning from constraints that compelling in this case. Rather, I'd prefer a primitive that maps directly to the underlying OS operations, thus allowing maximum flexibility. You could even imagine an awkward API that does a direct translation (using some kind of "directory handle" JS object). That IMO would be better than any proposal that tries to fire events at you continuously. It's a nice fact that async iterator maps to the OS operations directly. |
https://github.com/zenparsing/async-iteration/ is where async iterable is proposed.
The difference is that with an observable the creator of the observable defines the pace at which "events" are produced (push), whereas with an async iterable, the consumer defines the pace at which work is performed (pull). The latter is a much better match for filesystem APIs, e.g. Windows's FindFirstFileEx/FindNextFile/FindClose, or POSIX opendir/readdir. In these cases the async iterable's .next() would map pretty directly to FindFirstFileEx/FindNextFile and readdir.
/cc @zenparsing
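A minimal sketch of that mapping, under stated assumptions: `FakeDirHandle` is a made-up stand-in for an OS directory handle (its `readNext` plays the role of readdir or FindNextFile), and `listDirectory` is a hypothetical wrapper, not a proposed API.

```javascript
// Hypothetical stand-in for an OS directory handle. `readNext` returns the
// next entry name, or null when the listing is exhausted.
class FakeDirHandle {
  constructor(names) {
    this.names = names;
    this.position = 0;
    this.reads = 0; // how many reads the consumer has caused
    this.closed = false;
  }
  async readNext() {
    this.reads++;
    return this.position < this.names.length ? this.names[this.position++] : null;
  }
  async close() {
    this.closed = true;
  }
}

// Each call to next() performs exactly one read on the handle, so no I/O
// happens unless the consumer asks for it.
function listDirectory(handle) {
  return {
    [Symbol.asyncIterator]() {
      return {
        async next() {
          const name = await handle.readNext();
          if (name === null) {
            await handle.close();
            return { value: undefined, done: true };
          }
          return { value: name, done: false };
        },
        // Called when the consumer stops early (e.g. `break` in for-await).
        async return() {
          await handle.close();
          return { value: undefined, done: true };
        },
      };
    },
  };
}
```

If the consumer breaks out of a for-await loop after the first entry, the handle has been read once and is then closed; the remaining entries are never requested.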