Commit f60c772

Steve Krouse

committed

## Yesterday’s Slice-and-Dice Data Ninja Playground

* TOC
{:toc}

While watching fireworks the night before last, I couldn't stop thinking about [Flowsheets](https://tinyletter.com/Flowsheets/archive), my [fuzzyset refactor](http://futureofcoding.org/log#email-to-glen-fuzzyset--fp), my [ideas to improve ObservableHQ](http://futureofcoding.org/log#notes-on-observablehq), and my own [reactive playground](http://futureofcoding.org/log#project-idea-functional-immutable-strongly-typed-notebook).

Here's how I put it to my girlfriend: "Glen and I are working on the same problem - code comprehensibility - and with similar approaches - visualizing the live data - but for different types of code. He's working with normal batch code (mostly static input, mostly static output), like data processing, while I'm working with reactive code (which responds to inputs over time), like user interfaces. So my problem is strictly more difficult - possibly even a superset - and is *even harder*, because I want to allow higher order streams (streams of streams). So it occurs to me: why start with the harder problem? We still as an industry haven't solved the easier problem. If I want to do simple data processing, I don't have any satisfactory options. The easiest to use is probably Excel / Google Sheets, but that's painful and limited. And then there's Jupyter or ObservableHQ, but those are a nightmare of all sorts of errors and slow feedback loops."

### Let's visualize all the FP list operators!

My first thought was to create metaphoric and live visualizations of all the FP list operators in Figma, such as map, filter, fold, find, some, etc.

My thesis here is: all list (or stream) operators have highly visual structures that programmers have to simulate in their heads. Why don't we bring those mental visualizations out onto the computer screen as the actual interface? WYSIWYG for list manipulation!
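
To make the "simulate in their heads" point concrete, here is a small sketch that writes out the intermediate values such a visualization might draw for a few of these operators. Python is used only for illustration, and the sample data and variable names are made up; this is one possible reading of what "visualizing the live data" would need, not a design.

```python
from functools import reduce
from itertools import accumulate

xs = [1, 2, 3, 4]  # made-up sample data

# map: one arrow from each input to its output
map_arrows = [(x, x + 1) for x in xs]

# filter: each input is marked as kept or dropped
filter_marks = [(x, x % 2 == 0) for x in xs]
kept = [x for x, keep in filter_marks if keep]

# fold: one accumulator threaded through the list;
# accumulate exposes every intermediate state, reduce only the final one
fold_steps = list(accumulate(xs, lambda acc, x: acc + x, initial=0))
total = reduce(lambda acc, x: acc + x, xs, 0)

# find: scan left to right and stop at the first match
found = next((x for x in xs if x > 2), None)

# some: is there at least one match?
has_big = any(x > 3 for x in xs)

print(map_arrows, filter_marks, fold_steps, found, has_big)
```

The pairs in `map_arrows` and `filter_marks`, and the running states in `fold_steps`, are the data a picture of each operator would have to show.
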

For example, here's map:

![image](https://user-images.githubusercontent.com/2288939/42383185-84a45440-8104-11e8-930e-c260cc8c6ced.png)

This is literally how functions are taught to children in school. They are a mapping from a domain to a range. Arrows from one list to another. And you could imagine actually writing the map function on any of the arrows and seeing the data update through the arrows.

For the fuzzyset problem I’m working on, the visual representation I have in my head is something like:

![image](https://user-images.githubusercontent.com/2288939/42387500-6b5db632-8110-11e8-9c75-2e8222f3f4a1.png)

(It was prettier in my head.)

However, I was discouraged when I went to the [lodash documentation](https://lodash.com/docs/4.17.10) to see how many of these I'd have to make. Part of my thesis is that there are only a few key primitives you need to do anything you want with lists. (While this is technically true because all you need is fold, it's likely not true when you consider our mental visualizations of these operators: they are more specific. For example, it'd be difficult to detect that I'm doing a map with `foldr (\x xs -> x+1:xs) [] [1,2,3]` in order to expand it to the arrows viz above. Ditto with filter, etc.)

However, this isn't a showstopper. Just because there are a couple dozen (maybe even ~100) operators doesn't mean I should go home. It's actually a simple-ish project if I want to commit to it.

Here are a few questions I have:

1. What about nested combinators? A filter inside a map? A map inside a fold? What about both? Look at this crazy code:

   ![image](https://user-images.githubusercontent.com/2288939/42384858-28c12ea0-8109-11e8-81b9-78fa6787d0e8.png)

2. How do you visualize the current values of computations within the body of lambda functions?
   Maybe this isn’t a real problem (as long as you have concrete data to flow through), or maybe it’s the same problem as (1) above.
3. What about user-defined functions? Do users write their own visualizers?
4. Are the FP combinators the essence of list (or data) combinatorics? What about APL? Sometimes nested lists and maps within lists seem cumbersome. What about SQL?

### How about Luna-lang?

I did not have fun with Luna, despite it being squarely centered on all the things I’m into: Haskell, visual coding, live data.

For one, Luna only visualizes dependencies between values, which is much less rich than what I have in mind: specific metaphoric pictures for each combinator.

![image](https://user-images.githubusercontent.com/2288939/42385339-7be3cf4c-810a-11e8-93a4-d4f605ab0482.png)

Secondly, it’s not nearly as live as I’d want.

### How about a spreadsheet?

![image](https://user-images.githubusercontent.com/2288939/42385591-32d183a2-810b-11e8-8603-7cd785a77aa4.png)

It was easy to normalize the string (except the regex wasn’t compatible, so that part needs work).

There’s no `range(10)` function, so I had to create the series by manually dragging down `+1` of the previous number, which won’t scale. I’m not sure how to get around this.

I was surprised to see that there are named ranges now, which are similar to variables. The names don’t show up on the sheet (I had to add them manually above as a label) but you can use them in formulas. I did this with `value`, `normalized`, and `gramSize`, but got bored and stopped after a certain point.

Splitting into grams as a mapping was very straightforward (except that there’s no substring and I had to use `LEFT` and `RIGHT` - it’s a bummer there’s no function creation either).

Grouping and counting was funny - I actually used the `Query` function, which is very SQL-like. It worked like a charm, except it didn’t keep the original ordering, which was a bummer.
(You can see the query at the bottom of the sheet.) To try to keep the order, I pulled the counts back up into the original grams list. This works, but then there are duplicates, of course. I guess I could first remove the duplicates and then join with the counts, but I didn’t do that.

Overall, it’s possible to do stuff in a spreadsheet, but:

* the cell position references are the worst (always breaking things)
* you need to dynamically create ranges (not by dragging down)
* you need user-defined functions
* you need to see both the data and the formula at the same time
* you need to be able to visually see the relationships between cells without looking at formulas
* you need to be able to export your process to code that can run as a library

### What about Flowsheets?

I haven’t had the chance to play with it myself, so I don’t really know, but it looks great. I’d prefer an FP language instead of Python, but it doesn’t really matter. I’d be curious how it handles nested data structures.

### What about APL?

I spent ~3 hours yesterday on replicating this code in APL. God it was hard!

![image](https://user-images.githubusercontent.com/2288939/42386185-c44df03a-810c-11e8-836f-da12c5068964.png)

[Here’s a link to it on tryapl.](https://tryapl.org/#?a=%7B%u237A%2C%u2262%u2375%7D%20%u2338%20%7B%27-mississippi-%27%5B%u2375%5D%7D11%203%u2374%u2283%2C/%7B%u2375+%u23733%7D%A8%AF1+%u237311&run)

It took a very long time, it was very frustrating, I barely understand how the code works, and I am still not quite done. I haven't figured out how to abstract over arbitrary word inputs, which would allow me to remove the '-mississippi-' string and the hardcoded 11's and 3's.

One of my biggest complaints about APL is the documentation. It's like it's written in another language! And I wasn't able to find great Stack Overflow support. I’ve printed out Iverson’s Turing lecture on APL, so maybe reading that will help. I’m dying out here!
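
For comparison, here is a minimal Python sketch of what the tryapl expression above appears to compute: the 3-grams of '-mississippi-', grouped and counted. It is a reading of the expression rather than a verified symbol-by-symbol translation, and the function names `grams` and `gram_counts` are made up for illustration. The word and gram size are parameters, so the hardcoded 11 becomes `len(word) - gram_size + 1` (13 - 3 + 1 for this input).

```python
from collections import Counter


def grams(word, gram_size):
    """Return every contiguous substring of length gram_size, left to right."""
    return [word[i:i + gram_size] for i in range(len(word) - gram_size + 1)]


def gram_counts(word, gram_size):
    """Group identical grams and count them, keeping first-seen order."""
    # Counter is a dict subclass, so on Python 3.7+ it keeps insertion order.
    return list(Counter(grams(word, gram_size)).items())


counts = gram_counts("-mississippi-", 3)
for gram, count in counts:
    print(gram, count)
```

Because the counts come back in first-seen order, this version also sidesteps the ordering problem from the spreadsheet `Query` attempt.
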

### The grass is always greener - yet again

I tried to jump ship (again) to a different problem. I thought maybe it would be easier and help with my intended problem, but *surprise* it’s difficult in its own right.

### Todos 7/6/18

I’m not sure where to go from here... I’m feeling as lost as I did in [my last todos](http://futureofcoding.org/log#next-steps-6218).

I guess learning Reflex and building visualizations for it (in Figma to start) is a good next step. (As an aside, I spent about an hour yesterday learning about PureScript, which was fascinating. It compiles to readable JS! It does this by not being lazy. The author has an FRP library called Behaviors that’s just ~100 lines of code. It’s neat but not as firm an abstraction as in Reflex, which I prefer. And so I’m stuck with Reflex. But now I know that maybe I can use ghci to get better type instrumentation this time, so maybe it’ll be better.)

It may be a couple days (or even a week) until I can work here again because I have visa paperwork to do (I may be moving to London) and work for Dark to do. (On the other hand, maybe I’ll procrastinate on those things and be back here sooner.)

1 parent c624aa7, commit f60c772
