Description
Filing this to collect some ideas for future work. Only one person I know of has tried this, and they probably worked around it by using the compiler directly.
Problem
If you use LLVM TableGen files like Target.td in a notebook, the llvm-tblgen output is > 320,000 lines. This breaks the limit Jupyter sets and removing that limit likely makes the client crash.
You might do this if you wanted to make a notebook about adding some LLVM internal thing like a scheduler or an instruction. You wouldn't want every cell to be massive even if the notebook could handle the text.
It's a niche that most people won't hit, so it needs input from people who do to decide what the best tradeoffs are. I don't want to create more things for folks to learn in the process.
Possible Solutions
- Arbitrary cut off for the output, basically tail <N>.
  - Easiest to understand, but zero nuance.
  - If the content of the includes changes between versions then your <N> may need to change.
  - Let's not do this, but writing it here as the "baseline" from which to compare better options.
- Detect the output is too large and return an error to the notebook telling them to use the compiler directly.
  - We're not actually fixing anything, but at least it's clearer.
- Emitting JSON and running one of the JSON query languages on it.
  - Now you’re learning yet another language.
  - The result is more JSON, not the record format you’re used to.
- Pragmas/notes to mark include file content in the output.
  - No way to tell “user” vs. “system” includes apart right now.
  - You may want to see some subset of an included file anyway.
- Regular expression for class and definition names.
  - If we use JSON, same issues as before.
  - Probably could match on the output, but likely easier to make it a compiler option.
  - You are now learning regex, but at least there are sites that make building a regex easy, unlike for a JSON query language, I expect.
  - Are 2 expressions enough? What about multiclass?
- Marking "new" records somehow by comparing the previous output.
  - You may want a mix of old and new in the output.
  - Still leaves 300k lines in the first cell even if you only want new stuff in the next ones.
  - Not sure we can reliably detect "new" given that the order may not be deterministic.
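
To make the "regular expression" and "detect and error" options a bit more concrete, here is a rough sketch of matching on the output rather than adding a compiler option. It assumes the default llvm-tblgen record dump, where each record starts with `class` or `def` at the start of a line and ends with a `}` on its own line. The names `filter_records`, `render` and `MAX_LINES` are made up for illustration, and the limit value is a placeholder, not the real Jupyter limit.

```python
import re

# Placeholder limit (assumption): the real notebook limit is not known here.
MAX_LINES = 1000

# Start of a record in the text dump, e.g. "def ADD {" or "class Inst<...> {".
RECORD_START = re.compile(r"^(class|def) ([^\s<{]+)")


def filter_records(dump, name_pattern):
    """Return the records (as strings) whose name matches name_pattern."""
    name_re = re.compile(name_pattern)
    kept = []
    current = None      # lines of the record being kept, None when skipping
    in_record = False
    for line in dump.splitlines():
        if not in_record:
            m = RECORD_START.match(line)
            if m is None:
                continue  # section banners, blank lines, etc.
            in_record = True
            current = [line] if name_re.search(m.group(2)) else None
        elif current is not None:
            current.append(line)
        # Assumption: a record closes with "}" alone at column 0.
        if in_record and line == "}":
            if current is not None:
                kept.append("\n".join(current))
            in_record = False
            current = None
    return kept


def render(records, max_lines=MAX_LINES):
    """Join records, or raise if the result is still too large to show."""
    text = "\n\n".join(records)
    n = text.count("\n") + 1 if text else 0
    if n > max_lines:
        raise ValueError(
            f"output is {n} lines (limit {max_lines}); "
            "narrow the pattern or run llvm-tblgen directly"
        )
    return text
```

A cell could then call something like `render(filter_records(dump, r"^ADD"))`, where `dump` is the captured llvm-tblgen output. This only uses a single expression for both classes and definitions, and it does nothing about multiclass, so it leaves those open questions as they are.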