
TableGen Jupyter notebook should have a way to filter compiler output #72856

Open
@DavidSpickett

Description


Filing this to collect some ideas for future work. Only one person I know of has tried this, and they probably worked around it by using the compiler directly.

Problem

If you use LLVM's TableGen files, such as Target.td, in a notebook, the llvm-tblgen output is more than 320,000 lines. This exceeds the output limit Jupyter sets, and removing that limit would likely crash the client.

You might do this if you wanted to write a notebook about adding some LLVM internal component, like a scheduler or an instruction. Even if the notebook could handle the text, you wouldn't want every cell to be massive.

It's a niche case that most people won't hit, so deciding on the best tradeoffs needs input from the people who do. I don't want to create more things for folks to learn in the process.

Possible Solutions

  • An arbitrary cut-off for the output, basically tail <N>.
    • Easiest to understand, but zero nuance.
    • If the content of the includes changes between versions, then your <N> may need to change.
    • Let's not do this, but writing it here as the "baseline" from which to compare better options.
  • Detect that the output is too large and return an error to the notebook telling the user to use the compiler directly.
    • We're not actually fixing anything, but at least it's clearer.
  • Emit JSON (llvm-tblgen --dump-json) and run one of the JSON query languages on it.
    • Now you’re learning yet another language.
    • The result is more JSON, not the record format you’re used to.
  • Pragmas/notes to mark include file content in the output.
    • No way to tell “user” vs. “system” includes apart right now.
    • You may want to see some subset of an included file anyway.
  • Regular expressions for class and definition names.
    • If we use JSON, same issues as before.
    • We could probably match on the text output, but it's likely easier to make it a compiler option.
    • You are now learning regex, but at least there are sites that make building a regex easy, which I expect is not the case for a JSON query language.
    • Are two expressions (one for classes, one for defs) enough? What about multiclasses?
  • Mark "new" records somehow by comparing against the previous output.
    • You may want a mix of old and new in the output.
    • This still leaves 300k lines in the first cell, even if you only want new stuff in the next ones.
    • Not sure we can reliably detect "new" given that the order may not be deterministic.
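The regular-expression and "new records" options could be prototyped on the text output before deciding whether they belong in the compiler. The Python sketch below post-processes the record dump llvm-tblgen prints when no backend option is given. It assumes each record in that dump starts at column 0 with "class NAME" or "def NAME" and ends with a line containing only "}". The names split_records, filter_records and new_records are hypothetical, not part of the existing notebook kernel, and the sample input is hand-written rather than real compiler output.

```python
import re
from typing import Iterator, List, Optional, Tuple

# Hypothetical helpers, not part of the existing TableGen kernel. They assume
# the layout of llvm-tblgen's default record dump: each record starts at
# column 0 with "class NAME" or "def NAME" and ends with a line that is "}".
RECORD_START = re.compile(r"(class|def) ([^\s<{]+)")


def split_records(output: str) -> Iterator[Tuple[str, str, str]]:
    """Yield (kind, name, text) for every record in the dump."""
    kind: Optional[str] = None
    name = ""
    lines: List[str] = []
    for line in output.splitlines():
        if kind is None:
            # Outside a record: skip section banners until a record starts.
            m = RECORD_START.match(line)
            if m:
                kind, name, lines = m.group(1), m.group(2), [line]
            continue
        lines.append(line)
        if line == "}":
            # A closing brace at column 0 ends the current record.
            yield kind, name, "\n".join(lines)
            kind = None


def filter_records(output: str,
                   class_pattern: Optional[str] = None,
                   def_pattern: Optional[str] = None) -> str:
    """Keep records whose name fully matches the pattern for their kind.
    A pattern of None drops every record of that kind."""
    patterns = {"class": class_pattern, "def": def_pattern}
    kept = [text for kind, name, text in split_records(output)
            if patterns[kind] is not None
            and re.fullmatch(patterns[kind], name)]
    return "\n".join(kept)


def new_records(previous: str, current: str) -> List[str]:
    """Names of records in `current` that were not in `previous`.
    Matching is by (kind, name), so record order does not matter."""
    seen = {(kind, name) for kind, name, _ in split_records(previous)}
    return [name for kind, name, _ in split_records(current)
            if (kind, name) not in seen]


# A tiny hand-written stand-in for real llvm-tblgen output.
sample = """------------- Classes -----------------
class Inst<string Inst:asm = ?> {
  string AsmString = Inst:asm;
}
------------- Defs -----------------
def ADD {\t// Inst
  string AsmString = "add";
}
def SUB {\t// Inst
  string AsmString = "sub";
}
"""

print(filter_records(sample, def_pattern=r"ADD"))
print(new_records(sample.split("def SUB")[0], sample))
```

Comparing by name rather than by position avoids depending on output order, though auto-generated names for anonymous records may still differ between runs.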

Metadata


Assignees

No one assigned

    Labels

    enhancement (Improving things as opposed to bug fixing, e.g. new or missing feature), tablegen

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
