Support better error messages

Update: this has now become https://github.com/haskellfoundation/tech-proposals/pull/21. Original proposal follows.

* * *

Our proposal process is currently in flux (#10), but some ideas are better written down than ephemerally stored in wetware, so I record ideas here that will eventually become a proper proposal.

# The Problem

An oft-repeated challenge in learning Haskell is the quality of the error messages, along at least these dimensions:
1. An error message may have an unknown provenance: is it generated by stack? by cabal? by GHC? by haddock? Even for experts (e.g. me, if I may be so bold), this is a real challenge sometimes. (My example: trying to get a program using a GHC plugin compiled. I think the message I grappled with was from haddock in the end, but it was produced by the cabal executable.)
2. An error message may mention concepts unfamiliar to the programmer and unrelated to the problem. (Example: if I write `myFirstAttemptAtMaybe :: Just Int; myFirstAttemptAtMaybe = Just 5`, the error message points toward `-XDataKinds`, which you probably don't want to enable.)
3. An error message may mention concepts unfamiliar to the programmer and related to the problem. (Example: if I write `not :: a -> a; not True = False`, then I see something about "rigid type variables". This information *is* relevant, but it's still unhelpful if the user doesn't know about rigid type variables.)
4. Error messages may contain too much information. (Example: given that the erroneous line is now repeated under the error message itself, the "In the expression" lines may be redundant.)
5. Error messages may contain too little information. (Example: It is possible to get GHC to say that `Proxy` does not match `Proxy`, because the invisible argument to `Proxy` is different. GHC usually suggests `-fprint-explicit-kinds` in these scenarios, but not always.)
6. Error messages may say too little about what caused the error to occur. (Examples: 1) If I `import M` where `M` exports `not` and then use `not` in my code, I'll get an ambiguity error between `Prelude.not` and `M.not`. Understanding this error requires knowing that the `Prelude` is imported implicitly. 2) GHC might report a missing instance, but only after simplifying a goal via several available (and perhaps wrong) instances. By the time the error is reported, the programmer has to re-create GHC's thinking process, which can be challenging.)
7. Error messages do not integrate well with tooling. (Example: many error messages suggest extensions to enable, but IDEs currently have to parse strings in order to find these suggestions, which are not at all standardized across error messages.)

# The Technical Solution: Add More Structure

In order to fix some of the problems above, I've been on a multi-year campaign (aided critically by @alpmestan and @adinapoli) to add more structure to error messages, along two main axes.

## Using error datatypes

This was first written up as https://github.com/ghc-proposals/ghc-proposals/pull/306, is mostly implemented as described by https://gitlab.haskell.org/ghc/ghc/-/wikis/Errors-as-(structured)-values, and is the subject of a recent blog post at https://well-typed.com/blog/2021/08/the-new-ghc-diagnostic-infrastructure/.

The central idea is that, instead of GHC producing (essentially) a string to describe an error message, it should use a data constructor to state the nature of the error and store any auxiliary information necessary to print the error. Then, separately, GHC should render the error into a (structured) string.
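To make the idea concrete, here is a minimal sketch of an error datatype with a separate rendering step. Every name in it (`Diagnostic`, `Extension`, `render`, `suggestedExtensions`, and the constructors) is hypothetical and chosen for illustration; GHC's actual diagnostic types are far richer and are described in the pages linked above.

```haskell
import Data.List (intercalate)

-- Hypothetical sketch: these are NOT GHC's real diagnostic types.

-- | Language extensions that a message might suggest enabling.
data Extension = LambdaCase | DataKinds
  deriving (Show, Eq)

-- | One constructor per kind of error, carrying the data needed to print it.
data Diagnostic
  = AmbiguousName String [String]         -- the name, and the modules exporting it
  | SyntaxNeedsExtension String Extension -- the syntax used, and the extension enabling it
  deriving (Show, Eq)

-- | Rendering to text is a separate step from constructing the error.
render :: Diagnostic -> String
render (AmbiguousName name mods) =
  "Ambiguous occurrence '" ++ name ++ "'; it could refer to: "
    ++ intercalate ", " [m ++ "." ++ name | m <- mods]
render (SyntaxNeedsExtension syntax ext) =
  "Illegal " ++ syntax ++ "; perhaps you intended to use " ++ show ext

-- | A tool reads suggestions from the structure and never parses the text.
suggestedExtensions :: Diagnostic -> [Extension]
suggestedExtensions (SyntaxNeedsExtension _ ext) = [ext]
suggestedExtensions (AmbiguousName _ _)          = []
```

With these definitions, `render (AmbiguousName "not" ["Prelude", "M"])` yields `Ambiguous occurrence 'not'; it could refer to: Prelude.not, M.not`, while an IDE would call `suggestedExtensions` directly and ignore the rendered string.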

After much work, this is now possible, as described in the pages linked above. Not all error messages have been ported over to the new architecture yet. But for messages that have been, tools no longer need to parse messages to know what they say, and it's possible to, say, enumerate all the possible messages that GHC could produce. With the extra structure here, it's now feasible to create, say, a wiki or other website that explicates each error message. This way, when a user hits an error, they are linked to a page with more information, and quite possibly tips from others who have been there before. Relatedly, we could assign error codes to messages (https://github.com/ghc-proposals/ghc-proposals/pull/325) to make this all even easier.

This solution addresses problems 3 and 7. We see above how this will integrate better with tooling. This step also addresses 3 (to some degree) by allowing for a simple process for making web pages related to specific errors.

## Using an inspectable document type

Today, GHC produces pretty-printed strings ("ppstrings") for errors. A ppstring is just like a string, but with a bit of structure to indicate how lines should be wrapped and/or indented. A ppstring is rendered into a string using a few settings, including the desired line width of the output. However, a ppstring's internal structure is all about presentation, not about semantic content.

The next technical part of this effort is adding structure to the error message texts. This is https://github.com/ghc-proposals/ghc-proposals/pull/307. The additional structure would allow us to embed, say, a `Type` into some error text, in a way that remembers it's a `Type`. This would allow for an IDE to support clickable error messages, where the user could click on the type mentioned in an error and, say, show its kinds explicitly. Or jump to its definition. Or show how GHC decided some expression had that type. The sky is the limit here -- but, critically, we need our error texts to remember what parts of the text are types (or expressions, or language extensions, or other goodies).

There is some design work here, but no implementation work that I'm aware of.

This solution addresses problems 4, 5, and 6, by creating the possibility of fine-grained user control over the level of detail in error messages.
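As a rough illustration of what "remembering that a piece of text is a type" could buy us, here is a toy document type. It is a sketch under my own assumptions, not the design from the linked proposal: `Ty`, `Doc`, `Detail`, and all the functions are hypothetical, and the `Proxy` example with kinds `k1` and `k2` is deliberately simplified.

```haskell
-- Hypothetical sketch; not the design proposed for GHC.

-- | A toy type: a constructor name, its invisible (kind) arguments,
-- and its visible arguments.
data Ty = TyCon String [Ty] [Ty]
  deriving (Show, Eq)

-- | A document whose pieces remember what they are.
data Doc
  = Text String   -- plain prose
  | EmbedType Ty  -- a type, kept as a type rather than flattened to text
  | Cat [Doc]     -- concatenation
  deriving (Show, Eq)

-- | A user-controlled level of detail.
data Detail = HideKinds | ShowKinds

renderTy :: Detail -> Ty -> String
renderTy detail (TyCon name invis vis) =
  unwords (name : kindArgs ++ map (atom detail) vis)
  where
    kindArgs = case detail of
      HideKinds -> []
      ShowKinds -> map (\k -> "@" ++ atom detail k) invis

-- | Render an argument, parenthesising it when it displays arguments of its own.
atom :: Detail -> Ty -> String
atom detail ty
  | ' ' `elem` s = "(" ++ s ++ ")"
  | otherwise    = s
  where
    s = renderTy detail ty

render :: Detail -> Doc -> String
render _ (Text s)      = s
render d (EmbedType t) = renderTy d t
render d (Cat ds)      = concatMap (render d) ds

-- | The embedded types, which an IDE could make clickable.
typesIn :: Doc -> [Ty]
typesIn (Text _)      = []
typesIn (EmbedType t) = [t]
typesIn (Cat ds)      = concatMap typesIn ds

-- | The "Proxy does not match Proxy" situation from problem 5.
mismatch :: Doc
mismatch = Cat
  [ Text "Couldn't match "
  , EmbedType (TyCon "Proxy" [TyCon "k1" [] []] [TyCon "a" [] []])
  , Text " with "
  , EmbedType (TyCon "Proxy" [TyCon "k2" [] []] [TyCon "a" [] []])
  ]
```

Here `render HideKinds mismatch` gives the unhelpful `Couldn't match Proxy a with Proxy a`, whereas `render ShowKinds mismatch` gives `Couldn't match Proxy @k1 a with Proxy @k2 a`. The same document supports both views because the types were never flattened into a string.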

# The Social Solution: design better error messages

The above technical solutions are simply about installing plumbing. The real work is around crafting error messages that make more sense to users. While not a social problem, crafting better error messages seems to require a social solution, in that we absolutely need a diversity of voices working together in order to craft the best messages. In particular, implementors are very poorly poised to write error messages that will be helpful to newcomers, because of the great chasm between them in the amount of information they respectively know.

The current attempt to address this problem is at https://github.com/haskell/error-messages. This repo, the brainchild of @ketzacoatl and me, is meant to be a clearinghouse of ideas around improving individual error messages, by writing better text. This work will be enhanced by the technical solutions above, but there are great strides we can make without rewriting parts of GHC.

This solution is mostly about problem 2, but good solutions to all the problems will require careful collaboration across our user community.

# The Ecosystem Solution: identifying the speaker

As we improve the error messages, it would be good to come to a community standard around being able to track where a message comes from -- and even having a uniform way of presenting errors to the user. Currently, GHC has a particular style to its errors, listing the filename, line, and column number, then writing out the message, and lastly including the error-producing line with a certain span highlighted. Should other tools adopt the same style? Should GHC adopt other tools' style? Maybe, if we're integrating with IDEs, this doesn't matter -- but then, the question becomes how different parts of our tooling interact with IDEs. The more we standardize, the easier it will be to flexibly work with a variety of tools.

As we standardize, we should also have some standard system for identifying the provenance of an error message. If we use error codes, for example, maybe the first character of the code could define the generating tool. e.g. `G` for GHC, `C` for cabal, `S` for stack, `H` for haddock, `L` for hlint, and perhaps others. Given its position in our community, the HF would police these prefixes and allocate them for the common good. One advantage to having clear error message provenance is that, if there is a bug or infelicity in a message, users will know where to post! This invites users into our ecosystem, and turns users into contributors.

This addresses problem 1.
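Here is a small sketch of how a tool might recover provenance from such a code. The prefix letters follow the suggestion above; the rest is an assumption made for illustration, in particular the idea that a code is one letter followed by decimal digits (as in the made-up code `G0042`) and the names `Tool`, `toolFromPrefix`, and `parseCode`.

```haskell
import Data.Char (isDigit)

-- | Tools that might emit diagnostics.
data Tool = GHC | Cabal | Stack | Haddock | HLint
  deriving (Show, Eq)

-- | Map the first character of an error code to the tool that produced it.
toolFromPrefix :: Char -> Maybe Tool
toolFromPrefix 'G' = Just GHC
toolFromPrefix 'C' = Just Cabal
toolFromPrefix 'S' = Just Stack
toolFromPrefix 'H' = Just Haddock
toolFromPrefix 'L' = Just HLint
toolFromPrefix _   = Nothing

-- | Split a code into its tool and number.
-- Assumed format (hypothetical): one prefix letter, then one or more digits.
parseCode :: String -> Maybe (Tool, Int)
parseCode (c : digits)
  | not (null digits), all isDigit digits = do
      tool <- toolFromPrefix c
      pure (tool, read digits)
parseCode _ = Nothing
```

For example, `parseCode "G0042"` evaluates to `Just (GHC,42)`, and a code with an unallocated prefix such as `"X1"` evaluates to `Nothing`. A bug-report link or an IDE plugin could dispatch on the resulting `Tool`.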

# What we need

There is Good Stuff happening in the space described by this post. But there are many opportunities for more volunteers. Here, I list a few ideas I have for ways people could contribute, numbered only for reference (not to imply priority).

1. The error-messages repo (https://github.com/haskell/error-messages) is having some good conversations, but there would ideally be a much higher level of participation. In addition, making this repo work to produce good messages will require thoughtful care and someone leading conversations to conclusions. Currently, @ketzacoatl and I have been serving this need, but more hands here would be very helpful.
2. Relatedly, once a new message is decided upon in the error-messages repo, someone has to implement the change in GHC (or other tool). Sometimes, the work is just around wording, and is suitable for a GHC newcomer.
3. The error-datatypes work in GHC has already created the new infrastructure for datatype-based error messages. But we still need to have volunteers convert the many messages scattered throughout GHC to use this new structure. This is harder work and requires some familiarity with GHC. (As one particular example, I expect to do the type-checker messages myself, as I think my level of knowledge of these messages will be instrumental. Other messages will be easier!)
4. The error-datatypes work has already created lots and lots of error-message constructors. Not all of these are well documented or exemplified. For example, see https://gitlab.haskell.org/ghc/ghc/-/blob/master/compiler/GHC/Parser/Errors/Types.hs, which contains a few constructors with lovely, detailed documentation and examples, and many, many more that lack these niceties. A small army of volunteers could fix this! This work does not require nearly as much familiarity with GHC (really, you'd just need to build it and run the testsuite).
5. Structured error text would be a major step forward, but it would need someone dedicated to designing a great system and, likely, implementing it. This is, sadly, a large ask, and it might be appropriate for this to be a funded task -- not sure. Furthermore, it's hard to see how to easily break this down into smaller, separable sub-tasks.
6. Now that we have lots of error-message constructors, we can start assigning ID codes to them and then creating web pages that describe each one. (This is related to volunteer opportunity 4, documenting the constructors.) Once we have structured error text, we can even imagine having special glossary-item components to messages, where users could click on terminology to get linked to a page explaining the term. (Examples: "rigid", "infinite type", "superclass", etc.) Setting up a space where this content could be hosted and kept up-to-date would be a great job for a volunteer, as would curating the site generally.

While there's synergy among these different opportunities, they really are (I believe) mostly independent, and we could do all of them at once.

As for my own role: I care deeply about the quality of error messages, but I do not have the bandwidth to oversee all the volunteers I would hope to attract to this movement -- and so a key other piece would just be coordination. The part I would likely be most hands-on with is the design for structured text, which is a hard Haskell-programming design challenge, and the part where I could probably offer the most value.

# Role for the HF

The HF slots in most naturally as the coordinator and generator of volunteers. HF resources (e.g. Slack) could be used for volunteers to choose what to work on and to avoid stepping on one another's toes. The HF could provide the steady, encouraging hand to keep this all going. The HF could additionally provide the coordination among ecosystem projects to (if we think it's a good idea) standardize messages and provenances. Some aspects of this require designing interfaces among tools, and the HF is well suited to this work. Perhaps, also, the HF could help fund or otherwise source a singularly generous volunteer to design and implement the structured error text piece of this puzzle.
