The work on new error formats has raised some interesting questions about how to test the compiler. The current testing framework works roughly as follows:
- run-pass: the test must execute, but compilation is allowed to issue warnings
- compile-fail (see the first sketch after this list):
  - you must annotate the errors/warnings that are expected
  - any unexpected errors/warnings will make the test fail
  - if you add a NOTE or HELP annotation, then you must annotate all notes and helps
    - in that case, any unexpected notes/helps will also make the test fail
  - however, if you don't have any NOTE or HELP annotations in the test, then notes/helps are generally ignored
  - you can also add arbitrary annotations that just look for specific lines
- ui (this one is relatively new; see the second sketch after this list):
  - a file alongside the test captures the exact output of the compiler
  - intended for testing the "round-trip" experience, including all formatting details
  - example Rust file and expected output
  - there is a script to automatically capture the current output and update the reference files en masse
  - the idea is that you can then browse the `git diff` to see the effect of your change; it's also easy for people reviewing your PR, since they'll see the diffs
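To make the compile-fail style concrete, here is a minimal sketch of such a test (the message and note text are approximations, not the compiler's exact wording; annotations match on substrings of the diagnostic):

```rust
// Sketch of a compile-fail test. `//~ ERROR` marks an error expected on this
// line; `//~| NOTE` attaches an expected note to that same diagnostic. Once
// any NOTE/HELP annotation appears in the file, all notes and helps emitted
// by the compiler must be annotated, or the test fails.
fn main() {
    let x: u32 = "hello"; //~ ERROR mismatched types
                          //~| NOTE expected type `u32`
}
```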
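And for a ui test, the checked-in reference file records the full rendering, something like the following (illustrative only; the exact layout depends on the compiler version). Note that the column-level span highlighting is precisely the kind of detail this captures and compile-fail does not:

```
error[E0308]: mismatched types
 --> test.rs:2:18
  |
2 |     let x: u32 = "hello";
  |                  ^^^^^^^ expected u32, found &str
  |
```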
This setup has, I think, largely evolved following a particular philosophy, and it's one that I have actively encouraged thus far: we should be able to write tests that test precisely the thing we intend to test, avoiding incidental presentation details. So, for example, if we are writing a type-checking test, we might want to check that a particular bit of code will fail to compile for the expected reason, but we don't necessarily care what error message it gives; and if we do want to test that, we should write a test targeting specifically the error message (ideally, probably, a ui test, which captures the full presentation).
Until recently, I've been advocating this POV for a couple of reasons. For one thing, it avoids a lot of painful redundancy in the tests. For example, consider a test like this one, which covers a number of different scenarios, all of which generate the same error message; and that test is just one of a set of many. Now, that particular message ("cannot infer") has a lot of notes and things attached to it, but is it really valuable to test all of those notes on each of these variations? Also, it sometimes takes a lot of time to keep those annotations up to date, particularly if you make a small change to wording. Is it worth it?
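To illustrate the redundancy concern with a hypothetical sketch (message text approximate): under the annotate-all-notes rule, every variation must repeat the same block of annotations, so a one-word tweak to any note means editing every test in the set.

```rust
// Hypothetical: each variation hits the same "cannot infer" error, so each
// must carry an identical ERROR + NOTE annotation block.
fn main() {
    let a = None; //~ ERROR cannot infer
                  //~| NOTE type annotations needed
    let b = Vec::new(); //~ ERROR cannot infer
                        //~| NOTE type annotations needed
}
```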
On the other hand, Iāve been rethinking things a bit as a result of the error message update. In particular, it is handy sometimes to have a bunch of tests for random scenarios, even if those tests were not intended to test notes. I think we caught at least a few (maybe more than a few?) bugs around custom notes as a result of having those extra annotations. Essentially, checking the full details (all notes etc) lets you get the most out of your existing test base, in some sense.
But by that logic, it would seem that we should move things to the ui test suite entirely, which captures all aspects of the compiler's behavior. For example, subtle changes to a span can change which columns are highlighted; this can have a very big impact on how usable the compiler is, especially since a lot of error messages rely on the span to give context, but it doesn't get tested at all by the compile-fail setup, which is focused on lines.
The scripts that update the ui tests are also an interesting factor, since they mean that it can actually be less work to keep the test suite in sync after a big change than with compile-fail, which can require a lot of laborious manual edits. (Though some argue that this makes it too easy to update the expected output, since you don't necessarily think carefully about why something changed; I think that's largely a problem that review will catch, though.)
So, what do people think? Should we continue having a "targeted" testing philosophy? Should we go for maximal testing? Does that imply that ui tests are the way we should go, or is that too much detail (and, if so, why)?
cc @jntrnr