Canonicalization passes
Canonicalization passes are transformations that put the IR in a canonical form, that is, an agreed-upon way to represent expressions.
For instance, consider the C statement a = b - c. A frontend could produce either of the following two equivalent IRs:
- Version 1:
%a = sub i64 %b, %c
- Version 2:
%neg_c = sub i64 0, %c
%a = add i64 %b, %neg_c
In both cases, the semantics are the same, but the first version is a more direct translation of the input C statement and is also more compact. The canonical form of this expression in LLVM is the first version. It is important to be aware that the LLVM middle end revolves around the canonical representation.
What this means is the following:
- With the standard pipeline, anything that is not canonical will be canonicalized
- Optimizations have been tested almost exclusively using the canonical form
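To make the first point concrete, here is a minimal sketch of what a canonicalization rewrite does, written as a toy model in Python. It is not LLVM's implementation: the Inst class and the canonicalize function are hypothetical names introduced for illustration, and the model assumes that the instruction list is the whole program, so a value counts as used only if it appears as an operand in that list. It rewrites Version 2 of the example into Version 1 and leaves already-canonical input untouched.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Inst:
    """Hypothetical stand-in for one two-operand IR instruction."""
    dest: str  # result name, e.g. "%a"
    op: str    # "add" or "sub"
    lhs: str   # left operand: a value name or an integer literal as text
    rhs: str   # right operand


def canonicalize(insts):
    """Rewrite an add of a negated value into a direct sub."""
    # Map the result of each negation (sub 0, x) to the value x it negates.
    negated = {i.dest: i.rhs for i in insts if i.op == "sub" and i.lhs == "0"}

    rewritten = []
    for i in insts:
        if i.op == "add" and i.rhs in negated:
            # b + (0 - c)  ->  b - c
            i = Inst(i.dest, "sub", i.lhs, negated[i.rhs])
        elif i.op == "add" and i.lhs in negated:
            # (0 - c) + b  ->  b - c, since add is commutative
            i = Inst(i.dest, "sub", i.rhs, negated[i.lhs])
        rewritten.append(i)

    # Toy cleanup: drop negations that no instruction in the list uses.
    # Assumption: nothing outside this list refers to them.
    used = {v for i in rewritten for v in (i.lhs, i.rhs)}
    return [i for i in rewritten
            if not (i.dest in negated and i.dest not in used)]


# Version 2 from the example above.
version_2 = [
    Inst("%neg_c", "sub", "0", "%c"),
    Inst("%a", "add", "%b", "%neg_c"),
]
print(canonicalize(version_2))
```

Running the same function on Version 1 returns it unchanged, which mirrors the idea that a canonicalization leaves IR that is already in canonical form alone.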
The second point means that...