Hi Everyone,
I’ve been working on a usability improvement for LTO workflows. We (Sony) have
been using this in production for some time and we’d like to offer it here as a
non-default LTO configuration option for clang.
How it works
Currently, LTO mode is chosen during the pre-link phase and can’t
be changed at a later stage. Thin and Full LTO bitcode may share a binary
format (LLVM bitcode), but are explicitly made incompatible. When summaries
were added to Full LTO, they were given a different name, to ensure that they
were never confused with ThinLTO summaries.
Our LTO pipeline creates a single LTO bitcode structure that can be used by
Thin or Full LTO. This means that the LTO mode can be chosen at link time. In
addition, this means that all LTO bitcode is compatible, from an optimization
perspective. Currently, if a compilation has both Thin and Full LTO bitcode
files, they will be optimized separately with no information shared between the
Full and Thin backends. Since the internal structure of a bitcode file isn’t
visible to most users, this may not be the expected behavior. This
compatibility also means that deploying bitcode libraries can be a simpler
process. A normalized set of features over all bitcode files helps to ensure
that users get the optimizations they expect when these libraries are included.
We implemented this feature by making every LTO module identical in structure
to a split ThinLTO bitcode module. We then use the Full LTO pipeline for
pre-link optimisation. This allows for maximum optimisation and compatibility.
It also, as you’ll see below, leads to increased file sizes. This is due to the
fact that type information is always available when using this scheme, which
means that more split modules are created.
Performance
This feature does come with a build-time cost. Here are WebKit build times when
using ThinLTO, comparing Unified LTO bitcode ("Unified") against the standard
ThinLTO pipeline ("Distinct").
Run # | Configuration | Build Time (s) |
---|---|---|
0 | Unified | 2866.34 |
1 | Unified | 2868.64 |
2 | Unified | 2872.39 |
3 | Distinct | 2831.95 |
4 | Distinct | 2826.11 |
5 | Distinct | 2830.02 |

Summary | Value |
---|---|
Unified AVG (s) | 2869.12 |
Distinct AVG (s) | 2829.36 |
% Diff | 1.40% |
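For anyone who wants to check the arithmetic, here is a minimal Python sketch that recomputes the averages and the percentage from the per-run times. The post does not say which formula produced the 1.40% figure. The `percent_difference` helper is a hypothetical name, and the symmetric definition (difference divided by the mean of the two values) is an assumption, chosen because it reproduces the reported number.

```python
from statistics import mean

# Per-run WebKit ThinLTO build times in seconds, copied from the table above.
unified_runs = [2866.34, 2868.64, 2872.39]
distinct_runs = [2831.95, 2826.11, 2830.02]


def percent_difference(a: float, b: float) -> float:
    """Symmetric percent difference: |a - b| relative to the mean of a and b.

    Assumption: the formula is not named in the post; this one is used here
    because it matches the reported figure.
    """
    return abs(a - b) / ((a + b) / 2) * 100


unified_avg = mean(unified_runs)
distinct_avg = mean(distinct_runs)
build_time_diff = percent_difference(unified_avg, distinct_avg)

print(f"Unified AVG (s):  {unified_avg:.2f}")
print(f"Distinct AVG (s): {distinct_avg:.2f}")
print(f"% Diff:           {build_time_diff:.2f}%")
```

If the Distinct average is used as the baseline instead, the slowdown comes out at about 1.41%, so the choice of formula barely changes the result.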
This also comes with a slight increase in file size when compared to a standard
ThinLTO build. This is because the Full LTO pre-link pipeline unrolls loops and
performs some vectorization.
Configuration | File Size |
---|---|
Unified | 376878.7998 |
Distinct | 373361.0738 |
% diff | 0.938% |
When compared to a normal Full LTO build, the difference is negligible.
Configuration | File Size |
---|---|
Unified | 376878.7998 |
Distinct | 376878.8841 |
Diff | -0.084303938 |
% diff | 0.000% |
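Both file size comparisons can be checked the same way. This is a rough Python sketch: the variable names and the `percent_difference` helper are hypothetical, the symmetric formula is an assumption that happens to match the reported percentages, and the sizes are treated as unitless because the post does not state a unit.

```python
def percent_difference(a: float, b: float) -> float:
    """Symmetric percent difference: |a - b| relative to the mean of a and b.

    Assumption: the formula is not named in the post.
    """
    return abs(a - b) / ((a + b) / 2) * 100


# File sizes copied from the tables above (unit not stated in the post).
unified_size = 376878.7998
distinct_thin_size = 373361.0738  # standard ThinLTO build
distinct_full_size = 376878.8841  # normal Full LTO build

vs_thin = percent_difference(unified_size, distinct_thin_size)
vs_full = percent_difference(unified_size, distinct_full_size)
full_delta = unified_size - distinct_full_size

print(f"Unified vs ThinLTO: {vs_thin:.3f}%")
print(f"Unified vs Full LTO: {vs_full:.3f}% (absolute diff {full_delta:.4f})")
```

The absolute difference printed here is limited to the four decimal places shown in the table, so it will not match every digit of the reported -0.084303938.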
Conclusion
We’ve been using this feature in production for some time and it’s been stable.
We’ve had the opportunity to work out the kinks (mostly related to symbol
resolution). At this point, we’d like to get some feedback on contributing this
system. We think it could be a useful workflow for others in the project.
Comments and feedback welcome. I’ve posted the patch in three parts, with
one diff for each project changed (clang, llvm, lld).
Patches:
https://reviews.llvm.org/D123803
https://reviews.llvm.org/D123804
https://reviews.llvm.org/D123805
Thanks,
Matt