Basics of using instrumented BOLT

Tetrakai · July 10, 2024, 2:21am

I appear to have successfully generated a myprogram.bolt executable from the original binary, which was built with gcc PGO. However, I did not see any improved performance. In fact, it is a bit slower, even on the exact same input.

Using gcc, when does the -fno-reorder-blocks-and-partition flag need to be included? Is this before PGO, after PGO, or both? Also, should I expect an error if I did this incorrectly, or would I just fail to see any improvement?
When the program was run multiple times, it could not update prof.fdata. I saw:
Error while trying to open profile file for writing: /tmp/prof.fdata
Basically I need to run the same program with various inputs. Am I correct that, in contrast with using gcc for PGO, it is required to generate a different fdata file for each run then combine them (as described in the readme)?

Thanks for any help.

aaupov · July 12, 2024, 6:44pm

Using gcc, when does the -fno-reorder-blocks-and-partition flag need to be included? Is this before PGO, after PGO, or both?

The flag needs to be used at least for the pre-BOLT binary build, but it would make sense to apply it to all builds (training and PGO).
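As a rough sketch of where the flag goes in a gcc PGO + BOLT pipeline: the file names (myprogram.c, myprogram.train) and -O2 are placeholders of mine, and -Wl,--emit-relocs is an assumption based on what the BOLT README suggests so that functions can be reordered. Objects are compiled to the same myprogram.o name in both builds so that gcc finds the matching .gcda profile.

```shell
# Sketch only: file names and -O2 are assumptions, not taken from the thread.
# The flag is kept in one variable so every build gets it.
COMMON_FLAGS="-O2 -fno-reorder-blocks-and-partition"

# Training build: the resulting binary writes .gcda files when it runs.
build_training() {
    # $COMMON_FLAGS is intentionally unquoted so it splits into separate flags.
    gcc $COMMON_FLAGS -fprofile-generate -c myprogram.c -o myprogram.o &&
    gcc -fprofile-generate myprogram.o -o myprogram.train
}

# PGO build that BOLT will consume (the "pre-BOLT" binary).
# --emit-relocs keeps relocations in the output so BOLT can reorder functions.
build_pgo() {
    gcc $COMMON_FLAGS -fprofile-use -c myprogram.c -o myprogram.o &&
    gcc -Wl,--emit-relocs myprogram.o -o myprogram
}
```

Run build_training, exercise ./myprogram.train on representative inputs, then run build_pgo and hand the resulting myprogram to llvm-bolt.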

Also, should I expect an error if I did this incorrectly, or would I just fail to see any improvement?

If the input binary contains split functions, BOLT issues a warning: “split function detected on input”, and such functions may not be optimized.
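Besides watching for that warning, one way to check the input binary yourself is to look for the symbol names gcc gives to split-off cold fragments. This is only a heuristic sketch: it assumes gcc's usual naming (foo.cold, or foo.cold.N with older releases) and a binary that has not been stripped. The helper name list_cold_fragments is made up for illustration.

```shell
# Hypothetical helper: reads `nm` output on stdin and prints only the
# symbols that look like gcc cold fragments (name ends in .cold or .cold.N).
list_cold_fragments() {
    grep -E '\.cold(\.[0-9]+)?$'
}

# Usage (myprogram is a placeholder name):
#   nm myprogram | list_cold_fragments
# No output suggests the functions were not split.
```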

When the program was run multiple times, it could not update prof.fdata. I saw:
Error while trying to open profile file for writing: /tmp/prof.fdata
Basically I need to run the same program with various inputs.

You can keep the profiles from separate runs apart by having a suffix appended to the prof.fdata file name: pass -instrumentation-file-append-pid to llvm-bolt when instrumenting the binary.

The profiles can then be merged into one using the merge-fdata tool that ships with BOLT (it should be built as part of the bolt component, or you can run ninja merge-fdata in a configured LLVM build directory).
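Put together, the workflow could look roughly like the sketch below. The names myprogram, myprogram.instr and merged.fdata are placeholders, the optimization flags are the ones the BOLT README uses as an example (a starting point, not a tuned set), and the glob assumes the per-run files are named by appending to the path given in -instrumentation-file.

```shell
# Sketch only: binary names, inputs and merged.fdata are placeholders.
PROFILE=/tmp/prof.fdata

# Produce an instrumented binary that writes one profile per process.
instrument() {
    llvm-bolt myprogram -instrument \
        -instrumentation-file="$PROFILE" \
        -instrumentation-file-append-pid \
        -o myprogram.instr
}

# Run the instrumented binary once per input; each run gets its own file.
collect() {
    for input in "$@"; do
        ./myprogram.instr "$input"
    done
}

# Combine the per-run profiles (assumed to match "$PROFILE".*) into one.
merge_profiles() {
    merge-fdata "$PROFILE".* > merged.fdata
}

# Optimize the original (non-instrumented) binary with the merged profile.
optimize() {
    llvm-bolt myprogram -o myprogram.bolt -data=merged.fdata \
        -reorder-blocks=ext-tsp -reorder-functions=hfsort \
        -split-functions -split-all-cold -dyno-stats
}
```

Calling instrument, then collect with your list of inputs, then merge_profiles and optimize in that order mirrors the steps above. Removing old /tmp/prof.fdata.* files before collecting avoids merging stale profiles.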

Tetrakai · July 14, 2024, 3:57am

I didn’t end up seeing a benefit but it all worked great, thanks. Perhaps my program/runtime was too small/short.
