I appear to have successfully generated a myprogram.bolt executable from the original binary, which was PGO’ed with gcc. However, I did not see any performance improvement. In fact, it is a bit slower, even on the exact same input.
Using gcc, when does the -fno-reorder-blocks-and-partition flag need to be included? Is this before PGO, after PGO, or both? Also, should I expect an error if I did this incorrectly, or would I just fail to see any improvement?
When the program was run multiple times it could not update prof.fdata. I saw: Error while trying to open profile file for writing: /tmp/prof.fdata
Basically, I need to run the same program with various inputs. Am I correct that, in contrast with using gcc for PGO, it is required to generate a different fdata file for each run and then combine them (as described in the readme)?
Using gcc, when does the -fno-reorder-blocks-and-partition flag need to be included? Is this before PGO, after PGO, or both?
The flag needs to be used at least for the build that produces the pre-BOLT binary, but it makes sense to apply it to all builds (training and PGO).
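As a rough sketch of what "all builds" could look like with gcc, assuming a single-file program: the names myprogram.c and train.dat are hypothetical, and -O2 and -Wl,--emit-relocs are my additions (the BOLT README recommends linking with relocations), not something stated above. The script only prints the commands unless DRY_RUN=0 is set.

```shell
#!/usr/bin/env bash
# Sketch: gcc PGO build that passes -fno-reorder-blocks-and-partition to
# every compile step, so the pre-BOLT binary has no split functions.
# Hypothetical names: myprogram.c, train.dat. Adjust flags to your project.
set -eu

# Print each command; execute it only when DRY_RUN=0.
run() {
  printf '+ %s\n' "$*"
  if [ "${DRY_RUN:-1}" = "0" ]; then "$@"; fi
}

# Flags shared by the training build and the PGO build.
COMMON_FLAGS=(-O2 -fno-reorder-blocks-and-partition)

pgo_build() {
  local src=$1 train_input=$2
  local name=${src%.c}

  # 1. Training (instrumented) build and one training run.
  run gcc "${COMMON_FLAGS[@]}" -fprofile-generate -c "$src" -o "$name.o"
  run gcc -fprofile-generate "$name.o" -o "$name.train"
  run "./$name.train" "$train_input"

  # 2. PGO build; this is the pre-BOLT binary.
  #    Same object name as in step 1 so gcc finds the matching .gcda file.
  run gcc "${COMMON_FLAGS[@]}" -fprofile-use -c "$src" -o "$name.o"
  #    --emit-relocs keeps relocations in the binary for BOLT (assumption:
  #    taken from the BOLT README, not from this thread).
  run gcc "$name.o" -Wl,--emit-relocs -o "$name"
}
```

For example, `pgo_build myprogram.c train.dat` prints the five commands, and `DRY_RUN=0 pgo_build myprogram.c train.dat` actually runs them.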
Also, should I expect an error if I did this incorrectly, or would I just fail to see any improvement?
If the input binary contains split functions, BOLT issues a warning: “split function detected on input”, and such functions may not be optimized.
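One way to check for this before running BOLT, sketched under the assumption that gcc names the cold part of a split function with a `.cold` suffix (or `.cold.N` on older releases); that naming is gcc behavior as I understand it, not something BOLT documents. The helper name count_cold_symbols is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: count symbols that look like gcc cold partitions.
# Assumption: split functions show up as "name.cold" or "name.cold.N".
set -eu

# Reads `nm` output on stdin and prints the number of matching symbols.
count_cold_symbols() {
  # grep -c exits with status 1 when the count is 0, so ignore the status.
  grep -Ec '\.cold(\.[0-9]+)?$' || true
}
```

Usage would be something like `nm myprogram | count_cold_symbols`. A non-zero count suggests the binary was built without -fno-reorder-blocks-and-partition.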
When the program was run multiple times it could not update prof.fdata. I saw: Error while trying to open profile file for writing: /tmp/prof.fdata
Basically I need to run the same program with various inputs.
You can keep the profiles from separate runs apart by having a PID-based suffix appended to the prof.fdata file name: set -instrumentation-file-append-pid during instrumentation.
The profiles can then be merged into one using BOLT's merge-fdata tool (it should be built as part of the bolt component, or you can just run ninja merge-fdata in a configured llvm build directory).
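Put together, the instrument, run, and merge steps might look roughly like the sketch below. The file names (myprogram, the input files, merged.fdata) are hypothetical, and the optimization flags on the final llvm-bolt call are taken from the BOLT README as an example, so check them against your BOLT version. The script only prints the commands unless DRY_RUN=0 is set.

```shell
#!/usr/bin/env bash
# Sketch: one BOLT profile per training run, merged before optimizing.
# Hypothetical names: myprogram, the input files, merged.fdata.
set -eu

# Print each command; execute it only when DRY_RUN=0.
run() {
  printf '+ %s\n' "$*"
  if [ "${DRY_RUN:-1}" = "0" ]; then "$@"; fi
}

collect_and_merge() {
  local binary=$1
  shift

  # Instrument the pre-BOLT binary. With -instrumentation-file-append-pid
  # every process writes its own file instead of reusing /tmp/prof.fdata.
  run llvm-bolt "$binary" -instrument \
      -instrumentation-file=/tmp/prof.fdata \
      -instrumentation-file-append-pid \
      -o "$binary.instr"

  # One training run per input file.
  local input
  for input in "$@"; do
    run "./$binary.instr" "$input"
  done

  # Merge all per-process profiles; the glob is expanded by the inner shell.
  run sh -c 'merge-fdata /tmp/prof.fdata.* > merged.fdata'

  # Optimize the original binary with the merged profile.
  run llvm-bolt "$binary" -data=merged.fdata -o "$binary.bolt" \
      -reorder-blocks=ext-tsp -reorder-functions=hfsort -split-functions
}
```

For example, `DRY_RUN=0 collect_and_merge myprogram input1.dat input2.dat` would run the whole sequence. Clearing stale /tmp/prof.fdata.* files beforehand avoids merging profiles from an older build.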