This started off as a build time analysis (#130090 (comment)), but since I now have the infrastructure, I tried `-flto=thin`, too:
- builds faster: 520.6 vs 651.2 seconds in total
- is neutral on the pyperformance benchmarks
- would bring us in sync with Linux, because there `CONFIGURE_CFLAGS_NODIST` and `CONFIGURE_LDFLAGS_NOLTO` both use `-flto=thin` when I configure for clang in WSL Ubuntu-24.04.

See also the discussion of why not to use full `-flto` in "Revert to default fullLTO on Clang" (#130048).

Benchmark | clang.pgo.20.1.0-rc2 | clang.pgo.thin.20.1.0-rc2 |
---|---|---|
Geometric mean | (ref) | 1.00x faster |
Detailed pybenchmark results
Benchmark | clang.pgo.20.1.0-rc2 | clang.pgo.thin.20.1.0-rc2 |
---|---|---|
float | 95.0 ms | 89.7 ms: 1.06x faster |
json_loads | 29.8 us | 28.6 us: 1.04x faster |
mdp | 2.86 sec | 2.77 sec: 1.03x faster |
html5lib | 68.3 ms | 66.2 ms: 1.03x faster |
async_tree_none_tg | 330 ms | 320 ms: 1.03x faster |
pyflate | 518 ms | 505 ms: 1.03x faster |
sqlite_synth | 3.21 us | 3.13 us: 1.03x faster |
pidigits | 228 ms | 223 ms: 1.02x faster |
bench_mp_pool | 168 ms | 165 ms: 1.02x faster |
async_tree_eager_io | 742 ms | 727 ms: 1.02x faster |
generators | 34.5 ms | 33.8 ms: 1.02x faster |
comprehensions | 18.3 us | 17.9 us: 1.02x faster |
async_tree_cpu_io_mixed | 641 ms | 629 ms: 1.02x faster |
scimark_sparse_mat_mult | 4.51 ms | 4.43 ms: 1.02x faster |
async_tree_memoization | 425 ms | 417 ms: 1.02x faster |
sympy_expand | 538 ms | 529 ms: 1.02x faster |
unpack_sequence | 57.0 ns | 56.0 ns: 1.02x faster |
regex_dna | 209 ms | 205 ms: 1.02x faster |
async_generators | 465 ms | 458 ms: 1.02x faster |
scimark_sor | 140 ms | 137 ms: 1.02x faster |
sympy_str | 319 ms | 314 ms: 1.02x faster |
async_tree_io_tg | 751 ms | 740 ms: 1.01x faster |
regex_effbot | 3.14 ms | 3.10 ms: 1.01x faster |
async_tree_eager_tg | 272 ms | 268 ms: 1.01x faster |
pickle_dict | 27.3 us | 27.0 us: 1.01x faster |
async_tree_eager_memoization_tg | 363 ms | 359 ms: 1.01x faster |
sympy_integrate | 22.5 ms | 22.2 ms: 1.01x faster |
sympy_sum | 181 ms | 179 ms: 1.01x faster |
2to3 | 390 ms | 386 ms: 1.01x faster |
hexiom | 6.68 ms | 6.61 ms: 1.01x faster |
docutils | 3.03 sec | 3.00 sec: 1.01x faster |
sqlglot_normalize | 121 ms | 120 ms: 1.01x faster |
async_tree_memoization_tg | 392 ms | 389 ms: 1.01x faster |
async_tree_cpu_io_mixed_tg | 614 ms | 609 ms: 1.01x faster |
tomli_loads | 2.20 sec | 2.18 sec: 1.01x faster |
spectral_norm | 102 ms | 101 ms: 1.01x faster |
python_startup_no_site | 34.4 ms | 34.2 ms: 1.01x faster |
genshi_text | 24.6 ms | 24.5 ms: 1.01x faster |
dulwich_log | 119 ms | 118 ms: 1.00x faster |
go | 128 ms | 128 ms: 1.00x faster |
deltablue | 3.62 ms | 3.63 ms: 1.00x slower |
unpickle_pure_python | 247 us | 248 us: 1.00x slower |
xml_etree_generate | 107 ms | 107 ms: 1.01x slower |
django_template | 39.2 ms | 39.4 ms: 1.01x slower |
coroutines | 24.8 ms | 25.0 ms: 1.01x slower |
mako | 13.3 ms | 13.5 ms: 1.01x slower |
unpickle | 15.9 us | 16.1 us: 1.01x slower |
nbody | 119 ms | 121 ms: 1.01x slower |
fannkuch | 465 ms | 472 ms: 1.01x slower |
crypto_pyaes | 81.3 ms | 82.6 ms: 1.02x slower |
json_dumps | 11.5 ms | 11.7 ms: 1.02x slower |
deepcopy | 285 us | 291 us: 1.02x slower |
pprint_safe_repr | 858 ms | 876 ms: 1.02x slower |
xml_etree_iterparse | 136 ms | 139 ms: 1.02x slower |
gc_traversal | 5.03 ms | 5.14 ms: 1.02x slower |
meteor_contest | 115 ms | 117 ms: 1.02x slower |
deepcopy_memo | 33.8 us | 34.7 us: 1.03x slower |
richards_super | 51.1 ms | 52.6 ms: 1.03x slower |
scimark_fft | 327 ms | 337 ms: 1.03x slower |
richards | 44.9 ms | 46.3 ms: 1.03x slower |
pickle_list | 4.83 us | 4.99 us: 1.03x slower |
deepcopy_reduce | 2.93 us | 3.03 us: 1.03x slower |
pprint_pformat | 1.74 sec | 1.80 sec: 1.03x slower |
logging_simple | 10.9 us | 11.4 us: 1.05x slower |
logging_format | 12.1 us | 12.6 us: 1.05x slower |
xml_etree_parse | 197 ms | 208 ms: 1.05x slower |
Geometric mean | (ref) | 1.00x faster |
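
For readers who want to see how a "neutral" geometric mean can come out of rows that swing a few percent either way, here is a minimal sketch. It uses only four rows copied from the table above (the two largest speedups and two of the largest slowdowns), so it illustrates the calculation and does not reproduce the reported figure. The variable names are illustrative, and the use of `statistics.geometric_mean` is an assumption about how such a mean can be computed, not a claim about how the benchmark tooling does it internally.

```python
from statistics import geometric_mean

# (reference, thin LTO) timings copied from four rows of the table above.
# Units cancel within each row, so only the ratio matters.
samples = {
    "float": (95.0, 89.7),
    "json_loads": (29.8, 28.6),
    "logging_format": (12.1, 12.6),
    "xml_etree_parse": (197.0, 208.0),
}

# Ratio below 1.0 means the thin LTO build was faster on that benchmark.
ratios = {name: thin / ref for name, (ref, thin) in samples.items()}
subset_mean = geometric_mean(list(ratios.values()))

for name, ratio in ratios.items():
    print(f"{name:16s} {ratio:.3f}")
print(f"geometric mean of this subset: {subset_mean:.3f}")
```

Speedups and slowdowns of similar size cancel in the geometric mean, which is why the subset lands very close to 1.0 even though individual rows differ by up to about 5%.
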
Build time per PGO step, in seconds:
Step | pgo_clang_20.1.0-rc2 | pgo_clang_thin_20.1.0-rc2 |
---|---|---|
pginstr | 297.2 | 219.3 |
pgo | 70.0 | 69.0 |
kill | 1.2 | 0.5 |
pgupd | 282.8 | 231.7 |
total time | 651.2 | 520.6 |
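
The relative saving can be checked directly from the step timings in the table above. The following is a small sketch with the numbers copied from that table; the dictionary and variable names are illustrative only.

```python
# (full LTO, thin LTO) build time in seconds per step, copied from the table above.
steps = {
    "pginstr": (297.2, 219.3),
    "pgo": (70.0, 69.0),
    "kill": (1.2, 0.5),
    "pgupd": (282.8, 231.7),
}

full_total = sum(full for full, _ in steps.values())
thin_total = sum(thin for _, thin in steps.values())

# The thin column sums to 520.5 here; the table reports 520.6, presumably
# because the per-step values were rounded independently.
saved = full_total - thin_total
reduction = saved / full_total

for name, (full, thin) in steps.items():
    print(f"{name:8s} saved {full - thin:6.1f} s")
print(f"total    saved {saved:6.1f} s ({reduction:.1%} of the full LTO build)")
```

Almost all of the saving comes from the two compile-heavy steps, `pginstr` and `pgupd`; the `pgo` step is essentially unchanged.
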
Details pginstrument
Project | pgo_clang_20.1.0-rc2 | pgo_clang_thin_20.1.0-rc2 |
---|---|---|
_freeze_module | 38.5 | 40.0 |
python314 | 141.5 | 81.3 |
pyexpat | 52.7 | 3.9 |
_elementtree | 51.8 | 5.3 |
sqlite3 | 46.0 | 42.4 |
liblzma | 18.2 | 16.5 |
_decimal | 12.4 | 7.7 |
_testcapi | 8.3 | 7.1 |
_bz2 | 7.0 | 4.9 |
_ctypes | 6.9 | 7.5 |
_testlimitedcapi | 4.9 | 4.3 |
_wmi | 4.5 | 3.0 |
_overlapped | 4.5 | 3.2 |
_asyncio | 4.0 | 5.2 |
_lzma | 3.8 | 1.8 |
_ssl | 3.7 | 5.5 |
_ctypes_test | 3.7 | 3.4 |
_multiprocessing | 3.5 | 2.7 |
_sqlite3 | 3.4 | 2.8 |
venvwlauncher | 3.3 | 2.7 |
_zoneinfo | 3.1 | 3.4 |
unicodedata | 2.7 | 3.0 |
pyshellext | 2.7 | 2.6 |
pyw | 2.7 | 2.7 |
py | 2.6 | 2.5 |
_socket | 2.4 | 3.7 |
_testinternalcapi | 2.4 | 2.2 |
_tkinter | 2.2 | 4.1 |
_testclinic | 2.0 | 1.9 |
_hashlib | 1.8 | 3.1 |
select | 1.8 | 2.2 |
venvlauncher | 1.8 | 1.7 |
winsound | 1.7 | 3.3 |
_uuid | 1.6 | 3.2 |
_queue | 1.6 | 2.3 |
_testembed | 1.5 | 1.5 |
_testbuffer | 1.4 | 1.3 |
pythonw | 1.1 | 1.1 |
_testconsole | 1.1 | 1.1 |
_testmultiphase | 1.0 | 1.0 |
_testsinglephase | 1.0 | 1.0 |
python | 1.0 | 0.9 |
_testclinic_limited | 0.9 | 0.9 |
_testimportmultiple | 0.9 | 0.9 |
python3 | 0.5 | 0.5 |
total | 465.8 | 303.3 |
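
To see where the pginstrument saving comes from, the per-project differences can be ranked. The sketch below uses only the five projects that were slowest under full LTO, plus the two totals, all copied from the table above; it assumes both columns are in the same unit, and the names are illustrative.

```python
# (full LTO, thin LTO) pginstrument timings for the five projects that were
# slowest under full LTO, copied from the table above.
pginstrument = {
    "_freeze_module": (38.5, 40.0),
    "python314": (141.5, 81.3),
    "pyexpat": (52.7, 3.9),
    "_elementtree": (51.8, 5.3),
    "sqlite3": (46.0, 42.4),
}
# Totals from the last row of the table.
total_full, total_thin = 465.8, 303.3

savings = {name: full - thin for name, (full, thin) in pginstrument.items()}
ranked = sorted(savings.items(), key=lambda item: item[1], reverse=True)
top_three = ranked[:3]
share = sum(saved for _, saved in top_three) / (total_full - total_thin)

for name, saved in ranked:
    print(f"{name:15s} {saved:+6.1f}")
print(f"top three account for {share:.0%} of the reduction")
```

On these figures, `python314`, `pyexpat` and `_elementtree` together account for about 96% of the difference between the two totals, while `_freeze_module` is slightly slower with thin LTO.
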
Details pgupdate
Project | pgo_clang_20.1.0-rc2 | pgo_clang_thin_20.1.0-rc2 |
---|---|---|
_freeze_module | 38.0 | 39.5 |
python314 | 141.9 | 95.4 |
sqlite3 | 44.4 | 42.9 |
liblzma | 17.3 | 16.5 |
_decimal | 11.2 | 8.7 |
_testcapi | 8.6 | 7.3 |
_ctypes | 8.0 | 7.2 |
_bz2 | 7.8 | 5.5 |
_ssl | 5.2 | 5.6 |
_testlimitedcapi | 5.0 | 4.2 |
pyexpat | 4.6 | 3.6 |
_asyncio | 4.5 | 4.6 |
_socket | 4.3 | 3.5 |
_tkinter | 4.0 | 4.2 |
_ctypes_test | 3.7 | 3.4 |
_overlapped | 3.5 | 3.7 |
_elementtree | 3.5 | 4.5 |
_wmi | 3.5 | 3.1 |
_zoneinfo | 3.2 | 3.2 |
_lzma | 3.2 | 1.9 |
unicodedata | 3.2 | 3.0 |
_sqlite3 | 3.1 | 2.7 |
_hashlib | 3.1 | 3.3 |
venvwlauncher | 3.1 | 3.0 |
_multiprocessing | 2.8 | 2.6 |
pyshellext | 2.7 | 2.6 |
pyw | 2.6 | 2.6 |
_uuid | 2.6 | 2.8 |
py | 2.6 | 2.7 |
_testinternalcapi | 2.4 | 2.2 |
_testclinic | 2.0 | 1.9 |
_queue | 1.9 | 2.2 |
winsound | 1.8 | 3.0 |
venvlauncher | 1.7 | 1.5 |
select | 1.6 | 2.0 |
_testembed | 1.5 | 1.4 |
_testbuffer | 1.4 | 1.3 |
_testconsole | 1.1 | 1.0 |
pythonw | 1.1 | 1.1 |
_testmultiphase | 1.0 | 1.1 |
_testsinglephase | 1.0 | 1.0 |
python | 1.0 | 0.9 |
_testclinic_limited | 0.9 | 0.9 |
_testimportmultiple | 0.9 | 0.9 |
python3 | 0.5 | 0.5 |
total | 372.9 | 316.8 |