Created attachment 25352 [details] bitcode from rust#89609 xref: https://github.com/rust-lang/rust/issues/89609 After enabling the new pass manager in rustc, one particular crate showed a large increase in compile time, but only when targeting s390x. I attached the bitcode from rustc, and I can reproduce the problem with opt -O1 on main. $ llvm-dis <rustc_ast_lowering-cgu.0.rcgu.thin-lto-after-patch.bc | wc -lL 157863 2306 $ time opt <rustc_ast_lowering-cgu.0.rcgu.thin-lto-after-patch.bc -O1 -S | wc -lL 794414 4515800 real 2m4.580s user 2m4.675s sys 0m0.221s There are two instances of that longest 4.5M line length, and both are huge phi nodes just before invoking a function ending in 17hde8a472161ebd31bE (so you can search for that). The predecessors look like a mass of loopexit.split-lp expansions gone rogue. The same input behaves reasonably with the old PM, both optimizing faster and resulting in a smaller bitcode. $ time opt <rustc_ast_lowering-cgu.0.rcgu.thin-lto-after-patch.bc -O1 -S --enable-new-pm=0 | wc -lL 148220 2306 real 0m2.266s user 0m2.322s sys 0m0.024s If you edit out the "target-cpu"="z10" attributes in the IR, this reproducer can also be forced to -mtriple x86_64, and that runs fine with new PM.
As mentioned in bug 45253 comment 11, https://reviews.llvm.org/D98481 seems to solve the problem by restricting the inliner. $ time opt <rustc_ast_lowering-cgu.0.rcgu.thin-lto-after-patch.bc -O1 -S | wc -lL 216583 2306 real 0m4.597s user 0m4.663s sys 0m0.037s Still slower than old-pm, but it's acceptable.