LLVM Bugzilla is read-only and represents the historical archive of all LLVM issues filled before November 26, 2021. Use github to submit LLVM bugs

Bug 52146 - [SystemZ][NewPM] huge loop expansion with opt -O1
Summary: [SystemZ][NewPM] huge loop expansion with opt -O1
Status: NEW
Alias: None
Product: libraries
Classification: Unclassified
Component: Backend: SystemZ (show other bugs)
Version: trunk
Hardware: PC Linux
: P normal
Assignee: Unassigned LLVM Bugs
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2021-10-11 17:33 PDT by Josh Stone
Modified: 2021-10-13 14:33 PDT (History)
2 users (show)

See Also:
Fixed By Commit(s):


Attachments
bitcode from rust#89609 (944.64 KB, application/gzip)
2021-10-11 17:33 PDT, Josh Stone
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Josh Stone 2021-10-11 17:33:03 PDT
Created attachment 25352 [details]
bitcode from rust#89609

xref: https://github.com/rust-lang/rust/issues/89609

After enabling the new pass manager in rustc, one particular crate showed a large increase in compile time, but only when targeting s390x. I attached the bitcode from rustc, and I can reproduce the problem with opt -O1 on main.

$ llvm-dis <rustc_ast_lowering-cgu.0.rcgu.thin-lto-after-patch.bc | wc -lL
 157863    2306

$ time opt <rustc_ast_lowering-cgu.0.rcgu.thin-lto-after-patch.bc -O1 -S | wc -lL
 794414 4515800

real    2m4.580s
user    2m4.675s
sys     0m0.221s

There are two instances of that longest 4.5M line length, and both are huge phi nodes just before invoking a function ending in 17hde8a472161ebd31bE (so you can search for that). The predecessors look like a mass of loopexit.split-lp expansions gone rogue.

The same input behaves reasonably with the old PM, both optimizing faster and resulting in a smaller bitcode.

$ time opt <rustc_ast_lowering-cgu.0.rcgu.thin-lto-after-patch.bc -O1 -S --enable-new-pm=0 | wc -lL
 148220    2306

real    0m2.266s
user    0m2.322s
sys     0m0.024s


If you edit out the "target-cpu"="z10" attributes in the IR, this reproducer can also be forced to -mtriple x86_64, and that runs fine with new PM.
Comment 1 Josh Stone 2021-10-13 14:33:25 PDT
As mentioned in bug 45253 comment 11, https://reviews.llvm.org/D98481 seems to solve the problem by restricting the inliner.

$ time opt <rustc_ast_lowering-cgu.0.rcgu.thin-lto-after-patch.bc -O1 -S | wc -lL
 216583    2306

real    0m4.597s
user    0m4.663s
sys     0m0.037s

Still slower than old-pm, but it's acceptable.