LLVM Bugzilla is read-only and represents the historical archive of all LLVM issues filled before November 26, 2021. Use github to submit LLVM bugs

Bug 47736 - SystemZ reordered store/compare clobbers CC
Summary: SystemZ reordered store/compare clobbers CC
Status: RESOLVED FIXED
Alias: None
Product: libraries
Classification: Unclassified
Component: Backend: SystemZ (show other bugs)
Version: 11.0
Hardware: PC Linux
: P enhancement
Assignee: Unassigned LLVM Bugs
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2020-10-05 15:12 PDT by Josh Stone
Modified: 2020-10-09 04:10 PDT (History)
4 users (show)

See Also:
Fixed By Commit(s):


Attachments
Reproducer after llvm-extract (604.32 KB, text/plain)
2020-10-06 17:26 PDT, Josh Stone
Details
Reproducer after llvm-reduce (29.70 KB, text/plain)
2020-10-06 17:26 PDT, Josh Stone
Details
reduced testcase (1.03 KB, text/x-matlab)
2020-10-08 04:27 PDT, Jonas Paulsson
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Josh Stone 2020-10-05 15:12:22 PDT
xref: https://bugzilla.redhat.com/show_bug.cgi?id=1883457
xref: https://github.com/rust-lang/rust/issues/77382

In one particular Rust test case, we're seeing code miss a branch that should be taken, and in my limited knowledge of Z, it looks like a clobbered condition.

The IR is too big for bugzilla, and I wasn't able to reduce it, so I uploaded it here:
https://jistone.fedorapeople.org/bz1883457/libcoreinst-bc0e880cd40833af.libcoreinst.354hlw07-cgu.0.rcgu.ll.xz

The function in question is _ZN11libcoreinst8download5tests22test_write_image_limit17hdd3aea2f720151ebE.

In particular, this block:

bb60.i:                                           ; preds = %bb59.i
  call void @llvm.lifetime.end.p0i8(i64 48, i8* nonnull %444), !dbg !182775, !noalias !181924
  call void @llvm.lifetime.end.p0i8(i64 16, i8* nonnull %370), !dbg !182775, !noalias !181924
  %452 = bitcast %"std::option::Option<std::vec::Vec<u8>>"* %signature.i to {}**, !dbg !182787
  store {}* null, {}** %452, align 8, !dbg !182787, !noalias !181924
  call void @llvm.dbg.value(metadata %"std::io::Error"* %err4.i, metadata !6256, metadata !DIExpression()), !dbg !182788
  call void @llvm.dbg.value(metadata %"std::io::Error"* %err4.i, metadata !6247, metadata !DIExpression()), !dbg !182790
  %453 = load i8, i8* %441, align 8, !dbg !182792, !range !20068, !noalias !181924
  %switch.i.i.i251 = icmp ult i8 %453, 2, !dbg !182792
  br i1 %switch.i.i.i251, label %bb61.i, label %bb2.i.i.i252, !dbg !182792

So that's basically a store null, then branch less than 2.

Here's "llc -O0", which looks ok:

.LBB387_319:                            # %bb60.i
	#DEBUG_VALUE: out <- [DW_OP_plus_uconst 4732] [$r15d+0]
	#DEBUG_VALUE: sources:self <- [DW_OP_plus_uconst 9056, DW_OP_stack_value] $r15d
	#DEBUG_VALUE: precious <- [DW_OP_LLVM_fragment 64 64] 11
	#DEBUG_VALUE: offset <- 4194304
	.loc	64 134 17               # src/source.rs:134:17
	lay	%r1, 8192(%r15)
	mvghi	2488(%r1), 0
.Ltmp56831:
	#DEBUG_VALUE: drop_in_place<std::io::error::Error>: <- [DW_OP_plus_uconst 10800, DW_OP_stack_value] $r15d
	#DEBUG_VALUE: drop_in_place<std::io::error::Repr>: <- [DW_OP_plus_uconst 10800, DW_OP_stack_value] $r15d
	.loc	15 175 1                # /rustc/beb5ae474d2835962ebdf7416bd1c9ad864fe101/library/core/src/ptr/mod.rs:175:1
	lg	%r1, 4000(%r15)         # 8-byte Folded Reload
	llc	%r0, 0(%r1)
	chi	%r0, 2
	jl	.LBB387_325
	j	.LBB387_320


Here's "llc -O1":

# %bb.193:                              # %bb60.i
	#DEBUG_VALUE: sources:self <- [DW_OP_plus_uconst 616, DW_OP_stack_value] $r15d
	#DEBUG_VALUE: offset <- 4194304
	#DEBUG_VALUE: precious <- [DW_OP_LLVM_fragment 64 64] 11
	.loc	9 175 1                 # /rustc/beb5ae474d2835962ebdf7416bd1c9ad864fe101/library/core/src/ptr/mod.rs:175:1
	cli	712(%r15), 2
.Ltmp39998:
	.loc	64 134 17               # src/source.rs:134:17
	llilh	%r1, 16
	agr	%r1, %r15
	mvghi	912(%r1), 0
.Ltmp39999:
	#DEBUG_VALUE: drop_in_place<std::io::error::Repr>: <- [DW_OP_plus_uconst 712, DW_OP_stack_value] $r15d
	#DEBUG_VALUE: drop_in_place<std::io::error::Error>: <- [DW_OP_plus_uconst 712, DW_OP_stack_value] $r15d
	.loc	9 175 1                 # /rustc/beb5ae474d2835962ebdf7416bd1c9ad864fe101/library/core/src/ptr/mod.rs:175:1
	jl	.LBB387_201

It looks like CLI will set the CC we want for the jump, but AGR also sets CC, so then JL has the wrong condition.
Comment 1 Josh Stone 2020-10-06 17:26:00 PDT
Created attachment 24026 [details]
Reproducer after llvm-extract
Comment 2 Josh Stone 2020-10-06 17:26:33 PDT
Created attachment 24027 [details]
Reproducer after llvm-reduce
Comment 3 Josh Stone 2020-10-06 17:28:44 PDT
I was trying to reduce with bugpoint before, but I just learned how to use llvm-extract and llvm-reduce instead, with much more success. Those are attached.

(It still left a lot of empty blocks, but I didn't try to clean that up.)
Comment 4 Josh Stone 2020-10-07 09:32:20 PDT
Stepping through IR dumps, here's where it added AGR that redefines $cc:

# *** IR Dump Before Prologue/Epilogue Insertion & Frame Finalization ***:
[...]
bb.2.bb59.i:
; predecessors: %bb.0
  successors: %bb.4(0x40000000), %bb.3(0x40000000); %bb.4(50.00%), %bb.3(50.00%)

  CLI %stack.2.err4.i, 0, 2, implicit-def $cc :: (dereferenceable load 1 from %ir.4, align 8, !noalias !7, !range !11)
  MVGHI %stack.0._42.i, 0, 0 :: (store 8 into %ir.5)
  BRC 14, 4, %bb.4, implicit killed $cc
  J %bb.3

# *** IR Dump After Prologue/Epilogue Insertion & Frame Finalization ***:
[...]
bb.2.bb59.i:
; predecessors: %bb.0
  successors: %bb.4(0x40000000), %bb.3(0x40000000); %bb.4(50.00%), %bb.3(50.00%)

  CLI $r15d, 176, 2, implicit-def $cc :: (dereferenceable load 1 from %ir.4, align 8, !noalias !7, !range !11)
  $r1d = LLILH 16
  $r1d = AGR killed $r1d(tied-def 0), $r15d, implicit-def $cc
  MVGHI killed $r1d, 192, 0 :: (store 8 into %ir.5)
  BRC 14, 4, %bb.4, implicit killed $cc
  J %bb.3
Comment 5 Ulrich Weigand 2020-10-07 09:42:57 PDT
This looks like a bug in SystemZRegisterInfo::eliminateFrameIndex.   This routine  should never introduce a CC clobber, but there is one code path that will emit an AGR:

          // Load the anchor address into a scratch register.
          unsigned LAOpcode = TII->getOpcodeForOffset(SystemZ::LA, HighOffset);
          if (LAOpcode)
            BuildMI(MBB, MI, DL, TII->get(LAOpcode),ScratchReg)
              .addReg(BasePtr).addImm(HighOffset).addReg(0);
          else {
            // Load the high offset into the scratch register and use it as
            // an index.
            TII->loadImmediate(MBB, MI, ScratchReg, HighOffset);
            BuildMI(MBB, MI, DL, TII->get(SystemZ::AGR),ScratchReg)
              .addReg(ScratchReg, RegState::Kill).addReg(BasePtr);
          }

This really ought to use LA in the second branch as well.   This only triggers when the offset is larger than 512K, which is probably rare and explains why we haven't really seen this before.

Jonas, can you have a look?  Should be straightforward to fix.
Comment 6 Jonas Paulsson 2020-10-08 04:27:56 PDT
Created attachment 24033 [details]
reduced testcase

Managed to reduce the test case a bit further...

Patch proposed at https://reviews.llvm.org/D89034.
Comment 7 Jonas Paulsson 2020-10-09 04:10:32 PDT
d851495