Created attachment 19463 [details]
clang command line

I am building the Linux kernel for PowerPC, cross compiling from x86.

clang version 5.0.0-4 (tags/RELEASE_500/final)

The bug reproduces with clang version 6.0.0 (trunk 318937).

CC [M] drivers/net/ethernet/mellanox/mlx4/resource_tracker.o
fatal error: error in backend: Cannot select: 0x55ce75ede940: ch = PPCISD::STBRX<ST6[%177](align=1)> 0x55ce75f74b28, 0x55ce75edf090, 0x55ce75ed94c8, ValueType:ch:i48
  0x55ce75edf090: i32 = truncate 0x55ce75e9b398
    0x55ce75e9b398: i64 = srl 0x55ce75ed8ca8, Constant:i32<16>
      0x55ce75ed8ca8: i64,ch = CopyFromReg 0x55ce744f44e8, Register:i64 %vreg26
        0x55ce75f9bd08: i64 = Register %vreg26
      0x55ce75f74238: i32 = Constant<16>
  0x55ce75ed94c8: i64 = add 0x55ce75edeb48, Constant:i64<10>
    0x55ce75edeb48: i64,ch = CopyFromReg 0x55ce744f44e8, Register:i64 %vreg20
      0x55ce75f74f38: i64 = Register %vreg20
    0x55ce75e9b330: i64 = Constant<10>
In function: mlx4_QP_FLOW_STEERING_ATTACH_wrapper
clang: error: clang frontend command failed with exit code 70 (use -v to see invocation)
clang version 5.0.0-4 (tags/RELEASE_500/final)
Target: powerpc64le--linux-gnu
Thread model: posix
Created attachment 19464 [details] compressed preprocessed source
This looks very similar to the bug we hit in https://bugzilla.redhat.com/show_bug.cgi?id=1554349
It looks like the regression was introduced by the following commit (r296811):

commit f2f076498022ec1ab5abadb7349f62b1dbc2ac0d
Author: Guozhi Wei <carrot@google.com>
Date:   Thu Mar 2 21:07:59 2017 +0000

    [PPC] Fix code generation for bswap(int32) followed by store16

    This patch fixes pr32063.

    Current code in PPCTargetLowering::PerformDAGCombine can transform bswap store
    into a single PPCISD::STBRX instruction. but it doesn't consider the case that
    the operand size of bswap may be larger than store size. When it occurs, we
    need 2 modifications,

    1 For the last operand of PPCISD::STBRX, we should not use
      DAG.getValueType(N->getOperand(1).getValueType()), instead we should use
      cast<StoreSDNode>(N)->getMemoryVT().

    2 Before PPCISD::STBRX, we need to shift the original operand of bswap to the
      right side.

    Differential Revision: https://reviews.llvm.org/D30362
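To illustrate why the commit message asks for a right shift, here is a small standalone sketch of the arithmetic for the bswap(int32) followed by store16 case. It does not use any LLVM API; the helper names (bswap32, bswap16, bswap32_then_trunc16, byte_reversed_store16) are hypothetical and only model the values involved, assuming a 16-bit byte-reversed store swaps the low halfword of its operand.

```cpp
#include <cstdint>

// Portable byte swaps (hypothetical helpers, not LLVM code).
inline uint32_t bswap32(uint32_t x) {
    return (x >> 24) | ((x >> 8) & 0x0000FF00u) |
           ((x << 8) & 0x00FF0000u) | (x << 24);
}

inline uint16_t bswap16(uint16_t x) {
    return static_cast<uint16_t>((x >> 8) | (x << 8));
}

// Reference semantics: byte-swap the 32-bit value, then keep only the
// low 16 bits (a truncating 16-bit store of the bswap result).
inline uint16_t bswap32_then_trunc16(uint32_t x) {
    return static_cast<uint16_t>(bswap32(x));
}

// Model of a 16-bit byte-reversed store: it swaps the low halfword of
// whatever operand it is given.
inline uint16_t byte_reversed_store16(uint32_t operand) {
    return bswap16(static_cast<uint16_t>(operand));
}
```

With x = 0x11223344, the reference result is 0x2211. Feeding x directly to the byte-reversed store yields 0x4433, while feeding x >> 16 yields 0x2211, which is why the operand has to be shifted right by the difference between the bswap width and the store width.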
Joel, as a quick workaround you could try to build with Clang 4.0 or with trunk and r296811 reverted.
Adding Guozhi on CC.
(gdb) p DAG.dump()
SelectionDAG has 20 nodes:
  t0: ch = EntryToken
  t10: i64,ch = CopyFromReg t0, Register:i64 %6
  t29: i32,ch = load<LD2[@my_data](tbaa=<0x555563a1de68>)(dereferenceable), anyext from i16> t0, GlobalAddress:i64<i16* @my_data> 0, undef:i64
  t20: ch = CopyToReg t0, Register:i32 %15, Constant:i32<0>
  t2: i64,ch = CopyFromReg t0, Register:i64 %4
  t3: i64 = bswap t2
  t15: i64 = add nuw t10, Constant:i64<10>
  t32: ch = store<ST6[%be_mac.0.arraydecay.sroa_cast](align=1), trunc to i48> t0, t3, t15, undef:i64
  t12: i64 = add nuw t10, Constant:i64<2>
  t28: ch = PPCISD::STBRX<ST2[%id](align=1)(tbaa=<0x555563cd7ad8>)> t0, t29, t12, ValueType:ch:i16
  t31: ch = TokenFactor t20, t32, t29:1, t28
(gdb) p mVT
$43 = {V = {SimpleTy = llvm::MVT::INVALID_SIMPLE_VALUE_TYPE}, LLVMTy = 0x555563e59c60}
(gdb) p Op1VT
$44 = {V = {SimpleTy = llvm::MVT::i64}, LLVMTy = 0x0}

The problem is

  t3: i64 = bswap t2
  t32: ch = store<ST6[%be_mac.0.arraydecay.sroa_cast](align=1), trunc to i48> t0, t3, t15, undef:i64

Note that the memory type (mVT) is i48. STBRX doesn't support this type, and neither do the native store instructions. A plain store is expanded automatically, but STBRX is not, and that causes the crash.

Without the patch D30362, LLVM doesn't crash but still generates wrong code: it writes 8 bytes of memory instead of 6, so if the 2 bytes that follow hold useful data, they are destroyed.

The correct fix should detect whether mVT.isExtended() and, if so, return early.
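The "8 bytes instead of 6" miscompile can be modelled outside the compiler. The sketch below is not LLVM code; Buffer, store_bswap_trunc48 and store_byte_reversed64 are hypothetical names, and memory is simulated as an 8-byte array on the assumption of a little-endian target such as powerpc64le. It compares the intended truncating store of bswap(x) to i48 with a full 8-byte byte-reversed store.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

using Buffer = std::array<uint8_t, 8>;

// Intended semantics of "store (bswap x), trunc to i48" on a little-endian
// target: the low 6 bytes of bswap(x) are written, lowest byte first.
// Byte i of bswap(x) is byte (7 - i) of x. Bytes 6 and 7 are left alone.
inline Buffer store_bswap_trunc48(uint64_t x, Buffer mem) {
    for (std::size_t i = 0; i < 6; ++i)
        mem[i] = static_cast<uint8_t>(x >> (8 * (7 - i)));
    return mem;
}

// Model of a full 8-byte byte-reversed store of x: the first 6 bytes match
// the intended result, but bytes 6 and 7 are overwritten as well.
inline Buffer store_byte_reversed64(uint64_t x, Buffer mem) {
    for (std::size_t i = 0; i < 8; ++i)
        mem[i] = static_cast<uint8_t>(x >> (8 * (7 - i)));
    return mem;
}
```

Starting from a buffer whose last two bytes are 0xAA and 0xBB and storing x = 0x0011223344556677, the 6-byte version keeps 0xAA and 0xBB, while the 8-byte version replaces them with 0x66 and 0x77. That is the data loss described above.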
Fixed by r327651.