128-bit division generates calls to __udivti3 and __umodti3 instead of a single call to __udivmodti4. This happens because of the DivRemPairs pass and a lack of libcall support in the backend.

; Unsigned 128-bit division
define i128 @udiv128(i128 %a, i128 %b) {
  %quot = udiv i128 %a, %b
  %rem = urem i128 %a, %b
  %sum = add i128 %quot, %rem
  ret i128 %sum
}

=> https://gcc.godbolt.org/z/PorhMz

This calls __udivti3 (and __umodti3) on LP64 targets, but both libgcc and compiler-rt provide __udivmodti4, which computes the quotient and the remainder at the same time. This particularly hurts x86, where the divq instruction is available, but other backends could benefit from this too.
The DivRemPairs pass turns the IR into this:

define i128 @udiv128(i128 %a, i128 %b) {
  %a.frozen = freeze i128 %a
  %b.frozen = freeze i128 %b
  %quot = udiv i128 %a.frozen, %b.frozen
  %1 = mul i128 %quot, %b.frozen
  %rem.decomposed = sub i128 %a.frozen, %1
  %sum = add i128 %rem.decomposed, %quot
  ret i128 %sum
}

That's based on the TTI hook:

bool X86TTIImpl::hasDivRemOp(Type *DataType, bool IsSigned)

...returning false for the 128-bit type. But even if I hack that to return true, I still see separate calls:

callq ___divti3
callq ___modti3

Where in the optimization pipeline do we recognize that the target supports __udivmodti4 and convert to that call?
I believe we currently don't recognize __udivmodti4 anywhere; in RuntimeLibcalls.def none of the divrem slots are given a function name:

HANDLE_LIBCALL(SDIVREM_I8, nullptr)
HANDLE_LIBCALL(SDIVREM_I16, nullptr)
HANDLE_LIBCALL(SDIVREM_I32, nullptr)
HANDLE_LIBCALL(SDIVREM_I64, nullptr)
HANDLE_LIBCALL(SDIVREM_I128, nullptr)
HANDLE_LIBCALL(UDIVREM_I8, nullptr)
HANDLE_LIBCALL(UDIVREM_I16, nullptr)
HANDLE_LIBCALL(UDIVREM_I32, nullptr)
HANDLE_LIBCALL(UDIVREM_I64, nullptr)
HANDLE_LIBCALL(UDIVREM_I128, nullptr)

__udivmodti4 should be present on every LP64 platform, I believe.
Also, div/rem pairs are combined in DAGCombiner::useDivRem(SDNode *Node), but the check

if (!TLI.isTypeLegal(VT) && !TLI.isOperationCustom(DivRemOpc, VT))
  return SDValue();

bails out for 128-bit integers, since i128 is not a legal type and the DIVREM operation is not marked Custom for it.