LLVM Bugzilla is read-only and represents the historical archive of all LLVM issues filed before November 26, 2021. Use GitHub to submit LLVM bugs.

Bug 47006 - 128 bit division generates __udivti3 and __umodti3 instead of calling __udivmodti4 once
Status: NEW
Alias: None
Product: libraries
Classification: Unclassified
Component: Backend: X86
Version: trunk
Hardware: PC Linux
Importance: P enhancement
Assignee: Unassigned LLVM Bugs
Reported: 2020-08-05 13:58 PDT by Danila Kutenin
Modified: 2020-09-12 01:09 PDT
CC List: 5 users

Description Danila Kutenin 2020-08-05 13:58:40 PDT
128-bit division generates __udivti3 and __umodti3 instead of calling __udivmodti4 once.

This happens because of the DivRemPairs pass and missing libcall support in the backend.

; Unsigned 128-bit division
define i128 @udiv128(i128 %a, i128 %b) {
  %quot = udiv i128 %a, %b
  %rem = urem i128 %a, %b
  %sum = add i128 %quot, %rem
  ret i128 %sum
}

=>

https://gcc.godbolt.org/z/PorhMz

On LP64 targets this calls __udivti3, but libgcc and compiler-rt provide __udivmodti4, which computes the quotient and the remainder in a single call. This particularly hurts x86, where the divq instruction is available. Other backends could benefit from this too.
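To make the request concrete, here is a minimal C sketch of the single-call interface. compiler-rt declares the routine as `tu_int __udivmodti4(tu_int a, tu_int b, tu_int *rem)`: it returns the quotient and writes the remainder through the pointer. The helper names below (`udivmod128_sketch`, `udiv128_sum`) are hypothetical, and the body is plain bit-by-bit long division, assumed for illustration only; it is not the algorithm libgcc or compiler-rt actually use.

```c
#include <assert.h>

/* unsigned __int128 is a GCC/Clang extension available on LP64 targets. */
typedef unsigned __int128 u128;

/* Hypothetical helper with the same contract as __udivmodti4:
 * returns a / b and stores a % b through *rem (if rem is non-null).
 * Restoring shift-subtract division, one quotient bit per step. */
static u128 udivmod128_sketch(u128 a, u128 b, u128 *rem) {
    assert(b != 0);
    u128 q = 0, r = 0;
    for (int i = 127; i >= 0; --i) {
        /* Remember the bit shifted out of r: if set, the true partial
         * remainder is >= 2^128 and therefore certainly >= b. */
        int carry = (int)(r >> 127);
        r = (r << 1) | ((a >> i) & 1);   /* bring down the next bit of a */
        if (carry || r >= b) {
            r -= b;                      /* modular subtraction stays exact */
            q |= (u128)1 << i;
        }
    }
    if (rem)
        *rem = r;
    return q;
}

/* C analogue of @udiv128 from the report: both results from ONE call. */
static u128 udiv128_sum(u128 a, u128 b) {
    u128 rem;
    u128 quot = udivmod128_sketch(a, b, &rem);
    return quot + rem;
}
```

A backend that knew about __udivmodti4 could lower the udiv/urem pair in @udiv128 to one call shaped like `udiv128_sum` above, instead of __udivti3 plus extra arithmetic.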
Comment 1 Sanjay Patel 2020-08-06 05:51:56 PDT
The DivRemPairs pass turns the IR into this:

define i128 @udiv128(i128 %a, i128 %b) {
  %a.frozen = freeze i128 %a
  %b.frozen = freeze i128 %b
  %quot = udiv i128 %a.frozen, %b.frozen
  %1 = mul i128 %quot, %b.frozen
  %rem.decomposed = sub i128 %a.frozen, %1
  %sum = add i128 %rem.decomposed, %quot
  ret i128 %sum
}


That's based on the TTI call:
bool X86TTIImpl::hasDivRemOp(Type *DataType, bool IsSigned)
...returning false for the 128-bit type.
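The rewrite relies on the identity a % b == a - (a / b) * b, which holds exactly for unsigned integers because (a / b) * b never exceeds a. A small C sketch of the same computation follows; the name `rem_decomposed` is hypothetical, and the `freeze` instructions have no C counterpart, so they are omitted.

```c
#include <assert.h>

typedef unsigned __int128 u128;  /* GCC/Clang extension on LP64 targets */

/* Mirrors the DivRemPairs output: a single division, then the
 * remainder rebuilt from the quotient with a multiply and a subtract. */
static u128 rem_decomposed(u128 a, u128 b, u128 *quot_out) {
    assert(b != 0);             /* division by zero is undefined */
    u128 quot = a / b;          /* %quot           */
    u128 prod = quot * b;       /* %1 = mul        */
    if (quot_out)
        *quot_out = quot;
    return a - prod;            /* %rem.decomposed */
}
```

Only the division needs a libcall here, which matches the single __udivti3 call described in the report.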

But even if I hack that to return 'true', I see calls:
	callq	___divti3
	callq	___modti3

Where in optimization do we recognize that the target supports "__udivmodti4" and convert to that call?
Comment 2 Danila Kutenin 2020-08-06 05:58:20 PDT
I believe we currently don't recognize __udivmodti4 anywhere; in RuntimeLibcalls.def none of the DIVREM libcalls has a name:

HANDLE_LIBCALL(SDIVREM_I8, nullptr)
HANDLE_LIBCALL(SDIVREM_I16, nullptr)
HANDLE_LIBCALL(SDIVREM_I32, nullptr)
HANDLE_LIBCALL(SDIVREM_I64, nullptr)
HANDLE_LIBCALL(SDIVREM_I128, nullptr)
HANDLE_LIBCALL(UDIVREM_I8, nullptr)
HANDLE_LIBCALL(UDIVREM_I16, nullptr)
HANDLE_LIBCALL(UDIVREM_I32, nullptr)
HANDLE_LIBCALL(UDIVREM_I64, nullptr)
HANDLE_LIBCALL(UDIVREM_I128, nullptr)


__udivmodti4 should be present on every LP64 platform, I believe.
Comment 3 Danila Kutenin 2020-08-06 06:05:39 PDT
Also, div/rem pairs are combined in DAGCombiner::useDivRem(SDNode *Node), but

  if (!TLI.isTypeLegal(VT) && !TLI.isOperationCustom(DivRemOpc, VT))
    return SDValue();

bails out for 128-bit integers, since i128 is not a legal type on x86-64 and the DIVREM opcode is not marked Custom for it.