128-bit division generates calls to __udivti3 and __umodti3 instead of a single call to __udivmodti4. This happens because of the DivRemPairs pass and a lack of libcall support in the backend.

; Unsigned 128-bit division
define i128 @udiv128(i128 %a, i128 %b) {
  %quot = udiv i128 %a, %b
  %rem = urem i128 %a, %b
  %sum = add i128 %quot, %rem
  ret i128 %sum
}

=> https://gcc.godbolt.org/z/PorhMz

This calls __udivti3 (and __umodti3) on LP64 targets, but both libgcc and compiler-rt provide __udivmodti4, which computes the quotient and the remainder at the same time. This particularly hurts x86, where the divq instruction is available, but other backends could benefit from this too.
The DivRemPairs pass turns the IR into this:

define i128 @udiv128(i128 %a, i128 %b) {
  %a.frozen = freeze i128 %a
  %b.frozen = freeze i128 %b
  %quot = udiv i128 %a.frozen, %b.frozen
  %1 = mul i128 %quot, %b.frozen
  %rem.decomposed = sub i128 %a.frozen, %1
  %sum = add i128 %rem.decomposed, %quot
  ret i128 %sum
}

That's based on the TTI hook:

bool X86TTIImpl::hasDivRemOp(Type *DataType, bool IsSigned)

...returning false for the 128-bit type. But even if I hack that to return true, I still see separate calls:

callq ___divti3
callq ___modti3

Where in the optimization pipeline do we recognize that the target supports __udivmodti4 and convert to that call?
I believe we currently don't recognize __udivmodti4 anywhere; in RuntimeLibcalls.def none of the divrem slots are given a function name:

HANDLE_LIBCALL(SDIVREM_I8, nullptr)
HANDLE_LIBCALL(SDIVREM_I16, nullptr)
HANDLE_LIBCALL(SDIVREM_I32, nullptr)
HANDLE_LIBCALL(SDIVREM_I64, nullptr)
HANDLE_LIBCALL(SDIVREM_I128, nullptr)
HANDLE_LIBCALL(UDIVREM_I8, nullptr)
HANDLE_LIBCALL(UDIVREM_I16, nullptr)
HANDLE_LIBCALL(UDIVREM_I32, nullptr)
HANDLE_LIBCALL(UDIVREM_I64, nullptr)
HANDLE_LIBCALL(UDIVREM_I128, nullptr)

__udivmodti4 should be present on every LP64 platform, I believe.
Also, div/rem pairs are combined in DAGCombiner::useDivRem(SDNode *Node), but the check

if (!TLI.isTypeLegal(VT) && !TLI.isOperationCustom(DivRemOpc, VT))
  return SDValue();

bails out for 128-bit integers, since i128 is not a legal type and the DIVREM operation is not marked Custom for it.