38527 – backend widens v2f32 vectors only to do more work in software operations later for `sin`, `cos` etc.

LLVM Bugzilla is read-only and represents the historical archive of all LLVM issues filled before November 26, 2021. Use github to submit LLVM bugs

Bug 38527 - backend widens v2f32 vectors only to do more work in software operations later for `sin`, `cos` etc.

Summary: backend widens v2f32 vectors only to do more work in software operations late...

Status:	RESOLVED FIXED

Alias:	None

Product:	libraries
Classification:	Unclassified
Component:	Common Code Generator Code (show other bugs)
Version:	trunk
Hardware:	PC Linux

Importance:	P enhancement
Assignee:	Unassigned LLVM Bugs

URL:
Keywords:

Depends on:
Blocks:

Reported:	2018-08-11 02:07 PDT by simonas+llvm.org
Modified:	2018-08-22 16:04 PDT (History)
CC List:	7 users (show)

See Also:
Fixed By Commit(s):	r340469

Attachments
Simple fsin call on v2f32 (1.55 KB, text/plain) 2018-08-11 02:07 PDT, simonas+llvm.org	Details
Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description simonas+llvm.org 2018-08-11 02:07:41 PDT

Created attachment 20678 [details]
Simple fsin call on v2f32

Given the attached LLVM IR, running llc on it as such

> llc winsimd.ll -O3 -filetype=asm -o -

Will result in following assembly being generated


	movq	%rdi, %rbx
	movsd	(%rdi), %xmm0           # xmm0 = mem[0],zero
	movaps	%xmm0, (%rsp)           # 16-byte Spill
	callq	sinf
	movaps	%xmm0, 16(%rsp)         # 16-byte Spill
	movaps	(%rsp), %xmm0           # 16-byte Reload
	callq	sinf
	movaps	16(%rsp), %xmm1         # 16-byte Reload
	unpcklps	%xmm0, %xmm1    # xmm1 = xmm1[0],xmm0[0],xmm1[1],xmm0[1]
	movaps	%xmm1, 16(%rsp)         # 16-byte Spill
	movaps	(%rsp), %xmm0           # 16-byte Reload
	callq	sinf
	movaps	%xmm0, 32(%rsp)         # 16-byte Spill
	pshufd	$229, (%rsp), %xmm0     # 16-byte Folded Reload
                                        # xmm0 = mem[1,1,2,3]
	callq	sinf
	movdqa	32(%rsp), %xmm1         # 16-byte Reload
	punpckldq	%xmm0, %xmm1    # xmm1 = xmm1[0],xmm0[0],xmm1[1],xmm0[1]
        <snip>

Note how this generates 4 distinct calls to `sinf`, although 2 should be sufficient. This happens, because the backend legalises the operation on v2f32 to v4f32, as indicated by `-debug` output from `llc`. However, the backend then neglects to take care to not calculate `sinf` on the `undef` components of the vector, resulting in extra work.

Comment 1 simonas+llvm.org 2018-08-11 02:08:16 PDT

Same happens for cos, and possibly other operations.

Comment 2 simonas+llvm.org 2018-08-11 02:13:51 PDT

This bug is present as far back as 5.0.

Comment 3 Sanjay Patel 2018-08-12 15:04:57 PDT

Minimal IR test:
declare <2 x float> @llvm.sin.v2f32(<2 x float>)
  
define <2 x float> @sinv2f32(<2 x float> %x) nounwind {
  %r = call <2 x float> @llvm.sin.v2f32(<2 x float> %x)
  ret <2 x float> %r
}


cc'ing Craig. I don't know what legalization strategy should be used here, but presumably the problem is the same for all of the FP nodes corresponding to libcalls when the vector type is too narrow.

We're currently using TargetLowering::TypeWidenVector to go from v2f32 to v4f32, but after that's done, it looks like we've lost the undef knowledge about the unused elements. 

Should x86 be declaring FP libcall nodes with narrower vector types as custom to allow scalarization instead of widening via DAGTypeLegalizer::CustomWidenLowerNode()?

Comment 4 Craig Topper 2018-08-14 15:12:47 PDT

The sin/cos library functions set errno so I wonder if we should be legalizing these like trapping opcodes?

Comment 5 Sanjay Patel 2018-08-15 09:40:08 PDT

(In reply to Craig Topper from comment #4)
> The sin/cos library functions set errno so I wonder if we should be
> legalizing these like trapping opcodes?

That's the right intent, but I don't think that's a valid motivation. The libcalls may or may not set errno depending on platform.

Something strange in the existing code - why are we treating the regular FP binops (ISD::FADD, etc.) as potentially trapping?

I added tests for the problem here:
https://reviews.llvm.org/rL339790

Comment 6 Sanjay Patel 2018-08-15 10:55:24 PDT

Possible solution:
https://reviews.llvm.org/D50791

Comment 7 Sanjay Patel 2018-08-22 16:04:20 PDT

Changing bug title and component - the problem wasn't x86 specific.
Hopefully fixed with:
https://reviews.llvm.org/rL340469