LLVM Bugzilla is read-only and represents the historical archive of all LLVM issues filled before November 26, 2021. Use github to submit LLVM bugs

Bug 38527 - backend widens v2f32 vectors only to do more work in software operations later for `sin`, `cos` etc.
Summary: backend widens v2f32 vectors only to do more work in software operations late...
Status: RESOLVED FIXED
Alias: None
Product: libraries
Classification: Unclassified
Component: Common Code Generator Code (show other bugs)
Version: trunk
Hardware: PC Linux
: P enhancement
Assignee: Unassigned LLVM Bugs
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2018-08-11 02:07 PDT by simonas+llvm.org
Modified: 2018-08-22 16:04 PDT (History)
7 users (show)

See Also:
Fixed By Commit(s): r340469


Attachments
Simple fsin call on v2f32 (1.55 KB, text/plain)
2018-08-11 02:07 PDT, simonas+llvm.org
Details

Note You need to log in before you can comment on or make changes to this bug.
Description simonas+llvm.org 2018-08-11 02:07:41 PDT
Created attachment 20678 [details]
Simple fsin call on v2f32

Given the attached LLVM IR, running llc on it as such

> llc winsimd.ll -O3 -filetype=asm -o -

Will result in following assembly being generated


	movq	%rdi, %rbx
	movsd	(%rdi), %xmm0           # xmm0 = mem[0],zero
	movaps	%xmm0, (%rsp)           # 16-byte Spill
	callq	sinf
	movaps	%xmm0, 16(%rsp)         # 16-byte Spill
	movaps	(%rsp), %xmm0           # 16-byte Reload
	callq	sinf
	movaps	16(%rsp), %xmm1         # 16-byte Reload
	unpcklps	%xmm0, %xmm1    # xmm1 = xmm1[0],xmm0[0],xmm1[1],xmm0[1]
	movaps	%xmm1, 16(%rsp)         # 16-byte Spill
	movaps	(%rsp), %xmm0           # 16-byte Reload
	callq	sinf
	movaps	%xmm0, 32(%rsp)         # 16-byte Spill
	pshufd	$229, (%rsp), %xmm0     # 16-byte Folded Reload
                                        # xmm0 = mem[1,1,2,3]
	callq	sinf
	movdqa	32(%rsp), %xmm1         # 16-byte Reload
	punpckldq	%xmm0, %xmm1    # xmm1 = xmm1[0],xmm0[0],xmm1[1],xmm0[1]
        <snip>

Note how this generates 4 distinct calls to `sinf`, although 2 should be sufficient. This happens, because the backend legalises the operation on v2f32 to v4f32, as indicated by `-debug` output from `llc`. However, the backend then neglects to take care to not calculate `sinf` on the `undef` components of the vector, resulting in extra work.
Comment 1 simonas+llvm.org 2018-08-11 02:08:16 PDT
Same happens for cos, and possibly other operations.
Comment 2 simonas+llvm.org 2018-08-11 02:13:51 PDT
This bug is present as far back as 5.0.
Comment 3 Sanjay Patel 2018-08-12 15:04:57 PDT
Minimal IR test:
declare <2 x float> @llvm.sin.v2f32(<2 x float>)
  
define <2 x float> @sinv2f32(<2 x float> %x) nounwind {
  %r = call <2 x float> @llvm.sin.v2f32(<2 x float> %x)
  ret <2 x float> %r
}


cc'ing Craig. I don't know what legalization strategy should be used here, but presumably the problem is the same for all of the FP nodes corresponding to libcalls when the vector type is too narrow.

We're currently using TargetLowering::TypeWidenVector to go from v2f32 to v4f32, but after that's done, it looks like we've lost the undef knowledge about the unused elements. 

Should x86 be declaring FP libcall nodes with narrower vector types as custom to allow scalarization instead of widening via DAGTypeLegalizer::CustomWidenLowerNode()?
Comment 4 Craig Topper 2018-08-14 15:12:47 PDT
The sin/cos library functions set errno so I wonder if we should be legalizing these like trapping opcodes?
Comment 5 Sanjay Patel 2018-08-15 09:40:08 PDT
(In reply to Craig Topper from comment #4)
> The sin/cos library functions set errno so I wonder if we should be
> legalizing these like trapping opcodes?

That's the right intent, but I don't think that's a valid motivation. The libcalls may or may not set errno depending on platform.

Something strange in the existing code - why are we treating the regular FP binops (ISD::FADD, etc.) as potentially trapping?

I added tests for the problem here:
https://reviews.llvm.org/rL339790
Comment 6 Sanjay Patel 2018-08-15 10:55:24 PDT
Possible solution:
https://reviews.llvm.org/D50791
Comment 7 Sanjay Patel 2018-08-22 16:04:20 PDT
Changing bug title and component - the problem wasn't x86 specific.
Hopefully fixed with:
https://reviews.llvm.org/rL340469