Created attachment 20678 [details] Simple fsin call on v2f32 Given the attached LLVM IR, running llc on it as such > llc winsimd.ll -O3 -filetype=asm -o - Will result in following assembly being generated movq %rdi, %rbx movsd (%rdi), %xmm0 # xmm0 = mem[0],zero movaps %xmm0, (%rsp) # 16-byte Spill callq sinf movaps %xmm0, 16(%rsp) # 16-byte Spill movaps (%rsp), %xmm0 # 16-byte Reload callq sinf movaps 16(%rsp), %xmm1 # 16-byte Reload unpcklps %xmm0, %xmm1 # xmm1 = xmm1[0],xmm0[0],xmm1[1],xmm0[1] movaps %xmm1, 16(%rsp) # 16-byte Spill movaps (%rsp), %xmm0 # 16-byte Reload callq sinf movaps %xmm0, 32(%rsp) # 16-byte Spill pshufd $229, (%rsp), %xmm0 # 16-byte Folded Reload # xmm0 = mem[1,1,2,3] callq sinf movdqa 32(%rsp), %xmm1 # 16-byte Reload punpckldq %xmm0, %xmm1 # xmm1 = xmm1[0],xmm0[0],xmm1[1],xmm0[1] <snip> Note how this generates 4 distinct calls to `sinf`, although 2 should be sufficient. This happens, because the backend legalises the operation on v2f32 to v4f32, as indicated by `-debug` output from `llc`. However, the backend then neglects to take care to not calculate `sinf` on the `undef` components of the vector, resulting in extra work.
Same happens for cos, and possibly other operations.
This bug is present as far back as 5.0.
Minimal IR test: declare <2 x float> @llvm.sin.v2f32(<2 x float>) define <2 x float> @sinv2f32(<2 x float> %x) nounwind { %r = call <2 x float> @llvm.sin.v2f32(<2 x float> %x) ret <2 x float> %r } cc'ing Craig. I don't know what legalization strategy should be used here, but presumably the problem is the same for all of the FP nodes corresponding to libcalls when the vector type is too narrow. We're currently using TargetLowering::TypeWidenVector to go from v2f32 to v4f32, but after that's done, it looks like we've lost the undef knowledge about the unused elements. Should x86 be declaring FP libcall nodes with narrower vector types as custom to allow scalarization instead of widening via DAGTypeLegalizer::CustomWidenLowerNode()?
The sin/cos library functions set errno so I wonder if we should be legalizing these like trapping opcodes?
(In reply to Craig Topper from comment #4) > The sin/cos library functions set errno so I wonder if we should be > legalizing these like trapping opcodes? That's the right intent, but I don't think that's a valid motivation. The libcalls may or may not set errno depending on platform. Something strange in the existing code - why are we treating the regular FP binops (ISD::FADD, etc.) as potentially trapping? I added tests for the problem here: https://reviews.llvm.org/rL339790
Possible solution: https://reviews.llvm.org/D50791
Changing bug title and component - the problem wasn't x86 specific. Hopefully fixed with: https://reviews.llvm.org/rL340469