Avx #90

gwenn · 2017-10-01T15:09:23Z

I had trouble implementing _mm256_dp_ps, _mm_cmp_pd, _mm_cmp_sd (I will try again).
I'm submitting this PR now so that I don't block you from implementing other AVX intrinsics, but I will probably keep committing to this branch until it is merged.

With assert_instr: too many instructions in the disassembly

gwenn · 2017-10-03T15:54:34Z

I don't know why, but _mm256_zeroupper and _mm256_zeroall fail with "too many instructions in the disassembly" on the Windows platform:
https://ci.appveyor.com/project/rust-lang-libs/stdsimd/build/1.0.210
What should I do?

alexcrichton · 2017-10-05T16:20:42Z

Thanks for the PR! The error is indeed curious here. I wonder if the ABI for calling functions with respect to SIMD registers is slightly different on Windows? In any case, it looks like the right instructions are being generated, so do you want to raise the instruction limit to maybe 30 instructions?

alexcrichton · 2017-10-05T16:20:56Z

src/x86/avx.rs

+/// lanes using the control in `imm8`.
+#[inline(always)]
+#[target_feature = "+avx"]
+//#[cfg_attr(test, assert_instr(vshufpd, imm8 = 0x0))] // FIXME


Should this be enabled now?

The documentation says that vshufpd should be generated, but clang does not assert that it is. Maybe the expanded code is wrong?

match (imm8 >> 0) & 1 {
    0 => match (imm8 >> 1) & 1 {
        0 => match (imm8 >> 2) & 1 {
            0 => match (imm8 >> 3) & 1 {
                0 => simd_shuffle4(a, b, [0, 4, 2, 6]),
                _ => simd_shuffle4(a, b, [0, 4, 2, 7]),
            },
            _ => match (imm8 >> 3) & 1 {
                0 => simd_shuffle4(a, b, [0, 4, 3, 6]),
                _ => simd_shuffle4(a, b, [0, 4, 3, 7]),
            },
        },
        _ => match (imm8 >> 2) & 1 {
            0 => match (imm8 >> 3) & 1 {
                0 => simd_shuffle4(a, b, [0, 5, 2, 6]),
                _ => simd_shuffle4(a, b, [0, 5, 2, 7]),
            },
            _ => match (imm8 >> 3) & 1 {
                0 => simd_shuffle4(a, b, [0, 5, 3, 6]),
                _ => simd_shuffle4(a, b, [0, 5, 3, 7]),
            },
        },
    },
    _ => match (imm8 >> 1) & 1 {
        0 => match (imm8 >> 2) & 1 {
            0 => match (imm8 >> 3) & 1 {
                0 => simd_shuffle4(a, b, [1, 4, 2, 6]),
                _ => simd_shuffle4(a, b, [1, 4, 2, 7]),
            },
            _ => match (imm8 >> 3) & 1 {
                0 => simd_shuffle4(a, b, [1, 4, 3, 6]),
                _ => simd_shuffle4(a, b, [1, 4, 3, 7]),
            },
        },
        _ => match (imm8 >> 2) & 1 {
            0 => match (imm8 >> 3) & 1 {
                0 => simd_shuffle4(a, b, [1, 5, 2, 6]),
                _ => simd_shuffle4(a, b, [1, 5, 2, 7]),
            },
            _ => match (imm8 >> 3) & 1 {
                0 => simd_shuffle4(a, b, [1, 5, 3, 6]),
                _ => simd_shuffle4(a, b, [1, 5, 3, 7]),
            },
        },
    },
}
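One way to check the expansion is to compare it against a scalar model of Intel's pseudocode for `_mm256_shuffle_pd`. The sketch below uses hypothetical helpers (`shuffle_pd_indices` and `shuffle_pd`, not part of stdsimd) that compute the same `simd_shuffle4` index arrays directly from the four low bits of `imm8`, using the convention that indices 0..4 select from `a` and 4..8 select from `b`. Under this model the expansion appears to be correct. A possible explanation for the failing assertion is that with `imm8 = 0` the pattern `[0, 4, 2, 6]` is exactly what `vunpcklpd` computes, so LLVM may pick that instruction instead of `vshufpd`; asserting with a different `imm8` value might avoid this.

```rust
/// Hypothetical helper: scalar model of the `_mm256_shuffle_pd` lane selection.
/// Indices 0..4 refer to `a`, indices 4..8 refer to `b`, matching the
/// `simd_shuffle4(a, b, [...])` convention used in the expansion.
fn shuffle_pd_indices(imm8: i32) -> [u32; 4] {
    // Extract bit `n` of the control byte as 0 or 1.
    let bit = |n: i32| ((imm8 >> n) & 1) as u32;
    [
        bit(0),     // low 128 bits: a[0] or a[1]
        4 + bit(1), // low 128 bits: b[0] or b[1]
        2 + bit(2), // high 128 bits: a[2] or a[3]
        6 + bit(3), // high 128 bits: b[2] or b[3]
    ]
}

/// Hypothetical helper: apply the shuffle to plain arrays, so the expected
/// result can be checked without AVX hardware.
fn shuffle_pd(a: [f64; 4], b: [f64; 4], imm8: i32) -> [f64; 4] {
    let both = [a[0], a[1], a[2], a[3], b[0], b[1], b[2], b[3]];
    shuffle_pd_indices(imm8).map(|i| both[i as usize])
}
```

Each leaf of the nested `match` should equal `shuffle_pd_indices(imm8)` for the corresponding bit pattern; for example, `imm8 = 0b0101` selects `[1, 4, 3, 6]` in both.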

alexcrichton · 2017-10-05T16:21:35Z

src/x86/avx.rs

+/// Extract an 8-bit integer from `a`, selected with `imm8`.
+#[inline(always)]
+#[target_feature = "+avx"]
+pub unsafe fn _mm256_extract_epi8(a: i8x32, imm8: i32) -> i32 {


Should this and some of the extractions below assert for a particular instruction?

It appears that no instruction is specified for this intrinsic.
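A likely reason there is no single instruction to assert: `vpextrb` only operates on a 128-bit register, so extracting a byte from the upper half of a 256-bit vector typically needs a `vextractf128` (or `vextracti128`) first, and the instruction sequence then depends on `imm8`. The sketch below is a hypothetical scalar model (`extract_epi8` over a plain `[i8; 32]`, not the stdsimd signature) of Intel's pseudocode, which uses only `index[4:0]` to choose the byte. It deliberately returns the raw `i8` and leaves the widening to `i32` (sign- versus zero-extension) open, since that detail is not settled here.

```rust
/// Hypothetical helper: scalar model of `_mm256_extract_epi8`.
/// Only the low 5 bits of `imm8` are used, so the index wraps modulo 32.
/// Returns the raw byte; how it is widened to `i32` is left to the caller.
fn extract_epi8(a: [i8; 32], imm8: i32) -> i8 {
    // `imm8 & 31` is always in 0..32, so the cast to usize is safe.
    a[(imm8 & 31) as usize]
}
```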

alexcrichton · 2017-10-05T16:28:42Z

I also closed #91 and #95 as they accidentally duplicated the work from here, but maybe the tests could be included?

vbarrielle

As I had some duplicated work in #95, I've tried to suggest places where some work could still be used :)

vbarrielle · 2017-10-05T16:37:09Z

src/x86/avx.rs

+}
+
+/// Horizontally add adjacent pairs of double-precision (64-bit) floating-point
+/// elements in `a` and `b`, and pack the results.


Is it possible to include documentation for the location of the results, as in #95? E.g.:

/// In the result, sums of elements from `a` are returned in even locations,
/// while sums of elements from `b` are returned in odd locations.
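To illustrate the layout this comment describes, here is a hypothetical scalar model of `_mm256_hadd_pd` on plain arrays (`hadd_pd` is an illustrative name, not the stdsimd function). Pairs are summed within each 128-bit lane, so the `a` sums end up at positions 0 and 2 and the `b` sums at positions 1 and 3.

```rust
/// Hypothetical helper: scalar model of `_mm256_hadd_pd`.
/// Each 128-bit lane holds two f64 values; adjacent pairs are summed per lane.
fn hadd_pd(a: [f64; 4], b: [f64; 4]) -> [f64; 4] {
    [
        a[0] + a[1], // lane 0, from `a` (even position)
        b[0] + b[1], // lane 0, from `b` (odd position)
        a[2] + a[3], // lane 1, from `a` (even position)
        b[2] + b[3], // lane 1, from `b` (odd position)
    ]
}
```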

I will try to merge your changes into my branch.

vbarrielle · 2017-10-05T16:37:55Z

src/x86/avx.rs

+}
+
+/// Horizontally add adjacent pairs of single-precision (32-bit) floating-point
+/// elements in `a` and `b`, and pack the results.


Similarly, you could add

/// In the result, sums of elements from `a` are returned in locations of
/// indices 0, 1, 4, 5, while sums of elements from `b` are in locations
/// 2, 3, 6, 7.
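For the single-precision variant the layout differs because each 128-bit lane holds four elements. The hypothetical scalar model below (`hadd_ps` on plain arrays, not the stdsimd function) shows that within each lane the two `a` sums come first, followed by the two `b` sums, which yields the 0, 1, 4, 5 / 2, 3, 6, 7 split described in the comment.

```rust
/// Hypothetical helper: scalar model of `_mm256_hadd_ps`.
/// Per 128-bit lane: two sums from `a`, then two sums from `b`.
fn hadd_ps(a: [f32; 8], b: [f32; 8]) -> [f32; 8] {
    [
        a[0] + a[1], a[2] + a[3], // lane 0, from `a`: indices 0, 1
        b[0] + b[1], b[2] + b[3], // lane 0, from `b`: indices 2, 3
        a[4] + a[5], a[6] + a[7], // lane 1, from `a`: indices 4, 5
        b[4] + b[5], b[6] + b[7], // lane 1, from `b`: indices 6, 7
    ]
}
```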

vbarrielle · 2017-10-05T16:38:30Z

src/x86/avx.rs

+}
+
+/// Horizontally subtract adjacent pairs of single-precision (32-bit) floating-point
+/// elements in `a` and `b`, and pack the results.


/// In the result, differences of elements from `a` are returned in locations of
/// indices 0, 1, 4, 5, while differences of elements from `b` are in locations
/// 2, 3, 6, 7.
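The subtracting variant uses the same positions as `_mm256_hadd_ps`; the detail worth pinning down is the operand order. In Intel's pseudocode the lower element of each pair is the minuend (`a[0] - a[1]`, not `a[1] - a[0]`). A hypothetical scalar model (`hsub_ps`, illustrative only) makes that explicit:

```rust
/// Hypothetical helper: scalar model of `_mm256_hsub_ps`.
/// Each difference is lower element minus upper element of an adjacent pair.
fn hsub_ps(a: [f32; 8], b: [f32; 8]) -> [f32; 8] {
    [
        a[0] - a[1], a[2] - a[3], // lane 0, from `a`: indices 0, 1
        b[0] - b[1], b[2] - b[3], // lane 0, from `b`: indices 2, 3
        a[4] - a[5], a[6] - a[7], // lane 1, from `a`: indices 4, 5
        b[4] - b[5], b[6] - b[7], // lane 1, from `b`: indices 6, 7
    ]
}
```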

vbarrielle · 2017-10-05T16:39:15Z

src/x86/avx.rs

+}
+
+/// Horizontally subtract adjacent pairs of double-precision (64-bit) floating-point
+/// elements in `a` and `b`, and pack the results.


/// In the result, differences of elements from `a` are returned in even locations,
/// while differences of elements from `b` are returned in odd locations.
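Likewise for double precision, a hypothetical scalar model (`hsub_pd`, illustrative only) shows the `a` differences at even positions and the `b` differences at odd positions, each computed as lower minus upper element:

```rust
/// Hypothetical helper: scalar model of `_mm256_hsub_pd`.
/// Per 128-bit lane: one difference from `a`, then one from `b`.
fn hsub_pd(a: [f64; 4], b: [f64; 4]) -> [f64; 4] {
    [
        a[0] - a[1], // lane 0, from `a` (even position)
        b[0] - b[1], // lane 0, from `b` (odd position)
        a[2] - a[3], // lane 1, from `a` (even position)
        b[2] - b[3], // lane 1, from `b` (odd position)
    ]
}
```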

alexcrichton · 2017-10-05T18:42:35Z

Thanks @gwenn!

gwenn added 30 commits September 30, 2017 17:18

avx: _mm256_andnot_pd, _mm256_andnot_ps

637722d

Merge remote-tracking branch 'origin/master' into avx

6ac8fc8

avx: _mm256_blendv_pd

6ed6424

avx: _mm256_blend_pd with no assert_instr

252c24d

With assert_instr: too many instructions in the disassembly

avx: _mm256_blendv_ps

632d14a

avx: _mm256_hadd_pd

121212c

avx: _mm256_hadd_ps

b97e734

avx: _mm256_hsub_pd

5857ef1

avx: _mm256_hsub_ps

062b59f

avx: _mm256_xor_pd

909b7df

avx: _mm256_xor_ps

527383d

avx: _mm256_cvtepi32_pd

d3239be

avx: _mm256_cvtepi32_ps

6912234

avx: _mm256_cvtpd_ps

41f4414

avx: _mm256_cvtps_epi32

5759982

avx: _mm256_cvtps_pd

f88f6e9

avx: _mm256_cvttpd_epi32

37ccd56

avx: _mm256_cvtpd_epi32

638b6ee

avx: replace simd_cast by proper instruction

c5dbfd1

avx: _mm256_cvttps_epi32

77a3a50

avx: _mm256_extractf128_ps, _mm256_undefined_ps

116282c

avx: _mm256_extractf128_pd, _mm256_undefined_pd

170b974

avx: _mm256_extractf128_si256, _mm256_undefined_si256

d59b6c8

avx: _mm256_extract_epi8

83a0a18

avx: _mm256_extract_epi16

ec4223a

avx: _mm256_extract_epi32

39149fc

avx: _mm256_extract_epi64

7102c52

avx: _mm256_zeroall

e795c4c

avx: _mm256_zeroupper

9996a40

avx: _mm256_permutevar_ps

253df57

avx: _mm_permutevar_ps

49565e4

p32blo mentioned this pull request Oct 3, 2017

AVX: vandnpd, vandnps, vblendvpd and vblendvps #91

Closed

gwenn and others added 7 commits October 3, 2017 18:03

avx: replace simd_cast by as_*

f419cf2

avx: _mm256_permute_ps

df40a67

avx: _mm256_dp_ps

9d71953

avx: _mm256_shuffle_pd

6fa05de

avx: _mm256_shuffle_pd, wrong instruction generated

62c5804

implement _mm256_hadd_ps and _mm256_hadd_pd

1d86253

avx: implement _mm256_hsub_pd and _mm256_hsub_ps

a6b87ef

alexcrichton reviewed Oct 5, 2017

alexcrichton mentioned this pull request Oct 5, 2017

avx: implement _mm256_{hadd,hsub}_{ps,pd} #95

Closed

vbarrielle reviewed Oct 5, 2017

gwenn added 3 commits October 5, 2017 19:31

Merge remote-tracking branch 'origin/master' into avx

297fc4f

assert_instr: raise the limit up to 30 instructions

646e2e6

Merge remote-tracking branch 'vbarrielle/master' into avx

2cae0b1

alexcrichton merged commit f3f5a9c into rust-lang:master Oct 5, 2017

gwenn deleted the avx branch October 5, 2017 18:47
