Avx #90
With assert_instr: too many instructions in the disassembly
I don't know why but
Thanks for the PR! The error is indeed curious here. I wonder if the ABI for calling functions with respect to SIMD registers is slightly different on Windows? In any case, it looks like the right instructions are being generated, so do you want to raise the instruction limit to maybe 30 instructions?
/// lanes using the control in `imm8`.
#[inline(always)]
#[target_feature = "+avx"]
//#[cfg_attr(test, assert_instr(vshufpd, imm8 = 0x0))] // FIXME
Should this be enabled now?
The doc says that `vshufpd` should be generated, but clang does not assert that it is the case. Maybe the expanded code is wrong?
match (imm8 >> 0) & 1 {
    0 =>
        match (imm8 >> 1) & 1 {
            0 =>
                match (imm8 >> 2) & 1 {
                    0 =>
                        match (imm8 >> 3) & 1 {
                            0 => simd_shuffle4(a, b, [0, 4, 2, 6]),
                            _ => simd_shuffle4(a, b, [0, 4, 2, 7]),
                        },
                    _ =>
                        match (imm8 >> 3) & 1 {
                            0 => simd_shuffle4(a, b, [0, 4, 3, 6]),
                            _ => simd_shuffle4(a, b, [0, 4, 3, 7]),
                        },
                },
            _ =>
                match (imm8 >> 2) & 1 {
                    0 =>
                        match (imm8 >> 3) & 1 {
                            0 => simd_shuffle4(a, b, [0, 5, 2, 6]),
                            _ => simd_shuffle4(a, b, [0, 5, 2, 7]),
                        },
                    _ =>
                        match (imm8 >> 3) & 1 {
                            0 => simd_shuffle4(a, b, [0, 5, 3, 6]),
                            _ => simd_shuffle4(a, b, [0, 5, 3, 7]),
                        },
                },
        },
    _ =>
        match (imm8 >> 1) & 1 {
            0 =>
                match (imm8 >> 2) & 1 {
                    0 =>
                        match (imm8 >> 3) & 1 {
                            0 => simd_shuffle4(a, b, [1, 4, 2, 6]),
                            _ => simd_shuffle4(a, b, [1, 4, 2, 7]),
                        },
                    _ =>
                        match (imm8 >> 3) & 1 {
                            0 => simd_shuffle4(a, b, [1, 4, 3, 6]),
                            _ => simd_shuffle4(a, b, [1, 4, 3, 7]),
                        },
                },
            _ =>
                match (imm8 >> 2) & 1 {
                    0 =>
                        match (imm8 >> 3) & 1 {
                            0 => simd_shuffle4(a, b, [1, 5, 2, 6]),
                            _ => simd_shuffle4(a, b, [1, 5, 2, 7]),
                        },
                    _ =>
                        match (imm8 >> 3) & 1 {
                            0 => simd_shuffle4(a, b, [1, 5, 3, 6]),
                            _ => simd_shuffle4(a, b, [1, 5, 3, 7]),
                        },
                },
        },
}
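For what it's worth, the expanded match looks consistent with how I read the shuffle: each of the low four bits of `imm8` picks one of two adjacent lanes, alternating between `a` and `b`. Here is a minimal scalar sketch of that mapping, which may help when checking the expansion by hand. The names `shuffle_pd_indices` and `shuffle_pd_model` are hypothetical helpers for illustration, not part of the crate, and the indices refer to the concatenation of `a` and `b` as in the `simd_shuffle4` calls above.

```rust
/// Hypothetical helper: computes the four shuffle indices into the
/// concatenation [a0, a1, a2, a3, b0, b1, b2, b3] from the low 4 bits of imm8.
fn shuffle_pd_indices(imm8: u8) -> [u32; 4] {
    // Extract bit `n` of imm8 as 0 or 1.
    let bit = |n: u32| ((imm8 >> n) & 1) as u32;
    // bit 0 selects a[0] or a[1], bit 1 selects b[0] or b[1],
    // bit 2 selects a[2] or a[3], bit 3 selects b[2] or b[3].
    [bit(0), 4 + bit(1), 2 + bit(2), 6 + bit(3)]
}

/// Hypothetical scalar model of the shuffle on plain arrays.
fn shuffle_pd_model(a: [f64; 4], b: [f64; 4], imm8: u8) -> [f64; 4] {
    let cat = [a[0], a[1], a[2], a[3], b[0], b[1], b[2], b[3]];
    let idx = shuffle_pd_indices(imm8);
    [
        cat[idx[0] as usize],
        cat[idx[1] as usize],
        cat[idx[2] as usize],
        cat[idx[3] as usize],
    ]
}
```

With `imm8 = 0b0101` this gives the indices `[1, 4, 3, 6]`, which matches the corresponding arm of the expanded match.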
/// Extract an 8-bit integer from `a`, selected with `imm8`.
#[inline(always)]
#[target_feature = "+avx"]
pub unsafe fn _mm256_extract_epi8(a: i8x32, imm8: i32) -> i32 {
Should this and some of the extractions below assert for a particular instruction?
It appears that there is no instruction specified for this intrinsic.
As I had some duplicated work in #95, I've tried to suggest places where some work could still be used :)
src/x86/avx.rs
Outdated
}

/// Horizontally add adjacent pairs of double-precision (64-bit) floating-point
/// elements in `a` and `b`, and pack the results.
Is it possible to include documentation for the location of the results, as in #95? E.g.:
/// In the result, sums of elements from `a` are returned in even locations,
/// while sums of elements from `b` are returned in odd locations.
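To make the even/odd layout concrete, here is a small scalar sketch of the 4-lane double-precision horizontal add as I understand it. `hadd_pd_model` is a hypothetical name used only for illustration; it works on plain arrays rather than the crate's vector types.

```rust
/// Hypothetical scalar model of the 256-bit double-precision horizontal add.
/// Sums of adjacent pairs from `a` land in even positions (0, 2),
/// sums of adjacent pairs from `b` land in odd positions (1, 3).
fn hadd_pd_model(a: [f64; 4], b: [f64; 4]) -> [f64; 4] {
    [a[0] + a[1], b[0] + b[1], a[2] + a[3], b[2] + b[3]]
}
```

For `a = [1.0, 2.0, 3.0, 4.0]` and `b = [10.0, 20.0, 30.0, 40.0]` the model returns `[3.0, 30.0, 7.0, 70.0]`.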
I will try to merge your changes in my branch.
src/x86/avx.rs
Outdated
}

/// Horizontally add adjacent pairs of single-precision (32-bit) floating-point
/// elements in `a` and `b`, and pack the results.
Similarly, you could add:
/// In the result, sums of elements from `a` are returned in locations of
/// indices 0, 1, 4, 5; while sums of elements from `b` are in locations
/// 2, 3, 6, 7.
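The single-precision layout is less obvious because the operation works on each 128-bit half independently. A scalar sketch, with `hadd_ps_model` as a hypothetical illustration-only helper on plain arrays:

```rust
/// Hypothetical scalar model of the 256-bit single-precision horizontal add.
/// Each 128-bit half is handled separately: within a half, the two pair sums
/// from `a` come first, followed by the two pair sums from `b`.
fn hadd_ps_model(a: [f32; 8], b: [f32; 8]) -> [f32; 8] {
    [
        a[0] + a[1], // index 0: from `a`, low half
        a[2] + a[3], // index 1: from `a`, low half
        b[0] + b[1], // index 2: from `b`, low half
        b[2] + b[3], // index 3: from `b`, low half
        a[4] + a[5], // index 4: from `a`, high half
        a[6] + a[7], // index 5: from `a`, high half
        b[4] + b[5], // index 6: from `b`, high half
        b[6] + b[7], // index 7: from `b`, high half
    ]
}
```

So the sums from `a` end up at indices 0, 1, 4, 5 and the sums from `b` at 2, 3, 6, 7, as the suggested doc comment says.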
src/x86/avx.rs
Outdated
}

/// Horizontally subtract adjacent pairs of single-precision (32-bit) floating-point
/// elements in `a` and `b`, and pack the results.
/// In the result, differences of elements from `a` are returned in locations of
/// indices 0, 1, 4, 5; while differences of elements from `b` are in locations
/// 2, 3, 6, 7.
src/x86/avx.rs
Outdated
}

/// Horizontally subtract adjacent pairs of double-precision (64-bit) floating-point
/// elements in `a` and `b`, and pack the results.
/// In the result, differences of elements from `a` are returned in even locations,
/// while differences of elements from `b` are returned in odd locations.
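For the subtract variant the layout is the same as for the horizontal add, but each result is the first element of the pair minus the second. A scalar sketch under that reading, with `hsub_pd_model` as a hypothetical helper on plain arrays:

```rust
/// Hypothetical scalar model of the 256-bit double-precision horizontal
/// subtract. Each result is (first element of the pair) - (second element).
/// Results from `a` land in even positions, results from `b` in odd positions.
fn hsub_pd_model(a: [f64; 4], b: [f64; 4]) -> [f64; 4] {
    [a[0] - a[1], b[0] - b[1], a[2] - a[3], b[2] - b[3]]
}
```

For `a = [5.0, 1.0, 9.0, 2.0]` and `b = [50.0, 10.0, 90.0, 20.0]` the model returns `[4.0, 40.0, 7.0, 70.0]`.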
Thanks @gwenn!
I had trouble implementing `_mm256_dp_ps`, `_mm_cmp_pd`, and `_mm_cmp_sd` (I will try again). I am submitting this PR because I don't want to prevent you from implementing other AVX intrinsics, but I will probably continue to commit to this branch until it is merged.