Avx #90

Merged
merged 41 commits into from
Oct 5, 2017

gwenn
Contributor

@gwenn gwenn commented Oct 1, 2017

I had trouble implementing _mm256_dp_ps, _mm_cmp_pd, and _mm_cmp_sd (I will try again).
I am submitting this PR now so that it does not block you from implementing other AVX intrinsics, but I will probably keep committing to this branch until it is merged.

@gwenn
Contributor Author

gwenn commented Oct 3, 2017

I don't know why, but _mm256_zeroupper and _mm256_zeroall trigger "too many instructions in the disassembly" on Windows:
https://ci.appveyor.com/project/rust-lang-libs/stdsimd/build/1.0.210
What should I do?

@alexcrichton
Member

Thanks for the PR! The error is indeed curious here. I wonder if the ABI for calling functions with respect to SIMD registers is slightly different on Windows? In any case, it looks like the right instructions are being generated, so do you want to raise the instruction limit to maybe 30 instructions?

/// lanes using the control in `imm8`.
#[inline(always)]
#[target_feature = "+avx"]
//#[cfg_attr(test, assert_instr(vshufpd, imm8 = 0x0))] // FIXME
Member

Should this be enabled now?

Contributor Author

The doc says that vshufpd should be generated, but clang does not assert that this is the case. Maybe the expanded code is wrong?

            match (imm8 >> 0) & 1 {
                0 =>
                match (imm8 >> 1) & 1 {
                    0 =>
                    match (imm8 >> 2) & 1 {
                        0 =>
                        match (imm8 >> 3) & 1 {
                            0 => simd_shuffle4(a, b, [0, 4, 2, 6]),
                            _ => simd_shuffle4(a, b, [0, 4, 2, 7]),
                        },
                        _ =>
                        match (imm8 >> 3) & 1 {
                            0 => simd_shuffle4(a, b, [0, 4, 3, 6]),
                            _ => simd_shuffle4(a, b, [0, 4, 3, 7]),
                        },
                    },
                    _ =>
                    match (imm8 >> 2) & 1 {
                        0 =>
                        match (imm8 >> 3) & 1 {
                            0 => simd_shuffle4(a, b, [0, 5, 2, 6]),
                            _ => simd_shuffle4(a, b, [0, 5, 2, 7]),
                        },
                        _ =>
                        match (imm8 >> 3) & 1 {
                            0 => simd_shuffle4(a, b, [0, 5, 3, 6]),
                            _ => simd_shuffle4(a, b, [0, 5, 3, 7]),
                        },
                    },
                },
                _ =>
                match (imm8 >> 1) & 1 {
                    0 =>
                    match (imm8 >> 2) & 1 {
                        0 =>
                        match (imm8 >> 3) & 1 {
                            0 => simd_shuffle4(a, b, [1, 4, 2, 6]),
                            _ => simd_shuffle4(a, b, [1, 4, 2, 7]),
                        },
                        _ =>
                        match (imm8 >> 3) & 1 {
                            0 => simd_shuffle4(a, b, [1, 4, 3, 6]),
                            _ => simd_shuffle4(a, b, [1, 4, 3, 7]),
                        },
                    },
                    _ =>
                    match (imm8 >> 2) & 1 {
                        0 =>
                        match (imm8 >> 3) & 1 {
                            0 => simd_shuffle4(a, b, [1, 5, 2, 6]),
                            _ => simd_shuffle4(a, b, [1, 5, 2, 7]),
                        },
                        _ =>
                        match (imm8 >> 3) & 1 {
                            0 => simd_shuffle4(a, b, [1, 5, 3, 6]),
                            _ => simd_shuffle4(a, b, [1, 5, 3, 7]),
                        },
                    },
                },
            }
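
The sixteen arms above follow a regular pattern: bit i of `imm8` chooses between two candidate indices for output lane i. A minimal sketch of that mapping in plain Rust, using arrays in place of SIMD types. The helper names `shuffle_pd_indices` and `shuffle_pd_model` are hypothetical and not part of stdsimd; the sketch only models which lanes are selected and says nothing about which instruction the compiler emits.

```rust
/// Index vector that the nested `match` above selects for a given `imm8`.
/// Indices 0..=3 refer to lanes of `a`, 4..=7 to lanes of `b`.
/// (Hypothetical helper, for illustration only.)
fn shuffle_pd_indices(imm8: u8) -> [usize; 4] {
    [
        (imm8 & 1) as usize,            // lane 0: a[0] or a[1]
        4 + ((imm8 >> 1) & 1) as usize, // lane 1: b[0] or b[1]
        2 + ((imm8 >> 2) & 1) as usize, // lane 2: a[2] or a[3]
        6 + ((imm8 >> 3) & 1) as usize, // lane 3: b[2] or b[3]
    ]
}

/// Scalar model of the shuffle: concatenate `a` and `b`, then gather
/// the four lanes named by the index vector.
fn shuffle_pd_model(a: [f64; 4], b: [f64; 4], imm8: u8) -> [f64; 4] {
    let ab = [a[0], a[1], a[2], a[3], b[0], b[1], b[2], b[3]];
    let idx = shuffle_pd_indices(imm8);
    [ab[idx[0]], ab[idx[1]], ab[idx[2]], ab[idx[3]]]
}
```

Comparing `shuffle_pd_indices(imm8)` against the literal index arrays for all sixteen values of the low four bits is one way to check the expansion by hand.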

/// Extract an 8-bit integer from `a`, selected with `imm8`.
#[inline(always)]
#[target_feature = "+avx"]
pub unsafe fn _mm256_extract_epi8(a: i8x32, imm8: i32) -> i32 {
Member

Should this and some of the extractions below assert for a particular instruction?

Contributor Author

It appears that there is no instruction specified for this intrinsic.

@alexcrichton
Member

I also closed #91 and #95 as they accidentally duplicated the work from here, but maybe the tests could be included?

Contributor

@vbarrielle vbarrielle left a comment

As I had some duplicated work in #95, I've tried to suggest places where some work could still be used :)

src/x86/avx.rs Outdated
}

/// Horizontally add adjacent pairs of double-precision (64-bit) floating-point
/// elements in `a` and `b`, and pack the results.
Contributor

Is it possible to include documentation for the location of the results, as in #95? E.g.:

/// In the result, sums of elements from `a` are returned in even locations,
/// while sums of elements from `b` are returned in odd locations.
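
The placement described in that doc comment can be illustrated with a scalar model. This is a sketch on plain arrays, assuming the lane layout of the 256-bit double-precision horizontal add; `hadd_pd_model` is a hypothetical name and not the stdsimd implementation.

```rust
/// Scalar model of a 256-bit horizontal add on four `f64` lanes.
/// (Hypothetical helper, for illustration only.)
fn hadd_pd_model(a: [f64; 4], b: [f64; 4]) -> [f64; 4] {
    [
        a[0] + a[1], // lane 0 (even): sum of a pair from `a`
        b[0] + b[1], // lane 1 (odd):  sum of a pair from `b`
        a[2] + a[3], // lane 2 (even): sum of a pair from `a`
        b[2] + b[3], // lane 3 (odd):  sum of a pair from `b`
    ]
}
```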

Contributor Author

I will try to merge your changes in my branch.

src/x86/avx.rs Outdated
}

/// Horizontally add adjacent pairs of single-precision (32-bit) floating-point
/// elements in `a` and `b`, and pack the results.
Contributor

Similarly, you could add

/// In the result, sums of elements from `a` are returned in locations of
/// indices 0, 1, 4, 5; while sums of elements from `b` are in locations
/// 2, 3, 6, 7.
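
For the single-precision variant the interleaving happens per 128-bit half, which is why the indices come in groups of two. A scalar sketch under that assumption, with `hadd_ps_model` as a hypothetical helper name:

```rust
/// Scalar model of a 256-bit horizontal add on eight `f32` lanes.
/// Each 128-bit half is processed independently.
/// (Hypothetical helper, for illustration only.)
fn hadd_ps_model(a: [f32; 8], b: [f32; 8]) -> [f32; 8] {
    [
        // low half: lanes 0, 1 from `a`, lanes 2, 3 from `b`
        a[0] + a[1],
        a[2] + a[3],
        b[0] + b[1],
        b[2] + b[3],
        // high half: lanes 4, 5 from `a`, lanes 6, 7 from `b`
        a[4] + a[5],
        a[6] + a[7],
        b[4] + b[5],
        b[6] + b[7],
    ]
}
```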

src/x86/avx.rs Outdated
}

/// Horizontally subtract adjacent pairs of single-precision (32-bit) floating-point
/// elements in `a` and `b`, and pack the results.
Contributor

/// In the result, differences of elements from `a` are returned in locations of
/// indices 0, 1, 4, 5; while differences of elements from `b` are in locations
/// 2, 3, 6, 7.

src/x86/avx.rs Outdated
}

/// Horizontally subtract adjacent pairs of double-precision (64-bit) floating-point
/// elements in `a` and `b`, and pack the results.
Contributor

/// In the result, differences of elements from `a` are returned in even locations,
/// while differences of elements from `b` are returned in odd locations.
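
For the subtract variants each output lane holds a difference of an adjacent pair, so the operand order matters. A scalar sketch for the double-precision case, assuming the lower element of each pair is the minuend (as in Intel's pseudocode for this intrinsic); `hsub_pd_model` is a hypothetical name.

```rust
/// Scalar model of a 256-bit horizontal subtract on four `f64` lanes.
/// Assumes "lower element minus upper element" within each pair.
/// (Hypothetical helper, for illustration only.)
fn hsub_pd_model(a: [f64; 4], b: [f64; 4]) -> [f64; 4] {
    [
        a[0] - a[1], // lane 0 (even): difference of a pair from `a`
        b[0] - b[1], // lane 1 (odd):  difference of a pair from `b`
        a[2] - a[3], // lane 2 (even): difference of a pair from `a`
        b[2] - b[3], // lane 3 (odd):  difference of a pair from `b`
    ]
}
```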

@alexcrichton alexcrichton merged commit f3f5a9c into rust-lang:master Oct 5, 2017
@alexcrichton
Member

Thanks @gwenn!

@gwenn gwenn deleted the avx branch October 5, 2017 18:47