Skip to content

Conversation

andjo403
Copy link
Contributor

as a way towards being able to use the optimized intrinsics ctlz_nonzero and cttz_nonzero from stable.

have not crated any tracking issue if this is not a solution that is wanted

@rust-highfive
Copy link
Contributor

r? @m-ou-se

(rust_highfive has picked a reviewer for you, use r? to override)

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Nov 16, 2020
@scottmcm
Copy link
Member

I'm a fan 👍 This is important for avoiding dead ASM for the default x64 target, but can even improve the generated code on newer target-cpus too because of the tighter value range. An example: #70835 (comment)

Copy link
Member

@m-ou-se m-ou-se left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like a good addition!

Maybe it's good to explain the existence of these functions in the doc comments? Maybe something like On many architectures, this function can perform better than `leading_zeros()` on the underlying integer type, as special handling of zero can be avoided., or something in that direction.

@m-ou-se m-ou-se added the T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. label Nov 17, 2020
@andjo403 andjo403 force-pushed the nonzero_leading_trailing_zeros branch from 7cda148 to 02a6aad Compare November 17, 2020 18:24
@andjo403
Copy link
Contributor Author

thanks for the comments have addressed them and pushed up the changes now

@andjo403 andjo403 force-pushed the nonzero_leading_trailing_zeros branch from 02a6aad to e8864a0 Compare November 17, 2020 18:29
Copy link
Member

@m-ou-se m-ou-se left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! One more small comment:

@andjo403 andjo403 force-pushed the nonzero_leading_trailing_zeros branch from e8864a0 to 9bbc4c1 Compare November 17, 2020 18:55
@andjo403
Copy link
Contributor Author

thanks again and fixed

@m-ou-se
Copy link
Member

m-ou-se commented Nov 17, 2020

Thanks!

@bors r+ rollup=always

@bors
Copy link
Collaborator

bors commented Nov 17, 2020

📌 Commit 9bbc4c1 has been approved by m-ou-se

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Nov 17, 2020
m-ou-se added a commit to m-ou-se/rust that referenced this pull request Nov 18, 2020
…eros, r=m-ou-se

add trailing_zeros and leading_zeros to non zero types

as a way towards being able to use the optimized intrinsics ctlz_nonzero and cttz_nonzero from stable.

have not crated any tracking issue if this is not a solution that is wanted
bors added a commit to rust-lang-ci/rust that referenced this pull request Nov 18, 2020
Rollup of 11 pull requests

Successful merges:

 - rust-lang#78361 (Updated the list of white-listed target features for x86)
 - rust-lang#78785 (linux: try to use libc getrandom to allow interposition)
 - rust-lang#78999 (stability: More precise location for deprecation lint on macros)
 - rust-lang#79039 (Tighten the bounds on atomic Ordering in std::sys::unix::weak::Weak)
 - rust-lang#79079 (Turn top-level comments into module docs in MIR visitor)
 - rust-lang#79114 (add trailing_zeros and leading_zeros to non zero types)
 - rust-lang#79131 (Enable AVX512 *epi64 variants by updating stdarch)
 - rust-lang#79133 (bootstrap: use the same version number for rustc and cargo)
 - rust-lang#79145 (Fix handling of panic calls)
 - rust-lang#79151 (Fix typo in `std::io::Write` docs)
 - rust-lang#79158 (type is too big -> values of the type are too big)

Failed merges:

r? `@ghost`
`@rustbot` modify labels: rollup
@bors bors merged commit 126d88b into rust-lang:master Nov 18, 2020
@rustbot rustbot added this to the 1.50.0 milestone Nov 18, 2020
@andjo403 andjo403 deleted the nonzero_leading_trailing_zeros branch November 18, 2020 20:39
@leonardo-m
Copy link

leonardo-m commented Nov 19, 2020

I've tried this in my codebase, looking at the generated asm, and I've seen that in 100% of the cases, thanks sometimes to inlining, LLVM was able to infer that the input isn't zero, so no performance change has happened.

@m-ou-se
Copy link
Member

m-ou-se commented Nov 19, 2020

In cases where the non-zero value are is created somewhat close to the place where leading_zeros/trailing_zeros is applied, it will definitely optimize this nicely. But just using NonZeroI32 by itself without the context that creates the value, doesn't result in this optimization (yet?): https://godbolt.org/z/bEoPE9

@scottmcm
Copy link
Member

@leonardo-m As another example, note that LLVM is currently not capable of optimizing this even when there's an explicit check for zero in the code before doing the uN::leading_zeros: https://rust.godbolt.org/z/4EhnK4

@leonardo-m
Copy link

But just using NonZeroI32 by itself without the context that creates the value, doesn't result in this optimization (yet?): https://godbolt.org/z/bEoPE9

Using "-C opt-level=2 -C target-cpu=native" it seems to optimize it well.

@leonardo-m
Copy link

As another example, note that LLVM is currently not capable of optimizing this even when there's an explicit check for zero in the code before doing the uN::leading_zeros: https://rust.godbolt.org/z/4EhnK4

With "-C opt-level=3 -C target-cpu=native -Z mir-opt-level=3" it seems to optimize well.

@m-ou-se
Copy link
Member

m-ou-se commented Nov 21, 2020

@leonardo-m With that option it makes use of an instruction that does not need a special case for zero. It still doesn't make use of the fact that NonZero* cannot be zero.

@leonardo-m
Copy link

Oh, fun, thank you. It's still the same LLVM limit, I guess:
#54868

@scottmcm
Copy link
Member

With "-C opt-level=3 -C target-cpu=native -Z mir-opt-level=3" it seems to optimize well.

native there isn't doing anything different from haswell in my example -- the choice was those two was just to pick up instruction sets with and without BMI1.

Very interesting that mir-opts affect this, though...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants