Suboptimal codegen for match from enum to almost-same-value usize (unnecessary table lookups) #136972

Open
oxalica opened this issue Feb 13, 2025 · 2 comments
Labels
A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. C-optimization Category: An issue highlighting optimization opportunities or PRs implementing such I-slow Issue: Problems and improvements with respect to performance of generated code. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@oxalica
Contributor

oxalica commented Feb 13, 2025

I tried this code:

#[derive(Clone, Copy)]
#[repr(u8)]
pub enum Len {
    Zero = 0,
    One,
    Two,
    Three,
}

#[no_mangle]
pub fn convert_len(len: &Len) -> Option<usize> {
    match *len {
        // Len::Zero => Some(0), // This produces optimal code.
        Len::Zero => None, // This does not: two table lookups, one of them into an identity table.
        Len::One => Some(1),
        Len::Two => Some(2),
        Len::Three => Some(3),
    }
}

// Manual optimal code.
#[no_mangle]
pub fn convert_len_optimal(len: &Len) -> Option<usize> {
    match *len {
        Len::Zero => None,
        len => Some(unsafe { std::mem::transmute::<Len, u8>(len) } as usize),
    }
}

It gives (Compiler Explorer):

convert_len:
        movzx   eax, byte ptr [rdi]
        shl     eax, 3
        lea     rcx, [rip + .Lswitch.table.convert_len]
        mov     rdx, qword ptr [rax + rcx]
        lea     rcx, [rip + .Lswitch.table.convert_len.1]
        mov     rax, qword ptr [rax + rcx]
        ret

convert_len_optimal:
        movzx   edx, byte ptr [rdi]
        xor     eax, eax
        test    rdx, rdx
        setne   al
        ret

.Lswitch.table.convert_len:
        .zero   8
        .quad   1
        .quad   2
        .quad   3

.Lswitch.table.convert_len.1:
        .quad   0
        .quad   1
        .quad   1
        .quad   1
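Reading the listing, `convert_len` returns the `Option` tag in `rax` and the payload in `rdx`, each fetched from its own switch table, while `convert_len_optimal` computes both directly from the loaded byte. The sketch below models the two tables in plain Rust to show why the payload lookup is redundant. It assumes the `rax`/`rdx` split seen in the assembly (the layout of `Option<usize>` is not a stable guarantee), and the names `PAYLOAD_TABLE`, `TAG_TABLE`, `convert_len_via_tables`, and `convert_len_branchless` are hypothetical.

```rust
/// Hypothetical mirror of `.Lswitch.table.convert_len`: `.zero 8`, then 1, 2, 3 (payload).
const PAYLOAD_TABLE: [usize; 4] = [0, 1, 2, 3];
/// Hypothetical mirror of `.Lswitch.table.convert_len.1`: 0 = None, 1 = Some (tag).
const TAG_TABLE: [usize; 4] = [0, 1, 1, 1];

/// Models the table-based code: returns (tag, payload) for discriminant `d`.
fn convert_len_via_tables(d: u8) -> (usize, usize) {
    let i = d as usize;
    (TAG_TABLE[i], PAYLOAD_TABLE[i])
}

/// Models `convert_len_optimal`: no tables; the payload is the discriminant
/// itself and the tag is `d != 0` (the `test`/`setne` pair).
fn convert_len_branchless(d: u8) -> (usize, usize) {
    let i = d as usize;
    ((i != 0) as usize, i)
}

fn main() {
    // The payload table maps every index to itself, so the lookup adds nothing.
    for i in 0..PAYLOAD_TABLE.len() {
        assert_eq!(PAYLOAD_TABLE[i], i);
    }
    // Both models agree on every valid discriminant of `Len`.
    for d in 0u8..4 {
        assert_eq!(convert_len_via_tables(d), convert_len_branchless(d));
    }
    println!("table model and branchless model agree");
}
```

Since `PAYLOAD_TABLE[i] == i` for every valid index and `TAG_TABLE[i]` is just `i != 0`, both loads can in principle be replaced by the two-instruction sequence in `convert_len_optimal`.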

I expect convert_len to generate code like convert_len_optimal. No table lookups should be generated, especially not the first one, which is an identity table. I don't expect performance to differ much when the cache is hot, but wasting cache on these trivial tables is not worth it.
I'm not sure whether LLVM or rustc is to blame here.

Meta

Reproduced on Compiler Explorer's 1.84 and latest nightly, and on a local nightly (rustc 1.86.0-nightly (854f22563 2025-01-31)).

@oxalica oxalica added the C-bug Category: This is a bug. label Feb 13, 2025
@rustbot rustbot added the needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. label Feb 13, 2025
@oxalica oxalica changed the title Suboptimal code for match from enum to almost-same-value usize (unnecessary table lookups) Suboptimal codegen for match from enum to almost-same-value usize (unnecessary table lookups) Feb 13, 2025
@hanna-kruppe
Contributor

Tangential but can’t the transmute in the “optimal” variant be a safe as cast?

@oxalica
Contributor Author

oxalica commented Feb 13, 2025

Tangential but can’t the transmute in the “optimal” variant be a safe as cast?

Yes, it can. I totally forgot it exists 🤦.
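A fieldless `#[repr(u8)]` enum can be cast to an integer with `as`, so the workaround does not need `unsafe`. A minimal sketch of that variant follows, with a check that it returns the same values as the original `convert_len` for every variant. `convert_len_cast` is a hypothetical name, and whether it compiles to the same table-free assembly as `convert_len_optimal` is an assumption not verified here.

```rust
#[derive(Clone, Copy)]
#[repr(u8)]
pub enum Len {
    Zero = 0,
    One,
    Two,
    Three,
}

/// Original formulation from the report (lowered via two switch tables).
pub fn convert_len(len: &Len) -> Option<usize> {
    match *len {
        Len::Zero => None,
        Len::One => Some(1),
        Len::Two => Some(2),
        Len::Three => Some(3),
    }
}

/// Hypothetical safe variant: the `as` cast reads the discriminant,
/// which equals the desired payload for every non-zero variant.
pub fn convert_len_cast(len: &Len) -> Option<usize> {
    match *len {
        Len::Zero => None,
        other => Some(other as usize),
    }
}

fn main() {
    // Same result for every variant, so the two are interchangeable.
    for len in [Len::Zero, Len::One, Len::Two, Len::Three] {
        assert_eq!(convert_len(&len), convert_len_cast(&len));
    }
    println!("convert_len and convert_len_cast agree");
}
```

The cast relies on the explicit `Zero = 0` and the implicit increments that follow it; reordering the variants or adding explicit discriminants would break the equivalence, which is why the loop in `main` checks every variant.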

@jieyouxu jieyouxu added T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. C-optimization Category: An issue highlighting optimization opportunities or PRs implementing such and removed C-bug Category: This is a bug. needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. labels Feb 13, 2025
@nikic nikic added A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. I-slow Issue: Problems and improvements with respect to performance of generated code. labels May 23, 2025

5 participants