
println!() prevents optimization by capturing pointers #50519

Open
@df5602

Description


This weekend I benchmarked some of my code. After a seemingly insignificant code change, I noticed a small but measurable performance regression. Investigating the generated assembly, I stumbled upon a case where the compiler emits suboptimal code.

This minimal example shows the same behaviour (Playground link):

extern crate rand;

use std::f32;
use rand::Rng;

fn main() {
    let mut list = [0.0; 16];
    let mut rg = rand::thread_rng();

    // Random initialization to prevent the compiler from optimizing the whole example away
    for i in 0..list.len() {
        list[i] = rg.gen_range(0.0, 0.1);
    }

    let mut lowest = f32::INFINITY;

    for i in 0..list.len() {
        lowest = if list[i] < lowest {    // <<<<<<<<<<<<<<<
            list[i]
        } else {
            lowest
        };
    }

    println!("{}", lowest);
}

When compiling with the --release flag, the compiler generates the following instructions for the marked block:

...
minss	%xmm0, %xmm1
movss	88(%rsp), %xmm0
minss	%xmm1, %xmm0
movss	92(%rsp), %xmm1
...

However, if I replace those lines with the following:

if list[i] < lowest {
    lowest = list[i];
}

the compiler instead emits an odd series of float compare (ucomiss) and conditional jump (ja/jbe) instructions:

.LBB5_38:
	movss	92(%rsp), %xmm1
	ucomiss	%xmm1, %xmm0
	ja	.LBB5_39
...
.LBB5_42:
	movss	100(%rsp), %xmm1
	ucomiss	%xmm1, %xmm0
	ja	.LBB5_43
...
.LBB5_39:
	movss	%xmm1, 12(%rsp)
	movaps	%xmm1, %xmm0
	movss	96(%rsp), %xmm1
	ucomiss	%xmm1, %xmm0
	jbe	.LBB5_42
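The two loop bodies compared above are semantically identical: in both, a comparison involving NaN evaluates to false, so lowest is left unchanged. The difference is therefore purely in code generation. The sketch below isolates each variant in its own function so their results can be checked against each other and their --release assembly inspected separately. The function names lowest_select and lowest_branch are hypothetical, and the fixed inputs replace the rand initialization from the original example.

```rust
/// Variant A from the issue: the comparison result is assigned
/// unconditionally. The issue reports fcmp + select in the IR for this form.
#[inline(never)] // keep a separate symbol so its assembly can be inspected
fn lowest_select(list: &[f32; 16]) -> f32 {
    let mut lowest = f32::INFINITY;
    for i in 0..list.len() {
        lowest = if list[i] < lowest { list[i] } else { lowest };
    }
    lowest
}

/// Variant B from the issue: a conditional assignment. The issue reports
/// fcmp + br in the IR for this form.
#[inline(never)] // keep a separate symbol so its assembly can be inspected
fn lowest_branch(list: &[f32; 16]) -> f32 {
    let mut lowest = f32::INFINITY;
    for i in 0..list.len() {
        if list[i] < lowest {
            lowest = list[i];
        }
    }
    lowest
}

fn main() {
    let a: [f32; 16] = [
        0.5, 0.25, 0.75, 0.125, 1.0, 2.0, 3.0, 4.0,
        5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0, 12.0,
    ];
    let mut with_nan = a;
    with_nan[0] = f32::NAN;
    let all_nan = [f32::NAN; 16];

    for input in [a, with_nan, all_nan] {
        let (s, b) = (lowest_select(&input), lowest_branch(&input));
        // Compare bit patterns so the check stays meaningful even for NaN results.
        assert_eq!(s.to_bits(), b.to_bits());
        println!("select = {s}, branch = {b}");
    }
}
```

If the title's diagnosis is right, both functions here should compile to minss, because lowest is returned by value rather than borrowed by println!. That is an untested expectation, not a measured result.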

For comparison, both GCC and Clang can optimize a similar C++ example:

#include <stdlib.h>
#include <iostream>

using namespace std;

int main() {
    float list[16];
    for(size_t i = 0; i < 16; ++i) {
        list[i] = rand();
    }

    float lowest = 1000.0f;

    for (size_t i = 0; i < 16; ++i) {
        
        /* Variant A: */
        //lowest = list[i] < lowest ? list[i] : lowest;

        /* Variant B: */
        if (list[i] < lowest) {
            lowest = list[i];
        }
    }

    cout << lowest;
}

Both compilers generate minss instructions for both variants (Godbolt).

I wasn't sure whether rustc or LLVM is responsible for this behaviour. However, after a quick glance at the generated LLVM IR, I'm leaning towards rustc: in the first case it emits fcmp and select instructions, while in the second it emits fcmp and br.
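The issue title attributes the difference to println!: the formatting macros take their arguments by reference, so the address of lowest escapes into the formatting machinery and, presumably, the optimizer can no longer keep it purely in a register. A minimal sketch of one way to test that hypothesis, assuming it is correct, is to print a copy instead of lowest itself. make_list is a hypothetical, rand-free stand-in for the random initialization; it uses std::hint::black_box (stable since Rust 1.66) to keep the values opaque to the optimizer.

```rust
use std::hint::black_box;

/// Hypothetical, `rand`-free stand-in for the issue's random initialization.
/// `black_box` keeps the values opaque to the optimizer so the loop below
/// is not constant-folded away.
fn make_list() -> [f32; 16] {
    let mut list = [0.0f32; 16];
    for i in 0..list.len() {
        list[i] = (i as f32 + 1.0) * 0.005;
    }
    black_box(list)
}

fn main() {
    let list = make_list();
    let mut lowest = f32::INFINITY;

    // Variant B from the issue: the branch form that produced ucomiss/ja.
    for i in 0..list.len() {
        if list[i] < lowest {
            lowest = list[i];
        }
    }

    // `{ lowest }` is a block expression that copies the value into a
    // temporary, so `println!` borrows that temporary instead of `lowest`.
    println!("{}", { lowest });
}
```

If the hypothesis holds, the loop in this version should be lowered to minss like Variant A; that is an assumption to verify by comparing the --release assembly with and without the braces.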

What do you think?

Metadata

Assignees: No one assigned

Labels:
- A-LLVM (Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues.)
- A-codegen (Area: Code generation)
- C-enhancement (Category: An issue proposing an enhancement or a PR with one.)
- C-optimization (Category: An issue highlighting optimization opportunities or PRs implementing such)
- I-slow (Issue: Problems and improvements with respect to performance of generated code.)
- T-compiler (Relevant to the compiler team, which will review and decide on the PR/issue.)

Type: No type
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests
