Description
This weekend I ran some benchmarks on some of my code. After making a seemingly insignificant code change, I noticed a small but measurable performance regression. After investigating the generated assembly, I stumbled upon a case where the compiler emits suboptimal code.
This minimal example shows the same behaviour (Playground link):
extern crate rand;

use std::f32;
use rand::Rng;

fn main() {
    let mut list = [0.0; 16];
    let mut rg = rand::thread_rng();

    // Random initialization to prevent the compiler from optimizing the whole example away
    for i in 0..list.len() {
        list[i] = rg.gen_range(0.0, 0.1);
    }

    let mut lowest = f32::INFINITY;
    for i in 0..list.len() {
        lowest = if list[i] < lowest { // <<<<<<<<<<<<<<<
            list[i]
        } else {
            lowest
        };
    }

    println!("{}", lowest);
}
When compiling with the --release flag, the compiler generates the following instructions for the marked block:
...
minss %xmm0, %xmm1
movss 88(%rsp), %xmm0
minss %xmm1, %xmm0
movss 92(%rsp), %xmm1
...
However, if I replace those lines with the following:
if list[i] < lowest {
    lowest = list[i];
}
the compiler emits a strange series of float compare and jump instructions:
.LBB5_38:
movss 92(%rsp), %xmm1
ucomiss %xmm1, %xmm0
ja .LBB5_39
...
.LBB5_42:
movss 100(%rsp), %xmm1
ucomiss %xmm1, %xmm0
ja .LBB5_43
...
.LBB5_39:
movss %xmm1, 12(%rsp)
movaps %xmm1, %xmm0
movss 96(%rsp), %xmm1
ucomiss %xmm1, %xmm0
jbe .LBB5_42
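To compare the two forms in isolation, both variants can be pulled out into standalone functions. This is only a minimal sketch: the function names lowest_select and lowest_branch are hypothetical, the fixed-size array parameter is an assumption chosen to mirror the example above, and #[inline(never)] is there only so that each function keeps its own symbol in the assembly output. Whether these functions reproduce exactly the same instructions as the inlined loops in main may depend on the compiler version.

```rust
/// Variant A: the minimum is assigned from an `if`/`else` expression.
#[inline(never)]
pub fn lowest_select(list: &[f32; 16]) -> f32 {
    let mut lowest = f32::INFINITY;
    for i in 0..list.len() {
        lowest = if list[i] < lowest { list[i] } else { lowest };
    }
    lowest
}

/// Variant B: the minimum is updated inside a conditional block.
#[inline(never)]
pub fn lowest_branch(list: &[f32; 16]) -> f32 {
    let mut lowest = f32::INFINITY;
    for i in 0..list.len() {
        if list[i] < lowest {
            lowest = list[i];
        }
    }
    lowest
}
```

Both functions return the same value for any input, including arrays that contain NaN, so any difference in the generated code comes purely from how the two forms are lowered.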
For comparison, both gcc and clang optimize a similar C++ example:
#include <stdlib.h>
#include <iostream>

using namespace std;

int main() {
    float list[16];
    for (size_t i = 0; i < 16; ++i) {
        list[i] = rand();
    }

    float lowest = 1000.0f;
    for (size_t i = 0; i < 16; ++i) {
        /* Variant A: */
        //lowest = list[i] < lowest ? list[i] : lowest;

        /* Variant B: */
        if (list[i] < lowest) {
            lowest = list[i];
        }
    }

    cout << lowest;
}
Both compilers generate minss instructions for both variants (Godbolt).
I wasn't sure whether rustc or LLVM was responsible for this behaviour. However, after a quick glance at the generated LLVM IR, I'm leaning towards rustc, since in the first case it emits fcmp and select instructions, while in the latter it generates fcmp and br.
What do you think?