I got into a discussion with some actuaries about different implementations of a simple actuarial modeling problem. We all implemented a simple 10-period net present value calculation. Here is my implementation:
pub fn npv(
    mortality_rates: &Vec<f64>,
    lapse_rates: &Vec<f64>,
    interest_rate: f64,
    sum_assured: f64,
    premium: f64,
    init_pols: f64,
    term: Option<usize>,
) -> f64 {
    let term = term.unwrap_or_else(|| mortality_rates.len());
    let mut result = 0.0;
    let mut inforce = init_pols;
    let v: f64 = 1.0 / (1.0 + interest_rate);
    for (t, (q, w)) in mortality_rates.iter().zip(lapse_rates).enumerate() {
        let no_deaths = if t < term { inforce * q } else { 0.0 };
        let no_lapses = if t < term { inforce * w } else { 0.0 };
        let premiums = inforce * premium;
        let claims = no_deaths * sum_assured;
        let net_cashflow = premiums - claims;
        result += net_cashflow * v.powi(t as i32);
        inforce = inforce - no_deaths - no_lapses;
    }
    result
}
I benchmarked it, and it ran in 364.35 ns.
Another person re-implemented my version in Julia:
function npv3(q, w, P, S, r, term=nothing)
    term = isnothing(term) ? length(q) + 1 : term + 1
    inforce = 1.0
    result = 0.0
    for (t, (q, w)) in enumerate(zip(q, w))
        deaths = t < term ? inforce * q : 0.0
        lapses = t < term ? inforce * w : 0.0
        premiums = inforce * P
        claims = deaths * S
        ncf = premiums - claims
        result += ncf * (1 / (1 + r))^(t - 1)
        inforce = inforce - deaths - lapses
    end
    return result
end
This user benchmarked his version and found that it ran in 135 ns. This isn't really about deciding between Rust and Julia, but it is interesting. He is using an M1 Mac where Julia runs under Rosetta (I'm on Windows 10), so it's hard to compare these two implementations in a meaningful way.
My question is: is there anything obvious that I am missing in the Rust implementation that could improve it? If anyone has an M1 Mac and could benchmark on their machine, that would be really interesting.
One suggestion: take &[f64] instead of &Vec<f64>. Even if it doesn't improve speed, it's better style: it's more general, since it allows passing subslices instead of requiring full vectors, at essentially no cost (not even extra syntax at the call site, as &Vec<T> will deref-coerce to &[T] automatically).
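For illustration, a minimal sketch of that coercion (the function and values here are made up, not from the thread):

fn total(xs: &[f64]) -> f64 {
    xs.iter().sum()
}

fn main() {
    let v: Vec<f64> = vec![1.0, 2.0, 3.0];
    let all = total(&v);            // &Vec<f64> deref-coerces to &[f64]
    let first_two = total(&v[..2]); // subslices work with no extra ceremony
    println!("{all} {first_two}");
}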
Can you share how you got that measurement? Anything below about 16 ms is extremely sensitive to external factors, and thus rather hard to benchmark reliably. Obligatory mention of Criterion.rs (see the Criterion.rs documentation).
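For reference, a minimal Criterion harness could look something like this sketch (the crate name npv_bench and the input values are made up); it would live in benches/npv.rs, with criterion as a dev-dependency and harness = false for the bench target:

use criterion::{black_box, criterion_group, criterion_main, Criterion};
use npv_bench::npv; // hypothetical crate name

fn bench_npv(c: &mut Criterion) {
    // Arbitrary 10-period assumptions; black_box keeps the inputs
    // out of the optimizer's reach.
    let q = vec![0.001; 10];
    let w = vec![0.05; 10];
    c.bench_function("npv", |b| {
        b.iter(|| {
            npv(
                black_box(&q),
                black_box(&w),
                black_box(0.03),
                black_box(100_000.0),
                black_box(500.0),
                black_box(1.0),
                black_box(None),
            )
        })
    });
}

criterion_group!(benches, bench_npv);
criterion_main!(benches);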
You should probably run the two programs on the same machine first to get comparable benchmarks, preferably a machine without much else running at the same time. (Edit: and of course, don't forget the --release flag!)
Yes, I'm doing that. I'll have to install Julia and re-run his benchmarks to be able to compare, I guess. I was just checking that I wasn't doing anything obviously wrong.
If you know you'll always have sane (i.e. not NaN) inputs and are okay with slightly less accurate results, you can use the Rust equivalent of the -ffast-math flag provided by most C compilers. These "fast" floating-point operations are exposed through nightly intrinsics like std::intrinsics::fadd_fast() and give the compiler more flexibility in reordering operations.
Crates like fast-floats have wrappers around f32/f64 which overload the normal arithmetic operators to use the fast float intrinsics.
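As a rough illustration, a hypothetical helper (not code from the thread) using those intrinsics; they are nightly-only and unsafe, with undefined behaviour on NaN/infinite inputs:

#![feature(core_intrinsics)] // at crate root; nightly only
use std::intrinsics::{fadd_fast, fmul_fast};

// Discounts flows[t] by v^t and sums, using relaxed float semantics so
// the compiler may reorder or fuse operations. Results can differ
// slightly from the strict-IEEE version.
pub fn discounted_sum_fast(flows: &[f64], v: f64) -> f64 {
    let mut result = 0.0;
    let mut v_t = 1.0;
    for &cf in flows {
        // SAFETY: caller guarantees all inputs are finite (no NaN/Inf).
        unsafe {
            result = fadd_fast(result, fmul_fast(cf, v_t));
            v_t = fmul_fast(v_t, v);
        }
    }
    result
}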
In addition to fast math (to enable fused multiply-add and associativity, plus other transformations that may be unsafe/undesirable), I'd also recommend compiling for your native CPU, as in RUSTFLAGS="-C target-cpu=native" cargo bench.
I profiled the implementation and found that powi was taking most of the time. I changed it to the following implementation:
pub fn npv(
    mortality_rates: &[f64],
    lapse_rates: &[f64],
    interest_rate: f64,
    sum_assured: f64,
    premium: f64,
    init_pols: f64,
    term: Option<usize>,
) -> f64 {
    let term = term.unwrap_or_else(|| mortality_rates.len());
    let mut result = 0.0;
    let mut inforce = init_pols;
    let v = 1.0 / (1.0 + interest_rate);
    // Running discount factor; starts at v^0 = 1.0 so the first period
    // matches the v.powi(t) version above.
    let mut v_t = 1.0;
    for (t, (q, w)) in mortality_rates.iter().zip(lapse_rates).enumerate() {
        let no_deaths = if t < term { inforce * q } else { 0.0 };
        let no_lapses = if t < term { inforce * w } else { 0.0 };
        let premiums = inforce * premium;
        let claims = no_deaths * sum_assured;
        let net_cashflow = premiums - claims;
        result += net_cashflow * v_t;
        v_t *= v;
        inforce = inforce - no_deaths - no_lapses;
    }
    result
}
Now it's down to 31 ns.
I know it no longer has to recompute the power on each iteration, since it carries the discount factor forward from the previous iteration, but would there also be a difference in the implementation of powi on Arch vs. Windows?
Hmm, I wonder if this is because of float non-associativity.
powi is an LLVM intrinsic (https://rust.godbolt.org/z/jsdMex) that calls a function from compiler-rt, which (if we're using the C implementation and not the Rust one, which I don't remember) is implemented like this:
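Roughly, it's a square-and-multiply loop; here is a rough Rust transliteration of compiler-rt's __powidf2 for reference (the real version lives in compiler-rt / the compiler-builtins crate):

// Square-and-multiply: O(log b) multiplications instead of b of them.
fn powidf2(mut a: f64, mut b: i32) -> f64 {
    let recip = b < 0;
    let mut r = 1.0;
    loop {
        if b & 1 != 0 {
            r *= a;
        }
        b /= 2;
        if b == 0 {
            break;
        }
        a *= a;
    }
    if recip { 1.0 / r } else { r }
}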
That's really good for computing it once, but because of floating-point semantics the compiler technically isn't allowed to replace that with the running-product version in your alternative implementation, which can accumulate more error.
That said, the intrinsic's definition does say:
The order of evaluation of multiplications is not defined.
So maybe it's supposed to be able to do this anyway?
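Either way, the underlying issue is easy to demonstrate: floating-point arithmetic is not associative (addition shown here; multiplication behaves the same way):

fn main() {
    let left = (0.1_f64 + 0.2) + 0.3;  // 0.6000000000000001
    let right = 0.1_f64 + (0.2 + 0.3); // 0.6
    assert_ne!(left, right); // re-association changes the result
    println!("{left} vs {right}");
}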
An easy way to check whether your refactoring changed the output would be to fuzz the old and new implementations against each other. (Though I think refactoring the code was a bit out of scope here, since the goal was to compare the exact same algorithm ^^)
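A sketch of what that could look like with the proptest crate (npv_old and npv_new are hypothetical names for the powi version and the running-product version):

use proptest::prelude::*;

proptest! {
    #[test]
    fn implementations_agree(
        // Random per-period (mortality, lapse) pairs and an interest rate.
        rates in prop::collection::vec((0.0..0.1f64, 0.0..0.2f64), 1..50),
        r in 0.0..0.2f64,
    ) {
        let (q, w): (Vec<f64>, Vec<f64>) = rates.into_iter().unzip();
        let old = npv_old(&q, &w, r, 100_000.0, 500.0, 1.0, None);
        let new = npv_new(&q, &w, r, 100_000.0, 500.0, 1.0, None);
        // Bitwise equality is too strict once operations are re-associated;
        // compare within a small relative tolerance instead.
        prop_assert!((old - new).abs() <= 1e-9 * old.abs().max(1.0));
    }
}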
Sorry, but you did: after the fold you select the second value of the result tuple (equivalent to v_t), but you actually needed the first, by using .0 instead of .1. Since v_t is only ever multiplied by a constant, the compiler can remove all the other work, which of course is much faster.
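A hypothetical sketch of that kind of fold (not the actual code under discussion), where the accumulator carries the NPV in .0 and the running discount factor in .1:

pub fn npv_fold(q: &[f64], w: &[f64], r: f64, s: f64, p: f64, init_pols: f64) -> f64 {
    let v = 1.0 / (1.0 + r);
    q.iter()
        .zip(w)
        // Accumulator is (result, v_t, inforce).
        .fold((0.0, 1.0, init_pols), |(result, v_t, inforce), (q, w)| {
            let deaths = inforce * q;
            let lapses = inforce * w;
            let ncf = inforce * p - deaths * s;
            (result + ncf * v_t, v_t * v, inforce - deaths - lapses)
        })
        .0 // .1 here would return the discount factor, a power of a
           // constant, and the optimizer can then discard everything else
}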
Can I just point out that it's absolutely awesome that someone goes "hey, why is this code slow?", and the community spends over a week optimizing the living hell out of it just to show that it's possible?