Hi,
I compile my code with the following commands …
includes="$NVCOMPILERS/$NVARCH/20.5/compilers/include-stdpar"
compile="nvc++ -Wall -fast -I $includes"
$compile -DPOLICY=std::execution::par -o main_par main.cpp
This enables me to define different policies each time so I can compare performance.
The loop of my code (ignoring various dependencies) is …
std::transform
(
    POLICY,
    thrust::counting_iterator<unsigned>(0),
    thrust::counting_iterator<unsigned>(num_samples),
    result->data(),
    [=](unsigned index) -> result_scalar_type
    {
        result_scalar_type k = (index * max_k) / num_samples;
        result_scalar_type n = initial_value;
        for (unsigned i = 0; i < iterations; i++)
            n = k * n * (1.0 - n);
        return n;
    }
);
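In case it helps anyone reproduce this, here is a minimal self-contained sketch of the same loop using only the standard library. It is not my actual code. The `double` alias for result_scalar_type, the helper names `logistic_sample` and `run`, the std::iota index vector standing in for thrust::counting_iterator, and the `#ifdef POLICY` fallback are all assumptions made for illustration.

```cpp
#include <algorithm>
#include <numeric>
#include <vector>
#ifdef POLICY
#include <execution>  // only needed when a policy is passed via -DPOLICY=...
#endif

// Assumption: the real result_scalar_type is not shown in the post.
using result_scalar_type = double;

// One sample of the logistic map n -> k * n * (1 - n), same body as the lambda.
// (Hypothetical helper name.)
inline result_scalar_type logistic_sample(unsigned index,
                                          unsigned num_samples,
                                          result_scalar_type max_k,
                                          result_scalar_type initial_value,
                                          unsigned iterations)
{
    result_scalar_type k = (index * max_k) / num_samples;
    result_scalar_type n = initial_value;
    for (unsigned i = 0; i < iterations; i++)
        n = k * n * (1.0 - n);
    return n;
}

// Evaluates every sample and returns the results. (Hypothetical helper name.)
inline std::vector<result_scalar_type> run(unsigned num_samples,
                                           result_scalar_type max_k,
                                           result_scalar_type initial_value,
                                           unsigned iterations)
{
    // Plain index vector 0..num_samples-1, a stand-in for the counting iterator.
    std::vector<unsigned> indices(num_samples);
    std::iota(indices.begin(), indices.end(), 0u);

    std::vector<result_scalar_type> result(num_samples);
    auto kernel = [=](unsigned index) -> result_scalar_type
    {
        return logistic_sample(index, num_samples, max_k, initial_value, iterations);
    };

#ifdef POLICY
    std::transform(POLICY, indices.begin(), indices.end(), result.begin(), kernel);
#else
    // No policy given: plain serial std::transform.
    std::transform(indices.begin(), indices.end(), result.begin(), kernel);
#endif
    return result;
}
```

Calling something like `run(1000000, 4.0, 0.5, 1000)` (values picked arbitrarily) and timing it should be enough to compare policies. Building with -DPOLICY=std::execution::par, as in the compile line above, selects the policy overload; without -DPOLICY the sketch falls back to the serial call.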
If I compile with --stdpar, the code is parallelized on the GPU. Without --stdpar, however, there is no difference between the par and seq execution policies. I would have expected the par version to be spread across the CPU cores and threads.
Am I doing something wrong?
Thanks,
Leigh.