[Omp] Overhead of #pragma omp for static nowait
Greg Bronevetsky
greg at bronevetsky.com
Fri Dec 8 12:47:38 PST 2006
I have recently run the EPCC microbenchmarks on several machines and
noticed a consistent overhead of ~1us (several thousand cycles) for
#pragma omp for schedule(static) nowait and its variants on every
platform I've tried. Given the simplicity of this scheduling policy, it
seems to me that it should be possible to convert the parallel loop into a
set of serial loops, one per thread, at compile time. This would result in
a loop that requires no inter-thread communication and costs only a few
tens of cycles.
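To make the idea concrete, here is a minimal sketch of the kind of lowering I mean, assuming a block partition in which each thread gets n / nthreads iterations and the first n % nthreads threads get one extra. That split is my assumption: the OpenMP spec only requires roughly equal chunks with at most one per thread and leaves the exact split to the implementation. The names static_bounds and lowered_static_nowait are hypothetical, not part of any OpenMP runtime.

```c
#include <assert.h>

#ifdef _OPENMP
#include <omp.h>
#else
/* Serial fallback so the sketch also builds without -fopenmp. */
static int omp_get_thread_num(void) { return 0; }
static int omp_get_num_threads(void) { return 1; }
#endif

/* Hypothetical helper: compute the half-open range [*lo, *hi) of iterations
 * 0..n-1 owned by thread `tid` out of `nthreads`. Assumed block split:
 * every thread gets n / nthreads iterations, the first n % nthreads threads
 * get one more. Depends only on its arguments, so no communication. */
static void static_bounds(long n, int tid, int nthreads, long *lo, long *hi)
{
    assert(nthreads > 0 && tid >= 0 && tid < nthreads);
    long q = n / nthreads;
    long r = n % nthreads;
    *lo = tid * q + (tid < r ? tid : r);
    *hi = *lo + q + (tid < r ? 1 : 0);
}

/* Hypothetical lowering of
 *     #pragma omp for schedule(static) nowait
 *     for (i = 0; i < n; i++) body(i);
 * into a plain serial loop over this thread's own range: no shared loop
 * counter, no locks or atomics, and no barrier afterwards (nowait). */
static void lowered_static_nowait(long n, void (*body)(long))
{
    long lo, hi;
    static_bounds(n, omp_get_thread_num(), omp_get_num_threads(), &lo, &hi);
    for (long i = lo; i < hi; i++)
        body(i);
}
```

Inside a #pragma omp parallel region, each thread would call lowered_static_nowait(50, work) directly; with 4 threads and 50 iterations the ranges come out as [0,13), [13,26), [26,38) and [38,50). The only per-loop cost beyond the iterations themselves is two runtime queries and a few integer operations.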
What is the reason for this much-higher-than-expected overhead? Is it
simply that compilers do not typically perform the analysis described
above, or is there a more fundamental reason? Here at LLNL, we have
applications that would like to use OpenMP to parallelize loops with ~50
iterations and ~0.25us of work per iteration (~12.5us of work per loop in
total). Overheads of ~1us for #pragma omp for schedule(static) nowait make
OpenMP too expensive for this task.
Greg Bronevetsky