[Omp] Overhead of #pragma omp for static nowait
Greg Bronevetsky
greg at bronevetsky.com
Sun Dec 10 14:10:06 PST 2006
> How many threads are used and are they active when the parallel loop is
> hit?
Depending on the machine, 2, 4 or 8. Yes, they are all active at this
time.
> What is the call overhead of the omp_get_thread_num() and
> omp_get_num_threads() and any other calls the compiler might have to insert?
I think that those two would be sufficient to implement this type of
functionality. I've tried it on the IA32 machine and a Power5 machine:
on IA32 both calls cost ~110 ns (~300 cycles), while on the Power5 it's
~31 ns (~60 cycles). There's no reason to call either of these functions
more than once, since their results can be saved in a temporary variable.
Furthermore, since these functions should just be returning the value of
some variable, I'm surprised that they're taking as much time as they are.
Given that the base cost (as opposed to the per-iteration cost) of a
#pragma omp for schedule(static) nowait is on the order of microseconds,
the cost of calls to omp_get_thread_num() and omp_get_num_threads()
doesn't seem to account for this overhead.
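
To make the comparison concrete, here is a rough sketch of how a static,
nowait-style loop could be written by hand with just those two calls,
each made once per thread and saved in a temporary. This is only an
illustration, not what Intel 9.1 or xlC actually generate. The names
static_chunk() and scale_array() are hypothetical, and the block
partition shown is just one split that a static schedule might use; a
real runtime may divide the iterations differently. The sketch assumes
n >= 0.

```c
#include <assert.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical helper: compute the half-open range [*begin, *end) of
 * iterations owned by thread `tid` out of `nthreads` when n iterations
 * are split into contiguous blocks whose sizes differ by at most one. */
static void static_chunk(long n, int tid, int nthreads,
                         long *begin, long *end)
{
    assert(nthreads > 0 && tid >= 0 && tid < nthreads);
    long base  = n / nthreads;   /* iterations every thread gets      */
    long extra = n % nthreads;   /* first `extra` threads get one more */
    *begin = tid * base + (tid < extra ? tid : extra);
    *end   = *begin + base + (tid < extra ? 1 : 0);
}

/* Hypothetical example loop: scale a[0..n) by s, with the work split
 * by hand instead of by "#pragma omp for schedule(static) nowait". */
void scale_array(double *a, long n, double s)
{
#pragma omp parallel
    {
#ifdef _OPENMP
        /* Each runtime call is made once and cached in a local. */
        int tid      = omp_get_thread_num();
        int nthreads = omp_get_num_threads();
#else
        /* Built without OpenMP: behave as a single thread. */
        int tid = 0, nthreads = 1;
#endif
        long begin, end;
        static_chunk(n, tid, nthreads, &begin, &end);

        for (long i = begin; i < end; i++)
            a[i] *= s;
        /* No barrier after the loop, mirroring nowait. The implicit
         * barrier at the end of the parallel region still applies. */
    }
}
```

With this shape, the fixed cost per thread is two runtime calls plus a
few integer operations, which is the baseline the measured overhead of
the pragma version can be compared against.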
> Have you looked at what the compiler is generating?
> What compilers have you tried?
>
Intel 9.1 and IBM xlC 5
Greg Bronevetsky