[Omp] Overhead of #pragma omp for static nowait
Greg Bronevetsky
greg at bronevetsky.com
Mon Dec 11 11:24:57 PST 2006
What kind of false sharing issues? Are you referring to my code below or
to typical OpenMP implementations?
Greg Bronevetsky
On Mon, 11 Dec 2006, Dieter an Mey wrote:
> Greg,
> if you really use static(1) you may hit false sharing issues ...
>
> Dieter
>
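One reading of this remark, sketched under stated assumptions: with a chunk size of 1, neighbouring iterations go to different threads, so if the loop body writes `a[i]` into a `double` array, several threads write into the same cache line. The 64-byte line size, the `double` array, and the helper names below are assumptions for illustration and do not come from the thread.

```c
#include <stddef.h>

/* Assumed cache-line size in bytes; real hardware may differ. */
#define CACHE_LINE_BYTES 64

/* Thread that executes iteration i under a cyclic, chunk-size-1 schedule. */
static int owner_cyclic(int i, int nthreads) {
    return i % nthreads;
}

/* Index of the cache line holding a[i], assuming a[] holds doubles and
   starts on a cache-line boundary. */
static int cache_line_of(int i) {
    return (int)((size_t)i * sizeof(double) / CACHE_LINE_BYTES);
}

/* Number of distinct threads that write into the cache line holding a[i]
   in an n-iteration loop. A result above 1 means the line has several
   writers. Returns -1 if nthreads is outside the range this sketch handles. */
static int writers_on_line(int i, int n, int nthreads) {
    int seen[64] = {0};
    int count = 0;
    int line = cache_line_of(i);
    if (nthreads < 1 || nthreads > 64) return -1;
    for (int j = 0; j < n; j++) {
        if (cache_line_of(j) != line) continue;
        int t = owner_cyclic(j, nthreads);
        if (!seen[t]) { seen[t] = 1; count++; }
    }
    return count;
}
```

With 8-byte doubles, 50 iterations and 4 threads, `writers_on_line(0, 50, 4)` reports that all 4 threads write into the first line. Under a block schedule, only lines that straddle a block boundary would have more than one writer.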
> Greg Bronevetsky wrote:
> > I mean the following compiler transformation:
> > #pragma omp for schedule(static,1) nowait
> > for(int i=0; i<n; i++){}
> > should become:
> > for(int i=omp_get_thread_num(); i<n; i+=omp_get_num_threads())
> > {}
> >
> > and
> > #pragma omp for schedule(static) nowait
> > for(int i=0; i<n; i++){}
> > should become:
> > // number of threads that get 1 more iteration than the others
> > // (equivalently, the id of the first thread that gets no extra one)
> > int midPoint=n%omp_get_num_threads();
> > // number of iterations assigned to threads with smaller ids
> > int itersBeforeMe;
> > if(omp_get_thread_num()<midPoint)
> >   itersBeforeMe = omp_get_thread_num()*(n/omp_get_num_threads()+1);
> > else
> >   itersBeforeMe = midPoint*(n/omp_get_num_threads()+1)+
> >     (omp_get_thread_num()-midPoint)*(n/omp_get_num_threads());
> > // number of iterations assigned to this thread
> > int numIter;
> > if(omp_get_thread_num()<midPoint)
> >   numIter = n/omp_get_num_threads()+1;
> > else
> >   numIter = n/omp_get_num_threads();
> >
> > for(int i=itersBeforeMe; i<itersBeforeMe+numIter; i++)
> > {}
> >
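The two transformations can be checked in isolation. The following is a minimal sketch, not any compiler's actual lowering: it takes the thread id and thread count as plain parameters instead of calling `omp_get_thread_num()` and `omp_get_num_threads()`, so it builds without an OpenMP runtime. All helper names are hypothetical, and it uses a strict `tid < n % nthreads` test to decide which threads receive an extra iteration.

```c
#include <assert.h>

/* First iteration of thread tid under the block ("static") partition:
   the first n % nthreads threads each get one extra iteration. */
static int block_start(int tid, int n, int nthreads) {
    int base = n / nthreads;
    int extra = n % nthreads;
    return tid < extra ? tid * (base + 1)
                       : extra * (base + 1) + (tid - extra) * base;
}

/* Number of iterations of thread tid under the block partition. */
static int block_count(int tid, int n, int nthreads) {
    return n / nthreads + (tid < n % nthreads ? 1 : 0);
}

/* Returns 1 if the per-thread block loops together run every iteration
   0..n-1 exactly once, else 0. Sketch limit: n <= 1024. */
static int block_covers_once(int n, int nthreads) {
    int hits[1024] = {0};
    assert(n >= 0 && n <= 1024 && nthreads > 0);
    for (int tid = 0; tid < nthreads; tid++) {
        int start = block_start(tid, n, nthreads);
        int count = block_count(tid, n, nthreads);
        for (int i = start; i < start + count; i++)
            hits[i]++;
    }
    for (int i = 0; i < n; i++)
        if (hits[i] != 1) return 0;
    return 1;
}

/* Same check for the cyclic (chunk size 1) loop:
   thread tid runs i = tid, tid + nthreads, tid + 2*nthreads, ... */
static int cyclic_covers_once(int n, int nthreads) {
    int hits[1024] = {0};
    assert(n >= 0 && n <= 1024 && nthreads > 0);
    for (int tid = 0; tid < nthreads; tid++)
        for (int i = tid; i < n; i += nthreads)
            hits[i]++;
    for (int i = 0; i < n; i++)
        if (hits[i] != 1) return 0;
    return 1;
}
```

For n=10 and 4 threads, the block partition gives threads 0 and 1 three iterations each and threads 2 and 3 two each, starting at 0, 3, 6 and 8.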
> > Other chunk sizes or loop bounds would involve more complex arithmetic to
> > set up loop bounds but the basic idea is pretty much the same. The overall
> > cost of the above implementation of "#pragma omp for schedule(static,1) nowait"
> > should be several ns per iteration. However, I am seeing much higher
> > overheads in my experiments.
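As an illustration of the "more complex arithmetic" for other chunk sizes, here is a minimal sketch of a round-robin schedule with an arbitrary chunk size. It assumes a loop from 0 to n-1 with unit stride and passes the thread id and thread count as parameters. The function names are hypothetical, and this is one possible lowering, not a description of what any particular compiler emits.

```c
/* Writes the iterations that thread tid runs into out[] (which must have
   room for n entries) and returns how many there are. Chunks of `chunk`
   consecutive iterations are dealt to threads in round-robin order. */
static int chunked_iterations(int tid, int n, int nthreads, int chunk,
                              int *out) {
    int count = 0;
    /* c is the first iteration of each chunk owned by this thread */
    for (int c = tid * chunk; c < n; c += nthreads * chunk) {
        int end = c + chunk < n ? c + chunk : n;  /* last chunk may be short */
        for (int i = c; i < end; i++)
            out[count++] = i;
    }
    return count;
}

/* Thread that owns iteration i under the same schedule. */
static int chunked_owner(int i, int nthreads, int chunk) {
    return (i / chunk) % nthreads;
}
```

With n=10, 2 threads and a chunk size of 3, thread 0 runs iterations 0-2 and 6-8, and thread 1 runs 3-5 and 9. Setting `chunk` to 1 reduces the outer loop to the cyclic loop shown earlier in the message.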
> >
> > Greg Bronevetsky
> >
> > On Fri, 8 Dec 2006, Meadows, Lawrence F wrote:
> >
> >> What do you mean by "converting to a set of serial loops"?
> >>
> >> -----Original Message-----
> >> From: omp-bounces at openmp.org [mailto:omp-bounces at openmp.org] On Behalf
> >> Of Greg Bronevetsky
> >> Sent: Friday, December 08, 2006 12:48 PM
> >> To: omp at openmp.org
> >> Subject: [Omp] Overhead of #pragma omp for static nowait
> >>
> >> I have recently executed the EPCC microbenchmarks on several machines and
> >> noticed that there is a consistent overhead of ~1us (several thousand
> >> cycles) for #pragma omp for static nowait and its variants on the
> >> platforms I've tried. Given the simplicity of this scheduling policy, it
> >> seems to me that it should be possible to convert the parallel loop into a
> >> set of serial loops at compile time. This would result in a loop that
> >> requires no inter-thread communication and costs only a few tens of cycles.
> >>
> >> What is the reason for this much higher than expected overhead? Is it just
> >> that the above compiler analysis is not typically performed, or is there a
> >> more fundamental reason? Here at LLNL, we have applications that would
> >> like to use OpenMP to parallelize loops with ~50 iterations and ~0.25us of
> >> work per iteration. ~1us overheads for the #pragma omp for static nowait
> >> make OpenMP too expensive for this task.
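To put these numbers side by side, here is a small sketch of the arithmetic. It assumes the ~1us overhead is paid once per loop by each thread and that the busiest thread determines the loop time. The thread counts in the usage note and the function name are assumptions for illustration; the message does not state them.

```c
/* Fraction of the busiest thread's loop time that goes to scheduling
   overhead, assuming the overhead is paid once per loop.
   work_us and overhead_us are in microseconds. */
static double overhead_fraction(int iters, double work_us, int nthreads,
                                double overhead_us) {
    /* iterations of the busiest thread: ceil(iters / nthreads) */
    int per_thread = (iters + nthreads - 1) / nthreads;
    double useful_us = per_thread * work_us;
    return overhead_us / (useful_us + overhead_us);
}
```

Under these assumptions, 50 iterations of 0.25us amount to 12.5us of serial work. With 4 threads the busiest thread does 13 iterations (3.25us), so `overhead_fraction(50, 0.25, 4, 1.0)` is a little under a quarter. With 8 threads it rises to over a third.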
> >>
> >> Greg Bronevetsky
> >>
> >> _______________________________________________
> >> Omp mailing list
> >> Omp at openmp.org
> >> http://openmp.org/mailman/listinfo/omp
> >>
> >>
> >
> > _______________________________________________
> > Omp mailing list
> > Omp at openmp.org
> > http://openmp.org/mailman/listinfo/omp
> >
>
> --
> --------------------------------------------------------------------
> Dieter an Mey
> High Performance Computing
> RWTH Aachen University
> Center for Computing and Communication
> Seffenter Weg 23, 52074 Aachen, Germany
> phone: ++49-(0)241-80-24377
> fax: ++49-(0)241-80-22134
> email: anmey at rz.rwth-aachen.de
> --------------------------------------------------------------------
>
>