[Omp] OpenMP optimisation of setting up and taking down parallel regions

Haab, Grant grant.haab at intel.com
Thu Oct 19 11:09:06 PDT 2006


Yes, the Intel implementation already pools threads in this way and
reuses the same team for each successive parallel region.
 
The overhead that cannot be removed in this way is that of the
fork and join operations at the beginning and end of each parallel
region (handing out work to threads and waiting for all threads to
arrive). Every active parallel region must do these two things
regardless of whether the threads are already set up and waiting.
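
To get a handle on how large this fork/join cost is on a given machine,
one could time many nearly empty parallel regions and take the average.
The following is only a rough sketch: the names region_overhead and
now_seconds are made up for illustration, and the result will vary with
the compiler, the number of threads and the system load.

```c
#include <time.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Wall-clock time in seconds; falls back to clock() (CPU time)
   when the file is compiled without OpenMP support. */
static double now_seconds(void)
{
#ifdef _OPENMP
    return omp_get_wtime();
#else
    return (double)clock() / CLOCKS_PER_SEC;
#endif
}

/* Hypothetical helper: average time in seconds spent entering and
   leaving one nearly empty parallel region. Returns -1.0 if the
   regions did not all execute. */
double region_overhead(int repetitions)
{
    int touched = 0;
    double start, elapsed;

    /* Warm-up region so that initial thread creation is not measured. */
    #pragma omp parallel
    {
        #pragma omp master
        touched = 0;
    }

    start = now_seconds();
    for (int r = 0; r < repetitions; r++) {
        /* Each pass pays one fork and one join. */
        #pragma omp parallel
        {
            /* Only the master thread updates the counter, so there is
               no race; the side effect keeps the region from being empty. */
            #pragma omp master
            touched++;
        }
    }
    elapsed = now_seconds() - start;

    return (touched == repetitions) ? elapsed / repetitions : -1.0;
}
```

Calling region_overhead(100000) and comparing the result with the time
taken by the actual work in one region gives a first estimate of whether
going parallel pays off.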
 
Another option is to use an "if" clause so that the region only goes
parallel (is active) when there is enough work to outweigh the
synchronization and setup overhead of a parallel region. For example:
 
#pragma omp parallel if (amount_of_work > threshold)
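
In context, the clause might be used as in the sketch below. The
function scale and the cutoff PARALLEL_THRESHOLD are hypothetical; a
sensible cutoff has to be found by measurement on the target machine.

```c
#include <stddef.h>

/* Hypothetical cutoff: at or below this many elements the loop is
   run by a single thread. The value here is only a placeholder. */
#define PARALLEL_THRESHOLD 10000

/* Multiply every element of a[0..n-1] by factor. */
void scale(double *a, size_t n, double factor)
{
    /* When the if expression is false, the region is still executed,
       but by a team of one thread, so the work is done serially. */
    #pragma omp parallel for if (n > PARALLEL_THRESHOLD)
    for (long i = 0; i < (long)n; i++) {
        a[i] *= factor;
    }
}
```

The result is the same for small and large inputs; only the number of
threads doing the work changes.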
 
However, if you could exploit a coarser-grained level of parallelism in
the algorithm (which would admittedly take more work to accomplish), you
may be able to get much better speedup.
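
One common way to move in that direction is to open a single parallel
region around a whole sequence of steps and use work-sharing constructs
inside it, so that the fork and join are paid once rather than once per
step. The sketch below assumes a made-up workload (the function relax
and its update formula are purely illustrative); the barriers inside the
region still cost something, so it is worth measuring.

```c
/* Hypothetical workload: nsteps sweeps over a[0..n-1], each followed
   by a small piece of serial bookkeeping. Returns the number of
   sweeps completed. */
int relax(double *a, long n, int nsteps)
{
    int steps_done = 0;   /* shared by the whole team */

    /* One fork/join for all sweeps instead of one per sweep. */
    #pragma omp parallel
    {
        /* step is declared inside the region, so each thread has its
           own copy and every thread walks through the same steps. */
        for (int step = 0; step < nsteps; step++) {
            /* Iterations are divided among the existing team; there is
               an implicit barrier at the end of the loop. */
            #pragma omp for
            for (long i = 0; i < n; i++) {
                a[i] = 0.5 * a[i] + 1.0;
            }

            /* Serial part: executed by exactly one thread, with an
               implicit barrier afterwards. */
            #pragma omp single
            {
                steps_done++;
            }
        }
    }
    return steps_done;
}
```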
 
- Grant

________________________________

From: omp-bounces at openmp.org [mailto:omp-bounces at openmp.org] On Behalf
Of davide rossetti
Sent: Thursday, October 19, 2006 10:37 AM
To: omp at openmp.org
Subject: Re: [Omp] OpenMP optimisation of setting up and taking down
parallel regions




On 10/19/06, Themos Tsikas <themos at nag.co.uk> wrote: 

	Yes, I thought of that. Of course, each function (this is in C) has many
	scopes, control statements and the like so that I have to carefully steer
	every thread to the right calls by setting up shared boolean variables to
	coordinate everything. In fact, every variable encountered all the way down
	to CPU_bound must be declared shared as well and lots of barriers put in to
	make sure that "if (var) foo1(..)" gets done properly. It's a big mess for a
	simple thing to want to do!
	
	Maybe I should do some benchmarking to get a handle on the latency of "pragma
	omp parallel". I am using the Intel compiler on IA-32 Linux.
	
	Any thoughts on whether a directive like 
	
	#pragma omp parallel deferred num_threads(4)
	
	would make sense? It would create and "prime" the threads but wait for a
	
	#pragma omp parallel
	
	before filling the required (library, kernel) data structures and allowing the
	threads to run.
	
	

I think that any sensible OpenMP implementation already behaves this
way. It pre-allocates a bunch of threads in a thread pool, possibly
sleeping on some mutex, and dispatches bits of work to the pool.


-- 
davide.rossetti at gmail.com ICQ:290677265 SKYPE:d.rossetti 