[Omp] OpenMP spec 2.5 seems to have incorrect flush example on page 12

Greg Bronevetsky greg at bronevetsky.com
Sat May 5 18:51:13 PDT 2007


Good point. I didn't worry about making sure that the events always happen
in this order in every execution. It should be pretty easy to modify that
example to make sure that thread 1 knows whether its write happened before
or after thread 0's read and only worry about the former case.

                             Greg Bronevetsky

On Sat, 5 May 2007, Hoeflinger, Jay P wrote:

> Greg,
> 
> I don't agree that the first example prints 42.  You have no guaranteed
> ordering between the threads, so the assignment to X on thread 1 can
> happen AFTER the value is printed on thread 0.  Sometimes it prints 42
> and sometimes 21.
> 
> Likewise, the second example doesn't always print 21.  If thread 1
> acquires the lock after thread 0 assigns X=r, it will print 42.
> 
> The last example has exactly the same indeterminacy!
> 
> The flush-without-a-list implied by the lock forces the assignment of 42
> to happen either before or after the printing, but that's all it does.
> The third example hasn't changed anything at all.
> 
> Jay
> 
> -----Original Message-----
> From: omp-bounces at openmp.org [mailto:omp-bounces at openmp.org] On Behalf
> Of Greg Bronevetsky
> Sent: Saturday, May 05, 2007 6:03 PM
> To: Meadows, Lawrence F
> Cc: omp at openmp.org
> Subject: Re: [Omp] OpenMP spec 2.5 seems to have incorrect flush example
> on page 12
> 
> I don't understand. This particular optimization doesn't move anything
> around. It introduces new operations that were not present in the
> original code. Furthermore, Jay's example specifically applies to
> flushes with lists. My example uses flushes without lists (the ones
> implied by lock operations), so it's not the same thing.
> 
>                              Greg Bronevetsky
> 
> On Sat, 5 May 2007, Meadows, Lawrence F wrote:
> 
> > Locks imply flush, right? You can't examine or set X without a lock or
> > a flush. Your first example is broken because the assignment to X by 
> > thread 0 is not guarded with a lock or a flush. It doesn't matter 
> > whether or not it is ever executed. The compiler is free to move 
> > things around if a store/load (whether executed or not) isn't guarded 
> > with a flush.
> > 
> > The actual act of execution is not the issue, I finally understand 
> > this after Jay pointed it out. If a path of execution includes a flush
> > of a variable then the compiler is not allowed to hoist loads or 
> > stores out of that path.
> > 
> > 
> > -----Original Message-----
> > From: omp-bounces at openmp.org [mailto:omp-bounces at openmp.org] On Behalf
> > Of Greg Bronevetsky
> > Sent: Saturday, May 05, 2007 2:45 PM
> > To: Marcel Beemster
> > Cc: omp at openmp.org
> > Subject: Re: [Omp] OpenMP spec 2.5 seems to have incorrect flush 
> > example on page 12
> > 
> > My understanding of what Marcel has been saying is that it applies 
> > equally to flush with and without a list. Marcel suggests that flush 
> > in general is a bad idea since it prevents certain sequential 
> > optimizations. Although Marcel suggests that programmers be limited
> > to barriers and the like, I believe that barriers are the only
> > synchronization construct that his optimization is compatible with.
> > In particular, locks, critical regions and ordered regions don't seem
> > to be compatible. For example, consider the following program, which
> > uses locks. (The loop is just like Marcel's in that X=1 is never
> > executed, but the compiler can't know that.)
> > 
> > (initially X=21)
> > Thread 0           Thread 1
> > --------           --------
> > while(?)           omp_set_lock(&l)
> >   if(?) X=1        X=42
> >                    omp_unset_lock(&l)
> > omp_set_lock(&l)
> > print X
> > omp_unset_lock(&l)
> > 
> > I think everyone can agree that the print on Thread 0 should output 
> > 42. However, Marcel's optimization can transform the above program 
> > into the one below:
> > (initially X=21)
> > Thread 0           Thread 1
> > --------           --------
> > r=X                omp_set_lock(&l)
> > while(?)           X=42
> >   if(?) r=1        omp_unset_lock(&l)
> > X=r
> > omp_set_lock(&l)
> > print X
> > omp_unset_lock(&l)
> > 
> > The result is that Thread 0 prints 21.
> > 
> > As such, we have a choice: keep this sequential optimization, or keep 
> > all synchronization constructs besides barrier.
> > 
> > There is one other option. Instead of the above transformation, do the
> > following:
> > (initially X=21)
> > Thread 0           Thread 1
> > --------           --------
> > r=X                omp_set_lock(&l)
> > X2=X               X=42
> > while(?)           omp_unset_lock(&l)
> >   if(?) r=1        
> > if(X2!=X) X=r
> > omp_set_lock(&l)
> > print X
> > omp_unset_lock(&l)
> > 
> > This would preserve the desired semantics at a pretty small reduction 
> > in sequential performance.
> > 
> >                              Greg Bronevetsky
> > 
> > On Sat, 5 May 2007, Marcel Beemster wrote:
> > 
> > > Larry wrote:
> > > > It really does seem specific to this particular optimization; I've
> > > > been trying to think of other cases. Is a flush required even if the 
> > > > assignment is never executed?
> > > 
> > > While Jakub wrote:
> > > > There are no OpenMP directives in the for( i = 0 ; i < 100 ; i++ )
> > > > loop, it can very well be moved to a new routine in some other
> > > > compilation unit, perhaps not built with OpenMP flags at all.  Are
> > > > you saying that because of OpenMP existence all similar loop
> > > > transformations are illegal?
> > > 
> > > I side with Jakub on this. This is really not specific to this 
> > > particular optimization. If an OMP compiler does not have this 
> > > freedom >between flushes<, then any optimization involving globally 
> > > visible objects (including malloced arrays residing in memory), also
> > > in library code that is free of OMP directives, has to be looked at 
> > > very carefully.
> > > 
> > > As a compiler writer, I was happy to read the intentions of the OMP 
> > > memory model at the start of 1.4, page 10. It says:
> > > 
> > >     "OpenMP provides a relaxed-consistency, shared-memory model. All
> > >     OpenMP threads have access to a place to store and retrieve
> > >     variables, called the memory. In addition, each thread
> > >     is allowed to have its own temporary view of the memory.
> > >     The temporary view of memory for each thread is not a required
> > >     part of the OpenMP memory model, but can represent any kind
> > >     of intervening structure, such as machine registers, cache,
> > >     or other local storage, between the thread and the memory.
> > >     The temporary view of memory allows the thread to cache
> > >     variables and thereby avoid going to memory for every reference
> > >     to a variable."
> > > 
> > > I interpret this as saying that an OMP compiler can be as aggressive
> > > in its optimizations as a compiler for sequential C or Fortran, 
> > > between points where the application programmer explicitly places 
> > > OMP directives. I also believe that this is what we all should want,
> > > because we should not start off our parallelization effort with a 
> > > built-in disadvantage over sequential code.
> > > 
> > > The implication of allowing compilers to do such optimizations is 
> > > that the page 12 communication of shared variables example should be
> > > removed from the OMP specification. I argue that this is not a great
> > > loss: we lose the ability to communicate a shared variable between 
> > > essentially >unsynchronized< parallel threads. It is actually 
> > > non-trivial to construct a program that performs such communication,
> > > see my example code. If you want to do this, use volatile.
> > > 
> > > In that case, the explicit and unsynchronized "#pragma omp flush" 
> > > also becomes meaningless and must be removed from OMP. The flush 
> > > only has use when it occurs synchronized between two or more 
> > > threads, for example implied at a barrier. I am not a fan of suddenly
> > > removing well-recognized features from language specifications, but 
> > > it really is the only logical outcome of wanting OMP compilers to do
> > > optimizations (in an equally aggressive way as their sequential 
> > > counterparts).
> > > 
> > > Marcel
> > > 
> > > 
> > > --
> > > Dr. Marcel Beemster, Senior Software Engineer, marcel at ace.nl, www.ace.nl
> > > Associated Compiler Experts bv. Amsterdam, Netherlands. +31 20 6646416.
> > > -----------------------------------------------------------------------
> > > This e-mail and any files transmitted with it are confidential. Any
> > > technical information contained herein is supplied as-is, and no rights
> > > can be derived therefrom. If you have received this message in error,
> > > please notify the sender by reply e-mail immediately, and delete the
> > > message and all copies thereof.
> > > 
> > > 
> > > _______________________________________________
> > > Omp mailing list
> > > Omp at openmp.org
> > > http://openmp.org/mailman/listinfo/omp
> > > 