[Omp] OpenMP spec 2.5 seems to have incorrect flush example on page 12

Greg Bronevetsky greg at bronevetsky.com
Mon May 7 11:32:51 PDT 2007


Actually, I disagree with both Larry and Jay. This discussion was not
about flushes in particular and would not be resolved if locks were used
instead of flushes (though it would be resolved if barriers were used).
The point of this discussion was whether a given compiler optimization was
legal. If it were, then this would have implications for the memory model
(i.e. the user might need to be more careful with their inter-thread
communication). If not, we'd have less efficient code.

BTW, if you don't want to go through the details, the summary is in the
last paragraph.

Consider the following piece of code:
X=21;
#pragma omp barrier
if(omp_get_thread_num()==0)
{
   omp_set_lock(&l);
   X=42;
   omp_unset_lock(&l);
}
else if(omp_get_thread_num()==1)
{
   while(?) 
      if(0) { X=1; }
   omp_set_lock(&l);
   print(X);
   omp_unset_lock(&l);
}

This code has a potential data race, which can emerge in the following
execution. Note that for the purposes of the remaining discussion I'll
assume that thread 0 acquires the lock first. I can make an example that
doesn't depend on this assumption but it would require extra code that
obscures the main point.
(Initially X=21)
Thread 1            Thread 0
--------            --------
                    omp_set_lock(&l);
X=1;                X=42;
                    omp_unset_lock(&l);
omp_set_lock(&l);
print(X);
omp_unset_lock(&l);

Now, you may say that this race is ridiculous. The write "X=1;" can only
happen if "if(0)" evaluates to true, which will never happen in any
execution. However, consider the following compiler optimization.

The compiler sees the above code and is not smart enough to realize that 
"if(0)" will never evaluate to true (for realistic compilers we need a
more complex condition but the point remains). However, it does see that X
is being used a lot in the loop and as such, places it into a register for
the duration of the loop and copies the value back to X at the end of the
loop. The resulting code looks like this:
X=21;
#pragma omp barrier
if(omp_get_thread_num()==0)
{
   omp_set_lock(&l);
   X=42;
   omp_unset_lock(&l);
}
else if(omp_get_thread_num()==1)
{
>>>register = X;
   while(?)
>>>   if(0) { register=1; }
>>>X=register;
   omp_set_lock(&l);
   print(X);
   omp_unset_lock(&l);
}

So now consider what happens in the new code. First: what would have
happened if the write "X=1;" ever actually executed:
(Initially X=21)
Thread 1            Thread 0
--------            --------
register=X;         omp_set_lock(&l);
register=1;         X=42;
X=register;
                    omp_unset_lock(&l);
omp_set_lock(&l);
print(X);
omp_unset_lock(&l);
In this case Thread 1 may print something other than 42 (here 1, the value
written back from the register). However, the result is undefined to begin
with because there was a race in the original code (or would have been if
"X=1;" ever executed), so it doesn't matter.

Second, the real case, where the write "X=1;" never executes:
(Initially X=21)
Thread 1            Thread 0
--------            --------
register=X;         omp_set_lock(&l);
                    X=42;
                    omp_unset_lock(&l);
X=register;
omp_set_lock(&l);
print(X);
omp_unset_lock(&l);
In this case Thread 1 may print 21 rather than 42, even though the write
"X=1;" never executed.
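
To make the traces above easy to check by hand, here is a minimal
single-threaded sketch in C that replays one possible ordering of the
register-promoted code. It is only a model: there are no real threads or
locks, and the function names (replay_promoted, replay_original_no_write)
and the write_executes parameter are made up for illustration.

```c
#include <stdbool.h>

/* Stand-in for the shared variable X from the example. */
static int X;

/*
 * Replays, in program order on one thread, an interleaving of the
 * register-promoted code. write_executes models whether the "X=1;"
 * branch inside the loop is ever taken (an assumption of this sketch).
 * Returns the value the printing thread would pass to print(X).
 */
int replay_promoted(bool write_executes)
{
    X = 21;              /* initial value, set before the barrier */

    int reg = X;         /* printing thread: register = X;  (reads 21) */
    if (write_executes)
        reg = 1;         /* printing thread: register = 1;  inside the loop */

    X = 42;              /* writer thread: X = 42;  under the lock */

    X = reg;             /* printing thread: X = register;  after the loop */

    return X;            /* printing thread: print(X);  under the lock */
}

/*
 * Same ordering for the original, untransformed code when "X=1;"
 * never executes: the only write after the barrier is X = 42.
 */
int replay_original_no_write(void)
{
    X = 21;              /* initial value */
    X = 42;              /* writer thread, under the lock */
    return X;            /* printing thread, under the lock */
}
```

In this model replay_original_no_write() yields 42, while
replay_promoted(false) yields 21: the write-back of the stale register
copy clobbers the locked update even though no racy write ran.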

So that's the weird bit. The original code contains a write that would
participate in a race if it were ever executed. The compiler performs an
optimization that has weird side-effects, which wouldn't matter if the
write were executed, since the race would make the output undefined
anyway. However, the write is never actually executed, so we don't have a
race, yet the optimization's weird effect (the possibility that Thread 1
prints 21 rather than 42) persists.

So here's the question: if you have a write in the code that would
participate in a race if it were ever executed, should that count as a
race or not? If it counts, then the compiler can perform the optimization
without worrying about whether the write that may execute actually does
execute in a given run. If it doesn't, then the above optimization is
illegal and needs to be replaced with something that explicitly tracks
whether the write was actually executed. So which is it? I don't think
that we have a consensus on this question.
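
For concreteness, here is one shape the "explicitly tracks whether the
write was actually executed" alternative could take, sketched in C. This
is an assumption about what such a transformation might look like, not
the output of any real compiler; the struct and function names (promoted,
promote, promoted_store, write_back) are hypothetical.

```c
#include <stdbool.h>

/* A register copy of a shared variable plus a flag recording whether
 * the copy was ever written (hypothetical helper type). */
struct promoted {
    int  value;
    bool dirty;
};

/* Load the shared variable into the "register" before the loop. */
struct promoted promote(const int *shared)
{
    struct promoted p = { *shared, false };
    return p;
}

/* A store inside the loop updates the copy and marks it dirty. */
void promoted_store(struct promoted *p, int v)
{
    p->value = v;
    p->dirty = true;
}

/* After the loop, write back only if a store actually happened, so an
 * untouched copy can never clobber another thread's update. */
void write_back(const struct promoted *p, int *shared)
{
    if (p->dirty)
        *shared = p->value;
}
```

With this scheme, if the loop never calls promoted_store, write_back
leaves the shared variable alone, so a locked "X=42;" from the other
thread survives. The cost is the extra flag and the branch after the
loop.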

                             Greg Bronevetsky

PS. In a way, this issue illuminates the complexity of doing
multi-threading with incoherent caches (e.g. registers). Regular caches
perform this optimization all the time but they have explicit dirty bits
to track whether their data has been modified. This universal use of
caching suggests to me that the above optimization shouldn't be legal
because it would violate too many user assumptions about how shared memory
works.

On Mon, 7 May 2007, Dieter an Mey wrote:

> Could anyone please illuminate, what the outcome is for a stupid OpenMP 
> user?
> 
> best regards
> Dieter
> 
> -- 
> --------------------------------------------------------------------
> Dieter an Mey
> High Performance Computing               Hochleistungsrechnen
> RWTH Aachen University                   Rechen- und Kommunikations-
> Center for Computing and Communication   zentrum der RWTH Aachen
> phone: ++49-(0)241-80-24377              Seffenter Weg 23
> fax:   ++49-(0)241-80-22134              52074 Aachen, Germany
> email: anmey at rz.rwth-aachen.de
> --------------------------------------------------------------------
> 
> 



