[Omp] OpenMP spec 2.5 seems to have incorrect flush example on page 12
Greg Bronevetsky
greg at bronevetsky.com
Fri May 4 18:14:34 PDT 2007
I disagree that this behavior is undefined. In fact, the spec is quite
explicit about the sequence of operations that will guarantee that a write
on one thread reaches a read on another thread. Marcel's example should
work fine according to the 2.5 spec and the new formal specification of
the OpenMP memory model. The problem lies with gcc's implementation, which
performs incorrect registerization.
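As a rough illustration of that sequence, here is a minimal sketch of the four steps that the output of Marcel's program enumerates: write, flush by the writer, flush by the reader, read. The names `shared_var`, `ready`, `communicate` and `check_communicate` are hypothetical and not from Marcel's program. The sketch uses list-less flushes so that the flag and the data are flushed together, whereas Marcel's program uses `flush( sharedVar )`. It is written so that it also completes when the pragmas are ignored, or when a single thread runs the sections in source order; that ordering is an assumption about the implementation.

```c
#include <assert.h>
#include <stdio.h>

static int shared_var;      /* hypothetical stand-in for sharedVar */
static volatile int ready;  /* hypothetical flag, in the spirit of nextStep */

/* Runs the write / flush / flush / read sequence once and returns
 * the value the reading section observed. */
int communicate(void)
{
    int seen = -1;          /* shared inside the parallel region */

    shared_var = 21;
    ready = 0;

    #pragma omp parallel sections num_threads(2)
    {
        #pragma omp section
        {   /* writing thread */
            /* step 1: the value is written */
            shared_var = 42;
            /* step 2: the writer flushes */
            #pragma omp flush
            ready = 1;
            #pragma omp flush
        }

        #pragma omp section
        {   /* reading thread */
            /* wait until the writer has finished its flush */
            while (!ready) {
                #pragma omp flush
            }
            /* step 3: the reader flushes */
            #pragma omp flush
            /* step 4: the value is read */
            seen = shared_var;
        }
    }
    return seen;
}

/* Small self-check a caller may run from main(). */
void check_communicate(void)
{
    int v = communicate();
    printf("reader saw %d\n", v);
    assert(v == 42);
}
```

Built with `gcc -fopenmp`, the two sections can run on separate threads; without `-fopenmp` the pragmas are ignored and the sections simply run in order.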
Marcel suggested that the problem lies with the fact that gcc's code
copies the value back from the register to the shared variable, even if
the variable hasn't been written to. I disagree with that a little because
I think that the writeback should still be ok if the variable has been
read from but not written to. The reason is that if the variable was
read/written by T1 and simultaneously written by T2, then we get a race
and thus, nothing is guaranteed anyhow (i.e. gcc is not responsible for
giving screwy results). However, if T1 doesn't even access the shared
variable, then gcc is wrong to registerize it.
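To make the write-back issue concrete, the following is a single-threaded sketch that replays one fixed interleaving, with the other thread's store modelled as a plain function call inside the loop. It is not Marcel's program: `X`, `cut_off`, `writer_step` and the `reader_*` functions are hypothetical names, the registerization is applied by hand rather than by a compiler, and the never-executed store uses `i` instead of `i / 0`. The third variant, which writes back only when the register was actually modified, is one possible reading of Marcel's suggestion and not something gcc is claimed to implement.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-ins for sharedVar and cutOff in the quoted program. */
static int X;
static int cut_off = 500;

/* Everything the writing thread does to X, replayed at a fixed point. */
static void writer_step(void)
{
    X = 42;
}

/* Reader loop as written: X is stored only when the guard fires. */
static int reader_as_written(void)
{
    for (int i = 0; i < 100; i++) {
        if (i == 0)
            writer_step();      /* the other thread runs here */
        if (i > cut_off)
            X = i;              /* never taken, since cut_off is 500 */
    }
    return X;
}

/* Hand-applied registerization with an unconditional write-back. */
static int reader_registerized(void)
{
    int r = X;                  /* load hoisted above the loop */
    for (int i = 0; i < 100; i++) {
        if (i == 0)
            writer_step();
        if (i > cut_off)
            r = i;
    }
    X = r;                      /* a store the source never performed */
    return X;
}

/* Variant that writes back only if the loop stored to the register. */
static int reader_guarded(void)
{
    int r = X;
    int dirty = 0;
    for (int i = 0; i < 100; i++) {
        if (i == 0)
            writer_step();
        if (i > cut_off) {
            r = i;
            dirty = 1;
        }
    }
    if (dirty)
        X = r;
    return X;
}

enum variant { AS_WRITTEN, REGISTERIZED, GUARDED };

/* Resets X to 21, replays the chosen reader, returns what it ends up seeing. */
int final_value(enum variant v)
{
    X = 21;
    switch (v) {
    case AS_WRITTEN:   return reader_as_written();
    case REGISTERIZED: return reader_registerized();
    default:           return reader_guarded();
    }
}

/* Prints the three outcomes and checks them. */
void check_replays(void)
{
    printf("as written:   %d\n", final_value(AS_WRITTEN));
    printf("registerized: %d\n", final_value(REGISTERIZED));
    printf("guarded:      %d\n", final_value(GUARDED));
    assert(final_value(AS_WRITTEN) == 42);
    assert(final_value(REGISTERIZED) == 21);
    assert(final_value(GUARDED) == 42);
}
```

In this replay only the unconditional write-back loses the other thread's store, which matches the 21 that Marcel reports at -O3.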
Greg Bronevetsky
> Marcel, your example is very interesting!
>
> The problem here is that the memory model is not well defined in the base
> language, and the OpenMP spec is also rather loose, so a compiler
> may do something unexpected.
>
> I have not tried your example with gcc, but I can imagine something like
> the following happened.
>
> First, given the following piece of code
>
> for (i=1; i<100; i++) {
>     if (some_condition)
>         X = whatever;
> }
>
> Assuming X is a writable memory location, a compiler may
> transform the code into
>
> r = X;
> for (i=1; i<100; i++) {
>     if (some_condition)
>         r = whatever;
> }
> X = r;
>
> This typically happens in the registerization phase or the load/store
> elimination phase in an optimizing compiler. It is a legal
> transformation in C.
>
> [Question: is this also a legal transformation in OpenMP? I cannot find
> anything that says it is illegal.]
>
> Now, let's go back to your example. Just to make the code simple, let's
> assume there is an implied flush before and after each load and store.
>
> Your code can be illustrated as
>
> X = 21;
>
> thread 1:          thread 2:
>
> post(v1);
>                    wait(v1);
> loop;              X = 42;
>                    post(v2);
> wait(v2);
> print X;
>
> where 'loop' is similar to the above loop before transformation.
>
> One would expect thread 1 to print out 42.
>
> But because of the compiler transformation, the code may now look like
>
> X = 21;
>
> thread 1:          thread 2:
>
> post(v1);
> r = X;
>                    wait(v1);
>                    X = 42;
> loop';
> X = r;
>                    post(v2);
> wait(v2);
> print X;
>
>
> Now thread 1 will print out 21! It does not matter whether the
> compiler-introduced store has a flush or not.
>
>
> Hans Boehm from HP has an article on this kind of issue.
> http://www.hpl.hp.com/techreports/2004/HPL-2004-209.html
>
>
> Regards,
>
> -- Yuan
>
>
>
>
> Marcel Beemster wrote On 05/04/07 05:20,:
> > Dear OpenMP specification gurus,
> >
> > Thanks to the people who responded to my questions of
> > yesterday. It is clear to me now that there is something
> > wrong with the OMP specification. See and run the example
> > program below.
> >
> > The consensus seems to be:
> > 1) The compiler can optimize as much as possible
> > between flushes.
> >
> > - This is what I would like to see too
> >
> > 2) My program can be "fixed" by adding more flushes
> > and synchronization to prevent compiler optimizations.
> >
> > - I know how to fix my program, but the OMP 2.5
> > specification says on page 12 that the exact
> > sequence specified there is sufficient to
> > communicate a shared variable between two
> > threads. But that is not sufficient, given the
> > possible compiler optimizations that we all
> > want.
> >
> > My conclusion is that the description of communication of a
> > shared variable using the event sequence on page 12 of the
> > OMP specification is incorrect.
> >
> > To further clarify, see below a self-checking stand-alone
> > program that exactly implements the sequence of events
> > described on page 12. Additionally, the program is constructed
> > in such a way that the compiler possibly caches the shared
> > value across the flush by T1. Note that T2 never writes to
> > sharedVar, though the compiler cannot see that.
> >
> > Compiling and running the program without optimization gives:
> >
> > $ export OMP_NUM_THREADS=2
> > $ gcc -fopenmp exampleP12.c && ./a.out
> > exampleP12.c: In function 'main':
> > exampleP12.c:65: warning: division by zero
> > Start of Program
> > 0: T2 sees value 21 in variable sharedVar
> > 0: T2 has flushed variable sharedVar
> > 1: T1 writes 42 in variable sharedVar
> > 2: T1 has flushed variable sharedVar
> > 3: T2 flushes variable sharedVar
> > 4: T2 reads from variable sharedVar
> > T2 has value 42 from sharedVar
> > That is the right value
> > End of Program
> >
> > Compiling and running the program with optimization gives:
> >
> > $ export OMP_NUM_THREADS=2
> > $ gcc -fopenmp -O3 exampleP12.c && ./a.out
> > [...]
> > 4: T2 reads from variable sharedVar
> > T2 has value 21 from sharedVar
> > The value 21 is incorrect
> > End of Program
> >
> > These results are for gcc 4.2 on i686/linux:
> > $ gcc -v
> > Using built-in specs.
> > Target: i686-pc-linux-gnu
> > Configured with: ../configure
> > --prefix=/home/marcel/tmp/gcc-4.2-bin
> > Thread model: posix
> > gcc version 4.2.0 20070214 (prerelease)
> >
> > My questions:
> >
> > 1) Please run this program with your non-gcc compiler
> > and report the results, compiler and platform. This
> > will give an idea of how other implementations
> > view optimization. Make sure to run with at least
> > two threads: export OMP_NUM_THREADS=2, otherwise
> > the program will hang.
> >
> > 2) Do you agree that the program is a correct
> > implementation of the events described on page 12 of
> > the OMP specification?
> >
> > 3) Do you agree that that sequence of events is hence
> > not sufficient to communicate a value between
> > threads because we do want the compiler to optimize,
> > hence the OMP specification is incorrect on this point?
> >
> > Thanks a lot,
> > Marcel
> >
> > ===================================================================
> >
> >
> > #include <stdio.h>
> >
> > /*
> >  * Lightweight but inefficient sequencing mechanism between two running
> >  * threads. Does not require OS or omp-library calls.
> >  * Make sure that at least two threads exist by setting the environment
> >  * variable OMP_NUM_THREADS=2.
> >  */
> > volatile int nextStep = 0 ;
> > #define DONEXTSTEP( x ) nextStep = x ;
> > #define WAITFORSTEP( x ) while( nextStep != x ) { /*nothing */ }
> >
> > /* The shared variable written by T1 and read by T2 */
> > int sharedVar = 21 ;
> >
> > /* Dynamic variable so compiler cannot derive its value */
> > int cutOff = 500 ;
> >
> > int main( void ) {
> >
> >   printf( "Start of Program\n" ) ;
> >
> >   #pragma omp parallel sections
> >   {
> >
> >     #pragma omp section
> >     { /* Start of T1 */
> >
> >       WAITFORSTEP( 10 ) ; /* Wait until T2 says to continue */
> >       sharedVar = 42 ;
> >       printf( "1: T1 writes 42 in variable sharedVar\n" ) ;
> >
> >       #pragma omp flush( sharedVar )
> >       printf( "2: T1 has flushed variable sharedVar\n" ) ;
> >       DONEXTSTEP( 30 ) ; /* Tell T2 that write&flush is done */
> >
> >     } /* End of thread T1 */
> >
> >     #pragma omp section
> >     { /* Start of T2 */
> >       int i, locVar ;
> >
> >       printf( "0: T2 sees value %d in variable sharedVar\n", sharedVar ) ;
> >       #pragma omp flush( sharedVar )
> >       printf( "0: T2 has flushed variable sharedVar\n" ) ;
> >
> >       /* If the compiler decides to optimize and cache
> >        * the value of sharedVar in a register, this is the
> >        * place, before the loop, where sharedVar is read */
> >       for( i = 0 ; i < 100 ; i++ ) {
> >         if( i == 0 ) {
> >           /* After potentially caching sharedVar, tell T1
> >            * to continue with its write&flush */
> >           DONEXTSTEP( 10 ) ;
> >         }
> >         if( i == 99 ) {
> >           /* Before exiting the loop and writing back the
> >            * cached but not written value, wait until T1
> >            * finished its flush of sharedVar */
> >           WAITFORSTEP( 30 ) ;
> >         }
> >         if( i > cutOff ) {
> >           /* This code is never executed but the
> >            * compiler cannot know that */
> >           sharedVar = i / 0 ;
> >         }
> >         /* After the loop, the potentially cached value
> >          * of sharedVar is written to memory */
> >       }
> >
> >       #pragma omp flush( sharedVar )
> >       printf( "3: T2 flushes variable sharedVar\n" ) ;
> >
> >       locVar = sharedVar ;
> >       printf( "4: T2 reads from variable sharedVar\n" ) ;
> >       printf( " T2 has value %d from sharedVar\n", locVar ) ;
> >       if( locVar == 42 ) {
> >         printf( " That is the right value\n" ) ;
> >       } else {
> >         printf( " The value %d is incorrect\n", locVar ) ;
> >       }
> >
> >     } /* End of thread T2 */
> >
> >   } /* End of parallel construct */
> >   printf( "End of Program\n" ) ;
> >   return 0 ;
> > }
> >