[Omp] OpenMP spec 2.5 seems to have incorrect flush example on page 12

Greg Bronevetsky greg at bronevetsky.com
Fri May 4 18:17:02 PDT 2007


In this example, if you have an assignment inside the loop then we get a
data race anyhow, so nothing matters. What we need to do is avoid
registerization if there is no access to the variable (read or write) and
only allow it if there is an access. In fact, the only problem is the copy
from the register back to the shared variable. If there are no accesses to
the registerized shared variable in the registerization region, it should
not be copied back.

                             Greg Bronevetsky

On Fri, 4 May 2007, Meadows, Lawrence F wrote:

> Certainly not! C makes that clear. The ';' at the end of the assignment
> to X is a sequence point. If X is volatile the transformation is
> illegal.
> 
> I think that the original question is whether or not the OMP standard
> requires a flush inside the loop (if no volatile storage class
> modifier).
> 
> Marcel was arguing that the standard
> does not so require and that therefore the standard is broken.
> 
> It really does seem specific to this particular optimization; I've been
> trying to think of other cases. Is a flush required even if the
> assignment is never executed?
> 
> -- Larry
> 
> -----Original Message-----
> From: omp-bounces at openmp.org [mailto:omp-bounces at openmp.org] On Behalf
> Of Haoqiang H. Jin
> Sent: Friday, May 04, 2007 5:57 PM
> To: Yuan Lin
> Cc: omp at openmp.org
> Subject: Re: [Omp] OpenMP spec 2.5 seems to have incorrect flush example
> on page 12
> 
> Yuan,
> 
> Your analysis makes sense, provided the "volatile" keyword
> is not used for the variable 'X' in your example.
> Now, the question is:  If 'X' is declared as volatile,
> is the transformation still legal?
> 
> -Henry
> 
> 
> On Fri, 4 May 2007, Yuan Lin wrote:
> 
> > Marcel, your example is very interesting!
> >
> > The problem here is that the memory model is not so well defined in the
> > base language, and the OpenMP spec is also kind of loose, so a compiler
> > may do something unexpected.
> >
> > I have not tried your example with gcc, but I can imagine something
> > like the following happened.
> >
> > First, given the following piece of code
> >
> > for (i=1; i<100; i++) {
> >   if (some_condition)
> >       X = whatever;
> > }
> >
> > Assuming X is a writable memory location, a compiler may
> > transform the code to
> >
> > r = X;
> > for (i=1; i<100; i++) {
> >   if (some_condition)
> >       r = whatever;
> > }
> > X = r;
> >
> > This typically happens in the registerization phase or the load/store
> > elimination phase in an optimizing compiler. It is a legal
> > transformation in C.
> >
> > [Question: is this also a legal transformation in OpenMP? I cannot
> > find anything that says it is illegal.]
> >
> > Now, let's go back to your example. Just to make the code simple,
> > let's assume there is an implied flush before and after each load and
> > store.
> >
> > Your code can be illustrated as
> >
> >          X = 21;
> > thread 1:       thread 2:
> >
> > post(v1);
> >                 wait(v1);
> > loop;            X = 42;
> >                 post(v2);
> > wait(v2);
> > print X;
> >
> > where 'loop' is similar to the above loop before transformation.
> >
> > One would expect thread 1 to print out 42.
> >
> > But because of the compiler transformation, the code may now look
> > like
> >
> >           X = 21;
> >
> > thread 1:         thread 2:
> >
> > post(v1);
> > r = X;
> >                  wait(v1);
> >                  X = 42;
> > loop';
> > X = r;
> >                  post(v2);
> > wait(v2);
> > print X;
> >
> >
> > Now thread 1 will print out 21! It does not matter whether the
> > compiler-introduced store has a flush or not.
> >
> >
> > Hans Boehm from HP has an article on this kind of issue.
> > http://www.hpl.hp.com/techreports/2004/HPL-2004-209.html
> >
> >
> > Regards,
> >
> > -- Yuan
> >
> >
> >
> >
> > Marcel Beemster wrote On 05/04/07 05:20,:
> >> Dear OpenMP specification gurus,
> >>
> >> Thanks to the people who responded to my questions of
> >> yesterday. It is clear to me now that there is something
> >> wrong with the OMP specification. See and run the example
> >> program below.
> >>
> >> The consensus seems to be:
> >> 	1) The compiler can optimize as much as possible
> >> 	   between flushes.
> >>
> >> 	   	- This is what I would like to see too
> >>
> >> 	2) My program can be "fixed" by adding more flushes
> >> 	   and synchronization to prevent compiler optimizations.
> >>
> >> 	   	- I know how to fix my program, but the OMP 2.5
> >> 		  specification says on page 12 that the exact
> >> 		  sequence specified there is sufficient to
> >> 		  communicate a shared variable between two
> >> 		  threads. But that is not sufficient, given the
> >> 		  possible compiler optimizations that we all
> >> 		  want.
> >>
> >> My conclusion is that the description of communication of a
> >> shared variable using the event sequence on page 12 of the
> >> OMP specification is incorrect.
> >>
> >> To further clarify, see below a self-checking stand-alone
> >> program that exactly implements the sequence of events
> >> described on page 12.  Additionally, the program is constructed
> >> in such a way that the compiler possibly caches the shared
> >> value across the flush by T1. Note that T2 never writes to
> >> sharedVar, though the compiler cannot see that.
> >>
> >> Compiling and running the program without optimization gives:
> >>
> >> 	$ export OMP_NUM_THREADS=2
> >> 	$ gcc -fopenmp exampleP12.c && ./a.out
> >> 	exampleP12.c: In function 'main':
> >> 	exampleP12.c:65: warning: division by zero
> >> 	Start of Program
> >> 	0: T2 sees value 21 in variable sharedVar
> >> 	0: T2 has flushed variable sharedVar
> >> 	1: T1 writes 42 in variable sharedVar
> >> 	2: T1 has flushed variable sharedVar
> >> 	3: T2 flushes variable sharedVar
> >> 	4: T2 reads from variable sharedVar
> >> 	   T2 has value 42 from sharedVar
> >> 	   That is the right value
> >> 	End of Program
> >>
> >> Compiling and running the program with optimization gives:
> >>
> >> 	$ export OMP_NUM_THREADS=2
> >> 	$ gcc -fopenmp -O3 exampleP12.c && ./a.out
> >> 	[...]
> >> 	4: T2 reads from variable sharedVar
> >> 	   T2 has value 21 from sharedVar
> >> 	   The value 21 is incorrect
> >> 	End of Program
> >>
> >> These results are for gcc 4.2 on i686/linux:
> >> 	$ gcc -v
> >> 	Using built-in specs.
> >> 	Target: i686-pc-linux-gnu
> >> 	Configured with: ../configure
> >> 	--prefix=/home/marcel/tmp/gcc-4.2-bin
> >> 	Thread model: posix
> >> 	gcc version 4.2.0 20070214 (prerelease)
> >>
> >> My questions:
> >>
> >> 	1) Please run this program with your non-gcc compiler
> >> 	   and report the results, compiler and platform. This
> >> 	   will give an idea of how other implementations
> >> 	   view optimization. Make sure to run with at least
> >> 	   two threads: export OMP_NUM_THREADS=2, otherwise
> >> 	   the program will hang.
> >>
> >> 	2) Do you agree that the program is a correct
> >> 	   implementation of the events described on page 12 of
> >> 	   the OMP specification?
> >>
> >> 	3) Do you agree that that sequence of events is hence
> >> 	   not sufficient to communicate a value between
> >> 	   threads because we do want the compiler to optimize,
> >> 	   hence the OMP specification is incorrect on this point?
> >>
> >> Thanks a lot,
> >> 	Marcel
> >>
> >> ===================================================================
> >>
> >>
> >> #include <stdio.h>
> >>
> >> /*
> >>   * Lightweight but inefficient sequencing mechanism between two running
> >>   * threads. Does not require OS or omp-library calls.
> >>   * Make sure that at least two threads exist by setting the environment
> >>   * variable OMP_NUM_THREADS=2.
> >>   */
> >> volatile int nextStep = 0 ;
> >> #define DONEXTSTEP(  x )    nextStep = x ;
> >> #define WAITFORSTEP( x )    while( nextStep != x ) { /*nothing */ }
> >>
> >> /* The shared variable written by T1 and read by T2 */
> >> int sharedVar = 21 ;
> >>
> >> /* Dynamic variable so compiler cannot derive its value */
> >> int cutOff = 500 ;
> >>
> >> int main( void ) {
> >>
> >>      printf( "Start of Program\n" ) ;
> >>
> >> #pragma omp parallel sections
> >>      {
> >>
> >> #pragma omp section
> >>          {   /* Start of T1 */
> >>
> >>              WAITFORSTEP( 10 ) ; /* Wait until T2 says to continue */
> >>              sharedVar = 42 ;
> >>              printf( "1: T1 writes 42 in variable sharedVar\n" ) ;
> >>
> >>              #pragma omp flush( sharedVar )
> >>              printf( "2: T1 has flushed variable sharedVar\n" ) ;
> >>              DONEXTSTEP( 30 ) ;  /* Tell T2 that write&flush is done */
> >>
> >>          }   /* End of thread T1 */
> >>
> >> #pragma omp section
> >>          {   /* Start of T2 */
> >>              int i, locVar ;
> >>
> >>              printf( "0: T2 sees value %d in variable sharedVar\n",
> >>                      sharedVar ) ;
> >>              #pragma omp flush( sharedVar )
> >>              printf( "0: T2 has flushed variable sharedVar\n" ) ;
> >>
> >>                          /* If the compiler decides to optimize and cache
> >>                           * the value of sharedVar in a register, this is
> >>                           * the place, before the loop, where sharedVar
> >>                           * is read */
> >>              for( i = 0 ; i < 100 ; i++ ) {
> >>                  if( i == 0 ) {
> >>                          /* After potentially caching sharedVar, tell T1
> >>                           * to continue with its write&flush */
> >>                      DONEXTSTEP( 10 ) ;
> >>                  }
> >>                  if( i == 99 ) {
> >>                          /* Before exiting the loop and writing back the
> >>                           * cached but not written value, wait until T1
> >>                           * finished its flush of sharedVar */
> >>                      WAITFORSTEP( 30 ) ;
> >>                  }
> >>                  if( i > cutOff ) {
> >>                          /* This code is never executed but the
> >>                           * compiler cannot know that */
> >>                      sharedVar = i / 0 ;
> >>                  }
> >>                          /* After the loop, the potentially cached value
> >>                           * of sharedVar is written to memory */
> >>              }
> >>
> >>              #pragma omp flush( sharedVar )
> >>              printf( "3: T2 flushes variable sharedVar\n" ) ;
> >>
> >>              locVar = sharedVar ;
> >>              printf( "4: T2 reads from variable sharedVar\n" ) ;
> >>              printf( "   T2 has value %d from sharedVar\n", locVar ) ;
> >>              if( locVar == 42 ) {
> >>                  printf( "   That is the right value\n" ) ;
> >>              } else {
> >>                  printf( "   The value %d is incorrect\n", locVar ) ;
> >>              }
> >>
> >>          } /* End of thread T2 */
> >>
> >>      }   /* End of parallel construct */
> >>      printf( "End of Program\n" ) ;
> >>      return 0 ;
> >> }
> >>
> > _______________________________________________
> > Omp mailing list
> > Omp at openmp.org
> > http://openmp.org/mailman/listinfo/omp
> >
> 


