[Omp] Memory consistency contradiction between 2.5 specification and GCC
Marcel Beemster
marcel at ace.nl
Thu May 3 03:20:44 PDT 2007
We are looking into OpenMP integration in the CoSy compiler
framework, in particular from the point of view of compiler
optimizations. We found a contradiction between the spec and
the optimizations of GCC 4.2, and wonder which one is right.
Looking at page 12 of the OpenMP 2.5 specification there is
an example of communication through a shared variable using flush.
It describes two threads T1 and T2 that perform the
following actions in this sequence:
0. The variable is flushed by T2 (assumption described in text)
1. The value is written to the variable by T1
2. The variable is flushed by T1
3. The variable is flushed by T2
4. The value is read from the variable by T2
After this, T2 should have the value that is written by T1.
This example is part of the definition of the flush and the
memory consistency model. Hence, the definition does not
allow the following sequence of actions by T2 (the reader):
0. The variable is flushed by T2 (assumption described in text)
a. T2 reads the variable and stores (caches) it in
a register for potential reading and writing.
b. Without modification, T2 writes the original value
from the register back to the variable
3. The variable is flushed by T2
4. The value is read from the variable by T2
If we interleave these actions of T2 in the original example, we
get:
a. T2 reads the variable and stores (caches) it in
a register for potential reading and writing.
1. The value is written to the variable by T1
2. The variable is flushed by T1
b. Without modification, T2 writes the original value
from the register back to the variable
3. The variable is flushed by T2
4. The value is read from the variable by T2
Since 'b' happens after T1's flush, the value read in 4 is not
the value written in 1.
So the spec does not allow this. To be precise, the spec does
not allow writing a variable from the cache/temporary view if
that variable has not been written while it was in the cache.
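The two orderings can be replayed in a few lines of C. This is only a sketch: it runs on a single thread, models memory as one plain variable (so the flushes become no-ops), and all names and the values 0 and 42 are made up for illustration.

```c
/* Single-threaded replay of the two step orders. There are no real
   threads or flushes here; memory is a single variable, so a flush
   has nothing to do. All names and values are hypothetical. */
enum { OLD_VALUE = 0, NEW_VALUE = 42 };

/* Spec order: 0, 1, 2, 3, 4. */
int replay_spec_order(void)
{
    int shared = OLD_VALUE;   /* the variable as seen in memory    */
    shared = NEW_VALUE;       /* 1. T1 writes;  2. T1 flushes      */
    return shared;            /* 3. T2 flushes; 4. T2 reads        */
}

/* Interleaved order: a, 1, 2, b, 3, 4. */
int replay_with_write_back(void)
{
    int shared = OLD_VALUE;
    int t2_register = shared; /* a. T2 caches the variable         */
    shared = NEW_VALUE;       /* 1. T1 writes;  2. T1 flushes      */
    shared = t2_register;     /* b. T2 writes back the old value   */
    return shared;            /* 3. T2 flushes; 4. T2 reads        */
}
```

In this model replay_spec_order() returns NEW_VALUE, while replay_with_write_back() returns OLD_VALUE: the write-back in 'b' has overwritten what T1 stored.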
Yet writing back an unmodified cached value is precisely what
some compiler optimizations do. This is not specific to GCC; I
only use GCC to demonstrate. Consider this source code, stored
by itself in the file write.c:
$ cat write.c
int privatevar, sharedvar ;
#pragma omp threadprivate( privatevar )
int maybeNotAWriter( int loopbound, int cutoff ) {
    int loc ;
    int i ;
    for( i = 23 ; i < loopbound ; i++ ) {
        if( cutoff < i ) {
            privatevar = privatevar * i ;
            sharedvar = sharedvar * i ;
        }
    }
    return 0 ;
}
This could be code that is executed by T2. I have included
the threadprivate pragma for privatevar to show that the
compiler is really aware of OMP.
The code is structured in such a way that the compiler cannot
know if the update of sharedvar inside the 'if' inside the
'for' is done or not. If the function is called with the right
arguments, say '25' and '100', sharedvar is not written by T2.
Yet the compiler knows that access to global memory is expensive
and it will try to cache the value of sharedvar by reading it
into a register before the loop and writing it after the loop.
This creates points 'a' and 'b' of the extended T2.
I compile the program write.c with GCC 4.2:
$ gcc -v
Using built-in specs.
Target: i686-pc-linux-gnu
Configured with: ../configure --prefix=/home/marcel/tmp/gcc-4.2-bin
Thread model: posix
gcc version 4.2.0 20070214 (prerelease)
$ gcc -fopenmp -O3 -S write.c
$
Here is the assembly created, with my additional annotation
including the points where sharedvar is read into and flushed
out of T2's temporary view:
$ cat write.s
.file "write.c"
.text
.p2align 4,,15
.globl maybeNotAWriter
.type maybeNotAWriter, @function
maybeNotAWriter:
pushl %ebp
movl %esp, %ebp
pushl %edi
pushl %esi
movl 12(%ebp), %esi
pushl %ebx
movl 8(%ebp), %ebx
cmpl $23, %ebx
jle .L2
movl privatevar@INDNTPOFF, %edi <--- Offset of thread-private var
movl $23, %eax
movl sharedvar, %ecx <------ Point 'a'
movl %gs:(%edi), %edx
.p2align 4,,7
.L4: <------ Start of loop
cmpl %eax, %esi
jge .L5 <------ 'if'
imull %eax, %edx
imull %eax, %ecx <------ Update of sharedvar
.L5:
addl $1, %eax
cmpl %ebx, %eax
jne .L4
movl %ecx, sharedvar <------ Point 'b'
movl %edx, %gs:(%edi)
.L2:
popl %ebx
xorl %eax, %eax
popl %esi
popl %edi
popl %ebp
ret
.size maybeNotAWriter, .-maybeNotAWriter
.tls_common privatevar,4,4
.comm sharedvar,4,4
.ident "GCC: (GNU) 4.2.0 20070214 (prerelease)"
.section .note.GNU-stack,"",@progbits
So, this should make the contradiction clear: while there are
run-time program flows that do not write to sharedvar, the code
created by GCC creates a temporary view of sharedvar and always
writes sharedvar at the end of the temporary view. This
optimization is incorrect according to the example on page 12 of
the spec.
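Read back into C, the assembly above corresponds roughly to the second function below. This is a hand-written model, not compiler output: model_sharedvar stands in for sharedvar, and store_shared() is a hypothetical helper that counts stores so the two versions can be compared.

```c
/* Hypothetical model: counts stores to a stand-in for sharedvar. */
static int model_sharedvar = 1;
static int model_store_count = 0;

static void store_shared(int value)
{
    model_sharedvar = value;
    model_store_count++;
}

/* The loop as written in write.c: stores only inside the 'if'. */
int stores_as_written(int loopbound, int cutoff)
{
    int i;
    model_store_count = 0;
    for (i = 23; i < loopbound; i++) {
        if (cutoff < i) {
            store_shared(model_sharedvar * i);
        }
    }
    return model_store_count;
}

/* The loop as compiled: one load before the loop (point 'a') and
   one unconditional store after it (point 'b'). */
int stores_as_compiled(int loopbound, int cutoff)
{
    int i;
    int reg;
    model_store_count = 0;
    if (loopbound > 23) {          /* cmpl $23, %ebx ; jle .L2   */
        reg = model_sharedvar;     /* point 'a'                  */
        for (i = 23; i < loopbound; i++) {
            if (cutoff < i) {
                reg = reg * i;     /* update stays in a register */
            }
        }
        store_shared(reg);         /* point 'b'                  */
    }
    return model_store_count;
}
```

With the arguments 25 and 100, stores_as_written() reports no stores, while stores_as_compiled() reports one, even though the value stored is unchanged.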
My analysis:
The OpenMP specification intends to allow compilers or
hardware to create temporary views of memory between flushes. It
says as much on page 10 at the beginning of Section 1.4.1.
Given that intention, GCC is correct in applying its
optimization.
However, the optimization renders the example of communication
between threads on page 12 incorrect. In fact, I see no way
to guarantee correctness of communication of shared variables
between parallel threads using flushes if such optimizations
are allowed.
Options:
1) Do not allow this optimization. Writing a temporary
view back to shared memory is only allowed if the
variable has actually been written. This would incur
a significant performance hit for generated code, as
all global variables have to be treated differently
by compilers compiling for OpenMP than by compilers
compiling for sequential processing (or even thread
programming.)
2) Remove the concept of flush from OpenMP as a means of
communicating between threads executing in parallel.
Instead, rely on shared variables explicitly
annotated with 'volatile' to do such communication.
Note that flush is still useful to guarantee memory
consistency before and after sequentializing
constructs, such as a barrier.
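For option 1, one possible shape of the restricted optimization is to keep the register copy but track whether it was ever modified. The sketch below is an assumption about how a compiler might do this, not something GCC does; the function name and the 'dirty' flag are hypothetical, and the function returns the number of stores it performed.

```c
/* Hypothetical variant of the cached loop for option 1: the value is
   still kept in a register, but it is written back only if the loop
   actually modified it. Returns the number of stores performed. */
int stores_with_dirty_flag(int *shared, int loopbound, int cutoff)
{
    int i;
    int reg;
    int dirty = 0;
    int stores = 0;

    if (loopbound <= 23) {
        return 0;              /* loop body never runs           */
    }
    reg = *shared;             /* point 'a': load into register  */
    for (i = 23; i < loopbound; i++) {
        if (cutoff < i) {
            reg = reg * i;
            dirty = 1;         /* remember that a write happened */
        }
    }
    if (dirty) {
        *shared = reg;         /* point 'b', now conditional     */
        stores++;
    }
    return stores;
}
```

The extra flag and the conditional store are part of the cost mentioned under option 1: the store after the loop is no longer unconditional.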
My apologies for this long post, if you got this far :-) However,
as the concepts are quite fundamental, I tried to be as clear as
possible.
Comments?
Thanks,
Marcel
--
Dr. Marcel Beemster, Senior Software Engineer, marcel at ace.nl,www.ace.nl
Associated Compiler Experts bv. Amsterdam, Netherlands. +31 20 6646416.