[Omp] A question about OpenMP 2.5

Haab, Grant grant.haab at intel.com
Thu Mar 22 08:44:29 PDT 2007


Dieter,

The problem Greg is describing is not data alignment at all, but rather
the minimum data size for which loads and stores are performed
atomically by the processor and memory system hardware. Most
processors support byte-sized atomicity for regular loads and stores, but
several people have pointed out that the Alpha processors supported a
minimum of 4-byte atomicity.
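To make the hazard concrete, here is a purely sequential C sketch. It assumes a hypothetical machine whose memory system can only load and store whole 32-bit words, so a byte store becomes a read-modify-write of the containing word. The function `racy_interleaving` (an illustrative name) replays one unlucky ordering of two threads writing adjacent bytes:

```c
#include <stdint.h>

/* Hypothetical word-granular memory: only whole 32-bit words can be
 * loaded or stored. */
typedef struct {
    uint32_t word;
} word_memory;

/* Step 1 of a byte store: load the whole word that contains the byte. */
uint32_t load_word(const word_memory *m) { return m->word; }

/* Step 2: replace byte idx (0..3, counted from the least significant end)
 * in a private copy of the word. */
uint32_t insert_byte(uint32_t w, unsigned idx, uint8_t value) {
    uint32_t shift = 8u * idx;
    w &= ~((uint32_t)0xFFu << shift);
    w |= (uint32_t)value << shift;
    return w;
}

/* Step 3: store the whole word back. */
void store_word(word_memory *m, uint32_t w) { m->word = w; }

/* Thread t stores 0xAA into byte 0 and thread r stores 0xBB into byte 1,
 * but both load the word before either one stores it back. */
uint32_t racy_interleaving(void) {
    word_memory m = { 0 };
    uint32_t t_copy = load_word(&m);               /* t sees 0x00000000 */
    uint32_t r_copy = load_word(&m);               /* r sees 0x00000000 */
    store_word(&m, insert_byte(t_copy, 0, 0xAA));  /* memory: 0x000000AA */
    store_word(&m, insert_byte(r_copy, 1, 0xBB));  /* memory: 0x0000BB00 */
    return m.word;                                 /* t's byte is gone */
}
```

With this ordering `racy_interleaving()` returns `0x0000BB00`: the byte written by t is lost, even though the two threads never wrote the same byte.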

I know of no general-purpose processor that supports less than
byte-granularity loads and stores, because a byte is the minimum
addressable unit for most processors.  (I'm sure somebody will find a
counterexample though ;-)

I don't believe the compiler can easily fix this problem, because C and
Fortran don't allow you to pad array elements to the minimum atomic
load/store size. That would break unions, EQUIVALENCE and the like, not
to mention make users very irate that their character array now takes 4
times more space!
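As a rough illustration of the space cost, this sketch assumes a hypothetical compiler that gave each `char` its own 4-byte atomic unit by padding it. `struct padded_char` is illustrative only, not something any compiler is known to do:

```c
#include <stddef.h>

#define N_CHARS 1000

/* Hypothetical padded element: one payload byte plus three padding bytes,
 * so that each character fills a whole 4-byte atomic unit. */
struct padded_char {
    char value;
    char pad[3];
};

char plain_text[N_CHARS];                 /* ordinary layout: 1 byte each */
struct padded_char padded_text[N_CHARS];  /* padded layout: 4 bytes each */

size_t plain_bytes(void)  { return sizeof plain_text; }
size_t padded_bytes(void) { return sizeof padded_text; }
```

On typical ABIs `padded_bytes()` is four times `plain_bytes()` (4000 versus 1000 bytes). Code that relies on a one-byte stride, such as string functions, a union overlay, or a Fortran EQUIVALENCE, would no longer find the characters where it expects them.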

- Grant



-----Original Message-----
From: omp-bounces at openmp.org [mailto:omp-bounces at openmp.org] On Behalf
Of Dieter an Mey
Sent: Thursday, March 22, 2007 4:51 AM
To: Greg Bronevetsky
Cc: omp at openmp.org
Subject: Re: [Omp] A question about OpenMP 2.5

I see what you mean.
As a user I would expect the compiler to take care of proper
alignment etc. to avoid these "false sharing" effects, which could lead
to a data race.

I wonder to what extent this can really cause problems on current
hardware, and how current OpenMP-compliant compilers have dealt
with it.

I assume that the compiler has to guarantee, and in many or all cases can
guarantee, that elements are aligned such that any two elements of a
structure, class etc. can be loaded and stored separately.

In Fortran you can try to force bad alignment with common blocks or
EQUIVALENCE, which would be bad programming practice anyway.
I tried to create such a bad case, but I was not "successful" yet.

I don't know to what extent C/C++ programmers can do this (with unions,
perhaps?).

The question is: can primitive data types be forced to be so badly
aligned that the compiler cannot generate single load/store instructions
for those data elements?
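In standard C the answer is probably no. Every object, including each member of a structure or union, must sit at an address that satisfies its type's alignment requirement, so deliberate misalignment normally needs compiler extensions such as `#pragma pack` or GCC's `__attribute__((packed))`. The C11 sketch below (type names are illustrative) checks this with `offsetof` and `alignof`:

```c
#include <stdalign.h>
#include <stddef.h>

/* A structure that mixes small and large primitive members. */
struct mixed {
    char   c;
    double d;
    short  s;
    int    i;
};

/* A union in the style of a Fortran EQUIVALENCE: members overlay each other. */
union overlay {
    char   bytes[8];
    double d;
};

/* 1 if each member's offset is a multiple of its type's alignment. */
int mixed_members_aligned(void) {
    return offsetof(struct mixed, d) % alignof(double) == 0
        && offsetof(struct mixed, s) % alignof(short) == 0
        && offsetof(struct mixed, i) % alignof(int) == 0;
}

/* 1 if the union is aligned at least as strictly as its double member. */
int overlay_aligned(void) {
    return alignof(union overlay) % alignof(double) == 0;
}
```

Note that alignment alone does not settle atomicity: a naturally aligned 8-byte value may still be moved by two 4-byte operations on some hardware.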

regards
Dieter

Greg Bronevetsky wrote:
> The difference is illustrated by the following example. Suppose that all
> memory operations operate at 4-byte granularity. The code in question is:
>    char buf[BUF_SIZE];
>    #pragma omp for
>    for(i=0; i<BUF_SIZE; i++)
>       buf[i] = 'x';   /* any value */
> Suppose that buf[] is 4-byte-aligned, thread t gets iteration i=0 and
> thread r gets iteration i=1. t writes to address buf, bringing the
> memory range [buf, buf+4) into its cache. r writes to buf+1, also
> bringing the memory range [buf, buf+4) into its cache. When these
> cache lines are finally evicted, each contains data that the other does
> not. As such, regardless of which cache line we pick, we will lose data.
> 
> In short, when the system moves data at 4-byte granularity, writes by
> multiple threads to the same 4-byte region are data races. It should be
> noted that the above is the reverse of Dieter's example. We're worrying
> about code that operates on memory locations of size x, while the
> hardware supports memory transfers of size y. If x>=y (Dieter's
> example), we have no problem. The problem is cases where x<y (the above
> example).
> 
>                              Greg Bronevetsky
> 
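Under Greg's hypothesis of 4-byte memory granularity, whether the loop above races depends on how iterations are distributed. This sketch assumes `schedule(static, chunk)`, where, per the OpenMP specification, chunks are dealt to threads round-robin in thread-number order, and assumes `buf` is 4-byte aligned (for example via C11 `_Alignas(4)`). `words_shared` is a hypothetical helper that reports whether any 4-byte word would be written by two different threads:

```c
#include <stdbool.h>
#include <stddef.h>

/* Thread that owns iteration i under schedule(static, chunk): chunks of
 * `chunk` consecutive iterations are assigned to threads round-robin. */
static size_t owner(size_t i, size_t chunk, size_t nthreads) {
    return (i / chunk) % nthreads;
}

/* True if some `word`-byte unit of a word-aligned char buffer of length n
 * is written by more than one thread. Bytes of one word are contiguous,
 * so it suffices to compare each byte with its predecessor. */
bool words_shared(size_t n, size_t chunk, size_t nthreads, size_t word) {
    for (size_t i = 1; i < n; i++) {
        bool same_word = (i / word) == ((i - 1) / word);
        if (same_word && owner(i, chunk, nthreads) != owner(i - 1, chunk, nthreads))
            return true;
    }
    return false;
}
```

For a 16-byte buffer and 4 threads, `words_shared(16, 1, 4, 4)` and `words_shared(16, 2, 4, 4)` are true, while `words_shared(16, 4, 4, 4)` is false: a chunk size that is a multiple of the hardware granularity keeps each word with a single thread.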
> On Wed, 21 Mar 2007, Dieter an Mey wrote:
> 
>> Well, Bronis and Greg, I still don't see why it should make any
>> difference to a potential data race whether the "memory location"
>> that is spoiling my fun is written with bit or page atomicity by the
>> memory system of the hardware I am using.
>> The results are thus unspecified or broken, and may be correct a
>> thousand times but wrong the 1001st time.
>>
>> I agree completely that there may be situations where it is highly
>> desirable to know which atomicity I have to deal with.
>>
>> For example, on a SPARC system 64-bit floating-point numbers may be
>> written or loaded by two 4-byte memory operations.
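The torn-value hazard can be sketched the same way. Assuming, as described above, that a 64-bit value is stored as two 4-byte halves (the `split64` type and `torn_read` function are illustrative names), a reader that runs between the two stores sees a value that was never written:

```c
#include <stdint.h>

/* Hypothetical memory holding a 64-bit value as two separately stored
 * 32-bit halves. */
typedef struct {
    uint32_t hi;
    uint32_t lo;
} split64;

static uint64_t join(split64 s) { return ((uint64_t)s.hi << 32) | s.lo; }

/* Replays one interleaving: the writer has stored the new high half but
 * not yet the new low half when the reader loads both halves. */
uint64_t torn_read(uint64_t old_value, uint64_t new_value) {
    split64 mem = { (uint32_t)(old_value >> 32), (uint32_t)old_value };
    mem.hi = (uint32_t)(new_value >> 32);  /* writer: first 4-byte store  */
    uint64_t seen = join(mem);             /* reader runs at this point   */
    mem.lo = (uint32_t)new_value;          /* writer: second 4-byte store */
    return seen;
}
```

For example, `torn_read(0, UINT64_MAX)` returns `0xFFFFFFFF00000000`, neither the old nor the new value; for a `double`, the same mixed bit pattern could decode to an unrelated number.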
>>
>> And I would be happy to have an atomic directive for load and store 
>> operations and not only for updates.
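Later versions of the specification added such directives: OpenMP 3.1 introduced `atomic read` and `atomic write` alongside the existing update form. A minimal sketch, with `shared_value`, `publish` and `observe` as illustrative names; compiled without OpenMP support, the pragmas are ignored and the code simply runs serially:

```c
/* Illustrative shared variable written by one thread and read by others. */
double shared_value = 0.0;

/* The store itself is indivisible under OpenMP 3.1 and later. */
void publish(double v) {
    #pragma omp atomic write
    shared_value = v;
}

/* The load itself is indivisible under OpenMP 3.1 and later. */
double observe(void) {
    double v;
    #pragma omp atomic read
    v = shared_value;
    return v;
}
```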
>>
>> best regards,
>> Dieter
>>
>>
>> Bronis R. de Supinski wrote:
>>> Dieter and all:
>>>
>>> Re:
>>>>   >    If multiple threads write to the same ** memory location **
>>> What is a memory location? It is a central question to
>>> the memory model and is why Greg has said this has
>>> implications for the memory model.
>>>
>>>>   >    without synchronization, the resulting ** memory content **
>>>>   >    is unspecified. If at least one thread reads from
>>> Anything that says some memory location becomes "unspecified"
>>> is an issue for the memory model. The memory model must define
>>> what the state of memory is after any action (legal or not).
>>> In the case of a location becoming unspecified, it is equivalent
>>> to a write of that location of random value lambda. The memory
>>> model needs to state that this occurs.
>>>
>>>>   >    a shared ** memory location ** and at least one thread writes to
>>>>   >    it without synchronization, the value seen by any reading thread is
>>>>   >    unspecified.
>>> Currently, we have no precise definition of a memory
>>> location because stating that a memory location is more
>>> than one bit could imply that an implementation must
>>> write that much data atomically. In this case, we are
>>> not talking about the OpenMP "atomic" construct but
>>> hardware atomicity.
>>>
>>> Simply saying b is a pointer does not solve the problem.
>>> Consider a simple variant of Brad's example in which bit
>>> operations are used to write individual bits in a single
>>> byte. By the suggested "variable" definitions the code would
>>> still be correct. However, I know of no current hardware that
>>> provides atomic writes to individual bits. The reality
>>> is that writes to the same byte are a data race, even if
>>> the code describes them as array operations on distinct
>>> bits. I am certain our vendors would (rightly) oppose being
>>> required to make that code work.
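One conforming way to write the bit-array variant is to make each byte update atomic. The sketch below (with `bits` and `set_even_bits` as illustrative names) relies on the `#pragma omp atomic` form `x binop= expr`, which already allows `|` in OpenMP 2.5, so two iterations setting different bits of the same byte cannot lose each other's update:

```c
#define NBITS 64

/* Shared bit array: eight bits share each unsigned char element. */
unsigned char bits[NBITS / 8];

/* Sets every even-numbered bit in parallel. Different iterations touch
 * different bits, but neighbouring bits share a byte, so each
 * read-modify-write of a byte is made atomic. Without OpenMP support the
 * pragmas are ignored and the loop simply runs serially. */
void set_even_bits(void) {
    int k;
    #pragma omp parallel for
    for (k = 0; k < NBITS; k += 2) {
        unsigned char mask = (unsigned char)(1u << (k % 8));
        #pragma omp atomic
        bits[k / 8] |= mask;
    }
}
```

After `set_even_bits()` each of the eight bytes holds `0x55` (bits 0, 2, 4 and 6 set).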
>>>
>>> Note that it is not clear where to define the hardware
>>> atomicity level, which is why the specification has tried
>>> to avoid doing so. I could easily argue that the right
>>> level of write atomicity for a DSM implementation is at
>>> the page granularity. While I don't think anyone would
>>> accept that, it is very unclear where we stop. If Brad's
>>> example used a char array, does it work? I would hope so...
>>>
>>>> This text just describes the circumstances of a data race.
>>> Defining data races and what happens under them is the
>>> primary role of the memory model. The example demonstrates
>>> that we probably need to make some statement about the
>>> minimum level at which the programmer can assume write
>>> atomicity (in the hardware sense). This is a much bigger
>>> issue than what I had intended to cover in the memory
>>> model revisions, which were really just intended to be
>>> clarifications and consolidations.
>>>
>>> Bronis
>>>
>>>
>>>
>>>> regards
>>>> Dieter
>>>>
>>>> Brad Bell wrote:
>>>>> I have a question about the OpenMP 2.5 standard
>>>>>     http://www.openmp.org/drupal/mp-documents/spec25.pdf
>>>>>
>>>>> In Section 1.2.3 Data Terminology of spec25.pdf,
>>>>> the following text appears:
>>>>>
>>>>>    variable
>>>>>    A named data object, whose value can be defined and
>>>>>    redefined during the execution of a program.
>>>>>
>>>>>    Only an object that is not part of another object is
>>>>>    considered a variable. For example, array elements,
>>>>>    structure components, array sections and substrings
>>>>>    are not considered variables.
>>>>>
>>>>>
>>>>> In Section 1.4.1 Structure of the OpenMP Memory Model of spec25.pdf,
>>>>> the following text appears:
>>>>>
>>>>>    If multiple threads write to the same shared variable
>>>>>    without synchronization, the resulting value of the variable
>>>>>    in memory is unspecified. If at least one thread reads from
>>>>>    a shared variable and at least one thread writes to it without
>>>>>    synchronization, the value seen by any reading thread is unspecified.
>>>>>
>>>>> It appears to me, given the text above, that Example A.1.1.c in
>>>>> the OpenMP 2.5 standard is not correct (or is at least misleading).
>>>>> Here is the code for that example:
>>>>>
>>>>>     void a1(int n, float *a, float *b)
>>>>>     {
>>>>>         int i;
>>>>>     #pragma omp parallel for
>>>>>         for (i=1; i<n; i++) /* i is private by default */
>>>>>             b[i] = (a[i] + a[i-1]) / 2.0;
>>>>>     }
>>>>>
>>>>> 1. As I understand the parallel directive above, different threads
>>>>> may execute the loop for different values of i.
>>>>>
>>>>> 2. As I understand it, the variable b is a shared variable because
>>>>> it is defined before the loop.
>>>>>
>>>>> 3. The argument b to the routine a1 may be an array; for example,
>>>>> it may be declared in the calling program by
>>>>>     float b[SIZE];
>>>>> where SIZE is any positive integer constant greater than or equal
>>>>> to n.
>>>>>
>>>>> 4. In the case of 3 above, b is a variable, and b[i] is not a
>>>>> variable; hence multiple threads may be writing to the same
>>>>> variable, namely b.
>>>>>
>>>>> 5. Thus, in the case described above, the result of the loop is
>>>>> undefined.
>>>>>
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> Omp mailing list
>>>>> Omp at openmp.org
>>>>> http://openmp.org/mailman/listinfo/omp
>>>>>
>>>> --
>>>> --------------------------------------------------------------------
>>>> Dieter an Mey
>>>> High Performance Computing               Hochleistungsrechnen
>>>> RWTH Aachen University                   Rechen- und Kommunikations-
>>>> Center for Computing and Communication   zentrum der RWTH Aachen
>>>> phone: ++49-(0)241-80-24377              Seffenter Weg 23
>>>> fax:   ++49-(0)241-80-22134              52074 Aachen, Germany
>>>> email: anmey at rz.rwth-aachen.de
>>>> --------------------------------------------------------------------