[Omp] A question about OpenMP 2.5
Dieter an Mey
anmey at rz.rwth-aachen.de
Thu Mar 22 02:51:10 PDT 2007
I see what you mean.
As a user, I would expect the compiler to take care of proper
alignment etc. to avoid these "false sharing" effects, which could lead
to a data race.
I wonder to what extent this can really cause problems on current
hardware, and how current OpenMP-compliant compilers have taken care
of it.
I assume that the compiler has to guarantee, and in many or all cases
can guarantee, that elements are aligned such that any two elements of
a structure, class, etc. can be loaded and stored separately.
In Fortran you can try to force bad alignment with common blocks or
equivalence, which would be bad programming practice anyway.
I tried to create such a bad case, but I have not been "successful" yet.
I don't know to what extent C/C++ programmers can do this (unions, perhaps?).
The question is: can primitive datatypes be forced to be so badly
aligned that the compiler cannot generate single load/store instructions
for those data elements?
regards
Dieter
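On the C/C++ question: a union will not do it, because every union member starts at offset 0 and the union takes the strictest alignment of its members. Misalignment can be forced with a compiler extension, though, such as GCC/Clang's `__attribute__((packed))` (or `#pragma pack(1)`). The sketch below assumes GCC or Clang, and its type and function names are hypothetical. It places a `double` at byte offset 1; on strict-alignment targets such as SPARC, the compiler then typically has to split each access to that member into several narrower loads or stores, which is exactly the case where a single access is no longer atomic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record. The packed attribute (a GCC/Clang extension)
   removes the padding that would normally align 'd'. */
struct packed_rec {
    char   tag;
    double d;
} __attribute__((packed));

/* Store through the member: the compiler knows 'd' may be misaligned
   and emits an instruction sequence that is safe on this target,
   possibly several narrower stores instead of one 8-byte store. */
void set_d(struct packed_rec *r, double v)
{
    assert(r != NULL);
    r->d = v;
}

/* Returns 1 if r->d sits on a multiple of sizeof(double), else 0.
   The address is computed as an integer so that no misaligned
   'double *' is ever formed. */
int d_is_naturally_aligned(const struct packed_rec *r)
{
    uintptr_t addr = (uintptr_t)r + offsetof(struct packed_rec, d);
    return addr % sizeof(double) == 0;
}
```

In an array of such records, consecutive `d` members are 9 bytes apart, so at most one of any two neighbours can be 8-byte aligned.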
Greg Bronevetsky wrote:
> The difference is illustrated by the following example. Suppose that all memory
> operations operate at 4-byte granularity. The code in question is:
> char buf[BUF_SIZE];
> #pragma omp for
> for(i=0; i<BUF_SIZE; i++)
> buf[i] = ?;
> Suppose that buf[] is 4-byte-aligned, thread t gets iteration i=0 and
> thread r gets iteration i=1. t writes to address &buf[0], bringing the
> memory range [&buf[0], &buf[4]) into its cache. r writes to &buf[1], also
> bringing the memory range [&buf[0], &buf[4]) into its cache. When these
> cache lines are finally evicted, each contains data that the other does
> not. As such, regardless of which cache line we pick, we will lose data.
>
> In short, when the system moves data at 4-byte granularity, writes by
> multiple threads to the same 4-byte region are data races. It should be
> noted that the above is the reverse of Dieter's example. We're worrying
> about code that operates on memory locations of size x, while the hardware
> supports memory transfers of size y. If x>=y (Dieter's example), we have
> no problem. The problem is cases where x<y (the above example).
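Greg's scenario can be replayed deterministically. The sketch below is a toy model, not a description of any real machine: memory is an array of 4-byte words, and each "thread" can only read or write whole words, so storing one byte means loading the word, patching the byte, and storing the word back. All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy memory that can only be read or written in whole 4-byte words. */
typedef struct {
    uint32_t words[4];
} word_memory;

/* One thread's private copy of a word (its "cache line"). */
typedef struct {
    size_t   word_index;
    uint32_t value;
} cached_word;

/* Load the whole word that contains byte 'byte_addr'. */
cached_word load_word(const word_memory *m, size_t byte_addr)
{
    cached_word c;
    assert(byte_addr / 4 < sizeof m->words / sizeof m->words[0]);
    c.word_index = byte_addr / 4;
    c.value = m->words[c.word_index];
    return c;
}

/* Change one byte of the private copy; in this model byte k of a
   word occupies bits 8k..8k+7. */
void patch_byte(cached_word *c, size_t byte_addr, uint8_t v)
{
    unsigned shift = (unsigned)(byte_addr % 4) * 8;
    c->value = (c->value & ~((uint32_t)0xFF << shift)) | ((uint32_t)v << shift);
}

/* Write the private copy back: all four bytes are overwritten. */
void store_word(word_memory *m, const cached_word *c)
{
    m->words[c->word_index] = c->value;
}

/* Read one byte out of memory. */
uint8_t read_byte(const word_memory *m, size_t byte_addr)
{
    unsigned shift = (unsigned)(byte_addr % 4) * 8;
    return (uint8_t)(m->words[byte_addr / 4] >> shift);
}

/* Thread t sets buf[0], thread r sets buf[1]; both load before
   either stores back. Returns the resulting memory. */
word_memory replay_race(void)
{
    word_memory m = {{0}};
    cached_word t = load_word(&m, 0);  /* t copies bytes 0..3 */
    cached_word r = load_word(&m, 1);  /* r copies the same word */
    patch_byte(&t, 0, 0xAA);           /* t: buf[0] = 0xAA */
    patch_byte(&r, 1, 0xBB);           /* r: buf[1] = 0xBB */
    store_word(&m, &t);                /* memory now holds t's copy */
    store_word(&m, &r);                /* r's stale buf[0] overwrites 0xAA */
    return m;
}
```

After `replay_race()`, byte 1 holds `0xBB` but byte 0 is back to `0x00`: t's write is lost even though no two iterations touched the same `char`.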
>
> Greg Bronevetsky
>
> On Wed, 21 Mar 2007, Dieter an Mey wrote:
>
>> Well, Bronis and Greg, I still don't see why it should make any
>> difference to a potential data race whether the "memory location"
>> that is spoiling my fun is written with bit or page atomicity by the
>> memory system of the hardware I am using.
>> Either way, the results are unspecified or broken: they may be correct
>> a thousand times but wrong the 1001st time.
>>
>> I agree completely that there may be situations where it is highly
>> desirable to know what atomicity I have to deal with.
>>
>> For example, on a SPARC system 64-bit floating-point numbers may be
>> written or loaded by two 4-byte memory operations.
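This remark describes a different hazard from Greg's, usually called word tearing: if a 64-bit value is moved as two 32-bit halves, a concurrent reader can observe one half of the old value and one half of the new one. The toy model below, with hypothetical names, keeps a 64-bit integer as two separately stored 32-bit halves and replays a reader that runs between the writer's two stores.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model: a 64-bit value held as two separately written halves. */
typedef struct {
    uint32_t hi;
    uint32_t lo;
} split64;

static uint64_t read_split(const split64 *s)
{
    return ((uint64_t)s->hi << 32) | s->lo;
}

/* Writer updates 'hi', reader reads both halves, then writer updates
   'lo'. Returns the value the reader observed. */
uint64_t replay_torn_read(uint64_t old_val, uint64_t new_val)
{
    split64 s = { (uint32_t)(old_val >> 32), (uint32_t)old_val };

    s.hi = (uint32_t)(new_val >> 32);   /* writer: first 4-byte store */
    uint64_t seen = read_split(&s);     /* reader runs in between */
    s.lo = (uint32_t)new_val;           /* writer: second 4-byte store */

    assert(read_split(&s) == new_val);  /* the final value is complete */
    return seen;
}
```

With an old value of `0x0000000100000002` and a new value of `0x0000000300000004`, the reader sees `0x0000000300000002`, which is neither value.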
>>
>> And I would be happy to have an atomic directive for load and store
>> operations and not only for updates.
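Atomic loads and stores of this kind were added to the specification after this thread: OpenMP 3.1 introduced `read` and `write` clauses on the `atomic` construct (they are not available in 2.5). A minimal sketch, assuming an OpenMP 3.1 or newer compiler; the variable and function names are hypothetical. Compiled without OpenMP support, the pragmas are ignored and the code simply runs serially.

```c
#include <assert.h>

/* Hypothetical shared value that one thread publishes and others poll. */
static double shared_value = 0.0;

/* Atomic store: no reader can observe a half-written double. */
void publish(double v)
{
#pragma omp atomic write
    shared_value = v;
}

/* Atomic load: returns either the old or the new value, never a mix. */
double snapshot(void)
{
    double v;
#pragma omp atomic read
    v = shared_value;
    return v;
}

/* One thread publishes 42.0 while the others take snapshots.
   Returns how many snapshots were neither 0.0 nor 42.0. */
int count_torn_reads(void)
{
    int torn = 0;
    publish(0.0);
#pragma omp parallel reduction(+:torn)
    {
#pragma omp single nowait
        publish(42.0);

        double v = snapshot();
        if (v != 0.0 && v != 42.0)
            torn += 1;
    }
    return torn;
}
```

Because every access to `shared_value` goes through an atomic read or write, `count_torn_reads()` should always return 0, whatever the thread timing.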
>>
>> best regards,
>> Dieter
>>
>>
>> Bronis R. de Supinski wrote:
>>> Dieter and all:
>>>
>>> Re:
>>>> > If multiple threads write to the same ** memory location **
>>> What is a memory location? It is a central question to
>>> the memory model and is why Greg has said this has
>>> implications for the memory model.
>>>
>>>> > without synchronization, the resulting ** memory content **
>>>> > is unspecified. If at least one thread reads from
>>> Anything that says some memory location becomes "unspecified"
>>> is an issue for the memory model. The memory model must define
>>> what the state of memory is after any action (legal or not).
>>> In the case of a location becoming unspecified, it is equivalent
>>> to a write of that location of random value lambda. The memory
>>> model needs to state that this occurs.
>>>
>>>> > a shared ** memory location ** and at least one thread writes to
>>>> > it without
>>>> > synchronization, the value seen by any reading thread is
>>>> > unspecified.
>>> Currently, we have no precise definition of a memory
>>> location because stating that a memory location is more
>>> than one bit could imply that an implementation must
>>> write that much data atomically. In this case, we are
>>> not talking about the OpenMP "atomic" construct but
>>> hardware atomicity.
>>>
>>> Simply saying b is a pointer does not solve the problem.
>>> Consider a simple variant of Brad's example in which bit
>>> operations are used to write individual bits in a single byte. By
>>> the suggested "variable" definitions the code would still
>>> be correct. However, I know of no current hardware that
>>> provides atomic writes to individual bits. The reality
>>> is that writes to the same byte are a data race, even if
>>> the code describes them as array operations to distinct
>>> bits. I am certain our vendors would (rightly) oppose being
>>> required to make that code work.
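The bit-level variant can be made concrete. In the sketch below (hypothetical names), each iteration sets a distinct bit, but eight iterations share each byte, so a plain `flags[i / 8] |= mask` is a read-modify-write of the whole byte and therefore a data race between threads. Wrapping the update in `#pragma omp atomic`, which OpenMP 2.5 already permits for `x |= expr`, makes each byte update indivisible. This is one way to make such code correct; the cost is that updates to the same byte are serialized.

```c
#include <assert.h>
#include <stddef.h>

/* Set bits 0..nbits-1 of a packed bit array, one bit per iteration.
   'flags' must hold at least (nbits + 7) / 8 bytes. */
void set_all_bits(unsigned char *flags, int nbits)
{
    int i;
    assert(flags != NULL && nbits >= 0);
#pragma omp parallel for
    for (i = 0; i < nbits; i++) {
        unsigned char mask = (unsigned char)(1u << (i % 8));
        /* Without the atomic, two threads updating bits of the same
           byte could each write back a stale copy of the other's bit. */
#pragma omp atomic
        flags[i / 8] |= mask;
    }
}
```

Setting 20 bits fills the first two bytes and the low nibble of the third (`0xFF, 0xFF, 0x0F`).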
>>>
>>> Note that it is not clear where to define the hardware
>>> atomicity level, which is why the specification has tried
>>> to avoid doing so. I could easily argue that the right
>>> level of write atomicity for a DSM implementation is at
>>> the page granularity. While I don't think anyone would
>>> accept that, it is very unclear where we stop. If Brad's
>>> example used a char array, does it work? I would hope so...
>>>
>>>> This text just describes the circumstances of a data race.
>>> Defining data races and what happens under them are the
>>> primary role of the memory model. The example demonstrates
>>> that we probably need to make some statement about the
>>> minimum level at which the programmer can assume write
>>> atomicity (in the hardware sense). This is a much bigger
>>> issue than what I had intended to cover in the memory
>>> model revisions, which was really just intended to be
>>> clarifications and consolidations.
>>>
>>> Bronis
>>>
>>>
>>>
>>>> regards
>>>> Dieter
>>>> >
>>>>
>>>> Brad Bell wrote:
>>>>> I have a question about the OpenMP 2.5 standard
>>>>> http://www.openmp.org/drupal/mp-documents/spec25.pdf
>>>>>
>>>>> In Section 1.2.3 Data Terminology of spec25.pdf,
>>>>> the following text appears:
>>>>>
>>>>> variable
>>>>> A named data object, whose value can be defined and
>>>>> redefined during the execution of a program.
>>>>>
>>>>> Only an object that is not part of another object is
>>>>> considered a variable. For example, array elements,
>>>>> structure components, array sections and substrings
>>>>> are not considered variables.
>>>>>
>>>>>
>>>>> In Section 1.4.1 Structure of the OpenMP Memory Model of spec25.pdf,
>>>>> the following text appears:
>>>>>
>>>>> If multiple threads write to the same shared variable
>>>>> without synchronization, the resulting value of the variable
>>>>> in memory is unspecified. If at least one thread reads from
>>>>> a shared variable and at least one thread writes to it without
>>>>> synchronization, the value seen by any reading thread is unspecified.
>>>>>
>>>>> It appears to me, given the text above, that Example A.1.1.c
>>>>> in the OpenMP 2.5 standard is not correct (or at least misleading).
>>>>> Here is the code for that example:
>>>>>
>>>>> void a1(int n, float *a, float *b)
>>>>> {
>>>>> int i;
>>>>> #pragma omp parallel for
>>>>> for (i=1; i<n; i++) /* i is private by default */
>>>>> b[i] = (a[i] + a[i-1]) / 2.0;
>>>>> }
>>>>>
>>>>> 1. As I understand the parallel directive above, different threads may
>>>>> execute the loop for different values of i.
>>>>>
>>>>> 2. As I understand, the variable b is a shared variable because it is
>>>>> defined before the loop.
>>>>>
>>>>> 3. The argument b to the routine a1 may be an array; for example,
>>>>> it may be declared in the calling program by
>>>>> float b[SIZE];
>>>>> where SIZE is any positive integer constant greater than or equal to n.
>>>>>
>>>>> 4. In the case of 3 above, b is a variable, and b[i] is not a variable,
>>>>> hence multiple threads may be writing to the same variable; namely b.
>>>>>
>>>>> 5. Thus, in the case described above, the result of the loop is undefined.
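For reference, Example A.1.1.c can be wrapped in a small self-contained check (the helper name and test values are hypothetical, not part of the specification). Each iteration writes only its own element `b[i]` and only reads `a`, so whether this loop races depends entirely on whether each `b[i]` counts as a separate memory location. On hardware that can store a 4-byte `float` without touching its neighbours, distinct `b[i]` never share a storage unit.

```c
#include <assert.h>

/* Example A.1.1.c from the OpenMP 2.5 specification. */
void a1(int n, float *a, float *b)
{
    int i;
#pragma omp parallel for
    for (i = 1; i < n; i++) /* i is private by default */
        b[i] = (a[i] + a[i-1]) / 2.0;
}

/* Hypothetical check: runs a1 and compares against a serial loop.
   Returns 1 if every b[i] (i >= 1) matches, 0 otherwise. */
int a1_matches_serial(void)
{
    enum { N = 8 };
    float a[N], b[N], ref[N];
    int i;

    for (i = 0; i < N; i++) {
        a[i] = (float)(i * i);
        b[i] = ref[i] = 0.0f;
    }
    a1(N, a, b);
    assert(b[0] == 0.0f); /* a1 never writes b[0] */
    for (i = 1; i < N; i++)
        ref[i] = (a[i] + a[i-1]) / 2.0;
    for (i = 1; i < N; i++)
        if (b[i] != ref[i])
            return 0;
    return 1;
}
```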
>>>>>
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> Omp mailing list
>>>>> Omp at openmp.org
>>>>> http://openmp.org/mailman/listinfo/omp
>>>>>
>>>> --
>>>> --------------------------------------------------------------------
>>>> Dieter an Mey
>>>> High Performance Computing Hochleistungsrechnen
>>>> RWTH Aachen University Rechen- und Kommunikations-
>>>> Center for Computing and Communication zentrum der RWTH Aachen
>>>> phone: ++49-(0)241-80-24377 Seffenter Weg 23
>>>> fax: ++49-(0)241-80-22134 52074 Aachen, Germany
>>>> email: anmey at rz.rwth-aachen.de
>>>> --------------------------------------------------------------------
>>>>
>>>>