[Omp] A question about OpenMP 2.5
Dieter an Mey
anmey at rz.rwth-aachen.de
Fri Mar 23 02:02:32 PDT 2007
Hmm.
I would be curious whether anyone can actually demonstrate that problem
with an existing compiler and hardware.
I was not able to provoke it.
But, yes, I admit that we need to put something in the specs.
Dieter
Haab, Grant wrote:
> Dieter,
>
> I do see your point about unaligned data now.
>
> Both the Compaq Alpha Compilers and KAP/Pro toolset compilers supported
> access to 1-byte data types with OpenMP compilation.
>
> I believe the Alpha Unix OS (can't remember the name) would issue
> unaligned access messages whenever a piece of data was accessed at an
> address not aligned to a four-byte boundary. The code would still run
> correctly, but the unaligned accesses were very slow compared to the
> aligned ones.
>
> Greg's example code with the byte array and your example of unaligned
> four-byte data would have data races with either OpenMP compiler, based
> on experiments I did years ago.
>
> Finally, an OpenMP implementation that doesn't allow 2-byte datatypes
> would be a nightmare to port code to.
>
> So I respectfully disagree with both your feeling (1st paragraph) and
> guess (3rd paragraph) below.
>
> - Grant
>
> -----Original Message-----
> From: Dieter an Mey [mailto:anmey at rz.rwth-aachen.de]
> Sent: Thursday, March 22, 2007 12:07 PM
> To: Haab, Grant
> Cc: Greg Bronevetsky; omp at openmp.org
> Subject: Re: [Omp] A question about OpenMP 2.5
>
> My feeling is that the (OpenMP) compilers we are looking at don't really
> have any problem if they follow the language specifications.
>
> I have never used an Alpha processor.
> But imagine you want to access 4-byte data that are not aligned to a
> 4-byte boundary; then you will probably run into such a problem if the
> processor is only able to load and store 4-byte-aligned data.
>
> Thinking about Fortran, I would guess that an OpenMP compiler for such a
> processor will not support any 2-byte datatypes, because otherwise you
> could, through bad programming style (COMMON, EQUIVALENCE), force 4-byte
> data onto 2-byte boundaries and run into that problem.
>
> regards,
> Dieter
>
>
>
> Haab, Grant wrote:
>> Dieter,
>>
>> The problem Greg is describing is not data alignment at all, but
>> instead what minimum data size can be used so that loads and stores are
>> performed atomically by the processor and memory system hardware. Most
>> processors support byte-sized atomicity for regular loads and stores,
>> but several have pointed out that the Alpha processors supported a
>> minimum of 4-byte atomicity.
>>
>> I know of no general-purpose processor that supports loads and stores
>> at finer than byte granularity, because a byte is the minimum
>> addressable unit for most processors. (I'm sure somebody will find a
>> counterexample though ;-)
>>
>> I don't believe the compiler can easily fix this problem, because C and
>> Fortran don't allow you to pad array elements to the minimum atomic
>> load/store size. That would break unions, EQUIVALENCE and the like, not
>> to mention make users very irate that their character array now takes
>> 4 times more space!
>>
>> - Grant
>>
>>
>>
>> -----Original Message-----
>> From: omp-bounces at openmp.org [mailto:omp-bounces at openmp.org] On Behalf
>> Of Dieter an Mey
>> Sent: Thursday, March 22, 2007 4:51 AM
>> To: Greg Bronevetsky
>> Cc: omp at openmp.org
>> Subject: Re: [Omp] A question about OpenMP 2.5
>>
>> I see what you mean.
>> As a user I would expect the compiler to take care of proper alignment
>> etc. to avoid these "false sharing" effects, which could lead to a data
>> race.
>>
>> I wonder to what extent this can really cause problems on current
>> hardware, and how the current OpenMP-compliant compilers have taken
>> care of it.
>>
>> I assume that the compiler has to guarantee, and in many or all cases
>> can guarantee, that elements are aligned such that any two elements of
>> a structure, class, etc. can each be loaded and stored separately.
>>
>> In Fortran you can try to force bad alignment with COMMON blocks or
>> EQUIVALENCE, which would be bad programming practice anyway.
>> I tried to create such a bad case, but I have not been "successful" yet.
>>
>> I don't know to what extent C/C++ programmers can do this (unions?).
>>
>> The question is: can primitive datatypes be forced to be so badly
>> aligned that the compiler cannot generate single load/store
>> instructions for those data elements?
>>
>> regards
>> Dieter
>>
>> Greg Bronevetsky wrote:
>>> The difference is illustrated by the following example. Suppose that
>>> all memory operations operate at 4-byte granularity. The code in
>>> question is:
>>> char buf[BUF_SIZE];
>>> #pragma omp for
>>> for(i=0; i<BUF_SIZE; i++)
>>> buf[i] = ?;
>>> Suppose that buf[] is 4-byte-aligned, thread t gets iteration i=0 and
>>> thread r gets iteration i=1. t writes to buf[0], bringing the memory
>>> range [buf, buf+4) into its cache. r writes to buf[1], also bringing
>>> the memory range [buf, buf+4) into its cache. When these cache lines
>>> are finally evicted, each contains data that the other does not. As
>>> such, regardless of which cache line we pick, we will lose data.
>>>
>>> In short, when the system moves data at 4-byte granularity, writes by
>>> multiple threads to the same 4-byte region are data races. It should
>>> be noted that the above is the reverse of Dieter's example. We're
>>> worrying about code that operates on memory locations of size x, while
>>> the hardware supports memory transfers of size y. If x>=y (Dieter's
>>> example), we have no problem. The problem is cases where x<y (the
>>> above example).
>>>
>>> Greg Bronevetsky
>>>
>>> On Wed, 21 Mar 2007, Dieter an Mey wrote:
>>>
>>>> Well, Bronis and Greg, I still don't see why it should make any
>>>> difference to a potential data race whether the "memory location"
>>>> that is spoiling my fun is written with bit or page atomicity by the
>>>> memory system of the hardware I am using.
>>>> The results are thus unspecified or broken, and may be correct a
>>>> thousand times but wrong the 1001st time.
>>>>
>>>> I agree completely that there may be situations where it would be
>>>> highly desirable to know what atomicity I have to deal with.
>>>>
>>>> For example, on a SPARC system 64-bit floating-point numbers may be
>>>> written or loaded by two 4-byte memory operations.
>>>>
>>>> And I would be happy to have an atomic directive for load and store
>>>> operations and not only for updates.
>>>>
>>>> best regards,
>>>> Dieter
>>>>
>>>>
>>>> Bronis R. de Supinski wrote:
>>>>> Dieter and all:
>>>>>
>>>>> Re:
>>>>>> > If multiple threads write to the same ** memory location **
>>>>> What is a memory location? It is a central question to
>>>>> the memory model and is why Greg has said this has
>>>>> implications for the memory model.
>>>>>
>>>>>> > without synchronization, the resulting ** memory content **
>>>>>> > is unspecified. If at least one thread reads from
>>>>> Anything that says some memory location becomes "unspecified"
>>>>> is an issue for the memory model. The memory model must define
>>>>> what the state of memory is after any action (legal or not).
>>>>> In the case of a location becoming unspecified, it is equivalent
>>>>> to a write of that location of random value lambda. The memory
>>>>> model needs to state that this occurs.
>>>>>
>>>>>> > a shared ** memory location ** and at least one thread writes to
>>>>>> > it without synchronization, the value seen by any reading thread is
>>>>>> > unspecified.
>>>>> Currently, we have no precise definition of a memory
>>>>> location because stating that a memory location is more
>>>>> than one bit could imply that an implementation must
>>>>> write that much data atomically. In this case, we are
>>>>> not talking about the OpenMP "atomic" construct but
>>>>> hardware atomicity.
>>>>>
>>>>> Simply saying b is a pointer does not solve the problem.
>>>>> Consider a simple variant of Brad's example that uses bit
>>>>> operations to write individual bits in a single byte. Under
>>>>> the suggested "variable" definition, the code would still
>>>>> be correct. However, I know of no current hardware that
>>>>> provides atomic writes to individual bits. The reality
>>>>> is that writes to the same byte are a data race, even if
>>>>> the code describes them as array operations to distinct
>>>>> bits. I am certain our vendors would (rightly) oppose being
>>>>> required to make that code work.
>>>>>
>>>>> Note that it is not clear where to define the hardware
>>>>> atomicity level, which is why the specification has tried
>>>>> to avoid doing so. I could easily argue that the right
>>>>> level of write atomicity for a DSM implementation is at
>>>>> the page granularity. While I don't think anyone would
>>>>> accept that, it is very unclear where we stop. If Brad's
>>>>> example used a char array, does it work? I would hope so...
>>>>>
>>>>>> This text just describes the circumstances of a data race.
>>>>> Defining data races and what happens under them are the
>>>>> primary role of the memory model. The example demonstrates
>>>>> that we probably need to make some statement about the
>>>>> minimum level at which the programmer can assume write
>>>>> atomicity (in the hardware sense). This is a much bigger
>>>>> issue than what I had intended to cover in the memory
>>>>> model revisions, which was really just intended to be
>>>>> clarifications and consolidations.
>>>>>
>>>>> Bronis
>>>>>
>>>>>
>>>>>
>>>>>> regards
>>>>>> Dieter
>>>>>> >
>>>>>>
>>>>>> Brad Bell wrote:
>>>>>>> I have a question about the OpenMP 2.5 standard
>>>>>>> http://www.openmp.org/drupal/mp-documents/spec25.pdf
>>>>>>>
>>>>>>> In Section 1.2.3 Data Terminology of spec25.pdf,
>>>>>>> the following text appears:
>>>>>>>
>>>>>>> variable
>>>>>>> A named data object, whose value can be defined and
>>>>>>> redefined during the execution of a program.
>>>>>>>
>>>>>>> Only an object that is not part of another object is
>>>>>>> considered a variable. For example, array elements,
>>>>>>> structure components, array sections and substrings
>>>>>>> are not considered variables.
>>>>>>>
>>>>>>>
>>>>>>> In Section 1.4.1 Structure of the OpenMP Memory Model of spec25.pdf,
>>>>>>> the following text appears:
>>>>>>>
>>>>>>> If multiple threads write to the same shared variable
>>>>>>> without synchronization, the resulting value of the variable
>>>>>>> in memory is unspecified. If at least one thread reads from
>>>>>>> a shared variable and at least one thread writes to it without
>>>>>>> synchronization, the value seen by any reading thread is unspecified.
>>>>>>>
>>>>>>> It appears to me, given the text above, that Example A.1.1.c in the
>>>>>>> OpenMP 2.5 standard is not correct (or is at least misleading).
>>>>>>> Here is the code for that example:
>>>>>>>
>>>>>>> void a1(int n, float *a, float *b)
>>>>>>> {
>>>>>>> int i;
>>>>>>> #pragma omp parallel for
>>>>>>> for (i=1; i<n; i++) /* i is private by default */
>>>>>>> b[i] = (a[i] + a[i-1]) / 2.0;
>>>>>>> }
>>>>>>>
>>>>>>> 1. As I understand the parallel construct above, different threads
>>>>>>> may execute the loop for different values of i.
>>>>>>>
>>>>>>> 2. As I understand it, the variable b is a shared variable because
>>>>>>> it is declared before the loop.
>>>>>>>
>>>>>>> 3. The argument b to the routine a1 may be an array; for example,
>>>>>>> it may be declared in the calling program by
>>>>>>> float b[SIZE];
>>>>>>> where SIZE is any positive integer constant greater than or equal
>>>>>>> to n.
>>>>>>>
>>>>>>> 4. In the case of 3 above, b is a variable and b[i] is not a
>>>>>>> variable; hence multiple threads may be writing to the same
>>>>>>> variable, namely b.
>>>>>>>
>>>>>>> 5. Thus, in the case described above, the result of the loop is
>>>>>>> undefined.
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> Omp mailing list
>>>>>>> Omp at openmp.org
>>>>>>> http://openmp.org/mailman/listinfo/omp
>>>>>>>
>>>>>> --
>>>>>>
>>>>>> --------------------------------------------------------------------
>>>>>> Dieter an Mey
>>>>>> High Performance Computing Hochleistungsrechnen
>>>>>> RWTH Aachen University Rechen- und Kommunikations-
>>>>>> Center for Computing and Communication zentrum der RWTH Aachen
>>>>>> phone: ++49-(0)241-80-24377 Seffenter Weg 23
>>>>>> fax: ++49-(0)241-80-22134 52074 Aachen, Germany
>>>>>> email: anmey at rz.rwth-aachen.de
>>>>>> --------------------------------------------------------------------
>>>>>>
>>>>
>>>>
>>>
>>>
>