Based on an incomplete reading, it appears to me that the OpenMP memory model is completely inconsistent with the proposed C++ one (and with pthreads) in several ways. This worries me, since I expect that in many cases similar compiler back-ends will be used. Issues: (C++ here refers to the C++ working paper N2461, not an approved standard.)
- Semantics of data races. C++ carefully defines what constitutes a data race (nothing here is implementation-defined anymore). Programs with data races have undefined semantics. A rationale for this is given in http://www.open-std.org/jtc1/sc22/wg21/ ... n2176.html . Under some easily detectable restrictions, all other programs have sequentially consistent semantics. This is consistent with pthreads. OpenMP gives some rules in 1.4.1 which are very unclear to me. It talks about an implementation-defined minimum size for atomicity (page 14, line 4). There is presumably also a maximum size, since some architectures cannot store or load, say, 64-bit integers atomically. In any case, it does not disallow data races, or provide a way to identify them to the compiler. See N2176 for reasons to do so. (An unspecified value is very different from undefined behavior here. Volatile won't work to identify races; see separate posting. C++ uses atomics to identify races: simultaneous access to atomics is allowed and is not defined as a data race.)
- 1.4.1 appears to allow updates of small variables (page 14, line 4) to rewrite adjacent memory in implementation-defined ways? (In what other sense might SMALL updates not be atomic?) Given that compilers are often allowed to reorder variables, and linkers often do, I think this is unusable in portable code, and at a minimum needs to be pinned down. (C++ limits it to contiguous bit-fields. Java disallows it entirely; it has no bit-fields.)
- I think that, as it stands, the set_lock and unset_lock primitives require a full fence, since they're specified that way and, because races do not result in undefined behavior, the difference is detectable by a program. On some architectures that would require an appreciably slower implementation than is customary, e.g. for pthreads.
- There are other issues with volatiles, discussed in a separate message.
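
To make the first two points a bit more concrete, here is a rough sketch of how I read the C++ side. It assumes the std::atomic and std::thread interfaces as they later appear in C++11; the working paper may spell some of this differently. All names (Adjacent, update_adjacent_fields, publish_with_atomic_flag) are made up for illustration and come from neither document.

```cpp
#include <atomic>
#include <thread>

// Hypothetical example type: two adjacent members that are NOT bit-fields.
// In the C++ model each is its own memory location.
struct Adjacent {
    char a;
    char b;
};

// Two threads update adjacent members concurrently. Since a and b are
// distinct memory locations, this is not a data race, and an
// implementation may not rewrite b as a side effect of storing to a.
Adjacent update_adjacent_fields() {
    Adjacent s{0, 0};
    std::thread t1([&s] { s.a = 1; });
    std::thread t2([&s] { s.b = 2; });
    t1.join();
    t2.join();
    return s;
}

// The intentionally concurrent access is identified to the compiler by
// making the flag atomic. The ordinary variable payload is only read
// after the flag has been observed, so the program has no data race.
int publish_with_atomic_flag() {
    int payload = 0;
    int seen = 0;
    std::atomic<bool> ready(false);

    std::thread producer([&] {
        payload = 42;        // ordinary write
        ready.store(true);   // sequentially consistent by default
    });
    std::thread consumer([&] {
        while (!ready.load()) {
            std::this_thread::yield();
        }
        seen = payload;      // ordered after the producer's write
    });

    producer.join();
    consumer.join();
    return seen;
}
```

If ready were declared as a plain bool instead, the same program would contain a data race and have undefined semantics in C++, which is quite different from merely reading an unspecified value.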
Hans
