[I noticed after the fact that I probably posted this to the wrong forum. My apologies.]
Regarding undefined semantics for data races:
I may well be misreading the spec (which would be good). To be concrete, if I write:
int i = x; // x global, shared, i local
if (i > 2) foo();
... // doesn't touch i
if (i > 2) bar();
In the event of a race, is it possible that exactly one of foo() and bar() will be called, i.e. that the results of "i > 2" will be inconsistent?
The C++ WP says "yes". Posix implicitly says "yes". I have been told by a gcc implementor that it could conceivably result in an answer of "yes", though that's very hard to get in practice. (The issue is that if i gets spilled, the compiler may legitimately reload it from x, rather than storing it to the stack, since the compiler "knows" that i and x are copies.)
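To make that concrete, here is a sketch (my own illustration, not actual gcc output) of what the compiled code can effectively become once i is spilled and reloaded from x:

extern int x;              /* global, shared */
void foo(void);
void bar(void);

void example(void)
{
    int i = x;
    if (i > 2) foo();
    /* ... lots of code; i's register is needed elsewhere, so i is
       "spilled" -- but not necessarily to the stack ... */
    if (i > 2) bar();
}

/* What the compiler may effectively generate, since it "knows" that
   i and x hold the same value and reloading x is cheaper than saving i:

       if (x > 2) foo();      first load of x
       ...
       if (x > 2) bar();      second, independent load of x

   If another thread stores to x between the two loads, exactly one of
   foo() and bar() runs, even though i was tested both times. */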
I read the OpenMP spec as saying this must not happen. I'd be happy to have been wrong on this one.
As far as atomicity guarantees: I know of no current machines that fail to provide byte loads and stores that are atomic and do not touch surrounding memory. The first generation of Alphas were the canonical exception. They're no longer very interesting. If there is such a multiprocessor machine, it can't run Java very well. I expect it will not be able to run the next C++. For uniprocessors there are workarounds in any case. I suspect that most multiprocessor vendors are much more concerned with programmability than backwards compatibility with ancient history.
Yes, char arrays must work correctly. I don't think any parallel programming language is viable without that guarantee. The Java memory model effort made that decision early on with a minimum of dissent. I still haven't heard of problems in that area.
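For concreteness, here is a minimal pthreads sketch (my own example, not taken from any spec) of the guarantee I mean:

#include <pthread.h>
#include <stdio.h>

char a[2];

void *t1(void *arg) { a[0] = 1; return 0; }   /* writes only a[0] */
void *t2(void *arg) { a[1] = 2; return 0; }   /* writes only a[1] */

int main(void)
{
    pthread_t u, v;
    pthread_create(&u, 0, t1, 0);
    pthread_create(&v, 0, t2, 0);
    pthread_join(u, 0);
    pthread_join(v, 0);
    /* The guarantee: this must always print 1 2.  An implementation
       that stored a[0] with a word-wide read-modify-write could lose
       the concurrent store to a[1]; that is what must not happen. */
    printf("%d %d\n", a[0], a[1]);
    return 0;
}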
As you can tell, I'm all in favor of reopening this issue. In my mind, this is a showstopper for usability. And, in contrast to widely held beliefs, I don't think there is a serious implementation issue any more.
Data objects smaller than a byte, or not byte-aligned, are a different issue. The C++ working paper allows a store to a bit-field to rewrite any bit-fields in a contiguous sequence of (nonzero-length) bit-fields. I think that's the right definition. It is more stringent than what current compiler back-ends implement. But I think there are strong arguments to fix that. And current measurements (on SPEC) indicate the cost is minor.
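To illustrate my reading of that rule (the struct is my own example):

struct S {
    char c;           /* not a bit-field: must never be rewritten by stores below */
    unsigned a : 3;
    unsigned b : 5;   /* a and b form one contiguous sequence of nonzero-length bit-fields */
    unsigned   : 0;   /* zero-length bit-field: ends the sequence */
    unsigned d : 4;   /* starts a new sequence */
};

/* Under the rule as I read it, a store to s.a may be implemented as a
   read-modify-write that also rewrites s.b (same sequence), but it must
   not touch s.c or s.d.  Current back-ends may use a wider
   read-modify-write than that, which is what would have to change. */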
This is also different from the issue with larger misaligned (but byte-aligned) data, or data larger than what's atomically updatable. I think that is real on some platforms, and is yet another reason that data races should result in completely undefined semantics. You cannot always be guaranteed to see either the new or the old value.
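As a sketch of that last point (my illustration, assuming a hypothetical 32-bit target where a 64-bit store is performed as two separate word stores):

#include <pthread.h>
#include <stdio.h>

long long g = 0;      /* assume this is updated with two 32-bit stores */
long long seen;

void *writer(void *arg) { g = 0x100000001LL; return 0; }
void *reader(void *arg) { seen = g; return 0; }   /* racing read */

int main(void)
{
    pthread_t w, r;
    pthread_create(&w, 0, writer, 0);
    pthread_create(&r, 0, reader, 0);
    pthread_join(w, 0);
    pthread_join(r, 0);
    /* On such a platform the racing read may observe 0x1 or
       0x100000000: neither the old value (0) nor the new one. */
    printf("%llx\n", (unsigned long long)seen);
    return 0;
}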