Although simple and effective, loop-level parallelism usually scales poorly because it typically leaves a constant fraction of the program sequential, and by Amdahl's law that sequential fraction can quickly outweigh the gains from parallel execution. It is important, however, to distinguish between the type of parallelism (e.g., loop-level versus coarse-grained) and the programming model. The type of parallelism exposed in a program depends on the algorithm and data structures employed, not on the programming model. Therefore, given a parallel algorithm and a scalable shared-memory architecture, a shared-memory implementation scales as well as a message-passing implementation.
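To make the Amdahl's law argument concrete, the following is a minimal sketch in C of the standard speedup formula, 1 / (s + (1 - s) / p), where s is the sequential fraction and p is the number of processors. The function names `amdahl_speedup` and `amdahl_limit` are hypothetical and chosen only for illustration.

```c
/* Amdahl's law: speedup on `processors` processors when a fraction
   `serial_fraction` (between 0 and 1) of the work stays sequential.
   Both function names are hypothetical, used for illustration only. */
double amdahl_speedup(double serial_fraction, int processors)
{
    double parallel_fraction = 1.0 - serial_fraction;
    return 1.0 / (serial_fraction + parallel_fraction / (double)processors);
}

/* Upper bound on the speedup as the processor count grows without limit.
   Assumes serial_fraction is greater than zero. */
double amdahl_limit(double serial_fraction)
{
    return 1.0 / serial_fraction;
}
```

For example, with a sequential fraction of 0.05, `amdahl_speedup(0.05, 64)` is roughly 15.4, and no processor count can push the speedup past `amdahl_limit(0.05)`, which is 20.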
OpenMP introduces the powerful concept of orphan directives, which greatly simplifies the task of implementing coarse-grained parallel algorithms. Orphan directives are directives encountered outside the lexical extent of the parallel region. Coarse-grained parallel algorithms typically consist of a single parallel region, with most of the execution taking place within that region. In implementing such an algorithm, it becomes desirable, and often necessary, to be able to specify control or synchronization from anywhere inside the parallel region, not just from the lexically contained portion. OpenMP provides this functionality by specifying binding rules for all directives and allowing them to be encountered dynamically in the call chain originating from the parallel region. In contrast, X3H5 does not allow directives to be orphaned, so all the control and synchronization for the program must be lexically visible in the parallel construct. This is highly restrictive for the programmer and makes any non-trivial coarse-grained parallel application virtually impossible to write.
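The idea of an orphan directive can be sketched as follows, assuming the C binding of OpenMP (`#pragma omp`); the text above does not prescribe a language. The routine names `accumulate` and `sum_in_parallel` are hypothetical. The work-sharing and critical directives sit in a routine with no lexically visible parallel region and bind at run time to the region of whichever caller is active.

```c
/* Hypothetical worker routine. The `for` and `critical` directives are
   orphaned: no parallel region is lexically visible here. They bind
   dynamically to the parallel region that encloses the call. */
void accumulate(const double *data, int n, double *total)
{
    double local = 0.0;  /* private to each thread (automatic variable) */
    int i;

    /* Iterations are divided among the threads of the enclosing team. */
    #pragma omp for
    for (i = 0; i < n; i++)
        local += data[i];

    /* One thread at a time adds its partial sum to the shared total. */
    #pragma omp critical
    *total += local;
}

/* The single parallel region; control and synchronization live in the callee. */
double sum_in_parallel(const double *data, int n)
{
    double total = 0.0;

    #pragma omp parallel
    {
        accumulate(data, n, &total);
    }
    return total;
}
```

If the code is compiled without OpenMP support, the pragmas are ignored and the same routines run sequentially with the same result, which is one practical benefit of the directive-based approach.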