OpenMP API Specification, Version 5.2 -- GIT rev 95b2e3a44

1.3  Execution Model

The OpenMP API uses the fork-join model of parallel execution. Multiple threads of execution perform tasks defined implicitly or explicitly by OpenMP directives. The OpenMP API is intended to support programs that will execute correctly both as parallel programs (multiple threads of execution and a full OpenMP support library) and as sequential programs (directives ignored and a simple OpenMP stubs library). However, a conforming OpenMP program may execute correctly as a parallel program but not as a sequential program, or may produce different results when executed as a parallel program compared to when it is executed as a sequential program. Further, using different numbers of threads may result in different numeric results because of changes in the association of numeric operations. For example, a serial addition reduction may have a different pattern of addition associations than a parallel reduction. These different associations may change the results of floating-point addition.

An OpenMP program begins as a single thread of execution, called an initial thread. An initial thread executes sequentially, as if the code encountered is part of an implicit task region, called an initial task region, that is generated by the implicit parallel region surrounding the whole program.

The thread that executes the implicit parallel region that surrounds the whole program executes on the host device. An implementation may support other devices besides the host device. If supported, these devices are available to the host device for offloading code and data. Each device has its own threads that are distinct from threads that execute on another device. Threads cannot migrate from one device to another device. Each device is identified by a device number. The device number for the host device is the value of the total number of non-host devices, while each non-host device has a unique device number that is greater than or equal to zero and less than the device number for the host device. Additionally, the constant omp_initial_device can be used as an alias for the host device and the constant omp_invalid_device can be used to specify an invalid device number. A conforming device number is either a non-negative integer that is less than or equal to omp_get_num_devices() or equal to omp_initial_device or omp_invalid_device.

When a target construct is encountered, a new target task is generated. The target task region encloses the target region. The target task is complete after the execution of the target region is complete.

When a target task executes, the enclosed target region is executed by an initial thread. The initial thread executes sequentially, as if the target region is part of an initial task region that is generated by an implicit parallel region. The initial thread may execute on the requested target device, if it is available and supported. If the target device does not exist or the implementation does not support it, all target regions associated with that device execute on the host device.

The implementation must ensure that the target region executes as if it were executed in the data environment of the target device unless an if clause is present and the if clause expression evaluates to false.

The teams construct creates a league of teams, where each team is an initial team that comprises an initial thread that executes the teams region. Each initial thread executes sequentially, as if the code encountered is part of an initial task region that is generated by an implicit parallel region associated with each team. Whether the initial threads concurrently execute the teams region is unspecified, and a program that relies on their concurrent execution for the purposes of synchronization may deadlock.

If a construct creates a data environment, the data environment is created at the time the construct is encountered. The description of a construct defines whether it creates a data environment.

When any thread encounters a parallel construct, the thread creates a team of itself and zero or more additional threads and becomes the primary thread of the new team. A set of implicit tasks, one per thread, is generated. The code for each task is defined by the code inside the parallel construct. Each task is assigned to a different thread in the team and becomes tied; that is, it is always executed by the thread to which it is initially assigned. The task region of the task being executed by the encountering thread is suspended, and each member of the new team executes its implicit task. An implicit barrier occurs at the end of the parallel region. Only the primary thread resumes execution beyond the end of the parallel construct, resuming the task region that was suspended upon encountering the parallel construct. Any number of parallel constructs can be specified in a single program.

parallel regions may be arbitrarily nested inside each other. If nested parallelism is disabled, or is not supported by the OpenMP implementation, then the new team that is created by a thread that encounters a parallel construct inside a parallel region will consist only of the encountering thread. However, if nested parallelism is supported and enabled, then the new team can consist of more than one thread. A parallel construct may include a proc_bind clause to specify the places to use for the threads in the team within the parallel region.

When any team encounters a worksharing construct, the work inside the construct is divided among the members of the team, and executed cooperatively instead of being executed by every thread. An implicit barrier occurs at the end of any region that corresponds to a worksharing construct for which the nowait clause is not specified. Redundant execution of code by every thread in the team resumes after the end of the worksharing construct.

When any thread encounters a task generating construct, one or more explicit tasks are generated. Execution of explicitly generated tasks is assigned to one of the threads in the current team, subject to the thread’s availability to execute work. Thus, execution of the new task could be immediate, or deferred until later according to task scheduling constraints and thread availability. Threads are allowed to suspend the current task region at a task scheduling point in order to execute a different task. If the suspended task region is for a tied task, the initially assigned thread later resumes execution of the suspended task region. If the suspended task region is for an untied task, then any thread may resume its execution. Completion of all explicit tasks bound to a given parallel region is guaranteed before the primary thread leaves the implicit barrier at the end of the region. Completion of a subset of all explicit tasks bound to a given parallel region may be specified through the use of task synchronization constructs. Completion of all explicit tasks bound to the implicit parallel region is guaranteed by the time the program exits.

When any thread encounters a simd construct, the iterations of the loop associated with the construct may be executed concurrently using the SIMD lanes that are available to the thread.

When a loop construct is encountered, the iterations of the loop associated with the construct are executed in the context of its encountering threads, as determined according to its binding region. If the loop region binds to a teams region, the region is encountered by the set of primary threads that execute the teams region. If the loop region binds to a parallel region, the region is encountered by the team of threads that execute the parallel region. Otherwise, the region is encountered by a single thread.

If the loop region binds to a teams region, the encountering threads may continue execution after the loop region without waiting for all iterations to complete; the iterations are guaranteed to complete before the end of the teams region. Otherwise, all iterations must complete before the encountering threads continue execution after the loop region. All threads that encounter the loop construct may participate in the execution of the iterations. Only one of these threads may execute any given iteration.

The cancel construct can alter the previously described flow of execution in an OpenMP region. The effect of the cancel construct depends on its construct-type-clause. If a task encounters a cancel construct with a taskgroup construct-type-clause, then the task activates cancellation and continues execution at the end of its task region, which implies completion of that task. Any other task in that taskgroup that has begun executing completes execution unless it encounters a cancellation point construct, in which case it continues execution at the end of its task region, which implies its completion. Other tasks in that taskgroup region that have not begun execution are aborted, which implies their completion.

For all other construct-type-clause values, if a thread encounters a cancel construct, it activates cancellation of the innermost enclosing region of the type specified and the thread continues execution at the end of that region. Threads check if cancellation has been activated for their region at cancellation points and, if so, also resume execution at the end of the canceled region.

If cancellation has been activated, regardless of construct-type-clause, threads that are waiting inside a barrier other than an implicit barrier at the end of the canceled region exit the barrier and resume execution at the end of the canceled region. This action can occur before the other threads reach that barrier.

When compile-time error termination is performed, the effect is as if an error directive for which sev-level is fatal and action-time is compilation is encountered. When runtime error termination is performed, the effect is as if an error directive for which sev-level is fatal and action-time is execution is encountered.

Synchronization constructs and library routines are available in the OpenMP API to coordinate tasks and data access in parallel regions. In addition, library routines and environment variables are available to control or to query the runtime environment of OpenMP programs.

The OpenMP specification makes no guarantee that input or output to the same file is synchronous when executed in parallel. In this case, the programmer is responsible for synchronizing input and output processing with the assistance of OpenMP synchronization constructs or library routines. For the case where each thread accesses a different file, the programmer does not need to synchronize access.

All concurrency semantics defined by the base language with respect to threads of execution apply to OpenMP threads, unless specified otherwise.