OPENMP API Specification: Version 5.1 November 2020

2.11.4  Worksharing-Loop Construct

Summary

The worksharing-loop construct specifies that the iterations of one or more associated loops will be executed in parallel by threads in the team in the context of their implicit tasks. The iterations are distributed across threads that already exist in the team that is executing the parallel region to which the worksharing-loop region binds.

Syntax

C / C++

The syntax of the worksharing-loop construct is as follows:  

 
#pragma omp for [clause[ [,] clause] ... ] new-line 
    loop-nest  

where loop-nest is a canonical loop nest and clause is one of the following:  

 
private(list) 
firstprivate(list) 
lastprivate([lastprivate-modifier:]list) 
linear(list[:linear-step]) 
reduction([reduction-modifier,]reduction-identifier:list) 
schedule([modifier [, modifier]:]kind[, chunk_size]) 
collapse(n) 
ordered[(n)] 
nowait 
allocate([allocator:]list) 
order([order-modifier:]concurrent)  
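A minimal sketch of the C/C++ form, using a hypothetical helper `sum_ints`. The `for` directive sits inside a `parallel` region, so its iterations are shared among the threads of that team, and `reduction(+:sum)` combines the per-thread partial sums. If the file is compiled without OpenMP support, the pragmas are ignored and the loop runs sequentially with the same result.

```c
/* Hypothetical helper: sums a[0..n-1] with a worksharing-loop. */
long sum_ints(const int *a, int n)
{
    long sum = 0;
    #pragma omp parallel
    {
        /* The i iterations are divided among the team's threads. Each thread
           accumulates into a private copy of sum, and the copies are combined
           into the original sum at the end of the worksharing-loop region. */
        #pragma omp for schedule(static) reduction(+:sum)
        for (int i = 0; i < n; i++)
            sum += a[i];
    }   /* end of the enclosing parallel region */
    return sum;
}
```

For example, calling `sum_ints` on the array `{1, 2, 3, 4, 5}` returns 15 regardless of how many threads execute the loop.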

Fortran

The syntax of the worksharing-loop construct is as follows:  

 
!$omp do [clause[ [,] clause] ... ] 
   loop-nest 
[!$omp end do [nowait]]  

where loop-nest is a canonical loop nest and clause is one of the following:  

 
private(list) 
firstprivate(list) 
lastprivate([lastprivate-modifier:]list) 
linear(list[:linear-step]) 
reduction([reduction-modifier,]reduction-identifier:list) 
schedule([modifier [, modifier]:]kind[, chunk_size]) 
collapse(n)
ordered[(n)]
allocate([allocator:]list) 
order([order-modifier:]concurrent)  

If an end do directive is not specified, an end do directive is assumed at the end of the do-loops.


Binding

The binding thread set for a worksharing-loop region is the current team. A worksharing-loop region binds to the innermost enclosing parallel region. Only the threads of the team executing the binding parallel region participate in the execution of the loop iterations and the implied barrier of the worksharing-loop region when that barrier is not eliminated by a nowait clause.

Description

An implicit barrier occurs at the end of a worksharing-loop region if a nowait clause is not specified.
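A sketch of where the barrier matters, assuming a hypothetical helper `scale_both` whose two loops write disjoint arrays. Because no iteration of the second loop reads data produced by the first, `nowait` can safely drop the barrier after the first loop; the second loop keeps its implicit barrier.

```c
/* Hypothetical helper: scales x[0..nx-1] and y[0..ny-1] by s. */
void scale_both(double *x, int nx, double *y, int ny, double s)
{
    #pragma omp parallel
    {
        /* nowait: a thread that finishes its share of this loop moves on
           to the next loop without waiting for the rest of the team. */
        #pragma omp for nowait
        for (int i = 0; i < nx; i++)
            x[i] *= s;

        /* No nowait here, so an implicit barrier ends this region. */
        #pragma omp for
        for (int j = 0; j < ny; j++)
            y[j] *= s;
    }
}
```

Dropping the barrier is only safe because the loops are independent; if the second loop read `x`, the `nowait` would introduce a data race.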

The collapse and ordered clauses may be used to specify the number of loops from the loop nest that are associated with the worksharing-loop construct. If specified, their parameters must be constant positive integer expressions.

The collapse clause specifies the number of loops that are collapsed into a logical iteration space that is then divided according to the schedule clause. If the collapse clause is omitted, the behavior is as if a collapse clause with a parameter value of one was specified.
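A sketch of `collapse(2)`, assuming a hypothetical row-major grid filled by `fill_grid`. The two loops form a single logical iteration space of `rows * cols` iterations, which the schedule then divides. The helper `logical_to_ij` is an illustration, assuming logical iterations are numbered in sequential execution order, of how logical iteration `k` maps back to the pair `(i, j)`.

```c
/* Hypothetical helper: g[i * cols + j] = 100 * i + j. */
void fill_grid(int *g, int rows, int cols)
{
    #pragma omp parallel
    {
        /* Both loops are associated with the construct; the schedule divides
           all rows * cols iterations, not just the rows. */
        #pragma omp for collapse(2) schedule(static)
        for (int i = 0; i < rows; i++)
            for (int j = 0; j < cols; j++)
                g[i * cols + j] = 100 * i + j;
    }
}

/* Hypothetical helper: maps logical iteration k (0-based, sequential order)
   of the collapsed nest back to the loop iteration variables i and j. */
void logical_to_ij(long k, int cols, int *i, int *j)
{
    *i = (int)(k / cols);
    *j = (int)(k % cols);
}
```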

If the ordered clause is specified with parameter n then the n outer loops from the associated loop nest form a doacross loop nest. The parameter of the ordered clause does not affect how the logical iteration space is divided.
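A sketch of a doacross loop with `ordered(1)`, assuming the OpenMP 5.1 `depend(sink: ...)` and `depend(source)` forms of the standalone `ordered` directive, and a hypothetical in-place prefix sum `prefix_sum`. Iteration `i` waits until iteration `i - 1` signals completion; the `ordered` parameter only establishes these cross-iteration dependences and does not change how iterations are divided among threads.

```c
/* Hypothetical helper: replaces a[i] with a[0] + ... + a[i]. */
void prefix_sum(long *a, int n)
{
    #pragma omp parallel
    {
        #pragma omp for ordered(1)
        for (int i = 1; i < n; i++) {
            /* Wait until iteration i - 1 has reached its depend(source). */
            #pragma omp ordered depend(sink: i - 1)
            a[i] += a[i - 1];
            /* Signal that iterations depending on i may proceed. */
            #pragma omp ordered depend(source)
        }
    }
}
```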

At the beginning of each logical iteration, the loop iteration variable or the variable declared by range-decl of each associated loop has the value that it would have if the set of the associated loops was executed sequentially. The schedule clause specifies how iterations of these associated loops are divided into contiguous non-empty subsets, called chunks, and how these chunks are distributed among threads of the team. Each thread executes its assigned chunks in the context of its implicit task. The iterations of a given chunk are executed in sequential order by the assigned thread. The chunk_size expression is evaluated using the original list items of any variables that are made private in the worksharing-loop construct. Whether, in what order, or how many times, any side effects of the evaluation of this expression occur is unspecified. The use of a variable in a schedule clause expression of a worksharing-loop construct causes an implicit reference to the variable in all enclosing constructs.

See Section 2.11.4.1 for details of how the schedule for a worksharing-loop region is determined.

The schedule kind can be one of those specified in Table 2.5.

The schedule modifier can be one of those specified in Table 2.6. If the static schedule kind is specified or if the ordered clause is specified, and if the nonmonotonic modifier is not specified, the effect is as if the monotonic modifier is specified. Otherwise, unless the monotonic modifier is specified, the effect is as if the nonmonotonic modifier is specified. If a schedule clause specifies a modifier then that modifier overrides any modifier that is specified in the run-sched-var ICV.
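The defaulting rule for the ordering modifier can be written as a small decision function. This is a sketch with hypothetical enum and function names; it models only a modifier given on the schedule clause, not one inherited from the run-sched-var ICV.

```c
#include <stdbool.h>

/* Hypothetical encodings of schedule kinds and ordering modifiers. */
enum kind_t { K_STATIC, K_DYNAMIC, K_GUIDED, K_AUTO, K_RUNTIME };
enum mod_t  { M_NONE, M_MONOTONIC, M_NONMONOTONIC };

/* Returns the ordering modifier that takes effect for a schedule clause. */
enum mod_t effective_modifier(enum kind_t kind, bool has_ordered, enum mod_t given)
{
    if (given != M_NONE)
        return given;          /* an explicitly specified modifier is used as-is */
    if (kind == K_STATIC || has_ordered)
        return M_MONOTONIC;    /* static kind or ordered clause: monotonic */
    return M_NONMONOTONIC;     /* all other cases: nonmonotonic */
}
```

So `schedule(dynamic, 4)` behaves as `schedule(nonmonotonic: dynamic, 4)`, while `schedule(static)` behaves as `schedule(monotonic: static)`.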

If an order clause is present then the semantics are as described in Section 2.11.3.

The schedule is reproducible if one of the following conditions is true:

Programs can only depend on which thread executes a particular iteration if the schedule is reproducible. Schedule reproducibility is also used for determining its consistency with other schedules (see Section 2.11.2).


Table 2.5: schedule Clause kind Values


static: When kind is static, iterations are divided into chunks of size chunk_size, and the chunks are assigned to the threads in the team in a round-robin fashion in the order of the thread number. Each chunk contains chunk_size iterations, except for the chunk that contains the sequentially last iteration, which may have fewer iterations.
  When no chunk_size is specified, the iteration space is divided into chunks that are approximately equal in size, and at most one chunk is distributed to each thread. The size of the chunks is unspecified in this case.
dynamic: When kind is dynamic, the iterations are distributed to threads in the team in chunks. Each thread executes a chunk of iterations, then requests another chunk, until no chunks remain to be distributed.
  Each chunk contains chunk_size iterations, except for the chunk that contains the sequentially last iteration, which may have fewer iterations.
  When no chunk_size is specified, it defaults to 1.
guided: When kind is guided, the iterations are assigned to threads in the team in chunks. Each thread executes a chunk of iterations, then requests another chunk, until no chunks remain to be assigned.
  For a chunk_size of 1, the size of each chunk is proportional to the number of unassigned iterations divided by the number of threads in the team, decreasing to 1. For a chunk_size with value k (greater than 1), the size of each chunk is determined in the same way, with the restriction that the chunks do not contain fewer than k iterations (except for the chunk that contains the sequentially last iteration, which may have fewer than k iterations).
  When no chunk_size is specified, it defaults to 1.
auto: When kind is auto, the decision regarding scheduling is delegated to the compiler and/or runtime system. The programmer gives the implementation the freedom to choose any possible mapping of iterations to threads in the team.
runtime: When kind is runtime, the decision regarding scheduling is deferred until run time, and the schedule and chunk size are taken from the run-sched-var ICV. If the ICV is set to auto, the schedule is implementation defined.
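For `schedule(static, chunk_size)` the round-robin assignment is fully determined, so the owning thread of an iteration can be computed directly. The sketch below uses a hypothetical helper `static_owner` with 0-based logical iteration numbers and thread numbers.

```c
/* Hypothetical helper: thread number that executes logical iteration i
   under schedule(static, chunk) in a team of p threads. Chunk c covers
   iterations [c * chunk, (c + 1) * chunk) and goes to thread c % p. */
int static_owner(long i, long chunk, int p)
{
    long c = i / chunk;      /* index of the chunk containing iteration i */
    return (int)(c % p);     /* chunks are dealt round-robin by thread number */
}
```

With `chunk = 2` and three threads, iterations 0-1 go to thread 0, 2-3 to thread 1, 4-5 to thread 2, and 6-7 back to thread 0. No such formula exists for `dynamic` or `guided`, whose assignment depends on which thread requests the next chunk first.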



Note – For a team of p threads and a loop of n iterations, let q = ⌈n/p⌉, so that q is the integer that satisfies n = p * q - r, with 0 <= r < p. One compliant implementation of the static schedule (with no specified chunk_size) would behave as though chunk_size had been specified with value q. Another compliant implementation would assign q iterations to the first p - r threads, and q - 1 iterations to the remaining r threads. This illustrates why a conforming program must not rely on the details of a particular implementation.
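The second static implementation from the Note can be sketched as follows, with a hypothetical helper `static_block` that returns the first iteration and iteration count for thread `t`. For n = 10 and p = 4, q = 3 and r = 2, so threads 0 and 1 get 3 iterations and threads 2 and 3 get 2, whereas the chunk_size = q implementation would give 3, 3, 3 and 1.

```c
/* Hypothetical helper: block distribution where the first p - r threads
   get q = ceil(n / p) iterations and the remaining r threads get q - 1. */
void static_block(long n, long p, long t, long *first, long *count)
{
    long q = (n + p - 1) / p;    /* q = ceil(n / p) */
    long r = p * q - n;          /* 0 <= r < p, so that n = p * q - r */

    if (t < p - r) {
        *count = q;
        *first = t * q;
    } else {
        *count = q - 1;
        /* skip the p - r larger blocks, then the smaller ones before t */
        *first = (p - r) * q + (t - (p - r)) * (q - 1);
    }
}
```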

A compliant implementation of the guided schedule with a chunk_size value of k would assign q = ⌈n/p⌉ iterations to the first available thread and set n to the larger of n - q and p * k. It would then repeat this process until q is greater than or equal to the number of remaining iterations, at which time the remaining iterations form the final chunk. Another compliant implementation could use the same method, except with q = ⌈n/(2p)⌉, and set n to the larger of n - q and 2 * p * k.
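The guided method from the Note can be simulated sequentially to list the chunk sizes it produces. This is a sketch with a hypothetical helper `guided_chunks`; the divisor `d` is p for the first implementation and 2p for the second, and the real assignment of chunks to threads would depend on which thread becomes available first.

```c
/* Ceiling of a / b for positive a and b. */
static long ceil_div(long a, long b)
{
    return (a + b - 1) / b;
}

/* Hypothetical helper: writes the chunk sizes (at most max_out of them) to
   out[] and returns how many chunks there are. d is p or 2 * p. */
int guided_chunks(long n_iters, long d, long k, long *out, int max_out)
{
    long remaining = n_iters;   /* iterations not yet assigned */
    long n = n_iters;           /* the "n" of the Note; never drops below d * k */
    int count = 0;

    while (remaining > 0 && count < max_out) {
        long q = ceil_div(n, d);
        if (q >= remaining) {
            out[count++] = remaining;   /* the remaining iterations form the final chunk */
            break;
        }
        out[count++] = q;               /* next available thread gets q iterations */
        remaining -= q;
        n = (n - q > d * k) ? n - q : d * k;
    }
    return count;
}
```

For 100 iterations, p = 4 and k = 1, this produces the chunks 25, 19, 14, 11, 8, 6, 5, 3, 3, 2, 1, 1, 1, 1; with k = 4 the sizes stop shrinking at 4 (25, 19, 14, 11, 8, 6, 5, 4, 4, 4).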



Table 2.6: schedule Clause modifier Values


monotonic: When the monotonic modifier is specified then each thread executes the chunks that it is assigned in increasing logical iteration order.
nonmonotonic: When the nonmonotonic modifier is specified then chunks are assigned to threads in any order and the behavior of an application that depends on any execution order of the chunks is unspecified.
simd: When the simd modifier is specified and the loop is associated with a SIMD construct, the chunk_size for all chunks except the first and last chunks is new_chunk_size = ⌈chunk_size/simd_width⌉ * simd_width, where simd_width is an implementation-defined value. The first chunk will have at least new_chunk_size iterations except if it is also the last chunk. The last chunk may have fewer iterations than new_chunk_size. If the simd modifier is specified and the loop is not associated with a SIMD construct, the modifier is ignored.
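A sketch of the simd modifier, assuming a hypothetical `saxpy` loop and a hypothetical helper `simd_chunk_size`. Because simd_width is implementation defined, the helper takes it as a parameter: with chunk_size 10 and a simd_width of 4, the interior chunks would hold 12 iterations.

```c
/* Hypothetical helper: new_chunk_size = ceil(chunk_size / simd_width) * simd_width. */
long simd_chunk_size(long chunk_size, long simd_width)
{
    return ((chunk_size + simd_width - 1) / simd_width) * simd_width;
}

/* Hypothetical loop: the worksharing-loop is combined with a SIMD construct,
   so the simd modifier rounds chunk sizes up to whole SIMD widths. */
void saxpy(float *y, const float *x, float a, int n)
{
    #pragma omp parallel for simd schedule(simd: static, 10)
    for (int i = 0; i < n; i++)
        y[i] += a * x[i];
}
```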


Execution Model Events

The ws-loop-begin event occurs after an implicit task encounters a worksharing-loop construct but before the task starts execution of the structured block of the worksharing-loop region.

The ws-loop-end event occurs after a worksharing-loop region finishes execution but before resuming execution of the encountering task.

The ws-loop-iteration-begin event occurs once for each iteration of a worksharing-loop before the iteration is executed by an implicit task.

Tool Callbacks

A thread dispatches a registered ompt_callback_work callback with ompt_scope_begin as its endpoint argument and work_loop as its wstype argument for each occurrence of a ws-loop-begin event in that thread. Similarly, a thread dispatches a registered ompt_callback_work callback with ompt_scope_end as its endpoint argument and work_loop as its wstype argument for each occurrence of a ws-loop-end event in that thread. The callbacks occur in the context of the implicit task. The callbacks have type signature ompt_callback_work_t.

A thread dispatches a registered ompt_callback_dispatch callback for each occurrence of a ws-loop-iteration-begin event in that thread. The callback occurs in the context of the implicit task. The callback has type signature ompt_callback_dispatch_t.

Restrictions

Restrictions to the worksharing-loop construct are as follows:


Cross References


Figure 2.1: Determining the schedule for a Worksharing-Loop

2.11.4.1 Determining the Schedule of a Worksharing-Loop

When execution encounters a worksharing-loop directive, the schedule clause (if any) on the directive, and the run-sched-var and def-sched-var ICVs are used to determine how loop iterations are assigned to threads. See Section 2.4 for details of how the values of the ICVs are determined. If the worksharing-loop directive does not have a schedule clause then the current value of the def-sched-var ICV determines the schedule. If the worksharing-loop directive has a schedule clause that specifies the runtime schedule kind then the current value of the run-sched-var ICV determines the schedule. Otherwise, the value of the schedule clause determines the schedule. Figure 2.1 describes how the schedule for a worksharing-loop is determined.
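The selection rule above can be sketched as a function over three inputs: the schedule clause (if any) and the current values of the two ICVs. The type and function names here are hypothetical, and the sketch ignores how the ICVs themselves are initialized.

```c
#include <stddef.h>

/* Hypothetical encoding of a schedule kind plus chunk size. */
enum sched_kind { KIND_STATIC, KIND_DYNAMIC, KIND_GUIDED, KIND_AUTO, KIND_RUNTIME };

struct schedule {
    enum sched_kind kind;
    long chunk_size;          /* 0 means "not specified" in this sketch */
};

/* clause is NULL when the directive has no schedule clause. */
struct schedule select_schedule(const struct schedule *clause,
                                struct schedule run_sched_var,
                                struct schedule def_sched_var)
{
    if (clause == NULL)
        return def_sched_var;        /* no clause: def-sched-var decides */
    if (clause->kind == KIND_RUNTIME)
        return run_sched_var;        /* schedule(runtime): run-sched-var decides */
    return *clause;                  /* otherwise the clause itself decides */
}
```

In practice run-sched-var is commonly set through the `OMP_SCHEDULE` environment variable (for example `OMP_SCHEDULE="dynamic,4"`), which only affects loops that specify `schedule(runtime)`.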

Cross References