OpenMP API Specification: Version 5.0 November 2018

2.9.2  Worksharing-Loop Construct

Summary

The worksharing-loop construct specifies that the iterations of one or more associated loops will be executed in parallel by threads in the team in the context of their implicit tasks. The iterations are distributed across threads that already exist in the team that is executing the parallel region to which the worksharing-loop region binds.

Syntax


The syntax of the worksharing-loop construct in C/C++ is as follows:

 
#pragma omp for [clause[ [,] clause] ... ] new-line 
    for-loops  

where clause is one of the following:  

 
private(list) 
firstprivate(list) 
lastprivate([ lastprivate-modifier:] list) 
linear(list[ : linear-step]) 
reduction([ reduction-modifier,]reduction-identifier : list) 
schedule([modifier [, modifier]:]kind[, chunk_size]) 
collapse(n) 
ordered[(n)] 
nowait 
allocate([allocator :]list) 
order(concurrent)  

The for directive places restrictions on the structure of all associated for-loops. Specifically, all associated for-loops must have canonical loop form (see Section 2.9.1 on page 271).
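As an illustration of the C form, here is a minimal sketch. The function name `scale` and its parameters are hypothetical and do not come from the specification. It assumes a compiler with OpenMP support enabled (for example `-fopenmp` with GCC or Clang); if OpenMP is not enabled, the pragmas are ignored and the loop runs serially with the same result.

```c
/* Hypothetical helper: multiply each element of a[0..n-1] by factor. */
void scale(double *a, int n, double factor)
{
    #pragma omp parallel
    {
        /* The loop is in canonical form: integer loop variable,
           loop-invariant bound, constant increment. */
        #pragma omp for
        for (int i = 0; i < n; i++) {
            a[i] *= factor;
        }
        /* No nowait clause was given, so the threads of the team
           meet at an implicit barrier here. */
    }
}
```

The worksharing-loop directive is placed inside an enclosing parallel region, so the iterations are divided among the threads that already exist in that team.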


The syntax of the worksharing-loop construct in Fortran is as follows:

 
!$omp do [clause[ [,] clause] ... ] 
   do-loops 
[!$omp end do [nowait]]  

where clause is one of the following:  

 
private(list) 
firstprivate(list) 
lastprivate([ lastprivate-modifier:] list) 
linear(list[ : linear-step]) 
reduction([ reduction-modifier,]reduction-identifier : list) 
schedule([modifier [, modifier]:]kind[, chunk_size]) 
collapse(n) 
ordered[(n)] 
allocate([allocator :]list) 
order(concurrent)  

If an end do directive is not specified, an end do directive is assumed at the end of the do-loops.

The do directive places restrictions on the structure of all associated do-loops. Specifically, all associated do-loops must have canonical loop form (see Section 2.9.1 on page 271).


Binding

The binding thread set for a worksharing-loop region is the current team. A worksharing-loop region binds to the innermost enclosing parallel region. Only the threads of the team executing the binding parallel region participate in the execution of the loop iterations and the implied barrier of the worksharing-loop region if the barrier is not eliminated by a nowait clause.

Description

The worksharing-loop construct is associated with a loop nest that consists of one or more loops that follow the directive. There is an implicit barrier at the end of a worksharing-loop construct unless a nowait clause is specified.

The collapse clause may be used to specify how many loops are associated with the worksharing-loop construct. The parameter of the collapse clause must be a constant positive integer expression. If a collapse clause is specified with a parameter value greater than 1, then the iterations of the associated loops to which the clause applies are collapsed into one larger iteration space that is then divided according to the schedule clause. The sequential execution of the iterations in these associated loops determines the order of the iterations in the collapsed iteration space. If no collapse clause is present or its parameter is 1, the only loop that is associated with the worksharing-loop construct for the purposes of determining how the iteration space is divided according to the schedule clause is the one that immediately follows the worksharing-loop directive.
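The effect of collapse can be sketched with a two-level loop nest. The function `fill_index` and its parameters are hypothetical. Each element is set to its logical iteration number in the collapsed space, which follows the sequential execution order of the nest (row-major here).

```c
/* Hypothetical helper: fill a row-major rows x cols matrix. */
void fill_index(int *m, int rows, int cols)
{
    #pragma omp parallel
    {
        /* collapse(2): both loops are associated with the construct, so
           the rows * cols iterations form one iteration space that is
           then divided according to the schedule. */
        #pragma omp for collapse(2)
        for (int i = 0; i < rows; i++) {
            for (int j = 0; j < cols; j++) {
                /* Position of (i, j) in sequential execution order. */
                m[i * cols + j] = i * cols + j;
            }
        }
    }
}
```

Without the collapse clause, only the outer `i` loop would be divided among threads, which can leave threads idle when `rows` is smaller than the team size.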

If more than one loop is associated with the worksharing-loop construct then the number of times that any intervening code between any two associated loops will be executed is unspecified but will be at least once per iteration of the loop enclosing the intervening code and at most once per iteration of the innermost loop associated with the construct. If the iteration count of any loop that is associated with the worksharing-loop construct is zero and that loop does not enclose the intervening code, the behavior is unspecified.

The integer type (or kind, for Fortran) used to compute the iteration count for the collapsed loop is implementation defined.

A worksharing-loop has logical iterations numbered 0,1,...,N-1 where N is the number of loop iterations, and the logical numbering denotes the sequence in which the iterations would be executed if a set of associated loop(s) were executed sequentially. At the beginning of each logical iteration, the loop iteration variable of each associated loop has the value that it would have if the set of the associated loop(s) were executed sequentially. The schedule clause specifies how iterations of these associated loops are divided into contiguous non-empty subsets, called chunks, and how these chunks are distributed among threads of the team. Each thread executes its assigned chunk(s) in the context of its implicit task. The iterations of a given chunk are executed in sequential order by the assigned thread. The chunk_size expression is evaluated using the original list items of any variables that are made private in the worksharing-loop construct. It is unspecified whether, in what order, or how many times, any side effects of the evaluation of this expression occur. The use of a variable in a schedule clause expression of a worksharing-loop construct causes an implicit reference to the variable in all enclosing constructs.

Different worksharing-loop regions with the same schedule and iteration count, even if they occur in the same parallel region, can distribute iterations among threads differently. The only exception is for the static schedule as specified in Table 2.5. Programs that depend on which thread executes a particular iteration under any other circumstances are non-conforming.
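The static exception can be illustrated with a sketch that relies on the conditions listed under static in Table 2.5. The function `two_phase` and its arrays are hypothetical. Both loops have the same iteration count, no chunk_size, the same binding parallel region, and no SIMD construct, so logical iteration i is assigned to the same thread in both loops and the barrier after the first loop can be dropped.

```c
/* Hypothetical example: b[i] depends on a[i] at the same logical iteration. */
void two_phase(const double *in, double *a, double *b, int n)
{
    #pragma omp parallel
    {
        /* nowait removes the implicit barrier after this loop. */
        #pragma omp for schedule(static) nowait
        for (int i = 0; i < n; i++) {
            a[i] = in[i] + 1.0;
        }

        /* Same static schedule, same iteration count, same parallel
           region: the thread that wrote a[i] is the one that reads it. */
        #pragma omp for schedule(static)
        for (int i = 0; i < n; i++) {
            b[i] = 2.0 * a[i];
        }
    }
}
```

With any other schedule kind, the same pattern would be non-conforming, because the program would depend on which thread executes a particular iteration.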

See Section 2.9.2.1 on page 315 for details of how the schedule for a worksharing-loop region is determined.

The schedule kind can be one of those specified in Table 2.5.

The schedule modifier can be one of those specified in Table 2.6. If the static schedule kind is specified or if the ordered clause is specified, and if the nonmonotonic modifier is not specified, the effect is as if the monotonic modifier is specified. Otherwise, unless the monotonic modifier is specified, the effect is as if the nonmonotonic modifier is specified. If a schedule clause specifies a modifier then that modifier overrides any modifier that is specified in the run-sched-var ICV.
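The defaulting rule for the ordering modifier can be written as a small decision function. This is only a sketch: the enum names and `effective_modifier` are hypothetical, and the sketch deliberately ignores any modifier that might come from the run-sched-var ICV.

```c
#include <stdbool.h>

/* Hypothetical types for illustration only. */
typedef enum { MOD_NONE, MOD_MONOTONIC, MOD_NONMONOTONIC } sched_mod;
typedef enum { KIND_STATIC, KIND_DYNAMIC, KIND_GUIDED, KIND_AUTO, KIND_RUNTIME } sched_kind;

/* Returns the modifier that takes effect for a schedule clause. */
sched_mod effective_modifier(sched_kind kind, bool has_ordered, sched_mod specified)
{
    if (specified != MOD_NONE) {
        return specified;            /* an explicit modifier is used as given */
    }
    if (kind == KIND_STATIC || has_ordered) {
        return MOD_MONOTONIC;        /* static kind or ordered clause */
    }
    return MOD_NONMONOTONIC;         /* every other case */
}
```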

The ordered clause with the parameter may also be used to specify how many loops are associated with the worksharing-loop construct. The parameter of the ordered clause must be a constant positive integer expression if specified. The parameter of the ordered clause does not affect how the logical iteration space is then divided. If an ordered clause with the parameter is specified for the worksharing-loop construct, then those associated loops form a doacross loop nest.
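A doacross loop nest might look like the following sketch. The function `prefix_sum` is hypothetical. It assumes the `ordered` construct with `depend(sink: ...)` and `depend(source)` clauses, which OpenMP 5.0 defines elsewhere in the specification for cross-iteration synchronization. If OpenMP is not enabled, the pragmas are ignored and the loop runs serially.

```c
/* Hypothetical example: in-place running sum, a[i] += a[i-1]. */
void prefix_sum(double *a, int n)
{
    #pragma omp parallel
    {
        /* ordered(1): one loop is associated and forms a doacross nest. */
        #pragma omp for ordered(1)
        for (int i = 1; i < n; i++) {
            /* Wait until iteration i - 1 has signalled completion. */
            #pragma omp ordered depend(sink: i - 1)
            a[i] += a[i - 1];
            /* Signal that this iteration's result is available. */
            #pragma omp ordered depend(source)
        }
    }
}
```

This particular loop is fully serialized by its dependence. A doacross nest pays off when each iteration also contains work that does not depend on the previous iteration.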

If the value of the parameter in the collapse or ordered clause is larger than the number of nested loops following the construct, the behavior is unspecified.

If an order(concurrent) clause is present, then after assigning the iterations of the associated loops to their respective threads, as specified in Table 2.5, the iterations may be executed in any order, including concurrently.


Table 2.5: schedule Clause kind Values



static
When kind is static, iterations are divided into chunks of size chunk_size, and the chunks are assigned to the threads in the team in a round-robin fashion in the order of the thread number. Each chunk contains chunk_size iterations, except for the chunk that contains the sequentially last iteration, which may have fewer iterations.
When no chunk_size is specified, the iteration space is divided into chunks that are approximately equal in size, and at most one chunk is distributed to each thread. The size of the chunks is unspecified in this case.
A compliant implementation of the static schedule must ensure that the same assignment of logical iteration numbers to threads will be used in two worksharing-loop regions if the following conditions are satisfied: 1) both worksharing-loop regions have the same number of loop iterations, 2) both worksharing-loop regions have the same value of chunk_size specified, or both worksharing-loop regions have no chunk_size specified, 3) both worksharing-loop regions bind to the same parallel region, and 4) neither loop is associated with a SIMD construct. A data dependence between the same logical iterations in two such loops is guaranteed to be satisfied allowing safe use of the nowait clause.

dynamic
When kind is dynamic, the iterations are distributed to threads in the team in chunks. Each thread executes a chunk of iterations, then requests another chunk, until no chunks remain to be distributed.
Each chunk contains chunk_size iterations, except for the chunk that contains the sequentially last iteration, which may have fewer iterations.
When no chunk_size is specified, it defaults to 1.

guided
When kind is guided, the iterations are assigned to threads in the team in chunks. Each thread executes a chunk of iterations, then requests another chunk, until no chunks remain to be assigned.
For a chunk_size of 1, the size of each chunk is proportional to the number of unassigned iterations divided by the number of threads in the team, decreasing to 1. For a chunk_size with value k (greater than 1), the size of each chunk is determined in the same way, with the restriction that the chunks do not contain fewer than k iterations (except for the chunk that contains the sequentially last iteration, which may have fewer than k iterations).
When no chunk_size is specified, it defaults to 1.

auto
When kind is auto, the decision regarding scheduling is delegated to the compiler and/or runtime system. The programmer gives the implementation the freedom to choose any possible mapping of iterations to threads in the team.

runtime
When kind is runtime, the decision regarding scheduling is deferred until run time, and the schedule and chunk size are taken from the run-sched-var ICV. If the ICV is set to auto, the schedule is implementation defined.
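For the static kind with an explicit chunk_size, the round-robin assignment described in Table 2.5 can be expressed as a small function. This is a sketch for reasoning about the mapping, not a description of any runtime's internals. The name `static_owner` is hypothetical, and it assumes chunk_size >= 1 and nthreads >= 1.

```c
/* Thread number that executes logical iteration i under
   schedule(static, chunk_size) in a team of nthreads threads. */
int static_owner(long i, long chunk_size, int nthreads)
{
    long chunk_index = i / chunk_size;      /* chunks are numbered in iteration order */
    return (int)(chunk_index % nthreads);   /* round-robin in thread-number order */
}
```

For example, with chunk_size 2 and three threads, iterations 0 and 1 go to thread 0, iterations 2 and 3 to thread 1, iterations 4 and 5 to thread 2, and iteration 6 wraps around to thread 0.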



Note – For a team of p threads and a loop of n iterations, let ⌈n/p⌉ be the integer q that satisfies n = p * q - r, with 0 <= r < p. One compliant implementation of the static schedule (with no specified chunk_size) would behave as though chunk_size had been specified with value q. Another compliant implementation would assign q iterations to the first p - r threads, and q - 1 iterations to the remaining r threads. This illustrates why a conforming program must not rely on the details of a particular implementation.
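The two compliant static distributions in this note can be compared with a short sketch. The function names are hypothetical. Each returns how many iterations thread t receives, assuming n >= 0, p >= 1, and 0 <= t < p.

```c
/* q = ceil(n / p) for non-negative n and positive p. */
long ceil_div(long n, long p) { return (n + p - 1) / p; }

/* Variant 1: behave as though chunk_size had been specified as q. */
long count_as_chunk_q(long n, int p, int t)
{
    long q = ceil_div(n, p);
    long start = (long)t * q;              /* first iteration of thread t's chunk */
    if (start >= n) return 0;              /* no iterations left for this thread */
    return (n - start < q) ? n - start : q;
}

/* Variant 2: q iterations to the first p - r threads, q - 1 to the rest. */
long count_balanced(long n, int p, int t)
{
    long q = ceil_div(n, p);
    long r = (long)p * q - n;              /* from n = p * q - r, 0 <= r < p */
    return (t < p - r) ? q : q - 1;
}
```

For n = 10 and p = 4, q is 3 and r is 2: the first variant yields 3, 3, 3, 1 and the second yields 3, 3, 2, 2. Both are compliant, so the last thread may execute different iterations depending on the implementation.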

A compliant implementation of the guided schedule with a chunk_size value of k would assign q = ⌈n/p⌉ iterations to the first available thread and set n to the larger of n - q and p * k. It would then repeat this process until q is greater than or equal to the number of remaining iterations, at which time the remaining iterations form the final chunk. Another compliant implementation could use the same method, except with q = ⌈n/(2p)⌉, and set n to the larger of n - q and 2 * p * k.
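The first guided method in this note can be simulated to see the resulting chunk sizes. This is a sketch of that one described method only. The function `guided_chunks` and its output buffer are hypothetical, and it assumes p >= 1 and k >= 1.

```c
static long ceil_div(long a, long b) { return (a + b - 1) / b; }

/* Writes successive chunk sizes into sizes[] (up to capacity entries)
   and returns how many chunks were produced. */
int guided_chunks(long n, int p, long k, long *sizes, int capacity)
{
    long remaining = n;                    /* iterations not yet assigned */
    int count = 0;
    while (remaining > 0 && count < capacity) {
        long q = ceil_div(n, p);
        if (q >= remaining) {              /* remaining iterations form the final chunk */
            sizes[count++] = remaining;
            break;
        }
        sizes[count++] = q;
        remaining -= q;
        long lower = (long)p * k;          /* n never drops below p * k */
        n = (n - q > lower) ? n - q : lower;
    }
    return count;
}
```

With n = 10, p = 2, and k = 1 this produces chunks of 5, 3, 1, 1. With k = 3 it produces 5, 3, 2, where only the final chunk is smaller than k.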



Table 2.6: schedule Clause modifier Values


monotonic
When the monotonic modifier is specified then each thread executes the chunks that it is assigned in increasing logical iteration order.

nonmonotonic
When the nonmonotonic modifier is specified then chunks are assigned to threads in any order and the behavior of an application that depends on any execution order of the chunks is unspecified.

simd
When the simd modifier is specified and the loop is associated with a SIMD construct, the chunk_size for all chunks except the first and last chunks is new_chunk_size = ⌈chunk_size/simd_width⌉ * simd_width where simd_width is an implementation-defined value. The first chunk will have at least new_chunk_size iterations except if it is also the last chunk. The last chunk may have fewer iterations than new_chunk_size. If the simd modifier is specified and the loop is not associated with a SIMD construct, the modifier is ignored.
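The rounding applied by the simd modifier can be sketched as follows. The function name is hypothetical, and simd_width is implementation defined, so the value passed in here is purely an assumption for illustration.

```c
/* new_chunk_size = ceil(chunk_size / simd_width) * simd_width,
   assuming chunk_size >= 1 and simd_width >= 1. */
long simd_new_chunk_size(long chunk_size, long simd_width)
{
    long groups = (chunk_size + simd_width - 1) / simd_width;  /* ceiling division */
    return groups * simd_width;
}
```

For an assumed simd_width of 8, a chunk_size of 10 is rounded up to 16, while a chunk_size of 16 is left unchanged.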


Execution Model Events

The ws-loop-begin event occurs after an implicit task encounters a worksharing-loop construct but before the task starts execution of the structured block of the worksharing-loop region. The ws-loop-end event occurs after a worksharing-loop region finishes execution but before resuming execution of the encountering task.

The ws-loop-iteration-begin event occurs once for each iteration of a worksharing-loop before the iteration is executed by an implicit task.

Tool Callbacks

A thread dispatches a registered ompt_callback_work callback with ompt_scope_begin as its endpoint argument and work_loop as its wstype argument for each occurrence of a ws-loop-begin event in that thread. Similarly, a thread dispatches a registered ompt_callback_work callback with ompt_scope_end as its endpoint argument and work_loop as its wstype argument for each occurrence of a ws-loop-end event in that thread. The callbacks occur in the context of the implicit task. The callbacks have type signature ompt_callback_work_t.

A thread dispatches a registered ompt_callback_dispatch callback for each occurrence of a ws-loop-iteration-begin event in that thread. The callback occurs in the context of the implicit task. The callback has type signature ompt_callback_dispatch_t.

Restrictions

Restrictions to the worksharing-loop construct are as follows:


Cross References




Figure 2.1: Determining the schedule for a Worksharing-Loop

2.9.2.1 Determining the Schedule of a Worksharing-Loop

When execution encounters a worksharing-loop directive, the schedule clause (if any) on the directive, and the run-sched-var and def-sched-var ICVs are used to determine how loop iterations are assigned to threads. See Section 2.5 on page 171 for details of how the values of the ICVs are determined. If the worksharing-loop directive does not have a schedule clause then the current value of the def-sched-var ICV determines the schedule. If the worksharing-loop directive has a schedule clause that specifies the runtime schedule kind then the current value of the run-sched-var ICV determines the schedule. Otherwise, the value of the schedule clause determines the schedule. Figure 2.1 describes how the schedule for a worksharing-loop is determined.
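The three-way decision described in this paragraph can be sketched as a function. The type and function names are hypothetical. A NULL clause pointer stands for a directive with no schedule clause, and a chunk_size of 0 is used here, as an assumption, to mean that none was specified.

```c
#include <stddef.h>

/* Hypothetical types for illustration only. */
typedef enum { SCHED_STATIC, SCHED_DYNAMIC, SCHED_GUIDED, SCHED_AUTO, SCHED_RUNTIME } sched_kind;
typedef struct { sched_kind kind; long chunk_size; } schedule;

/* Returns the schedule that applies to a worksharing-loop directive. */
schedule resolve_schedule(const schedule *clause,
                          schedule run_sched_var,
                          schedule def_sched_var)
{
    if (clause == NULL) {
        return def_sched_var;      /* no schedule clause on the directive */
    }
    if (clause->kind == SCHED_RUNTIME) {
        return run_sched_var;      /* schedule(runtime) defers to the ICV */
    }
    return *clause;                /* the clause itself determines the schedule */
}
```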

Cross References