Summary
The worksharing-loop construct specifies that the iterations of one or more associated loops will
be executed in parallel by threads in the team in the context of their implicit tasks. The iterations are
distributed across threads that already exist in the team that is executing the parallel region to which the
worksharing-loop region binds.
Syntax
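C / C++
The syntax of the worksharing-loop construct is as follows:
    #pragma omp for [clause[ [,] clause] ... ] new-line
        for-loops
where clause is one of the following:
    private(list)
    firstprivate(list)
    lastprivate([lastprivate-modifier:] list)
    linear(list[ : linear-step])
    reduction([reduction-modifier,]reduction-identifier : list)
    schedule([modifier [, modifier]:]kind[, chunk_size])
    collapse(n)
    ordered[(n)]
    nowait
    allocate([allocator :]list)
    order(concurrent)
The for directive places restrictions on the structure of all associated for-loops. Specifically, all associated
for-loops must have canonical loop form (see Section 2.9.1 on page 271).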
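Fortran
The syntax of the worksharing-loop construct is as follows:
    !$omp do [clause[ [,] clause] ... ]
        do-loops
    [!$omp end do [nowait]]
where clause is one of the following:
    private(list)
    firstprivate(list)
    lastprivate([lastprivate-modifier:] list)
    linear(list[ : linear-step])
    reduction([reduction-modifier,]reduction-identifier : list)
    schedule([modifier [, modifier]:]kind[, chunk_size])
    collapse(n)
    ordered[(n)]
    allocate([allocator :]list)
    order(concurrent)
If an end do directive is not specified, an end do directive is assumed at the end of the do-loops.
The do directive places restrictions on the structure of all associated do-loops. Specifically, all associated
do-loops must have canonical loop form (see Section 2.9.1 on page 271).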
Binding
The binding thread set for a worksharing-loop region is the current team. A worksharing-loop
region binds to the innermost enclosing parallel region. Only the threads of the team executing the
binding parallel region participate in the execution of the loop iterations and the implied barrier of the
worksharing-loop region if the barrier is not eliminated by a nowait clause.
Description
The worksharing-loop construct is associated with a loop nest that consists of one or more loops
that follow the directive.
There is an implicit barrier at the end of a worksharing-loop construct unless a nowait clause is
specified.
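The following C sketch shows two worksharing loops inside one parallel region. The function name `sum_of_squares` is hypothetical. Each `#pragma omp for` divides its iterations among the threads of the current team. The implicit barrier after the first loop is what makes it safe for the second loop to read `a[i]` values that other threads may have written. If the file is compiled without OpenMP support, the pragmas are ignored and the loops run sequentially with the same result.

```c
#include <assert.h>
#include <stdlib.h>

/* Sum of i*i for i in [0, n), computed by two worksharing loops that bind
 * to the same parallel region (hypothetical example function). */
long sum_of_squares(int n)
{
    assert(n > 0);
    long *a = malloc((size_t)n * sizeof *a);
    assert(a != NULL);
    long sum = 0;

    #pragma omp parallel
    {
        /* The iterations are divided among the threads of the team. */
        #pragma omp for
        for (int i = 0; i < n; i++)
            a[i] = (long)i * i;
        /* Implicit barrier: every a[i] is written before any thread
         * starts the next loop, whatever schedule each loop uses. */

        #pragma omp for reduction(+ : sum)
        for (int i = 0; i < n; i++)
            sum += a[i];
    }

    free(a);
    return sum;
}
```

For example, `sum_of_squares(10)` returns 285. Adding `nowait` to the first loop would remove the barrier, and the second loop could then read elements that have not been written yet.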
The collapse clause may be used to specify how many loops are associated with the worksharing-loop
construct. The parameter of the collapse clause must be a constant positive integer expression. If a
collapse clause is specified with a parameter value greater than 1, then the iterations of the
associated loops to which the clause applies are collapsed into one larger iteration space that is then
divided according to the schedule clause. The sequential execution of the iterations in these
associated loops determines the order of the iterations in the collapsed iteration space. If no
collapse clause is present or its parameter is 1, the only loop that is associated with the
worksharing-loop construct for the purposes of determining how the iteration space is divided
according to the schedule clause is the one that immediately follows the worksharing-loop
directive.
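With `collapse(2)`, a pair of loops behaves like a single loop over `ni * nj` logical iterations, numbered in sequential execution order. The sketch below makes that correspondence concrete under stated assumptions:

- The nest is rectangular, with zero-based, unit-stride loops.
- `decode_collapse2` and `fill_collapsed` are hypothetical names.

Which thread runs which logical iteration is still decided by the schedule clause, not by this code.

```c
#include <assert.h>

/* Hypothetical helper: maps logical iteration k of the collapsed nest
 *     for (i = 0; i < ni; i++) for (j = 0; j < nj; j++)
 * back to the (i, j) pair that would run k-th in sequential execution. */
void decode_collapse2(long k, long nj, long *i, long *j)
{
    assert(k >= 0 && nj > 0);
    *i = k / nj;
    *j = k % nj;
}

/* Both loops are associated with the construct, so the ni * nj iterations
 * form one iteration space that schedule(static) divides among threads.
 * Each element receives its own logical iteration number. */
void fill_collapsed(long ni, long nj, long *m)
{
    #pragma omp parallel for collapse(2) schedule(static)
    for (long i = 0; i < ni; i++)
        for (long j = 0; j < nj; j++)
            m[i * nj + j] = i * nj + j;
}
```

For a 2 x 3 nest, logical iteration 5 corresponds to i = 1, j = 2, which is the last pair executed sequentially.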
If more than one loop is associated with the worksharing-loop construct then the number of times that any
intervening code between any two associated loops will be executed is unspecified but will be at least once
per iteration of the loop enclosing the intervening code and at most once per iteration of the innermost
loop associated with the construct. If the iteration count of any loop that is associated with the
worksharing-loop construct is zero and that loop does not enclose the intervening code, the behavior is
unspecified.
The integer type (or kind, for Fortran) used to compute the iteration count for the collapsed loop is
implementation defined.
A worksharing-loop has logical iterations numbered 0,1,...,N-1 where N is the number of loop iterations,
and the logical numbering denotes the sequence in which the iterations would be executed if a set of
associated loop(s) were executed sequentially. At the beginning of each logical iteration, the loop iteration
variable of each associated loop has the value that it would have if the set of the associated loop(s) were
executed sequentially. The schedule clause specifies how iterations of these associated loops are divided
into contiguous non-empty subsets, called chunks, and how these chunks are distributed among threads of
the team. Each thread executes its assigned chunk(s) in the context of its implicit task. The iterations of a
given chunk are executed in sequential order by the assigned thread. The chunk_size expression is
evaluated using the original list items of any variables that are made private in the worksharing-loop
construct. It is unspecified whether, in what order, or how many times, any side effects of the
evaluation of this expression occur. The use of a variable in a schedule clause expression
of a worksharing-loop construct causes an implicit reference to the variable in all enclosing
constructs.
Different worksharing-loop regions with the same schedule and iteration count, even if they occur in the
same parallel region, can distribute iterations among threads differently. The only exception is for the
static schedule as specified in Table 2.5. Programs that depend on which thread executes a particular
iteration under any other circumstances are non-conforming.
See Section 2.9.2.1 on page 315 for details of how the schedule for a worksharing-loop region is
determined.
The schedule kind can be one of those specified in Table 2.5.
The schedule modifier can be one of those specified in Table 2.6. If the static schedule kind is specified
or if the ordered clause is specified, and if the nonmonotonic modifier is not specified, the effect is
as if the monotonic modifier is specified. Otherwise, unless the monotonic modifier is
specified, the effect is as if the nonmonotonic modifier is specified. If a schedule clause
specifies a modifier then that modifier overrides any modifier that is specified in the run-sched-var
ICV.
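The defaulting rule for the ordering modifier can be restated as a small decision function. This is only a sketch, with these assumptions:

- The enumerations below are hypothetical and are not the `omp_sched_t` values of the OpenMP runtime API.
- `kind` stands for the schedule kind in effect.
- The simd modifier, which does not concern chunk ordering, is not modeled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical encodings for illustration only. */
enum sched_kind { KIND_STATIC, KIND_DYNAMIC, KIND_GUIDED, KIND_AUTO };
enum sched_mod  { MOD_NONE, MOD_MONOTONIC, MOD_NONMONOTONIC };

/* Returns the ordering modifier whose effect applies to a worksharing loop. */
enum sched_mod effective_modifier(enum sched_kind kind, enum sched_mod mod,
                                  bool has_ordered)
{
    /* Restriction: nonmonotonic cannot be combined with an ordered clause. */
    assert(!(has_ordered && mod == MOD_NONMONOTONIC));

    if (mod != MOD_NONE)
        return mod;               /* an explicit modifier is used as given */
    if (kind == KIND_STATIC || has_ordered)
        return MOD_MONOTONIC;     /* static kind or ordered clause */
    return MOD_NONMONOTONIC;      /* dynamic, guided or auto without ordered */
}
```

Under this rule, `schedule(dynamic)` alone behaves as `schedule(nonmonotonic: dynamic)`, while adding an `ordered` clause makes it behave as `schedule(monotonic: dynamic)`.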
The ordered clause with the parameter may also be used to specify how many loops are associated with
the worksharing-loop construct. The parameter of the ordered clause must be a constant
positive integer expression if specified. The parameter of the ordered clause does not affect
how the logical iteration space is then divided. If an ordered clause with the parameter is
specified for the worksharing-loop construct, then those associated loops form a doacross loop
nest.
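The sketch below illustrates a doacross loop nest by computing a running sum with `ordered(1)`. Some notes on what it assumes and shows:

- It uses the `ordered` construct with `depend(sink: ...)` and `depend(source)` clauses. Those are specified with the ordered construct, not in this section.
- `prefix_sum` is a hypothetical name.
- Iteration `i` waits until iteration `i - 1` has signalled completion, so the result is correct but little parallelism is exposed. It demonstrates the mechanism and is not an efficient scan.

```c
#include <assert.h>

/* a[i] = b[0] + ... + b[i], computed as a doacross loop nest. */
void prefix_sum(int n, const long *b, long *a)
{
    assert(n > 0);
    a[0] = b[0];
    /* ordered(1): the single associated loop forms a doacross loop nest. */
    #pragma omp parallel for ordered(1)
    for (int i = 1; i < n; i++) {
        /* Wait until iteration i - 1 has reached its depend(source) point. */
        #pragma omp ordered depend(sink : i - 1)
        a[i] = a[i - 1] + b[i];
        /* Signal that this iteration's result is now available. */
        #pragma omp ordered depend(source)
    }
}
```

For the first iteration (i = 1), the sink vector refers to iteration 0, which lies outside the loop's iteration space, so that dependence is ignored.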
If the value of the parameter in the collapse or ordered clause is larger than the number of nested
loops following the construct, the behavior is unspecified.
If an order(concurrent) clause is present, then after assigning the iterations of the associated loops to
their respective threads, as specified in Table 2.5, the iterations may be executed in any order, including
concurrently.
Table 2.5: schedule Clause kind Values
static
When kind is static, iterations are divided into chunks of size chunk_size,
and the chunks are assigned to the threads in the team in a round-robin
fashion in the order of the thread number. Each chunk contains chunk_size
iterations, except for the chunk that contains the sequentially last iteration,
which may have fewer iterations.
When no chunk_size is specified, the iteration space is divided into chunks
that are approximately equal in size, and at most one chunk is distributed to
each thread. The size of the chunks is unspecified in this case.
A compliant implementation of the static schedule must ensure that
the same assignment of logical iteration numbers to threads will be used in
two worksharing-loop regions if the following conditions are satisfied: 1)
both worksharing-loop regions have the same number of loop iterations, 2)
both worksharing-loop regions have the same value of chunk_size specified,
or both worksharing-loop regions have no chunk_size specified, 3) both
worksharing-loop regions bind to the same parallel region, and 4) neither
loop is associated with a SIMD construct. A data dependence between
the same logical iterations in two such loops is guaranteed to be satisfied
allowing safe use of the nowait clause.
dynamic
When kind is dynamic, the iterations are distributed to threads in the team
in chunks. Each thread executes a chunk of iterations, then requests another
chunk, until no chunks remain to be distributed.
Each chunk contains chunk_size iterations, except for the chunk that contains
the sequentially last iteration, which may have fewer iterations.
When no chunk_size is specified, it defaults to 1.
guided
When kind is guided, the iterations are assigned to threads in the team in
chunks. Each thread executes a chunk of iterations, then requests another
chunk, until no chunks remain to be assigned.
For a chunk_size of 1, the size of each chunk is proportional to the number
of unassigned iterations divided by the number of threads in the team,
decreasing to 1. For a chunk_size with value k (greater than 1), the size
of each chunk is determined in the same way, with the restriction that
the chunks do not contain fewer than k iterations (except for the chunk
that contains the sequentially last iteration, which may have fewer than k
iterations).
When no chunk_size is specified, it defaults to 1.
auto
When kind is auto, the decision regarding scheduling is delegated to the
compiler and/or runtime system. The programmer gives the implementation
the freedom to choose any possible mapping of iterations to threads in the
team.
runtime
When kind is runtime, the decision regarding scheduling is deferred until
run time, and the schedule and chunk size are taken from the run-sched-var
ICV. If the ICV is set to auto, the schedule is implementation defined.
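The guarantee in the static entry above is what permits `nowait` in the following sketch (hypothetical function name `two_static_loops`). The two loops meet all four conditions:

- both have `n` iterations;
- both use `schedule(static)` without a chunk_size;
- both bind to the same parallel region;
- neither is associated with a SIMD construct.

As a result, the thread that writes `a[i]` in the first loop is the one that reads it in the second. Changing either loop to, for example, `schedule(dynamic)` would remove that guarantee, and the `nowait` would then introduce a data race.

```c
#include <assert.h>

/* b[i] = 2*i + 1, computed by two worksharing loops with matching static
   schedules.  The first loop may skip its barrier (nowait) because the
   static schedule assigns logical iteration i to the same thread in both. */
void two_static_loops(int n, double *a, double *b)
{
    assert(n >= 0);
    #pragma omp parallel
    {
        #pragma omp for schedule(static) nowait
        for (int i = 0; i < n; i++)
            a[i] = 2.0 * i;

        #pragma omp for schedule(static)
        for (int i = 0; i < n; i++)
            b[i] = a[i] + 1.0;  /* a[i] was written by this same thread */
    }
}
```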
Note – For a team of p threads and a loop of n iterations, let ⌈n/p⌉ be the integer q that satisfies
n = p * q - r, with 0 <= r < p. One compliant implementation of the static schedule (with no
specified chunk_size) would behave as though chunk_size had been specified with value q. Another
compliant implementation would assign q iterations to the first p - r threads, and q - 1 iterations to the
remaining r threads. This illustrates why a conforming program must not rely on the details of a particular
implementation.
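The two compliant static distributions described in the note can be compared directly. The sketch below, using hypothetical function names, returns how many iterations thread `t` of `p` receives under each. For n = 10 and p = 4 (so q = 3 and r = 2), the first distribution gives 3, 3, 3, 1 and the second gives 3, 3, 2, 2.

```c
#include <assert.h>

/* q = ceil(n/p), as in the note above. */
static long ceil_div(long a, long b) { return (a + b - 1) / b; }

/* First implementation: behave as if chunk_size = q had been specified,
   so thread t receives iterations [t*q, min((t+1)*q, n)). */
long static_count_a(long n, long p, long t)
{
    assert(n >= 0 && p > 0 && t >= 0 && t < p);
    long q = ceil_div(n, p);
    long start = t * q;
    if (start >= n)
        return 0;
    return (n - start < q) ? n - start : q;
}

/* Second implementation: q iterations to the first p - r threads and
   q - 1 iterations to the remaining r threads, where r = p*q - n. */
long static_count_b(long n, long p, long t)
{
    assert(n >= 0 && p > 0 && t >= 0 && t < p);
    long q = ceil_div(n, p);
    long r = p * q - n;
    return (t < p - r) ? q : q - 1;
}
```

Both distributions cover all n iterations with at most one chunk per thread. A program that relies on either one is relying on implementation details.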
A compliant implementation of the guided schedule with a chunk_size value of k would assign q = ⌈n/p⌉
iterations to the first available thread and set n to the larger of n - q and p * k. It would then repeat this
process until q is greater than or equal to the number of remaining iterations, at which time
the remaining iterations form the final chunk. Another compliant implementation could use
the same method, except with q = ⌈n/(2p)⌉, and set n to the larger of n - q and 2 * p * k.
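Simulating the first guided strategy in the note shows how the chunk sizes shrink. This is a sequential sketch of one compliant strategy, not of any particular runtime, and `guided_chunks` is a hypothetical name. Two quantities are tracked:

- `n` is the working value that the note updates to max(n - q, p * k).
- `remaining` counts the iterations not yet assigned.

For n = 100, p = 4 and k = 1, the simulation yields chunks of 25, 19, 14, 11, 8, 6, 5, 3, 3, 2, 1, 1, 1, 1.

```c
#include <assert.h>

static long ceil_div(long a, long b) { return (a + b - 1) / b; }

/* Chunk sizes produced by the first guided strategy in the note, for n
   iterations, p threads and chunk_size k.  Writes at most cap sizes to out
   and returns the total number of chunks. */
long guided_chunks(long n, long p, long k, long *out, long cap)
{
    assert(n >= 0 && p > 0 && k > 0);
    long remaining = n;                 /* iterations not yet assigned */
    long count = 0;
    while (remaining > 0) {
        long q = ceil_div(n, p);
        if (q >= remaining)
            q = remaining;              /* the rest form the final chunk */
        if (count < cap)
            out[count] = q;
        count++;
        remaining -= q;
        n = (n - q > p * k) ? n - q : p * k;   /* n = max(n - q, p*k) */
    }
    return count;
}
```

With k = 4 instead, the floor p * k = 16 keeps every chunk at 4 iterations or more, giving 25, 19, 14, 11, 8, 6, 5, 4, 4, 4.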
Table 2.6: schedule Clause modifier Values
monotonic
When the monotonic modifier is specified then each thread executes
the chunks that it is assigned in increasing logical iteration order.
nonmonotonic
When the nonmonotonic modifier is specified then chunks are
assigned to threads in any order and the behavior of an application that
depends on any execution order of the chunks is unspecified.
simd
When the simd modifier is specified and the loop is associated with a
SIMD construct, the chunk_size for all chunks except the first and last
chunks is new_chunk_size = ⌈chunk_size/simd_width⌉ * simd_width
where simd_width is an implementation-defined value. The first chunk
will have at least new_chunk_size iterations except if it is also the last
chunk. The last chunk may have fewer iterations than new_chunk_size.
If the simd modifier is specified and the loop is not associated with a
SIMD construct, the modifier is ignored.
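The rounding performed by the simd modifier is a ceiling division followed by a multiplication. In the sketch below, `simd_width` is a parameter because its real value is implementation defined, and `simd_chunk_size` is a hypothetical name. With `schedule(simd: static, 10)` on a loop associated with a SIMD construct and an assumed simd_width of 8, every chunk except the first and last would hold 16 iterations.

```c
#include <assert.h>

/* new_chunk_size = ceil(chunk_size / simd_width) * simd_width */
long simd_chunk_size(long chunk_size, long simd_width)
{
    assert(chunk_size > 0 && simd_width > 0);
    return (chunk_size + simd_width - 1) / simd_width * simd_width;
}
```

Rounding up to a multiple of simd_width keeps interior chunks aligned with whole SIMD vectors, so no vector is split between two threads.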
Execution Model Events
The ws-loop-begin event occurs after an implicit task encounters a
worksharing-loop construct but before the task starts execution of the structured block of the
worksharing-loop region.
The ws-loop-end event occurs after a worksharing-loop region finishes execution but before resuming
execution of the encountering task.
The ws-loop-iteration-begin event occurs once for each iteration of a worksharing-loop before the iteration
is executed by an implicit task.
Tool Callbacks
A thread dispatches a registered ompt_callback_work callback with
ompt_scope_begin as its endpoint argument and ompt_work_loop as its wstype argument
for each occurrence of a ws-loop-begin event in that thread. Similarly, a thread dispatches a
registered ompt_callback_work callback with ompt_scope_end as its endpoint argument
and ompt_work_loop as its wstype argument for each occurrence of a ws-loop-end event in that
thread. The callbacks occur in the context of the implicit task. The callbacks have type signature
ompt_callback_work_t.
A thread dispatches a registered ompt_callback_dispatch callback for each occurrence of a
ws-loop-iteration-begin event in that thread. The callback occurs in the context of the implicit task. The
callback has type signature ompt_callback_dispatch_t.
Restrictions
Restrictions to the worksharing-loop construct are as follows:
• No OpenMP directive may appear in the region between any associated loops.
• If a collapse clause is specified, exactly one loop must occur in the region at each nesting level up to the number of loops specified by the parameter of the collapse clause.
• If the ordered clause is present, all loops associated with the construct must be perfectly nested; that is, there must be no intervening code between any two loops.
• If a reduction clause with the inscan modifier is specified, neither the ordered nor the schedule clause may appear on the worksharing-loop directive.
• The values of the loop control expressions of the loops associated with the worksharing-loop construct must be the same for all threads in the team.
• Only one schedule clause can appear on a worksharing-loop directive.
• The schedule clause must not appear on the worksharing-loop directive if the associated loop(s) form a non-rectangular loop nest.
• The ordered clause must not appear on the worksharing-loop directive if the associated loop(s) form a non-rectangular loop nest.
• Only one collapse clause can appear on a worksharing-loop directive.
• chunk_size must be a loop invariant integer expression with a positive value.
• The value of the chunk_size expression must be the same for all threads in the team.
• The value of the run-sched-var ICV must be the same for all threads in the team.
• When schedule(runtime) or schedule(auto) is specified, chunk_size must not be specified.
• A modifier may not be specified on a linear clause.
• Only one ordered clause can appear on a worksharing-loop directive.
• The ordered clause must be present on the worksharing-loop construct if any ordered region ever binds to a worksharing-loop region arising from the worksharing-loop construct.
• The nonmonotonic modifier cannot be specified if an ordered clause is specified.
• Either the monotonic modifier or the nonmonotonic modifier can be specified but not both.
• The loop iteration variable may not appear in a threadprivate directive.
• If both the collapse clause and the ordered clause with a parameter are specified, the parameter of the ordered clause must be greater than or equal to the parameter of the collapse clause.
• A linear clause or an ordered clause with a parameter can be specified on a worksharing-loop directive but not both.
• If an order(concurrent) clause is present, all restrictions from the loop construct with an order(concurrent) clause also apply.
• If an order(concurrent) clause is present, an ordered clause may not appear on the same directive.
C / C++
• The associated for-loops must be structured blocks.
• Only an iteration of the innermost associated loop may be curtailed by a continue statement.
• No statement can branch to any associated for statement.
• Only one nowait clause can appear on a for directive.
C++
• A throw executed inside a worksharing-loop region must cause execution to resume within the same iteration of the worksharing-loop region, and the same thread that threw the exception must catch it.
Fortran
• The associated do-loops must be structured blocks.
• Only an iteration of the innermost associated loop may be curtailed by a CYCLE statement.
• No statement in the associated loops other than the DO statements can cause a branch out of the loops.
• The do-loop iteration variable must be of type integer.
• The do-loop cannot be a DO WHILE or a DO loop without loop control.
Cross References
• order(concurrent) clause, see Section 2.9.5 on page 363.
• ordered construct, see Section 2.17.9 on page 717.
• private, firstprivate, lastprivate, linear, and reduction clauses, see Section 2.19.4 on page 842.
• ompt_scope_begin and ompt_scope_end, see Section 4.4.4.11 on page 1289.
• ompt_work_loop, see Section 4.4.4.15 on page 1292.
• ompt_callback_work_t, see Section 4.5.2.5 on page 1329.
• OMP_SCHEDULE environment variable, see Section 6.1 on page 1646.
Figure 2.1: Determining the schedule for a Worksharing-Loop
2.9.2.1 Determining the Schedule of a Worksharing-Loop
When execution encounters a worksharing-loop directive, the schedule clause (if any) on
the directive, and the run-sched-var and def-sched-var ICVs are used to determine how loop
iterations are assigned to threads. See Section 2.5 on page 171 for details of how the values of the
ICVs are determined. If the worksharing-loop directive does not have a schedule clause then
the current value of the def-sched-var ICV determines the schedule. If the worksharing-loop
directive has a schedule clause that specifies the runtime schedule kind then the current value
of the run-sched-var ICV determines the schedule. Otherwise, the value of the schedule
clause determines the schedule. Figure 2.1 describes how the schedule for a worksharing-loop is
determined.
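The decision procedure described above, also depicted in Figure 2.1, reduces to two tests. The sketch below is illustrative only, under these assumptions:

- `struct sched`, its fields and `determine_schedule` are hypothetical.
- A chunk of 0 stands for "no chunk_size specified".
- The ICV values are passed in as arguments rather than read from a runtime.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical representation of a schedule; not an OpenMP API type. */
enum sched_kind { KIND_STATIC, KIND_DYNAMIC, KIND_GUIDED, KIND_AUTO, KIND_RUNTIME };
struct sched {
    enum sched_kind kind;
    long chunk;           /* 0 means no chunk_size was specified */
};

/* Chooses the schedule for one worksharing-loop directive. */
struct sched determine_schedule(bool has_clause, struct sched clause,
                                struct sched run_sched_var,
                                struct sched def_sched_var)
{
    if (!has_clause)
        return def_sched_var;              /* no schedule clause */
    if (clause.kind == KIND_RUNTIME) {
        assert(clause.chunk == 0);         /* schedule(runtime) takes no chunk_size */
        assert(run_sched_var.kind != KIND_RUNTIME);
        return run_sched_var;              /* deferred to the run-sched-var ICV */
    }
    return clause;                         /* the clause determines the schedule */
}
```

In practice, run-sched-var is typically initialized from the OMP_SCHEDULE environment variable. This is why `schedule(runtime)` lets the schedule be changed without recompiling.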