Summary
The worksharing-loop construct specifies that the iterations of one or more associated loops will be
executed in parallel by threads in the team in the context of their implicit tasks. The iterations are distributed
across threads that already exist in the team that is executing the parallel region to which the
worksharing-loop region binds.
Syntax
The syntax of the worksharing-loop construct is as follows:
If an end do directive is not specified, an end do directive is assumed at the end of the do-loops.
Binding
The binding thread set for a worksharing-loop region is the current team. A worksharing-loop region binds
to the innermost enclosing parallel region. Only the threads of the team executing the binding
parallel region participate in the execution of the loop iterations and the implied barrier of the
worksharing-loop region when that barrier is not eliminated by a nowait clause.
Description
An implicit barrier occurs at the end of a worksharing-loop region if a nowait clause is not
specified.
The collapse and ordered clauses may be used to specify the number of loops from the loop nest that
are associated with the worksharing-loop construct. If specified, their parameters must be constant positive
integer expressions.
The collapse clause specifies the number of loops that are collapsed into a logical iteration space that is
then divided according to the schedule clause. If the collapse clause is omitted, the behavior is as if a
collapse clause with a parameter value of one was specified.
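As an illustration of the collapse clause, the following minimal sketch assumes a C program and a hypothetical helper named fill_grid (not part of the specification). It uses the combined parallel for form for brevity. With collapse(2), the two perfectly nested loops form a single logical iteration space of rows * cols iterations, which the schedule clause then divides among the threads.

```c
#include <stddef.h>

/* Hypothetical helper: fills a rows x cols row-major matrix so that
 * element (i, j) holds i * cols + j. */
void fill_grid(int *grid, int rows, int cols)
{
    /* collapse(2): both loops are associated with the construct and are
     * collapsed into one logical iteration space before scheduling. */
    #pragma omp parallel for collapse(2) schedule(static)
    for (int i = 0; i < rows; ++i) {
        for (int j = 0; j < cols; ++j) {
            grid[(size_t)i * cols + j] = i * cols + j;
        }
    }
}
```

If the compiler is invoked without OpenMP support, the pragma is ignored and the loops run sequentially with the same result.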
If the ordered clause is specified with parameter n then the n outer loops from the associated loop nest
form a doacross loop nest. The parameter of the ordered clause does not affect how the logical iteration
space is divided.
At the beginning of each logical iteration, the loop iteration variable or the variable declared by range-decl
of each associated loop has the value that it would have if the set of the associated loops was executed
sequentially. The schedule clause specifies how iterations of these associated loops are divided into
contiguous non-empty subsets, called chunks, and how these chunks are distributed among threads of the
team. Each thread executes its assigned chunks in the context of its implicit task. The iterations of a given
chunk are executed in sequential order by the assigned thread. The chunk_size expression is
evaluated using the original list items of any variables that are made private in the worksharing-loop
construct. Whether, in what order, or how many times, any side effects of the evaluation of
this expression occur is unspecified. The use of a variable in a schedule clause expression
of a worksharing-loop construct causes an implicit reference to the variable in all enclosing
constructs.
See Section 2.11.4.1 for details of how the schedule for a worksharing-loop region is determined.
The schedule kind can be one of those specified in Table 2.5.
The schedule modifier can be one of those specified in Table 2.6. If the static schedule kind is specified
or if the ordered clause is specified, and if the nonmonotonic modifier is not specified, the effect is
as if the monotonic modifier is specified. Otherwise, unless the monotonic modifier is
specified, the effect is as if the nonmonotonic modifier is specified. If a schedule clause
specifies a modifier then that modifier overrides any modifier that is specified in the run-sched-var
ICV.
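The following minimal sketch shows one way a schedule clause with a modifier, a kind, and a chunk_size might be written in C. The helper name square_all and the chunk size of 16 are assumptions chosen only for illustration. The nonmonotonic modifier is spelled out here even though, by the rule above, it is already the effective default for a dynamic schedule when no ordered clause is present.

```c
/* Hypothetical helper: squares each element of x in place. */
void square_all(double *x, int n)
{
    /* Chunks of 16 iterations are handed out on request (dynamic kind).
     * Each iteration touches only x[i], so the result does not depend
     * on the order in which chunks are executed. */
    #pragma omp parallel for schedule(nonmonotonic: dynamic, 16)
    for (int i = 0; i < n; ++i) {
        x[i] = x[i] * x[i];
    }
}
```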
If an order clause is present then the semantics are as described in Section 2.11.3.
The schedule is reproducible if one of the following conditions is true:
• The order clause is present and uses the reproducible modifier; or
• The schedule clause is specified with static as the kind parameter and the simd modifier is not present.
Programs can only depend on which thread executes a particular iteration if the schedule is reproducible.
Schedule reproducibility is also used for determining its consistency with other schedules (see
Section 2.11.2).
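A common use of a reproducible schedule is sketched below, under the assumption that the consistency conditions referenced in Section 2.11.2 hold for the two loops (same iteration count, schedule(static) with no chunk_size, same binding parallel region). The helper name two_phase is hypothetical. Because each thread is then expected to execute the same logical iterations in both loops, the second loop only reads elements of b that the same thread wrote, which is why the nowait clause on the first loop is considered safe in this pattern.

```c
/* Hypothetical helper: b[i] = 2 * a[i], then c[i] = b[i] + 1. */
void two_phase(const int *a, int *b, int *c, int n)
{
    #pragma omp parallel
    {
        /* First loop: no barrier at the end because of nowait. */
        #pragma omp for schedule(static) nowait
        for (int i = 0; i < n; ++i)
            b[i] = 2 * a[i];

        /* Second loop: same iteration count and same static schedule,
         * so iteration i is assumed to run on the thread that wrote b[i]. */
        #pragma omp for schedule(static)
        for (int i = 0; i < n; ++i)
            c[i] = b[i] + 1;
    }
}
```

With a schedule that is not reproducible (for example dynamic), this pattern would not be safe without the barrier.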
Table 2.5: schedule Clause kind Values
static
When kind is static, iterations are divided into chunks of size chunk_size,
and the chunks are assigned to the threads in the team in a round-robin
fashion in the order of the thread number. Each chunk contains chunk_size
iterations, except for the chunk that contains the sequentially last iteration,
which may have fewer iterations.
When no chunk_size is specified, the iteration space is divided into chunks
that are approximately equal in size, and at most one chunk is distributed to
each thread. The size of the chunks is unspecified in this case.
dynamic
When kind is dynamic, the iterations are distributed to threads in the team
in chunks. Each thread executes a chunk of iterations, then requests another
chunk, until no chunks remain to be distributed.
Each chunk contains chunk_size iterations, except for the chunk that contains
the sequentially last iteration, which may have fewer iterations.
When no chunk_size is specified, it defaults to 1.
guided
When kind is guided, the iterations are assigned to threads in the team in
chunks. Each thread executes a chunk of iterations, then requests another
chunk, until no chunks remain to be assigned.
For a chunk_size of 1, the size of each chunk is proportional to the number
of unassigned iterations divided by the number of threads in the team,
decreasing to 1. For a chunk_size with value k (greater than 1), the size
of each chunk is determined in the same way, with the restriction that
the chunks do not contain fewer than k iterations (except for the chunk
that contains the sequentially last iteration, which may have fewer than k
iterations).
When no chunk_size is specified, it defaults to 1.
auto
When kind is auto, the decision regarding scheduling is delegated to the
compiler and/or runtime system. The programmer gives the implementation
the freedom to choose any possible mapping of iterations to threads in the
team.
runtime
When kind is runtime, the decision regarding scheduling is deferred until
run time, and the schedule and chunk size are taken from the run-sched-var
ICV. If the ICV is set to auto, the schedule is implementation defined.
Note – For a team of p threads and a loop of n iterations, let ⌈n∕p⌉ be the integer q that satisfies
n = p * q - r, with 0 <= r < p. One compliant implementation of the static schedule (with no
specified chunk_size) would behave as though chunk_size had been specified with value q. Another
compliant implementation would assign q iterations to the first p - r threads, and q - 1 iterations to the
remaining r threads. This illustrates why a conforming program must not rely on the details of a particular
implementation.
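The two compliant static distributions described in this note can be sketched as plain C arithmetic. The function names static_counts_a and static_counts_b are hypothetical, and the code only computes how many iterations each thread would receive. It is not how any particular runtime is implemented.

```c
/* Ceiling of n / p for p > 0 and n >= 0. */
static int ceil_div(int n, int p)
{
    return (n + p - 1) / p;
}

/* Variant A: behave as though chunk_size were q = ceil(n / p).
 * Thread t receives the iterations in [t * q, (t + 1) * q), clipped to n. */
void static_counts_a(int n, int p, int *counts)
{
    int q = ceil_div(n, p);
    for (int t = 0; t < p; ++t) {
        int lo = t * q;
        int hi = lo + q;
        if (lo > n) lo = n;
        if (hi > n) hi = n;
        counts[t] = hi - lo;
    }
}

/* Variant B: with n = p * q - r, the first p - r threads receive q
 * iterations and the remaining r threads receive q - 1. */
void static_counts_b(int n, int p, int *counts)
{
    int q = ceil_div(n, p);
    int r = p * q - n;
    for (int t = 0; t < p; ++t)
        counts[t] = (t < p - r) ? q : q - 1;
}
```

For n = 10 and p = 4 (so q = 3 and r = 2), variant A yields 3, 3, 3, 1 and variant B yields 3, 3, 2, 2. Both cover all 10 iterations, but a given iteration can land on different threads.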
A compliant implementation of the guided schedule with a chunk_size value of k would assign q = ⌈n∕p⌉
iterations to the first available thread and set n to the larger of n - q and p * k. It would then repeat this
process until q is greater than or equal to the number of remaining iterations, at which time
the remaining iterations form the final chunk. Another compliant implementation could use
the same method, except with q = ⌈n∕(2p)⌉, and set n to the larger of n - q and 2 * p * k.
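The first guided procedure described above can be traced with a small C sketch. The function name guided_chunks and its interface are assumptions made for illustration. The sketch tracks the number of remaining iterations separately from n, because n is clamped to p * k and can therefore exceed what is actually left.

```c
/* Writes the successive chunk sizes into sizes (capacity max_chunks).
 * Returns the number of chunks, or -1 on invalid arguments or if the
 * array is too small. total: loop iterations, p: threads, k: chunk_size. */
int guided_chunks(int total, int p, int k, int *sizes, int max_chunks)
{
    if (total < 0 || p <= 0 || k <= 0)
        return -1;

    int remaining = total;  /* iterations not yet assigned */
    int n = total;          /* working value used to compute q */
    int count = 0;

    while (remaining > 0) {
        int q = (n + p - 1) / p;          /* q = ceil(n / p) */
        if (q >= remaining)
            q = remaining;                /* final chunk takes the rest */
        if (count == max_chunks)
            return -1;
        sizes[count++] = q;
        remaining -= q;

        /* Set n to the larger of n - q and p * k. */
        int lower = p * k;
        n = (n - q > lower) ? n - q : lower;
    }
    return count;
}
```

For 20 iterations, p = 2 and k = 3, this produces the chunk sizes 10, 5, 3, 2: no chunk is smaller than k except the last one.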
Table 2.6: schedule Clause modifier Values
monotonic
When the monotonic modifier is specified then each thread executes
the chunks that it is assigned in increasing logical iteration order.
nonmonotonic
When the nonmonotonic modifier is specified then chunks are
assigned to threads in any order and the behavior of an application that
depends on any execution order of the chunks is unspecified.
simd
When the simd modifier is specified and the loop is associated with a
SIMD construct, the chunk_size for all chunks except the first and last
chunks is new_chunk_size = ⌈chunk_size∕simd_width⌉ * simd_width
where simd_width is an implementation-defined value. The first chunk
will have at least new_chunk_size iterations except if it is also the last
chunk. The last chunk may have fewer iterations than new_chunk_size.
If the simd modifier is specified and the loop is not associated with a
SIMD construct, the modifier is ignored.
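The rounding applied by the simd modifier can be written as a one-line C function. The name simd_chunk_size is hypothetical, and simd_width is passed as a parameter here only because its actual value is implementation defined.

```c
/* new_chunk_size = ceil(chunk_size / simd_width) * simd_width,
 * for positive chunk_size and simd_width. */
int simd_chunk_size(int chunk_size, int simd_width)
{
    return ((chunk_size + simd_width - 1) / simd_width) * simd_width;
}
```

For example, with an assumed simd_width of 8, a chunk_size of 10 is rounded up to 16, while a chunk_size of 16 is left unchanged.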
Execution Model Events
The ws-loop-begin event occurs after an implicit task encounters a worksharing-loop construct but before
the task starts execution of the structured block of the worksharing-loop region.
The ws-loop-end event occurs after a worksharing-loop region finishes execution but before resuming
execution of the encountering task.
The ws-loop-iteration-begin event occurs once for each iteration of a worksharing-loop before the iteration
is executed by an implicit task.
Tool Callbacks
A thread dispatches a registered ompt_callback_work callback with ompt_scope_begin as its
endpoint argument and work_loop as its wstype argument for each occurrence of a ws-loop-begin event in
that thread. Similarly, a thread dispatches a registered ompt_callback_work callback with
ompt_scope_end as its endpoint argument and work_loop as its wstype argument for each occurrence
of a ws-loop-end event in that thread. The callbacks occur in the context of the implicit task. The callbacks
have type signature ompt_callback_work_t.
A thread dispatches a registered ompt_callback_dispatch callback for each occurrence of a
ws-loop-iteration-begin event in that thread. The callback occurs in the context of the implicit task. The
callback has type signature ompt_callback_dispatch_t.
Restrictions
Restrictions to the worksharing-loop construct are as follows:
• If the ordered clause with a parameter is present, all associated loops must be perfectly nested.
• If a reduction clause with the inscan modifier is specified, neither the ordered nor schedule clause may appear on the worksharing-loop directive.
• The values of the loop control expressions of the loops associated with the worksharing-loop construct must be the same for all threads in the team.
• At most one schedule clause can appear on a worksharing-loop directive.
• If the schedule or ordered clause is present then none of the associated loops may be non-rectangular loops.
• The ordered clause must not appear on the worksharing-loop directive if the associated loops include the generated loops of a tile directive.
• At most one collapse clause can appear on a worksharing-loop directive.
• chunk_size must be a loop invariant integer expression with a positive value.
• The value of the chunk_size expression must be the same for all threads in the team.
• The value of the run-sched-var ICV must be the same for all threads in the team.
• When schedule(runtime) or schedule(auto) is specified, chunk_size must not be specified.
• A modifier may not be specified on a linear clause.
• At most one ordered clause can appear on a worksharing-loop directive.
• The ordered clause must be present on the worksharing-loop construct if any ordered region ever binds to a worksharing-loop region arising from the worksharing-loop construct.
• The nonmonotonic modifier cannot be specified if an ordered clause is specified.
• Each schedule clause modifier may be specified at most once on the same schedule clause.
• Either the monotonic modifier or the nonmonotonic modifier can be specified but not both.
• If both the collapse and ordered clause with a parameter are specified, the parameter of the ordered clause must be greater than or equal to the parameter of the collapse clause.
• The values of the parameters specified by the collapse and ordered clauses must not exceed the depth of the associated loop nest.
• A linear clause or an ordered clause with a parameter can be specified on a worksharing-loop directive but not both.
• At most one nowait clause can appear on a for directive.
• If an ordered clause with a parameter is specified, none of the associated loops may be a range-based for loop.
Cross References
• OMP_SCHEDULE environment variable, see Section 6.1.
Figure 2.1: Determining the schedule for a Worksharing-Loop
2.11.4.1 Determining the Schedule of a Worksharing-Loop
When execution encounters a worksharing-loop directive, the schedule clause (if any) on the directive,
and the run-sched-var and def-sched-var ICVs are used to determine how loop iterations are assigned to
threads. See Section 2.4 for details of how the values of the ICVs are determined. If the worksharing-loop
directive does not have a schedule clause then the current value of the def-sched-var ICV determines the
schedule. If the worksharing-loop directive has a schedule clause that specifies the runtime schedule
kind then the current value of the run-sched-var ICV determines the schedule. Otherwise, the value of the
schedule clause determines the schedule. Figure 2.1 describes how the schedule for a worksharing-loop
is determined.
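The decision procedure in this section can be sketched as a small C function. All names here (schedule_kind, schedule_desc, resolve_schedule) are hypothetical and exist only for illustration. A null clause pointer stands for a directive without a schedule clause, and a chunk_size of 0 stands for an unspecified chunk size.

```c
#include <stddef.h>

typedef enum {
    KIND_STATIC,
    KIND_DYNAMIC,
    KIND_GUIDED,
    KIND_AUTO,
    KIND_RUNTIME
} schedule_kind;

typedef struct {
    schedule_kind kind;
    int chunk_size;      /* 0 means "not specified" in this sketch */
} schedule_desc;

/* Picks the schedule source in the order described in the text:
 * no clause -> def-sched-var; runtime kind -> run-sched-var;
 * otherwise the clause itself. */
schedule_desc resolve_schedule(const schedule_desc *clause,
                               schedule_desc def_sched_var,
                               schedule_desc run_sched_var)
{
    if (clause == NULL)
        return def_sched_var;
    if (clause->kind == KIND_RUNTIME)
        return run_sched_var;
    return *clause;
}
```

The sketch leaves out schedule modifiers and the case in which run-sched-var is set to auto, where the resulting schedule is implementation defined.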