001: I/O, IF, SCHEDULE, LASTPRIVATE, nesting, runtime routines
003: Local variables with the SAVE attribute and SHARED/PRIVATE
007: FIRSTPRIVATE and LASTPRIVATE Variables
008: PRIVATE and COPYIN clauses
011: F90 features: modules, array language, intrinsics
012: THREADPRIVATE COMMON data in modules
013: Privatization on Worksharing Directives
014: Syntax of Parallel DO Loops
021: ORDERED and the definition of static/dynamic extents
022: Serialization of nested parallel regions
023: Statement functions and data attribute clauses
024: Worksharing constructs and branches
030: SHARED array section actual arguments
Question:
The following questions were extracted from mail received:
This is both hard to implement efficiently under many systems and too vague to be useful. Interleaved at what level - records, bytes or bits? And what about open/close, positioning, I/O error handling and so on? It is a very hard problem indeed, and needs MUCH more attention than OpenMP has given it - better to say nothing at all than include that paragraph.
This doesn't mention what happens if the expression calls a function which directly or indirectly contains parallel constructs. It should explicitly allow it, make it implementation dependent, or forbid it (make it undefined). I suggest the last.
It says that the region is executed in parallel only if the expression evaluates to .TRUE., but doesn't make it clear whether the construct is then available for binding purposes (2.7). This needs a clear statement one way or the other, as it affects the semantics of how an IF clause can be used very considerably.
I am completely confounded by this. What value does chunk decrease FROM? Also why is this bizarre special case useful, in the first place? Why can't I also have exponentially increasing, or varying according to some other rule (e.g. quadratic)?
What does "sequentially last" mean? Lexically last or chronologically last? This needs clarifying.
This does not forbid MASTER clauses within CRITICAL ones, which I think is an oversight. Certainly, it can create deadlock in otherwise correct programs, and I can see no good reason for allowing it.
The first paragraph says that a serialized parallel region is not considered to be parallel, and the last paragraph says the opposite! The latter is clearly a better specification, as otherwise there are two states that cannot be distinguished.
Why on earth should there be ANY integer type large enough to contain an address? Many hardware architectures and operating systems have pointers that are significantly larger than any integer. If OpenMP is going to require major language extensions to standard systems, it should make the fact much clearer.
Approved Response:
!$omp parallel if (logexpr())
   . . .
!$omp end parallel
is equivalent to:
logical_tmp = logexpr()
!$omp parallel if (logical_tmp)
   . . .
!$omp end parallel
Similarly, the CHUNK expression on a SCHEDULE clause is evaluated outside the context of the DO construct.
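The following is a minimal sketch in free-form Fortran 90 of the pattern this response describes; the module `hoisted_clauses`, the routine `double_all`, and the threshold `par_threshold` are hypothetical names. The IF and CHUNK expressions are computed into ordinary variables before the directive, so neither clause contains a function reference whose side effects could matter. Compiled without OpenMP, the `!$omp` lines are plain comments and the loop runs serially.

```fortran
module hoisted_clauses
  implicit none
  ! Hypothetical size below which parallel execution is not considered worthwhile.
  integer, parameter :: par_threshold = 1000
contains
  subroutine double_all(a, n)
    integer, intent(in) :: n
    real, intent(inout) :: a(n)
    logical :: use_par
    integer :: chunk, i
    ! Evaluate the IF and CHUNK expressions once, before the directive,
    ! so that the clauses only name plain variables.
    use_par = n > par_threshold
    chunk = max(1, n / 64)
!$omp parallel do if (use_par) schedule(dynamic, chunk) shared(a) private(i)
    do i = 1, n
      a(i) = 2.0 * a(i)
    end do
!$omp end parallel do
  end subroutine double_all
end module hoisted_clauses
```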
Edits:
"Unsynchronized use of Fortran I/O statements by multiple threads on the same unit has undefined behaviour."
"The IF expression is evaluated outside the context of the parallel region. Results are undefined if the IF expression contains a function reference which has side effects."
Add the following sentences to section 2.3.1:
"The CHUNK expression is evaluated outside the context of the DO construct. Results are undefined if the CHUNK expression contains a function reference which has side effects."
"A parallel region is available for binding purposes, whether it is serialized or executed in parallel."
"Guided scheduling is appropriate for the case in which the threads may arrive at varying times at a DO work-sharing construct, with each iteration requiring about the same amount of work. This can happen if, for example, the DO loop is preceded by one or more work-sharing SECTIONS or DO constructs with NOWAIT clauses. Like dynamic, the guided schedule guarantees that no thread waits at the barrier longer than it takes another thread to execute its final iteration, or final k iterations if a chunk size of k is specified. Among such schedules, the guided schedule is characterized by the property that it requires the fewest synchronizations."
Question:
From the example (A.13), I assume FLUSH synchronizes the local thread with (global) memory. From a programmer's point of view, it is therefore a purely local operation. This is not at all clear from the description of FLUSH (2.5.5), particularly since all the other directives in section 2.5 are concerned with synchronizing threads (and not necessarily memory, unless FLUSH is implied).
The example also seems to imply that a local FLUSH is required for memory reads as well as writes:
      ISYNC(IAM) = 1
C$OMP FLUSH(ISYNC)
      DO WHILE (ISYNC(NEIGH).EQ.0)
C$OMP    FLUSH(ISYNC)
      ENDDO
Thus the FLUSH on thread NEIGH is presumably not sufficient for ISYNC(NEIGH) to be visible on thread IAM; a FLUSH on IAM after the FLUSH on NEIGH is also required.
This is not at all clear from section 2.5.5. Presumably, "Subsequent reads of thread-visible variables fetch the latest copy of the data" means "The first subsequent read of each thread-visible variable ON THE THREAD ISSUING THE FLUSH will fetch the latest copy of the data". In particular a SECOND read of a thread-visible variable may or may not fetch the latest copy.
This all seems to make sense from a compiler optimization point of view, but it needs to be made clearer from a programmer's point of view.
Approved Response:
Yes, the thread which issues the flush will fetch the latest copy of the thread-visible variable. And a flush is required for reads as well as writes. The specification will be modified to clarify both points.
Edits:
In section 2.5.5, "FLUSH directive", replace the 2 paragraphs beginning with "The FLUSH directive identifies..." and "Implementations must ensure..." with the following text:
"The FLUSH directive, whether explicit or implied, identifies a cross-thread sequence point at which the implementation is required to ensure that each thread in the team has a consistent view of certain variables in memory.
A consistent view requires that all memory operations (both reads and writes) that occur before the FLUSH directive in the program be performed before the sequence point in the executing thread; similarly, all memory operations that occur after the FLUSH must be performed after the sequence point in the executing thread.
Implementations must ensure that modifications made to thread-visible variables within the executing thread are made visible to all other threads at the sequence point. For example, compilers must restore values from registers to memory, and hardware may need to flush write buffers. Furthermore, implementations must assume that thread-visible variables may have been updated by other threads at the sequence point, and must be retrieved from memory before their first use past the sequence point.
Finally, the FLUSH directive only provides consistency between operations within the executing thread and global memory. To achieve a globally consistent view across all threads, each thread must execute a FLUSH operation."
Question:
In section 2.6.3 of the Fortran API specification, item 7 defines default (actually mandatory) attributes (SHARED/PRIVATE) for variables in called subroutines within the dynamic extent of a parallel region. It specifies SHARED for common/module variables and PRIVATE for local non-SAVEd variables, but it leaves the attributes of SAVEd local variables undefined. This is going to create major portability problems. I would prefer PRIVATE SAVEd variables, but I assume SHARED is easier to implement, so I suggest the latter be the default. If there is some VERY good reason for leaving SAVEd variables undefined, then item 7 should explicitly say they are implementation dependent and recommend that they therefore be avoided in portable programs. Note that this will likely make the vast majority of OpenMP programs with dynamic scope non-portable. There is simply no way that a program is going to work right with variables defined SHARED on one machine and PRIVATE on another. In fact, I think it likely that purchasers of OpenMP compilers (and the machines they run on) would then REQUIRE a particular behaviour for SAVEd variables, leaving the compilers/machines that chose the "wrong" attribute out in the cold.
Also, 2.6.2.3 needs to explain more clearly that it does not apply to called subroutines. I suggest adding the following after the 2nd sentence: Local variables, modules and common blocks in subroutines called within the parallel region are also not affected, see section 2.6.3 for data scope rules.
Approved Response:
Local variables with the SAVE attribute declared in procedures called from a parallel region are implicitly SHARED. The specification will be clarified.
The first sentence of section 2.6.2.3 states:
"The DEFAULT clause allows the user to specify a PRIVATE, SHARED, or NONE scope attribute for all variables in the lexical extent of any parallel region."
The lexical extent does not include procedures called from a parallel region.
Edits:
To section 2.6.3, bullet 7, add the following statement:
"Local variables in called routines that have the SAVE attribute are SHARED."
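A minimal sketch of this rule, with hypothetical names (`save_counter`, `tally`, `count_calls`): because the SAVE variable `calls` inside `tally` is SHARED by every thread that calls it, each update is placed in a named CRITICAL section. The SAVE variable also keeps its value between parallel regions, so a second call to `count_calls` continues counting from where the first one stopped.

```fortran
module save_counter
  implicit none
contains
  ! CALLS has the SAVE attribute, so it is SHARED by every thread that
  ! calls TALLY; the named CRITICAL section serializes the update.
  subroutine tally(total)
    integer, intent(out) :: total
    integer, save :: calls = 0
!$omp critical (tally_lock)
    calls = calls + 1
    total = calls
!$omp end critical (tally_lock)
  end subroutine tally

  ! Calls TALLY once per iteration and returns the largest count observed.
  integer function count_calls(n)
    integer, intent(in) :: n
    integer :: i, t, maxseen
    maxseen = 0
!$omp parallel do private(i, t) reduction(max: maxseen)
    do i = 1, n
      call tally(t)
      maxseen = max(maxseen, t)
    end do
!$omp end parallel do
    count_calls = maxseen
  end function count_calls
end module save_counter
```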
Question:
Does the REDUCTION clause in OpenMP guarantee the same floating point result with different numbers of processors?
Approved Response:
Bit-wise identical results cannot be guaranteed for floating-point reductions, even if the same number of threads is used. The final result is dependent on the order in which partial results are globally accumulated and the order of accumulation may vary from one run to the next.
Edits:
In section 2.6.2.6, pg. 30, add the following sentence to the end of the second paragraph:
"Since the intermediate values of the REDUCTION variables may be combined in random order, there is no guarantee that bit-identical results will be obtained for floating point reductions from one parallel run to another."
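The effect is a property of floating-point addition rather than of OpenMP. The hypothetical functions below add the same three double-precision values with two different groupings, the way two runs might combine partial sums in different orders. Because 1.0 is smaller than half the spacing between representable numbers near 1.0e20, the grouping decides whether it survives. This is a plain serial illustration, not an OpenMP REDUCTION.

```fortran
module fp_order
  implicit none
  integer, parameter :: dp = selected_real_kind(15)
contains
  ! Groups the sum as (x + y) + z.
  real(dp) function sum_left(x, y, z)
    real(dp), intent(in) :: x, y, z
    sum_left = (x + y) + z
  end function sum_left

  ! Groups the same sum as x + (y + z).
  real(dp) function sum_right(x, y, z)
    real(dp), intent(in) :: x, y, z
    sum_right = x + (y + z)
  end function sum_right
end module fp_order
```

With x = 1.0, y = 1.0e20 and z = -1.0e20, `sum_left` returns 0.0 and `sum_right` returns 1.0.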
Question:
I am concerned with the following statement in OpenMP:
2.6.3 Data environment rules...
7. ... Common blocks and modules in called routines in the dynamic extent of a parallel region always have an implicit SHARED attribute, unless they are THREADPRIVATE common blocks.
It seems to me that any variables defined in the data segments of Fortran 90 MODULEs would be restricted to being SHARED.
If it is the case, would the following generally valid Fortran 90 MODULE be wrong under OpenMP?
module snglthrd
  private                 ! not the same as OpenMP PRIVATE!
  public :: settmp, usetmp
  real, save :: tmp
contains
  subroutine settmp(a)
    tmp = a
  end subroutine settmp
  subroutine usetmp()
    print *, tmp
  end subroutine usetmp
end module snglthrd
program multithrd
  use snglthrd
  real a(8)
  do i = 1, 8
    a(i) = i
  end do
!$omp parallel do shared(a) private(i)
  do i = 1, 8
    call settmp(a(i))     ! a race condition?
    call usetmp()
  end do
!$omp end parallel do
end program multithrd
Approved Response:
It is true that in the current specification, module data is restricted to being shared for a parallel region, and in your example, "settmp" does create a race condition. Many implementations of Fortran 90 treat module data as common block data, mapping the data declared in a module to a unique common block. The same race condition can occur if the variable tmp is placed in a common block.
The OpenMP committee intends to address issues pertaining to Fortran 90 and Fortran 95 in Version 2 of the Fortran specification. Facilities for specifying how module data should be treated will be considered for incorporation into Version 2.
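Within the Version 1 rules, one possible workaround (an assumption, not part of the approved response) is to move `tmp` into a THREADPRIVATE common block declared inside the module, so that each thread has its own copy. The THREADPRIVATE directive sits in the same scoping unit as the COMMON statement, as the rules on common block names require. The module name `snglthrd_tp` and the function `gettmp` are hypothetical; `gettmp` replaces `usetmp` so the value can be checked rather than printed.

```fortran
module snglthrd_tp
  implicit none
  ! Assumption: TMP lives in a THREADPRIVATE common block declared in this
  ! module, so every thread gets its own copy.
  real :: tmp
  common /tmpblk/ tmp
!$omp threadprivate(/tmpblk/)
contains
  subroutine settmp(a)
    real, intent(in) :: a
    tmp = a
  end subroutine settmp

  real function gettmp()
    gettmp = tmp
  end function gettmp
end module snglthrd_tp
```

In the loop from the question, each iteration now calls `settmp` and then `gettmp` on its own thread's copy, so no other thread can overwrite the value in between.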
Edits:
None.
Question:
At the bottom of page 12 of the OpenMP Fortran API, version 1-Oct '97, the GUIDED suboption of SCHEDULE is explained. The two sentences seem to conflict with one another in describing the chunk parameter.
I see two interpretations:
Are either of these correct, or does the GUIDED suboption work some other way?
Approved Response:
The second interpretation is what was intended.
Edits:
The description of GUIDED on pg. 12 of the OpenMP Fortran API, Ver 1.0, should read:
"When SCHEDULE(GUIDED,chunk) is specified, the iteration space is divided into pieces such that the size of each successive piece is exponentially decreasing. chunk specifies the size of the smallest piece, except possibly the last. The size of the initial piece is implementation dependent. As each thread finishes a piece of the iteration space, it dynamically obtains the next available piece. When no chunk is specified, it defaults to 1."
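To see what "exponentially decreasing" can mean in practice, the hypothetical routine below models one plausible rule. The rule is an assumption, since the text above leaves the initial size implementation dependent: each piece is the number of remaining iterations divided by the number of threads, rounded up, but never less than chunk, and the last piece is whatever remains. Because each piece removes a fixed fraction of what is left, the sizes shrink geometrically. For 100 iterations, 4 threads and chunk = 1 this rule gives 25, 19, 14, 11, 8, 6, 5, 3, 3, 2, 1, 1, 1, 1.

```fortran
module guided_model
  implicit none
contains
  ! Fills sizes(1:npieces) with the piece sizes a guided schedule might hand
  ! out. Assumed rule: ceiling(remaining / nthreads), at least chunk, capped
  ! at what is left. SIZES must have room for at least n entries.
  subroutine guided_pieces(n, nthreads, chunk, sizes, npieces)
    integer, intent(in) :: n, nthreads, chunk
    integer, intent(out) :: sizes(:)
    integer, intent(out) :: npieces
    integer :: remaining, piece
    remaining = n
    npieces = 0
    do while (remaining > 0)
      piece = (remaining + nthreads - 1) / nthreads   ! ceiling division
      piece = max(piece, chunk)                       ! honour the minimum
      piece = min(piece, remaining)                   ! last piece may be short
      npieces = npieces + 1
      sizes(npieces) = piece
      remaining = remaining - piece
    end do
  end subroutine guided_pieces
end module guided_model
```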
Question:
If a variable is declared as both FIRSTPRIVATE and LASTPRIVATE, is an OpenMP implementation required to ensure that the copy for FIRSTPRIVATE will happen before the shared variable write for LASTPRIVATE?
Approved Response:
Yes, an OpenMP implementation is required to ensure that the copy will happen before the write.
The ARB has also discussed the issue of a variable appearing in a chunk expression for the SCHEDULE clause as well as in a LASTPRIVATE clause. In this case, it will be the user's responsibility to ensure that the chunk parameter is the same for all threads. The OpenMP API will be amended to include this restriction.
Finally, in the case of LASTPRIVATE variables being used in a work-sharing DO which specifies NOWAIT, it is the user's responsibility to ensure that the values of such variables be used only after a barrier. A cautionary statement will be added to the specification.
Edits:
Add the following rule to the bulleted list on pg. 13, Section 2.3.1:
"The value of the chunk parameter must be the same for all of the threads in the team."
Add the following rule to section 2.6.3 "Data environment rules":
"Variables that are specified as LASTPRIVATE for a work-sharing directive for which NOWAIT appears must not be used prior to a barrier."
Add the following to the end of section 2.6.3, "Data environment rules":
"An implementation that conforms to the OpenMP Fortran API must adhere to the following rules:
- If a variable is specified as FIRSTPRIVATE and LASTPRIVATE, the implementation must ensure that the update required for LASTPRIVATE occurs after all initializations for FIRSTPRIVATE."
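A minimal sketch of a loop where this ordering matters; the module and routine names are hypothetical. `x` is both FIRSTPRIVATE and LASTPRIVATE, and iteration n overwrites its private copy, which is then copied back after the loop. If that copy-back could happen before another thread had initialized its private `x`, that thread would compute with -1.0 instead of the original value, which is what the rule above forbids. SCHEDULE(STATIC) is used so that iteration n is the last iteration its thread executes.

```fortran
module first_last
  implicit none
contains
  ! Every private X starts from the caller's value (FIRSTPRIVATE); the copy
  ! from iteration n is written back on exit (LASTPRIVATE).
  subroutine scale_by_x(b, n, x)
    integer, intent(in) :: n
    real, intent(out) :: b(n)
    real, intent(inout) :: x
    integer :: i
!$omp parallel do firstprivate(x) lastprivate(x) schedule(static) shared(b)
    do i = 1, n
      b(i) = x * real(i)
      if (i == n) x = -1.0   ! changes only the private copy for iteration n
    end do
!$omp end parallel do
  end subroutine scale_by_x
end module first_last
```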
Question:
In a few directives (notably "PARALLEL" and "DO"), it is possible to have a privatization clause which privatizes a variable, and also to use the value of that variable in the directive itself, for instance:
Example 1a.
logical logvar
...
!$omp parallel if (logvar) private(logvar)
Example 1b.
!$omp do private(i) schedule(static, i)
There should be a rule stating whether this is legal, and if it is, stating whether the variable is evaluated in the shared scope or in the private scope (in which case the user would have to use FIRSTPRIVATE). I think the answer is either (1) this usage is undefined, or (2) the variable is evaluated in the enclosing scope when its value is needed; but it would be nice if this were explicitly stated.
The harder problem has to do with LASTPRIVATE. LASTPRIVATE is a write to shared storage, and it is a particularly interesting one because it isn't in the source code where the user can write synchronization code for it. The OpenMP document doesn't seem to promise anything about this case.
Also, the specification needs to state whether it is the responsibility of the implementation or the user to ensure that all COPYIN assignments are completed before the master's data is modified.
Approved Response:
Expressions for clauses on PARALLEL and DO directives are evaluated in the enclosing scope of the directive. See interpretation 001 as well.
See interpretation 007 for information about LASTPRIVATE variables.
It is the responsibility of the implementation to ensure that the value of each threadprivate copy is the same as the value of the master thread copy when the master thread reached the directive containing the COPYIN clause.
Edits:
See edits from interpretations 001 and 007.
Add the following sentence to section 2.6.2.7 "COPYIN":
"An OpenMP-compliant implementation is required to ensure that the value of each threadprivate copy is the same as the value of the master thread copy when the master thread reached the directive containing the COPYIN clause."
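A minimal sketch, with hypothetical names, that checks this guarantee: the master thread sets its THREADPRIVATE copy of `k`, and every thread entering the region with COPYIN(/cnt/) compares its own copy against that value. The function returns the number of mismatching threads, which a conforming implementation should make zero.

```fortran
module copyin_demo
  implicit none
contains
  ! Counts threads whose THREADPRIVATE copy of K differs from the master's
  ! value on entry to a region with COPYIN (expected: 0).
  integer function copyin_mismatches(master_val)
    integer, intent(in) :: master_val
    integer :: bad
    integer :: k
    common /cnt/ k
!$omp threadprivate(/cnt/)
    k = master_val          ! the master thread's copy
    bad = 0
!$omp parallel copyin(/cnt/) shared(master_val) reduction(+: bad)
    if (k /= master_val) bad = bad + 1
!$omp end parallel
    copyin_mismatches = bad
  end function copyin_mismatches
end module copyin_demo
```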
Question:
I had a scan through the OpenMP Fortran API doc v 1.0 and have a couple of comments. I write these from the perspective of an interest in portability.
p38, sec 3.2
"For all these routines, var should be of type integer and of a KIND large enough to hold an address."
Why is this? Is it just to save you the overhead of a lookup?
"For example, on 64-bit addressable systems, the var may be declared as INTEGER(KIND=8)"
Why should it? There is no connection between KIND=8 and 64 bits. You could use SELECTED_INT_KIND().
p49, A.15
"In the following example, note that the argument to the lock routines should be of size POINTER."
There is no "standard" Fortran meaning to this. I don't think you should be implementation specific in the examples (the same applies later in the stubs).
Approved Response:
You are correct that the use of INTEGER(KIND=8) is incorrect. The specification will be changed to use SELECTED_INT_KIND.
Edits:
In section 3.2 on pg. 36, change
INTEGER(KIND=8)
to
INTEGER(SELECTED_INT_KIND(18))
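A minimal sketch of the portable declaration; the module `addr_kind_demo` and the constant `lock_kind` are hypothetical names. SELECTED_INT_KIND(18) requests an integer kind whose range covers at least -10**18 < n < 10**18, which needs at least 61 bits including the sign, whereas the value 8 in KIND=8 has no standard meaning. Later versions of the OpenMP API supply a named kind constant for lock variables, which is preferable where available.

```fortran
module addr_kind_demo
  implicit none
  ! Hypothetical name: an integer kind with a decimal range of at least 18,
  ! requested portably instead of hard-coding KIND=8.
  integer, parameter :: lock_kind = selected_int_kind(18)
contains
  ! Number of bits in the model of the selected integer kind.
  integer function lock_kind_bits()
    integer(lock_kind) :: probe
    probe = 0_lock_kind
    lock_kind_bits = bit_size(probe)
  end function lock_kind_bits
end module addr_kind_demo
```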
Question:
It is clear that at the FLUSH, all previous writes of the processor should be completed at memory. The semantics with respect to reads, however, are unclear. It appears to me that a FLUSH should also ensure the following (or some variation thereof):
Requirements such as the above are needed to ensure correct execution of a program that may do producer-consumer synchronization with ordinary variables such as the following:
P1                  P2
A = 1               while (Flag != 1) FLUSH
B = 1               tmp = A
FLUSH               tmp = B
Flag = 1
Approved Response:
Your assumptions about reads are correct. The specification needs to be clarified with respect to the semantics of FLUSH as it applies to reads.
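The producer-consumer pattern from the question can be written as a runnable sketch using two SECTIONS; the module `handoff` and routine `produce_consume` are hypothetical names. Following the approved response, both sides FLUSH: the producer so that `a` reaches memory before `flag` is set, and the consumer on every pass of its spin loop and again before it reads `a`. Naming both variables in the producer's first FLUSH is intended to keep the store to `a` ahead of the store to `flag`.

```fortran
module handoff
  implicit none
contains
  ! One section produces a value and raises a flag; the other spins on the
  ! flag and then reads the value.
  subroutine produce_consume(received)
    integer, intent(out) :: received
    integer :: a, flag
    a = 0
    flag = 0
    received = -1
!$omp parallel sections shared(a, flag, received)
!$omp section
    a = 42                  ! produce the payload
!$omp flush(a, flag)        ! payload reaches memory before the flag is set
    flag = 1
!$omp flush(flag)           ! publish the flag
!$omp section
    do
!$omp flush(flag)           ! re-read the flag from memory on every pass
      if (flag == 1) exit
    end do
!$omp flush(a, flag)        ! fetch the payload only after seeing the flag
    received = a
!$omp end parallel sections
  end subroutine produce_consume
end module handoff
```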
Edits: See interpretation 002.
Questions:
Approved Response:
"Common blocks and modules in called routines in the dynamic extent of a parallel region always have an implicit SHARED attribute, unless they are THREADPRIVATE common blocks."
a = b + c
Edits:
None.
Question:
I have a Fortran90/95 question that may lead to a clarification in the OMP Fortran standard.
The question is whether or not common block names from common blocks declared in the declaration section of a module are visible in scopes which import that module via a "USE" statement.
Here's a small test case that makes this issue relevant to the OMP Fortran standard:
module foo
  common /t/ a
!$omp threadprivate(/t/)
end module foo
subroutine bar
  use foo
!$omp parallel copyin(/t/)
  ...
!$omp end parallel
end subroutine bar
This program is illegal if common block names are not exported from a module via a use statement because the COPYIN clause would be referring to an undeclared common block, /t/, not the /t/ declared in the module.
The F95 standard (Draft version, 1995) says the following about use statements:
11.3.2 The USE statement and use association
The USE statement provides the means by which a scoping unit accesses named data objects, derived types, interface blocks, procedures, generic identifiers, and namelist groups in a module.
Common block names do not seem to me to fit into any of the above categories.
Furthermore, the F95 standard says this about common block storage sequences (emphasis is mine):
5.5.2.1 Common block storage sequence
Only COMMON statements and EQUIVALENCE statements appearing in the scoping unit contribute to common block storage sequences formed in that unit. Variables, in common blocks, made accessible by *use association* or host association do not contribute.
I interpret this to mean that for the following testcase, the common block storage sequence for /t/ visible in the subroutine scope consists of only variable "b", not "a" then "b":
module foo
  common /t/ a
end module foo
subroutine bar
  use foo
  common /t/ b
end subroutine bar
This seems like further evidence that common block names from modules should not be visible via use association.
If all this is correct, then the original testcase should be modified as follows to be legal F90/F95:
module foo
  real a(100)
  common /t/ a
!$omp threadprivate(/t/)
end module foo
subroutine bar
  use foo
!$omp parallel copyin(a)   ! note change to copyin list
  ...
!$omp end parallel
end subroutine bar
This syntax can be inconvenient, since all members of the common need to be listed instead of the common block name. Furthermore, since there is no way to declare a common block THREADPRIVATE by naming its members, the THREADPRIVATE directive must be present in the module, not in the subroutine which contains the USE statement.
If anyone disagrees with this analysis please let us all know. I am a novice user of F90/F95 so I am easily confused by the terminology in the standard.
If this analysis is correct, perhaps the OMP Fortran standard could be amended to clarify what happens for this case, since it doesn't seem obvious to me what the restrictions are w.r.t. common block names in COPYIN clauses from reading the OMP Fortran standard.
Approved Response:
The preceding analysis is correct. Common block names are not accessible by use association or host association. The following are further examples which are invalid:
Example 1:
module foo
  common /t/ a
end module foo
subroutine bar
  use foo
!$omp threadprivate(/t/)
!$omp parallel
  ...
!$omp end parallel
end subroutine bar
Example 2:
common /t/ a
!$omp threadprivate(/t/)
...
contains
  subroutine bar
!$omp parallel copyin(/t/)
    ...
!$omp end parallel
  end subroutine bar
end program
Example 2 may be correctly rewritten as follows:
common /t/ a
!$omp threadprivate(/t/)
...
contains
  subroutine bar
    common /t/ a
!$omp threadprivate(/t/)
!$omp parallel copyin(/t/)
    ...
!$omp end parallel
  end subroutine bar
end program
Edits:
In the OpenMP Fortran API, version 1-Oct '97, the following additions should be made:
"This directive must appear in the declaration section of the routine after the declaration of the listed common blocks."
"Although variables in common blocks can be accessed by use association or host association, common block names cannot. This means that a common block name specified in a THREADPRIVATE directive must be declared to be a common block in the same scoping unit in which the THREADPRIVATE directive appears."
"Although variables in common blocks can be accessed by use association or host association, common block names cannot. This means that a common block name specified in a COPYIN clause must be declared to be a common block in the same scoping unit in which the COPYIN clause appears."
Question:
I have found a feature in the OpenMP API that seems to be hard to implement. It concerns data scope attribute clauses.
OpenMP API for FORTRAN, section 2.6.3, page 28:
"Variables that are privatized in a parallel region cannot be privatized again on an enclosed work-sharing directive. As a result, variables that appear in the PRIVATE, FIRSTPRIVATE, LASTPRIVATE and REDUCTION clauses on a work-sharing directive must have shared scope in the enclosing parallel region"
Consider the following example:
!$OMP PARALLEL PRIVATE(x)
      f(x)
!$OMP END PARALLEL
      ...
!$OMP PARALLEL SHARED(x)
      f(x)
!$OMP END PARALLEL
      PROCEDURE F(x)
!$OMP DO PRIVATE(x)
      ...
      RETURN
      END
In the first case we should not make a local copy of x in procedure F; we just use x (even without any synchronization, because x is private to each thread). In the last case, though, we should use a local copy of x instead. If I understand properly, it must be a run-time decision whether to make a local copy or not.
Approved Response:
The document section you quote says that a variable declared as PRIVATE on the parallel region cannot also be declared as PRIVATE on a work-sharing construct. Your example violates this rule. In the first case
!$OMP PARALLEL PRIVATE(x)
      f(x)
!$OMP END PARALLEL
x is declared as PRIVATE and then, in the work-sharing construct in the extended region
      PROCEDURE F(x)
!$OMP DO PRIVATE(x)
      ...
      RETURN
      END
x is declared as PRIVATE again. The rule prohibits this.
Edits:
None.
Question:
The syntax for most of the OpenMP constructs in Fortran enclose blocks of code, where a block is defined syntactically by the Fortran standard. The syntax diagrams for the DO and PARALLEL DO directives show do_loop's as part of the syntax diagram, but the Fortran standard doesn't define a non-terminal named do_loop.
Was it intended that the do_loop should be the block-do-construct of Fortran 90 and 95, or that it should be either the block-do-construct or the nonblock-do-construct? The difference is that a block-do-construct always ends with END DO or CONTINUE, and doesn't have a shared "termination" statement. For example,
      DO 100 I = 1, 10
         DO 100 J = 1, 10
            A(I, J) = 1
  100 CONTINUE
and
      DO 200 I = 1, 10
  200 B(I) = I
are nonblock-do-constructs.
Here's an example that seems awkward if nonblock-do-constructs are allowed:
      DO 100 I = 1, 10       <-------.
!$OMP DO                     <----.  |
      DO 100 J = 1, 10       <-.  |  |
      ...                      |  |  |
  100 CONTINUE               <-'  |<-'
!$OMP END DO NOWAIT          <----'
      END
The set of statements and directives in the I loop intersects the set of statements and directives in the DO work-sharing construct, but neither is a superset of the other.
Approved Response:
The OpenMP ARB feels that it is important that both block-do and non-block-do loops be permitted with PARALLEL DO and work-sharing DO directives. However, if a user specifies an ENDDO directive for a non-block-do construct with shared termination, then the matching DO directive must precede the outermost DO.
The following are some examples:
Valid Example 1:
      DO 100 I = 1,10
!$OMP DO
      DO 100 J = 1,10
      ...
  100 CONTINUE
Valid Example 2:
!$OMP DO
      DO 100 J = 1,10
      ...
  100 A(I) = I + 1
!$OMP ENDDO
Valid Example 3:
!$OMP DO
      DO 100 I = 1,10
      DO 100 J = 1,10
      ...
  100 CONTINUE
!$OMP ENDDO
Invalid Example 1:
      DO 100 I = 1,10
!$OMP DO
      DO 100 J = 1,10
      ...
  100 CONTINUE
!$OMP ENDDO
Edits:
In section 2.3.1, after the syntax diagram, add
"The do-loop may be a do-construct, an outer-shared-do-construct or an inner-shared-do-construct. A DO construct that contains several DO statements that share the same DO termination statement syntactically consists of a sequence of outer-shared-do-constructs, followed by a single inner-shared-do-construct. If an END DO directive follows such a DO construct, a DO directive can only be specified for the first (i.e., the outermost) outer-shared-do-construct."
In section 2.4.1, after the syntax diagram, add
"The do-loop may be a do-construct, an outer-shared-do-construct or an inner-shared-do-construct. A DO construct that contains several DO statements that share the same DO termination statement syntactically consists of a sequence of outer-shared-do-constructs, followed by a single inner-shared-do-construct. If an END PARALLEL DO directive follows such a DO construct, a PARALLEL DO directive can only be specified for the first (i.e., the outermost) outer-shared-do-construct."
Question:
I'm confused by the definition of FLUSH in v1.0: the descriptions of FLUSH seem to be inconsistent with the example.
The specification seems to suggest that after a FLUSH, a shared variable is updated in all the other threads, without the others performing any synchronization.
While in the following code, from example A.13 on page 48, the second flush suggests a thread has to perform a flush to obtain the up-to-date data. Is the second flush redundant?
!$OMP PARALLEL DEFAULT(PRIVATE) SHARED(ISYNC)
      CALL WORK()
      ISYNC(IAM) = 1
!$OMP FLUSH(ISYNC)
      DO WHILE (ISYNC(NEIGHBOR) .EQ. 0)
!$OMP    FLUSH(ISYNC)
      ENDDO
!$OMP END PARALLEL
Approved Response:
The second flush in the example is required. See interpretation 002 for more information.
Edits:
See interpretation 002.
Question:
I am interested in the restrictions on the ORDERED directive. Can an OpenMP Fortran program include two ORDERED sections in a DO, like the following?
!$OMP DO
      DO I = 1, N
         ...
!$OMP    ORDERED
         ...
!$OMP    END ORDERED
         ...
!$OMP    ORDERED
         ...
!$OMP    END ORDERED
         ...
      END DO
I want to verify my interpretation of the specification. Would you please let me know if this program is valid?
Approved Response:
It is possible to have multiple ORDERED sections within a DO specified with the ORDERED clause. However, the example above is invalid, because the API states the following:
An iteration of a loop with a DO directive must not execute the same ORDERED directive more than once, and it must not execute more than one ORDERED directive.
In your example, all iterations execute 2 ORDERED sections. The following is a valid example of a DO with more than one ORDERED section:
!$omp do ordered
do i = 1,n
   . . .
   if (i <= 10) then
      . . .
!$omp ordered
      write(4,*) i
!$omp end ordered
   endif
   . . .
   if (i > 10) then
      . . .
!$omp ordered
      write(3,*) i
!$omp end ordered
   endif
enddo
Edits:
None.
Question: I have a couple of questions regarding rules relating to iteration variables.
OpenMP states: "Sequential DO loop control variables in the lexical extent of a PARALLEL region that would otherwise be SHARED based on default rules are automatically made private on the PARALLEL directive"
Also, I believe we intended that parallel DO loop control variables should be default private for the work-sharing DO construct, although I can't find words to that effect. For example:
!$omp parallel
   . . .
!$omp do          ! "i" is private for the DO
do i = 1,100
   ...
enddo
!$omp end parallel
Can someone point me to the text that states this rule?
Finally, OpenMP states:
"Variables which are privatized in a parallel region cannot be privatized again on an enclosed work-sharing directive"
Assuming all of this is true, are the following examples OpenMP-conformant or invalid?
Example 1:
!$omp parallel          ! "i" is assumed private here as it is
   .                    ! used as a sequential loop control var
do i = 1,100
   ...
enddo
   .
!$omp do
do i = 1,100            ! Is this an implicit re-privatization?
   ...                  ! Is this code legal?
enddo
!$omp end parallel
Example 2:
!$omp parallel          ! "i" is assumed private here as it is
   .                    ! used as a sequential loop control var
do i = 1,100
   ...
enddo
   .
!$omp do private(i)     ! Is this a re-privatization?
do i = 1,100            ! Is this code legal?
   ...
enddo
!$omp end parallel
Example 3:
!$omp parallel private(i)
   .
   .
!$omp do
do i = 1,100            ! Is this O.K.? Is this considered a
   ...                  ! re-privatization?
enddo
!$omp end parallel
Approved Response:
The text which describes scope information for iteration variables of work-sharing DO loops is in section 2.3.1 on pg. 13:
"Parallel DO loop control variables are block-level entities within the DO loop."
Examples 1 and 3 above are legal. Iteration variables for work-sharing DO loops are considered to be different variables than those in the parallel region, because of the rule above.
Example 2 is an illegal program, because there is in fact a re-privatization on the DO directive. The OpenMP ARB believes that it is desirable for code such as this to be legal, and will likely relax the rules on re-privatization in Version 2 of the Fortran API.
Edits:
None.
Question:
We have come across a question concerning the MASTER directive. MASTER is mentioned in the subsection "synchronization constructs", however, the description says (Ver 1.0 - Oct 1997, page 18)
"The other threads in the team skip the enclosed section and CONTINUE EXECUTION. There is NO IMPLIED BARRIER either on entry or exit from master section."
We are now wondering whether in some sense the MASTER directive does a synchronization or not.
Approved Response:
MASTER is not a synchronization construct, and was incorrectly classified as such.
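Because MASTER implies no barrier, threads that need a value computed in the master section must wait at an explicit BARRIER. A minimal sketch with hypothetical names: the master thread sets `x`, the BARRIER (which implies a FLUSH) makes it visible, and every thread then counts whether it saw the value. Without the BARRIER, a thread could read `x` before the master has written it.

```fortran
module master_then_barrier
  implicit none
contains
  ! Reports the team size and how many threads saw the value written
  ! inside MASTER; after the BARRIER the two counts should agree.
  subroutine master_broadcast(nthreads, nseen)
    integer, intent(out) :: nthreads, nseen
    integer :: x, seen, team
    x = 0
    seen = 0
    team = 0
!$omp parallel shared(x) reduction(+: seen, team)
!$omp master
    x = 99
!$omp end master
!$omp barrier
    team = team + 1
    if (x == 99) seen = seen + 1
!$omp end parallel
    nthreads = team
    nseen = seen
  end subroutine master_broadcast
end module master_then_barrier
```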
Edits:
In the introduction for Section 2, change the description of section 2.5 to:
"Section 2.5, page 16, describes synchronization constructs and the MASTER directive."
Change the heading of Section 2.5 to:
"Synchronization Constructs and the MASTER directive"
Change the first sentence of Section 2.5 to:
"The following sections describe the synchronization constructs and the MASTER directive:"
Question:
I have a question about OpenMP: does it allow nested parallel sections?
I would like to understand the language better, and I need a demo from which to build some examples.
Approved Response:
OpenMP permits nested parallel regions. For example, the following is OpenMP-conformant:
!$omp parallel sections
!$omp section
!$omp parallel sections
   .
   .
   .
!$omp end parallel sections
   .
   .
   .
!$omp section
   .
   .
   .
!$omp end parallel sections
Note that an OpenMP-compliant implementation is permitted to serialize a nested parallel region.
OpenMP does not permit the nesting of work-sharing constructs (such as SECTIONS) within other work-sharing constructs that bind to the same parallel region.
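A minimal sketch (hypothetical names) of a nested parallel region whose result does not depend on whether the implementation serializes the inner region: each SECTION writes a distinct element, so there are no races at either level.

```fortran
module nested_demo
   implicit none
contains
   ! Hypothetical routine: the first outer SECTION contains a nested
   ! PARALLEL SECTIONS construct; v is the same whether or not the
   ! inner region runs on more than one thread.
   subroutine nested_fill(v)
      integer, intent(out) :: v(4)
!$omp parallel sections shared(v)
!$omp section
!$omp parallel sections shared(v)
!$omp section
      v(1) = 1
!$omp section
      v(2) = 2
!$omp end parallel sections
!$omp section
      v(3) = 3
      v(4) = 4
!$omp end parallel sections
   end subroutine nested_fill
end module nested_demo
```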
Edits:
None.
Question:
On page 20 of the OpenMP Fortran API is the statement:
"The following restriction applies to the ATOMIC directive
I am baffled by this:
I would be pleased if you could provide an explanation for this restriction and examples that violate it.
Approved Response:
The statement you refer to on page 20 ought to read:
"The following restriction applies to the use of ATOMIC directives:
All atomic references to the storage location of variable x throughout the program are required to have the same type and type parameters."
So "x" is a scalar variable, and the restriction applies to references to its storage. The restriction is required, as you point out, because of EQUIVALENCE and other mechanisms for storage association.
The following are some examples.
Invalid Example 1:
integer :: i
real :: r
equivalence(i,r)
!$omp parallel
   .
   .
   .
!$omp atomic
   i = i + 1
   .
   .
   .
!$omp atomic
   r = r + 1.0
!$omp end parallel
Invalid Example 2:
subroutine fred()
   common /blk/ i
   integer :: i
!$omp parallel
   .
   .
   .
!$omp atomic
   i = i + 1
   .
   .
   call sub()
!$omp end parallel
end subroutine fred

subroutine sub()
   common /blk/ r
   real :: r
   .
   .
   .
!$omp atomic
   r = r + 1.0
end subroutine sub
Invalid Example 3:
NB: Although the following example might work on some implementations, this is considered a non-conforming program.
integer :: i
real :: r
equivalence(i,r)
!$omp parallel
   .
   .
   .
!$omp atomic
   i = i + 1
!$omp end parallel
   .
   .
   .
!$omp parallel
   .
   .
   .
!$omp atomic
   r = r + 1
!$omp end parallel
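For contrast, here is a sketch of a conforming use (hypothetical names): every ATOMIC update refers to the storage of `count` through the same INTEGER variable, so all atomic references to that location have the same type and type parameters.

```fortran
module atomic_demo
   implicit none
contains
   ! Hypothetical routine: counts the even numbers in 1..n. The only atomic
   ! references to count's storage are through the INTEGER variable count.
   subroutine count_even(n, count)
      integer, intent(in) :: n
      integer, intent(out) :: count
      integer :: i
      count = 0
!$omp parallel do shared(count)
      do i = 1, n
         if (mod(i, 2) == 0) then
!$omp atomic
            count = count + 1
         end if
      end do
!$omp end parallel do
   end subroutine count_even
end module atomic_demo
```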
Edits:
On p. 20 of the Fortran API, change the restriction for ATOMIC to read:
"The following restriction applies to the use of ATOMIC directives: All atomic references to the storage location of variable x throughout the program are required to have the same type and type parameters."
Question:
In section 1.2, there are definitions of static extent and dynamic extent:
"The statements enclosed lexically within a construct define the static extent. The dynamic extent further includes the routines called from within the construct."
I understand this statement to mean that the dynamic extent includes the static extent. In section 2.5.6, page 22, line 1, the following statement appears:
"An ORDERED directive can appear only in dynamic extent of DO or PARALLEL DO directive."
If the dynamic extent includes the static extent, as I wrote above, then an ORDERED directive can also appear in the static extent, and this statement seems to have no meaning. Otherwise, does this statement mean that an ORDERED directive can appear only in the routines called from within the construct? If so, what is the reason for this restriction?
Approved Response:
You are correct in assuming that the dynamic extent of a construct includes the static extent as well. So, an ORDERED directive may appear in the static extent of a DO or PARALLEL DO, or within procedures called from such an extent.
In the statement: "An ORDERED directive can appear only in the dynamic extent of a DO or PARALLEL DO directive", the word "only" refers to directives "DO" and "PARALLEL DO", not to "dynamic extent" specifically.
In other words, the statement exists to restrict ORDERED sections to the dynamic extent of DO and PARALLEL DO loops. For example, the following is illegal, because the ORDERED section is not in the dynamic extent of any DO or PARALLEL DO:
!$omp parallel
   .
   .
   .
!$omp ordered
   .
   .
   .
!$omp end ordered
!$omp end parallel
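By contrast, the following sketch (hypothetical names) is conforming: the PARALLEL DO carries the ORDERED clause, which Version 1 requires when ORDERED directives occur in the dynamic extent of the loop, and the ORDERED section sits in a routine called from the loop, that is, in its dynamic but not its static extent. The squares are recorded in iteration order regardless of the number of threads.

```fortran
module ordered_demo
   implicit none
   integer, parameter :: nmax = 100
   integer :: trace(nmax)
   integer :: pos = 0
contains
   ! Called from inside the PARALLEL DO: this ORDERED section is in the
   ! dynamic extent of the loop, which the rule permits.
   subroutine record(val)
      integer, intent(in) :: val
!$omp ordered
      pos = pos + 1          ! one iteration at a time, in loop order
      trace(pos) = val
!$omp end ordered
   end subroutine record

   ! Hypothetical driver: squares are computed in parallel, recorded in order.
   subroutine ordered_squares(n)
      integer, intent(in) :: n
      integer :: i
      pos = 0
!$omp parallel do ordered
      do i = 1, min(n, nmax)
         call record(i * i)
      end do
!$omp end parallel do
   end subroutine ordered_squares
end module ordered_demo
```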
Edits:
None.
Question:
Last year we posted a question about the following example, and whether it was legal:
program p
   integer :: i, j, tmp, a(10, 10)
!$omp parallel do private(j)
   do i = 1, 10
!$omp parallel do private(tmp)
      do j = 1, 10
         tmp = i*j
         a(i,j) = tmp
      end do
   end do
end program p
The issue has to do with whether it is legal for an implementation, when serializing an inner parallel construct, to "ignore" the private clause for tmp, and whether the user should in fact ensure that tmp is private on the outer loop as well.
We were left with the impression that the user was responsible for ensuring "tmp" was made explicitly private on both loops. Is this correct?
Approved Response:
The OpenMP committee intended that PRIVATE clause semantics be such that an implementation would be permitted (but not required) to re-use the global storage for a PRIVATE variable on one of the threads in the team executing a region, for efficiency reasons. This situation comes up when a parallel region has been serialized. In the example above, when serializing the inner region, an implementation is permitted to re-use the storage for "tmp" from the outer region as the storage for "tmp" in the inner region. "tmp" is considered shared in the outer region. Given this, there are data races on "tmp" with respect to the outer region since it was not declared private on that region. So, in the example above, the user must mark "tmp" as private on the outer region.
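Following the response, a sketch of the corrected program (hypothetical module and routine names) declares `tmp` PRIVATE on both the outer and the inner PARALLEL DO, so the result is well defined whether or not the inner region is serialized:

```fortran
module nested_private_demo
   implicit none
contains
   subroutine fill_products(a)
      integer, intent(out) :: a(10, 10)
      integer :: i, j, tmp
      ! tmp is PRIVATE on the outer region as well: if the serialized inner
      ! region re-uses the outer storage for its private tmp, each outer
      ! thread still owns that storage, so there is no race.
!$omp parallel do private(j, tmp) shared(a)
      do i = 1, 10
!$omp parallel do private(tmp) shared(a, i)
         do j = 1, 10
            tmp = i * j
            a(i, j) = tmp
         end do
!$omp end parallel do
      end do
!$omp end parallel do
   end subroutine fill_products
end module nested_private_demo
```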
Edits:
Add the following sentences to the first paragraph of Section 2.6.2 "Data scope attribute clauses":
"Scope attribute clauses which appear on a PARALLEL directive indicate how the specified variables are to be treated with respect to the parallel region associated with the PARALLEL directive. They do not indicate the scope attributes of these variables for any enclosing parallel regions, if they exist."
"In determining the appropriate scope attribute for a variable used in the lexical extent of a parallel region, all references and definitions of the variable must be considered, including references and definitions which occur in any nested parallel regions."
Replace the first bullet in section 2.6.2.1 "PRIVATE clause" with:
"A new object of the same type is declared once for each thread in the team. One thread in the team is permitted, but not required, to re-use the existing storage as the storage for the new object. For all other threads, new storage is created for the new object."
See interpretation 026 for additional information about storage association of PRIVATE variables.
Question:
The OpenMP specification isn't clear about how statement functions should be treated with respect to data scope attribute clauses. For the following example, does the PRIVATE(J) clause affect the reference to J in IFUNC?
integer :: arr(10), j = 17
ifunc() = j
!$omp parallel do, private(j)
do i = 1, 10
   arr(i) = ifunc()
end do
print *, arr
end
Different implementation strategies for statement functions and PRIVATE could lead to different expectations about whether this is valid.
Namelist and variable format expressions pose similar issues.
Approved Response:
The Fortran committee has decided that the example above is invalid and has undefined results. This may be revisited in a future specification.
Edits:
In section 2.6.3, "Data environment rules", add the following bullet:
"Variables which appear in namelist statements, variable format expressions and in expressions for statement function definitions should not be specified in PRIVATE, FIRSTPRIVATE or LASTPRIVATE clauses."
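One way to stay within the new rule, sketched with hypothetical names and not taken from the specification: pass the value of `j` to the statement function through a dummy argument, so that no variable in the defining expression appears in a PRIVATE clause and each thread's private `j` is used explicitly.

```fortran
module stmt_func_demo
   implicit none
contains
   subroutine fill_arr(arr)
      integer, intent(out) :: arr(10)
      integer :: i, j, k
      integer :: ifunc
      ! The defining expression uses only the dummy k, so the PRIVATE(j)
      ! clause below does not touch any variable in it.
      ifunc(k) = k
!$omp parallel do private(j) shared(arr)
      do i = 1, 10
         j = i                 ! each thread defines its own private j
         arr(i) = ifunc(j)     ! j's value is passed in explicitly
      end do
!$omp end parallel do
   end subroutine fill_arr
end module stmt_func_demo
```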
Question:
When some members of the team encounter a work-sharing construct because of a branch (for example, an IF statement), is the execution of the enclosed code region divided only among the threads that encounter the construct, or among all threads of the team?
Approved Response:
In Section 2.3, the first bullet states:
"Work-sharing constructs and BARRIER directives must be encountered by all threads in a team, or by none at all". So, the situation described above is considered "invalid" or "non-conforming". The results of such a program are undefined.
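A conforming pattern is to make the branch condition the same for every thread, for example a SHARED variable that is not modified inside the region. The sketch below (hypothetical names) then guarantees that the work-sharing DO is encountered by all threads or by none:

```fortran
module branch_demo
   implicit none
contains
   subroutine scale_if(do_scale, a)
      logical, intent(in) :: do_scale
      real, intent(inout) :: a(:)
      integer :: i
!$omp parallel shared(a, do_scale)
      ! do_scale is shared and never modified in the region, so every thread
      ! takes the same branch: the DO is met by all threads or by none.
      if (do_scale) then
!$omp do
         do i = 1, size(a)
            a(i) = 2.0 * a(i)
         end do
!$omp end do
      end if
!$omp end parallel
   end subroutine scale_if
end module branch_demo
```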
Edits:
None.
Question:
Regarding point 8 of section 2.6.3 on page 30:
For example, is the following program legal?
common /c/ x,y
!$omp parallel private (/c/)
   ...
!$omp end parallel
   ...
!$omp parallel shared (x,y)
   ...
!$omp end parallel
common /c/ x,y
!$omp parallel private (/c/)
   ...
!$omp end parallel
   ...
!$omp parallel shared (/c/)
   ...
!$omp end parallel
Is this legal or not? It seems to me it should have the same legality as the case above.
Approved Response:
The restriction should read: "When a named common block is specified in a PRIVATE, FIRSTPRIVATE or LASTPRIVATE clause of a directive, none of its constituent elements may be declared in another scope attribute clause in that directive".
So the example above is valid, as is the following example:
common /c/ x,y
!$omp parallel
   ...
!$omp do private(/c/)
   ...
!$omp end do
!$omp do private(x)
   ...
!$omp end do
!$omp end parallel
Here are two invalid examples:
!$omp parallel private(/c/), shared(x)
   ...
!$omp end parallel

!$omp parallel private(/c/), shared(/c/)
   ...
!$omp end parallel
This should be covered by bullet 10, but isn't quite. Bullet 10 ought to read:
"Clauses can be repeated as needed, but each variable and each named common block can appear explicitly in only one clause per directive ..."
Edits:
Change the 1st sentence of point 8 of 2.6.3 on p30 to:
"When a named common block is specified in a PRIVATE, FIRSTPRIVATE or LASTPRIVATE clause of a directive, none of its constituent elements may be declared in another scope attribute clause in that directive".
Change the 1st sentence of point 10 of 2.6.3 on p30 to:
"Clauses can be repeated as needed, but each variable and each named common block can appear explicitly in only one clause per directive ..."
Question:
What is the rule about aliased or overlapping variables ?
For example, a F77 program has an equivalence as follows:
integer a(100), b(100)
equivalence (a(51), b(1))
!$OMP PARALLEL DO DEFAULT(PRIVATE) PRIVATE(i,j) &
!$OMP& LASTPRIVATE(a)
DO i=1,100
   DO j=1,100
      b(j) = j - 1
   ENDDO
   DO j=1,100
      a(j) = j
   ENDDO
   DO j=1,50
      b(j) = b(j) + 1
   ENDDO
ENDDO
!$OMP END PARALLEL DO
print *, b
end
For i in [1, 50], does b(i) equal 51 + i, or i?
Approved Response:
a and b are not associated inside the parallel region. The association only holds outside of the parallel region. The results of this program are undefined. See the edits below for additional information and examples.
Edits:
Replace bullet 4 in section 2.6.2.1 "PRIVATE clause" with the following:
"A variable declared as PRIVATE may be storage-associated with other variables when the PRIVATE clause is encountered. Storage association may exist because of constructs such as EQUIVALENCE, COMMON, etc. If a is a variable appearing in a PRIVATE clause and b is a variable which was storage-associated with a, then:
- The contents, allocation and association status of b are undefined on entry to the parallel construct.
- Any definition of a, or of its allocation or association status causes the contents, allocation and association status of b to become undefined.
- Any definition of b, or of its allocation or association status causes the contents, allocation and association status of a to become undefined."
Add the following invalid examples to the Appendix:
Example 1:
common /block/ x
x = 1.0
!$omp parallel private (x)
   x = 2.0
   call sub()
   ...
!$omp end parallel
...
subroutine sub()
   common /block/ x
   ...
   print *,x      ! "x" is undefined. The result of the
                  ! print is undefined.
   ...
end subroutine sub
Example 2:
common /block/ x
x = 1.0
!$omp parallel private (x)
   x = 2.0
   call sub()
   ...
!$omp end parallel
...
contains
   subroutine sub()
      common /block/ y
      ...
      print *,x   ! "x" is undefined.
      print *,y   ! "y" is undefined.
      ...
   end subroutine
end program
Example 3:
equivalence (x,y)
x = 1.0
!$omp parallel private(x)
   ...
   print *,y      ! "y" is undefined
   y = 10
   print *,x      ! "x" is undefined
!$omp end parallel
Example 4:
integer a(100), b(100)
equivalence (a(51), b(1))
!$omp parallel do default(private) private(i,j) lastprivate(a)
do i=1,100
   do j=1,100
      b(j) = j - 1
   enddo
   do j=1,100
      a(j) = j          ! "b" becomes undefined at this point
   enddo
   do j=1,50
      b(j) = b(j) + 1   ! reference to "b" is not defined. "a"
                        ! becomes undefined at this point.
   enddo
enddo
!$omp end parallel do   ! The LASTPRIVATE write for "a" has
                        ! undefined results.
print *, b              ! "b" is undefined since the LASTPRIVATE
                        ! write of "a" was not defined.
end
Question:
We would like your views on the following examples.
Example 1:
integer :: x
x = 0;
!$omp parallel
   .
   .
!$omp parallel reduction (+:x)
   x = x + 1
   .
   .
   .
!$omp end parallel
!$omp critical
   x = x + 2
!$omp end critical
!$omp end parallel
We assume this example is invalid for at least one reason: the statement "x = x + 1" is not protected with respect to the outer parallel region.
Now, suppose a user modified the example as follows (not that they *would* of course - this is just to illustrate a point):
Example 2:
integer :: x
x = 0;
!$omp parallel
   .
   .
!$omp parallel reduction (+:x)
!$omp critical
   x = x + 1
!$omp end critical
   .
   .
   .
!$omp end parallel
!$omp critical
   x = x + 2
!$omp end critical
!$omp end parallel
This code would *appear* to adhere to the letter of the specification, since "x = x + 1" is now protected for the outer parallel region. However, the "global reduction" that the compiler inserts at the end of the nested parallel region may not be protected by the same lock that is being used for the second critical section. We assume this example is invalid for this reason. Is this correct? If so, then we believe an implementation is allowed to use a distinct locking mechanism for each global reduction. Now, is the compiler *required* to use a different lock for each global reduction? Consider the following example. "s" is a procedure which may be invoked from both serial and parallel parts of the program:
Example 3:
subroutine s(x)
!$omp parallel reduction(+:x)
   x = x + 1
!$omp end parallel
end subroutine s

program main
   .
   .
!$omp parallel
!$omp critical
   call s(x)
!$omp end critical
   .
   .
   .
!$omp end parallel
end program
If the same lock is used for the "critical" and the "reduction", a deadlock could occur.
Approved Response:
Edits:
Add the following statement to Section 2.6.3, "Data environment rules":
"The shared variables that are specified in REDUCTION or LASTPRIVATE clauses become defined at the end of the construct. Any concurrent uses or definitions of those variables must be synchronized with the definition that occurs at the end of the construct to avoid race conditions."
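A sketch of the synchronization this rule asks for (hypothetical names): the implied barrier at END DO ensures that every thread sees the final value of `total`, which the REDUCTION defines at the end of the construct, before the SINGLE section reads it. Adding NOWAIT to the END DO would remove that guarantee.

```fortran
module reduction_demo
   implicit none
contains
   subroutine sum_and_scale(n, total, scaled)
      integer, intent(in) :: n
      integer, intent(out) :: total, scaled
      integer :: i
      total = 0
!$omp parallel shared(total, scaled, n)
!$omp do reduction(+:total)
      do i = 1, n
         total = total + i
      end do
!$omp end do
      ! The barrier implied by END DO orders this read after the definition
      ! of total at the end of the DO construct.
!$omp single
      scaled = 2 * total
!$omp end single
!$omp end parallel
   end subroutine sum_and_scale
end module reduction_demo
```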
Question:
Is it permissible to use a parameter (a Fortran named constant) in a SHARED clause? I tried the following directive with the parameter "ndim" using the SGI F90 compiler, version 7.2.1:
!$OMP PARALLEL DO &
!$OMP SHARED( ndim, a ) &
!$OMP PRIVATE( i, j )
DO j = 1, ndim
   DO i = 1, ndim
      a(i,j) = 0.73*real(i) + 0.15*real(j)
   END DO
END DO
and I get the following compilation error:
!$OMP SHARED( ndim, a ) &
              ^
f90-1473 mfef90: ERROR TRANSPOSE, File = transpose.f90, Line = 96, Column = 15
  Object NDIM must be a variable to be in the SHARED clause of the PARALLEL DO directive.
If parameters are not permitted in SHARED clauses, what is the reason for this restriction?
Approved Response:
A Fortran parameter is not a variable; it is more like a macro in C:
#define ndim 3
In OpenMP you can only declare the scope of something that's a variable, because variables have storage associated with them. Parameters don't necessarily have any storage associated with them.
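In practice the parameter is simply left out of the data scope attribute clauses. A sketch of the loop from the question rewritten that way (module and routine names are hypothetical, and `ndim = 4` is an assumed value):

```fortran
module param_demo
   implicit none
   ! ndim is a named constant, not a variable: it has no data scope
   ! attribute and is left out of the SHARED clause.
   integer, parameter :: ndim = 4
contains
   subroutine init_matrix(a)
      real, intent(out) :: a(ndim, ndim)
      integer :: i, j
!$omp parallel do shared(a) private(i)
      do j = 1, ndim
         do i = 1, ndim
            a(i,j) = 0.73*real(i) + 0.15*real(j)
         end do
      end do
!$omp end parallel do
   end subroutine init_matrix
end module param_demo
```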
Edits:
None.
Question:
There seems to be a need for clarification of the OpenMP rules about DEFAULT(NONE). In particular, is the specification in an enclosed data scope attribute clause sufficient to satisfy the requirement for explicit attribute specification (as described in section 2.6.2.3 of the Fortran API)?
An additional question is whether a loop index variable is implicitly PRIVATE when DEFAULT(NONE) is in effect. (Item 1 in section 2.6.3 covers only the case where the loop control variable would be SHARED by default, not the case where the default is NONE.)
For example, is the following program legal? (This one test case covers both of the sub-issues mentioned above.)
program test
   common /com1/ y, z(1000)
!$omp parallel default(none) shared(z)
   ! Question: does i need to be specified as private and/or
   ! y need to be specified as shared in the region?
!$omp do firstprivate(y)
   do i=1,10
      z(i) = y
   enddo
!$omp end parallel
end
The API states in 2.6.2.3 that:
"Specifying DEFAULT(NONE) declares that there is no implicit default as to whether variables are PRIVATE or SHARED. In this case, the PRIVATE, SHARED, FIRSTPRIVATE, LASTPRIVATE, or REDUCTION attribute of each variable used in the lexical extent of the parallel region must be specified."
The trouble is, the API doesn't seem to say where it must be specified. On the PARALLEL directive, or is it enough to specify it on any enclosed work-sharing construct(s) that encompass all uses? Different implementations are interpreting this rule differently, resulting in user confusion. So there seems to be a need for an official interpretation.
Note that the C/C++ API cleaned this up in section 2.7.2.5 of that API by saying that variables could be "specified in an enclosed data scope attribute clause, or used as a loop control variable referenced only in a corresponding for or parallel for."
I believe that this is a case where the Fortran API needs to catch up with the C/C++ API. I would like to see this done as an interpretation (rather than wait for the next revision) so as to clear up the confusion as quickly as possible.
Approved Response:
It is enough that the data scope attribute be specified for the variable on work-sharing constructs which encompass all uses of the variable. See the edits below.
Edits:
On pg. 24, Section 2.6.2.3 DEFAULT Clause, replace the bullet describing DEFAULT(NONE) with the following bullet:
"Specifying DEFAULT(NONE) requires that each variable used in the lexical extent of the parallel region be explicitly listed in a data scope attribute clause on the parallel region, unless it is:
- THREADPRIVATE, or,
- a Cray pointee, or
- a loop iteration variable used only as a loop iteration variable for sequential loops in the lexical extent of the region or parallel DO loops which bind to the region, or,
- only used in work-sharing constructs that bind to the region, and is specified in a data scope attribute clause for each such construct."
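A portable way to avoid the question altogether is to list every variable on the PARALLEL directive itself, which satisfies DEFAULT(NONE) under either reading. The sketch below assumes module variables in place of the COMMON block from the question, uses hypothetical names, and drops FIRSTPRIVATE(y) because `y` is only read; the iteration variable `i` of the work-sharing DO needs no clause.

```fortran
module default_none_demo
   implicit none
   real :: y
   real :: z(1000)
contains
   subroutine copy_y(n)
      integer, intent(in) :: n
      integer :: i
      ! Every variable referenced in the region is scoped on the PARALLEL
      ! directive; i, the work-sharing DO iteration variable, is private.
!$omp parallel default(none) shared(y, z, n)
!$omp do
      do i = 1, n
         z(i) = y
      end do
!$omp end do
!$omp end parallel
   end subroutine copy_y
end module default_none_demo
```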
Question:
There is an issue we need to deal with in the OpenMP directives regarding F90 style (dope vector based) arrays which are declared shared which are passed as actual arguments to procedures from the parallel region declaring them shared. The problem involves F90 pointers, assumed shape arrays, array sections of a shared array, and possibly other cases which may occur in some implementations.
We have a common problem here: to maintain calling-sequence compatibility with F77 compilers, we perform copy in/out at call sites when F90 routines call routines whose interfaces are not known to the caller. Similarly, copy in/out may be performed where interfaces are known and the array types mentioned above are associated with explicit-shape dummy arguments (contiguous space). If an array is shared but passed as an argument to a routine that modifies it, copy in/out semantics work poorly when more than one thread updates the same array at the same time: each thread modifies its own copy, and no thread sees the others' updates.
I believe there are a number of ways to deal with this issue. We could:
Approved Response:
Restrictions on the uses of such actual arguments will be introduced. See the edits below for additional information.
Edits:
Add the following bullet to section 2.6.3, "Data environment rules":
"If a SHARED variable, subobject of a SHARED variable, or an object associated with a SHARED variable or subobject of a SHARED variable appears as an actual argument in a reference to a non-intrinsic procedure, and the actual argument is an array section with a vector subscript, or the actual argument is an array section, an assumed-shape array or a pointer array, and the associated dummy argument is an explicit-shape or assumed-size array, any references to or definitions of the shared storage that is associated with the dummy argument by any other thread must be synchronized with the procedure reference, to avoid possible race conditions. The situations described above may result in the value of the shared variable being copied into temporary storage before the procedure reference, and back out of the temporary storage into the actual argument storage after the procedure reference, effectively resulting in references to and definitions of the storage during the procedure reference."
Question:
For an F90 code like this:
A(:,:) = B(:,:) + C(:,:)
To use OpenMP on this, do I need to use explicit indexing as in F77, i.e. convert the code back to F77:
Do j = 1, M
   Do i = 1, N
      A(i,j) = B(i,j) + C(i,j)
   enddo
enddo
and insert !$omp parallel do on this? Or are there other ways to handle F90 array operations in OpenMP without going back to F77 syntax?
Approved Response:
You are correct: to use OpenMP on F90 array syntax, you are responsible for writing out the loop that you want parallelized.
Some OpenMP compilers may offer the option of automatically parallelizing such loops as a service to the user, but the OpenMP standard does not require this parallelization.
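A sketch (hypothetical names) of the explicit-loop approach for `A(:,:) = B(:,:) + C(:,:)`: only the loop to be parallelized has to be written out, so the inner dimension can stay in array syntax, one column per iteration.

```fortran
module array_add_demo
   implicit none
contains
   ! Explicit outer loop over columns; each iteration still uses F90 array
   ! syntax for the whole column.
   subroutine add_arrays(a, b, c)
      real, intent(out) :: a(:,:)
      real, intent(in) :: b(:,:), c(:,:)
      integer :: j
!$omp parallel do shared(a, b, c)
      do j = 1, size(a, 2)
         a(:, j) = b(:, j) + c(:, j)
      end do
!$omp end parallel do
   end subroutine add_arrays
end module array_add_demo
```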
Edits:
None.
Question:
I have several questions about the interpretation of OpenMP features:
Approved Response:
The set of OMP routines named OMP_*_LOCK can be used to explicitly manipulate locks, and thereby to create and control nested critical regions.
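A minimal sketch of that idea (hypothetical names): an inner lock is acquired while an outer lock is held, giving a nested critical region. The lock declarations and calls use the `!$` conditional compilation sentinel, so the module also compiles, and runs serially, without OpenMP. The example only illustrates the mechanics; here the outer lock alone would already serialize both updates.

```fortran
module lock_demo
!$ use omp_lib
   implicit none
contains
   subroutine count_pairs(n, outer_count, inner_count)
      integer, intent(in) :: n
      integer, intent(out) :: outer_count, inner_count
      integer :: i
!$    integer(kind=omp_lock_kind) :: outer_lock, inner_lock
      outer_count = 0
      inner_count = 0
!$    call omp_init_lock(outer_lock)
!$    call omp_init_lock(inner_lock)
!$omp parallel do shared(outer_count, inner_count)
      do i = 1, n
!$       call omp_set_lock(outer_lock)      ! enter outer critical region
         outer_count = outer_count + 1
!$       call omp_set_lock(inner_lock)      ! enter nested critical region
         inner_count = inner_count + 2
!$       call omp_unset_lock(inner_lock)
!$       call omp_unset_lock(outer_lock)
      end do
!$omp end parallel do
!$    call omp_destroy_lock(inner_lock)
!$    call omp_destroy_lock(outer_lock)
   end subroutine count_pairs
end module lock_demo
```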
Edits:
See interpretation 003 for edits to the specification.
This is a placeholder for an interpretation currently under review by the OpenMP ARB.
Question:
Section 2.1.2 (conditional compilation) states:
"The sentinel must be followed by a legal Fortran statement on the same line."
Does "a Fortran statement" mean "one Fortran statement"?
Is the following invalid?
!$ id = omp_get_thread_num() ; print *, id
Approved Response:
The Fortran committee intended multiple statements to be permitted on a conditional compilation line. Also, although the text refers to statements, the committee also intended the following to be permitted as conditional:
The specification will be changed to reflect this.
Edits:
Section 2.1.2:
1st paragraph: Change the first two sentences to:
"The OpenMP Fortran API permits Fortran lines to be compiled conditionally. The directive sentinels for conditional compilation that are accepted by an OpenMP-compliant compiler depend on the Fortran source form being used."
2nd paragraph: Remove the 1st sentence:
"The sentinel must be followed by a legal Fortran statement on the same line."
2nd paragraph: Change 2nd sentence to:
"During OpenMP compilation, the sentinel is replaced by two spaces, and the rest of the line is treated as a normal Fortran line."
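A sketch (hypothetical names) of a conditional compilation line carrying two statements, as this interpretation permits. Without OpenMP the `!$` lines are ordinary comments, so the routine reports a team of one thread and no OpenMP support.

```fortran
module condcomp_demo
!$ use omp_lib
   implicit none
contains
   subroutine team_info(nthreads, with_openmp)
      integer, intent(out) :: nthreads
      logical, intent(out) :: with_openmp
      integer :: nt
      logical :: flag
      nt = 1
      flag = .false.
!$omp parallel shared(nt, flag)
!$omp master
      ! Two statements on one conditionally compiled line; without OpenMP
      ! the whole line is a comment.
!$    nt = omp_get_num_threads() ; flag = .true.
!$omp end master
!$omp end parallel
      nthreads = nt
      with_openmp = flag
   end subroutine team_info
end module condcomp_demo
```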
Question:
Are Fortran 90 intrinsics guaranteed to run in parallel within a parallel region? What about computation expressed in array notation: would that also be parallel?
For example:
c$omp parallel
      C = matmul(A,B) + C
      d = dot_product( e(:), f(:) )
      aijmax = maxval( A, dim=1)
c$omp end parallel
I hope we don't need to rewrite code in explicit do-loops to get parallelism when the compiler already knows a lot in the array notation or the f90 intrinsic.
Approved Response:
The OpenMP Fortran specification, Version 1, does not require that intrinsics or array language be automatically parallelized, nor does it provide facilities for specifying such parallelism.
The focus of Version 1 was on the existing practice of parallelizing loops. Parallelizing Fortran 90 intrinsics and array notation is one of the recommendations for inclusion in version 2 of OpenMP Fortran.
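Under Version 1, the portable approach is an explicit loop. A sketch (hypothetical names) of the `dot_product` statement from the question written as a PARALLEL DO with a REDUCTION clause:

```fortran
module intrinsic_demo
   implicit none
contains
   ! Hand-parallelized equivalent of dot_product(e, f).
   real function par_dot(e, f) result(d)
      real, intent(in) :: e(:), f(:)
      integer :: i
      real :: acc
      acc = 0.0
!$omp parallel do reduction(+:acc) shared(e, f)
      do i = 1, size(e)
         acc = acc + e(i) * f(i)
      end do
!$omp end parallel do
      d = acc
   end function par_dot
end module intrinsic_demo
```

Because the partial sums may be combined in a different order from the serial loop, floating-point results can differ slightly from `dot_product` in general.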
Edits:
None.