OpenMP Fortran Interpretations Version 1.0

Interpretations Submitted Prior to Feb. 1999

001: I/O, IF, SCHEDULE, LASTPRIVATE, nesting, runtime routines

  • Date: Nov. 10, 1997
  • Status: Approved

    Question:

    The following questions were extracted from mail received:

    1. 1.3 Compliance. Last paragraph, on I/O.

      This is both hard to implement efficiently under many systems and too vague to be useful. Interleaved at what level - records, bytes or bits? And what about open/close, positioning, I/O error handling and so on? It is a very hard problem indeed, and needs MUCH more attention than OpenMP has given it - better to say nothing at all than include that paragraph.

    2. 2.2 Parallel Region Construct. IF clause.

      This doesn't mention what happens if the expression calls a function which directly or indirectly contains parallel constructs. It should explicitly allow it, make it implementation dependent, or forbid it (make it undefined). I suggest the last.

    3. 2.2 Parallel Region Construct. IF clause.

      It says that the region is executed in parallel only if the expression evaluates to .TRUE., but doesn't make it clear whether the construct is then available for binding purposes (2.7). This needs a clear statement one way or the other, as it considerably affects the semantics of how an IF clause can be used.

    4. 2.3.1 DO Directive. SCHEDULE(GUIDED).

      I am completely confounded by this. What value does chunk decrease FROM? Also, why is this bizarre special case useful in the first place? Why can't I also have exponentially increasing chunks, or chunks varying according to some other rule (e.g. quadratic)?

    5. 2.6.2.5 LASTPRIVATE clause.

      What does "sequentially last" mean? Lexically last or chronologically last? This needs clarifying.

    6. 2.8 Directive Nesting.

      This does not forbid MASTER clauses within CRITICAL ones, which I think is an oversight. Certainly, it can create deadlock in otherwise correct programs, and I can see no good reason for allowing it.

    7. 3.1.6 OMP_IN_PARALLEL.

      The first paragraph says that a serialized parallel region is not considered to be parallel, and the last paragraph says the opposite! The latter is clearly a better specification, as otherwise there are two states that cannot be distinguished.

    8. 3.2 Lock routines.

      Why on earth should there be ANY integer type large enough to contain an address? Many hardware architectures and operating systems have pointers that are significantly larger than any integer. If OpenMP is going to require major language extensions to standard systems, it should make the fact much clearer.

    Approved Response:

    1. OpenMP does not require a vendor to provide a fully thread-safe I/O library. Depending on the implementation, data may be interleaved within records, or records may be interleaved within the file. It is the user's responsibility to ensure that any I/O operations (reads, writes, opens, etc.) to the same unit are serialized. The specification will be modified to make this clear.

    2. The IF expression on a parallel region is permitted to call a procedure which contains parallel constructs. The IF expression is evaluated outside the context of the parallel region. In other words,
      !$omp parallel if (logexpr())
         . . .
      !$omp end parallel

      is equivalent to:

      logical_tmp = logexpr()
      !$omp parallel if (logical_tmp)
         . . .
      !$omp end parallel

      Similarly, the CHUNK expression on a SCHEDULE clause is evaluated outside the context of the DO construct.

    3. A parallel region is available for binding even if it is serialized. The specification will be clarified.
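
      For illustration only (a sketch, not part of the approved response; n, a, b and c are hypothetical), the DO directive below still binds to the PARALLEL region even when the IF expression evaluates to .FALSE. and the region is serialized:

      !$omp parallel if (n > 1000)
      !$omp do
         do i = 1, n
            a(i) = b(i) + c(i)
         end do
      !$omp end do
      !$omp end parallel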

    4. The initial value of chunk is implementation dependent. GUIDED scheduling provides dynamic load balancing while reducing overhead. See the edits below, and interpretation 006 for additional information about GUIDED scheduling.

    5. By the sequentially last iteration, we mean the iteration which would be executed last if the loop were run sequentially.
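
      For example (an illustrative sketch, not part of the approved response), in the loop below the sequentially last iteration is i = 100, so after the construct x holds the value assigned in that iteration, regardless of which thread executed it:

      !$omp do lastprivate(x)
         do i = 1, 100
            x = 2 * i
         end do
      !$omp end do
         ! x equals 200 here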

    6. MASTER sections within CRITICAL sections should not cause deadlock, since there is no barrier on entry or exit from the MASTER section, and threads other than the MASTER thread skip the section.
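
      As an illustrative sketch of why no deadlock arises (not part of the approved response):

      !$omp parallel
      !$omp critical
      !$omp master
         ! executed only by the master thread
      !$omp end master
         ! other threads arrive here immediately: they skip the MASTER
         ! section without waiting at any barrier
      !$omp end critical
      !$omp end parallel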

    7. The statements in this section mean the following: if the function is in the dynamic extent of some region executing in parallel, it will return true. It is true that a serialized region is not considered to be a region executing in parallel. If the function is in the dynamic extent of a single serialized region, the function will return false. It will also return false if the function is in the dynamic extent of a set of nested regions in which each region is serialized. The OMP_IN_PARALLEL function does not bind to the closest enclosing region; the statement about its having global scope is intended to make this clear.
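
      The following sketch (illustrative only, not part of the approved response) summarizes this behaviour:

      logical omp_in_parallel
      external omp_in_parallel
      print *, omp_in_parallel()      ! .FALSE.: no enclosing parallel region
      !$omp parallel if (.false.)
      print *, omp_in_parallel()      ! .FALSE.: the only enclosing region is serialized
      !$omp end parallel
      !$omp parallel
      print *, omp_in_parallel()      ! .TRUE.: an enclosing region is executing in parallel
      !$omp end parallel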

    8. In retrospect, we agree that lock variables should not have been defined in terms of a type whose size is that of an address. It is our intention to fix this in Version 2 of the specification.

    Edits:

    1. Add the following paragraph to the end of section 2.2:
      "Unsynchronized use of Fortran I/O statements by multiple threads on the same unit has undefined behaviour."
    2. Add the following sentences to the second-last paragraph of section 2.2:
      "The IF expression is evaluated outside the context of the parallel region. Results are undefined if the IF expression contains a function reference which has side effects."

      Add the following sentences to section 2.3.1:

      "The CHUNK expression is evaluated outside the context of the DO construct. Results are undefined if the CHUNK expression contains a function reference which has side effects."
    3. Add the following as the first bullet to section 2.7, "Binding":
      "A parallel region is available for binding purposes, whether it is serialized or executed in parallel."
    4. Add the following description of GUIDED scheduling to an appendix in the Fortran specification:
      "Guided scheduling is appropriate for the case in which the threads may arrive at varying times at a DO work-sharing construct, with each iteration requiring about the same amount of work. This can happen if, for example, the DO loop is preceded by one or more work-sharing SECTIONS or DO constructs with NOWAIT clauses. Like dynamic, the guided schedule guarantees that no thread waits at the barrier longer than it takes another thread to execute its final iteration, or final k iterations if a chunk size of k is specified. Among such schedules, the guided schedule is characterized by the property that it requires the fewest synchronizations."

    002: FLUSH

  • Date: Nov. 12, 1997
  • Status: Approved.

    Question:

    From the example (A.13), I assume FLUSH synchronizes the local thread with (global) memory. From a programmer's point of view, it is therefore a purely local operation. This is not at all clear from the description of FLUSH (2.5.5), particularly since all the other directives in section 2.5 are concerned with synchronizing threads (and not necessarily memory, unless FLUSH is implied).

    The example also seems to imply that a local FLUSH is required for memory reads as well as writes:

          ISYNC(IAM) = 1
    C$OMP FLUSH(ISYNC)
          DO WHILE (ISYNC(NEIGH).EQ.0)
    C$OMP FLUSH(ISYNC)
          ENDDO

    Thus the FLUSH on thread NEIGH is presumably not sufficient for ISYNC(NEIGH) to be visible on thread IAM; a FLUSH on IAM after the FLUSH on NEIGH is also required.

    This is not at all clear from section 2.5.5. Presumably, "Subsequent reads of thread-visible variables fetch the latest copy of the data" means "The first subsequent read of each thread-visible variable ON THE THREAD ISSUING THE FLUSH will fetch the latest copy of the data". In particular a SECOND read of a thread-visible variable may or may not fetch the latest copy.

    This all seems to make sense from a compiler optimization point of view, but it needs to be made clearer from a programmer's point of view.

    Approved Response:

    Yes, the thread which issues the flush will fetch the latest copy of the thread-visible variable. And a flush is required for reads as well as writes. The specification will be modified to clarify both points.

    Edits:

    In section 2.5.5, "FLUSH directive", replace the 2 paragraphs beginning with "The FLUSH directive identifies..." and "Implementations must ensure..." with the following text:

    "The FLUSH directive, whether explicit or implied, identifies a cross-thread sequence point at which the implementation is required to ensure that each thread in the team has a consistent view of certain variables in memory.

    A consistent view requires that all memory operations (both reads and writes) that occur before the FLUSH directive in the program be performed before the sequence point in the executing thread; similarly, all memory operations that occur after the FLUSH must be performed after the sequence point in the executing thread.

    Implementations must ensure that modifications made to thread-visible variables within the executing thread are made visible to all other threads at the sequence point. For example, compilers must restore values from registers to memory, and hardware may need to flush write buffers. Furthermore, implementations must assume that thread-visible variables may have been updated by other threads at the sequence point, and must be retrieved from memory before their first use past the sequence point.

    Finally, the FLUSH directive only provides consistency between operations within the executing thread and global memory. To achieve a globally consistent view across all threads, each thread must execute a FLUSH operation."


    003: Local variables with the SAVE attribute and SHARED/PRIVATE

  • Date: Nov. 18, 1997
  • Status: Approved

    Question:

    In section 2.6.3 of the Fortran API specification, item 7 defines default (actually mandatory) attributes (SHARED/PRIVATE) for variables in called subroutines within the dynamic extent of a parallel region. It specifies SHARED for common/module variables and PRIVATE for local non-SAVEd variables, but it leaves the attributes of SAVEd local variables undefined. This is going to create major portability problems. I would prefer PRIVATE SAVEd variables, but I assume SHARED is easier to implement - so I suggest the latter be the default. If there is some VERY good reason for leaving SAVEd variables undefined, then item 7 should explicitly say they are implementation dependent and recommend that they therefore be avoided in portable programs. Note that this will likely make the vast majority of OpenMP programs with dynamic scope non-portable. There is simply no way that a program is going to work right with variables defined SHARED on one machine and PRIVATE on another. In fact, I think it likely that purchasers of OpenMP compilers (and the machines they run on) would then REQUIRE a particular behaviour for SAVEd variables, leaving the compilers/machines that chose the "wrong" attribute out in the cold.

    Also, 2.6.2.3 needs to explain more clearly that it does not apply to called subroutines. I suggest adding the following after the 2nd sentence: "Local variables, modules, and common blocks in subroutines called within the parallel region are also not affected; see section 2.6.3 for data scope rules."

    Approved Response:

    Local variables with the SAVE attribute declared in procedures called from a parallel region are implicitly SHARED. The specification will be clarified.

    The first sentence of section 2.6.2.3 states:

    "The DEFAULT clause allows the user to specify a PRIVATE, SHARED, or NONE scope attribute for all variables in the lexical extent of any parallel region."
    The lexical extent does not include procedures called from a parallel region.

    Edits:

    To section 2.6.3, bullet 7, add the following statement:

    "Local variables in called routines that have the SAVE attribute are SHARED."

    004: REDUCTION

  • Date: Nov. 28, 1997
  • Status: Approved

    Question:

    Does the REDUCTION clause in OpenMP guarantee the same floating point result with different numbers of processors?

    Approved Response:

    Bit-wise identical results cannot be guaranteed for floating-point reductions, even if the same number of threads is used. The final result is dependent on the order in which partial results are globally accumulated and the order of accumulation may vary from one run to the next.

    Edits:

    In section 2.6.2.6, pg. 30, add the following sentence to the end of the second paragraph:

    "Since the intermediate values of the REDUCTION variables may be combined in random order, there is no guarantee that bit-identical results will be obtained for floating point reductions from one parallel run to another."

    005: Module data

  • Date: Nov. 28, 1997
  • Status: Approved

    Question:

    I am concerned with the following statement in OpenMP:

    2.6.3 Data environment rules

    ...

    7. ... Common blocks and modules in called routines in the dynamic extent of a parallel region always have an implicit SHARED attribute, unless they are THREADPRIVATE common blocks.

    It seems to me that any variables defined in the data segments of Fortran 90 MODULEs would be limited to being SHARED.

    If that is the case, would the following generally valid Fortran 90 MODULE be wrong under OpenMP?

    module snglthrd
       private                    ! not the same as OpenMP "PRIVATE"!
       public :: settmp, usetmp
       real, save :: tmp
    contains
       subroutine settmp(a)
          tmp = a
       end subroutine settmp
       subroutine usetmp()
          print *, tmp
       end subroutine usetmp
    end module snglthrd

    program multithrd
       use snglthrd
       real a(8)
       do i = 1, 8
          a(i) = i
       end do
    !$omp parallel do shared(a) private(i)
       do i = 1, 8
          call settmp(a(i))       ! a race condition?
          call usetmp()
       end do
    !$omp end parallel do
    end program multithrd

    Approved Response:

    It is true that in the current specification, module data is restricted to being shared for a parallel region, and in your example, "settmp" does create a race condition. Many implementations of Fortran 90 treat module data as common block data, mapping the data declared in a module to a unique common block. The same race condition can occur if the variable tmp is placed in a common block.

    The OpenMP committee intends to address issues pertaining to Fortran 90 and Fortran 95 in Version 2 of the Fortran specification. Facilities for specifying how module data should be treated will be considered for incorporation into Version 2.

    Edits:

    None.


    006: GUIDED

  • Date: Jan. 29, 1998
  • Status: Approved

    Question:

    At the bottom of page 12 of the OpenMP Fortran API, version 1-Oct '97, the GUIDED suboption of SCHEDULE is explained. The two sentences seem to conflict with one another in describing the chunk parameter.

    I see two interpretations:

    1. The chunk parameter is the initial value of a variable. chunk iterations are dispatched to the first entering thread; chunk is then multiplied by some factor between 0 and 1, and chunk iterations are dispatched to the next thread. chunk is again multiplied by the same factor, and so on, until all iterations are exhausted or chunk becomes 1, at which time it looks like SCHEDULE(dynamic,1).
    2. The chunk parameter is a constant. Some initial value is used for the chunk variable and this value multiplied by the factor after each set of chunk iterations until it becomes less than the chunk parameter. At this time the behavior is equivalent to SCHEDULE(dynamic,chunk_parameter).

    Are either of these correct, or does the GUIDED suboption work some other way?

    Approved Response:

    The second interpretation is what was intended.

    Edits:

    The description of GUIDED on pg. 12 of the OpenMP Fortran API, Ver 1.0, should read:

    "When SCHEDULE(GUIDED,chunk) is specified, the iteration space is divided into pieces such that the size of each successive piece is exponentially decreasing. chunk specifies the size of the smallest piece, except possibly the last. The size of the initial piece is implementation dependent. As each thread finishes a piece of the iteration space, it dynamically obtains the next available piece. When no chunk is specified, it defaults to 1."

    007: FIRSTPRIVATE and LASTPRIVATE Variables

  • Date: Jan. 29, 1998
  • Status: Approved

    Question:

    If a variable is declared as both FIRSTPRIVATE and LASTPRIVATE, is an OpenMP implementation required to ensure that the copy for FIRSTPRIVATE will happen before the shared variable write for LASTPRIVATE?

    Approved Response:

    Yes, an OpenMP implementation is required to ensure that the copy will happen before the write.

    The ARB has also discussed the issue of a variable appearing in a chunk expression for the SCHEDULE clause as well as in a LASTPRIVATE clause. In this case, it will be the user's responsibility to ensure that the chunk parameter is the same for all threads. The OpenMP API will be amended to include this restriction.

    Finally, in the case of LASTPRIVATE variables being used in a work-sharing DO which specifies NOWAIT, it is the user's responsibility to ensure that the values of such variables be used only after a barrier. A cautionary statement will be added to the specification.

    Edits:

    Add the following rule to the bulleted list on pg. 13, Section 2.3.1:

    "The value of the chunk parameter must be the same for all of the threads in the team."

    Add the following rule to section 2.6.3 "Data environment rules":

    "Variables that are specified as LASTPRIVATE for a work-sharing directive for which NOWAIT appears, must not be used prior to a barrier."

    Add the following to the end of section 2.6.3, "Data environment rules":

    "An implementation that conforms to the OpenMP Fortran API must adhere to the following rules:
    • If a variable is specified as FIRSTPRIVATE and LASTPRIVATE, the implementation must ensure that the update required for LASTPRIVATE occurs after all initializations for FIRSTPRIVATE."
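
    For illustration (a sketch, not part of the approved edits), every private copy of x below is initialized to 5 before any iteration runs, and the guaranteed ordering means the LASTPRIVATE write-back copies out a fully initialized value:

    x = 5
    !$omp parallel do firstprivate(x) lastprivate(x)
    do i = 1, n
       x = x + i     ! each private copy starts at 5
    end do
    ! x now holds the value of the copy belonging to the thread that
    ! executed the sequentially last iteration, i = n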

    008: PRIVATE and COPYIN clauses

  • Date: Unknown
  • Status: Approved

    Question:

    In a few directives (notably "PARALLEL" and "DO"), it is possible to have a privatization clause which privatizes a variable, and also to use the value of that variable in the directive itself, for instance:

    Example 1a.

    logical logvar
    ...
    !$omp parallel if (logvar) private(logvar)

    Example 1b.

    !$omp do private(i) schedule(static, i)

    There should be a rule stating whether this is legal, and if it is, stating whether the variable is evaluated in the shared scope or in the private scope (in which case the user would have to use FIRSTPRIVATE). I think the answer is either (1) this usage is undefined, or (2) the variable is evaluated in the enclosing scope when its value is needed; but it would be nice if this were explicitly stated.

    The harder problem has to do with LASTPRIVATE. LASTPRIVATE is a write to shared storage, and it is a particularly interesting one because it isn't in the source code where the user can write synchronization code for it. The OpenMP document doesn't seem to promise anything about this case.

    Also, the specification needs to state whether it is the responsibility of the implementation or the user to ensure that all COPYIN assignments are completed before the master's data is modified.

    Approved Response:

    Expressions for clauses on PARALLEL and DO directives are evaluated in the enclosing scope of the directive. See interpretation 001 as well.

    See interpretation 007 for information about LASTPRIVATE variables.

    It is the responsibility of the implementation to ensure that the value of each threadprivate copy is the same as the value of the master thread copy when the master thread reached the directive containing the COPYIN clause.

    Edits:

    See edits from interpretations 001 and 007.

    Add the following sentence to section 2.6.2.7 "COPYIN":

    "An OpenMP-compliant implementation is required to ensure that the value of each threadprivate copy is the same as the value of the master thread copy when the master thread reached the directive containing the COPYIN clause."


    009: Lock routines

  • Date: Unknown
  • Status: Approved

    Question:

    I had a scan through the OpenMP Fortran API doc v 1.0 and have a couple of comments. I write these with an interest in portability.

    1. p38, sec 3.2

      "For all these routines, var should be of type integer and of a KIND large enough to hold an address."

      Why is this? Is it just to save you the overhead of a lookup?

      "For example, on 64-bit addressable systems, the var may be declared as INTEGER(KIND=8)"

      Why should it? There is no connection between KIND=8 and 64 bits. You could use SELECTED_INT_KIND().

    2. p49, A.15

      "In the following example, note that the argument to the lock routines should be of size POINTER".

      There is no "standard" fortran meaning to this. I don't think you should be implementation specific in the examples (same applies later in the stubs).

    Approved Response:

    1. Lock variables were defined as addresses for convenience of implementation. In retrospect, we feel that this wasn't the best way to define lock variables, since there is no concept of "address" in Fortran. We will attempt to clean this up in Version 2 of the OpenMP specification.

      You are correct that the use of INTEGER(KIND=8) is incorrect. The specification will be changed to use SELECTED_INT_KIND.

    2. Again, we agree that there is no standard meaning to "of size POINTER" in Fortran. This will be addressed in Version 2.

    Edits:

    In section 3.2 on pg. 36, change

    INTEGER(KIND=8)

    to

    INTEGER(SELECTED_INT_KIND(18))
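
    For illustration, a sketch (not part of the approved edit) of the corrected declaration in use with the Version 1.0 lock routines:

    integer(selected_int_kind(18)) :: lck
    call omp_init_lock(lck)
    !$omp parallel
    call omp_set_lock(lck)
    ! one thread at a time executes here
    call omp_unset_lock(lck)
    !$omp end parallel
    call omp_destroy_lock(lck)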

    010: FLUSH

  • Date: Feb. 7, 1998
  • Status: Approved.

    Question:

    It is clear that at the FLUSH, all previous writes of the processor should be completed at memory. The semantics with respect to reads, however, are unclear. It appears to me that a FLUSH should also ensure the following (or some variation thereof): values of thread-visible variables that were read into registers before the FLUSH must be reloaded from memory at their first use after the FLUSH.

    Requirements such as the above are needed to ensure correct execution of a program that may do producer-consumer synchronization with ordinary variables such as the following:

    P1                          P2
    A = 1
    B = 1
    FLUSH
    Flag = 1
                                while (Flag == 0)
                                   FLUSH
                                tmp = A
                                tmp = B

    Approved Response:

    Your assumptions about reads are correct. The specification needs to be clarified with respect to the semantics of FLUSH as it applies to reads.

    Edits: See interpretation 002.


    011: F90 features: modules, array language, intrinsics

  • Date: Unknown
  • Status: Approved

    Questions:

    1. Should array syntax, foralls, and intrinsics be auto-parallelized? I.e. are they automatically treated like "PARALLEL DO", or does the programmer have to insert directives before they're parallelized?

    2. What about modules? What if a module is used within a subroutine that's called inside a parallel region? PGI makes all variables in a module static, so they'd be treated like common. However, it's possible that users might expect them to be private just as if they'd been declared in line. I don't think that's correct, but treatment of modules needs to be spelled out clearly in the standard.

    3. How do intrinsics like MATMUL and TRANSPOSE behave when called from within a parallel region or a parallel construct? Do they split up the work between the processors, do the work redundantly, or is the mode of operation determined by context (e.g. do it redundantly if the result is private but share the work otherwise). What if one of the input arrays is private, but the result is not? Coding error? Presumably, but it needs to be spelled out in the standard.

    Approved Response:

    1. The OpenMP specification does not specify that auto-parallelization should occur for any constructs, including array syntax, forall and intrinsics. Further, the directives in the specification cannot be used with these constructs. The OpenMP ARB will be looking at some of these issues in the future, potentially for incorporation in a future specification.

    2. Bullet 7 of section 2.6.3 states:
      "Common blocks and modules in called routines in the dynamic extent of a parallel region always have an implicit SHARED attribute, unless they are THREADPRIVATE common blocks."

    3. The work for MATMUL and TRANSPOSE is done redundantly by all threads executing the parallel region. As for the question about private array arguments and a shared result, this issue isn't really any different from any other assignment to a shared variable; you may need to be careful about how the assignment is done. For example, the same issue exists for the following, where "a" is shared and "b" and "c" are private:
      a = b + c

    Edits:

    None.


    012: THREADPRIVATE COMMON data in modules

  • Date: Mar. 24, 1998
  • Status: Approved

    Question:

    I have a Fortran90/95 question that may lead to a clarification in the OMP Fortran standard.

    The question is whether or not common block names from common blocks declared in the declaration section of a module are visible in scopes which import that module via a "USE" statement.

    Here's a small test case that makes this issue relevant to the OMP Fortran standard:

    module foo
       common /t/ a
    !$omp threadprivate(/t/)
    end module foo

    subroutine bar
       use foo
    !$omp parallel copyin(/t/)
       ...
    !$omp end parallel
    end subroutine bar

    This program is illegal if common block names are not exported from a module via a USE statement, because the COPYIN clause would then refer to an undeclared common block /t/, not the /t/ declared in the module.

    The F95 standard (Draft version, 1995) says the following about use statements:

    11.3.2 The USE statement and use association

    The USE statement provides the means by which a scoping unit accesses named data objects, derived types, interface blocks, procedures, generic identifiers, and namelist groups in a module.

    Common block names do not seem to me to fit into any of the above categories.

    Furthermore, the F95 standard says this about common block storage sequences (emphasis is mine):

    5.5.2.1 Common block storage sequence

    Only COMMON statements and EQUIVALENCE statements appearing in the scoping unit contribute to common block storage sequences formed in that unit. Variables, in common blocks, made accessible by *use association* or host association do not contribute.

    I interpret this to mean that for the following testcase, the common block storage sequence for /t/ visible in the subroutine scope consists of only variable "b", not "a" then "b":

    module foo
       common /t/ a
    end module foo

    subroutine bar
       use foo
       common /t/ b
    end subroutine bar

    This seems like further evidence that common block names from modules should not be visible via use association.

    If all this is correct, then the original testcase should be modified as follows to be legal F90/F95:

    module foo
       real a(100)
       common /t/ a
    !$omp threadprivate(/t/)
    end module foo

    subroutine bar
       use foo
    !$omp parallel copyin(a)      ! note change to copyin list
       ...
    !$omp end parallel
    end subroutine bar

    This syntax can be inconvenient, since all members of the common need to be listed instead of the common block name. Furthermore, since there is no way to declare a common block THREADPRIVATE by naming its members, the THREADPRIVATE directive must be present in the module, not in the subroutine which contains the USE statement.

    If anyone disagrees with this analysis please let us all know. I am a novice user of F90/F95 so I am easily confused by the terminology in the standard.

    If this analysis is correct, perhaps the OMP Fortran standard could be amended to clarify what happens for this case, since it doesn't seem obvious to me what the restrictions are w.r.t. common block names in COPYIN clauses from reading the OMP Fortran standard.

    Approved Response:

    The preceding analysis is correct. Common block names are not accessible by use association or host association. The following are further examples which are invalid:

    Example 1:

    module foo
       common /t/ a
    end module foo

    subroutine bar
       use foo
    !$omp threadprivate(/t/)
    !$omp parallel
       ...
    !$omp end parallel
    end subroutine bar

    Example 2:

    common /t/ a
    !$omp threadprivate(/t/)
    ...
    contains
       subroutine bar
    !$omp parallel copyin(/t/)
       ...
    !$omp end parallel
       end subroutine bar
    end program

    Example 2 may be correctly rewritten as follows:

    common /t/ a
    !$omp threadprivate(/t/)
    ...
    contains
       subroutine bar
          common /t/ a
    !$omp threadprivate(/t/)
    !$omp parallel copyin(/t/)
       ...
    !$omp end parallel
       end subroutine bar
    end program

    Edits:

    In the OpenMP Fortran API, version 1-Oct '97, the following additions should be made:

    1. In Section 2.6.1, THREADPRIVATE, paragraph 2, after the sentence:
      "This directive must appear in the declaration section of the routine after the declaration of the listed common blocks."

      add the following statement:
      "Although variables in common blocks can be accessed by use association or host association, common block names cannot. This means that a common block name specified in a THREADPRIVATE directive must be declared to be a common block in the same scoping unit in which the THREADPRIVATE directive appears."
    2. In Section 2.6.2.7, COPYIN clause, add the following sentence before the example:
      "Although variables in common blocks can be accessed by use association or host association, common block names cannot. This means that a common block name specified in a COPYIN clause must be declared to be a common block in the same scoping unit in which the COPYIN clause appears."

    013: Privatization on Work-sharing Directives

  • Date: Apr. 8, 1998
  • Status: Approved

    Question:

    I have found a feature in the OpenMP API that seems to be hard to implement. It concerns data scope attribute clauses.

    OpenMP API for FORTRAN, section 2.6.3, page 28:

    "Variables that are privatized in a parallel region cannot be privatized again on an enclosed work-sharing directive. As a result, variables that appear in the PRIVATE, FIRSTPRIVATE, LASTPRIVATE and REDUCTION clauses on a work-sharing directive must have shared scope in the enclosing parallel region"

    Consider the following example:

    !$OMP PARALLEL PRIVATE(x)
          CALL f(x)
    !$OMP END PARALLEL
          ...
    !$OMP PARALLEL SHARED(x)
          CALL f(x)
    !$OMP END PARALLEL

    SUBROUTINE f(x)
    !$OMP DO PRIVATE(x)
          ...
    RETURN
    END

    In the first case we should not make a local copy of x in procedure F; we just use x (even without any synchronization, because x is private to a thread). In the last case, though, we should use a local copy of x instead. If I understand properly, it must be a run-time decision whether to make a local copy or not.

    Approved Response:

    The document section you quote says that a variable declared as PRIVATE on the parallel region cannot also be declared as PRIVATE on a work-sharing construct. Your example violates this rule. In the first case

    !$OMP PARALLEL PRIVATE(x)
          CALL f(x)
    !$OMP END PARALLEL

    x is declared as PRIVATE and then, in the work-sharing construct in the dynamic extent of the region

    SUBROUTINE f(x)
    !$OMP DO PRIVATE(x)
          ...
    RETURN
    END

    x is declared as PRIVATE again. The rule prohibits this.

    Edits:

    None.


    014: Syntax of Parallel DO Loops

  • Date: May 4, 1998
  • Status: Approved

    Question:

    The syntax for most of the OpenMP constructs in Fortran enclose blocks of code, where a block is defined syntactically by the Fortran standard. The syntax diagrams for the DO and PARALLEL DO directives show do_loop's as part of the syntax diagram, but the Fortran standard doesn't define a non-terminal named do_loop.

    Was it intended that the do_loop should be the block-do-construct of Fortran 90 and 95, or that it should be either the block-do-construct or the nonblock-do-construct? The difference is that a block-do-construct always ends with END DO or CONTINUE, and doesn't have a shared "termination" statement. For example,

          DO 100 I = 1, 10
          DO 100 J = 1, 10
             A(I, J) = 1
      100 CONTINUE

    and

          DO 200 I = 1, 10
      200 B(I) = I

    are nonblock-do-constructs.

    Here's an example that seems awkward if nonblock-do-constructs are allowed:

          DO 100 I = 1, 10      <-------.
    !$OMP DO                 <----.     |
          DO 100 J = 1, 10   <-.  |     |
             ...               |  |     |
      100 CONTINUE           <-'  |  <-'
    !$OMP END DO NOWAIT      <----'
          END

    The set of statements and directives in the I loop intersects the set of statements and directives in the DO work-sharing construct, but neither is a superset of the other.

    Approved Response:

    The OpenMP ARB feels that it is important that both block-do and non-block-do loops be permitted with PARALLEL DO and work-sharing DO directives. However, if a user specifies an ENDDO directive for a non-block-do construct with shared termination, then the matching DO directive must precede the outermost DO.

    The following are some examples:

    Valid Example 1:

          DO 100 I = 1,10
    !$OMP DO
          DO 100 J = 1,10
             ...
      100 CONTINUE

    Valid Example 2:

    !$OMP DO
          DO 100 J = 1,10
             ...
      100 A(I) = I + 1
    !$OMP ENDDO

    Valid Example 3:

    !$OMP DO
          DO 100 I = 1,10
          DO 100 J = 1,10
             ...
      100 CONTINUE
    !$OMP ENDDO

    Invalid Example 1:

          DO 100 I = 1,10
    !$OMP DO
          DO 100 J = 1,10
             ...
      100 CONTINUE
    !$OMP ENDDO

    Edits:

    In section 2.3.1, after the syntax diagram, add

    "The do-loop may be a do-construct, an outer-shared-do-construct or an inner-shared-do-construct. A DO construct that contains several DO statements that share the same DO termination statement syntactically consist of a sequence of outer-shared-do-constructs, followed by a single inner-shared-do-construct. If an END DO directive follows such a DO construct, a DO directive can only be specified for the first (i.e., the outermost) outer-shared-do-construct".

    In section 2.4.1, after the syntax diagram, add

    "The do-loop may be a do-construct, an outer-shared-do-construct or an inner-shared-do-construct. A DO construct that contains several DO statements that share the same DO termination statement syntactically consist of a sequence of outer-shared-do-constructs, followed by a single inner-shared-do-construct. If an END PARALLEL DO directive follows such a DO construct, a PARALLEL DO directive can only be specified for the first (i.e., the outermost) outer-shared-do-construct".

    015: FLUSH

  • Date: May 5, 1998
  • Status: Approved. Edits Pending.

    Question:

    I'm confused by the definition of FLUSH in v1.0; the descriptions of FLUSH seem to be inconsistent with the example.

    The specification seems to suggest that after a FLUSH, a shared variable is updated in all the other threads, without the others performing any synchronization.

    While in the following code, from example A.13 on page 48, the second flush suggests a thread has to perform a flush to obtain the up-to-date data. Is the second flush redundant?

    !$OMP PARALLEL DEFAULT(PRIVATE) SHARED(ISYNC)
          CALL WORK()
          ISYNC(IAM) = 1
    !$OMP FLUSH(ISYNC)
          DO WHILE (ISYNC(NEIGHBOR) .EQ. 0)
    !$OMP FLUSH(ISYNC)
          ENDDO
    !$OMP END PARALLEL

    Approved Response:

    The second flush in the example is required. See interpretation 002 for more information.

    Edits:

    See interpretation 002.


    016: ORDERED

  • Date: May 14, 1998
  • Status: Approved

    Question:

    I am interested in the restrictions on the ORDERED directive. Can an OpenMP Fortran program include two ordered sections in a DO, like the following?

    !$OMP DO
          DO I = 1, N
             ...
    !$OMP ORDERED
             ...
    !$OMP END ORDERED
             ...
    !$OMP ORDERED
             ...
    !$OMP END ORDERED
             ...
          END DO

    I want to verify my interpretation of the specification. Would you please let me know if this program is valid?

    Approved Response:

    It is possible to have multiple ORDERED sections within a DO specified with the ORDERED clause. However, the example above is invalid, because the API states the following:

    An iteration of a loop with a DO directive must not execute the same ORDERED directive more than once, and it must not execute more than one ORDERED directive.

    In your example, all iterations execute 2 ORDERED sections. The following is a valid example of a DO with more than one ORDERED section:

    !$omp do ordered
          do i = 1,n
             . . .
             if (i <= 10) then
                . . .
    !$omp ordered
                write(4,*) i
    !$omp end ordered
             endif
             . . .
             if (i > 10) then
                . . .
    !$omp ordered
                write(3,*) i
    !$omp end ordered
             endif
          enddo

    Edits:

    None.


    017: Loop Control Variables

  • Date: May 19, 1998
  • Status: Approved

    Question: I have a couple of questions regarding rules relating to iteration variables.

    OpenMP states: "Sequential DO loop control variables in the lexical extent of a PARALLEL region that would otherwise be SHARED based on default rules are automatically made private on the PARALLEL directive"

    Also, I believe we intended that parallel DO loop control variables should be default private for the work-sharing DO construct, although I can't find words to that effect. For example:

    !$omp parallel
          . . .
    !$omp do              ! "i" is private for the DO
          do i = 1,100
             ...
          enddo
    !$omp end parallel

    Can someone point me to the text that states this rule?

    Finally, OpenMP states:

    "Variables which are privitized in a parallel region cannot be privitized again on an enclosed work-sharing directive"

    Assuming all of this is true, are the following examples OpenMP-conformant or invalid?

    Example 1:

    !$omp parallel ! "i" is assumed private here as it is . ! used as a sequential loop control var do i = 1,100 ... enddo . !$omp do do i = 1,100 ! Is this an implicit re-privitization? ... ! Is this code legal? enddo !$omp end parallel

    Example 2:

    !$omp parallel ! "i" is assumed private here as it is . ! used as a sequential loop control var do i = 1,100 ... enddo . !$omp do private(i) ! Is this a re-privitization? do i = 1,100 ! Is this code legal? ... enddo !$omp end parallel

    Example 3:

    !$omp parallel private(i)
          .
          .
    !$omp do
          do i = 1,100    ! Is this O.K.? Is this considered a
             ...          ! re-privatization?
          enddo
    !$omp end parallel

    Approved Response:

    The text which describes scope information for iteration variables of work-sharing DO loops is in section 2.3.1 on pg. 13:

    "Parallel DO loop control variables are block-level entities within the DO loop."

    Examples 1 and 3 above are legal. Iteration variables for work-sharing DO loops are considered to be different variables than those in the parallel region, because of the rule above.

    Example 2 is an illegal program, because there is in fact a re-privatization on the DO directive. The OpenMP ARB believes that it is desirable for code such as this to be legal, and will likely relax the rules on re-privatization in Version 2 of the Fortran API.

    Edits:

    None.


    018: MASTER

  • Date: June 2, 1998
  • Status: Approved

    Question:

    We have come across a question concerning the MASTER directive. MASTER is mentioned in the subsection "synchronization constructs"; however, the description says (Ver 1.0 - Oct 1997, page 18)

    "The other threads in the team skip the enclosed section and CONTINUE EXECUTION. There is NO IMPLIED BARRIER either on entry or exit from master section."

    We are now wondering whether in some sense the MASTER directive does a synchronization or not.

    Approved Response:

    MASTER is not a synchronization construct, and was incorrectly classified as such.

    Edits:

    In the introduction for Section 2, change the description of section 2.5 to:

    "Section 2.5, page 16, describes synchronization constructs and the MASTER directive."

    Change the heading of Section 2.5 to:

    "Synchronization Constructs and the MASTER directive"

    Change the first sentence of Section 2.5 to:

    "The following sections describe the synchronization constructs and the MASTER directive:"

    019: Nested constructs

  • Date: June 10, 1998
  • Status: Approved

    Question:

    I have a question about OpenMP: Does OpenMP allow nested Parallel sections?

    I'd like to get to know the language better, and I need a demo to use for some examples.

    Approved Response:

    OpenMP permits nested parallel regions. For example, the following is OpenMP-conformant:

    !$omp parallel sections
    !$omp section
    !$omp parallel sections
          . . .
    !$omp end parallel sections
          . . .
    !$omp section
          . . .
    !$omp end parallel sections

    Note that an OpenMP-compliant implementation is permitted to serialize a nested parallel region.

    OpenMP does not permit the nesting of work-sharing constructs (such as SECTIONS) within other work-sharing constructs that bind to the same parallel region.
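
    For illustration (a sketch, not part of the approved response), the following is non-conforming because the inner DO directive binds to the same PARALLEL region as the enclosing DO directive:

    !$omp parallel
    !$omp do
    do i = 1, n
    !$omp do              ! invalid: work-sharing nested inside
       do j = 1, n        ! work-sharing binding to the same region
          . . .
       enddo
    enddo
    !$omp end parallel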

    Edits:

    None.


    020: Definition of ATOMIC

  • Date: June 30, 1998
  • Status: Approved

    Question:

    On p20 of the OpenMP Fortran API is the statement :

    "The following restriction applies to the ATOMIC directive

    I am baffled by this:

    1. From a pedantic point of view it is incorrect since the ATOMIC directive itself does not reference x.
    2. At the bottom of p19 it specifies that x is a scalar variable, yet the restriction refers to the storage location x, which is quite a different thing. So what is x?
    3. If x is interpreted as a variable then the restriction does not seem to be meaningful.
    4. If x is interpreted as a storage location then this could be accessed through equivalence with different type and type parameters. Is this the case in question?
    5. What is the scope of the restriction? Just the assignment or globally? i.e. what does "all" mean?

    I would be pleased if you could provide an explanation for this restriction and examples that violate it.

    Approved Response:

    The statement you refer to on page 20 ought to read:

    "The following restriction applies to the use of ATOMIC directives:

    All atomic references to the storage location of variable x throughout the program are required to have the same type and type parameters."

    So, "x" is a scalar variable and the restriction applies to references of it's storage. The restriction is required, as you point out, because of equivalence and other mechanisms for storage association.

    The following are some examples.

    Invalid Example 1:

    integer :: i
    real :: r
    equivalence (i, r)
    !$omp parallel
          . . .
    !$omp atomic
          i = i + 1
          . . .
    !$omp atomic
          r = r + 1.0
    !$omp end parallel

    Invalid Example 2:

    subroutine fred()
       common /blk/ i
       integer :: i
    !$omp parallel
       . . .
    !$omp atomic
       i = i + 1
       . . .
       call sub()
    !$omp end parallel
    end subroutine

    subroutine sub()
       common /blk/ r
       real :: r
       . . .
    !$omp atomic
       r = r + 1.0
    end subroutine

    Invalid Example 3:

    NB: Although the following example might work on some implementations, this is considered a non-conforming program.

    integer :: i
    real :: r
    equivalence (i, r)
    !$omp parallel
          . . .
    !$omp atomic
          i = i + 1
    !$omp end parallel
          . . .
    !$omp parallel
          . . .
    !$omp atomic
          r = r + 1
    !$omp end parallel

    Edits:

    On p. 20 of the Fortran API, change the restriction for ATOMIC to read:

    "The following restriction applies to the use of ATOMIC directives:

    All atomic references to the storage location of variable x throughout the program are required to have the same type and type parameters."


    021: ORDERED and definition of static/dynamic extents

  • Date: July 7, 1998
  • Status: Approved

    Question:

    In section 1.2, there are definitions of static extent and dynamic extent...

    "The statements enclosed lexically within a construct define the static extent. The dynamic extent further includes the routines called from within the construct."

    I understood this statement to mean that dynamic extent includes static extent. And, in section 2.5.6, page 22, line 1, the following statement appears...

    "An ORDERED directive can appear only in dynamic extent of DO or PARALLEL DO directive."

    If dynamic extent includes static extent as I wrote above, I think that an ORDERED directive can also appear in static extent, and this statement has no meaning. Otherwise, does this statement mean that an ORDERED directive can appear only in the routines called from within the construct? If so, what is the reason for this restriction?

    Approved Response:

    You are correct in assuming that the dynamic extent of a construct includes the static extent as well. So, an ORDERED directive may appear in the static extent of a DO or PARALLEL DO, or within procedures called from such an extent.

    In the statement: "An ORDERED directive can appear only in the dynamic extent of a DO or PARALLEL DO directive", the word "only" refers to directives "DO" and "PARALLEL DO", not to "dynamic extent" specifically.

    In other words, the statement exists to restrict the use of ORDERED sections within parallel do loops. For example, the following is illegal:

    !$omp parallel
          . . .
    !$omp ordered
          . . .
    !$omp end ordered
    !$omp end parallel

    Edits:

    None.


    022: Serialization of nested parallel regions

  • Date: July 10, 1998
  • Status: Approved

    Question:

    Last year we posted a question about the following example, and whether it was legal:

    program p
       integer :: i, j, tmp, a(10, 10)
    !$omp parallel do private(j)
       do i = 1, 10
    !$omp parallel do private(tmp)
          do j = 1, 10
             tmp = i*j
             a(i,j) = tmp
          end do
       end do
    end program p

    The issue has to do with whether it is legal for an implementation, when serializing an inner parallel construct, to "ignore" the private clause for tmp, and whether the user should in fact ensure that tmp is private on the outer loop as well.

    We were left with the impression that the user was responsible for ensuring "tmp" was made explicitly private on both loops. Is this correct?

    Approved Response:

    The OpenMP committee intended that PRIVATE clause semantics be such that an implementation would be permitted (but not required) to re-use the global storage for a PRIVATE variable on one of the threads in the team executing a region, for efficiency reasons. This situation comes up when a parallel region has been serialized. In the example above, when serializing the inner region, an implementation is permitted to re-use the storage for "tmp" from the outer region as the storage for "tmp" in the inner region. "tmp" is considered shared in the outer region. Given this, there are data races on "tmp" with respect to the outer region since it was not declared private on that region. So, in the example above, the user must mark "tmp" as private on the outer region.

    Edits:

    Add the following sentences to the first paragraph of Section 2.6.2 "Data scope attribute clauses":

    "Scope attribute clauses which appear on a PARALLEL directive indicate how the specified variables are to be treated with respect to the parallel region associated with the PARALLEL directive. They do not indicate the scope attributes of these variables for any enclosing parallel regions, if they exist."
    "In determining the appropriate scope attribute for a variable used in the lexical extent of a parallel region, all references and definitions of the variable must be considered, including references and definitions which occur in any nested parallel regions."

    Replace the first bullet in section 2.6.2.1 "PRIVATE clause" with:

    "A new object of the same type is declared once for each thread in the team. One thread in the team is permitted, but not required, to re-use the existing storage as the storage for the new object. For all other threads, new storage is created for the new object."

    See interpretation 026 for additional information about storage association of PRIVATE variables.


    023: Statement functions and data attribute clauses

  • Date: July 17, 1998
  • Status: Approved

    Question:

    The OpenMP specification isn't clear about how statement functions should be treated with respect to data scope attribute clauses. For the following example, does the PRIVATE(J) clause affect the reference to J in IFUNC?

    integer :: arr(10), j = 17
    ifunc() = j
    !$omp parallel do, private(j)
    do i = 1, 10
       arr(i) = ifunc()
    end do
    print *, arr
    end

    Different implementation strategies for statement functions and PRIVATE could lead to different expectations about whether this is valid.

    Namelist and variable format expressions pose similar issues.

    Approved Response:

    The Fortran committee has decided that the example above is invalid and has undefined results. This may be revisited in a future specification.

    Edits:

    In section 2.6.3, "Data environment rules", add the following bullet:

    "Variables which appear in namelist statements, variable format expressions and in expressions for statement function definitions should not be specified in PRIVATE, FIRSTPRIVATE or LASTPRIVATE clauses."

    024: Worksharing constructs and branches

  • Date: July 25, 1998
  • Status: Approved

    Question:

    When some members of the team encounter a work-sharing construct via a branch (for example, an IF statement), is the execution of the enclosed code region divided among only those threads that encounter the work-sharing construct, or is it divided among all threads of the team?

    Approved Response:

    In Section 2.3, the first bullet states:

    "Work-sharing constructs and BARRIER directives must be encountered by all threads in a team, or by none at all". So, the situation described above is considered "invalid" or "non-conforming". The results of such a program are undefined.

    Edits:

    None.


    025: Data attribute clauses

  • Date: July 30, 1998
  • Status: Approved

    Question:

    Regarding point 8 of section 2.6.3 on p30:

    1. What is the scope of this restriction? the same directive, the same directive nest, a routine, a program unit, a program or what?

      e.g. is the following program legal?

      common /c/ x,y
      !$omp parallel private (/c/)
         ...
      !$omp end parallel
         ...
      !$omp parallel shared (x,y)
         ...
      !$omp end parallel
    2. The restriction refers to constituent elements and does not seem to apply to the following very similar program:
      common /c/ x,y
      !$omp parallel private (/c/)
         ...
      !$omp end parallel
         ...
      !$omp parallel shared (/c/)
         ...
      !$omp end parallel

      Is this legal or not? It seems to me it should have the same legality as the case above.

    Approved Response:

    1. The restriction applies to clauses on the same directive, for a particular construct.

      The restriction should read: "When a named common block is specified in a PRIVATE, FIRSTPRIVATE or LASTPRIVATE clause of a directive, none of its constituent elements may be declared in another scope attribute clause in that directive".

      So the example above is valid, as is the following example:

      common /c/ x,y
      !$omp parallel
         ...
      !$omp do private(/c/)
         ...
      !$omp end do
      !$omp do private(x)
         ...
      !$omp end do
      !$omp end parallel

      Here is an invalid example:

      !$omp parallel private(/c/), shared(x)
         ...
      !$omp end parallel
    2. Your second example is also valid. However, the following is invalid:
      !$omp parallel private(/c/), shared(/c/)
         ...
      !$omp end parallel

      This should be covered by bullet 10, but isn't quite. Bullet 10 ought to read:

      "Clauses can be repeated as needed, but each variable and each named common block can appear explicitly in only one clause per directive ..."

    Edits:

    Change the 1st sentence of point 8 of 2.6.3 on p30 to:

    "When a named common block is specified in a PRIVATE, FIRSTPRIVATE or LASTPRIVATE clause of a directive, none of its constituent elements may be declared in another scope attribute clause in that directive".

    Change the 1st sentence of point 10 of 2.6.3 on p30 to:

    "Clauses can be repeated as needed, but each variable and each named common block can appear explicitly in only one clause per directive ..."

    026: Storage association

  • Date: August 20, 1998
  • Status: Approved

    Question:

    What is the rule about aliased or overlapping variables?

    For example, an F77 program has an equivalence as follows:

    integer a(100), b(100)
    equivalence (a(51), b(1))
    !$OMP PARALLEL DO DEFAULT(PRIVATE) PRIVATE(i,j) &
    !$OMP& LASTPRIVATE(a)
    DO i=1,100
       DO j=1,100
          b(j) = j - 1
       ENDDO
       DO j=1,100
          a(j) = j
       ENDDO
       DO j=1,50
          b(j) = b(j) + 1
       ENDDO
    ENDDO
    !$OMP END PARALLEL DO
    print *, b
    end

    For i in [1, 50], does b(i) equal 51 + i, or i?

    Approved Response:

    a and b are not associated inside the parallel region. The association only holds outside of the parallel region. The results of this program are undefined. See the edits below for additional information and examples.

    Edits:

    Replace bullet 4 in section 2.6.2.1 "PRIVATE clause" with the following:

    "A variable declared as PRIVATE may be storage-associated with other variables when the PRIVATE clause is encountered. Storage association may exist because of constructs such as EQUIVALENCE, COMMON, etc. If a is a variable appearing in a PRIVATE clause and b is a variable which was storage-associated with a, then:

    1. The contents, allocation and association status of b are undefined on entry to the parallel construct.
    2. Any definition of a, or of its allocation or association status causes the contents, allocation and association status of b to become undefined.
    3. Any definition of b, or of its allocation or association status causes the contents, allocation and association status of a to become undefined."

    Add the following invalid examples to the Appendix:

    Example 1:

    common /block/ x
    x = 1.0
    !$omp parallel private (x)
       x = 2.0
       call sub()
       ...
    !$omp end parallel
    ...

    subroutine sub()
       common /block/ x
       ...
       print *, x     ! "x" is undefined. The result of the
                      ! print is undefined.
       ...
    end subroutine sub

    Example 2:

    common /block/ x
    x = 1.0
    !$omp parallel private (x)
       x = 2.0
       call sub()
       ...
    !$omp end parallel
    ...
    contains
       subroutine sub()
          common /block/ y
          ...
          print *, x  ! "x" is undefined.
          print *, y  ! "y" is undefined.
          ...
       end subroutine
    end program

    Example 3:

    equivalence (x,y)
    x = 1.0
    !$omp parallel private(x)
       ...
       print *, y     ! "y" is undefined
       y = 10
       print *, x     ! "x" is undefined
    !$omp end parallel

    Example 4:

    integer a(100), b(100)
    equivalence (a(51), b(1))
    !$omp parallel do default(private) private(i,j) lastprivate(a)
    do i=1,100
       do j=1,100
          b(j) = j - 1
       enddo
       do j=1,100
          a(j) = j            ! "b" becomes undefined at this point
       enddo
       do j=1,50
          b(j) = b(j) + 1     ! reference to "b" is not defined. "a"
                              ! becomes undefined at this point.
       enddo
    enddo
    !$omp end parallel do     ! The LASTPRIVATE write for "a" has
                              ! undefined results.
    print *, b                ! "b" is undefined since the LASTPRIVATE
                              ! write of "a" was not defined.
    end

    027: Locks and REDUCTION

  • Date: August 24, 1998
  • Status: Approved

    Question:

    We would like your views on the following examples.

    Example 1:

    integer :: x
    x = 0
    !$omp parallel
       . . .
    !$omp parallel reduction (+:x)
       x = x + 1
       . . .
    !$omp end parallel
    !$omp critical
       x = x + 2
    !$omp end critical
    !$omp end parallel

    We assume this example is invalid for at least one reason: the statement "x = x + 1" is not protected with respect to the outer parallel region.

    Now, suppose a user modified the example as follows (not that they *would* of course - this is just to illustrate a point):

    Example 2:

    integer :: x
    x = 0
    !$omp parallel
       . . .
    !$omp parallel reduction (+:x)
    !$omp critical
       x = x + 1
    !$omp end critical
       . . .
    !$omp end parallel
    !$omp critical
       x = x + 2
    !$omp end critical
    !$omp end parallel

    This code would *appear* to adhere to the letter of the specification, since "x = x + 1" is now protected for the outer parallel region. However, the "global reduction" that the compiler inserts at the end of the nested parallel region may not be protected by the same lock that is being used for the second critical section. We assume this example is invalid for this reason. Is this correct? If so, then we believe an implementation is allowed to use a distinct locking mechanism for each global reduction. Now, is the compiler *required* to use a different lock for each global reduction? Consider the following example. "s" is a procedure which may be invoked from both serial and parallel parts of the program:

    Example 3:

    subroutine s(x)
    !$omp parallel reduction(+:x)
       x = x + 1
    !$omp end parallel
    end subroutine s

    program main
       . . .
    !$omp parallel
    !$omp critical
       call s(x)
    !$omp end critical
       . . .
    !$omp end parallel
    end program

    If the same lock is used for the "critical" and the "reduction", a deadlock could occur.

    Approved Response:

    1. The analysis for Example 1 is correct. The example is invalid because the statement "x = x + 1" is not protected (i.e., synchronized) with respect to the outer parallel region.
    2. Example 2 is also invalid. The user is required to put the nested parallel region within a critical section to make the example valid (see the sketch following this list).
    3. Example 3 is valid. Implementations need to be able to handle such code and not cause a deadlock.
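
    For illustration, a minimal sketch of Example 2 rewritten as point 2 requires, with the nested parallel region (including its implicit reduction update) enclosed in a critical section. The elided work is removed and the program name is hypothetical:

      program example2_fixed
      integer :: x
      x = 0
!$omp parallel
!$omp critical
! The entire nested parallel region, and therefore the global
! reduction inserted at its end, is now synchronized with respect
! to the outer region.
!$omp parallel reduction (+:x)
      x = x + 1
!$omp end parallel
!$omp end critical
!$omp critical
      x = x + 2
!$omp end critical
!$omp end parallel
      print *, x
      end program example2_fixed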

    Edits:

    Add the following statement to Section 2.6.3, "Data environment rules":

    "The shared variables that are specified in REDUCTION or LASTPRIVATE clauses become defined at the end of the construct. Any concurrent uses or definitions of those variables must be synchronized with the definition that occurs at the end of the construct to avoid race conditions."

    028: Named constants

  • Date: Unknown
  • Status: Approved

    Question:

    Is it permissible to use a PARAMETER (Fortran named constant) in a SHARED clause? I tried the following directive with the parameter "ndim" using the SGI F90 compiler, version 7.2.1:

!$OMP PARALLEL DO       &
!$OMP SHARED( ndim, a ) &
!$OMP PRIVATE( i, j )
      DO j = 1, ndim
         DO i = 1, ndim
            a(i,j) = 0.73*real(i) + 0.15*real(j)
         END DO
      END DO

    and I get the following compilation error:

!$OMP SHARED( ndim, a ) &
              ^
f90-1473 mfef90: ERROR TRANSPOSE, File = transpose.f90, Line = 96, Column = 15
  Object NDIM must be a variable to be in the SHARED clause of the
  PARALLEL DO directive.

    If parameters are not permitted in SHARED clauses, what is the reason for this restriction?

    Approved Response:

    A Fortran parameter isn't a variable; it's more like a macro in C:

    #define ndim 3

    In OpenMP you can only declare the scope of something that's a variable, because variables have storage associated with them. Parameters don't necessarily have any storage associated with them.
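
    A minimal sketch of the usual remedy (program name hypothetical): leave the named constant out of the data scope clauses entirely. It cannot be modified, so it can be referenced freely inside the region:

      program param_example
      integer, parameter :: ndim = 3     ! named constant: no storage,
                                         ! so no data scope attribute
      real :: a(ndim, ndim)
      integer :: i, j
!$OMP PARALLEL DO SHARED( a ) PRIVATE( i, j )
      DO j = 1, ndim
         DO i = 1, ndim
            a(i,j) = 0.73*real(i) + 0.15*real(j)
         END DO
      END DO
!$OMP END PARALLEL DO
      print *, a
      end program param_example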

    Edits:

    None.


    029: DEFAULT(NONE)

  • Date: Nov. 1998
  • Status: Approved

    Question:

    There seems to be a need for clarification of the OpenMP rules about DEFAULT(NONE). In particular, is the specification in an enclosed data scope attribute clause sufficient to satisfy the requirement for explicit attribute specification (as described in section 2.6.2.3 of the Fortran API)?

    An additional question is whether a loop index variable is implicitly made PRIVATE when DEFAULT(NONE) is in effect. (Item 1 in section 2.6.3 only covers the case where the loop control variable would otherwise be SHARED by default, not the case where there is no default.)

    For example, is the following program legal? (This one test case covers both of the sub-issues mentioned above.)

      program test
      common /com1/ y, z(1000)
!$omp parallel default(none) shared(z)
! Question: does i need to be specified as private and/or
! y need to be specified as shared in the region?
!$omp do firstprivate(y)
      do i = 1, 10
         z(i) = y
      enddo
!$omp end parallel
      end

    The API states in 2.6.2.3 that:

    "Specifying DEFAULT(NONE) declares that there is no implicit default as to whether variables are PRIVATE or SHARED. In this case, the PRIVATE, SHARED, FIRSTPRIVATE, LASTPRIVATE, or REDUCTION attribute of each variable used in the lexical extent of the parallel region must be specified."

    The trouble is that the API doesn't say where the attribute must be specified: on the PARALLEL directive itself, or is it enough to specify it on enclosed work-sharing constructs that encompass all uses? Different implementations are interpreting this rule differently, resulting in user confusion. So there seems to be a need for an official interpretation.

    Note that the C/C++ API did clean this up in section 2.7.2.5 of that API by saying that variables could be "specified in an enclosed data scope attribute clause, or used as a loop control variable referenced only in a corresponding for or parallel for."

    I believe that this is a case where the Fortran API needs to catch up with the C/C++ API. I would like to see this done as an interpretation (rather than wait for the next revision) so as to clear up the confusion as quickly as possible.

    Approved Response:

    It is enough that the data scope attribute be specified for the variable on work-sharing constructs which encompass all uses of the variable. See the edits below.

    Edits:

    On pg. 24, Section 2.6.2.3 DEFAULT Clause, replace the bullet describing DEFAULT(NONE) with the following bullet:

    "Specifying DEFAULT(NONE) requires that each variable used in the lexical extent of the parallel region be explicitly listed in a data scope attribute clause on the parallel region, unless it is:
    • THREADPRIVATE, or,
    • a Cray pointee, or
    • a loop iteration variable used only as a loop iteration variable for sequential loops in the lexical extent of the region or parallel DO loops which bind to the region, or,
    • only used in work-sharing constructs that bind to the region, and is specified in a data scope attribute clause for each such construct."
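
    Under this wording, the test case from the question conforms as written. The following annotated sketch (program name hypothetical) shows why:

      program test_none
      common /com1/ y, z(1000)
!$omp parallel default(none) shared(z)
! "i" is used only as the iteration variable of a parallel DO loop
! that binds to the region (third bullet), and "y" appears in a data
! scope attribute clause on the work-sharing construct that contains
! all of its uses (fourth bullet), so neither variable needs to be
! listed on the PARALLEL directive.
!$omp do firstprivate(y)
      do i = 1, 10
         z(i) = y
      enddo
!$omp end do
!$omp end parallel
      end program test_none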

    030: SHARED array section actual arguments

  • Date: Unknown
  • Status: Approved

    Question:

    There is an issue we need to deal with in the OpenMP directives regarding F90-style (dope-vector-based) arrays that are declared SHARED and then passed as actual arguments to procedures from the parallel region that declares them SHARED. The problem involves F90 pointers, assumed-shape arrays, array sections of a shared array, and possibly other cases that may occur in some implementations.

    We have a common problem here in that we have maintained calling sequence compatibility with F77 compilers when F90 routines call routines with unknown interfaces, by performing copy in/out at call sites where the interface is not known to the caller. Similarly, copy in/out may be performed where the interface is known and the above-mentioned array types are associated with explicit-shape dummy arguments (contiguous storage). If an array is shared but is passed as an argument to a routine which modifies it, copy in/out semantics provide limited success when more than one thread is updating the same array at the same time (each thread modifies its own copy, and no one else's).

    I believe there are a number of ways to deal with this issue. We could:

    1. prohibit passing assumed-shape arrays, pointers to non-pointers, and sections of shared arrays
    2. allow them to be passed only as INTENT(IN) (read-only) arguments
    3. add restrictions on these cases stating when it is legal to pass such arguments

    Approved Response:

    Restrictions on the uses of such actual arguments will be introduced. See the edits below for additional information.

    Edits:

    Add the following bullet to section 2.6.3, "Data environment rules":

    "If a SHARED variable, subobject of a SHARED variable, or an object associated with a SHARED variable or subobject of a SHARED variable appears as an actual argument in a reference to a non-intrinsic procedure, and the actual argument is an array section with a vector subscript, or the actual argument is an array section, an assumed-shape array or a pointer array, and the associated dummy argument is an explicit-shape or assumed-size array, any references to or definitions of the shared storage that is associated with the dummy argument by any other thread must be synchronized with the procedure reference, to avoid possible race conditions.

    The situations described above may result in the value of the shared variable being copied into temporary storage before the procedure reference, and back out of the temporary storage into the actual argument storage after the procedure reference, effectively resulting in references to and definitions of the storage during the procedure reference."
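
    A minimal sketch of the hazard (procedure and program names hypothetical): one thread passes a strided section of a shared array to a procedure with an explicit-shape dummy argument, so the implementation may use copy in/copy out, and the copy-out races with another thread's direct definition of the same storage:

      subroutine bump(v, n)
      ! explicit-shape dummy: a strided actual argument may be passed
      ! via a contiguous temporary (copy in / copy out)
      integer n
      real v(n)
      v = v + 1.0
      end subroutine bump

      program race
      real a(100)
      integer omp_get_thread_num
      external omp_get_thread_num
      a = 0.0
!$omp parallel shared(a)
      if (omp_get_thread_num() == 0) then
         ! The copy-out after this call redefines a(1), a(3), ...,
         ! and can overwrite the unsynchronized definition of a(1)
         ! made by another thread below.
         call bump(a(1:100:2), 50)
      else
         a(1) = 5.0
      end if
!$omp end parallel
      print *, a(1)     ! either 1.0 or 5.0: a race condition
      end program race

    Synchronizing the procedure reference and the assignment, for example with a named critical section, removes the race.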


    031: F90 array syntax

  • Date: Dec. 16, 1998
  • Status: Approved

    Question:

    For an F90 code like this:

    A(:,:) = B(:,:) + C(:,:)

    To use OpenMP on this, do I need to use explicit indexing as in F77, i.e., convert the code back to F77 form:

      do j = 1, M
         do i = 1, N
            A(i,j) = B(i,j) + C(i,j)
         enddo
      enddo
    and insert !$omp parallel do on this? Or are there other ways to handle F90 array operations in OpenMP without reverting to F77 syntax?

    Approved Response:

    You are correct: to use OpenMP on F90 array syntax, you are responsible for writing the explicit loop that you want parallelized.

    Some OpenMP compilers may offer the option of automatically parallelizing such loops as a service to the user, but the OpenMP standard does not require this parallelization.
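
    For illustration, a self-contained sketch of the rewritten loop with the directive applied (the array sizes and initializations are hypothetical):

      program array_add
      integer, parameter :: N = 100, M = 100
      real :: A(N,M), B(N,M), C(N,M)
      integer :: i, j
      B = 1.0
      C = 2.0
!$omp parallel do shared(A, B, C) private(i, j)
      do j = 1, M
         do i = 1, N
            A(i,j) = B(i,j) + C(i,j)
         enddo
      enddo
!$omp end parallel do
      print *, A(1,1)
      end program array_add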

    Edits:

    None.


    032: Nested critical sections and data scope attributes

  • Date: Jan. 5, 1999
  • Status: Approved

    Question:

    I have several questions about the interpretation of OpenMP features:

    1. May a critical section contain (in the lexical extent or in the dynamic extent) a critical section with the same name? With another name? If yes, how is the enclosed critical section interpreted?

    2. Let a subroutine be called from a parallel region. This subroutine contains a parallel DO (specified with the !$OMP DO directive).

      1. According to point 7 of Section 2.6.3, the local variables of the subroutine are PRIVATE. How can I make some of these variables SHARED? Do I have to create a new COMMON block that contains them?
      2. Is SAVEd data declared in this subroutine SHARED by default as COMMON is?
      3. Are data initialized with the DATA statement SHARED by default?

    Approved Response:

    1. If a named critical section contains (either lexically or dynamically) another critical section with the same name, the user has created a deadlock, which makes the program an illegal OpenMP program. If the contained critical section has a different name, there is still a possibility of deadlock if another piece of code nests critical sections with the same names in the opposite order. It is the user's responsibility to avoid deadlocks.

      The set of OpenMP routines named OMP_*_LOCK can be used to explicitly manipulate locks to create and control nested critical regions; see the sketch following this list.

    2. Local variables that have the SAVE attribute (which includes DATA-initialized variables) and variables in COMMON are SHARED by default.
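
    As a sketch of the lock-based approach (the lock names, program name, and the KIND of the lock variables are placeholders; the specification only requires an integer kind large enough to hold an address, which is implementation dependent): two locks acquired in a fixed order give nested, deadlock-free critical regions:

      program ordered_locks
      integer (kind=8) :: lck_a, lck_b   ! lock variables; the required
                                         ! integer kind is implementation
                                         ! dependent
      call omp_init_lock(lck_a)
      call omp_init_lock(lck_b)
!$omp parallel shared(lck_a, lck_b)
      call omp_set_lock(lck_a)           ! every thread acquires the
      call omp_set_lock(lck_b)           ! locks in the same order, so
                                         ! the nesting cannot deadlock
      ! ... work guarded by both locks ...
      call omp_unset_lock(lck_b)
      call omp_unset_lock(lck_a)
!$omp end parallel
      call omp_destroy_lock(lck_a)
      call omp_destroy_lock(lck_b)
      end program ordered_locks

    Acquiring the locks in the same fixed order on every thread is what makes the nesting safe; the deadlock described above arises only when two code paths nest the regions in opposite orders.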

    Edits:

    See interpretation 003 for edits to the specification.


    033: FIRSTPRIVATE semantics

  • Date: Jan. 18, 1999
  • Status: Response Pending.

    This is a placeholder for an interpretation currently under review by the OpenMP ARB.


    034: Conditional Compilation

  • Date: Jan. 21, 1999
  • Status: Approved

    Question:

    Section 2.1.2 (conditional compilation) states:

    "The sentinel must be followed by a legal Fortran statement on the same line."

    Does "a Fortran statement" mean "one Fortran statement"?

    Is the following invalid?

    !$ id = omp_get_thread_num() ; print *, id

    Approved Response:

    The Fortran committee intended multiple statements to be permitted on a conditional compilation line. Also, although the existing text refers to statements, the committee intended conditional compilation to apply to legal Fortran lines in general, not only to complete statements.

    The specification will be changed to reflect this.

    Edits:

    Section 2.1.2:

    1st paragraph: Change the first two sentences to:

    "The OpenMP Fortran API permits Fortran lines to be compiled conditionally. The directive sentinels for conditional compilation that are accepted by an OpenMP-compliant compiler depend on the Fortran source form being used."

    2nd paragraph: Remove the 1st sentence:

    "The sentinel must be followed by a legal Fortran statement on the same line."

    2nd paragraph: Change 2nd sentence to:

    "During OpenMP compilation, the sentinel is replaced by two spaces, and the rest of the line is treated as a normal Fortran line."

    035: Parallel intrinsics and array notation

  • Date: Jan. 26, 1999
  • Status: Approved

    Question:

    Are Fortran 90 intrinsics guaranteed to be parallel within a parallel region? What about computation expressed in array notation? Would that also be parallel?

    for example:

c$omp parallel
      C = matmul(A, B) + C
      d = dot_product( e(:), f(:) )
      aijmax = maxval( A, dim=1 )
c$omp end parallel

    I hope we don't need to rewrite code as explicit DO loops to get parallelism, when the array notation and the F90 intrinsics already give the compiler so much information.

    Approved Response:

    The OpenMP Fortran specification, Version 1, does not require that intrinsics or array language be automatically parallelized, nor does it provide facilities for specifying such parallelism.

    The focus of Version 1 was on the existing practice of parallelizing loops. Parallelizing Fortran 90 intrinsics and array notation is one of the recommendations for inclusion in version 2 of OpenMP Fortran.
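
    Until then, such operations must be recast as explicit loops. For example, a sketch of the dot product above written with a REDUCTION clause (the sizes and data are hypothetical):

      program dot
      integer, parameter :: n = 1000
      real :: e(n), f(n), d
      integer :: i
      e = 1.0
      f = 2.0
      d = 0.0
!$omp parallel do private(i) reduction(+:d)
      do i = 1, n
         d = d + e(i) * f(i)
      enddo
!$omp end parallel do
      print *, d
      end program dot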

    Edits:

    None.