Abbreviated Contents

Detailed Contents

This FAQ Document

Document.01
What is this document for?

This document answers Frequently Asked Questions and provides background information about the OpenMP Architecture Review Board (ARB) and the OpenMP API.
The OpenMP ARB hopes that this material will help you understand the advantages of using the OpenMP API and of joining the OpenMP ARB.

Document.02
How is this document organized?

This document is organized as a series of questions and answers. It is structured in two sections: one with questions concerning the OpenMP
ARB, and one with questions concerning the OpenMP API.

Document.03
Who wrote this document?

This document is a major update of an earlier FAQ document. Contributors include:

  • Matthijs van Waveren, marketing coordinator of the OpenMP ARB;
  • members of the OpenMP Marketing Committee: Michael Wong (IBM), Jay Hoeflinger (Intel), Andy Fritsch (TI),
    Kathleen Mattson (Miller & Mattson), and Richard Friedman;
  • and others who reviewed this document or had the wherewithal to ask the questions in the first place.

Document.04
Where is the latest version of this document?

The latest version of this document may be found at
http://www.openmp.org/wp/openmp-faq/.

The OpenMP ARB Organization

OMPARB.01

What is the OpenMP ARB ?

The OpenMP ARB is the OpenMP Architecture Review Board. It is a
nonprofit organization set up to specify, manage, support, and promote the OpenMP API in the
supercomputing and other industries.

The strength of the OpenMP ARB comes from the diverse representation from across its member
companies, all working together to ensure that the OpenMP API continues to grow and provide
the stable basis for computing that it has provided for more than 15 years.
Any organization providing products or services which support or depend upon the OpenMP
API should consider becoming a member of the OpenMP ARB. Everyone is invited to participate,
regardless of means and experience.

OMPARB.02
How old is the OpenMP ARB ?

The OpenMP ARB was founded in 1997. It celebrated its 15th birthday at the SC12 conference in Salt Lake City.

OMPARB.03
Who are the members of the OpenMP ARB ?

The ARB is composed of permanent and auxiliary members. Permanent members are vendors who have a long-term interest in creating products for OpenMP. Auxiliary members are normally organizations with an interest in the standard but that do not create or sell OpenMP products. The list of permanent and auxiliary members can be found on the OpenMP website, http://www.openmp.org/.

OMPARB.04
What are the benefits of joining the OpenMP ARB?

One of the best reasons for joining the OpenMP ARB is the OpenMP API itself: an important industry standard with a commitment to innovate within the industry. Members of the OpenMP ARB enjoy the following benefits:

  • Developer resources. While the general public has access to the OpenMP Forum for help with using the API, members have
    additional insider access to the framers of the specification through a members-only wiki, email lists, and technical meetings.
  • Shape the future of the API. Members work together to shape the API specification, keeping it relevant in the changing
    technological landscape. Not only does this make for a better API for the community, it means you can safeguard and develop
    the aspects of the API on which your implementation depends.
  • Access to the API roadmap. Members have unique access to the roadmap of the API, allowing them to stay a step ahead of the
    competition in integrating OpenMP API support in their own implementations.
  • Co-marketing opportunities. The OpenMP ARB participates in events throughout the world, and members are invited to provide
    literature and demos of their OpenMP API implementations, and to participate in user-facing presentations.
  • Use the OpenMP logo. Members may use the OpenMP trademark and logo to promote their membership in the organization. They may use the OpenMP trademark
    to promote their OpenMP API-supported products and services.
  • Network in the OpenMP community. The OpenMP ARB consists of many individuals from more than two dozen influential
    commercial and research organizations. Join this active community.

OMPARB.05
Which subcommittees are there in the OpenMP ARB ?

The OpenMP ARB today has the following subcommittees:

  • Accelerator subcommittee. This subcommittee deals with the development of mechanisms to describe regions of code where data and/or computation should be moved to another computing device.
  • Error Model subcommittee. This subcommittee defines error handling capabilities to improve the resiliency and stability of OpenMP applications in the presence of system-level, runtime-level, and user-defined errors. Features to abort parallel OpenMP execution cleanly have been defined, based on conditional cancellation and user-defined cancellation points.
  • Task subcommittee. This subcommittee deals with the tasking model in the OpenMP API.
  • Tools subcommittee. This subcommittee deals with the tools used with OpenMP.
  • Affinity subcommittee. This subcommittee deals with the control of OpenMP thread affinity.
  • Fortran 2003 subcommittee. This subcommittee deals with the support of Fortran 2003 features.

OMPARB.06
What is IWOMP ?

The International Workshop on OpenMP (IWOMP) is an annual workshop dedicated to the promotion and advancement of all aspects of parallel programming with OpenMP. It is the premier forum to present and discuss issues, trends, recent research ideas and results related to parallel programming with OpenMP. The international workshop affords an opportunity for OpenMP users as well as developers to come together for discussions and sharing new ideas and information on this topic.

The location of the workshop rotates between the USA, Europe and Asia. IWOMP is held annually in September. This means that you can expect the Call for Proposals in February and the paper acceptance in April. The website of the most recent IWOMP can be viewed at www.iwomp.org.

OMPARB.07
What is the cost of membership ?

For membership in the OpenMP ARB, your organization pays a one-time initiation fee of
US$5,000, plus an annual fee of US$3,000. Dues are on a calendar-year basis and are not prorated.

In addition to this financial cost, Language Committee members are encouraged to attend
weekly Language Committee teleconferences and optional subgroup teleconferences, travel
to three face-to-face meetings per year, prepare presentations, collaborate, and contribute work
to further the group’s technical documents.

The face-to-face meeting venues are chosen based on the location of OpenMP members,
users, researchers, and communities. As OpenMP is a global community, these meetings
are distributed globally: at this time one is in Europe and another is in North America. As the
global membership increases, we expect to host meetings in Asia, Oceania, and South America.

When you join as a member, you can invite OpenMP to your location and host a meeting of
the OpenMP experts at a face-to-face meeting, while demonstrating your product, facility, and
local hospitality.

OMPARB.08
How do I join the OpenMP ARB?

Contact the OpenMP ARB at the e-mail address info@openmp.org
to obtain a membership form.

OMPARB.09
How do I contact the OpenMP ARB for more information ?

The OpenMP ARB can be contacted in several ways. For general information, including other means of contacting the ARB,
please see OpenMP’s Web Site at:

http://www.openmp.org/

General questions can be emailed to:
info@openmp.org

Technical questions can be posted on the OpenMP Forum; you need to register on the Forum to post.
The Forum has the following discussions:

  • Using OpenMP
  • OpenMP 4.0 API Specification
  • OpenMP 3.1 API Specification
  • Using OpenMP – The Book and Examples

OMPARB.10
Where do I find more information on the OpenMP ARB ?

The website http://www.openmp.org/ contains information on membership, specifications, books, tutorials, compilers, users, and more.

The OpenMP API

General

OMPAPI.General.01
What is OpenMP ?

OpenMP is a specification for a set of compiler directives, library routines, and
environment variables that can be used to specify high-level parallelism
in Fortran and C/C++ programs.
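
For illustration, here is a minimal sketch in C (the messages and variable names are our own) that uses all three ingredients: a directive, runtime library routines, and an environment variable. With most compilers it is built with an OpenMP flag such as -fopenmp and run with, for example, OMP_NUM_THREADS=4.

    #include <stdio.h>
    #include <omp.h>                      /* OpenMP runtime library routines */

    int main(void)
    {
        /* The parallel directive creates a team of threads; the team size
           can be set with the OMP_NUM_THREADS environment variable or with
           omp_set_num_threads(). */
        #pragma omp parallel
        {
            printf("Hello from thread %d of %d\n",
                   omp_get_thread_num(), omp_get_num_threads());
        }
        return 0;
    }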

OMPAPI.General.02
What problem does OpenMP solve ?

In order to parallelize a code, programmers look for regions of code whose
instructions can be shared among the processors. Much of the time, they focus
on distributing the work in loop nests to the processors.
In most programs, code executed on one processor requires results that have been
calculated on another one. In principle, this is not a problem because a value
produced by one processor can be stored in main memory and retrieved from
there by code running on other processors as needed. However, the programmer
needs to ensure that the value is retrieved after it has been produced, that is,
that the accesses occur in the required order. Since the processors operate
independently of one another, this is a nontrivial difficulty: their clocks are not
synchronized, and they can and do execute their portions of the code at slightly
different speeds.

To solve this problem, the vendors of SMPs in the 1980s provided special notation to specify
how the work of a program was to be parceled out to the individual processors of
an SMP, as well as to enforce an ordering of accesses by different threads to shared
data. The notation mainly took the form of special instructions, or directives, that
could be added to programs written in sequential languages, especially Fortran.
The compiler used this information to create the actual code for execution by each
processor. Although this strategy worked, it had the obvious deficiency that a
program written for one SMP did not necessarily execute on another one.

In order to solve this problem of non-standardization, OpenMP was defined by the OpenMP ARB, a group of vendors who joined forces during the
latter half of the 1990s to provide a common means for programming a broad
range of SMP architectures. The first version, consisting of a set of directives that could be used with Fortran, was
introduced to the public in late 1997.

(Quote from Using OpenMP: Portable Shared Memory Parallel Programming, by Barbara Chapman,
Gabriele Jost and Ruud van der Pas, MIT Press, Cambridge, MA, 2008)

In subsequent versions, OpenMP was extended so that DSP systems and accelerators
could also be programmed.

OMPAPI.General.03
Why should I use OpenMP ?

There are several reasons why you should use OpenMP: (a) OpenMP is the most widely used standard for SMP systems;
it supports three languages (Fortran, C, and C++) and has been implemented by many vendors. (b) OpenMP is a
relatively small and simple specification, and it supports incremental parallelism. (c) A lot of research is done
on OpenMP, keeping it up to date with the latest hardware developments.

OMPAPI.General.04
Which compilers support OpenMP ?

A number of compilers from various vendors or open source communities implement the OpenMP API. The full list of compilers can be found
on the OpenMP website, http://www.openmp.org/.

OMPAPI.General.05
Who uses OpenMP ?

The users of OpenMP work in industry and academia, in fields ranging from aeronautics, automotive, and pharmaceutics to finance,
and on devices ranging from accelerators and embedded multicore systems to high-end supercomputing systems.
cOMPunity is the community of OpenMP
researchers and developers in academia and industry. It is a forum for the dissemination
and exchange of information about OpenMP.

The OpenMP ARB has also started compiling a list of users of OpenMP.

OMPAPI.General.06
What languages does OpenMP support ?

OpenMP is designed for Fortran, C and C++. OpenMP can be supported by compilers that
support one of Fortran 77, Fortran 90, Fortran 95, Fortran 2003, ANSI C89 or ANSI C++, but the OpenMP specification
does not introduce any constructs that require specific Fortran 90 or C++ features.

OMPAPI.General.07
Is OpenMP scalable ?

OpenMP can deliver scalability for applications using shared-memory parallel programming. Significant effort was spent to ensure that
OpenMP can be used for scalable applications. However, ultimately, scalability is a property of the application and the algorithms used. The
parallel programming language can only support scalability by providing constructs that simplify the specification of the
parallelism and can be implemented with low overhead by compiler vendors. OpenMP certainly delivers these kinds of constructs.

OMPAPI.General.08
Where can I get tutorials on using the API ?

You can find lists of tutorials on the OpenMP web site, and on the blog of Christian Terboven.

OMPAPI.General.09
Where do I find more information on using the API ?

There are several sources of information on using the API:

  • The book Using OpenMP provides an introduction to parallel programming and to OpenMP.
    It covers the language, the performance of OpenMP programs, common sources of errors, OpenMP implementation issues, and scalability via
    nested parallelism and combined OpenMP/MPI programs. The book has examples and a Forum associated with it.
  • Lawrence Livermore National Laboratory published an OpenMP tutorial.
  • Other resources can be found on the OpenMP website, http://www.openmp.org/.

    Contents of the API

    OMPAPI.Contents.01
    Can I use loop-level parallelism ?

    OpenMP fully supports loop-level parallelism. Loop-level parallelism is useful for applications
    which have lots of coarse loop-level parallelism, especially those that will never be run on large
    numbers of processors or for which restructuring the source code is either impractical or disallowed.
    Typically, though, the amount of loop-level parallelism in an application is limited, and this in turn
    limits the scalability of the application.

    OpenMP allows you to use loop-level parallelism as a way to start scaling your application for
    multiple processors, but then move into coarser grain parallelism, while maintaining the value of
    your earlier investment. This incremental development strategy avoids the all-or-none risks involved
    in moving to message-passing or other parallel programming models.
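
    As a minimal sketch of loop-level parallelism (the array names and size are ours), the parallel for directive below divides the iterations of the loop among the threads of the team; the iterations must be independent of one another.

        #define N 1000000
        static double a[N], b[N], c[N];

        void add_vectors(void)
        {
            /* The iterations are divided among the threads; the loop
               variable i is private to each thread. */
            #pragma omp parallel for
            for (int i = 0; i < N; i++)
                c[i] = a[i] + b[i];
        }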

    OMPAPI.Contents.02
    Can I use nested parallelism ?

    If a thread in a team executing a parallel region
    encounters another parallel construct, it creates a new team and becomes the master of that team.
    This is generally referred to in OpenMP as nested parallelism, and it is supported by the OpenMP specification.
    Certain recursive algorithms can take advantage of nested parallelism in a natural way.

    (Quote from Using OpenMP: Portable Shared Memory Parallel Programming, by Barbara Chapman,
    Gabriele Jost and Ruud van der Pas, MIT Press, Cambridge, MA, 2008)
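
    A minimal sketch of nested parallelism (the thread counts are chosen only for illustration): each thread of the outer team becomes the master of its own inner team. Nested parallelism is disabled by default; it is enabled here with omp_set_nested(), and the OMP_NESTED environment variable can be used instead.

        #include <stdio.h>
        #include <omp.h>

        int main(void)
        {
            omp_set_nested(1);                 /* enable nested parallelism */

            #pragma omp parallel num_threads(2)
            {
                int outer = omp_get_thread_num();

                /* Each outer thread creates its own inner team here. */
                #pragma omp parallel num_threads(2)
                printf("outer thread %d, inner thread %d\n",
                       outer, omp_get_thread_num());
            }
            return 0;
        }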

    OMPAPI.Contents.03
    Can I use task parallelism ?

    Parallelization strategies that parcel out pieces of work to different threads are generally
    referred to as task parallelism. This is supported by the OpenMP specification.

    (Quote from Using OpenMP: Portable Shared Memory Parallel Programming, by Barbara Chapman,
    Gabriele Jost and Ruud van der Pas, MIT Press, Cambridge, MA, 2008)
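
    A common sketch of task parallelism (the list type and the process() function are hypothetical): one thread walks a linked list and creates a task for each node, and the tasks are executed by the threads of the team.

        typedef struct node { int value; struct node *next; } node_t;

        void process(node_t *p);                /* hypothetical per-node work */

        void traverse(node_t *head)
        {
            #pragma omp parallel
            #pragma omp single                  /* one thread creates the tasks */
            for (node_t *p = head; p != NULL; p = p->next) {
                #pragma omp task firstprivate(p)
                process(p);                     /* each node is handled by a task */
            }
            /* the implicit barrier at the end of the parallel region
               waits for all tasks to complete */
        }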

    OMPAPI.Contents.04
    Is it better to parallelize the outer loop ?

    In general it should be fastest to parallelize the outer loop only, provided there is sufficient
    parallelism to keep the threads busy and load balanced. Parallelizing the inner loop only adds an
    overhead for every parallel region encountered which (although dependent on the implementation and
    the number of threads) is typically of the order of tens of microseconds.
    Parallelizing both loops is unlikely to be the most efficient solution, except maybe in some corner cases.

    Sometimes, if the outer loop doesn’t have much parallelism but the inner loop does, there is the option to put the
    #pragma omp parallel outside the outer loop, and the #pragma omp for before the inner loop. This amortizes
    the overhead of creating the parallelism outside the outer loop, runs the outer loop ‘redundantly’ on all
    threads, and work-shares the iterations of the inner loop, which can be better than parallelizing either
    the outer or the inner loop alone.
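
    A sketch of that pattern (the array and its dimensions are illustrative): the parallel region encloses the whole loop nest, the outer loop runs redundantly on every thread, and only the inner loop iterations are work-shared.

        #define NI 4                    /* little parallelism in the outer loop */
        #define NJ 1000000              /* plenty of parallelism in the inner loop */
        static double a[NI][NJ];

        void scale(double factor)
        {
            #pragma omp parallel        /* one parallel region for the whole nest */
            for (int i = 0; i < NI; i++) {
                #pragma omp for         /* inner iterations shared among the threads */
                for (int j = 0; j < NJ; j++)
                    a[i][j] *= factor;
                /* implicit barrier at the end of the for construct keeps
                   the threads in step before the next outer iteration */
            }
        }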

    OMPAPI.Contents.05
    Can I use OpenMP to program accelerators ?

    OpenMP provides mechanisms to describe regions of code where data and/or computation should be moved to another computing device. OpenMP supports a broad array of accelerators, and plans to continue adding new features based on user feedback.
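
    A sketch of the OpenMP 4.0 target construct (the function and array names are ours): the map clauses describe which data moves to and from the device, and the enclosed loop is executed there; if no device is available, the region runs on the host.

        #define N 100000

        void saxpy(float alpha, float *x, float *y)
        {
            /* Copy x to the device, copy y both ways, and run the loop there. */
            #pragma omp target map(to: x[0:N]) map(tofrom: y[0:N])
            #pragma omp parallel for
            for (int i = 0; i < N; i++)
                y[i] += alpha * x[i];
        }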

    OMPAPI.Contents.06
    Can I use OpenMP to program SIMD units ?

    OpenMP offers industry-first high-level support for vector parallelism. A loop can be transformed
    into a SIMD loop (that is, multiple iterations of the loop can be executed concurrently
    using SIMD vector instructions). OpenMP also has constructs to specify that functions (C, C++ and Fortran) or
    subroutines (Fortran) should be compiled into SIMD versions that can be invoked from such loops.
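
    A sketch of both kinds of construct (the function and array names are ours): the simd directive marks the loop for vectorization, and declare simd asks the compiler to also generate a SIMD version of the called function.

        /* Generate a vector version of this function for use in SIMD loops. */
        #pragma omp declare simd
        float madd(float a, float b, float c) { return a * b + c; }

        void fma_arrays(int n, float *x, float *y, float *z)
        {
            /* Multiple iterations may execute concurrently in SIMD lanes. */
            #pragma omp simd
            for (int i = 0; i < n; i++)
                z[i] = madd(x[i], y[i], z[i]);
        }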

    OMPAPI.Contents.07
    What type of reductions are possible ?

    OpenMP provides the reduction clause for specifying some form of recurrence calculations so that they can be performed in parallel without code modification.
    In OpenMP 3.1, OpenMP supports reductions with base language operators and intrinsic procedures. With OpenMP 4.0, user-defined reductions are also supported.

    (The first sentence is a quote from Using OpenMP: Portable Shared Memory Parallel Programming, by Barbara Chapman,
    Gabriele Jost and Ruud van der Pas, MIT Press, Cambridge, MA, 2008)
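
    A sketch of the reduction clause (the function and data are illustrative): each thread accumulates into private copies of sum and m, and the partial results are combined when the loop ends. The min and max operators for C and C++ were added in OpenMP 3.1; OpenMP 4.0 additionally allows user-defined reductions via the declare reduction directive.

        #define N 1000

        double sum_and_max(const double *a, double *maxval)
        {
            double sum = 0.0, m = a[0];

            /* Each thread gets private copies of sum and m; the partial
               results are combined at the end of the loop. */
            #pragma omp parallel for reduction(+:sum) reduction(max:m)
            for (int i = 0; i < N; i++) {
                sum += a[i];
                if (a[i] > m) m = a[i];
            }

            *maxval = m;
            return sum;
        }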

    Variations Between OpenMP Releases

    OMPAPI.Versions.01
    How does Version 4.0 of OpenMP differ from Version 3.1 ?

    OpenMP 4.0 was released in July 2013, 2 years after the release of OpenMP version 3.1. The new features included in OpenMP 4.0 are:

    • Support for accelerators. The OpenMP 4.0 specification effort included significant participation by all the major vendors in order to support a wide variety of compute devices. OpenMP provides mechanisms to describe regions of code where data and/or computation should be moved to another computing device. Several prototypes for the accelerator proposal have already been implemented.
    • SIMD constructs to vectorize both serial as well as parallelized loops. With the advent of SIMD units in all major processor chips, portable support for accessing them is essential. OpenMP 4.0 provides mechanisms to describe when multiple iterations of the loop can be executed concurrently using SIMD instructions and to describe how to create versions of functions that can be invoked across SIMD lanes.
    • Error handling. OpenMP 4.0 defines error handling capabilities to improve the resiliency and stability of OpenMP applications in the presence of system-level, runtime-level, and user-defined errors. Features to abort parallel OpenMP execution cleanly have been defined, based on conditional cancellation and user-defined cancellation points.
    • Thread affinity. OpenMP 4.0 provides mechanisms to define where to execute OpenMP threads. Platform-specific data and algorithm-specific properties are separated, offering a deterministic behavior and simplicity in use. The advantages for the user are better locality, less false sharing and more memory bandwidth.
    • Tasking extensions. OpenMP 4.0 provides several extensions to its task-based parallelism support. Tasks can be grouped to support deep task synchronization and task groups can be aborted to reflect completion of cooperative tasking activities such as search. Task-to-task synchronization is now supported through the specification of task dependency.
    • Support for Fortran 2003. The Fortran 2003 standard adds many modern computer language features. Having these features in the specification allows users to parallelize Fortran 2003 compliant programs. This includes interoperability of Fortran and C, which is one of the most popular features in Fortran 2003.
    • User-defined reductions. Previously, OpenMP only supported reductions with base language operators and intrinsic procedures. With OpenMP 4.0, user-defined reductions are now also supported.
    • Sequentially consistent atomics. A clause has been added to allow a programmer to enforce sequential consistency when a specific storage location is accessed atomically.
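
    As a small sketch of the tasking extensions listed above (the variables are ours), task dependences let the consumer task wait only for the producer task that writes the value it needs:

        #include <stdio.h>

        int main(void)
        {
            int x = 0, y = 0;

            #pragma omp parallel
            #pragma omp single
            {
                #pragma omp task depend(out: x)                 /* producer */
                x = 42;

                #pragma omp task depend(in: x) depend(out: y)   /* consumer */
                y = x + 1;

                #pragma omp taskwait         /* wait for both tasks before reading y */
                printf("y = %d\n", y);
            }
            return 0;
        }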

    OMPAPI.Versions.02
    How did Version 3.1 of OpenMP differ from Version 3.0 ?

    OpenMP version 3.1 was released in July 2011, 3 years after the release of OpenMP version 3.0.
    The major changes between these versions are:

    • The final and mergeable clauses were added to
      the task construct to support optimization of task data environments.
    • The taskyield construct was added to allow user-defined
      task switching points.
    • The atomic construct was extended to include
      read, write, and capture forms, and an update clause was added to apply
      the already existing form of the atomic construct.
    • Data environment restrictions were changed to allow intent(in) and const-qualified
      types for the firstprivate clause.
    • Data environment restrictions were changed to allow Fortran pointers in
      firstprivate and lastprivate.
    • New reduction operators min and max were added for C and C++.
    • The nesting restrictions in Section 2.10 of the specification were clarified to disallow
      closely-nested OpenMP regions within an atomic region. This allows an atomic
      region to be consistently defined with other OpenMP regions so that they include all
      the code in the atomic construct.
    • The omp_in_final runtime library routine was
      added to support specialization of final task regions.
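
    As a small sketch of the extended atomic construct (the variable names are ours), the capture form updates a shared counter and captures the new value in a single atomic operation, so every thread receives a unique ticket:

        #include <stdio.h>

        int main(void)
        {
            int counter = 0;

            #pragma omp parallel
            {
                int ticket;

                #pragma omp atomic capture
                ticket = ++counter;          /* update and read in one atomic step */

                printf("got ticket %d\n", ticket);
            }
            return 0;
        }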

    OMPAPI.Versions.03
    How did Version 3.0 of OpenMP differ from Version 2.5 ?

    OpenMP version 3.0 was released in May 2008, 3 years after the release of OpenMP version 2.5.
    The major changes between these versions are:

    • The concept of tasks has been added to the OpenMP execution model.
    • The task construct has been added, which provides a
      mechanism for creating tasks explicitly.
    • The taskwait construct has been added, which
      causes a task to wait for all its child tasks to complete.
    • The OpenMP memory model now covers atomicity of memory accesses.
      The description of the behavior of volatile in terms of
      flush was removed.
    • In Version 2.5, there was a single copy of the nest-var, dyn-var, nthreads-var and
      run-sched-var internal control variables (ICVs) for the whole program. In Version
      3.0, there is one copy of these ICVs per task. As a result,
      the omp_set_num_threads, omp_set_nested and omp_set_dynamic
      runtime library routines now have specified effects when called from inside a
      parallel region.
    • The definition of active parallel region has been changed: in Version 3.0 a
      parallel region is active if it is executed by a team consisting of more than one
      thread.
    • The rules for determining the number of threads used in a parallel region have
      been modified.
    • In Version 3.0, the assignment of iterations to threads in a loop construct with a
      static schedule kind is deterministic.
    • In Version 3.0, a loop construct may be associated with more than one perfectly
      nested loop. The number of associated loops may be controlled by the collapse
      clause.
    • Random access iterators, and variables of unsigned integer type, may now be used as
      loop iterators in loops associated with a loop construct.
    • The schedule kind auto has been added, which gives the implementation the
      freedom to choose any possible mapping of iterations in a loop construct to threads in
      the team.
    • Fortran assumed-size arrays now have predetermined data-sharing attributes.
    • In Fortran, firstprivate is now permitted as an argument to the default
      clause.
    • For list items in the private clause, implementations are no longer permitted to use
      the storage of the original list item to hold the new list item on the master thread. If
      no attempt is made to reference the original list item inside the parallel region, its
      value is well defined on exit from the parallel region.
    • In Version 3.0, Fortran allocatable arrays may appear in private,
      firstprivate, lastprivate, reduction, copyin and copyprivate
      clauses.
    • In Version 3.0, static class member variables may appear in a threadprivate
      directive.
    • Version 3.0 makes clear where, and with which arguments, constructors and
      destructors of private and threadprivate class type variables are called.
    • The runtime library routines omp_set_schedule and omp_get_schedule
      have been added; these routines respectively set and retrieve the value of the
      run-sched-var ICV.
    • The thread-limit-var ICV has been added, which controls the maximum number of
      threads participating in the OpenMP program. The value of this ICV can be set with
      the OMP_THREAD_LIMIT environment variable and retrieved with the
      omp_get_thread_limit runtime library routine.
    • The max-active-levels-var ICV has been added, which controls the number of nested
      active parallel regions. The value of this ICV can be set with the
      OMP_MAX_ACTIVE_LEVELS environment variable and the
      omp_set_max_active_levels runtime library routine, and it can be retrieved
      with the omp_get_max_active_levels runtime library routine.
    • The stacksize-var ICV has been added, which controls the stack size for threads that
      the OpenMP implementation creates. The value of this ICV can be set with the
      OMP_STACKSIZE environment variable.
    • The wait-policy-var ICV has been added, which controls the desired behavior of
      waiting threads. The value of this ICV can be set with the OMP_WAIT_POLICY
      environment variable.
    • The omp_get_level runtime library routine has been added, which returns the
      number of nested parallel regions enclosing the task that contains the call.
    • The omp_get_ancestor_thread_num runtime library routine has been added,
      which returns, for a given nested level of the current thread, the thread number of the
      ancestor.
    • The omp_get_team_size runtime library routine has been added, which returns,
      for a given nested level of the current thread, the size of the thread team to which the
      ancestor belongs.
    • The omp_get_active_level runtime library routine has been added, which
      returns the number of nested, active parallel regions enclosing the task that
      contains the call.
    • In Version 3.0, locks are owned by tasks, not by threads.
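
    Two of the Version 3.0 additions listed above, the collapse clause and the auto schedule kind, can be sketched as follows (the array and its dimensions are ours): the two perfectly nested loops form a single iteration space of NI*NJ iterations, and the implementation is free to choose how to map those iterations to threads.

        #define NI 100
        #define NJ 200
        static double a[NI][NJ];

        void init(void)
        {
            /* The two loops are collapsed into one iteration space and the
               schedule is left to the implementation. */
            #pragma omp parallel for collapse(2) schedule(auto)
            for (int i = 0; i < NI; i++)
                for (int j = 0; j < NJ; j++)
                    a[i][j] = (double)(i + j);
        }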

    Relation to other standards

    OMPAPI.Relatives.01
    How does OpenMP relate to OpenACC ?

    OpenMP and OpenACC are actively working to merge their specifications while both continue to evolve. A first step
    toward merging was made with the release of OpenMP 4.0. OpenACC implementations can be considered
    a beta test of the OpenMP accelerator support; they provide early implementation experience.

    OpenACC has been created and implemented by several members of the OpenMP ARB in order to
    address their immediate customer needs. These members are NVIDIA, PGI, Cray, and CAPS.

    OMPAPI.Relatives.02
    How does OpenMP compare with MPI ?

    Message-passing has become accepted as a portable style of parallel programming,
    but has several significant weaknesses that limit its effectiveness and scalability.
    Message-passing in general is difficult to program and doesn’t support incremental
    parallelization of an existing sequential program. Message-passing was initially
    defined for client/server applications running across a network, and so includes costly
    semantics (including message queuing and selection and the assumption of wholly separate
    memories) that are often not required by tightly-coded scientific applications running
    on modern scalable systems with globally addressable and cache coherent distributed memories.

    OMPAPI.Relatives.03
    How does OpenMP compare with Pthreads ?

    Pthreads has never been targeted toward the technical/HPC market. This is reflected in its
    minimal Fortran support and its lack of support for data parallelism. Even for C applications,
    Pthreads requires programming at a level lower than most technical developers would prefer.

    OMPAPI.Relatives.04
    How does OpenMP compare with MIPI ?

    The Mobile Industry Processor Interface (MIPI) is addressing a range of debug interface
    efforts for multi-core devices. However, its specifications are focused on mobile devices
    and not multi-core processors in general.

    (Quote from: Kenn R. Luecke (2012). Software Development for Parallel and Multi-Core Processing, Embedded Systems
    in High Performance Systems, Applications and Projects; Dr. Kiyofumi Tanaka (Ed.), ISBN: 978-953-51-0350-9,
    Available from Intech.)

     

    Version 2.0

    Last updated: November 12, 2013 MvW