Document.01: What is this document for ?
Document.02: How is this document organized ?
Document.03: Who wrote this document ?
Document.04: Where is the latest version of this document ?
OMPARB.01: What is the OpenMP ARB ?
OMPARB.02: How old is the OpenMP ARB ?
OMPARB.03: Who are the members of the OpenMP ARB ?
OMPARB.04: What are the benefits of joining the OpenMP ARB ?
OMPARB.05: What is the cost of membership ?
OMPARB.06: How do I join the OpenMP ARB ?
OMPARB.07: How do I contact the OpenMP ARB for more information ?
OMPARB.08: Where do I find more information on the OpenMP ARB ?
OMPAPI.General.01: What is OpenMP ?
OMPAPI.General.02: What problem does OpenMP solve ?
OMPAPI.General.03: Why should I use OpenMP ?
OMPAPI.General.04: Which compilers support OpenMP ?
OMPAPI.General.05: Who uses OpenMP ?
OMPAPI.General.06: What languages does OpenMP support ?
OMPAPI.General.07: Is OpenMP scalable ?
OMPAPI.General.08: Where do I find more information on using the API ?
OMPAPI.Contents.01: Can I use loop-level parallelism ?
OMPAPI.Contents.02: Can I use nested parallelism ?
OMPAPI.Contents.03: Can I use task parallelism ?
OMPAPI.Contents.04: Is it better to parallelize the outer loop ?
Variations Between OpenMP Releases
OMPAPI.Versions.02: How does Version 3.1 of OpenMP differ from Version 3.0 ?
OMPAPI.Versions.03: How did Version 3.0 of OpenMP differ from Version 2.5 ?
OMPAPI.Relatives.01: How does OpenMP relate to OpenACC ?
OMPAPI.Relatives.02: How does OpenMP compare with MPI ?
OMPAPI.Relatives.03: How does OpenMP compare with Pthreads ?
OMPAPI.Relatives.04: How does OpenMP compare with MIPI ?
This document answers Frequently Asked Questions and provides background information about the OpenMP Architecture Review Board (ARB) and the OpenMP API. The OpenMP ARB hopes that this material will help you understand the advantages of using the OpenMP API and of joining the OpenMP ARB.
This document is organized as a series of questions and answers. It is structured in two sections: one with questions concerning the OpenMP ARB, and one with questions concerning the OpenMP API.
This document is a major update of an earlier FAQ document. Contributors include:
The latest version of this document may be found at http://www.openmp.org/openmp-faq.html.
The OpenMP ARB is the OpenMP Architecture Review Board. It is a nonprofit organization set up to specify, manage, support, and promote the OpenMP API in the supercomputing and other industries.
The strength of the OpenMP ARB comes from the diverse representation from across its member companies, all working together to ensure that the OpenMP API continues to grow and provide the stable basis for computing that it has provided for more than 15 years. Any organization providing products or services which support or depend upon the OpenMP API should consider becoming a member of the OpenMP ARB. Everyone is invited to participate, regardless of means and experience.
The OpenMP ARB was founded in 1997. It celebrated its 15th birthday at the SC12 conference in Salt Lake City.
The ARB is composed of permanent and auxiliary members. Permanent members are vendors who have a long-term interest in creating products for OpenMP. Auxiliary members are normally organizations with an interest in the standard but that do not create or sell OpenMP products. The list of permanent and auxiliary members can be found here.
One of the best reasons for joining the OpenMP ARB is the OpenMP API itself, an important industry standard with a commitment to innovate within the industry. Members of the OpenMP ARB enjoy the following benefits:
For membership in the OpenMP ARB, your organization pays a one-time initiation fee of US$5,000, plus an annual fee of US$3,000. Dues are on a calendar-year basis and are not prorated.
In addition to this financial cost, Language Committee members are encouraged to attend weekly Language Committee teleconferences and optional subgroup teleconferences, travel to three face-to-face meetings per year, prepare presentations, collaborate, and contribute work to further the group's technical documents.
The face-to-face meeting venues are chosen based on the location of OpenMP members, users, researchers, and communities. As OpenMP is a global community, these meetings are distributed globally: at this time one is in Europe and another is in North America. As the global membership increases, we expect to host meetings in Asia, Oceania, and South America.
When you join as a member, you can invite OpenMP to your location and host a face-to-face meeting of the OpenMP experts, while demonstrating your product, facility, and local hospitality.
Contact the OpenMP ARB at the e-mail address info@openmp.org to obtain a membership form.
The OpenMP ARB can be contacted in several ways. For general information, including other means of contacting the ARB, please see OpenMP's Web Site at:
General questions can be emailed to:
info@openmp.org
Technical questions can be posted on the OpenMP Forum. You need to register on this Forum.
The Forum has the following discussions:
The website http://www.openmp.org/ contains information on membership, specifications, books, tutorials, compilers, users, and more.
OpenMP is a specification for a set of compiler directives, library routines, and environment variables that can be used to specify shared memory parallelism in Fortran and C/C++ programs.
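As a rough illustration of the three ingredients named above, the following C sketch combines a compiler directive (the parallel construct) with two runtime library routines (omp_get_thread_num and omp_get_num_threads). The function name team_size and the serial fallback stubs are assumptions made for this example and are not part of the OpenMP API; the stubs only allow the file to build when OpenMP support is switched off in the compiler.

```c
#ifdef _OPENMP
#include <omp.h>
#else
/* Serial fallbacks (an assumption of this sketch) so that the file also
   builds with a compiler that has OpenMP support disabled. */
static int omp_get_num_threads(void) { return 1; }
static int omp_get_thread_num(void) { return 0; }
#endif

/* Hypothetical helper: returns the size of the team that executed
   the parallel region. */
int team_size(void)
{
    int size = 0;

    /* Compiler directive: the block below is executed by a team of threads. */
    #pragma omp parallel
    {
        /* Library routines: only thread 0 records the team size,
           so there is a single writer to the shared variable. */
        if (omp_get_thread_num() == 0) {
            size = omp_get_num_threads();
        }
    }
    return size;
}
```

With an OpenMP-enabled compiler, the third ingredient comes into play at run time: the size of the team can typically be steered through the OMP_NUM_THREADS environment variable, without recompiling the program.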
In order to parallelize a code, programmers look for regions of code whose instructions can be shared among the processors. Much of the time, they focus on distributing the work in loop nests to the processors. In most programs, code executed on one processor requires results that have been calculated on another one. In principle, this is not a problem because a value produced by one processor can be stored in main memory and retrieved from there by code running on other processors as needed. However, the programmer needs to ensure that the value is retrieved after it has been produced, that is, that the accesses occur in the required order. Since the processors operate independently of one another, this is a nontrivial difficulty: their clocks are not synchronized, and they can and do execute their portions of the code at slightly different speeds.
To solve this problem, the vendors of SMPs in the 1980s provided special notation to specify how the work of a program was to be parceled out to the individual processors of an SMP, as well as to enforce an ordering of accesses by different threads to shared data. The notation mainly took the form of special instructions, or directives, that could be added to programs written in sequential languages, especially Fortran. The compiler used this information to create the actual code for execution by each processor. Although this strategy worked, it had the obvious deficiency that a program written for one SMP did not necessarily execute on another one.
In order to solve this problem of non-standardization, OpenMP was defined by the OpenMP ARB, a group of vendors who joined forces during the latter half of the 1990s to provide a common means for programming a broad range of SMP architectures. The first version, consisting of a set of directives that could be used with Fortran, was introduced to the public in late 1997.
(Quote from Using OpenMP: Portable Shared Memory Parallel Programming, by Barbara Chapman, Gabriele Jost and Ruud van der Pas, MIT Press, Cambridge, MA, 2008)
There are several reasons why you should use OpenMP: (a) OpenMP is the most widely used standard for SMP systems; it supports three languages (Fortran, C, and C++) and has been implemented by many vendors. (b) OpenMP is a relatively small and simple specification, and it supports incremental parallelism. (c) A lot of research is done on OpenMP, keeping it up to date with the latest hardware developments.
A number of compilers from various vendors or open source communities implement the OpenMP API. The full list of compilers can be found here.
The users of OpenMP work in industry and academia, in fields ranging from aeronautics, automotive, and pharmaceutics to finance, and on devices ranging from embedded multicore systems to high-end supercomputing systems. cOMPunity is the community of OpenMP researchers and developers in academia and industry. It is a forum for the dissemination and exchange of information about OpenMP.
The OpenMP ARB has also started compiling a list of users of OpenMP.
OpenMP is designed for Fortran, C and C++. OpenMP can be supported by compilers that support one of Fortran 77, Fortran 90, Fortran 95, Fortran 2003, ANSI 89 C or ANSI C++, but the OpenMP specification does not introduce any constructs that require specific Fortran 90 or C++ features.
OpenMP can deliver scalability for applications using shared-memory parallel programming. Significant effort was spent to ensure that OpenMP can be used for scalable applications. However, ultimately, scalability is a property of the application and the algorithms used. The parallel programming language can only support the scalability by providing constructs that simplify the specification of the parallelism and can be implemented with low overhead by compiler vendors. OpenMP certainly delivers these kinds of constructs.
There are several sources of information on using the API:
OpenMP fully supports loop-level parallelism. Loop-level parallelism is useful for applications which have lots of coarse loop-level parallelism, especially those that will never be run on large numbers of processors or for which restructuring the source code is either impractical or disallowed. Typically, though, the amount of loop-level parallelism in an application is limited, and this in turn limits the scalability of the application.
OpenMP allows you to use loop-level parallelism as a way to start scaling your application for multiple processors, but then move into coarser grain parallelism, while maintaining the value of your earlier investment. This incremental development strategy avoids the all-or-none risks involved in moving to message-passing or other parallel programming models.
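A minimal sketch of this incremental approach in C is shown below. In both functions the serial loop is left unchanged and a single directive is added in front of it. The function names scaled_add and array_sum are hypothetical and chosen only for this example.

```c
/* y[i] = a * x[i] + y[i]. The only change from the serial version
   is the directive in front of the loop. */
void scaled_add(int n, float a, const float *x, float *y)
{
    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        y[i] = a * x[i] + y[i];
    }
}

/* Sum of an array. The reduction clause gives every thread a private
   partial sum and combines the partial sums when the loop ends. */
double array_sum(int n, const double *v)
{
    double total = 0.0;

    #pragma omp parallel for reduction(+:total)
    for (int i = 0; i < n; i++) {
        total += v[i];
    }
    return total;
}
```

A compiler without OpenMP support ignores the directives and produces the original serial program, which is one reason the earlier investment in the code is preserved.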
If a thread in a team executing a parallel region encounters another parallel construct, it creates a new team and becomes the master of that team. This is generally referred to in OpenMP as nested parallelism, and it is supported by the OpenMP specification. Certain recursive algorithms can take advantage of nested parallelism in a natural way.
(Quote from Using OpenMP: Portable Shared Memory Parallel Programming, by Barbara Chapman, Gabriele Jost and Ruud van der Pas, MIT Press, Cambridge, MA, 2008)
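The recursive case can be sketched in C as follows. Every call opens its own parallel region, so a call made from inside an enclosing region creates a nested team. The function name range_sum, the depth parameter, and the cutoff of two levels are assumptions made for this illustration; a suitable cutoff depends on the application and the machine.

```c
/* Recursively sums v[lo..hi) by splitting the range in two halves. */
long range_sum(const int *v, int lo, int hi, int depth)
{
    if (hi - lo <= 1) {
        return (hi > lo) ? v[lo] : 0;
    }

    int mid = lo + (hi - lo) / 2;
    long left = 0;
    long right = 0;

    /* A new team is requested at every recursion level. The if clause
       stops requesting multi-threaded teams below the chosen depth
       (the value 2 is an arbitrary assumption of this sketch). */
    #pragma omp parallel sections num_threads(2) if(depth < 2)
    {
        #pragma omp section
        left = range_sum(v, lo, mid, depth + 1);

        #pragma omp section
        right = range_sum(v, mid, hi, depth + 1);
    }
    return left + right;
}
```

Whether the inner regions actually receive more than one thread depends on the implementation and on whether nested parallelism is enabled, for example through the omp_set_nested routine. The result is the same either way.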
Parallelization strategies that parcel out pieces of work to different threads are generally referred to as task parallelism. This is supported by the OpenMP specification.
(Quote from Using OpenMP: Portable Shared Memory Parallel Programming, by Barbara Chapman, Gabriele Jost and Ruud van der Pas, MIT Press, Cambridge, MA, 2008)
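One way to express this in C is with the task and taskwait constructs, sketched below on the familiar Fibonacci recursion. The function names fib and fib_task are hypothetical, and the example is meant to show the mechanics only: creating a task for every call is far too fine-grained to be efficient in practice.

```c
/* Computes Fibonacci numbers by handing each recursive call to a task. */
static long fib_task(int n)
{
    if (n < 2) {
        return n;
    }

    long a = 0;
    long b = 0;

    /* Each recursive call becomes a piece of work that any thread
       in the team may pick up. */
    #pragma omp task shared(a) firstprivate(n)
    a = fib_task(n - 1);

    #pragma omp task shared(b) firstprivate(n)
    b = fib_task(n - 2);

    /* Wait for both child tasks before combining their results. */
    #pragma omp taskwait
    return a + b;
}

long fib(int n)
{
    long result = 0;

    #pragma omp parallel
    {
        /* One thread creates the root of the task tree; the other
           threads in the team execute tasks as they become available. */
        #pragma omp single
        result = fib_task(n);
    }
    return result;
}
```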
In general it should be fastest to parallelize the outer loop only, provided there is sufficient parallelism to keep the threads busy and load balanced. Parallelizing the inner loop only adds an overhead for every parallel region encountered which (although dependent on the implementation and the number of threads) is typically of the order of tens of microseconds. Parallelizing both loops is unlikely to be the most efficient solution, except maybe in some corner cases.
Sometimes, if the outer loop doesn't have much parallelism but the inner loop does, there is the option to put the #pragma omp parallel outside the outer loop, and the #pragma omp for before the inner loop. This amortizes the overhead of creating the parallelism outside the outer loop, runs the outer loop 'redundantly' on all threads, and work-shares the iterations of the inner loop, which can be better than parallelizing either the outer or the inner loop alone.
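The arrangement with the parallel construct outside the outer loop and the loop construct before the inner loop might look like the following C sketch. The function scale_rows and its row-major matrix layout are assumptions made for this example.

```c
/* Multiplies every element in row i of a rows x cols matrix
   (stored row by row) by factor[i]. */
void scale_rows(int rows, int cols, double *m, const double *factor)
{
    /* The team of threads is created once, outside the outer loop. */
    #pragma omp parallel
    {
        /* Every thread runs the outer loop 'redundantly'. The loop
           variable is declared inside the region, so it is private. */
        for (int i = 0; i < rows; i++) {
            /* The iterations of the inner loop are shared among the
               threads; the loop construct ends with an implicit barrier. */
            #pragma omp for
            for (int j = 0; j < cols; j++) {
                m[i * cols + j] *= factor[i];
            }
        }
    }
}
```

If the rows are independent, as they are here, a nowait clause on the loop construct could remove the barrier after each inner loop; whether that pays off depends on the implementation and the workload.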
OpenMP version 3.1 was released in July 2011, 3 years after the release of OpenMP version 3.0. The major changes between these versions are:
The final and mergeable clauses were added to the task construct to support optimization of task data environments.
The taskyield construct was added to allow user-defined task switching points.
The atomic construct was extended to include read, write, and capture forms, and an update clause was added to apply the already existing form of the atomic construct.
Data environment restrictions were changed to allow intent(in) and const-qualified types for the firstprivate clause.
Data environment restrictions were changed to allow Fortran pointers in firstprivate and lastprivate.
New reduction operators min and max were added for C and C++.
The nesting restrictions were clarified to disallow closely nested OpenMP regions within an atomic region. This allows an atomic region to be consistently defined with other OpenMP regions so that they include all the code in the atomic construct.
The omp_in_final runtime library routine was added to support specialization of final task regions.
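Two of these additions, the max reduction operator for C and the capture form of the atomic construct, can be sketched in C as follows. The function names max_of and collect_even are hypothetical, and the code assumes a compiler that implements OpenMP 3.1 or later.

```c
/* Largest element of v[0..n); assumes n >= 1. */
int max_of(int n, const int *v)
{
    int best = v[0];

    /* The max reduction operator became available for C and C++ in 3.1. */
    #pragma omp parallel for reduction(max:best)
    for (int i = 1; i < n; i++) {
        if (v[i] > best) {
            best = v[i];
        }
    }
    return best;
}

/* Copies the even elements of v[0..n) into out and returns how many
   were copied. The order of the copied elements is not specified. */
int collect_even(int n, const int *v, int *out)
{
    int count = 0;

    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        if (v[i] % 2 == 0) {
            int slot;

            /* The capture form reads the old value of count and
               increments it in one atomic step, so every thread
               obtains a distinct slot. */
            #pragma omp atomic capture
            slot = count++;

            out[slot] = v[i];
        }
    }
    return count;
}
```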
OpenMP version 3.0 was released in May 2008, 3 years after the release of OpenMP version 2.5. The major changes between these versions are:
The task construct has been added, which provides a mechanism for creating tasks explicitly.
The taskwait construct has been added, which causes a task to wait for all its child tasks to complete.
The description of the behavior of volatile in terms of flush was removed.
The omp_set_num_threads, omp_set_nested and omp_set_dynamic runtime library routines now have specified effects when called from inside a parallel region.
The definition of an active parallel region has been changed: in Version 3.0 a parallel region is active if it is executed by a team consisting of more than one thread.
The rules for determining the number of threads used in a parallel region have been modified.
The assignment of iterations to threads in a loop construct with a static schedule kind is deterministic.
A loop construct may be associated with more than one perfectly nested loop. The number of associated loops may be controlled by the collapse clause.
The schedule kind auto has been added, which gives the implementation the freedom to choose any possible mapping of iterations in a loop construct to threads in the team.
In Fortran, firstprivate is now permitted as an argument to the default clause.
For list items in the private clause, implementations are no longer permitted to use the storage of the original list item to hold the new list item on the master thread. If no attempt is made to reference the original list item inside the parallel region, its value is well defined on exit from the parallel region.
Fortran allocatable arrays may appear in private, firstprivate, lastprivate, reduction, copyin and copyprivate clauses.
Static class member variables may appear in a threadprivate directive.
The runtime library routines omp_set_schedule and omp_get_schedule have been added; these routines respectively set and retrieve the value of the run_sched_var ICV.
The maximum number of threads participating in the OpenMP program can be set with the OMP_THREAD_LIMIT environment variable and retrieved with the omp_get_thread_limit runtime library routine.
The maximum number of nested active parallel regions can be set with the OMP_MAX_ACTIVE_LEVELS environment variable and the omp_set_max_active_levels runtime library routine, and it can be retrieved with the omp_get_max_active_levels runtime library routine.
The stack size for threads that the OpenMP implementation creates can be set with the OMP_STACKSIZE environment variable.
The desired behavior of waiting threads can be set with the OMP_WAIT_POLICY environment variable.
The omp_get_level runtime library routine has been added, which returns the number of nested parallel regions enclosing the task that contains the call.
The omp_get_ancestor_thread_num runtime library routine has been added, which returns, for a given nested level of the current thread, the thread number of the ancestor.
The omp_get_team_size runtime library routine has been added, which returns, for a given nested level of the current thread, the size of the thread team to which the ancestor belongs.
The omp_get_active_level runtime library routine has been added, which returns the number of nested, active parallel regions enclosing the task that contains the call.
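Two of the loop-related additions in this list, the collapse clause and the auto schedule kind, can be illustrated with a short C sketch. The function grid_sum and its row-major layout are assumptions made for this example, which requires a compiler that implements OpenMP 3.0 or later.

```c
/* Sums all elements of a rows x cols grid stored row by row. */
long grid_sum(int rows, int cols, const int *g)
{
    long total = 0;

    /* collapse(2) merges the two perfectly nested loops into a single
       iteration space of rows * cols iterations, and schedule(auto)
       leaves the mapping of those iterations to threads to the
       implementation. */
    #pragma omp parallel for collapse(2) schedule(auto) reduction(+:total)
    for (int i = 0; i < rows; i++) {
        for (int j = 0; j < cols; j++) {
            total += g[i * cols + j];
        }
    }
    return total;
}
```

Collapsing is mainly useful when the outer loop alone has too few iterations to keep all threads busy.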
The OpenACC API makes it feasible to run loops and regions of code on an attached accelerator. Compiler directives for Fortran, C, and C++ are specified for this purpose in the API.
OpenACC is meant to be folded back into OpenMP. OpenACC implementations can be considered to be a beta test of the OpenMP accelerator specification. They give early implementation experience.
OpenACC has been created and implemented by several members of the OpenMP ARB in order to address their immediate customer needs. These members are NVIDIA, PGI, Cray, and CAPS.
Message-passing has become accepted as a portable style of parallel programming, but has several significant weaknesses that limit its effectiveness and scalability. Message-passing in general is difficult to program and doesn't support incremental parallelization of an existing sequential program. Message-passing was initially defined for client/server applications running across a network, and so includes costly semantics (including message queuing and selection and the assumption of wholly separate memories) that are often not required by tightly-coded scientific applications running on modern scalable systems with globally addressable and cache coherent distributed memories.
Pthreads has never been targeted toward the technical/HPC market. This is reflected in its minimal Fortran support and its lack of support for data parallelism. Even for C applications, Pthreads requires programming at a level lower than most technical developers would prefer.
The Mobile Industry Processor Interface (MIPI) is addressing a range of debug interface efforts for multi-core devices. However, its specifications are focused on mobile devices and not multi-core processors in general.
(Quote from: Kenn R. Luecke (2012). Software Development for Parallel and Multi-Core Processing, Embedded Systems - in High Performance Systems, Applications and Projects; Dr. Kiyofumi Tanaka (Ed.), ISBN: 978-953-51-0350-9, Available from Intech.)
Version 1.0
Last updated: December 21, 2012 MvW
Copyright © 1997-2012 OpenMP ARB Corporation
All Rights Reserved