A number of compilers and tools from various vendors and open source community initiatives implement the OpenMP API. If we are missing any, please contact us with your suggestions.
|Absoft Pro Fortran||Fortran||Versions 11.1 and later of the Absoft Fortran 95 compiler for Linux, Windows and Mac OS X include integrated OpenMP 3.0 support. Version 18.0 supports OpenMP 3.1. Compile with -openmp. More information|
Available on Linux
|C/C++ – Support for OpenMP 3.1 and all non-offloading features of OpenMP 4.0/4.5. Offloading features are under development. Fortran – Full support for OpenMP 3.1 and limited support for OpenMP 4.0/4.5. Compile and link your code with -fopenmp More information|
|Barcelona Supercomputing Center||Mercurium
|Mercurium is a source-to-source research compiler that is available to download at https://github.com/bsc-pm/mcxx. OpenMP 3.1 is almost fully supported for C, C++, Fortran. Apart from that, almost all tasking features introduced in newer versions of OpenMP are also supported. » More Information|
|Cray Compiling Environment (CCE) 8.6 (June 2017) supports OpenMP 4.5 for C, C++ and Fortran. OpenMP is on by default.|
|Fortran for LLVM. Substantially full OpenMP 4.5 on Linux/x86-64, Linux/ARM, Linux/OpenPOWER.
TARGET regions are mapped to the multicore host CPU as the target with PARALLEL and DISTRIBUTE loops parallelized across all OpenMP threads. Known limitations: SIMD and DECLARE SIMD have no effect on SIMD code generation; TASK DEPEND/PRIORITY, TASKLOOP FIRSTPRIVATE/LASTPRIVATE, DECLARE REDUCTION and the LINEAR/SCHEDULE/ORDERED(N) clauses on the LOOP construct are not supported.
Compile with -mp to enable OpenMP on all platforms.
|Free and open source – Linux, Solaris, AIX, MacOSX, Windows, FreeBSD, NetBSD, OpenBSD, DragonFly BSD, HPUX, RTEMS
From GCC 4.2.0, OpenMP 2.5 is fully supported for C/C++/Fortran.
|XL C/C++ for Linux V13.1.6 and XL Fortran for Linux V15.1.6 support a subset of OpenMP 4.5 features that include the target constructs.
Compile with -qsmp=omp to enable OpenMP directives and with -qoffload for offloading the target regions to GPUs.
For more information, please visit IBM XL C/C++ for Linux and IBM XL Fortran for Linux.
|Intel||C/C++/Fortran||Windows, Linux, and MacOSX.
OpenMP 3.1 is fully supported for C/C++/Fortran in the version 12.0, 13.0, and 14.0 compilers
|Lahey/Fujitsu Fortran 95||C/C++/Fortran||The compilers in the software package of ‘Technical Computing Suite for the PRIMEHPC FX100’ support OpenMP 3.1.|
|LLNL Rose Research Compiler||C/C++/Fortran||ROSE is a source-to-source research compiler supporting OpenMP 3.0 and some OpenMP 4.0 accelerator features targeting NVIDIA GPUs.
» More information
|Clang is an open-source (permissively licensed) C/C++ compiler that is available to download at http://llvm.org/releases/download.html. Clang 3.9 and later support all non-offloading features of OpenMP 4.5. Offloading support is under development and is expected to become available in a future version.
Compile and link your code with -fopenmp
|NAG Fortran Compiler 6.1 supports OpenMP 3.1 on x86 and x64, for Linux, Mac and Windows. Compile with -openmp.
» More Information
|OpenUH Research Compiler||C/C++/Fortran||The OpenUH 3.x compiler has a full open-source implementation of OpenMP 2.5 and near-complete support for OpenMP 3.0 (including explicit task constructs) on Linux 32-bit or 64-bit platforms. For more information or to download: https://github.com/uhhpctools/openuh|
|Oracle||C/C++/Fortran||Oracle Developer Studio 12.5 compilers (C, C++, and Fortran) support OpenMP 4.0 features.
More information at http://www.oracle.com/technetwork/server-storage/developerstudio/overview/index.html
Compile with -xopenmp. Use -xvpara for static correctness checking and -xloopinfo for loop level messages.
Use the er_src tool to get more detailed information.
|PGI||C/C++/Fortran||Support for substantially full OpenMP 4.5 in Fortran/C/C++ on Linux/x86-64 and Linux/OpenPOWER. TARGET regions are implemented with default support for the multicore host as the target, and PARALLEL and DISTRIBUTE loops are parallelized across all OpenMP threads. Known limitations: SIMD and DECLARE SIMD have no effect on SIMD code generation; TASKLOOP, TASK DEPEND/PRIORITY, DECLARE REDUCTION and the LINEAR/SCHEDULE/ORDERED(N) clauses on the LOOP construct are not supported.
Support for full OpenMP 3.1 in Fortran/C/C++ on MacOS/x86-64, and in Fortran/C on Windows/x86-64. Compile with -mp to enable OpenMP on all platforms.
|Texas Instruments||C||The TI cl6x compiler v8.x supports OpenMP 3.0 for multicore C66x on TI’s Keystone I family of Multicore C667x/C665x Digital Signal Processor (DSP) SoCs using the Processor-SDK-RTOS.
The Linaro toolchain (gcc) 6.2.1 supports OpenMP 4.5 for multicore Cortex-A15 on TI’s AM572x and Keystone II family (K2H/K2K, K2E, K2L, K2G) SoCs using the Processor-SDK-Linux.
The TI clacc v1.x compiler supports OpenMP 3.0 and device constructs from OpenMP 4.0 for heterogeneous multicore Cortex-A15+C66x-DSP on TI’s AM572x and Keystone II family (K2H/K2K, K2E, K2L, K2G) SoCs using both the Processor-SDK-Linux (A15) and Processor-SDK-RTOS (C66x).
See here for the latest versions of the Processor-SDKs for various TI SoCs: http://processors.wiki.ti.com/index.php/Processor_SDK_Supported_Platforms_and_Versions
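Because the entries above differ in which OpenMP version they implement, it can help to ask the compiler directly. The following is a minimal sketch, not taken from any vendor's documentation: it relies on the standard `_OPENMP` macro, which a conforming compiler defines as the release date (yyyymm) of the specification it supports when OpenMP is enabled. The function names `openmp_spec_version`, `openmp_date` and `parallel_sum` are hypothetical helpers chosen for illustration.

```c
#include <assert.h>

/* Map an _OPENMP date value (yyyymm) to the specification version it
   corresponds to. Only the versions mentioned in the table are listed. */
const char *openmp_spec_version(long date)
{
    if (date >= 201511L) return "4.5 or later";
    if (date >= 201307L) return "4.0";
    if (date >= 201107L) return "3.1";
    if (date >= 200805L) return "3.0";
    if (date >= 200505L) return "2.5";
    if (date > 0L)       return "older than 2.5";
    return "none";
}

/* Date reported by the current compiler, or 0 when OpenMP is disabled. */
long openmp_date(void)
{
#ifdef _OPENMP
    return _OPENMP;
#else
    return 0L;
#endif
}

/* Sum the integers 0..n-1 with a parallel loop and a reduction.
   Without OpenMP the pragma is ignored and the loop runs serially,
   so the result is the same either way. */
long parallel_sum(long n)
{
    assert(n >= 0);
    long total = 0;
    #pragma omp parallel for reduction(+:total)
    for (long i = 0; i < n; ++i)
        total += i;
    return total;
}
```

Calling `openmp_spec_version(openmp_date())` from a small `main` shows what a given build actually enables. The flag that turns OpenMP on is the one listed for each compiler above, for example `-fopenmp`, `-mp`, `-qsmp=omp`, `-xopenmp` or `-openmp`.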
|DDT, Map / C, C++, Fortran||Arm||Arm DDT is a powerful, easy-to-use graphical debugger. It includes static analysis that highlights potential problems in the source code, integrated memory debugging that can catch reads and writes outside of array bounds, integration with MPI message queues and much more. It provides a complete solution for finding and fixing problems whether on a single thread or thousands of threads. Debug with Arm DDT (https://developer.arm.com/products/software-development-tools/hpc/arm-forge/arm-ddt)
Arm MAP is a parallel profiler that shows you which lines of code took the most time and why. It supports both interactive and batch modes for gathering profile data, and supports MPI, OpenMP and single-threaded programs. Syntax-highlighted source code with performance annotations enables you to drill down to the performance of a single line, and a rich set of zero-configuration metrics shows memory usage, floating-point calculations and MPI usage across processes. Profile with Arm MAP (https://developer.arm.com/products/software-development-tools/hpc/arm-forge/arm-map)
|Extrae, Paraver / C, C++, Fortran, Java, Python||BSC||Extrae is an instrumentation package that collects performance data and saves it in Paraver trace format. It supports the instrumentation of MPI, OpenMP, pthreads, OmpSs, CUDA and OpenCL, with C, C++, Fortran, Java and Python. With respect to OpenMP, it recognizes the main runtime calls for Intel and GNU compilers, allowing instrumentation at load time with the production binary. Extrae also supports the OMPT interface, which would make it possible to intercept other OpenMP runtimes. More information
Paraver is a trace-based performance analyzer with great flexibility to explore the collected data. It was developed to respond to the need to have a qualitative global perception of the application behavior by visual inspection and then to be able to focus on the detailed quantitative analysis of the problems. The tool can be considered a data browser that can explore any information expressed in its trace format. Extrae is the main provider of Paraver traces, although the trace format is public and has also been used to collect information on system behavior, power metrics and user-customized metrics. More information
|HPCToolkit||Rice University||HPCToolkit is an integrated suite of tools for measurement and analysis of program performance on computers ranging from multicore desktop systems to the nation’s largest supercomputers. HPCToolkit provides accurate measurements of a program’s work, resource consumption, and inefficiency, correlates these metrics with the program’s source code, works with multilingual, fully optimized binaries, has very low measurement overhead, and scales to large parallel systems. HPCToolkit’s measurements provide support for analyzing a program’s execution cost, inefficiency, and scaling characteristics both within and across nodes of a parallel system. More Information.|
|ParaFormance||ParaFormance Technologies||ParaFormance is a software tool-chain that allows software developers to quickly and easily write multi-core software. ParaFormance enables software developers to find the sources of parallelism within their code, automatically insert the parallel business logic (using OpenMP and TBB) under user-controlled guidance, and check that the parallelised code is thread-safe. More information|
|Parallware C/C++||Appentra Solutions||The Parallware tools include the Parallware Trainer, an interactive, real-time desktop tool that facilitates teaching, learning, and the usage of parallel programming using directives of OpenMP 4.5. More Information|
|Reveal||Cray||Reveal is Cray’s performance analysis and code optimization tool that combines run-time performance statistics and program source code visualization with Cray Compiling Environment (CCE) compile-time optimization feedback. Reveal supports source code navigation using whole-program analysis data provided by the Cray Compiling Environment, coupled with performance data collected during program execution by the Cray performance tools, to understand which high-level serial loops could benefit from improved parallelism.|
|Scalasca Trace Tools||Juelich Supercomputing Centre||The Scalasca Trace Tools are a collection of trace-based performance analysis tools that have been specifically designed for use on large-scale systems. A distinctive feature is the scalable automatic trace-analysis component which provides the ability to identify wait states that occur, e.g., as a result of unevenly distributed workloads. Besides merely identifying wait states, the trace analyzer is also able to pinpoint their root causes and to identify the activities on the critical path of the target application, highlighting those routines which determine the length of the program execution and therefore constitute the best candidates for optimization. The Scalasca Trace Tools process traces generated by the Score-P measurement infrastructure and produce reports that can be explored with Cube or TAU ParaProf/PerfExplorer. More information|
|Score-P||Score-P Developer Community||The Score-P measurement infrastructure is an extremely scalable and easy-to-use tool suite for call-path profiling, event tracing, and online analysis of applications written in C, C++, or Fortran. It supports a wide range of HPC platforms and programming models; besides OpenMP, Score-P can hook into other common models, including MPI, SHMEM, Pthreads, CUDA, OpenCL, OpenACC, and their valid combinations. Score-P is capable of gathering performance information through automatic instrumentation of functions, library interception/wrapping, source-to-source instrumentation, event- and interrupt-based sampling, and hardware performance counters. Score-P measurements are the primary input for a range of specialized analysis tools, such as: Cube, Vampir, Scalasca Trace Tools, TAU, or Periscope. More information.|
|TAU / C, C++, Fortran, Java, Python, Spark||University of Oregon||TAU is a performance evaluation tool that supports both profiling and tracing for programs written in C, C++, Fortran, Java, Python, and Spark. For instrumentation of OpenMP programs, TAU includes source-level instrumentation (Opari), a runtime “collector” API (called ORA) built into an OpenMP compiler (OpenUH), a wrapped OpenMP runtime library (GOMP using ORA), and an OpenMP runtime library supporting an OMPT prototype (Intel/LLVM). View technical paper. TAU supports both direct probe based measurements as well as event-based sampling modes for profiling. For tracing, TAU provides an open-source trace visualizer (Jumpshot) and can generate native OTF2 trace files that may be visualized in the Vampir trace visualizer. TAU Commander simplifies the TAU workflow and installation. TAU supports both PAPI and LIKWID toolkits to access low-level processor specific hardware performance counter data to correlate it to the OpenMP code regions. TAU ships with a BSD style license. More Information.|
|TotalView for HPC||RogueWave||With TotalView for HPC, simultaneously debug many processes and threads in a single window to get complete control over program execution: running, stepping, and halting line-by-line through code within a single thread or arbitrary groups of processes or threads. Work backwards from failure through reverse debugging, isolating the root cause faster by eliminating repeated restarts of the application. Reproduce difficult problems that occur in concurrent programs that use threads and OpenMP. More Information.|
|Vampir||Technische Universität Dresden||Vampir provides an easy-to-use framework that enables developers to quickly display and analyze arbitrary program behavior at any level of detail. The tool suite implements optimized event analysis algorithms and customizable displays that enable fast and interactive rendering of very complex performance monitoring data. Score-P is the primary code instrumentation and run-time measurement framework for Vampir and supports various instrumentation methods, including instrumentation at source level and at compile/link time. More Information.|
|VTune Amplifier XE||Intel||Whether you’re tuning for the first time or doing advanced performance optimization, Intel VTune Amplifier provides accurate profiling data, collected with very low overhead. But good data isn’t enough. Intel VTune Amplifier gives you the tools to mine it and interpret it. Quickly turn raw profiling data into performance insight using the graphical interface to sort, filter, and visualize data from a local or remote target. Or use the command line interface to automate analysis. More Information.|
|CIM™ Heterogeneous Programming / C, C++||Signalogic||CIM™ enables code generation for combined Intel x86 and Texas Instruments c66x platforms. Within C/C++ source code, OpenMP pragmas can be used to mark sections of code that should be compiled and built for c66x run-time. c66x I/O functions are supported, allowing c66x to “front” incoming data for high capacity media and streaming applications.|
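Several entries above describe marking a region of code with OpenMP pragmas so that it is built for another device (TARGET regions, device constructs, c66x sections). As a rough illustration of what such a region looks like, here is a minimal sketch using the standard OpenMP 4.0/4.5 `target` construct. It is generic OpenMP rather than the syntax of any particular product listed here, and the function name `saxpy_target` is a hypothetical example. Whether the loop actually runs on an accelerator, on the multicore host, or serially depends on the compiler and the flags used.

```c
#include <assert.h>
#include <stddef.h>

/* Compute y[i] = a * x[i] + y[i] inside a target region.
   map(to: ...) copies x to the device; map(tofrom: ...) copies y to the
   device and back. If no device is available, or OpenMP is disabled,
   the loop runs on the host and produces the same values. */
void saxpy_target(size_t n, float a, const float *x, float *y)
{
    assert(x != NULL && y != NULL);
    #pragma omp target teams distribute parallel for \
        map(to: x[0:n]) map(tofrom: y[0:n])
    for (size_t i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}
```

The combined `teams distribute parallel for` form is used so that the same source spreads work across all available threads when the target is the host CPU, which is the fallback behavior some of the compilers above describe.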