OpenMP API Speeds Up Autonomous Driving Codes

Author: Matthijs van Waveren

The execution time of an autonomous driving system is essential for driver safety. In an emergency, the system should react within 100 ms, compared with a reaction time of 300 ms for a human driver. A lot of effort therefore goes into optimizing autonomous driving software, including its mapping, object recognition, and motion planning modules, and this includes using the OpenMP API to parallelize key parts of the software. This blog post describes how OpenMP has been used to parallelize mapping, object detection, motion planning, and visualization modules of the open source Autoware and Apollo software running on embedded systems.

Autoware platform

The Autoware platform [1] is an autonomous driving open source project founded in 2015 by Shinpei Kato of Nagoya University.  Sixty organizations are members of the project, and the software is used in more than thirty vehicle models.  It has been qualified to pilot driverless vehicles on public roads in Japan since 2017. Figure 1 shows the design of the Autoware platform.

Figure 1: Design of the Autoware platform

Lukas Sommer at TU Darmstadt extracted two modules from the Autoware platform, points2image and euclidean_clustering, and added them to the DAPHNE benchmark suite [2]. The points2image module is used in the visualization of maps. The euclidean_clustering module implements the Euclidean clustering algorithm used for object detection.
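To make the idea behind Euclidean clustering concrete, here is a minimal serial sketch. It is not the DAPHNE or Autoware code: the Point struct, the function name euclidean_cluster, and the brute-force neighbor search are assumptions made for illustration. Points closer to each other than a tolerance are joined, directly or through intermediate points, into the same cluster.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical point type, for illustration only.
struct Point {
    float x, y, z;
};

// Returns one cluster label per point. Two points share a label when they are
// connected by a chain of points whose consecutive distances are <= tolerance.
// Brute-force O(n^2) neighbor search; real implementations typically use a
// spatial index instead.
std::vector<int> euclidean_cluster(const std::vector<Point>& pts, float tolerance) {
    const float tol2 = tolerance * tolerance;
    std::vector<int> label(pts.size(), -1);  // -1 means "not yet assigned"
    int next_label = 0;
    for (std::size_t seed = 0; seed < pts.size(); ++seed) {
        if (label[seed] != -1) continue;
        // Grow a new cluster outward from this unassigned seed point.
        std::vector<std::size_t> frontier{seed};
        label[seed] = next_label;
        while (!frontier.empty()) {
            const std::size_t cur = frontier.back();
            frontier.pop_back();
            for (std::size_t j = 0; j < pts.size(); ++j) {
                if (label[j] != -1) continue;
                const float dx = pts[cur].x - pts[j].x;
                const float dy = pts[cur].y - pts[j].y;
                const float dz = pts[cur].z - pts[j].z;
                if (dx * dx + dy * dy + dz * dz <= tol2) {
                    label[j] = next_label;
                    frontier.push_back(j);
                }
            }
        }
        ++next_label;
    }
    return label;
}
```

Calling euclidean_cluster on a point cloud with a tolerance of, say, 0.6 returns a label vector in which each distinct value corresponds to one candidate object. The inner distance checks are the kind of work that dominates the run time on large point clouds.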

Lukas parallelized the modules by taking the following steps [3]:

  • Added OpenMP offloading constructs to the code, e.g. omp target teams distribute.
  • Modified the LLVM OpenMP runtime to allow for allocation of memory that can be accessed by both the host CPU and the GPU. Note that recent versions of the OpenMP specification support this kind of shared allocation directly.
  • Defined a specialized function for the atomic update using the declare variant construct.
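The first and third steps can be sketched as follows. This is a hedged illustration rather than the code from [3]: the functions add_count, add_count_gpu, and count_near_points are hypothetical, and the GPU variant simply repeats the generic atomic update where a real one might use a device-specific operation. Without an offloading-capable compiler or device, the target region falls back to the host.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical GPU-specialized version of the atomic update. A real variant
// might use a device-specific operation; here it repeats the generic atomic.
void add_count_gpu(int* counter, int value) {
    #pragma omp atomic update
    *counter += value;
}

// Base version. An OpenMP 5.0 compiler may substitute add_count_gpu when the
// call is compiled for a GPU device.
#pragma omp declare variant(add_count_gpu) match(device={kind(gpu)})
void add_count(int* counter, int value) {
    #pragma omp atomic update
    *counter += value;
}

// Counts the entries of depth that are smaller than max_depth. The loop is
// offloaded to the default device when one is available.
int count_near_points(const std::vector<float>& depth, float max_depth) {
    // The loop index is an int, so the size must fit into one.
    assert(depth.size() <= static_cast<std::size_t>(std::numeric_limits<int>::max()));
    const float* d = depth.data();
    const int n = static_cast<int>(depth.size());
    int count = 0;
    #pragma omp target teams distribute parallel for map(to: d[0:n]) map(tofrom: count)
    for (int i = 0; i < n; ++i) {
        if (d[i] < max_depth) {
            add_count(&count, 1);
        }
    }
    return count;
}
```

The map clauses copy the input array to the device and the counter in both directions. On a platform with memory shared between the CPU and GPU, as in the second step, such copies could be avoided.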

The target platform is NVIDIA’s Jetson AGX Xavier. This embedded system integrates eight Arm CPU cores and a 512-core NVIDIA Volta GPU with 64 Tensor cores.  These optimizations sped up the points2image module by a factor of 2.5 and the euclidean_clustering module by a factor of 3.25.

Apollo AI platform

The Apollo AI platform [4] is an open source autonomous driving project founded in 2017 by Baidu, a Chinese technology company specializing in internet services and artificial intelligence.  The company uses the platform in its Apollo autonomous driving vehicles, including minibuses and valet parking vehicles.  Baidu has been testing its vehicles in several major Chinese cities, including Beijing, Shanghai, and Shenzhen.  In Beijing, the company is now charging passengers for rides in its driverless taxis around Shougang Park, one of the sites for the Winter Olympics in 2022.  Baidu has entered a partnership with state-owned automaker BAIC Group and plans to roll out a fleet of 1,000 fully autonomous cars over the next three years.  It obtained the first batch of T4 road test licenses in China in 2019.

Figure 2 shows the modules included in the platform.

Figure 2: Apollo modules

Engineers at Intel parallelized the Expectation-Maximization (EM) motion planning module [5] of the Apollo platform using OpenMP directives [6].  Candidate loops for parallelization were identified by analyzing the loops in the code for CPU time, loop trip count, and loop dependences.  Multithreading was applied to suitable loops and improved performance by a factor of 1.4 with eight threads. No further gain was obtained with more than eight OpenMP threads.  The target platform is the Intel Harcuvar platform, which is based on the Intel Atom® CPU. This platform is an embedded system with 16 Atom cores.
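As a rough illustration of this kind of loop-level multithreading, the sketch below parallelizes a loop whose iterations are independent and lets the caller choose the thread count. It is not taken from Apollo or from [6]: path_cost and evaluate_candidates are hypothetical stand-ins for the per-candidate work in a planner.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical per-candidate cost; stands in for real planner work.
double path_cost(double lateral_offset) {
    return lateral_offset * lateral_offset + 1.0;
}

// Evaluates every candidate. Each iteration writes only its own element of
// costs, so there is no loop-carried dependence and the loop can be split
// across threads.
std::vector<double> evaluate_candidates(const std::vector<double>& offsets, int threads) {
    assert(threads > 0);  // num_threads requires a positive value
    std::vector<double> costs(offsets.size());
    const long n = static_cast<long>(offsets.size());
    #pragma omp parallel for num_threads(threads) schedule(static)
    for (long i = 0; i < n; ++i) {
        costs[i] = path_cost(offsets[i]);
    }
    return costs;
}
```

Calling evaluate_candidates(offsets, 8) requests eight threads for the loop. Timing the call for different thread counts is one simple way to find the point beyond which extra threads stop helping.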


The examples above are just two illustrations of the use of the OpenMP API to optimize key parts of autonomous driving software running on embedded systems. These examples show that the OpenMP API has a role to play both in the autonomous driving space and in the wider embedded systems space.


[1] Project Autoware GitHub :

[2] Sommer, L., Stock, F., Solis-Vasquez, L., and Koch, A. (2019). Work-in-Progress: DAPHNE – Automotive Benchmark Suite for Parallel Programming Models on Embedded Heterogeneous Platforms. In Proceedings of the International Conference on Embedded Software (accepted for publication 07/2019), EMSOFT ’19. Piscataway, NJ, USA: IEEE Press.

[3] Sommer, L., and Koch, A. (2020). OpenMP Device Offloading for Embedded Heterogeneous Platforms – Work-in-Progress. In Proceedings of the International Conference on Embedded Software, EMSOFT ’20. Piscataway, NJ, USA: IEEE Press.

[4] Project Apollo GitHub :

[5] Haoyang Fan, Fan Zhu, Changchun Liu, Liangliang Zhang, Li Zhuang, Dong Li, Weicheng Zhu, Jiangtao Hu, Hongye Li, Qi Kong, “Baidu Apollo EM Motion Planner” –

[6] Hung-Ju Tsai, Yuan Chen, and Yang Wang. Performance Optimization for an Autonomous Driving Planning Module in Project Apollo. Published on the Intel website on 20 December 2018.

Legal: The OpenMP name and the OpenMP logo are registered trademarks of the OpenMP Architecture Review Board. Other names and brands are the property of others.