What is Data Parallel C++?

Data-Parallel C++ (DPC++) is an open-source compiler project based on C++. It incorporates portable industry standards such as SYCL from the Khronos Group, along with extensions from the community. In simpler terms, DPC++ is a high-level language designed for data-parallel programming. The intent is to give developers a higher-level alternative to OpenCL C code and other languages.

DPC++ uses standard C++ syntax. Even though the focus is on parallel programming, no new keywords or pragmas have been added to the language. Rather, all parallelism features are expressed through C++ classes, such as buffer and queue.

DPC++ programming

Classes and features of the language let you take advantage of common accelerator hardware features such as SIMD, barriers, and local memory. Using DPC++, your application can run work in parallel on multiple devices. DPC++ can be used for heterogeneous computing on CPUs, GPUs, FPGAs, and other accelerators.

You can find a DPC++ compiler on GitHub at github.com/intel/llvm. You can find a bundled version of the DPC++ compiler as part of the oneAPI tools from Intel.

A variety of open-source extensions for DPC++ is also available.

Features of DPC++

To reiterate, part of the value of DPC++ is that it lets you target CPUs and accelerators from a single piece of code without giving up the ability to do custom tuning. It supports C++17 and SYCL and has initial support for C++20, so features such as variable templates and lambda expressions are supported.
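As a small illustration of why these two C++ features matter here: DPC++ kernels are typically written as lambdas that capture by value, and variable templates provide type-generic constants. The sketch below is plain standard C++ with no SYCL dependency, so it runs on the host only. The names scale_factor, make_scaler, and scale_all are hypothetical and chosen purely for illustration.

```cpp
#include <vector>

// Variable template (C++14): one constant, usable at any arithmetic type.
template <typename T>
constexpr T scale_factor = T(2);

// Returns a lambda that captures `offset` by value ([=]), the capture
// style commonly used for kernel lambdas.
template <typename T>
auto make_scaler(T offset) {
  return [=](T x) { return x * scale_factor<T> + offset; };
}

// Applies the lambda to every element, sequentially on the host.
template <typename T>
std::vector<T> scale_all(const std::vector<T> &in, T offset) {
  auto f = make_scaler(offset);
  std::vector<T> out;
  out.reserve(in.size());
  for (const T &x : in) out.push_back(f(x));
  return out;
}
```

For example, scale_all<int>({1, 2, 3}, 1) doubles each element and adds 1, giving {3, 5, 7}. In a DPC++ program, the body of such a lambda would run on a device instead of in a host loop.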

If you are using Windows, the compiler integrates with Visual Studio 2017 and 2019, provided you are using Community Edition or higher. You can use the Express Edition for command-line builds. You can also use DPC++ on Linux.

The following listing, based on one from the book Data Parallel C++ (available at no cost on Kindle), shows DPC++ in action:

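#include <CL/sycl.hpp>
#include <cstring>
#include <iostream>
#include <string>
using namespace sycl;

// Each character is stored shifted up by one; the kernel shifts it back.
const std::string secret {
  "J(n!bgsbje!J!dbo(u!ep!uibu/!.!IBM" };
const auto sz = secret.size();

int main() {
  queue Q;

  // Shared allocation, accessible from both the host and the device.
  char *result = malloc_shared<char>(sz + 1, Q);
  std::memcpy(result, secret.c_str(), sz + 1);  // includes the trailing '\0'

  // One work-item per character: subtract 1 to decode it.
  Q.parallel_for(sz, [=](auto &i) {
    result[i] -= 1;
  }).wait();

  std::cout << result << "\n";
  free(result, Q);
  return 0;
}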
This code first establishes a queue and allocates an area of shared memory. The work is then enqueued to the device via the parallel_for() method, which runs the lambda as a kernel, once for each of the sz elements. The result of the code is a fun Hello World-style message. As you can see, the listing is more complex than a simple C++ hello world program; however, it is simpler than many examples of parallelizing code. As you build more complex parallelized code, one of the intents of DPC++ is to simplify the effort and keep it at a higher level.
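To check what the kernel is meant to produce without an accelerator or a SYCL toolchain, the same per-character transformation (subtracting 1 from each element) can be sketched in plain standard C++. This is an illustrative host-only equivalent, not part of the book's listing. The function name decode is hypothetical, and the loop runs sequentially rather than as parallel work-items.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: applies the kernel's per-element operation
// (subtract 1 from each character) sequentially on the host.
std::string decode(const std::string &encoded) {
  std::string result = encoded;
  for (std::size_t i = 0; i < result.size(); ++i) {
    // No iteration depends on any other, which is what makes this loop
    // a natural fit for a parallel_for on a device.
    result[i] -= 1;
  }
  return result;
}
```

For instance, decode("Ifmmp") returns "Hello". Comparing a device run against a host reference like this is a simple way to validate a kernel while developing it.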

Other features of DPC++, included in the SYCL 2020 Final Specification, include support for accelerated parallel programming for high-performance computing (HPC); machine learning and artificial intelligence; embedded programming and development; and resource-intensive apps and software across XPU architectures, including CPUs, GPUs, and FPGAs, to name but a few.

In short, if you want to access multiple parallel resources while avoiding lock-in to a specific computing device, DPC++ provides a solution that can use any combination of devices, from GPUs and CPUs to FPGAs and more.

This article was originally published on May 28th, 2021