Native Parallel Programming for Visual C++ with the Parallel Processing Library

Taking advantage of the processing power available in today's multi-core and multi-processor world is the holy grail of performance for today's C++ developer. Existing libraries such as OpenMP go some way toward making multiple threads simple and effective to use declaratively, but it is still difficult to find a middle ground between a purely declarative model and the traditional model of fine-grained thread control. New libraries in Visual C++ 2010 address this gap by letting developers write code based on the concept of a task. Tasks are then scheduled for execution using the available computing resources.

Visual C++ 2010 ships with a new library for concurrent processing called the Parallel Patterns Library (PPL). A similar library, the Task Parallel Library (TPL), exists for managed development. Despite the similarities between the two libraries, they don't share a common code base, and each is optimized for the languages it targets and the runtime environment in which it executes.

Rather than using the concepts of threads and fibers as the base unit of scheduling, the PPL uses the more abstract concept of a task as the basis to allocate and schedule work. Using a more abstract work allocation unit means that the tedious work of mapping code to available processor resources can be handled at a library level rather than explicitly in application code, a strategy that easily accommodates a diverse range of hardware configurations.
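To make the "tedious work" concrete, the following is a minimal sketch of the kind of mapping code an application would otherwise write by hand. It is not PPL code and does not reflect how the PPL scheduler is implemented; it assumes a C++11 compiler and uses only the standard library. The helper name run_all is hypothetical, and the sketch assumes that the tasks do not throw.

```cpp
#include <algorithm>
#include <atomic>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical helper (not a PPL function): runs every task exactly once,
// sharing the tasks among a pool of worker threads sized to the hardware.
// Assumes the tasks do not throw.
inline void run_all(const std::vector<std::function<void()>>& tasks)
{
    if (tasks.empty()) return;

    // hardware_concurrency() may return 0 when the count is unknown.
    const std::size_t hw = std::thread::hardware_concurrency();
    const std::size_t workers =
        std::min<std::size_t>(std::max<std::size_t>(hw, 1), tasks.size());

    std::atomic<std::size_t> next(0);  // index of the next unclaimed task

    // Each worker repeatedly claims the next task until none remain.
    auto worker = [&]() {
        for (;;) {
            const std::size_t i = next.fetch_add(1);
            if (i >= tasks.size()) return;
            tasks[i]();
        }
    };

    std::vector<std::thread> pool;
    for (std::size_t w = 0; w < workers; ++w) pool.emplace_back(worker);
    for (auto& t : pool) t.join();  // wait for all tasks to finish
}
```

Even this small sketch has to size a thread pool, hand out work, and join the threads, which is the bookkeeping a task-based library takes off the application's hands.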

You declare a task object using lambda expressions:

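#include <cstdio>
#include <functional>
#include "ppl.h"

using namespace Concurrency;
using namespace std::tr1;

//a task that takes no parameters and returns no value
task_handle<function<void(void)>> consoleWrite1(
  [=](){printf("consoleWrite1\n");});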

The preceding code snippet declares a simple task that writes a string to the standard output stream. The ppl.h header file supplies the type definitions for the PPL, while <functional> is part of the C++ TR1 enhancements that ship with Visual C++ 2008 SP1 and Visual C++ 2010. The using directives bring in the relevant namespaces declared in the two header files. You reference tasks in the PPL using the templatized task_handle type. The task handle in the preceding code refers to a function that takes no parameters and returns no value. The final piece of the task declaration is a standard lambda expression; in this case, the square brackets specify a by-value capture. The lambda body contains a simple printf statement.

The task_handle type has very limited in-built functionality, and you can't execute a task directly by itself. Instead, tasks are executed as part of a task_group, which provides the functionality to schedule various tasks for execution:

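//declare tasks
task_handle<function<void(void)>> consoleWrite0(
  [=](){printf("consoleWrite0\n");});
task_handle<function<void(void)>> consoleWrite1(
  [=](){printf("consoleWrite1\n");});

//execute tasks
task_group tg;
tg.run(consoleWrite0);
tg.run(consoleWrite1);

//wait for task completion
task_group_status status = tg.wait();

if (status == completed){
  printf("Tasks were completed successfully\n");
}
else{
  printf("Tasks were cancelled or an exception occurred during task execution\n");
}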

The preceding code declares two tasks, and then creates a task_group to execute them. The task_group's wait method blocks until both tasks have completed. The code then checks the return value of the wait method to determine whether the tasks executed successfully.

As the code uses the task_handle variables only once, to call the task_group's run method, the code can be simplified to:

//declare and execute tasks
task_group tg;
tg.run(function<void(void)>([]{printf("consoleWrite0\n");}));
tg.run(function<void(void)>([]{printf("consoleWrite1\n");}));

//wait for task completion
task_group_status status = tg.wait();


PPL Helper Types

In addition to the elemental task_handle and task_group types, the PPL includes three parallel helper functions that resemble standard C++ features. The parallel_for function mirrors the C/C++ for loop, but its iterations can execute in parallel. Using it, you can rewrite the earlier code sample as:

parallel_for(0, 2, [](int ix){
  printf("consoleWrite%i\n", ix);
});

Apart from the order of the output, which the parallel version does not guarantee, the preceding code is functionally equivalent to:

for(int ix = 0; ix < 2; ++ix)
  printf("consoleWrite%i\n", ix);

The parallel_for function takes an initial value and an exclusive upper bound for the loop index, and a lambda expression to execute for each loop iteration. An overloaded version of parallel_for also lets you specify a step value other than 1.

The STL algorithm for_each applies a function object to each element within a given range. The PPL has an equivalent, parallel_for_each, which behaves the same way but executes the per-element operation in parallel, using tasks and task groups under the covers:

vector<int> v;
//populate v here

//add one to each element
parallel_for_each(v.begin(), v.end(), [](int &elem){
 elem += 1;
});

The final helper function in the PPL is parallel_invoke, which executes the function objects specified in its parameter list in parallel. The October 2008 CTP of Visual Studio 2010 ships with nine overloads that accept between two and ten function objects or lambda expressions in a single call to parallel_invoke. Using this helper function, the code sample from the start of the article can be condensed into a single statement:

parallel_invoke(
  []{printf("consoleWrite0\n");},
  []{printf("consoleWrite1\n");}
);

The call to parallel_invoke does not return until every function object has been called and has returned, so there is no need for an equivalent of the task_group wait call. One disadvantage of the current parallel_invoke function is that it doesn't supply a return value that you can check for exceptions that occurred during function object execution. If it is critical to monitor the success of each parallel invocation, you need to wrap the work in a try-catch block and record the status in an external variable. Unfortunately, that negates some of the benefits of using the streamlined parallel_invoke over the task_group type.
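One way to picture the try-catch approach is a small wrapper that catches any exception from a function object and counts it in an external variable. The sketch below assumes a C++11 compiler; Guarded, guard, and invoke_both are hypothetical names that are not part of the PPL, and invoke_both is only a standard-library stand-in so that the example runs without ppl.h.

```cpp
#include <atomic>
#include <stdexcept>
#include <thread>

// Hypothetical wrapper (not part of the PPL): runs func inside a try-catch
// block and records any failure in an external counter.
template <typename Func>
struct Guarded {
    Func func;
    std::atomic<int>* failures;

    void operator()()
    {
        try { func(); }
        catch (...) { ++*failures; }  // record the failure, do not rethrow
    }
};

// Convenience function so the wrapped type is deduced from the argument.
template <typename Func>
Guarded<Func> guard(Func func, std::atomic<int>& failures)
{
    return Guarded<Func>{func, &failures};
}

// Stand-in for a two-argument parallel invocation (not the PPL function).
// Assumes neither callable throws, which holds for Guarded objects.
template <typename F1, typename F2>
void invoke_both(F1 f1, F2 f2)
{
    std::thread other(f1);  // run the first callable on another thread
    f2();                   // run the second on the calling thread
    other.join();           // return only when both have finished
}
```

After a call such as invoke_both(guard(a, failures), guard(b, failures)), a non-zero failures value indicates that at least one function object threw. The same wrapped function objects could presumably be handed to parallel_invoke instead, at the cost of the extra ceremony the paragraph above describes.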

The Parallel Patterns Library provides a general-purpose set of functions that execute multiple processing tasks across the available processing resources, eliminating the headache of manually allocating task execution. When combined with the new C++ features in Visual C++ 2010, particularly lambda expressions, the PPL provides a convenient and readily usable toolkit that combines the simplicity of managed-language equivalents with the elegance and expressiveness of C++.

The PPL forms part of a larger parallel programming offering that ships with Visual C++ 2010. Future articles will cover other PPL libraries.

About the Author

Nick Wienholt is an independent Windows and .NET consultant based in Sydney. He is the author of Maximizing .NET Performance and co-author of A Programmer's Introduction to C# 2.0 from Apress, and specializes in system-level software architecture and development, with a particular focus on performance, security, interoperability, and debugging.

Nick is a keen and active participant in the .NET community. He is the co-founder of the Sydney Deep .NET User group and writes technical articles for Australian Developer Journal, ZDNet, Pinnacle Publishing, CodeGuru, MSDN Magazine (Australia and New Zealand Edition) and the Microsoft Developer Network. In recognition of his work in the .NET area, Nick was awarded the Microsoft Most Valuable Professional Award from 2002 through 2007.


  • parallel processing versus parallel patterns

    Posted by Brad Jones on 08/19/2010 04:21pm

    The title says Parallel Processing Library. The article says Parallel Patterns Library.
