Native Parallel Programming for Visual C++ with the Parallel Pattern Library

Taking advantage of the processing power available in today's multi-core and multi-processor machines is the holy grail of performance for C++ developers. Existing libraries such as OpenMP make it easier to use multiple threads declaratively, but it is still difficult to find a middle ground between a purely declarative model and the traditional model of fine-grained thread control. New libraries in Visual C++ 2010 address this gap by letting developers write code based on the concept of a task. The library then schedules tasks for execution on the available computing resources.


Visual C++ 2010 ships with a new library for concurrent processing called the Parallel Pattern Library (PPL). A similar library exists for managed development, called the Task Parallel Library (TPL), but despite similarities between the two libraries, they don’t share a common code base and each is optimized for the languages it targets and the runtime environment in which it executes.


Rather than using threads and fibers as the basic unit of scheduling, the PPL uses the more abstract concept of a task as the basis for allocating and scheduling work. With a more abstract unit of work, the tedious job of mapping code to the available processor resources can be handled by the library rather than explicitly in application code, a strategy that easily accommodates a diverse range of hardware configurations.


You can declare a task object using a lambda expression:

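#include <cstdio>      // printf
#include <functional>  // TR1 function
#include <ppl.h>

using namespace Concurrency;
using namespace std::tr1;

// a task that wraps a lambda taking no parameters and returning nothing
task_handle<function<void()>> consoleWrite1(
    []{ printf("consoleWrite1\n"); });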


The preceding code snippet declares a simple task that writes a string to the standard output stream. The ppl.h header file contains the type definitions for the PPL, while <functional> supplies the function template, part of the C++ TR1 enhancements that ship with Visual C++ 2008 SP1 and Visual C++ 2010. The using directives bring the namespaces declared in these header files into scope. Tasks in the PPL are referenced through the task_handle class template; the task handle in the preceding code wraps a function that takes no parameters and returns no value. The final piece of the task declaration is a standard lambda expression. Here the square brackets, which form the lambda's capture clause, are empty, so the lambda captures no variables from the enclosing scope. The lambda body consists of a single printf statement.
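Because the capture clause controls which local variables a lambda can see, a short illustration may help. The following sketch uses only standard C++ (C++11 or later), not the PPL, and capture_demo is a hypothetical name chosen for this example. It contrasts an empty capture clause with by-value ([=]) and by-reference ([&]) capture:

```cpp
#include <utility>

// Hypothetical demo function: contrasts the three common capture clauses.
std::pair<int, int> capture_demo() {
    int count = 1;

    // []: captures nothing; the body can use only its parameters.
    auto no_capture = [](int x) { return x * 2; };

    // [=]: copies count into the closure now, so it remembers the value 1.
    auto by_value = [=] { return count; };

    // [&]: refers to count itself, so it sees later changes.
    auto by_ref = [&] { return count; };

    count = no_capture(5);  // count is now 10

    // Returns (value seen by by_value, value seen by by_ref).
    return std::make_pair(by_value(), by_ref());
}
```

Calling capture_demo() yields the pair (1, 10): the by-value lambda kept the copy of count made when it was created, while the by-reference lambda sees the later assignment. By-reference capture is the form to watch in tasks, because the captured variable must outlive every task that uses it.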


The task_handle type has very little built-in functionality, and you can't execute a task by itself. Instead, tasks are executed as part of a task_group, which provides the functionality to schedule tasks for execution:

//declare tasks (the handles must stay in scope until wait() returns)
task_handle<function<void()>> consoleWrite0(
    []{ printf("consoleWrite0\n"); });
task_handle<function<void()>> consoleWrite1(
    []{ printf("consoleWrite1\n"); });

//execute tasks
task_group tg;
tg.run(consoleWrite0);
tg.run(consoleWrite1);

//wait for task completion
task_group_status status = tg.wait();

if (status == completed) {
    printf("Tasks were completed successfully\n");
}
else {
    printf("Tasks were cancelled or an exception occurred during task execution\n");
}


The preceding code declares two tasks and then creates a task_group to execute them. The task_group's wait method blocks until both tasks have finished, and the code checks the value returned by wait to determine whether the tasks completed successfully.
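To make the run/wait contract concrete, here is a minimal sketch of the same idea written with standard C++11 facilities instead of the PPL. The names simple_task_group and group_status are hypothetical; the class is not the PPL's task_group and has none of its scheduling or cancellation support. It starts each task with std::async and, in wait, blocks on every std::future, turning any exception into a status value:

```cpp
#include <functional>
#include <future>
#include <utility>
#include <vector>

// Hypothetical status values for this sketch (not the PPL's task_group_status).
enum class group_status { completed, exception_thrown };

// Hypothetical minimal task group built on std::async.
class simple_task_group {
public:
    // Starts a task; std::launch::async requests that it run on another thread.
    void run(std::function<void()> work) {
        pending_.push_back(std::async(std::launch::async, std::move(work)));
    }

    // Blocks until every started task has finished. An exception thrown by a
    // task is caught here and reported through the return value.
    group_status wait() {
        group_status status = group_status::completed;
        for (auto& f : pending_) {
            try {
                f.get();
            } catch (...) {
                status = group_status::exception_thrown;
            }
        }
        pending_.clear();
        return status;
    }

private:
    std::vector<std::future<void>> pending_;
};
```

In this sketch each run call may create a new thread, which is the kind of per-task overhead that a library scheduler such as the one behind the PPL aims to avoid by mapping tasks onto the available processor resources.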


Because the code uses each task_handle variable only once, to pass it to the task_group's run method, the code can be simplified by passing the lambda expressions to run directly:

//declare and execute tasks
task_group tg;
tg.run([]{ printf("consoleWrite0\n"); });
tg.run([]{ printf("consoleWrite1\n"); });

//wait for task completion
task_group_status status = tg.wait();


 


PPL Helper Functions


In addition to the elemental task_handle and task_group types, the PPL includes three parallel helper functions that are similar to standard C++ features. The parallel_for function works like the C/C++ for loop, except that the iterations of the loop body can execute in parallel. Using it, you can rewrite the earlier code sample as:

parallel_for(0, 2, [](int ix)
{
    printf("consoleWrite%d\n", ix);
});

Apart from the order in which the iterations run, which parallel_for does not guarantee, the preceding code is functionally equivalent to:

for (int ix = 0; ix < 2; ++ix)
{
    printf("consoleWrite%d\n", ix);
}

The parallel_for function takes an initial value for the loop index, an upper bound that is excluded from the iteration (like the ix < 2 test in the sequential loop), and a lambda expression to execute for each loop iteration. An overload of parallel_for also lets you specify a step value other than 1.
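The loop semantics described here (start at the first value, stop before the last value, advance by an optional step, and return only after every iteration has run) can be sketched with standard C++11 threads. The function simple_parallel_for is hypothetical and is not how the PPL implements parallel_for; it deals the iterations out round-robin to a small number of std::thread workers and assumes a positive step:

```cpp
#include <algorithm>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical sketch: calls body(i) for i = first, first + step, ... while
// i < last, spreading the iterations over a few std::thread workers.
// Assumes step > 0. Returns only after every iteration has completed.
void simple_parallel_for(int first, int last, int step,
                         const std::function<void(int)>& body) {
    if (first >= last) return;
    const int count = (last - first + step - 1) / step;  // number of iterations

    unsigned hw = std::thread::hardware_concurrency();   // may report 0
    const int workers = std::min(count, static_cast<int>(hw == 0 ? 2 : hw));

    std::vector<std::thread> threads;
    for (int w = 0; w < workers; ++w) {
        // Worker w runs iteration numbers w, w + workers, w + 2 * workers, ...
        threads.emplace_back([=, &body] {
            for (int k = w; k < count; k += workers) {
                body(first + k * step);
            }
        });
    }
    for (auto& t : threads) {
        t.join();  // wait for all workers before returning
    }
}
```

For example, simple_parallel_for(0, 10, 3, body) invokes body with 0, 3, 6 and 9, in no guaranteed order.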


The STL algorithm for_each applies a function object to each element within a specified range. The PPL has an equivalent, parallel_for_each, which behaves the same way but applies the per-element operation in parallel, using tasks and task groups under the covers:

#include <vector>

std::vector<int> v;
v.push_back(1);
v.push_back(2);
v.push_back(3);

//add one to each element
parallel_for_each(v.begin(), v.end(), [](int &elem){
    elem += 1;
});
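A comparable sketch for parallel_for_each splits the range into contiguous chunks and processes each chunk in its own std::async task. As before, simple_parallel_for_each and its chunks parameter are hypothetical names, the code requires random-access iterators, and it does not reflect how the PPL actually partitions work:

```cpp
#include <cstddef>
#include <future>
#include <iterator>
#include <vector>

// Hypothetical sketch: applies f to every element in [first, last) by
// dividing the range into up to `chunks` contiguous pieces, each handled
// by its own std::async task. Requires random-access iterators, and f must
// be callable through a const copy (true for ordinary lambdas).
template <typename RandomIt, typename Func>
void simple_parallel_for_each(RandomIt first, RandomIt last, Func f,
                              std::size_t chunks = 4) {
    typedef typename std::iterator_traits<RandomIt>::difference_type diff_t;
    const std::size_t n = static_cast<std::size_t>(std::distance(first, last));
    if (n == 0) return;
    if (chunks == 0) chunks = 1;
    if (chunks > n) chunks = n;
    const std::size_t base = n / chunks;   // minimum elements per chunk
    const std::size_t extra = n % chunks;  // first `extra` chunks get one more

    std::vector<std::future<void>> parts;
    RandomIt chunk_begin = first;
    for (std::size_t c = 0; c < chunks; ++c) {
        const std::size_t len = base + (c < extra ? 1 : 0);
        RandomIt chunk_end = chunk_begin + static_cast<diff_t>(len);
        parts.push_back(std::async(std::launch::async, [=] {
            for (RandomIt it = chunk_begin; it != chunk_end; ++it) {
                f(*it);
            }
        }));
        chunk_begin = chunk_end;
    }
    for (auto& p : parts) {
        p.get();  // wait for every chunk; rethrows an exception from a chunk
    }
}
```

Applied to a vector holding 1, 2 and 3 with the same add-one lambda, it leaves the vector holding 2, 3 and 4. Because each chunk touches a distinct set of elements, the lambda can modify elements through its reference parameter without additional locking.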


The final helper function in the PPL is parallel_invoke, which executes the function objects passed as its arguments in parallel. The October 2008 CTP of Visual Studio 2010 ships with 9 overloads, which allow between 2 and 10 function objects or lambda expressions to be executed with a single call to parallel_invoke. Using this helper function, the code sample from the start of the article can be condensed into a single statement:

parallel_invoke(
    []{ printf("consoleWrite0\n"); },
    []{ printf("consoleWrite1\n"); }
);

The call to parallel_invoke does not return until each of the function objects has been called and has returned, so there is no need for an equivalent of the task_group's wait call. One disadvantage of the current parallel_invoke function is that it doesn't supply a return value that you can check for exceptions that occurred while the function objects executed. If it is critical to monitor the success of each parallel invocation, you need to track status yourself with try-catch blocks, recording the outcome in external variables. Unfortunately, that negates some of the benefits of using the streamlined parallel_invoke over the task_group type.
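One way to apply the try-catch approach is to wrap each function object so that the wrapper catches any exception and writes the outcome to a flag owned by the caller. The sketch below shows that pattern in standard C++11; guarded and simple_parallel_invoke are hypothetical names, and the two-argument simple_parallel_invoke merely imitates parallel_invoke's behavior of returning only after all of its function objects have finished:

```cpp
#include <functional>
#include <future>

// Wraps `work` so that it never lets an exception escape: afterwards, ok is
// true if `work` completed normally and false if it threw. The flag must
// outlive every call to the returned wrapper.
std::function<void()> guarded(std::function<void()> work, bool& ok) {
    bool* flag = &ok;
    return [work, flag] {
        try {
            work();
            *flag = true;
        } catch (...) {
            *flag = false;
        }
    };
}

// Hypothetical two-argument stand-in for parallel_invoke: runs f1 on another
// thread and f2 on the calling thread, returning only when both have finished.
void simple_parallel_invoke(const std::function<void()>& f1,
                            const std::function<void()>& f2) {
    std::future<void> other = std::async(std::launch::async, f1);
    f2();
    other.get();
}
```

Since parallel_invoke accepts arbitrary function objects, the same guarded wrappers could presumably be passed to it directly; the cost is the extra bookkeeping of one status variable per invocation.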


The Parallel Pattern Library is a general-purpose library of types and functions that execute multiple processing tasks across the available processing resources, eliminating the headache of manually allocating task execution. Combined with the new language features in Visual C++ 2010, particularly lambda expressions, and with the TR1 library enhancements, the PPL provides a convenient, readily usable toolkit that combines the simplicity of its managed-language equivalent with the elegance and expressiveness of C++.


The PPL forms part of a larger parallel programming offering that ships with Visual C++ 2010. Future articles will cover other parts of that offering.


About the Author


Nick Wienholt is an independent Windows and .NET consultant based in Sydney. He is the author of Maximizing .NET Performance and co-author of A Programmer's Introduction to C# 2.0 from Apress. He specializes in system-level software architecture and development, with a particular focus on performance, security, interoperability, and debugging.


Nick is a keen and active participant in the .NET community. He is the co-founder of the Sydney Deep .NET User group and writes technical articles for Australian Developer Journal, ZDNet, Pinnacle Publishing, CodeGuru, MSDN Magazine (Australia and New Zealand Edition) and the Microsoft Developer Network. In recognition of his work in the .NET area, Nick received the Microsoft Most Valuable Professional award from 2002 through 2007.
