.NET Task Parallel Library Advanced Data Parallel

Much of the .NET Task Parallel Library (TPL) Data Parallel
functionality is encapsulated in Parallel Loops. Parallel Loops are
conceptually similar to regular loops. Like a regular loop, a Parallel Loop usually
operates on a collection of data, performing a computation on each element in
the collection. Controlling a Parallel Loop’s execution is more complicated,
though.

Unlike a regular loop, a Parallel Loop must partition a
collection so computations can be doled out to worker Tasks. Parallel Loops
also require a developer to address concurrency issues like cancellation and
thread-safe operations.

The TPL Parallel class includes methods and overloads to execute
and control Parallel loops. Wading through all the overloads, methods, and
options can take time. Luckily, the methods and overloads are variations on a
handful of core classes and concepts. What follows will introduce the TPL Data
Parallel core classes and concepts.

Data Parallel

Prior articles introduced Data Parallel, so a complete introduction
to Data Parallel is beyond the scope of this article. However, some
context is important.

Data Parallel algorithms take advantage of the natural
properties of collections and the common computations applied to a collection
member. Often collection contents are uniform and independent of one another. So
as long as each invocation of an operation works on a different collection
member, the invocations can often be carried out in parallel.

Data Parallel algorithms often follow this pattern:

  • The collection is broken into chunks or many smaller collections.
  • Each chunk is distributed to a separate Thread or worker task.
  • Worker tasks operate on their individual chunks.
  • The whole algorithm is complete when all worker tasks are
    complete.
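
The pattern above can be sketched by hand, without the Parallel class, to show what a Parallel Loop automates. This is only an illustrative sketch: the ChunkedSum class, its Sum method, and the fixed chunk count are hypothetical names and choices, not part of TPL, and TPL's own partitioning is more sophisticated than this even split.

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

public static class ChunkedSum
{
    // Hypothetical helper: sums data by handing chunkCount slices to worker Tasks.
    // Assumes chunkCount is greater than zero.
    public static long Sum(int[] data, int chunkCount)
    {
        // Ceiling division so every element lands in some chunk.
        int chunkSize = (data.Length + chunkCount - 1) / chunkCount;
        var workers = new Task<long>[chunkCount];

        for (int c = 0; c < chunkCount; c++)
        {
            // Locals declared inside the loop give each lambda its own bounds.
            int start = Math.Min(c * chunkSize, data.Length);
            int end = Math.Min(start + chunkSize, data.Length);

            // Each worker Task operates only on its own chunk.
            workers[c] = Task.Run(() =>
            {
                long partial = 0;
                for (int i = start; i < end; i++) { partial += data[i]; }
                return partial;
            });
        }

        // The whole algorithm is complete when all worker Tasks are complete.
        Task.WaitAll(workers);
        return workers.Sum(w => w.Result);
    }

    public static void Main()
    {
        int[] data = Enumerable.Range(1, 1000).ToArray();
        Console.WriteLine(Sum(data, 4)); // prints 500500
    }
}
```

Because each worker keeps its own partial sum and the partial sums are combined only after Task.WaitAll returns, no locking is needed in this sketch.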

As stated earlier, much of the TPL Data Parallel
functionality is encapsulated in the Parallel class.

Parallel.For

For and ForEach methods comprise the Parallel class’ Data
Parallel functionality. ForEach operates over a collection. ForEach will be
addressed later in the article. For encapsulates a more general Data Parallel
algorithm. A sample For implementation doing a parallel string concatenation appears
below:

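// Requires: using System; using System.Threading; using System.Threading.Tasks;
int startCount = 0;
string reportedResults = "";
object resultsLock = new object();
var result =
    Parallel.For<string>(0, 100

    // localInit: runs once per Task; Interlocked keeps the shared counter safe.
    , () => { return "Start " + Interlocked.Increment(ref startCount).ToString(); }

    // body: runs once per iteration and passes the Task-local string along.
    ,(i, loopState, inVal) =>
    {
        Console.WriteLine("Thread == " + Thread.CurrentThread.ManagedThreadId.ToString()
            + " At " + i.ToString() + " "
            + inVal + "\r\n");

        return inVal + " " + i.ToString();
    }

    // localFinally: may run on several Tasks at once, so guard the shared string.
    , (s) => { lock (resultsLock) { reportedResults = reportedResults + " -- " + s; } }
);

Console.WriteLine("Reported results are " + reportedResults);
Console.WriteLine("Result was " + result.IsCompleted.ToString());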
 

The For signature utilized in the sample above appears
below.

public static ParallelLoopResult For<TLocal>
(
int fromInclusive
, int toExclusive
, Func<TLocal> localInit
, Func<int, ParallelLoopState, TLocal, TLocal> body
, Action<TLocal> localFinally
);

Like a traditional For loop, the first two values define the
scope of the workload. In the sample above the code concatenates the numbers 0
to 99 together, separated by spaces. Internally, TPL determines how to divide
the workload range and allocates a Task for each range.

Each executing Task first makes a call to the localInit
parameter Func delegate. Including this parameter allows a developer to
initialize the Task's local state. The Func delegate return type is dictated by the
For<TLocal> type argument, which is string in the sample above. Had the sample
operated on a collection, the Func could have returned a reference to the
collection or a smaller segment of a larger collection.

The body delegate parameter executes on each workload
iteration. The body delegate runs in parallel across the Tasks allocated
to the workload. ParallelLoopState will be discussed later in the article. As
would be expected, each body delegate invocation receives the iteration index
and the Task's local value: the instance returned by the localInit delegate on
the first invocation, and the value returned by the previous body invocation on
that Task thereafter.

The localFinally delegate parameter runs once for each allocated
Task, after that Task has completed its iterations. This delegate supports a
Scatter-Gather scenario where each Task performs some portion of the work and
returns the results of its efforts. TPL does not call the localFinally delegate
in isolation: it may be invoked concurrently on multiple Tasks, so any shared
variable it updates must be synchronized.
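
As a smaller illustration of how localInit, body, and localFinally cooperate, the following sketch sums a range of integers. The LocalStateSum class and SumRange method are hypothetical names chosen for this example; the Parallel.For<TLocal> overload is the one shown above. Each Task accumulates a private subtotal, and only the final merge touches shared state, which is why the merge uses Interlocked.Add.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class LocalStateSum
{
    // Hypothetical helper: sums the integers in [fromInclusive, toExclusive).
    public static long SumRange(int fromInclusive, int toExclusive)
    {
        long total = 0;

        Parallel.For<long>(fromInclusive, toExclusive,
            // localInit: every Task starts with its own subtotal of zero.
            () => 0L,
            // body: adds to the Task-local subtotal; no shared state is touched.
            (i, loopState, subtotal) => subtotal + i,
            // localFinally: merges each Task's subtotal into the shared total.
            // Interlocked.Add synchronizes the update across Tasks.
            subtotal => Interlocked.Add(ref total, subtotal));

        return total;
    }

    public static void Main()
    {
        Console.WriteLine(SumRange(0, 100)); // prints 4950
    }
}
```

Keeping the per-iteration work on Task-local state means the synchronized merge runs once per Task rather than once per iteration.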

ForEach works a lot like the For method. The biggest
difference is ForEach operates on IEnumerable classes. Because the source is a
collection rather than an index range, TPL also includes overloads that accept
a Partitioner class implementation to control how the collection is divided.

Partitioner

A Partitioner intelligently divides and balances a
collection so different Tasks can operate independently on segments of the
collection. A Parallel.ForEach sample appears below.

var enumerable = new string[20] {"0","1","2","3","4"
    ,"5","6","7","8","9"
    ,"10","11","12","13","14"
    ,"15","16","17","18","19"};
var opts = new ParallelOptions();
var cancelS = new CancellationTokenSource();//read about under cancellations.

opts.CancellationToken = cancelS.Token;

var result = Parallel.ForEach<string>
    (
    Partitioner.Create<string>(enumerable,true)

    , opts

    , (s, loopState, on) =>
    {
        //How you can use Loopstate: skip the work once the loop is cancelled or stopping
        if (loopState.ShouldExitCurrentIteration) { return; }

        enumerable[on] = s + " Thread "
            + Thread.CurrentThread.ManagedThreadId.ToString()
            + " on == " + on.ToString();

    }

    );

foreach (var val in enumerable) { Console.WriteLine(val); }
Console.WriteLine("The result was " + result.IsCompleted.ToString());
 

Running the earlier For sample may have yielded some
strange output. The For method does not guarantee ordered execution. In fact, a Task
allocated to the middle of the For range may finish after the Task
handling the end of the range. An orderable Partitioner can help restore result
ordering: it supplies each element's original index (the on parameter in the
ForEach sample), so results can be written back in source order.

TPL includes some standard Partitioners, but a developer may
need something more specialized. Collections of more complex objects may
require special handling to load balance a workload across executing Tasks. A
complete Partitioning review could fill an entire article. For a complete
review, see the resources at the end of the article. The sample above
also demonstrates the ParallelLoopState and ParallelOptions classes.
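
One of the standard Partitioners divides an index range into sub-ranges, which suits bodies that do very little work per element. The sketch below uses Partitioner.Create(int, int), which hands each Task a Tuple<int, int> holding the inclusive start and exclusive end of its sub-range. The RangePartitionDemo class and SquareRoots method are hypothetical names for illustration, and the method assumes count is greater than zero.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

public static class RangePartitionDemo
{
    // Hypothetical helper: fills an array with the square roots of 0..count-1.
    // Assumes count > 0, because Partitioner.Create rejects an empty range.
    public static double[] SquareRoots(int count)
    {
        var results = new double[count];

        // Each partition is a Tuple<int, int>: Item1 inclusive, Item2 exclusive.
        var ranges = Partitioner.Create(0, count);

        Parallel.ForEach(ranges, range =>
        {
            // A plain sequential loop runs inside each sub-range.
            for (int i = range.Item1; i < range.Item2; i++)
            {
                results[i] = Math.Sqrt(i); // each index is written by exactly one Task
            }
        });

        return results;
    }

    public static void Main()
    {
        double[] roots = SquareRoots(100);
        Console.WriteLine(roots[16]); // prints 4
    }
}
```

Writing each result at its own index keeps the output in source order regardless of which Task finishes first.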

ParallelLoopState and ParallelOptions

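ParallelOptions cancellations are demonstrated in the ForEach sample.
Cancellations allow external code to abort a Parallel Loop before or during
execution. Including the CancellationToken in the ParallelOptions surfaces the
Cancellation through the ParallelLoopState.ShouldExitCurrentIteration property.
To react to a Cancellation promptly, a long-running Parallel Loop body should
query ShouldExitCurrentIteration at some point during execution. Once the loop
is cancelled, the For or ForEach call throws an OperationCanceledException.
ParallelLoopState includes Break and Stop methods. As might be expected, Break
and Stop halt execution. Break lets iterations with a lower index than the
breaking iteration finish, while Stop ends the loop as soon as is convenient.
Unlike traditional looping, however, running Tasks may be executing in parallel
and in various completion stages, so iterations that have already started still
run to completion unless they check ShouldExitCurrentIteration.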
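
The following sketch shows both control mechanisms in isolation. The LoopControlDemo class and its two methods are hypothetical names for this example. BreakAt calls Break from a single iteration and returns the ParallelLoopResult, and RunWithCancelledToken passes an already-cancelled token to show the exception a cancelled loop throws.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class LoopControlDemo
{
    // Hypothetical helper: marks visited iterations and calls Break at breakAt.
    public static ParallelLoopResult BreakAt(int breakAt, bool[] visited)
    {
        return Parallel.For(0, visited.Length, (i, loopState) =>
        {
            visited[i] = true;
            // Break lets every iteration below breakAt still run.
            if (i == breakAt) { loopState.Break(); }
        });
    }

    // Hypothetical helper: returns true when the loop reports the cancellation.
    public static bool RunWithCancelledToken()
    {
        using (var cts = new CancellationTokenSource())
        {
            cts.Cancel(); // cancel before the loop starts
            var opts = new ParallelOptions { CancellationToken = cts.Token };
            try
            {
                Parallel.For(0, 100, opts, i => { });
                return false;
            }
            catch (OperationCanceledException)
            {
                return true; // a cancelled loop surfaces as this exception
            }
        }
    }

    public static void Main()
    {
        var visited = new bool[1000];
        ParallelLoopResult result = BreakAt(10, visited);
        Console.WriteLine(result.IsCompleted);          // prints False
        Console.WriteLine(result.LowestBreakIteration); // prints 10
        Console.WriteLine(RunWithCancelledToken());     // prints True
    }
}
```

Iterations above the break index may or may not have run by the time the loop ends, which is why the sketch only relies on the iterations at or below it.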

Conclusion

Task Parallel Library Data Parallel loops are encapsulated
in the Parallel class. Data Parallel loops require more complicated control
mechanisms than traditional loops.

Resources

"Custom Parallel Partitioning with .NET 4.0"

"Task Parallel Library"

"Data Parallelism (Task Parallel Library)"
