Understanding Data Parallel in the .NET Task Parallel Library

Data Parallelism
and Task
Parallelism
are common Parallel Computing
forms. Task
Parallel Library (TPL)
bears the "Task Parallel" name, but
where does Data Parallel fit into TPL? How is Data Parallel done in the Task
Parallel Library? What does Data Parallel look like in the Task Parallel
Library? The remainder of this article will answer these questions.

Task Parallel and Data Parallel Defined

Resources at the end of the article abound with Task and
Data Parallelism definitions. Essentially, Task Parallelism is
collaboratively running parallel work. The Data Parallelism definition
is very similar with some seemingly subtle differences. Data Parallel usually
operates on a Collection and normally the same Action is applied to each member
in the collection.

Data Parallel data structures are built on top of the TPL Task
Parallel data structures. It’s a mistake to look at Data Parallel in TPL
without some understanding of the Task Parallel components underpinning TPL
Data Parallel. Also, it is difficult to grasp where an application can benefit
from Data Parallel without seeing how Data Parallel classes can be substituted
for basic TPL data structures.

Migrating code implemented with basic TPL data structures to
a TPL Data Parallel implementation is the best way to understand where and how
to apply Data Parallel.

Basic TPL Sample

Following the definition above; not all parallel
problems fit the Data Parallel mode. As stated earlier Data Parallel usually
operates on a Collection and executes the same action on each member in the
collection.

Below is sample Data Parallel code implemented using
some basic TPL classes.

            var workItems = new List<Task>();

            for (int n = 0; n < 10000; ++n)
            {
                workItems.Add(Task.Factory.StartNew(
                    index =>
                        {
                            Data[(int)index] = "This is index " + index.ToString();
                        }, n)
                    );
            }
            Task.WaitAll(workItems.ToArray());

While the sample code above compiles and runs there
are inefficiencies.

The biggest inefficiency is the code visits each
member in the array and allocates a separate Task for each member. While
allocating a Task for each member in the collection may seem sensible, consider
that the number of Cores on the machine roughly equates to the number of
concurrently executing Tasks.

Since the operation is the same on each collection
member, a better approach is to partition the data according to the number of
cores and allocate separate partitions to each Core. This would reduce the Task
allocation overhead and the need to visit each array member.

A Partitioning Improvement

One partitioning method would involve creating a
separate array of indexes and, assuming a two core machine, operating two tasks
each on a separate index partition. The code would resemble something like the
sample below.

            var workItems = new List<Task>();

            var part1 = new int[5000];
            var part2 = new int[5000];

            for (int n = 0; n < 10000; ++n)

            {
                if (n <= 4999) { part1[n] = n; }
                if (n > 4999) { part1[n - 5000] = n; }
            }


            workItems.Add(Task.Factory.StartNew(
                () =>
                {
                    foreach (var i in part1)
                    {
                        Data[i] = "This is index " + i.ToString();
                    }
                })
                );

            workItems.Add(Task.Factory.StartNew(
                () =>

                {
                    foreach (var i in part2)
                    {
                        Data[i] = "This is index " + i.ToString();
                    }
                })
                );

            Task.WaitAll(workItems.ToArray());

Since Partitioning and Task allocation would be
identical whether partitioning a string collection like in the sample above or
some other data structure, a developer could construct some set of classes to
perform the partitioning on any Collection type.

However, it would be easier to just use the Data
Parallel data structures built into TPL and let TPL Data Parallel handle all
the plumbing code.

Moving to TPL Data Parallel

The sample code below applies the Data Parallel data
structures to the same workload as the code above.

            Parallel.For(0, 10000,
                        index =>
                        {
                            Data[index] = "This is index " + index.ToString();
                        }
                    );

Aside from being more compact, the sample code above is more
readable than the code samples shown earlier in the article. In fact, the code
is virtually identical to a traditional For loop. While resembling a For loop,
the code above works nothing like a traditional For loop. Execution behavior
more closely resembles the second Task based sample above.

Another major difference between the Data Parallel sample
and the samples above is use of the iterator index value. When utilizing a
llamda; rather than passing a value to the delegate most developers prefer a
Closure.

Looking at the earlier Task sample, instead of an Action
with an object parameter; why didn’t the earlier samples reference the iterator
value? Because the Task executes after the loop has terminated, at the time the
Task executes the loop iterator will most likely be at its terminal value. In
the samples above the terminal value is 9999.

TPL Data Parallel is simply a cleaner way to do Data
Parallelism.

One major Data Parallel caveat is: like all parallel code,
concurrency friendly code comes with an overhead price.

Parallel != Better

Parallel code is not necessarily faster code. There is
overhead in the Data Parallel partitioning and in allocating the necessary TPL
data structures. The code below performs equally well or better than all of the
samples above.

             for (int n = 0; n < 10000; ++n)
             {
                 Data[n] = "This is index " + n.ToString();
             } 

The code runs faster because the workload is trivial. Time
and effort required creating Tasks and partitioning the array can outweigh the
effort required to execute the workload.

Parallel code also requires concurrency friendly handling. Developers
must be careful when modifying shared memory.

Conclusion

Data Parallelism and Task Parallelism are common Parallel
Computing forms. .NET Framework Task Parallel Library (TPL) supports both
parallelism forms. TPL Data Parallelism is built on the TPL Task Parallelism
data structures. Though TPL Data Parallelism may often resemble traditional
looping, Data Parallelism is still concurrent code and care must always be
taken with concurrent code.

Resources

"Custom
Parallel Partitioning with .NET 4.0
"

"Task Parallel
Library
"

"Data-parallelism
vs. Task-parallelism
"

"Data Parallelism
(Task Parallel Library)
"

Wikipedia Definitions

http://en.wikipedia.org/wiki/Data_parallelism

http://en.wikipedia.org/wiki/Task_parallelism

More by Author

Must Read