Surveying Parallel Computing in .NET Framework 4.0

"I hope you don't mind that I put down in words, How wonderful life is while you're in the world."
--Bernie Taupin

Introduction

Multithreading, or parallelism, is a big topic. There are a lot of issues to consider: what is available, how much parallelism should be employed, what performance benefits you will achieve, and what the added cost of debugging is. All of this complexity suggests that developers need higher levels of abstraction to make using multi-core and multi-processor computers easier. That is what Microsoft has done in .NET Framework 4.0 with the Task Parallel Library (TPL). One still needs to consider things like how much is too much, best practices, and debugging costs, but with higher abstractions at least using parallelism is technically easier.

Prior to .NET Framework 4.0 we had asynchronous calls, the thread pool, lower-level threading, worker controls, and the parallel extensions that could be downloaded. Now higher levels of abstraction and new tools have been incorporated into the framework, but learning about all these tools is a big job.

In this article I will start with a pretty basic aspect of the TPL: for loops. The example compares a sequential for loop, a parallel for loop, and a loop built on the new Partitioner class, which is a sort of hybrid of sequential and parallel processing. For loops are bread and butter constructs, so they are a good place to start.

Implementing a Sequential For Loop

Using a simple for loop is an easy task. Write the for loop and process the data. You don't have to worry about threading issues, the loop is easy to debug, and in many cases performance is probably not a big factor. The for loop demo in Listing 1 establishes a baseline for the other two parts of this article.

Const max As Integer = 1000
Dim numbers = Enumerable.Range(0, max).ToArray()

' Sequential loop
Dim stopwatch As Stopwatch = New Stopwatch()
stopwatch.Start()

For i As Integer = 0 To numbers.Count - 1
  ' Math.Pow returns a Double, so convert back to Integer explicitly
  numbers(i) = CInt(Math.Pow(numbers(i), 2))
Next

stopwatch.Stop()
Console.WriteLine("Elapsed sequential time {0}", stopwatch.Elapsed)
Listing 1: Writing a simple for loop to square an array of integers.

In the sample the Stopwatch class is used to track how long the sequential loop takes to process. The for loop accesses the array of integers and uses Math.Pow to square each integer in the array. Simple, no worries. On my PC the example ran in about 1 millisecond.
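A single Stopwatch reading for a loop this small tends to be noisy, partly because the first pass includes JIT compilation. The following is only a sketch of one way to steady the measurement: warm up once, then keep the best of several runs. The names TimingSketch, BestOf, and SquareSequential are made up for illustration and are not part of the framework.

```vbnet
Imports System
Imports System.Diagnostics
Imports System.Linq

Module TimingSketch

    ' Hypothetical helper: runs the work once as a warm-up, then returns
    ' the shortest elapsed time observed over the requested number of runs.
    Function BestOf(runs As Integer, work As Action) As TimeSpan
        work()
        Dim best As TimeSpan = TimeSpan.MaxValue
        For run As Integer = 1 To runs
            Dim watch As Stopwatch = Stopwatch.StartNew()
            work()
            watch.Stop()
            If watch.Elapsed < best Then best = watch.Elapsed
        Next
        Return best
    End Function

    ' Squares the integers 0..count-1 sequentially, as Listing 1 does.
    Function SquareSequential(count As Integer) As Integer()
        Dim numbers = Enumerable.Range(0, count).ToArray()
        For i As Integer = 0 To numbers.Length - 1
            numbers(i) = CInt(Math.Pow(numbers(i), 2))
        Next
        Return numbers
    End Function

    Sub Main()
        Dim best = BestOf(10, Sub() SquareSequential(1000))
        Console.WriteLine("Best sequential time over 10 runs: {0}", best)
    End Sub

End Module
```

The same BestOf helper could wrap the parallel versions later in the article so that all three loops are timed the same way.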

Implementing a Parallel For Loop

To use Parallel.For you need to import System.Threading.Tasks. Parallel is a class with static (Shared) members. Parallel.For accepts the loop extent as an inclusive lower bound and an exclusive upper bound, plus a delegate, Action(Of Integer) in this case, that performs the action on the data and can be satisfied with a Lambda expression; see Listing 2.

Imports System.Threading.Tasks

Module Module1

    Sub Main()
      Const max As Integer = 1000
      Dim numbers = Enumerable.Range(0, max).ToArray()

      Dim stopwatch As Stopwatch = New Stopwatch()
      stopwatch.Start()

      Parallel.For(0, numbers.Count, Sub(i)
        ' Math.Pow returns a Double, so convert back to Integer explicitly
        numbers(i) = CInt(Math.Pow(numbers(i), 2))
      End Sub)

      stopwatch.Stop()
      Console.WriteLine("Elapsed parallel time {0}", stopwatch.Elapsed)

…
Listing 2: Squaring the array with Parallel.For and a multi-line Lambda expression.

In Listing 2 iteration starts at 0 and goes up to, but doesn't include, numbers.Count. Because the second argument is an exclusive upper bound, the last index processed is Count - 1, the same as in the sequential loop in Listing 1. The Lambda expression starting with the keyword Sub performs the action. (In the example a multi-line Lambda Sub is used. Multi-line Lambdas are new in Visual Basic 2010, which shipped with .NET Framework 4.0.)
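To see the inclusive and exclusive bounds in action, here is a small sketch that records every index Parallel.For hands to the delegate. BoundsSketch and VisitedIndexes are illustrative names only. Because iterations can run in any order, the indexes are collected in a thread-safe ConcurrentBag and sorted before printing.

```vbnet
Imports System
Imports System.Collections.Concurrent
Imports System.Linq
Imports System.Threading.Tasks

Module BoundsSketch

    ' Returns, in ascending order, every index that Parallel.For passes
    ' to its delegate for the given bounds.
    Function VisitedIndexes(fromInclusive As Integer, toExclusive As Integer) As Integer()
        Dim seen As New ConcurrentBag(Of Integer)()
        Parallel.For(fromInclusive, toExclusive, Sub(i) seen.Add(i))
        Return seen.OrderBy(Function(i) i).ToArray()
    End Function

    Sub Main()
        Dim indexes = VisitedIndexes(0, 5)
        Dim text = indexes.Select(Function(i) i.ToString()).ToArray()
        ' Prints: 0, 1, 2, 3, 4 (the upper bound 5 is never visited)
        Console.WriteLine(String.Join(", ", text))
    End Sub

End Module
```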

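The second example is much slower, and here is why. The delegate satisfied by the Lambda expression is invoked once per iteration, and the iterations are handed out to worker threads from the thread pool rather than each getting a thread of its own. That means work has to be scheduled, ranges of indexes have to be handed to the workers, and a delegate call is made for every element, all to perform a single multiplication. That infrastructure costs far more than the work itself, or approximately that is what happens.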

If you move the action to a delegate containing a breakpoint (see the fragment in Listing 3 and Figure 1), you can see that there aren't a thousand threads, because threads are reused, but the threads do pile up. Spinning up those workers and dividing the iterations among them takes time; consequently the code in Listing 2 or 3, depending on how you write it, is much slower than the sequential loop.

Dim action As Action(Of Integer) =
  Sub(i)
    numbers(i) = CInt(Math.Pow(numbers(i), 2))
  End Sub

Parallel.For(0, numbers.Count, action)
Listing 3: This fragment is equivalent to the shorter Parallel.For call in Listing 2.


Figure 1: Threads being created for the code fragment in Listing 3
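If you would rather not count threads in the debugger, a rough alternative is to record the managed thread ID inside the delegate. This is only a sketch, and ThreadCountSketch and DistinctThreads are names invented for it. The number it reports depends on the machine and on how busy the thread pool is, so treat it as an observation rather than a fixed value.

```vbnet
Imports System
Imports System.Collections.Concurrent
Imports System.Threading
Imports System.Threading.Tasks

Module ThreadCountSketch

    ' Runs the given number of iterations through Parallel.For and returns
    ' how many distinct managed threads executed at least one iteration.
    Function DistinctThreads(iterations As Integer) As Integer
        Dim threadIds As New ConcurrentDictionary(Of Integer, Boolean)()
        Parallel.For(0, iterations,
                     Sub(i)
                         ' TryAdd ignores IDs that have already been recorded
                         threadIds.TryAdd(Thread.CurrentThread.ManagedThreadId, True)
                     End Sub)
        Return threadIds.Count
    End Function

    Sub Main()
        Console.WriteLine("1000 iterations ran on {0} distinct thread(s)",
                          DistinctThreads(1000))
    End Sub

End Module
```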

The caveat is that multithreading is available but doesn't always help you, especially when the work per iteration is this simple. The next section demonstrates how, in theory, to speed up small loops by combining sequential behavior with parallelism.

Using a Partitioner to Speed Up Smaller Tasks in Parallel

In Listings 2 and 3 the delegate was invoked for every element, and the result is a lot of internal work scheduling those calls across threads. The Partitioner class breaks the data up into partitions and calls a delegate on a range of values, so the delegate call happens once per partition rather than once per element. In theory the Partitioner helps because there are fewer delegate calls and less scheduling overhead. In Listings 2 and 3 the delegate is called for each item, which is basically a thousand partitions. With the Partitioner class the data might be broken into 100 partitions containing ten items each, resulting in 100 rather than a thousand delegate calls. Listing 4 demonstrates how to use the Partitioner class.

Imports System.Threading.Tasks
Imports System.Collections.Concurrent

Module Module1

    Sub Main()
      Const max As Integer = 1000
      Dim numbers = Enumerable.Range(0, max).ToArray()

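      Dim stopwatch As Stopwatch = New Stopwatch()
      stopwatch.Start()

      ' Split the index range into (fromInclusive, toExclusive) tuples
      Dim partition = Partitioner.Create(0, numbers.Count)

      Parallel.ForEach(partition, Sub(range, loopState)
        ' Process one partition sequentially
        For i As Integer = range.Item1 To range.Item2 - 1
          numbers(i) = CInt(Math.Pow(numbers(i), 2))
        Next
      End Sub)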

      stopwatch.Stop()
      Console.WriteLine("Elapsed parallel/partitioner time {0}", stopwatch.Elapsed)

      Console.ReadLine()
    End Sub

End Module
Listing 4: Using the Partitioner class to reduce the number of delegate calls by splitting the data into partitions that run in parallel while each partition is processed sequentially.

The Partitioner class is defined in the System.Collections.Concurrent namespace. Partitioner.Create creates the logical partitions of data. For example, if the Partitioner creates partitions containing 8 elements each, then in the example the delegate is called 125 times instead of 1,000. Parallel.ForEach accepts the Partitioner instance, which in this example is an OrderablePartitioner(Of Tuple(Of Integer, Integer)). The Lambda expression receives the range specified by the particular partition and a ParallelLoopState, which is not used in this example. Each call to the multi-line Lambda expression uses the range value to process just the data defined by its logical partition.
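Listing 4 lets the framework choose the partition size. If you want to control it, Partitioner.Create also has an overload that takes a range size as a third argument. The sketch below uses that overload and counts how many times the delegate runs. RangeSizeSketch and CountDelegateCalls are illustrative names, and the counter is incremented with Interlocked because several partitions may run at once.

```vbnet
Imports System
Imports System.Collections.Concurrent
Imports System.Linq
Imports System.Threading
Imports System.Threading.Tasks

Module RangeSizeSketch

    ' Squares 0..count-1 in chunks of rangeSize and returns the number of
    ' times the Parallel.ForEach delegate was invoked.
    Function CountDelegateCalls(count As Integer, rangeSize As Integer) As Integer
        Dim numbers = Enumerable.Range(0, count).ToArray()
        Dim calls As Integer = 0

        ' Each element of ranges is a (fromInclusive, toExclusive) tuple
        Dim ranges = Partitioner.Create(0, numbers.Length, rangeSize)

        Parallel.ForEach(ranges,
                         Sub(range, loopState)
                             Interlocked.Increment(calls)
                             For i As Integer = range.Item1 To range.Item2 - 1
                                 numbers(i) = CInt(Math.Pow(numbers(i), 2))
                             Next
                         End Sub)
        Return calls
    End Function

    Sub Main()
        ' 1000 items in ranges of 10 gives 100 calls; ranges of 8 gives 125
        Console.WriteLine("Range size 10: {0} delegate calls", CountDelegateCalls(1000, 10))
        Console.WriteLine("Range size 8:  {0} delegate calls", CountDelegateCalls(1000, 8))
    End Sub

End Module
```

Fewer, larger ranges mean fewer delegate calls but also fewer opportunities to spread the work across cores, so the best range size is something to measure rather than guess.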

Again, the processing of the data is so simple in this example that the code in Listing 1 routinely outperforms both of the Parallel examples. That is important to know. Use a couple of basic rules of thumb when considering parallelism: identify fragments of code that might be causing a bottleneck, apply parallelism, and check whether it helps. Don't assume parallelism helps, and remember that debugging the code and checking performance improvements will add to your overhead, that is, the time to implementation. The good news is that using threaded behavior is getting easier and a little less error prone.
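One way to apply the "check whether it helps" rule is to time the sequential and parallel versions side by side on work that resembles your real bottleneck. The sketch below assumes a made-up SimulatedWork function standing in for an expensive per-item computation; MeasureSketch, RunSequential, and RunParallel are likewise illustrative names. No timings are shown because they depend entirely on the machine and the workload.

```vbnet
Imports System
Imports System.Diagnostics
Imports System.Threading.Tasks

Module MeasureSketch

    ' Hypothetical stand-in for an expensive per-item computation.
    Function SimulatedWork(seed As Integer) As Double
        Dim total As Double = 0
        For k As Integer = 1 To 20000
            total += Math.Sqrt(seed + k)
        Next
        Return total
    End Function

    ' Computes every item on the calling thread.
    Function RunSequential(count As Integer) As Double()
        Dim results(count - 1) As Double
        For i As Integer = 0 To count - 1
            results(i) = SimulatedWork(i)
        Next
        Return results
    End Function

    ' Computes the same items with Parallel.For; each iteration writes
    ' only to its own slot, so no locking is needed.
    Function RunParallel(count As Integer) As Double()
        Dim results(count - 1) As Double
        Parallel.For(0, count, Sub(i) results(i) = SimulatedWork(i))
        Return results
    End Function

    Sub Main()
        Const count As Integer = 1000

        Dim watch = Stopwatch.StartNew()
        RunSequential(count)
        Console.WriteLine("Sequential: {0}", watch.Elapsed)

        watch.Restart()
        RunParallel(count)
        Console.WriteLine("Parallel:   {0}", watch.Elapsed)
    End Sub

End Module
```

Changing the iteration count inside SimulatedWork is a quick way to see roughly how heavy the per-item work has to be before the parallel version starts to pay for its overhead.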

Summary

There is no doubt that parallelism helps substantially in some instances. There is also no doubt that, while it is getting easier, taking advantage of multi-core and multi-processor machines by using threading adds development costs, especially in terms of debugging. When employed judiciously, when you can determine and verify that parallelism will help, it can make a significant improvement in your application's performance. The key is to understand parallelism as much as you can, test your assertions about where it will help, and then verify them.

Further reading on parallelism might include Amdahl's law, to better understand what you can expect from parallel performance, the differences between multi-core and multi-processor machines, and guidelines and principles for safely using parallelism. Since this is a big topic you can check this column for more examples on parallelism in .NET Framework 4.0.
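As a taste of Amdahl's law, the usual statement is that if a fraction p of a program can run in parallel on n processors, the best possible speedup is 1 / ((1 - p) + p / n). The sketch below simply evaluates that formula; AmdahlSketch and Speedup are names chosen for illustration.

```vbnet
Imports System

Module AmdahlSketch

    ' Amdahl's law: upper bound on speedup when parallelFraction of the
    ' work (between 0 and 1) is spread across the given number of processors.
    Function Speedup(parallelFraction As Double, processors As Integer) As Double
        Return 1.0 / ((1.0 - parallelFraction) + parallelFraction / processors)
    End Function

    Sub Main()
        For Each cores In {2, 4, 8}
            Console.WriteLine("{0} cores, 90% parallel: at most {1:F2}x faster",
                              cores, Speedup(0.9, cores))
        Next
    End Sub

End Module
```

The formula ignores the scheduling overhead seen in Listings 2 and 3, so real results will usually fall below this bound.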



About the Author

Paul Kimmel

Paul Kimmel is the VB Today columnist for CodeGuru and has written several books on object-oriented programming and .NET. Check out his upcoming book Professional DevExpress ASP.NET Controls (from Wiley) now available on Amazon.com and fine bookstores everywhere. Look for his upcoming book Teach Yourself the ADO.NET Entity Framework in 24 Hours (from Sams). You may contact him for technology questions at pkimmel@softconcepts.com. Paul Kimmel is a Technical Evangelist for Developer Express, Inc, and you can ask him about Developer Express at paulk@devexpress.com and read his DX blog at http://community.devexpress.com/blogs/paulk.
