Understanding Tasks in .NET Framework 4.0 Task Parallel Library

Introduction

Microsoft has made a significant investment in parallel computing features with Microsoft Visual Studio 2010 and the .NET Framework 4.0. Often the best way to understand a new feature set is to start with its core components and work inward until you find the few concepts that show up everywhere. The Task Parallel Library (TPL) is a core part of parallel computing in the .NET Framework, and the Task class is the heart of TPL. In this article I explain how a developer can use the Task class to leverage TPL.

A New Model

While a developer could leverage the TPL through features that existed in prior versions of Microsoft Visual Studio and the .NET Framework, truly getting all the benefits of TPL requires architectural approaches that may be new to many .NET developers.

I think of TPL as following a new kind of model that extends familiar concepts like threads, delegates, and the ThreadPool. Models follow prescriptive patterns to solve problems and work in a particular context. The first step toward understanding the new model is becoming acquainted with the ideas behind it. The basic ideas are as follows:

  • Delegates are references to functions. Delegates can be passed around an application without passing a full reference to a class. A developer can control the flow of an application by organizing and invoking data structures containing delegates.
  • Delegates can be passed to Threads in the ThreadPool or to Threads created in the application. Developers use multiple threads to improve application performance and responsiveness.
  • Because threads can run concurrently and often data structures are shared by two concurrently running threads, executing delegates must control how they access shared data structures. An executing delegate must signal or lock a shared data structure before changing the data structure.
  • Locking data structures and creating too many Threads can introduce bottlenecks and/or consume resources in an application and can defeat the purpose of multi-threading.
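
The ideas in the list above can be sketched with the pre-TPL building blocks alone. This is only an illustration, not code from the whitepaper; the SharedCounterDemo class, the CountWithThreads method, and the gate object are hypothetical names chosen for the example. A single delegate is handed to several Threads, and each Thread locks the shared variable before changing it.

```csharp
using System;
using System.Threading;

static class SharedCounterDemo
{
    // Runs the same delegate on several threads; every thread increments one shared total.
    public static int CountWithThreads(int threadCount, int incrementsPerThread)
    {
        int total = 0;
        object gate = new object();   // hypothetical lock object guarding 'total'

        // The delegate is a reference to the work; it is passed to each Thread below.
        ThreadStart work = () =>
        {
            for (int i = 0; i < incrementsPerThread; i++)
            {
                lock (gate) { total++; }   // lock before changing shared data
            }
        };

        var threads = new Thread[threadCount];
        for (int t = 0; t < threadCount; t++)
        {
            threads[t] = new Thread(work);
            threads[t].Start();
        }

        // Block until every thread has finished its increments.
        foreach (var thread in threads) thread.Join();
        return total;
    }
}
```

Every increment pays for the lock, which is the bottleneck the last bullet warns about: the more threads contend for gate, the less the extra threads help.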

Really grasping how to apply TPL requires code. For the remainder of the article I walk through some sample TPL code that demonstrates how Task works with other TPL classes.

TPL Core and Sample Code

There are two namespaces that encapsulate many of the TPL classes: System.Threading.Tasks and System.Collections.Concurrent. Here is a list of some of the classes you'll find in the namespaces.

  • Task
  • TaskScheduler
  • TaskFactory
  • TaskCanceledException
  • BlockingCollection<T>

The code below, taken from page 55 of the "Patterns of Parallel Programming" whitepaper on the Microsoft Parallel Computing web site, utilizes classes from each namespace.

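// Namespaces used: System.Collections.Concurrent, System.IO, System.Linq,
// System.Text.RegularExpressions, System.Threading.Tasks
static void ProcessFile(string inputPath, string outputPath)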
{
    var inputLines = new BlockingCollection<string>();
    var processedLines = new BlockingCollection<string>();
    // Stage #1
    var readLines = Task.Factory.StartNew(() =>
    {
        try
        {
            foreach (var line in File.ReadLines(inputPath)) inputLines.Add(line);
        }
        finally { inputLines.CompleteAdding(); }
    });
    // Stage #2
    var processLines = Task.Factory.StartNew(() =>
    {
        try
        {
            foreach (var line in inputLines.GetConsumingEnumerable()
            .Select(line => Regex.Replace(line, @"\s+", ", ")))
            {
                processedLines.Add(line);
            }
        }
        finally { processedLines.CompleteAdding(); }
    });
    // Stage #3
    var writeLines = Task.Factory.StartNew(() =>
    {
        File.WriteAllLines(outputPath, processedLines.GetConsumingEnumerable());
    });
    Task.WaitAll(readLines, processLines, writeLines);
}

As you can see, this is an interesting piece of code. It looks sequential, yet the subject of this article is parallel computing. If you've seen or written multithreaded applications before, you're probably not used to seeing a program structured this way. TPL hides much of the ugliness you've seen in multithreaded and asynchronous code.

Now let's break down how this code executes. As promised earlier in the article, I'm starting with the role of the Task class.

Tasks

A developer can create a Task with the new operator or use a TaskFactory to instantiate one. A Task wraps the execution of an Action delegate. Invoking interdependent Action delegates asynchronously requires some coordination, and an Action delegate may execute on the current thread or on any number of other threads. So, instead of specifying the execution thread, a developer organizes the delegates (the work), declares the interdependencies of the work, and lets a group of other TPL classes carry out the execution. The Task class methods fall into groups of overloads, each with its own execution coordination theme. Below are the groupings:

  • Start - instructs a Task to schedule the Action to begin execution.
  • Wait - blocks until the Action completes execution and the Task receives notification that the Action has completed.
  • WaitAll - works similarly to Wait, but blocks on a collection of Tasks, waiting for all of them to complete.
  • WaitAny - works similarly to Wait, but blocks until any one of the Tasks in the collection completes.
  • ContinueWith - executes another Task when a Task receives notification that its Action has completed, which allows a developer to chain dependent Tasks together. Some of the documentation calls this a continuation.
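
A small sketch may make the groupings above more concrete. It is not part of the whitepaper sample; the TaskCoordinationDemo class and the numbers it computes are made up for illustration. Each call maps to one of the method groups in the list.

```csharp
using System;
using System.Threading.Tasks;

static class TaskCoordinationDemo
{
    public static int Run()
    {
        // Start: create a Task explicitly, then ask it to schedule its delegate.
        var first = new Task<int>(() => 20);
        first.Start();

        // TaskFactory.StartNew creates and schedules a Task in one call.
        var second = Task.Factory.StartNew(() => 22);

        // ContinueWith: runs once 'first' has completed and receives the finished Task.
        var doubled = first.ContinueWith(antecedent => antecedent.Result * 2);

        // WaitAny: blocks until at least one Task completes and returns its index.
        int finishedIndex = Task.WaitAny(first, second);
        Console.WriteLine("Task at index {0} has completed.", finishedIndex);

        // WaitAll: blocks until every Task in the list has completed.
        Task.WaitAll(first, second, doubled);

        // Wait: blocks on a single Task (returns immediately here, it is already done).
        doubled.Wait();

        return first.Result + second.Result + doubled.Result;
    }
}
```

Nothing in the sketch names a thread. The code only states which work exists and which work depends on other work, and the scheduler decides where each delegate runs.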

Task creation and execution in the example appears below:

    var readLines = Task.Factory.StartNew(() =>
    {
        try
        {
            foreach (var line in File.ReadLines(inputPath)) inputLines.Add(line);
        }
        finally { inputLines.CompleteAdding(); }
    });

The example utilizes a TaskFactory. TaskFactory simplifies creating Tasks that follow a particular convention. In the example, the selected TaskFactory convention is to create a new Task and start it on the current TaskScheduler, which is the default ThreadPool-based scheduler unless the calling code is already running under a different one.

A TaskScheduler manages queuing Tasks to run on a set of Threads. TaskScheduler works a lot like the ThreadPool, only developers can create their own TaskScheduler to handle different types of coordination or workloads and swap out the default for a custom TaskScheduler.
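
To show what swapping out the default might look like, here is a minimal sketch of a custom TaskScheduler. The DedicatedThreadScheduler and SchedulerDemo classes are hypothetical and written only for this illustration; a production scheduler would normally track its queued Tasks and limit how many threads it creates.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical scheduler: runs every queued Task on its own dedicated thread.
sealed class DedicatedThreadScheduler : TaskScheduler
{
    // Called by TPL when a Task is scheduled on this scheduler.
    protected override void QueueTask(Task task)
    {
        var thread = new Thread(() => TryExecuteTask(task)) { IsBackground = true };
        thread.Start();
    }

    // Returning false means a waiting thread never runs the Task inline;
    // it always runs on the thread created in QueueTask.
    protected override bool TryExecuteTaskInline(Task task, bool taskWasPreviouslyQueued)
    {
        return false;
    }

    // Used by debuggers to list pending Tasks; this sketch keeps no queue.
    protected override IEnumerable<Task> GetScheduledTasks()
    {
        return Enumerable.Empty<Task>();
    }
}

static class SchedulerDemo
{
    // Reports whether the delegate ran on a ThreadPool thread.
    public static bool RanOnThreadPool()
    {
        var factory = new TaskFactory(new DedicatedThreadScheduler());
        Task<bool> task = factory.StartNew(() => Thread.CurrentThread.IsThreadPoolThread);
        return task.Result;
    }
}
```

The code that creates the Task does not change apart from the TaskFactory it uses, which is what makes the scheduler replaceable.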

The example code also uses a lambda expression (an anonymous function) to improve readability.

Best of all though, the example demonstrates how a developer can compose interdependent delegates into Task objects and coordinate using concurrent-safe data structures to move execution from one Task to the next until the whole sample code completes.

BlockingCollection

Here is another copy of the sample code presented earlier in the article.

static void ProcessFile(string inputPath, string outputPath)
{
    var inputLines = new BlockingCollection<string>();
    var processedLines = new BlockingCollection<string>();
    // Stage #1
    var readLines = Task.Factory.StartNew(() =>
    {
        try
        {
            foreach (var line in File.ReadLines(inputPath)) inputLines.Add(line);
        }
        finally { inputLines.CompleteAdding(); }
    });
    // Stage #2
    var processLines = Task.Factory.StartNew(() =>
    {
        try
        {
            foreach (var line in inputLines.GetConsumingEnumerable()
            .Select(line => Regex.Replace(line, @"\s+", ", ")))
            {
                processedLines.Add(line);
            }
        }
        finally { processedLines.CompleteAdding(); }
    });
    // Stage #3
    var writeLines = Task.Factory.StartNew(() =>
    {
        File.WriteAllLines(outputPath, processedLines.GetConsumingEnumerable());
    });
    Task.WaitAll(readLines, processLines, writeLines);
}

I think the role of Task is somewhat intuitive, but the role of BlockingCollection may be new to many .NET developers. The graphic below depicts how data flows from inputLines, into processedLines, and then is written to a file.

Keep in mind that all of the steps in the figure execute concurrently, and the foreach statements pull the data through the segments of code. So, for example, while lines are still being added to inputLines at the top, processed lines are already being pulled from processedLines and written to the output file.

BlockingCollection sports a number of features which make it ideal for the type of work depicted above.

First, as I mentioned earlier, locking, signaling, and other thread-safety features are important for shared data structures. BlockingCollection handles all of this in a thread-safe and, just as important, efficient manner. Second, the GetConsumingEnumerable method returns an IEnumerable<T>. So, as you can see in the sample code above, BlockingCollection plays well with LINQ and foreach statements.
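
A smaller producer and consumer pair shows the same behavior in isolation. This sketch is not from the whitepaper; the PipelineDemo class is hypothetical, and the bounded capacity of 4 is an arbitrary value picked to show that Add blocks when the collection is full.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading.Tasks;

static class PipelineDemo
{
    public static int SumOfSquares(int count)
    {
        // Bounded to 4 items (arbitrary): Add blocks while the collection is full.
        using (var numbers = new BlockingCollection<int>(4))
        {
            var producer = Task.Factory.StartNew(() =>
            {
                try
                {
                    for (int i = 1; i <= count; i++) numbers.Add(i);
                }
                // Without CompleteAdding the consumer would wait forever for more items.
                finally { numbers.CompleteAdding(); }
            });

            // GetConsumingEnumerable blocks while the collection is empty and ends
            // once adding is complete and every item has been taken, so LINQ works.
            var consumer = Task.Factory.StartNew(() =>
                numbers.GetConsumingEnumerable().Select(n => n * n).Sum());

            Task.WaitAll(producer, consumer);
            return consumer.Result;
        }
    }
}
```

Neither delegate contains a lock or a signal; the collection does the blocking and the hand-off between the two Tasks.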

[TaskToBlocking.jpg]
Figure 1: Task to BlockingCollection

More?

I glossed over many of the advanced Task class features. Some of those features are described below.

The Asynchronous Programming Model (APM) is integral to application responsiveness and is used throughout the .NET Framework. The ParallelExtensionsExtras sample contains support for much of the .NET APM functionality plus patterns not already encapsulated by TPL classes.
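
As a rough illustration of how an APM Begin/End pair can be wrapped in a Task, the sketch below uses TaskFactory.FromAsync, which ships with TPL itself rather than with ParallelExtensionsExtras. The ApmDemo class and its ReadFirstChunk method are hypothetical names for this example, and the method reads only a single chunk, not the whole stream.

```csharp
using System;
using System.IO;
using System.Text;
using System.Threading.Tasks;

static class ApmDemo
{
    // Wraps Stream.BeginRead/EndRead in a Task and decodes the bytes that were read.
    public static string ReadFirstChunk(Stream stream, int bufferSize)
    {
        var buffer = new byte[bufferSize];

        // FromAsync calls BeginRead now and EndRead when the operation completes.
        Task<int> read = Task<int>.Factory.FromAsync(
            stream.BeginRead, stream.EndRead, buffer, 0, buffer.Length, null);

        // The continuation receives the completed read and its byte count.
        Task<string> decode = read.ContinueWith(
            t => Encoding.UTF8.GetString(buffer, 0, t.Result));

        return decode.Result;   // blocks until the read and the decode have finished
    }
}
```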

There is a set of classes for handling Task cancellation. When employed, cancellation can be handled much as you handle exceptions, using try{} catch{} statements.
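
A minimal sketch of that exception-style handling follows. The CancellationDemo class is hypothetical and the 10 millisecond sleep stands in for real work; CancellationTokenSource, CancellationToken, AggregateException, and TaskCanceledException are the .NET Framework 4.0 types involved.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

static class CancellationDemo
{
    public static bool RunAndCancel()
    {
        using (var cts = new CancellationTokenSource())
        {
            CancellationToken token = cts.Token;
            var started = new ManualResetEventSlim(false);

            // Passing the token to StartNew ties the Task's cancellation to it.
            var worker = Task.Factory.StartNew(() =>
            {
                started.Set();
                while (!token.IsCancellationRequested)
                {
                    Thread.Sleep(10);   // stand-in for a unit of real work
                }
                // Cooperative cancellation: the delegate itself ends the work.
                token.ThrowIfCancellationRequested();
            }, token);

            started.Wait();   // make sure the delegate is running before cancelling
            cts.Cancel();

            try
            {
                worker.Wait();
            }
            catch (AggregateException ex)
            {
                // Wait wraps the cancellation in an AggregateException;
                // Handle rethrows anything the predicate does not accept.
                ex.Handle(inner => inner is TaskCanceledException);
            }

            return worker.IsCanceled;
        }
    }
}
```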

Usually when Microsoft adds new features to the .NET Framework, tooling comes in a later release. With the .NET 4.0 and Visual Studio 2010 release, Microsoft has added both framework features and extensive tooling.

Finally, the parallel computing documentation is clear, well written, and extensive. I recommend visiting the sites listed in the resources below. You'll find everything from samples to patterns and guidelines.

Conclusion

If developers are going to master multi-cores, they'll first need to master new parallel computing techniques and data structures. The Task Parallel Library (TPL) is the core part of parallel computing in the .NET Framework. Mastering TPL begins with understanding the Task class.

Resources

Parallel Computing Development Web site, you'll find everything you need here.
Patterns of Parallel Programming: Understanding and Applying Parallel Patterns with the .NET Framework 4.0 and Visual C#
Parallel Programming Samples for .NET Framework 4.0
A Tour of ParallelExtensionsExtras
Optimize Managed Code For Multi-Core Machines
Parallel Computing Technical Articles on the Microsoft Parallel Computing development center
PDC 2009 Videos
TPL Documentation

About the Author

Jeffrey Juday

Jeff is a software developer specializing in enterprise application integration solutions utilizing BizTalk, SharePoint, WCF, WF, and SQL Server. Jeff has been developing software with Microsoft tools for more than 15 years in a variety of industries including: military, manufacturing, financial services, management consulting, and computer security. Jeff is a Microsoft BizTalk MVP. Jeff spends his spare time with his wife Sherrill and daughter Alexandra.
