.NET Framework: Task Parallel Library Dataflow

Introduction

Released alongside the new C# Async Community Technology Preview (CTP) at PDC 2010 is a library called Task Parallel Library Dataflow (TDF). TDF is one of a growing list of technologies that are built on top of the Task Parallel Library (TPL) and belong to Microsoft's Technical Computing Initiative. Like many of the other Technical Computing products, TDF aims to make parallel computing products and patterns more accessible to the Microsoft development community.

Like other Technical Computing products, TDF builds on proven industry patterns and practices. TDF couples the Task Parallel Library to a set of classes and interfaces that use message passing to coordinate the behavior of a solution. Using a short sample application, I'll demonstrate how to apply TDF.

Overview

You can download TDF from the Dataflow site http://msdn.microsoft.com/en-us/devlabs/gg585582.

Note: While I think it's important to study new products and patterns coming down the pipeline, I would never use a CTP in a production application. Please remember that TDF is not a production-grade release.

According to the TDF documentation, TDF's architecture draws its inspiration from a number of sources, among them the Concurrency and Coordination Runtime (CCR). Because TDF is a new product still in CTP, there are no clear guidelines on where and when to use it. Good TDF candidates would likely be applications that benefit from both the composition and decoupling you get with messaging and the execution benefits of the TPL; for example, workload-intensive applications like the ones you often find inside a Windows Service.

TDF is composed of a set of "Blocks". Blocks receive messages, send messages, or both receive and send messages to other Blocks. In general, a block follows the pattern shown in the graphic below.

Figure 1: Block Architecture (Source: "An Introduction to TPL Dataflow")
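To make the receive/send roles concrete, the following sketch classifies a few block types by the Dataflow interfaces they implement (ITargetBlock<T> for receiving, ISourceBlock<T> for sending). It assumes the later, released System.Threading.Tasks.Dataflow library rather than the CTP, whose type names may differ, and BlockRolesDemo, Implements, and Role are hypothetical helpers written only for illustration.

```csharp
using System;
using System.Linq;
using System.Threading.Tasks.Dataflow;

// Hypothetical illustration helpers; not part of TDF.
public static class BlockRolesDemo
{
    // True if the block implements some constructed form of the open generic interface.
    static bool Implements(object block, Type openInterface)
    {
        return block.GetType().GetInterfaces()
            .Any(i => i.IsGenericType && i.GetGenericTypeDefinition() == openInterface);
    }

    // Classifies a block by whether it can receive (target) and/or send (source) messages.
    public static string Role(object block)
    {
        bool receives = Implements(block, typeof(ITargetBlock<>));
        bool sends = Implements(block, typeof(ISourceBlock<>));
        if (receives && sends) return "receives and sends";
        if (receives) return "receives";
        if (sends) return "sends";
        return "neither";
    }

    public static void Main()
    {
        Console.WriteLine("ActionBlock: " + Role(new ActionBlock<string>(s => Console.WriteLine(s))));
        Console.WriteLine("BufferBlock: " + Role(new BufferBlock<string>()));
        Console.WriteLine("TransformBlock: " + Role(new TransformBlock<string, string>(s => s.ToUpper())));
        // JoinBlock is a source; its inputs are exposed through Target1 and Target2 instead.
        Console.WriteLine("JoinBlock: " + Role(new JoinBlock<string, DateTime>()));
    }
}
```

In the released library, ActionBlock only receives, BufferBlock and TransformBlock both receive and send, and JoinBlock only sends; its inputs are reached through its Target1 and Target2 properties.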

A TDF message is an instance of whatever type the particular block is configured to work with (for example, string in an ActionBlock<string>). Messages are stored inside the block in TPL data structures.

Blocks leverage Tasks; I think of a Task as a chunk of work. Each Block runs its Tasks on a given TaskScheduler, and each kind of Block differs in how it dispatches work to the TaskScheduler and/or how it handles messages.
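As a rough illustration of how a block hands work to a TaskScheduler, the sketch below configures an ActionBlock through ExecutionDataflowBlockOptions and measures how many delegate calls overlap. It assumes the released System.Threading.Tasks.Dataflow library rather than the CTP, uses TaskScheduler.Default only as an example scheduler, and SchedulerDemo and PeakConcurrency are hypothetical names.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

// Hypothetical illustration class; not part of TDF.
public static class SchedulerDemo
{
    // Posts eight slow messages to an ActionBlock and returns the highest number
    // of delegate invocations observed running at the same time.
    public static int PeakConcurrency(int maxDegreeOfParallelism)
    {
        int running = 0, peak = 0;
        var options = new ExecutionDataflowBlockOptions
        {
            TaskScheduler = TaskScheduler.Default,          // where the block's tasks run
            MaxDegreeOfParallelism = maxDegreeOfParallelism // how many messages at once
        };

        var block = new ActionBlock<int>(_ =>
        {
            int now = Interlocked.Increment(ref running);
            // Lock-free "max": raise peak to now if now is larger.
            int seen;
            while (now > (seen = Volatile.Read(ref peak)) &&
                   Interlocked.CompareExchange(ref peak, now, seen) != seen) { }
            Thread.Sleep(50); // simulate work
            Interlocked.Decrement(ref running);
        }, options);

        for (int i = 0; i < 8; i++) block.Post(i);
        block.Complete();         // no more messages
        block.Completion.Wait();  // wait for the queue to drain
        return Volatile.Read(ref peak);
    }

    public static void Main()
    {
        Console.WriteLine("MaxDegreeOfParallelism 1 -> peak " + PeakConcurrency(1));
        Console.WriteLine("MaxDegreeOfParallelism 4 -> peak " + PeakConcurrency(4));
    }
}
```

With MaxDegreeOfParallelism set to 1 the peak is always 1; with 4 it can reach up to 4, depending on how many thread-pool threads the scheduler has available.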

Some concrete examples will demonstrate how a few of these blocks work.

Sample Overview

The full sample code appears below.

  using System;
  using System.Threading;
  using System.Threading.Tasks;
  using System.Threading.Tasks.Dataflow;
  
  namespace TPL.Test.DataFlows
  {
      class Program
      {
          static void Main(string[] args)
          {
              // Writes every string it receives to the console.
              var writeOut = new ActionBlock<string>(s => Console.WriteLine(s));
  
              // Offers each string to every linked target. Strings are immutable,
              // so the cloning function can return the same instance.
              var broadcast = new BroadcastBlock<string>(s => s);
  
              // Converts each string to upper case and passes it on.
              var transform = new TransformBlock<string, string>(s => s.ToUpper());
  
              // Holds DateTime values until the join consumes them.
              var buffer = new BufferBlock<DateTime>();
  
              // Pairs one string with one DateTime and emits a Tuple.
              var join = new JoinBlock<string, DateTime>();
  
              // Formats each pair and forwards it to writeOut.
              var joinWrite = new ActionBlock<Tuple<string, DateTime>>(t =>
                  writeOut.Post(t.Item1 + " at " + t.Item2.ToString()));
  
              // Wire up the network.
              broadcast.LinkTo(transform);
              broadcast.LinkTo(writeOut);
              transform.LinkTo(join.Target1);
              buffer.LinkTo(join.Target2);
              join.LinkTo(joinWrite);
  
              //Begin activating everything: post a DateTime every 2 seconds.
              Task.Factory.StartNew(() =>
              {
                  while (true)
                  {
                      Thread.Sleep(2000);
                      buffer.Post(DateTime.Now);
                  }
              }, TaskCreationOptions.LongRunning);
  
              // Post 15 strings, one per second.
              var itr = 0;
              while (itr < 15)
              {
                  broadcast.Post("New string " + Guid.NewGuid().ToString());
                  Thread.Sleep(1000);
                  ++itr;
              }
  
              Console.WriteLine("Execution complete, any key to continue...");
              Console.ReadKey();
          }
      }
  }




The graphic below depicts how messages flow through the sample code.

Figure 2: Sample Execution Flow

A string is posted to a BroadcastBlock. The BroadcastBlock offers the string to both a TransformBlock and an ActionBlock, and the ActionBlock prints the string.

A separate part of the application posts a DateTime to a BufferBlock, which flows to a JoinBlock. Meanwhile, the TransformBlock converts the string to upper case, and the result flows to the same JoinBlock. When data from both sides of the JoinBlock is present, the JoinBlock posts a Tuple-based message containing the string and the DateTime to a second ActionBlock. That ActionBlock concatenates the string and the DateTime and then posts the result to the first ActionBlock to be written to the Console.

The remainder of the article will share more specifics on each block. Before looking at specifics though, I'm going to share features common to all blocks.

Common Features

Aside from the TPL, a common set of interfaces underlies all blocks. A complete review of these interfaces is beyond the scope of this article; you'll find resources with more details at the end of the article.

The Post method sends a message to a Block. Messages are queued in the block and processed first in, first out. By default, messages are processed one at a time, but a Block can be configured to process groups of messages.
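A minimal sketch of Post and first-in, first-out processing, assuming the released System.Threading.Tasks.Dataflow library (the CTP may differ) and a hypothetical PostOrderDemo class. Complete and Completion are used only so the sketch can wait for the block's queue to drain.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks.Dataflow;

// Hypothetical illustration class; not part of TDF.
public static class PostOrderDemo
{
    // Posts 1..count to an ActionBlock and returns the order in which
    // the block's delegate saw the messages.
    public static List<int> ProcessInOrder(int count)
    {
        var seen = new List<int>();
        // Default options process one message at a time, so a plain List is safe here.
        var block = new ActionBlock<int>(n => seen.Add(n));

        for (int n = 1; n <= count; n++)
        {
            // Post returns false if the block declines the message.
            if (!block.Post(n))
                throw new InvalidOperationException("Message " + n + " was declined.");
        }

        block.Complete();        // signal that no more messages will be posted
        block.Completion.Wait(); // wait until every queued message has been processed
        return seen;
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", ProcessInOrder(5)));
    }
}
```

Running Main prints 1, 2, 3, 4, 5. In the released library, Post returns false once a block stops accepting messages, for example after Complete has been called.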

BufferBlock and ActionBlock are two of the more basic Blocks in the library.

BufferBlock and ActionBlock

As mentioned earlier, messages are queued to a collection inside the Block. A BufferBlock simply stores messages until a linked block or a consumer takes them. When a message is posted to an ActionBlock, though, the ActionBlock dequeues the message and passes it to the block's delegate. TDF also makes use of the Tuple class; in the sample, for example, the JoinBlock delivers its paired values as a Tuple<string, DateTime>.
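The difference between storing messages and acting on them can be sketched with a BufferBlock that nothing is linked to: the messages simply wait until code takes them out. This assumes the released library's Count, Receive, and TryReceive members (not necessarily present in the CTP), and BufferBlockDemo is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks.Dataflow;

// Hypothetical illustration class; not part of TDF.
public static class BufferBlockDemo
{
    public static List<string> Run()
    {
        var log = new List<string>();
        var buffer = new BufferBlock<string>();
        buffer.Post("first");
        buffer.Post("second");

        // Nothing is linked to the buffer, so both messages wait inside it.
        log.Add("Buffered: " + buffer.Count);

        // Receive removes the oldest message (first in, first out).
        log.Add(buffer.Receive());

        // TryReceive never blocks; it returns false once the buffer is empty.
        string item;
        while (buffer.TryReceive(out item))
            log.Add(item);

        log.Add("Buffered: " + buffer.Count);
        return log;
    }

    public static void Main()
    {
        foreach (var line in Run()) Console.WriteLine(line);
    }
}
```

Running Main prints "Buffered: 2", "first", "second", and "Buffered: 0", one per line.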

A TransformBlock is like an ActionBlock, but it's a little more interesting.

TransformBlock

TransformBlock behaves like an ActionBlock whose delegate returns a value. Like an ActionBlock, a TransformBlock dequeues a message and executes a delegate. Unlike an ActionBlock, a TransformBlock can be a "Source" of messages as well as a "Destination", which illustrates another way blocks can be "linked" together.
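A minimal sketch of a TransformBlock acting as both destination and source, linked to an ActionBlock. It assumes the released library, where DataflowLinkOptions.PropagateCompletion lets the downstream block finish once the TransformBlock does; TransformDemo and UpperCaseAll are hypothetical names. ToUpperInvariant is used instead of the sample's ToUpper to avoid culture-specific casing.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks.Dataflow;

// Hypothetical illustration class; not part of TDF.
public static class TransformDemo
{
    public static List<string> UpperCaseAll(params string[] inputs)
    {
        var results = new List<string>();

        // Destination for the input strings and source of the transformed strings.
        var transform = new TransformBlock<string, string>(s => s.ToUpperInvariant());
        var collect = new ActionBlock<string>(s => results.Add(s));

        // When transform completes, completion flows on to collect.
        transform.LinkTo(collect, new DataflowLinkOptions { PropagateCompletion = true });

        foreach (var s in inputs) transform.Post(s);
        transform.Complete();
        collect.Completion.Wait(); // all transformed strings have been collected
        return results;
    }

    public static void Main()
    {
        foreach (var s in UpperCaseAll("alpha", "beta")) Console.WriteLine(s);
    }
}
```

Running Main prints ALPHA and then BETA; by default a TransformBlock preserves the order of its input messages.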

Arguably, the most interesting block is the JoinBlock.

JoinBlock

JoinBlocks do exactly what you think they should do. Messages flowing from two or more other blocks are combined into a single message that is dispatched only when data is present from each of the linked blocks. At first glance a join may not appear to be necessary, but think for a moment about how complicated the code would be without one in a workflow-like scenario, where multiple parts of your application must complete before a subsequent portion can continue.
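The pairing behavior can be sketched with a JoinBlock<string, int>, assuming the released library's default (greedy) join and the hypothetical names JoinDemo and PairUp. Two strings arrive before any numbers, so no Tuple is produced until Target2 receives data.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks.Dataflow;

// Hypothetical illustration class; not part of TDF.
public static class JoinDemo
{
    public static List<string> PairUp()
    {
        var results = new List<string>();
        var join = new JoinBlock<string, int>();
        var format = new ActionBlock<Tuple<string, int>>(t => results.Add(t.Item1 + "=" + t.Item2));
        join.LinkTo(format, new DataflowLinkOptions { PropagateCompletion = true });

        // Only Target1 has data so far: no Tuple can be produced yet.
        join.Target1.Post("a");
        join.Target1.Post("b");

        // Each value on Target2 completes a pair with the oldest waiting string.
        join.Target2.Post(1);
        join.Target2.Post(2);

        join.Complete();          // completes both targets
        format.Completion.Wait(); // wait until every Tuple has been handled
        return results;
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", PairUp()));
    }
}
```

PairUp returns a=1 and b=2: each Tuple takes the oldest waiting value from each target.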

Conclusion

Task Parallel Library Dataflow (TDF) is on its way to becoming a new set of classes in the .NET Framework. Built on the Task Parallel Library introduced in .NET Framework 4.0, TDF adds messaging and builds on proven compositional patterns.

Resources

TPL Dataflow Site
Parallel Computing Development Center
Soma's Dataflow Announcement
.NET Framework 4.0 Task Parallel Library vs. the Concurrency and Coordination Runtime
Introduction to TPL Dataflow





About the Author

Jeffrey Juday

Jeff is a software developer specializing in enterprise application integration solutions utilizing BizTalk, SharePoint, WCF, WF, and SQL Server. Jeff has been developing software with Microsoft tools for more than 15 years in a variety of industries including: military, manufacturing, financial services, management consulting, and computer security. Jeff is a Microsoft BizTalk MVP. Jeff spends his spare time with his wife Sherrill and daughter Alexandra.
