Working with Parallel LINQ

If there’s one thing that’s been added to .NET in recent years that’s made working with data an absolute delight, it’s LINQ. Between LINQ to XML, LINQ to Objects, and all the various LINQ to “Thing” providers that an army of developers all over the planet has written, you simply can’t avoid it.

One thing that a large number of developers don’t know, however, is that LINQ has a trick up its sleeve from .NET 4 onwards for dealing with large amounts of data and multiple CPUs.

Rather than getting deep into writing complex code that checks for and adapts to multiple CPUs by using background workers and twisted multi-threading code, a great many LINQ operations can be made to use multiple CPUs simply by adding an ‘AsParallel’ extension call to the data source of the original query, as the following code shows. First, let’s look at a non-parallel version:

using System;
using System.Diagnostics;
using System.Linq;

namespace blogplinq
{
   class Program
   {
      // 50 million integers of test data
      private const int _dataSize = 50000000;
      static readonly int[] _testData =
         new int[_dataSize];

      // And a stopwatch to time what we do
      static readonly Stopwatch _stopwatch =
         new Stopwatch();

      static void Main()
      {
         // First let's fill our array with some
         // test data
         Random rand = new Random();
         for(int counter = 0; counter <
            _dataSize; counter++)
         {
            _testData[counter] = rand.Next(0, 50);
         }

         // Then start our stop watch
         _stopwatch.Start();

         // Now do a standard Linq and sum up all the
         // values in the array
         int total = (from n in _testData select n).Sum();

         // Now we stop our stop watch
         _stopwatch.Stop();

         // And show how long the operation took
         Console.WriteLine(
            "Time to sum {0} integers : {1}",
            _dataSize, _stopwatch.Elapsed);

      }
   }
}

If we run this, we get something that looks like the following output at the console.

Figure 1: Output of the preceding code

0.7609870 seconds is a long time in CPU cycles, even if it is still less than one second. To make the comparison fair, I ran the program four times and recorded the following times (in seconds):

.7609870
.7352695
.7402018
.7705166

Averaged out, that gives us approximately 0.7517437 seconds. Now, let’s change the code by adding an AsParallel to the main LINQ query in our example and see whether things improve.

Change the line that reads:

int total = (from n in _testData select n).Sum();

so that it now reads:

int total = (from n in _testData.AsParallel()
   select n).Sum();

and then try running the program again.

Figure 2: Second output of the preceding code

As you can see, the time taken has dropped slightly. As we did previously, let’s run it four times and average the results (again in seconds).

.6837836
.6833990
.6580767
.6600563

This run gives us an average of 0.6713289 seconds, which is roughly 11% faster than the non-parallel version.
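
Before trusting any timings, it's worth checking that the parallel query returns the same answer as the sequential one. The following is only a minimal sketch of that check, written in method syntax rather than query syntax. The class and method names (ParallelSumSketch, SequentialSum, ParallelSum) are made up for illustration, the data set is smaller than in the main example, and the sum is widened to a long purely as a precaution against overflow.

```csharp
using System;
using System.Linq;

public static class ParallelSumSketch
{
    // Sums the array on one thread using ordinary LINQ to Objects.
    public static long SequentialSum(int[] data)
    {
        return data.Sum(n => (long)n);
    }

    // Same sum, but AsParallel() lets PLINQ split the work across cores.
    public static long ParallelSum(int[] data)
    {
        return data.AsParallel().Sum(n => (long)n);
    }

    public static void Main()
    {
        // Fixed seed (an assumption, not in the original example) so that
        // repeated runs work on identical data.
        Random rand = new Random(42);
        int[] data = new int[1000000];
        for (int i = 0; i < data.Length; i++)
        {
            data[i] = rand.Next(0, 50);
        }

        long sequential = SequentialSum(data);
        long parallel = ParallelSum(data);

        Console.WriteLine("Sequential total : {0}", sequential);
        Console.WriteLine("Parallel total   : {0}", parallel);
        Console.WriteLine("Totals match     : {0}", sequential == parallel);
    }
}
```

Because addition of integers doesn't depend on the order the elements are visited in, the two totals should always agree, whichever way PLINQ decides to partition the array.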

Confession Time

Before we go any further, a bit of a confession. It turns out that testing AsParallel fairly is actually quite a hard task. If the data you’re working with fits in the CPU’s cache, you’ll see little or no performance difference between the parallel and non-parallel versions; the gap only opens up when you truly are dealing with huge amounts of data.

I actually had to reboot my machine and run the sample code a few times over the course of a couple of hours just to get a difference in the figures. It turns out that LINQ, by default, is exceptionally good at making sure the data it’s working with is in the best possible organization to work efficiently no matter what your hardware.
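
Part of the reason the gap is so small is that adding two integers is about the cheapest thing a CPU can do, so the cost of splitting up the work and merging the results eats much of the gain. A rough way to see AsParallel doing more is to give each element something expensive to do. The sketch below is an illustration under my own assumptions rather than a benchmark: the names HeavyWorkSketch, IsPrime and CountPrimes are invented for the example, and the deliberately naive primality test is there only to burn CPU time. I've left timings out because they'll vary from machine to machine.

```csharp
using System;
using System.Diagnostics;
using System.Linq;

public static class HeavyWorkSketch
{
    // Deliberately naive trial division: slow per element on purpose.
    public static bool IsPrime(int n)
    {
        if (n < 2) return false;
        for (int d = 2; (long)d * d <= n; d++)
        {
            if (n % d == 0) return false;
        }
        return true;
    }

    // Counts the primes in 0..limit-1, either sequentially or with PLINQ.
    public static int CountPrimes(int limit, bool parallel)
    {
        var numbers = Enumerable.Range(0, limit);
        return parallel
            ? numbers.AsParallel().Count(n => IsPrime(n))
            : numbers.Count(n => IsPrime(n));
    }

    public static void Main()
    {
        const int limit = 2000000;

        Stopwatch stopwatch = Stopwatch.StartNew();
        int sequentialCount = CountPrimes(limit, false);
        stopwatch.Stop();
        Console.WriteLine("Sequential: {0} primes in {1}",
            sequentialCount, stopwatch.Elapsed);

        stopwatch.Restart();
        int parallelCount = CountPrimes(limit, true);
        stopwatch.Stop();
        Console.WriteLine("Parallel  : {0} primes in {1}",
            parallelCount, stopwatch.Elapsed);
    }
}
```

Both calls should report the same number of primes. How far apart the two times end up depends on how many cores you have and what else the machine is doing.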

Using ‘AsParallel’ is, however, only the tip of the iceberg. There’s much more when you dig down, including thread-safe versions of stacks, queues, and other readymade collection objects (in the System.Collections.Concurrent namespace), all tailored to make appropriate use of multiple CPU hardware where possible.
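
To give a flavour of those collections, here's a small sketch that uses ConcurrentQueue together with Parallel.For, both of which ship with .NET 4 and later. The class and method names (ConcurrentQueueSketch, FillQueue) are hypothetical, and the work being queued is just a placeholder.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

public static class ConcurrentQueueSketch
{
    // Fills a queue from several threads at once. ConcurrentQueue handles
    // the synchronisation, so no explicit locks are needed here.
    public static ConcurrentQueue<int> FillQueue(int itemCount)
    {
        var queue = new ConcurrentQueue<int>();
        Parallel.For(0, itemCount, i => queue.Enqueue(i * i));
        return queue;
    }

    public static void Main()
    {
        ConcurrentQueue<int> queue = FillQueue(1000);
        Console.WriteLine("Items queued : {0}", queue.Count);

        // TryDequeue returns false instead of throwing if the queue is empty.
        int first;
        if (queue.TryDequeue(out first))
        {
            Console.WriteLine("First item out : {0}", first);
        }
    }
}
```

Every item ends up in the queue exactly once, but the order they arrive in isn't guaranteed, because the loop iterations run on whichever threads happen to be free.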

It’s certainly worth using PLINQ even if you are running on a machine that won’t benefit from it, because if you later move to hardware that does, it’ll automatically start making your life easier.

Something about .NET you’d like to see demonstrated? Or a burning question that you’d like to know the answer to? Drop me a comment below; there’s a good chance I’ll do a post about it. Until next time happy PLINQ’ing, shawty.
