A Beginner’s Guide to PLINQ

Language Integrated Query (LINQ) is a feature introduced in .NET Framework 3.5
that helps in querying various data sources such as in-memory objects, SQL
databases and XML. LINQ to Objects is the most commonly used flavor across
applications; it runs the query against an in-memory collection sequentially.
In .NET Framework 4.0, Parallel LINQ (PLINQ) was introduced as part of the
parallel programming features. It is a parallel implementation of querying on
collections: it takes care of dividing the collection into partitions and
processing each partition on a separate worker thread. Because those threads
can run on the different processors available on the machine, PLINQ can make
full use of a multi-core processor. In most scenarios PLINQ is expected to
improve performance noticeably, but in some cases a sequential query is the
better option. I will highlight some cases where PLINQ can slow things down
under the “Things to be Considered While Using PLINQ” section later in this
article.

Making LINQ Parallel

Below is a normal LINQ query, which processes the list of names sequentially.
You should also know that a LINQ query like this one executes in a deferred
way, i.e. the query actually runs only when the result set is enumerated or
when a function such as Count or ToList is run over it.

            var names = GetNames();
            var fiveLetterNames = from name in names
                                  where name.Length == 5
                                  select name;
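
To make the deferred execution concrete, here is a minimal sketch. The GetNames
body and the class and method names are my own stand-ins for illustration, not
part of any library. An item added to the list after the query is defined still
shows up in the result, because the query only runs when Count is called.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DeferredExecutionDemo
{
    // Hypothetical stand-in for the GetNames() helper used in this article.
    public static List<string> GetNames()
    {
        return new List<string> { "Alice", "Bob", "Carol" };
    }

    public static int CountFiveLetterNamesAfterAdd()
    {
        var names = GetNames();

        // Defining the query does not run it.
        var fiveLetterNames = from name in names
                              where name.Length == 5
                              select name;

        // Added after the query was defined...
        names.Add("David");

        // ...but Count() enumerates the query now, so "David" is included.
        return fiveLetterNames.Count();
    }

    public static void Main()
    {
        // Alice, Carol and David have five letters.
        Console.WriteLine(CountFiveLetterNamesAfterAdd()); // prints 3
    }
}
```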

To convert the above LINQ query to PLINQ, all that is needed is a call to the
extension method AsParallel on the source collection. Below is the converted
code.

            var names = GetNames();
            var fiveLetterNamesP = from name in names.AsParallel()
                                  where name.Length == 5
                                  select name;

Some Important Parallel LINQ Features

In this section let us discuss some important features that Parallel LINQ
offers.

1. Maintaining the order of the result set: Since PLINQ processes the data in
parallel by dividing it into partitions, by default the results are not
guaranteed to come back in the order of the source collection. You can force
PLINQ to preserve the source order with AsOrdered, but this costs performance
because the results from the different threads have to be buffered and merged
back in order. Below is the syntax to yield an ordered result set.

            var names = GetNames();
            var fiveLetterNamesP = from name in names.AsParallel().AsOrdered()
                                  where name.Length == 5
                                  select name;
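
As a small self-contained sketch of what AsOrdered guarantees (the class name,
method name and sample names are made up for illustration), the query below
returns the matching names in the order they appear in the source array. Note
that this is the source order, not an alphabetical sort.

```csharp
using System;
using System.Linq;

public static class OrderedQueryDemo
{
    // Returns the five-letter names, keeping the order of the source array.
    public static string[] FiveLetterNamesInSourceOrder(string[] names)
    {
        return names.AsParallel()
                    .AsOrdered()
                    .Where(name => name.Length == 5)
                    .ToArray();
    }

    public static void Main()
    {
        var names = new[] { "Zelda", "Bob", "Alice", "Megan", "Al", "Carol" };

        // prints: Zelda, Alice, Megan, Carol
        Console.WriteLine(string.Join(", ", FiveLetterNamesInSourceOrder(names)));
    }
}
```

Without AsOrdered the same four names come back, but their order may change
from run to run.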

2. Controlling the processor usage: By default PLINQ makes use of all the
processors available on the machine, up to a maximum of 64 in .NET Framework
4.0. If you want to limit the processor usage, say so that the PLINQ tasks run
on at most 3 processors, use the extension method WithDegreeOfParallelism.

var fiveLetterNamesP = from name in names.AsParallel().WithDegreeOfParallelism(3)
                       where name.Length == 5
                       select name;

3. Cancellation of the query tasks: The query tasks that are running on
different threads can be cancelled by passing a CancellationToken to the query
through the WithCancellation extension method and then cancelling the token's
source.
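
Here is a hedged sketch of how cancellation might be wired up. The class name,
the method name, the 100 ms delay and the artificially slow delegate are all
assumptions made for the demo. The token goes into the query through
WithCancellation, a second thread cancels it, and the caller observes an
OperationCanceledException.

```csharp
using System;
using System.Linq;
using System.Threading;

public static class CancellationDemo
{
    // Returns true if the query was cancelled before it could finish.
    public static bool RunAndCancel()
    {
        var cts = new CancellationTokenSource();
        var token = cts.Token;

        // Cancel from another thread shortly after the query starts.
        var canceller = new Thread(() =>
        {
            Thread.Sleep(100);
            cts.Cancel();
        });
        canceller.Start();

        try
        {
            Enumerable.Range(0, 100000)
                      .AsParallel()
                      .WithCancellation(token)
                      .Select(n =>
                      {
                          // Checking the token inside a slow delegate makes
                          // the query react to cancellation more quickly.
                          token.ThrowIfCancellationRequested();
                          Thread.Sleep(1); // simulate slow work
                          return n * 2;
                      })
                      .ToArray();
            return false; // finished without being cancelled
        }
        catch (OperationCanceledException)
        {
            return true;
        }
        finally
        {
            canceller.Join();
            cts.Dispose();
        }
    }

    public static void Main()
    {
        Console.WriteLine(RunAndCancel() ? "Query cancelled" : "Query completed");
    }
}
```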

4. Exception Handling in PLINQ: PLINQ uses the exception type
AggregateException, new in .NET Framework 4.0, to bundle all the exceptions
that occurred in a PLINQ query and send them to the caller.
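
The sketch below shows one way the caller might unpack an AggregateException.
The class name, the method name and the rule that an empty name is an error are
invented for the example. How many inner exceptions you get can vary between
runs, since other worker threads may stop once one of them has failed.

```csharp
using System;
using System.Linq;

public static class AggregateExceptionDemo
{
    // Runs a PLINQ query whose delegate throws for empty names and
    // returns the number of exceptions reported to the caller.
    public static int CountFailures(string[] names)
    {
        try
        {
            names.AsParallel()
                 .Select(name =>
                 {
                     if (name.Length == 0)
                         throw new ArgumentException("Empty name");
                     return name.ToUpperInvariant();
                 })
                 .ToArray(); // forces the query to execute
            return 0;
        }
        catch (AggregateException ex)
        {
            // Flatten() unwraps nested AggregateExceptions, if any.
            var inner = ex.Flatten().InnerExceptions;
            foreach (var e in inner)
                Console.WriteLine("{0}: {1}", e.GetType().Name, e.Message);
            return inner.Count;
        }
    }

    public static void Main()
    {
        Console.WriteLine(CountFailures(new[] { "Alice", "", "Bob" }));
    }
}
```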

Things to be Considered While Using PLINQ

1. The golden rule is to always measure the query performance on a machine that
resembles the deployment server, using a data set similar to production data.

2. Avoid using AsOrdered unless the order matters, because it may slow down the
query. It may even make PLINQ perform slower than sequential LINQ.

3. Do not use PLINQ on a single-core machine, as it would be slower than normal
LINQ most of the time.

4. PLINQ doesn’t perform well when the collection is small and the operation
delegate is lightweight. This is because PLINQ involves the extra overhead of
partitioning the data, creating new threads and merging the resultant data.
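
A rough way to check these points on your own machine is to time the same query
both ways with Stopwatch. The sketch below is only a starting point: the class
and method names, the collection sizes and the trivial predicate are arbitrary
choices of mine, and the first measurement also includes JIT warm-up, so run it
several times on realistic data before drawing conclusions.

```csharp
using System;
using System.Diagnostics;
using System.Linq;

public static class PlinqTimingDemo
{
    // Counts the multiples of three, either sequentially or with PLINQ.
    public static int CountMultiplesOfThree(int[] data, bool useParallel)
    {
        return useParallel
            ? data.AsParallel().Count(n => n % 3 == 0)
            : data.Count(n => n % 3 == 0);
    }

    // Measures how long a single call to the given action takes.
    public static TimeSpan Time(Action action)
    {
        var stopwatch = Stopwatch.StartNew();
        action();
        stopwatch.Stop();
        return stopwatch.Elapsed;
    }

    public static void Main()
    {
        var small = Enumerable.Range(0, 100).ToArray();
        var large = Enumerable.Range(0, 5000000).ToArray();

        foreach (var data in new[] { small, large })
        {
            var sequential = Time(() => CountMultiplesOfThree(data, false));
            var parallel = Time(() => CountMultiplesOfThree(data, true));

            Console.WriteLine("{0,9} items: LINQ {1:F2} ms, PLINQ {2:F2} ms",
                              data.Length,
                              sequential.TotalMilliseconds,
                              parallel.TotalMilliseconds);
        }
    }
}
```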

I hope this article provided a good introduction to the Parallel LINQ feature
of .NET Framework 4.0. I will dig deeper into the PLINQ features in future
articles.

Happy Reading!
