Language Integrated Query (LINQ) is a feature introduced in .NET Framework 3.5 that helps in querying various data sources such as in-memory objects, SQL databases and XML. LINQ to Objects is the most commonly used flavor across applications. It runs the query against an in-memory collection sequentially.
In .NET Framework 4.0, Parallel LINQ (PLINQ) was introduced as part of the parallel programming features. It is a parallel implementation of querying over collections. PLINQ takes care of dividing the collection into subsets and processing each subset on its own worker thread. It differs from ordinary hand-written threading in that it runs these tasks in parallel on the different processors available on the machine, so it makes good use of a multi-core processor. In most scenarios Parallel LINQ is expected to improve performance noticeably, but in some cases a sequential operation is the better option. I will highlight some cases where PLINQ can slow things down in the "Things to be Considered While Using PLINQ" section later in this article.
Making LINQ Parallel
Below is a normal LINQ query, which processes the list of names sequentially. Note that a LINQ query like this one uses deferred execution, i.e. the querying of the object actually happens only when the result set is enumerated or when an operator that needs the results (such as ToList or Count) is run over it.
var names = GetNames();
var fiveLetterNames = from name in names
                      where name.Length == 5
                      select name;
To run the same query in parallel, call the AsParallel extension method on the source collection:
var names = GetNames();
var fiveLetterNamesP = from name in names.AsParallel()
                       where name.Length == 5
                       select name;
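The two snippets above can be put into one small console program to see deferred execution and the parallel variant side by side. This is only a minimal sketch: GetNames is not defined in this article, so the version below is a hypothetical stand-in that returns a fixed list, and the class name NameQueries is likewise made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class NameQueries
{
    // Hypothetical stand-in for the article's GetNames() helper.
    public static List<string> GetNames()
    {
        return new List<string> { "Alice", "Bob", "Carol", "David", "Eve", "Frank" };
    }

    public static void Main()
    {
        var names = GetNames();

        // Sequential LINQ to Objects query. Nothing is executed yet.
        var fiveLetterNames = from name in names
                              where name.Length == 5
                              select name;

        // The same query through PLINQ. This is also only a query definition.
        var fiveLetterNamesP = from name in names.AsParallel()
                               where name.Length == 5
                               select name;

        // Execution happens here, when the results are materialized.
        List<string> sequential = fiveLetterNames.ToList();
        List<string> parallel = fiveLetterNamesP.ToList();

        // Prints "Sequential: Alice, Carol, David, Frank"
        Console.WriteLine("Sequential: " + string.Join(", ", sequential));

        // The parallel result holds the same four names,
        // but their order is not guaranteed.
        Console.WriteLine("Parallel count: " + parallel.Count);
    }
}
```

With a list this small the parallel version will not be faster; the point is only that the query shape stays the same and AsParallel is the single change.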
Some Important Parallel LINQ Features
This section covers some important features that Parallel LINQ offers.
1. Maintaining the order of the result set: Since PLINQ processes the data in parallel by dividing it into subsets, by default the result set is not guaranteed to keep the order of the source collection. You can force PLINQ to preserve that order, but this costs performance because the results from the different threads must be buffered and merged back in source order. Below is the syntax to get an ordered result set.
var names = GetNames();
var fiveLetterNamesP = from name in names.AsParallel().AsOrdered()
                       where name.Length == 5
                       select name;
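The effect of AsOrdered is easier to see on a numeric range than on a short list of names. The sketch below is an illustration only: the OrderingDemo class and its method names are hypothetical, and a range of integers is used in place of the article's name list.

```csharp
using System;
using System.Linq;

public static class OrderingDemo
{
    // Even numbers below 'count', with source order preserved.
    public static int[] EvenNumbersOrdered(int count)
    {
        return Enumerable.Range(0, count)
                         .AsParallel()
                         .AsOrdered()
                         .Where(n => n % 2 == 0)
                         .ToArray();
    }

    // Same filter without AsOrdered: same elements, order not guaranteed.
    public static int[] EvenNumbersUnordered(int count)
    {
        return Enumerable.Range(0, count)
                         .AsParallel()
                         .Where(n => n % 2 == 0)
                         .ToArray();
    }

    public static void Main()
    {
        int[] expected = Enumerable.Range(0, 1000).Where(n => n % 2 == 0).ToArray();
        int[] ordered = EvenNumbersOrdered(1000);
        int[] unordered = EvenNumbersUnordered(1000);

        // The ordered result matches the sequential result element by element.
        Console.WriteLine(ordered.SequenceEqual(expected));

        // The unordered result matches only after sorting it.
        Console.WriteLine(unordered.OrderBy(n => n).SequenceEqual(expected));
    }
}
```

The unordered query may well return the items in source order on a given run; the point is that only AsOrdered guarantees it.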
2. Controlling the processor usage: By default PLINQ makes use of all the processors available on the machine, up to a maximum of 64 in .NET Framework 4.0. If you want to limit this, say so that the PLINQ tasks run on at most 3 processors at a time, use the extension method WithDegreeOfParallelism.
var fiveLetterNamesP = from name in names.AsParallel().WithDegreeOfParallelism(3)
                       where name.Length == 5
                       select name;
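3. Cancellation of the query tasks: The query tasks that are running on different threads can be cancelled through a cancellation token. The token is passed to the query with the WithCancellation extension method, and a cancelled query throws an OperationCanceledException to the caller.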
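A minimal sketch of cancellation could look like the following. The CancellationDemo class, the RunWithCancellation method and the numeric range are hypothetical and chosen only for illustration. The token is cancelled before the query starts so that the outcome is predictable; in real code the cancel request would typically come from another thread while the query is running.

```csharp
using System;
using System.Linq;
using System.Threading;

public static class CancellationDemo
{
    // Runs a PLINQ query that observes the given token.
    // Returns true if the query was cancelled, false if it completed.
    public static bool RunWithCancellation(CancellationToken token)
    {
        try
        {
            int count = Enumerable.Range(0, 1000000)
                                  .AsParallel()
                                  .WithCancellation(token)
                                  .Where(n => n % 3 == 0)
                                  .Count();
            Console.WriteLine("Completed, matches: " + count);
            return false;
        }
        catch (OperationCanceledException)
        {
            // PLINQ reports cancellation with this exception type.
            Console.WriteLine("Query was cancelled.");
            return true;
        }
    }

    public static void Main()
    {
        using (var cts = new CancellationTokenSource())
        {
            // Cancel up front so the demo behaves the same on every run.
            cts.Cancel();
            RunWithCancellation(cts.Token);
        }
    }
}
```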
4. Exception Handling in PLINQ: PLINQ uses the AggregateException type, new in .NET Framework 4.0, to bundle the exceptions thrown while a PLINQ query runs and pass them to the caller as a single exception.
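The sketch below shows one way to catch and inspect such an exception. The ExceptionDemo class and the failing Divide delegate are hypothetical, made up for this illustration. Note that the number of inner exceptions is not fixed, because PLINQ stops the remaining work once a failure is detected.

```csharp
using System;
using System.Linq;

public static class ExceptionDemo
{
    // Hypothetical delegate that fails for a zero input.
    private static int Divide(int n)
    {
        if (n == 0)
        {
            throw new DivideByZeroException("Input was zero.");
        }
        return 100 / n;
    }

    // Returns true if the query failed with a DivideByZeroException
    // wrapped inside an AggregateException.
    public static bool FailsWithDivideByZero(int[] inputs)
    {
        try
        {
            // ToArray forces execution, so failures surface here.
            inputs.AsParallel().Select(n => Divide(n)).ToArray();
            return false;
        }
        catch (AggregateException ae)
        {
            // Flatten unwraps any nested AggregateException instances.
            var inner = ae.Flatten().InnerExceptions;
            foreach (Exception e in inner)
            {
                Console.WriteLine(e.GetType().Name + ": " + e.Message);
            }
            return inner.Any(e => e is DivideByZeroException);
        }
    }

    public static void Main()
    {
        Console.WriteLine(FailsWithDivideByZero(new[] { 1, 2, 0, 4 }));
    }
}
```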
Things to be Considered While Using PLINQ
1. The golden rule is to always compare the query performance on a machine that resembles the deployment server, using data similar to the production data.
2. Avoid using AsOrdered unless the order really matters, because it slows the query down. It may even make PLINQ slower than sequential LINQ.
3. Do not use PLINQ on a single-core machine, as it will usually be slower than normal LINQ.
4. PLINQ doesn't perform well when the collection is small and the operation delegate is lightweight. This is because PLINQ carries the extra overhead of partitioning the data, scheduling the work on multiple threads and merging the results.
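To follow the first rule, a quick timing harness is often enough to start with. The sketch below is only a rough illustration under stated assumptions: the PlinqBenchmark class, the IsPrime workload and the input size are hypothetical, and a single Stopwatch run like this includes JIT and warm-up costs, so treat the numbers as indicative rather than precise.

```csharp
using System;
using System.Diagnostics;
using System.Linq;

public static class PlinqBenchmark
{
    // Hypothetical CPU-bound delegate used as the workload.
    public static bool IsPrime(int n)
    {
        if (n < 2) return false;
        for (int i = 2; (long)i * i <= n; i++)
        {
            if (n % i == 0) return false;
        }
        return true;
    }

    public static int CountPrimesSequential(int[] data)
    {
        return data.Count(n => IsPrime(n));
    }

    public static int CountPrimesParallel(int[] data)
    {
        return data.AsParallel().Count(n => IsPrime(n));
    }

    public static void Main()
    {
        int[] data = Enumerable.Range(0, 2000000).ToArray();

        Stopwatch sw = Stopwatch.StartNew();
        int sequentialCount = CountPrimesSequential(data);
        sw.Stop();
        Console.WriteLine("Sequential: " + sequentialCount + " primes in " + sw.ElapsedMilliseconds + " ms");

        sw = Stopwatch.StartNew();
        int parallelCount = CountPrimesParallel(data);
        sw.Stop();
        Console.WriteLine("Parallel:   " + parallelCount + " primes in " + sw.ElapsedMilliseconds + " ms");
    }
}
```

Shrinking the input to a few hundred items, or replacing IsPrime with a trivial check, is a simple way to observe the overhead described in point 4.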
I hope this article provided a good introduction to the Parallel LINQ feature of .NET Framework 4.0. I will dig deeper into the PLINQ features in future articles.