Introduction to LINQ, Part 1: LINQ to Objects

Perhaps the most important new feature in the next version of Visual Studio, for now code-named ‘Orcas,’ is LINQ, which stands for Language INtegrated Query. At its core, LINQ is a set of operators, called standard query operators, that allow the querying of any .NET sequence. LINQ comes in three sub-sets:

  • LINQ to ADO.NET, which includes the following:
    • LINQ to Entities, for querying EDM entities
    • LINQ to DataSet, for querying objects in the DataSet family
    • LINQ to SQL, for querying relational databases
  • LINQ to XML, for querying XML data
  • LINQ to Objects, for querying any sequence of objects

In a series of three articles, I will introduce you to LINQ to Objects, LINQ to XML, and LINQ to SQL. This first article covers LINQ to Objects.

Why LINQ?

You may wonder whether LINQ was necessary. Is it actually useful? A very large number of applications deal with data sources, and the two most common sources are XML files and relational databases. In an application that works with a relational database, you actually deal with two languages: the first is the language the application is written in (C#, C++, Java, and so forth), and the second is SQL. The SQL commands, however, are always written as strings, which means they lack two things:

  • Compile time support: You cannot know until run time whether the string is correct because the compiler treats it as a simple string.
  • Design time support: No tool offers IntelliSense support for these command strings.

LINQ addresses both these issues. Because the queries are integrated into the language, they are verified during compilation and you have IntelliSense support for them.

On the other hand, in procedural languages such as C#, C++, and Java, you have to specify not only “what” to do, but also “how” to do it. Often, that means a lot of code. If you could specify only “what” you want to do, and have the compiler or other tools decide how to do it, your work would be simpler and your productivity higher. That’s where LINQ steps in: you don’t specify how a query is executed, only what you want to query.

Short Overview of C# 3.0 Features

LINQ is actually based on several features introduced in C# 2.0 and, especially, C# 3.0. These include:

  • Lambda expressions: Express the implementation of a method and the instantiation of a delegate from that method; they have the form:
    c => c + 1

    which denotes a function that takes one argument and returns the value of that argument incremented by one

  • Extension methods: Extend classes that you cannot modify. They are defined as static methods of a separate static class and take at least one parameter; the first parameter has the type of the class being extended and is preceded by the keyword this.
  • Initialization of objects and collections in expression context: Initialization previously could be done only in a statement context. Now, it can be done in an expression context, such as when passing an argument to a function.
  • Local variable type inference: The contextual keyword var is used as a placeholder for the type, which is inferred at compile time from the initializer expression on the right side; it can be used only for local variables. Classes cannot have members declared with var, and functions cannot return var.
  • Anonymous types: The contextual keyword var is also what lets you hold objects of types that are not explicitly defined but are created by the compiler, for example from an expression such as new { Name = "AFC Ajax", Year = 1995 }. These anonymous types are limited to local scope.
  • Lazy evaluation: C# 2.0 introduced the contextual keyword yield. A yield return statement inside an iterator method (typically within a loop) hands back one element at a time and delays the iteration of the source until the result itself is iterated. To understand what that means, I suggest reading more about it in my blog. Lazy evaluation is essential for the performance of LINQ queries because it avoids generating unnecessary intermediate results by delaying execution until the last possible moment, when all information about what is wanted is known.
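The deferred execution described in the last item can be seen in a minimal sketch. The class LazyDemo, its Log list, and the methods Numbers and Run are hypothetical names invented for this illustration; they are not part of LINQ or of the sample code in this article.

```csharp
using System;
using System.Collections.Generic;

static class LazyDemo
{
   // records what happens, so the order of execution can be inspected
   public static readonly List<string> Log = new List<string>();

   // iterator method: its body does not run when Numbers() is called,
   // only when the returned sequence is enumerated
   public static IEnumerable<int> Numbers()
   {
      for (int i = 1; i <= 3; i++)
      {
         Log.Add("producing " + i);
         yield return i;
      }
   }

   public static void Run()
   {
      IEnumerable<int> numbers = Numbers();   // nothing is produced yet
      Log.Add("sequence created");

      foreach (int n in numbers)
      {
         Log.Add("consuming " + n);
         if (n == 2) break;                   // the third value is never produced
      }

      foreach (string line in Log)
      {
         Console.WriteLine(line);
      }
   }
}
```

Calling LazyDemo.Run() from a Main method should print "sequence created" first, followed by alternating "producing" and "consuming" lines for 1 and 2 only. The value 3 is never computed, because the consumer stopped asking for elements.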

I have only enumerated these new features. If they are unknown to you, I suggest additional readings, such as this Preview of What’s New in C# 3.0 by Sahil Malik.
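For a quick look before following up on that reading, here is a small sketch that puts the syntax features from the list side by side. The types Point, StringExtensions, and FeaturesDemo, and the method Shout, are hypothetical and exist only for this illustration.

```csharp
using System;
using System.Collections.Generic;

// hypothetical type used only to demonstrate object initializers
class Point
{
   public int X { get; set; }
   public int Y { get; set; }
}

// extension methods must be declared in a static class
static class StringExtensions
{
   // "this string" makes the method callable as s.Shout()
   public static string Shout(this string s)
   {
      return s.ToUpperInvariant() + "!";
   }
}

static class FeaturesDemo
{
   public static string Run()
   {
      // lambda expression converted to a delegate
      Func<int, int> increment = c => c + 1;

      // object and collection initializers
      Point p = new Point { X = 1, Y = 2 };
      List<int> numbers = new List<int> { 1, 2, 3 };

      // local variable type inference: text is a string
      var text = "goal".Shout();

      // anonymous type: the compiler generates a class with Sum and Count
      var summary = new { Sum = p.X + p.Y, Count = numbers.Count };

      return string.Format("{0} {1} {2} {3}",
         increment(41), text, summary.Sum, summary.Count);
   }
}
```

Under these assumptions, FeaturesDemo.Run() returns the string "42 GOAL! 3 3". Note that var is required for summary, because an anonymous type has no name you could write in its place.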

Classic Approach

To understand how helpful LINQ can be, I will start with a problem approached in the classic procedural way in C#. Suppose you have a list of UEFA Champions League winners and you want to print them on a console. They should be grouped by the country they represent, with the groups ordered descending by the number of winners from each country and, where that number is equal, alphabetically by the name of the country.

Naturally, I would start by defining a class Winner, and a list of winners:

/// <summary>
/// Encapsulates information about a UEFA Champions League winner
/// </summary>
class Winner
{
   string _name;
   string _country;
   int _year;

   public string Name
   {
      get { return _name; }
      set { _name = value; }
   }

   public string Country
   {
      get { return _country; }
      set { _country = value; }
   }

   public int Year
   {
      get { return _year; }
      set { _year = value; }
   }

   public Winner(string name, string country, int year)
   {
      _name = name;
      _country = country;
      _year = year;
   }
}

/// <summary>
/// utility class with a single method returning the sequence of
/// winners
/// </summary>
class UCL
{
   /// <summary>
   /// returns a sequence of all UCL winners
   /// </summary>
   /// <returns>a sequence of Winner objects</returns>
   public static IEnumerable<Winner> GetWinners()
   {
      Winner[] winners =  {
         new Winner("Barcelona", "Spain", 2006),
         new Winner("Liverpool", "England", 2005),
         new Winner("FC Porto", "Portugal", 2004),
         new Winner("AC Milan", "Italy", 2003),
         new Winner("Real Madrid", "Spain", 2002),
         new Winner("Bayern Munchen", "Germany", 2001),
         new Winner("Real Madrid", "Spain", 2000),
         new Winner("Manchester Utd.", "England", 1999),
         new Winner("Real Madrid", "Spain", 1998),
         new Winner("Borussia Dortmund", "Germany", 1997),
         new Winner("Juventus", "Italy", 1996),
         new Winner("AFC Ajax", "Netherlands", 1995),
         new Winner("AC Milan", "Italy", 1994),
         new Winner("Olympique de Marseille", "France", 1993)
      };

      return winners;
   }
}

To list the countries according to the specified criteria, I would start by creating a dictionary with the country name as the key and a list of Winners as the value. After filling in this dictionary, I would have to sort the entries descending by the number of winners from each country and, for equal counts, alphabetically by country name. To do that, I would use a list with elements of the same type as the dictionary’s entries and sort it with a custom comparison.

/// <summary>
/// orders the country descending by the number of titles won
/// </summary>
/// <param name="g1">first element to compare</param>
/// <param name="g2">second element to compare</param>
/// <returns>a negative value, zero, or a positive value</returns>
private static int
   CompareCoutryGroups(KeyValuePair<string, List<Winner>> g1,
                       KeyValuePair<string, List<Winner>> g2)
{
   if (g1.Value.Count > g2.Value.Count) return -1;
   else if (g1.Value.Count == g2.Value.Count)
   {
      return g1.Key.CompareTo(g2.Key);
   }
   return 1;
}

/// <summary>
/// prints the list of UCL winners grouped by country,
/// descending by the number of titles won by teams from that
/// country, and in case of the same number of titles,
/// alphabetically by the country's name
/// </summary>
/// <param name="winners">sequence of Winner</param>
public void ListByCountriesClassic(IEnumerable<Winner> winners)
{
   // a dictionary is used to group the teams by the country
   // key is the country name
   // value is a list of winners from that country
   Dictionary<string, List<Winner>> dict =
      new Dictionary<string, List<Winner>>();

   // populate the dictionary with winners
   foreach (Winner w in winners)
   {
      List<Winner> list;
      if (!dict.TryGetValue(w.Country, out list))
      {
         // first winner from this country: create its list
         list = new List<Winner>();
         dict.Add(w.Country, list);
      }
      list.Add(w);
   }

   // copy the key-value pairs from the dictionary into a list;
   // the list is needed to order the country groups by the
   // number of winners
   List<KeyValuePair<string, List<Winner>>> orderedlist =
        new List<KeyValuePair<string, List<Winner>>>();

   // populate the list
   foreach (KeyValuePair<string, List<Winner>> group in dict)
   {
      orderedlist.Add(group);
   }

   // sort the list by the specified criteria
   orderedlist.Sort(CompareCoutryGroups);

   // print the list
   foreach (KeyValuePair<string, List<Winner>> item in orderedlist)
   {
      Console.WriteLine("{0}: {1}", item.Key, item.Value.Count);

      foreach (Winner w in item.Value)
      {
         Console.WriteLine("{0}\t{1}", w.Year, w.Name);
      }
   }
}

For sorting, I defined a function called CompareCoutryGroups that takes two objects of type KeyValuePair<string, List<Winner>> and returns a negative value, zero, or a positive value according to the required criteria.

Calling the method is straightforward:

ObjectsDemo p = new ObjectsDemo();

IEnumerable<Winner> winners = UCL.GetWinners();

p.ListByCountriesClassic(winners);

and the output is:

Spain: 4
2006    Barcelona
2002    Real Madrid
2000    Real Madrid
1998    Real Madrid
Italy: 3
2003    AC Milan
1996    Juventus
1994    AC Milan
England: 2
2005    Liverpool
1999    Manchester Utd.
Germany: 2
2001    Bayern Munchen
1997    Borussia Dortmund
France: 1
1993    Olympique de Marseille
Netherlands: 1
1995    AFC Ajax
Portugal: 1
2004    FC Porto

You can see that Spain is first with 4 wins, followed by Italy with 3, and then England and Germany, both with 2 wins and therefore sorted alphabetically in ascending order. The same applies to France, Netherlands, and Portugal, each with 1 win.
