An Introduction to LINQ – Part 1

By Thiru Thangarathinam

Most programmers today are required to integrate some sort of data into their applications. Often, you have to take data from multiple sources such as in-memory collections, relational databases, and XML files. With the current implementation of the .NET Framework, getting to this data is often tedious and requires familiarity with multiple data access technologies and XML APIs. To make matters worse, each data source has its own means of querying the data in it: SQL for databases, XQuery for XML, LDAP queries for Active Directory, and so on. In short, today’s data access story lacks a unified approach to accessing data from disparate data sources, which is exactly the problem the LINQ (Language INtegrated Query) family of technologies is intended to solve. This series of articles covers LINQ from the basics through to performing data access with it. The first installment focuses on the basics of LINQ, introducing the new features of C# 3.0 and showing how they can be used in conjunction with LINQ.


Introduction to LINQ

The official goal of the LINQ family of technologies is to add "general purpose query facilities to the .NET Framework that apply to all sources of information, not just relational or XML data". One of the nice things about LINQ is that it integrates seamlessly with existing .NET languages such as C# and VB.NET, because the underlying LINQ API is simply a set of .NET classes that operate like any other .NET class. In addition, the query functionality is not restricted to SQL or XML data; you can apply LINQ to query any class as long as that class implements the IEnumerable<T> interface.

As mentioned before, LINQ is a family of technologies that provides querying features and class patterns, and that family includes DLinq and XLinq. DLinq (also referred to as LINQ to SQL) is the version of LINQ that focuses on querying data from relational data sources. XLinq (LINQ to XML) is the aspect of LINQ that is geared towards querying XML data. To run the code samples supplied with this article, you need Visual Studio 2005 Professional RTM as well as the LINQ May 2006 Community Technology Preview.


Simple LINQ Example

To start with, create a new Visual C# LINQ Windows Application project using Visual Studio 2005. Notice that Visual Studio automatically adds the following LINQ namespaces.


using System.Query;
using System.Data.DLinq;
using System.Xml.XLinq;

Of the above namespaces, System.Query is the core namespace that exposes the standard LINQ query operators.

As a start, open the form and add logic that queries an array of strings using LINQ. To this end, add a command button to the form and modify its Click event handler to look like the following:


private void btnLoopThroughStrings_Click(object sender, EventArgs e)
{
    string[] names = {"John", "Peter", "Joe", "Patrick", "Donald", "Eric"};

    IEnumerable<string> namesWithFiveCharacters =
                                from name in names
                                where name.Length < 5
                                select name;
    lstResults.Items.Clear();
    foreach(var name in namesWithFiveCharacters)
        lstResults.Items.Add(name);
}

Note that the above code assumes that there is a list box named lstResults on the form, which displays the results of the query. If you compile and run this program, the list box shows the names that are shorter than five characters: John, Joe, and Eric.

Let us walk through the code line by line.

First, you declare a string array called names.

string[] names = {"John", "Peter", "Joe", "Patrick", "Donald", "Eric"};

Then you query the names array using the LINQ syntax. As shown below, the new LINQ syntax enables you to query any type of data source in a way that is very similar to querying a table in T-SQL.


IEnumerable<string> namesWithFiveCharacters =
                            from name in names
                            where name.Length < 5
                            select name;

The above syntax is similar to a T-SQL statement in that it allows you to query the names array using the from..where..select syntax. The main difference from T-SQL is the order of the clauses: "from name in names" comes at the beginning and "select name" appears at the end.

Essentially, you retrieve data from an array of names using the "where" clause as a filtering mechanism. It is important to note that you need a way to refer to the objects inside the collection so that you can use standard query operators like where with those objects. In this case, "name" is used to refer to each object inside the names array. Now that you have seen the basics of LINQ, let us extend our scope to query collections that contain objects.
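The same filter can also be written as a direct call to the Where query operator. The following is a minimal console sketch rather than the article's Windows Forms sample. It assumes the released System.Linq namespace (the successor to the CTP's System.Query), and the class name QuerySyntaxDemo and its method names are hypothetical. The "=>" lambda syntax it uses is covered later in this article.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical demo class; not part of the article's sample project.
public static class QuerySyntaxDemo
{
    static readonly string[] Names = { "John", "Peter", "Joe", "Patrick", "Donald", "Eric" };

    // Query-expression form, as used in the article.
    public static List<string> WithQuerySyntax()
    {
        IEnumerable<string> query =
            from name in Names
            where name.Length < 5
            select name;
        return query.ToList();
    }

    // Equivalent method-call form: the where clause becomes a call to Where.
    public static List<string> WithMethodSyntax()
    {
        return Names.Where(name => name.Length < 5).ToList();
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", WithQuerySyntax()));   // John, Joe, Eric
        Console.WriteLine(string.Join(", ", WithMethodSyntax()));  // John, Joe, Eric
    }
}
```

Both forms produce the same sequence; which one to use is largely a matter of readability.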


An In-Depth look at LINQ

In this section, let us take a closer look at LINQ. To really understand LINQ, you also need to understand the new language features of C# 3.0. Specifically, it is beneficial to discuss LINQ in the context of the following features of C# 3.0:

  • Type Inference
  • Lambda Expressions
  • Extension Methods
  • Anonymous Types

The next few sections will discuss the above features in the context of leveraging them using LINQ. To get started with LINQ, all you need to do is to import the System.Query namespace. This namespace contains a number of classes that enable you to accomplish a lot with LINQ.


Type Inference

To understand type inference, let us look at a couple of lines of code.


var count = 1;
var output = "This is a string";
var employees = new EmployeesCollection();

In the above lines of code, the compiler sees the var keyword, looks at the value assigned to count, and determines that count should be an Int32. When it sees that you assign a string to the output variable, it determines that output should be of type System.String. The same goes for the employees collection object. As you would have guessed by now, var is a new keyword introduced in C# 3.0 that has a special meaning: it signals the compiler that you are using the new Local Variable Type Inference feature in C# 3.0.

As an example, let us modify our string query example to use the var keyword.


private void btnLoopThroughStrings_Click(object sender, EventArgs e)
{
    string[] names = {"John", "Peter", "Joe", "Patrick", "Donald", "Eric"};

    var namesWithFiveCharacters =
                            from name in names
                            where name.Length < 5
                            select name;
    lstResults.Items.Clear();
    foreach(var name in namesWithFiveCharacters)
        lstResults.Items.Add(name);
}

As the above code shows, the variable namesWithFiveCharacters is now declared with the var keyword instead of IEnumerable<string>. Using var is convenient because it tells the compiler to infer the type from the assignment, so the declaration does not have to change if the query's result type changes. In this case, the query produces an IEnumerable<string>, so the compiler treats the variable as being of type IEnumerable<string>.

If you run the code, it still produces the same output.
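To make it concrete that var is resolved at compile time and is not a loosely typed variant, here is a small console sketch. It assumes the released System.Linq namespace, and the class name VarDemo is hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical demo class illustrating local variable type inference.
public static class VarDemo
{
    public static string DescribeTypes()
    {
        var count = 1;                      // inferred as int (System.Int32)
        var output = "This is a string";    // inferred as string (System.String)
        var lengths = new[] { "John", "Joe" }.Select(n => n.Length);

        // count = "two";   // would not compile: count is an int, not a variant

        // This assignment compiles only because the inferred type of lengths
        // is IEnumerable<int>.
        IEnumerable<int> typed = lengths;

        return count.GetType().Name + ", " + output.GetType().Name + ", " + typed.Sum();
    }

    public static void Main()
    {
        Console.WriteLine(DescribeTypes());   // Int32, String, 7
    }
}
```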


Lambda Expressions

C# 2.0 introduced a new feature, anonymous methods, that allows you to declare your method code inline instead of in a separate named method. Lambda expressions, a new feature in C# 3.0, offer a more concise syntax to achieve the same goal. Take a closer look at anonymous methods before discussing lambda expressions. Suppose you want to create a button that displays a message box when you click it. In C# 2.0, you would do it as follows:


public SimpleForm()
{
    addButton = new Button(...);
    addButton.Click += delegate
    {
        MessageBox.Show("Button clicked");
    };
}

As the above code shows, you can use anonymous methods to declare the function logic inline. However, C# 3.0 introduces an even simpler syntax, lambda expressions, which you write as a parameter list followed by the "=>" token, followed by an expression or a statement block. Lambda expressions are the natural evolution of C# 2.0’s anonymous methods. Essentially, a lambda expression is a convenient syntax for assigning a chunk of code (the anonymous method) to a variable (the delegate). As an example,

employee => employee.StartsWith("D");

In this case, the delegates used in the above query are defined in the System.Query namespace as such:


public delegate T Func<T>();
public delegate T Func<A0, T>(A0 arg0);

So this code snippet could be written as:


Func<string, bool> person = delegate (string s) {
                        return s.StartsWith("D"); };

As you can see, the lambda expression is a lot more compact than the anonymous method above. Lambda expressions are basically a compact version of anonymous methods, and you can use either of them, or even regular named methods, when creating filters for the query operators. Lambda expressions, though, have the additional benefit that they can be compiled either to IL or to an expression tree, depending on how they are used.
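The three ways of supplying a filter can be compared side by side. This is a minimal sketch, assuming the released System.Linq namespace and its built-in Func<T, TResult> delegate; the class name FilterDemo and the sample employee names are hypothetical.

```csharp
using System;
using System.Linq;

// Hypothetical demo class: one filter written three different ways.
public static class FilterDemo
{
    static readonly string[] Employees = { "Dave", "Don", "Joe", "Bill" };

    // 1. A regular named method.
    static bool StartsWithD(string s)
    {
        return s.StartsWith("D");
    }

    public static string[] UsingNamedMethod()
    {
        return Employees.Where(StartsWithD).ToArray();
    }

    // 2. A C# 2.0 anonymous method assigned to a delegate.
    public static string[] UsingAnonymousMethod()
    {
        Func<string, bool> filter = delegate (string s) { return s.StartsWith("D"); };
        return Employees.Where(filter).ToArray();
    }

    // 3. A C# 3.0 lambda expression.
    public static string[] UsingLambda()
    {
        return Employees.Where(employee => employee.StartsWith("D")).ToArray();
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", UsingNamedMethod()));      // Dave, Don
        Console.WriteLine(string.Join(", ", UsingAnonymousMethod()));  // Dave, Don
        Console.WriteLine(string.Join(", ", UsingLambda()));           // Dave, Don
    }
}
```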

Note that the parameters of a lambda expression can be explicitly or implicitly typed. In an explicitly typed parameter list, the type of each parameter is explicitly specified. In an implicitly typed parameter list, the types are inferred from the context in which the lambda expression occurs:


(int count) => count + 1    //explicitly typed parameter
(y, z) => y * z             //implicitly typed parameters


Extension Methods

The where clause used in the earlier queries is translated by the compiler into a call to a query operator method named Where, which can also be invoked directly with the dot notation. You might wonder where these methods come from. These methods, which reside in the System.Query.Sequence class, are made possible by a new feature in C# 3.0 called Extension Methods.
An extension method is a new way of extending existing types. Basically, it works by adding a "this" modifier to the first parameter of a static method. For example, the Sequence class has the Where operator defined as follows:


public static IEnumerable<T> Where<T>(
    this IEnumerable<T> source, Func<T, bool> predicate) {

        foreach (T element in source) {
            if (predicate(element)) yield return element;
        }
}

Note the use of the "this" modifier on the first parameter. The compiler sees this and treats the method as if it were a new method on the specified type, so IEnumerable<T> now appears to have a Where() method. Here it is important to remember that methods defined on the object itself get first priority. For example, if you call Where() on an object, the compiler first looks for Where() on the object's own type. Only if no matching method is present does it look for an extension method that matches the call. While this feature is powerful, extension methods should be used sparingly.
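The priority rule can be seen in a small sketch. The Greeter class, the GreeterExtensions class, and their methods are hypothetical names invented for this illustration.

```csharp
using System;

// Hypothetical type with its own instance method.
public class Greeter
{
    public string Describe() { return "instance method"; }
}

public static class GreeterExtensions
{
    // Same name and signature as the instance method, so g.Describe()
    // never binds to this extension method.
    public static string Describe(this Greeter g) { return "extension method"; }

    // Greeter has no instance method named Shout, so g.Shout() binds here.
    public static string Shout(this Greeter g) { return g.Describe().ToUpper(); }
}

public static class PrecedenceDemo
{
    public static void Main()
    {
        Greeter g = new Greeter();
        Console.WriteLine(g.Describe());                   // instance method
        Console.WriteLine(g.Shout());                      // INSTANCE METHOD
        // The extension can still be reached as an ordinary static method.
        Console.WriteLine(GreeterExtensions.Describe(g));  // extension method
    }
}
```

Because an instance method silently wins over an extension method with the same signature, an extension can stop being called if the extended type later gains such a method, which is one reason for using the feature sparingly.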

Note that C# 3.0 thereby makes it possible to add methods to existing classes that are defined in other assemblies. All extension methods must be declared static, and you can declare them only in static classes. To declare an extension method, you put the keyword "this" in front of the first parameter of the method, for example:


public static class StringExtension
{
    public static void Echo(this string s)
    {
        Console.WriteLine("Supplied string : " + s);
    }
}

The above code shows the extension method named Echo declared in the StringExtension class. Now you can invoke the Echo method on a string as if it were an instance method. The string is passed as the first parameter of the method.


string s = "Hello world";
s.Echo();

Based on the above code, here are the key characteristics of extension methods.

  1. Extension methods have the keyword this before the first parameter.
  2. When extension methods are invoked with the instance syntax, the argument that was declared with the keyword this is not passed explicitly. In the above code, note the invocation of the Echo() method without any arguments.
  3. Extension methods can be defined only in a static class.
  4. With the instance syntax, extension methods can be called only on instances. Trying to call them on a class (for example, string.Echo()) results in a compilation error. The instances on which they can be called are determined by the type of the first parameter in the declaration, the one having the keyword this. They remain ordinary static methods, so StringExtension.Echo(s) is also valid.


Using Collections in LINQ

Now that you have seen the basics of LINQ and C# 3.0, let us look at a slightly more interesting example. First, let us define a new class named Person:

public class Person
{
    private string _firstName;
    private string _lastName;
    private string _address;

    public Person(){}

    public string FirstName
    {
        get { return _firstName; }
        set { _firstName = value; }
    }

    public string LastName
    {
        get { return _lastName; }
        set { _lastName = value; }
    }

    public string Address
    {
        get { return _address; }
        set { _address = value; }
    }
}

Now let us create a Person collection and query it using LINQ. You add a button to the form, name it btnLoopThroughObjects and modify its Click event to look like the following:

private void btnLoopThroughObjects_Click(object sender, EventArgs e)
{
    List<Person> persons = new List<Person>
        {new Person{FirstName = "Joe", LastName = "Adams", Address = "Chandler"},
        new Person{FirstName = "Don", LastName = "Alexander", Address = "Washington"},
        new Person{FirstName = "Dave", LastName = "Ashton", Address = "Seattle"},
        new Person{FirstName = "Bill", LastName = "Pierce", Address = "Sacramento"},
        new Person{FirstName = "Bill", LastName = "Giard", Address = "Camphill"}};

    var personsNotInSeattle = from person in persons
                              where person.Address != "Seattle"
                              orderby person.FirstName
                              select person;

    lstResults.Items.Clear();
    foreach (var person in personsNotInSeattle)
    {
        lstResults.Items.Add(person.FirstName + " " + person.LastName +
            " - " + person.Address);
    }
}

The above code shows off a few cool features. The first is the new C# 3.0 support for creating class instances, and then using a terser syntax for setting properties on them:


new Person{FirstName = "Dave", LastName = "Ashton", Address = "Seattle"}

This is very useful when instantiating and adding objects within a collection as above (or when creating an anonymous type). Note that this example uses a generic List collection of type Person. LINQ supports executing queries against any IEnumerable<T> collection, so it can be used against the generic collections you already have; a non-generic collection first has to be exposed as an IEnumerable<T>.

After creating the collection, you query it to keep only the persons who are not in Seattle and order the results by first name, using the query below.


var personsNotInSeattle = from person in persons
                          where person.Address != "Seattle"
                          orderby person.FirstName
                          select person;

The concept in this example is rather simple. The query examines all persons using a single from clause. If the address of a person is not equal to Seattle, the query includes that person in the resulting sequence, which is sorted by first name. When you run the code, the list box shows the two persons named Bill first, followed by Don and Joe; Dave, who lives in Seattle, is left out.
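The where and orderby clauses of this query correspond to the Where and OrderBy query operators. The following console sketch shows that correspondence under a few assumptions: it uses the released System.Linq namespace, it condenses Person with auto-implemented properties, and the class name PersonQueryDemo is hypothetical. The final Select only formats each person as text, standing in for the list box loop.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Condensed stand-in for the article's Person class.
public class Person
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public string Address { get; set; }
}

// Hypothetical demo class.
public static class PersonQueryDemo
{
    public static List<Person> SamplePersons()
    {
        return new List<Person>
        {
            new Person { FirstName = "Joe", LastName = "Adams", Address = "Chandler" },
            new Person { FirstName = "Don", LastName = "Alexander", Address = "Washington" },
            new Person { FirstName = "Dave", LastName = "Ashton", Address = "Seattle" },
            new Person { FirstName = "Bill", LastName = "Pierce", Address = "Sacramento" },
            new Person { FirstName = "Bill", LastName = "Giard", Address = "Camphill" }
        };
    }

    public static List<string> NotInSeattle()
    {
        return SamplePersons()
            .Where(person => person.Address != "Seattle")   // where clause
            .OrderBy(person => person.FirstName)            // orderby clause
            .Select(person => person.FirstName + " " + person.LastName +
                              " - " + person.Address)       // formatting only
            .ToList();
    }

    public static void Main()
    {
        foreach (string line in NotInSeattle())
            Console.WriteLine(line);
    }
}
```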


Anonymous Types

In the previous section, the output from the query is a sequence of persons. In the query, you specify that you only want those persons that are not in Seattle. In that case, the query returned Person objects, each containing FirstName, LastName, and Address properties.

Now suppose you want all the persons that meet the criteria, but only with the FirstName and Address properties. This means that you need to be able to create a new class with these two properties on the fly. This is exactly what anonymous types in C# 3.0 allow you to accomplish. Although these types are called anonymous types, the compiler does assign a name to them; that name is just not available to your code.

For example, the below snippet of code shows the Click event handler of a command button, in which the LINQ query returns a sequence of a new type.


private void btnLoopThroughAnonymous_Click(object sender, EventArgs e)
{
    List<Person> persons = new List<Person>
        {new Person{FirstName = "Joe", LastName = "Adams", Address = "Chandler"},
        new Person{FirstName = "Don", LastName = "Alexander", Address = "Washington"},
        new Person{FirstName = "Dave", LastName = "Ashton", Address = "Seattle"},
        new Person{FirstName = "Bill", LastName = "Pierce", Address = "Sacramento"},
        new Person{FirstName = "Bill", LastName = "Giard", Address = "Camphill"}};

    var personsNotInSeattle = from person in persons
                              where person.Address != "Seattle"
                              orderby person.FirstName
                              select new {person.FirstName, person.Address};

    lstResults.Items.Clear();
    foreach (var person in personsNotInSeattle)
    {
        lstResults.Items.Add(person.FirstName + " --- " + person.Address);
    }
}

Before discussing the code in detail, note the output it produces: the list box shows one line for each person who is not in Seattle, containing only the first name and the address.

The above snippet of code is very similar to the previous example in that here also you examine all persons using a single from clause and return only those persons that are not in Seattle. However, one key difference is that it does not return the entire person. It returns a new type that contains two public properties: FirstName and Address. This new type is created by the compiler. Conceptually, here is the definition the compiler creates:


public class ?????
{
    private string firstName;
    private string address;

    public string FirstName
    {
        get { return firstName; }
        set { firstName = value; }
    }

    public string Address
    {
        get { return address; }
        set { address = value; }
    }
}

As you can see, the above class is very similar to the Person class except that it is nameless, meaning that it is an anonymous type. So the return value from the query is IEnumerable<?????>, and what replaces the question marks is determined by the compiler. To capture this sequence of an anonymous type, you need a way to declare a variable without writing the type name, since the name of the compiler-created type is not available to you. This is exactly why you need the var keyword. The variable is still strongly typed; var simply lets the compiler fill in the type that it created for you. Variables declared with var must be initialized at the point they are declared, because that is the only way the compiler can know what type they are.
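A short console sketch can illustrate that values of an anonymous type are strongly typed even though their type name cannot be written in source. It assumes the released System.Linq namespace; the class name AnonymousTypeDemo and its methods are hypothetical.

```csharp
using System;
using System.Linq;

// Hypothetical demo class for anonymous types and var.
public static class AnonymousTypeDemo
{
    public static string FirstNotInSeattle()
    {
        // An array whose element type is a compiler-generated anonymous type.
        var people = new[]
        {
            new { FirstName = "Joe", Address = "Chandler" },
            new { FirstName = "Dave", Address = "Seattle" }
        };

        // match is strongly typed: FirstName and Address are checked at compile time.
        var match = people.First(p => p.Address != "Seattle");
        return match.FirstName + " --- " + match.Address;
    }

    public static bool SameShapeSharesType()
    {
        var a = new { FirstName = "Joe", Address = "Chandler" };
        var b = new { FirstName = "Don", Address = "Washington" };

        // Same property names and types in the same order within one assembly:
        // the compiler reuses a single generated type.
        return a.GetType() == b.GetType();
    }

    public static void Main()
    {
        Console.WriteLine(FirstNotInSeattle());     // Joe --- Chandler
        Console.WriteLine(SameShapeSharesType());   // True
    }
}
```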


DLinq and XLinq

Of course, data does not just exist in .NET memory. Two other important places where you will find data are databases and XML documents. This is where DLinq (LINQ to SQL) and XLinq (LINQ to XML) shine. DLinq provides special objects, in addition to the standard LINQ objects, that allow querying straight from the database. DLinq allows for data mapping through a simple class-based mechanism. Query expressions are then represented as an "expression tree", which allows a LINQ expression to be converted into an equivalent expression in another language such as T-SQL. Using this technique, the following C# code snippet can actually perform a native query in SQL Server:

from c in customers
    where c.LastName.StartsWith("A")
    select c

By way of an expression tree, this C# query gets translated into a valid T-SQL query, which then executes on the server. C# developers do not have to write the T-SQL for such a query by hand. Also, think of the possibilities this opens up for CLR stored procedures!
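To get a feel for what an expression tree is, the following sketch stores a lambda as data and inspects it. It assumes the released System.Linq.Expressions namespace, and the class name ExpressionTreeDemo is hypothetical. It only examines and compiles the tree; it does not generate SQL, which is the job of a query provider such as DLinq.

```csharp
using System;
using System.Linq.Expressions;

// Hypothetical demo class: a lambda captured as an expression tree.
public static class ExpressionTreeDemo
{
    // Assigning a lambda to Expression<Func<...>> makes the compiler build
    // a data structure describing the code instead of executable IL.
    public static readonly Expression<Func<string, bool>> StartsWithA =
        lastName => lastName.StartsWith("A");

    // The body of the lambda is a method call node that can be examined.
    public static string CalledMethodName()
    {
        MethodCallExpression call = (MethodCallExpression)StartsWithA.Body;
        return call.Method.Name;
    }

    // The same tree can also be compiled into a delegate and executed.
    public static bool Evaluate(string lastName)
    {
        Func<string, bool> compiled = StartsWithA.Compile();
        return compiled(lastName);
    }

    public static void Main()
    {
        Console.WriteLine(CalledMethodName());    // StartsWith
        Console.WriteLine(Evaluate("Adams"));     // True
        Console.WriteLine(Evaluate("Pierce"));    // False
    }
}
```

A provider that translates queries would walk nodes like this method call and emit the equivalent construct in its target language.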

As mentioned before XLinq enables you to query XML data. Using XLinq, you can query all customers whose last name starts with an "A" in the following fashion:


from c in customerXml.Descendants("Customer")
    where c.Element("LastName").Value.StartsWith("A")
    select c

XLinq goes beyond querying: in addition to querying XML, it can also create XML. Note that this article has just given you a flavor of DLinq and XLinq; the next couple of installments of this article series will go into more detail on both.
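Creating XML and then querying it can be sketched as follows. The sketch assumes the released System.Xml.Linq namespace and its XElement class rather than the CTP's System.Xml.XLinq namespace; the class name XmlDemo and the sample customer data are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical demo class: build an XML tree in code, then query it.
public static class XmlDemo
{
    // Nested XElement constructors mirror the shape of the resulting XML.
    public static XElement BuildCustomers()
    {
        return new XElement("Customers",
            new XElement("Customer", new XElement("LastName", "Adams")),
            new XElement("Customer", new XElement("LastName", "Pierce")),
            new XElement("Customer", new XElement("LastName", "Ashton")));
    }

    public static List<string> LastNamesStartingWithA()
    {
        XElement customerXml = BuildCustomers();

        var query = from c in customerXml.Descendants("Customer")
                    where c.Element("LastName").Value.StartsWith("A")
                    select c.Element("LastName").Value;

        return query.ToList();
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", LastNamesStartingWithA()));   // Adams, Ashton
    }
}
```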


Conclusion

In this part, we looked at the current state of today’s data access story. Then we looked at how LINQ and the new language features in C# 3.0 address these issues by providing a consistent set of Standard Query Operators that we can use to query any collection that implements IEnumerable<T>. In this installment, we focused only on in-memory collections of data in order to avoid the confusion that often arises when mixing LINQ with DLinq and XLinq, but rest assured these will be covered in future installments of this series. Furthermore, because LINQ is just a set of methods that adhere to the naming conventions for the Standard Query Operators, anybody can implement their own LINQ-based collections for accessing any other type of data.


