Implementing a Left Join with LINQ

Introduction

Oddly enough, LINQ doesn't define keywords for cross join, left join, or right join. As part of the LINQ grammar, you get Join and Group Join. Joins can be equijoins or non-equijoins: an equijoin uses the Join keyword, while a non-equijoin is contrived with multiple From clauses and a Where clause. However, left, right, and cross joins are supported by LINQ (with a little nudge).
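Here is a minimal sketch of those three shapes in VB query syntax. It uses two small hypothetical integer arrays rather than Northwind data, and the module and function names are illustrative only. The Join keyword gives the equijoin. Two uncorrelated From clauses give a cross join. Adding a Where clause that compares with something other than equality turns that cross join into a non-equijoin.

```vb
Option Strict On
Imports System
Imports System.Linq

Public Module JoinKinds
    ' Hypothetical sample data; any two sequences would do.
    Public ReadOnly LeftValues As Integer() = {1, 2, 3}
    Public ReadOnly RightValues As Integer() = {2, 3, 4}

    ' Equijoin: the Join keyword correlates elements whose keys are equal.
    Public Function EquiJoin() As String()
        Return (From a In LeftValues _
                Join b In RightValues On a Equals b _
                Select a.ToString() & "=" & b.ToString()).ToArray()
    End Function

    ' Cross join: two uncorrelated From clauses yield every (a, b) pair.
    Public Function CrossJoin() As String()
        Return (From a In LeftValues, b In RightValues _
                Select a.ToString() & "," & b.ToString()).ToArray()
    End Function

    ' Non-equijoin: a cross join filtered by a Where clause that
    ' correlates the two sides with something other than equality.
    Public Function NonEquiJoin() As String()
        Return (From a In LeftValues, b In RightValues _
                Where a < b _
                Select a.ToString() & "<" & b.ToString()).ToArray()
    End Function

    Sub Main()
        Console.WriteLine(String.Join(" ", EquiJoin()))
        Console.WriteLine(String.Join(" ", CrossJoin()))
        Console.WriteLine(String.Join(" ", NonEquiJoin()))
    End Sub
End Module
```

The non-equijoin evaluates every pair before filtering. That is fine for small in-memory sequences, but it is worth keeping in mind for large ones.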

The two most common joins are the inner join (just Join in LINQ) and the left join. Suppose you have two collections of data. Call one the master, or left, collection and the other the detail, or right, collection. A left join returns all of the elements from the left collection and only those elements from the right collection that have a correlated value in the left collection. Usually, the correlation is a key or some kind of unique identifier. To use another analogy, if the left collection is the parent and the right is the child, a left join returns all parents, including childless ones, but only children that have parents. (A right join returns orphans but no childless parents. Gotta love these computer analogies.)
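A right join is just a left join with the roles reversed. One way to get it in LINQ is therefore to put the child collection on the left. The sketch below does that with two hypothetical in-memory types, ParentRow and ChildRow, using the Group Join and DefaultIfEmpty pattern this article develops for the left join. As the analogy predicts, the orphan child appears and the childless parent does not.

```vb
Option Strict On
Imports System
Imports System.Linq

' Hypothetical parent and child types standing in for Orders and Order Details.
Public Class ParentRow
    Public Id As Integer
    Public Name As String
End Class

Public Class ChildRow
    Public ParentId As Integer
    Public Name As String
End Class

Public Module RightJoinSketch
    Public ReadOnly Parents As ParentRow() = { _
        New ParentRow With {.Id = 1, .Name = "P1"}, _
        New ParentRow With {.Id = 2, .Name = "P2"}}        ' childless parent

    Public ReadOnly Children As ChildRow() = { _
        New ChildRow With {.ParentId = 1, .Name = "C1"}, _
        New ChildRow With {.ParentId = 3, .Name = "C3"}}   ' orphan: no parent 3

    ' Right join of parents to children: every child, plus its parent if any.
    ' It is written as a left join with the children on the left-hand side.
    Public Function RightJoin() As String()
        Return (From c In Children _
                Group Join p In Parents On c.ParentId Equals p.Id _
                Into matches = Group _
                From match In matches.DefaultIfEmpty() _
                Select c.Name & ":" & If(match Is Nothing, "(no parent)", match.Name)).ToArray()
    End Function

    Sub Main()
        For Each row In RightJoin()
            Console.WriteLine(row)
        Next
    End Sub
End Module
```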

In this article, I will demonstrate the group join, because that's how you get to a left join. You will also see some LINQ to SQL code, which is pretty straightforward. My last article, "Search and Replace with Regular Expressions," and my upcoming book, LINQ Unleashed for C#, cover LINQ to SQL in detail, so I won't repeat that explanation here.

Defining a Group Join

A group join in LINQ is a join that has an Into clause. The parent information is joined to groups of the child information. That is, the child information is coalesced into a collection, and the parent information for that collection occurs only once. (A plain join is really an inner join. The difference between it and a group join is that an inner join repeats the parent information for each child.)
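To make the difference concrete, here is a small in-memory sketch. The OrderLite and DetailLite types are hypothetical stand-ins for the Orders and Order Details tables, not part of LINQ to SQL. Order 10248 has three details, 10249 has one, and 10250 has none. The inner join produces one row per detail, so 10248 repeats three times and 10250 vanishes. The group join produces one row per order, and the childless 10250 comes back with an empty group.

```vb
Option Strict On
Imports System
Imports System.Linq

' Hypothetical in-memory stand-ins for rows of Orders and Order Details.
Public Class OrderLite
    Public OrderID As Integer
End Class

Public Class DetailLite
    Public OrderID As Integer
    Public ProductID As Integer
End Class

Public Module GroupVersusInner
    Public ReadOnly Orders As OrderLite() = { _
        New OrderLite With {.OrderID = 10248}, _
        New OrderLite With {.OrderID = 10249}, _
        New OrderLite With {.OrderID = 10250}}   ' an order with no details

    Public ReadOnly Details As DetailLite() = { _
        New DetailLite With {.OrderID = 10248, .ProductID = 11}, _
        New DetailLite With {.OrderID = 10248, .ProductID = 42}, _
        New DetailLite With {.OrderID = 10248, .ProductID = 72}, _
        New DetailLite With {.OrderID = 10249, .ProductID = 14}}

    ' Inner join: one row per matching (order, detail) pair, so the order
    ' information repeats for every detail and order 10250 is dropped.
    Public Function InnerJoinOrderIDs() As Integer()
        Return (From o In Orders _
                Join d In Details On o.OrderID Equals d.OrderID _
                Select o.OrderID).ToArray()
    End Function

    ' Group join: one row per order; each order's details are gathered
    ' into a group, which is simply empty for order 10250.
    Public Function GroupJoinDetailCounts() As Integer()
        Return (From o In Orders _
                Group Join d In Details On o.OrderID Equals d.OrderID _
                Into kids = Group _
                Select kids.Count()).ToArray()
    End Function

    Sub Main()
        Console.WriteLine(String.Join(",", _
            InnerJoinOrderIDs().Select(Function(id) id.ToString()).ToArray()))
        Console.WriteLine(String.Join(",", _
            GroupJoinDetailCounts().Select(Function(n) n.ToString()).ToArray()))
    End Sub
End Module
```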

The fragment in Listing 1 assumes you have a collection of orders and a collection of order details. (You do. The final listing demonstrates how to get this data from the Northwind Traders database using LINQ to SQL.) The code demonstrates a group join followed by a loop to display each parent and a nested loop to display the children of each parent.

Listing 1: A group join on the Northwind Traders Orders and Order Details tables.

Dim groupJoin = (From order In orders _
                 Group Join detail In details On _
                 order.OrderID Equals detail.OrderID _
                 Into child = Group _
                 Select New With { _
                 .CustomerID = order.CustomerID, _
                 .OrderID = order.OrderID, _
                 .OrderDate = order.OrderDate, _
                 .Details = child}).Take(5)

Dim line As String = New String("-"c, 40)
For Each ord In groupJoin
   Console.WriteLine("{0} on {1}", ord.OrderID, _
      ord.OrderDate)
   Console.WriteLine(line)
   ' Details is the nested sequence of this order's children
   For Each det In ord.Details
      Console.WriteLine("Product ID: {0}", det.ProductID)
      Console.WriteLine("Unit Price: {0}", det.UnitPrice)
      Console.WriteLine("Quantity:   {0}", det.Quantity)
      Console.WriteLine("Discount:   {0}", det.Discount)
      Console.WriteLine()
   Next
   Console.WriteLine(line)
Next

Console.ReadLine()

The result of the LINQ query is assigned to the implicitly typed variable groupJoin. (Any legal name will do here.) The clause From order In orders defines the range variable order on the collection orders. The range variable is like the iterator variable in a For Each loop. The clause Group Join detail In details defines the child range variable detail on the details sequence. The On...Equals clause describes the correlation in the equijoin. Into child = Group coalesces each order's matching details into a group named child. The last part, Take(5), works like the TOP keyword in SQL. Take is an extension method that returns the first n elements of a sequence (which is what a LINQ query returns).
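The query syntax is shorthand for calls to the standard query operators. The sketch below writes the same kind of group join directly with the Enumerable.GroupJoin extension method. It assumes hypothetical in-memory OrderRec and DetailRec types rather than the LINQ to SQL tables. GroupJoin takes the inner sequence, the outer and inner key selectors (the two sides of On...Equals), and a result selector that receives each order together with its group. Take(2) then keeps only the first two results, much as TOP 2 would in SQL.

```vb
Option Strict On
Imports System
Imports System.Linq

' Hypothetical in-memory stand-ins for rows of Orders and Order Details.
Public Class OrderRec
    Public OrderID As Integer
End Class

Public Class DetailRec
    Public OrderID As Integer
    Public ProductID As Integer
End Class

Public Module MethodSyntaxSketch
    Public ReadOnly Orders As OrderRec() = { _
        New OrderRec With {.OrderID = 10248}, _
        New OrderRec With {.OrderID = 10249}, _
        New OrderRec With {.OrderID = 10250}}

    Public ReadOnly Details As DetailRec() = { _
        New DetailRec With {.OrderID = 10248, .ProductID = 11}, _
        New DetailRec With {.OrderID = 10248, .ProductID = 42}, _
        New DetailRec With {.OrderID = 10249, .ProductID = 14}}

    ' Roughly equivalent to:
    '   (From o In Orders Group Join d In Details
    '    On o.OrderID Equals d.OrderID Into kids = Group ...).Take(2)
    Public Function FirstTwoGroups() As String()
        Return Orders.GroupJoin(Details, _
            Function(o) o.OrderID, _
            Function(d) d.OrderID, _
            Function(o, kids) o.OrderID.ToString() & " has " & _
                              kids.Count().ToString() & " detail(s)").Take(2).ToArray()
    End Function

    Sub Main()
        For Each summary In FirstTwoGroups()
            Console.WriteLine(summary)
        Next
    End Sub
End Module
```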

The result of the LINQ query in Listing 1 is a sequence of new objects (called projections). Each one is composed of CustomerID, OrderID, and OrderDate, plus a child sequence property, Details. Details is a property of the projection (the new anonymous type created with Select New With). The last part of the listing displays the outer data and then the grouped detail data.

Converting a Group Join to a Left Join

A group join is essentially an in-memory master-detail relationship. A left join flattens out the data from the detail sequence and puts it on par with the master data. That is, where the group join has a nested detail property with its own properties, the left join puts the properties of the master and detail information side by side as sibling properties.

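The difference is that with a left join, the right sequence may have no matching data for some elements of the left sequence. You can convert a group join into a left join in two steps. First, add an additional From clause and range variable on the group. Second, call the DefaultIfEmpty method on the group variable. Without DefaultIfEmpty, the additional From clause would produce no rows for a parent with no children, and you would be back to an inner join. DefaultIfEmpty supplies a single default (Nothing) child for an empty group, so the parent survives. You still have to allow for that null child: in LINQ to Objects, reading a property of a missing child (Order Details in this example) throws a NullReferenceException. The revised fragment in Listing 2 demonstrates. All of the code is provided in Listing 3.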
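The following sketch runs the same shape of query against hypothetical in-memory stand-ins, OrderStub and DetailStub, so the effect of DefaultIfEmpty is visible without a database. Order 10250 has no details. With DefaultIfEmpty, it comes back once with child set to Nothing, and the If operator checks for that before reading ProductID. Without DefaultIfEmpty, the same query silently drops the order. This covers LINQ to Objects only. Through LINQ to SQL, the query is translated to SQL instead of being executed in memory, so a missing child shows up differently.

```vb
Option Strict On
Imports System
Imports System.Linq

' Hypothetical in-memory stand-ins for rows of Orders and Order Details.
Public Class OrderStub
    Public OrderID As Integer
End Class

Public Class DetailStub
    Public OrderID As Integer
    Public ProductID As Integer
End Class

Public Module LeftJoinSketch
    Public ReadOnly Orders As OrderStub() = { _
        New OrderStub With {.OrderID = 10248}, _
        New OrderStub With {.OrderID = 10249}, _
        New OrderStub With {.OrderID = 10250}}   ' no details

    Public ReadOnly Details As DetailStub() = { _
        New DetailStub With {.OrderID = 10248, .ProductID = 11}, _
        New DetailStub With {.OrderID = 10248, .ProductID = 42}, _
        New DetailStub With {.OrderID = 10249, .ProductID = 14}}

    ' Left join: DefaultIfEmpty turns an empty group into a group holding a
    ' single Nothing, so order 10250 still yields a row; If() guards the null.
    Public Function LeftJoinRows() As String()
        Return (From o In Orders _
                Group Join d In Details On o.OrderID Equals d.OrderID _
                Into children = Group _
                From child In children.DefaultIfEmpty() _
                Select o.OrderID.ToString() & ":" & _
                       If(child Is Nothing, "none", child.ProductID.ToString())).ToArray()
    End Function

    ' The same query without DefaultIfEmpty: an empty group contributes no
    ' rows, so order 10250 disappears and the result is an inner join.
    Public Function WithoutDefaultIfEmpty() As String()
        Return (From o In Orders _
                Group Join d In Details On o.OrderID Equals d.OrderID _
                Into children = Group _
                From child In children _
                Select o.OrderID.ToString() & ":" & child.ProductID.ToString()).ToArray()
    End Function

    Sub Main()
        Console.WriteLine(String.Join(" ", LeftJoinRows()))
        Console.WriteLine(String.Join(" ", WithoutDefaultIfEmpty()))
    End Sub
End Module
```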

Listing 2: A left join uses an additional From clause and range variable after the Group and invokes the DefaultIfEmpty method to handle missing children.

Dim leftJoin = (From order In orders _
   Group Join detail In details On _
   order.OrderID Equals detail.OrderID _
   Into children = Group _
   From child In children.DefaultIfEmpty _
   Select New With { _
      .CustomerID = order.CustomerID, _
      .OrderID    = order.OrderID, _
      .OrderDate  = order.OrderDate, _
      .ProductID  = child.ProductID, _
      .UnitPrice  = child.UnitPrice, _
      .Quantity   = child.Quantity, _
      .Discount   = child.Discount}).Take(5)
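If you would rather not test every child for Nothing, Enumerable.DefaultIfEmpty also has an overload that takes the value to use for an empty group. The sketch below applies it in LINQ to Objects with hypothetical OrderInfo and LineItem types. It passes a placeholder LineItem whose fields are all zero, so the projection can read child properties unconditionally. As in Listing 2, the projection keeps the order and detail fields side by side as siblings. I have not confirmed that LINQ to SQL translates this overload, so treat it as an in-memory technique.

```vb
Option Strict On
Imports System
Imports System.Linq

' Hypothetical in-memory stand-ins for rows of Orders and Order Details.
Public Class OrderInfo
    Public OrderID As Integer
    Public CustomerID As String
End Class

Public Class LineItem
    Public OrderID As Integer
    Public ProductID As Integer
    Public Quantity As Short
End Class

Public Module PlaceholderSketch
    Public ReadOnly Orders As OrderInfo() = { _
        New OrderInfo With {.OrderID = 10248, .CustomerID = "VINET"}, _
        New OrderInfo With {.OrderID = 10250, .CustomerID = "HANAR"}}   ' no line items

    Public ReadOnly Items As LineItem() = { _
        New LineItem With {.OrderID = 10248, .ProductID = 11, .Quantity = 12S}}

    ' Shared placeholder for orders without line items (all fields zero).
    ' It is never modified, so sharing one instance is safe.
    Private ReadOnly NoItem As New LineItem()

    ' Formats each flattened row: order fields and line-item fields as siblings.
    Public Function FlattenedRows() As String()
        Dim rows = From o In Orders _
                   Group Join i In Items On o.OrderID Equals i.OrderID _
                   Into children = Group _
                   From child In children.DefaultIfEmpty(NoItem) _
                   Select New With { _
                       .CustomerID = o.CustomerID, _
                       .OrderID = o.OrderID, _
                       .ProductID = child.ProductID, _
                       .Quantity = child.Quantity}
        Return rows.Select(Function(r) String.Format("{0} {1} {2} {3}", _
            r.CustomerID, r.OrderID, r.ProductID, r.Quantity)).ToArray()
    End Function

    Sub Main()
        For Each row In FlattenedRows()
            Console.WriteLine(row)
        Next
    End Sub
End Module
```

The trade-off is that a placeholder's zeros are indistinguishable from real zero values. Use the Nothing check instead when "no detail" must stay distinguishable from a detail with zero quantity.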

Notice that the projection in Listing 2 defines elements from Orders and Order Details as siblings in the new projected type. Here is the complete listing and some additional code for looking at the object state.


Listing 3: All of the code to reproduce the data and run the sample.

Imports System.Collections
Imports System.Data.Linq
Imports System.Data.Linq.Mapping
Imports System.IO
Imports System.Linq
Imports System.Runtime.CompilerServices


Module Module1

   ' Change the server name (BUTLER) to point at your own Northwind instance
   Public connectionString As String = _
      "Data Source=BUTLER;Initial Catalog=Northwind;" + _
      "Integrated Security=True"


   Sub Main()
      ' Use LINQ to SQL to get the data - context represents
      ' the database. Both tables come from the same context,
      ' because LINQ to SQL cannot join tables from two contexts
      ' in a single query.
      Dim orderContext As DataContext = _
         New DataContext(connectionString)

      ' generic table does the ORM association
      Dim orders As Table(Of Order) = _
         orderContext.GetTable(Of Order)()
      Dim details As Table(Of OrderDetail) = _
         orderContext.GetTable(Of OrderDetail)()

      Dim allDetails = From detail In details _
                       Select detail

      For Each d In allDetails
         Console.WriteLine(d.ProductID)
      Next

      Console.ReadLine()

      ' make sure we have some data
      orders.Write(Console.Out)
      details.Write(Console.Out)

      ' define the left join - a group join with a twist
      Dim leftJoin = (From order In orders _
                      Group Join detail In details On _
                      order.OrderID Equals detail.OrderID _
                      Into children = Group _
                      From child In children.DefaultIfEmpty _
                      Select New With { _
                      .CustomerID = order.CustomerID, _
                      .OrderID = order.OrderID, _
                      .OrderDate = order.OrderDate, _
                      .ProductID = child.ProductID, _
                      .UnitPrice = child.UnitPrice, _
                      .Quantity = child.Quantity, _
                      .Discount = child.Discount}).Take(5)


      leftJoin.Write(Console.Out)
      Console.ReadLine()
   End Sub

   ' Convenience helper: writes obj to the console
   Sub WriteLine(ByVal obj As Object)
      Console.WriteLine(obj)
   End Sub

   ' Extension method that dumps an object's public fields (or, if it
   ' has none, its public properties) as Name/Value pairs. Sequences
   ' are dumped element by element.
   <System.Runtime.CompilerServices.Extension()> _
   Public Sub Write(Of T)(ByVal obj As T, _
                          ByVal writer As TextWriter)

      If (TypeOf obj Is IEnumerable) Then
         ' Dump each element, not the collection object itself
         For Each item As Object In CType(obj, IEnumerable)
            Write(item, writer)
         Next
         Return
      End If

      Dim formatted = From info In obj.GetType().GetFields() _
                      Let value = info.GetValue(obj) _
                      Select New With {.Name = info.Name, _
                      .Value = If(value, "")}

      If (formatted.Count > 0) Then
         For Each one In formatted
            writer.WriteLine(one)
         Next
      Else
         ' Anonymous types (projections) expose properties, not fields
         Dim alternate = From info In obj.GetType().GetProperties() _
            Let value = info.GetValue(obj, Nothing) _
            Select New With {.Name = info.Name, _
            .Value = If(value, "")}
         For Each one In alternate
            writer.WriteLine(one)
         Next

      End If

      writer.WriteLine()
   End Sub


End Module



<Table(Name:="Orders")> _
Public Class Order
   <Column()> _
   Public OrderID As Integer
   <Column()> _
   Public CustomerID As String
   <Column()> _
   Public EmployeeID As Integer
   <Column()> _
   Public OrderDate As DateTime
   <Column()> _
   Public ShipCity As String
End Class


<Table(Name:="Order Details")> _
Public Class OrderDetail
   <Column()> _
   Public OrderID As Integer
   <Column()> _
   Public ProductID As Integer
   <Column()> _
   Public UnitPrice As Decimal
   <Column()> _
   Public Quantity As Int16
   <Column()> _
   Public Discount As Single
End Class

Summary

A left join generally returns all of the records in one set and only those records in the other set that are correlated with records in the first set. I use the word records out of habit, synonymously with objects (although in this example, rows of database data were used). Although LINQ has no left join key phrase, the left join is supported through a group join and the DefaultIfEmpty method.

DefaultIfEmpty returns a sequence unchanged when it has elements. When the sequence is empty, it returns a sequence containing a single default value, which for a reference type such as OrderDetail is Nothing unless you pass your own default object. That default entry is necessary because the projection draws on both parent and child objects. In a left join, there may be no child, and without the default entry that parent would produce no row at all. With it, the parent gets a row whose child is in effect null.
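Here is a minimal LINQ to Objects sketch of that behavior on plain sequences (the module and function names are illustrative). An empty sequence of a value type yields that type's default, and an empty sequence of a reference type yields Nothing. A non-empty sequence passes through unchanged. The overload that takes an argument substitutes your own default.

```vb
Option Strict On
Imports System
Imports System.Collections.Generic
Imports System.Linq

Public Module DefaultIfEmptySketch
    ' Empty Integer sequence: one element holding Integer's default, 0.
    Public Function EmptyIntegers() As Integer()
        Dim empty As New List(Of Integer)()
        Return empty.DefaultIfEmpty().ToArray()
    End Function

    ' Empty String sequence: one element holding Nothing.
    Public Function EmptyStrings() As String()
        Dim none As String() = {}
        Return none.DefaultIfEmpty().ToArray()
    End Function

    ' Non-empty sequence: returned unchanged.
    Public Function NonEmpty() As Integer()
        Dim values As Integer() = {7, 8}
        Return values.DefaultIfEmpty().ToArray()
    End Function

    ' Overload with an explicit value to use when the source is empty.
    Public Function EmptyWithDefault() As Integer()
        Dim empty As New List(Of Integer)()
        Return empty.DefaultIfEmpty(-1).ToArray()
    End Function

    Sub Main()
        Console.WriteLine(EmptyIntegers()(0))            ' 0
        Console.WriteLine(EmptyStrings()(0) Is Nothing)  ' True
        Console.WriteLine(NonEmpty().Length)             ' 2
        Console.WriteLine(EmptyWithDefault()(0))         ' -1
    End Sub
End Module
```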

See you next month, same bat time, same bat channel.

About the Author

Paul Kimmel is the VB Today columnist for www.codeguru.com and has written several books on object-oriented programming and .NET. Check out his upcoming book, LINQ Unleashed for C#, due in July 2008. Paul Kimmel is an Application Architect for EDS. You may contact him for technology questions at pkimmel@softconcepts.com.

If you are interested in joining or sponsoring a .NET Users Group, check out www.glugnet.org. Glugnet opened a users group branch in Flint, Michigan in August 2007. If you are interested in attending, check out the www.glugnet.org web site for updates.

Copyright © 2008 by Paul T. Kimmel. All Rights Reserved.


