Using Projections to Boost Performance

Introduction to Entity Framework

Welcome to this installment of the .NET Nuts & Bolts column. Some of my prior .NET Nuts & Bolts columns provided an introduction to and examples of the Entity Framework (see Introducing the Entity Framework and Joins and UI Binding with the Entity Framework). This article builds on those columns and focuses specifically on using projections to keep your query footprint small and your performance optimal.

Querying Against the Model

The Entity Framework makes it a snap to write a LINQ statement that retrieves data from your model. Remember, while it looks like inline SQL, it is not. I used the sample Northwind database and generated a simple model from it. The following example code demonstrates querying the Customers table in the Northwind model for customers whose city of record is London. All of the matching results are returned and then written to the console. Not very exciting, but it serves to demonstrate how easy querying can be.

  using (NorthwindEntities context = new NorthwindEntities())
  {
      var results = from customers in context.Customers
                    where customers.City == "London"
                    orderby customers.CompanyName
                    select customers;

      foreach (var customer in results)
      {
          Console.WriteLine("Company: {0}, contact: {1}, phone: {2}",
              customer.CompanyName,
              customer.ContactName,
              customer.Phone);
      }

      // Pause to see the output
      Console.ReadLine();
  }

The above code is a very simple example that grabs all of the customers from London and outputs a few of the columns of data. However, it also illustrates an issue I've seen happen time and time again with LINQ statements. The example retrieves the entire customer record but uses only a few of the fields. Because LINQ statements are so easy to write and so powerful, it is easy to overlook what data is actually needed and simply grab all of it. By comparison, when writing stored procedures folks tend to be choosier about the data columns included in the output. In this case the query returns a dozen or so additional fields, which may not matter much here, but the cost can be significant depending on the data columns involved.
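If you want to verify what a query will actually send to the database, you can ask the Entity Framework for the generated SQL. A minimal sketch, assuming a DbContext-based model (with an ObjectContext-based model you would instead cast the query to ObjectQuery and call ToTraceString):

```csharp
using (NorthwindEntities context = new NorthwindEntities())
{
    var results = from customers in context.Customers
                  where customers.City == "London"
                  orderby customers.CompanyName
                  select customers;

    // With DbContext, calling ToString() on the query returns the SQL
    // that will be executed; inspecting it shows every column selected.
    Console.WriteLine(results.ToString());
}
```

Running this against the full-entity query above shows every column of the Customers table in the SELECT list, which is exactly the problem projections address.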

Projecting Results

LINQ syntax includes the ability to shape the output format as desired. It is done through the "select" clause that appears at the end of the LINQ statement. Rather than just "select customers", we use an expanded statement with "select new { ... };" that allows us to specify the shape of the output, which is called projecting. The sample code below is a refactoring of the original example; the only change is selecting a limited set of columns rather than all of the data.

  using (NorthwindEntities context = new NorthwindEntities())
  {
      var results = from customers in context.Customers
                    where customers.City == "London"
                    orderby customers.CompanyName
                    select new
                    {
                        customers.CompanyName,
                        customers.ContactName,
                        customers.Phone
                    };

      foreach (var customer in results)
      {
          Console.WriteLine("Company: {0}, contact: {1}, phone: {2}",
              customer.CompanyName,
              customer.ContactName,
              customer.Phone);
      }

      // Pause to see the output
      Console.ReadLine();
  }
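Anonymous types work well as long as the projected results stay inside a single method. If you need to return the projection from a method or pass it between layers, you can project into a named type instead. A minimal sketch using plain LINQ to Objects, with a hypothetical CustomerSummary class standing in for whatever type fits your application:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class CustomerSummary
{
    public string CompanyName { get; set; }
    public string ContactName { get; set; }
}

class Program
{
    static void Main()
    {
        // In-memory stand-in for context.Customers
        var customers = new[]
        {
            new { CompanyName = "Around the Horn", ContactName = "Thomas Hardy", City = "London" },
            new { CompanyName = "B's Beverages", ContactName = "Victoria Ashworth", City = "London" }
        };

        // Project into the named type rather than an anonymous type
        List<CustomerSummary> results = (from c in customers
                                         where c.City == "London"
                                         orderby c.CompanyName
                                         select new CustomerSummary
                                         {
                                             CompanyName = c.CompanyName,
                                             ContactName = c.ContactName
                                         }).ToList();

        foreach (var summary in results)
        {
            Console.WriteLine("{0}: {1}", summary.CompanyName, summary.ContactName);
        }
    }
}
```

The same "select new CustomerSummary { ... }" shape works in a LINQ to Entities query as well, as long as CustomerSummary is not itself a mapped entity type.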

Importance of Projecting Results

A dozen extra columns in the previous example may not seem like much, but the extra data can be very significant, especially if those columns hold large data types. For example, I have a number of applications that allow a user to upload a data file into a web farm. The data file is stored in a central database for a variety of reasons I won't go into here; for the sake of the example, know there is a VARBINARY(MAX) column that stores the raw file. Additionally, there is another VARBINARY(MAX) column that stores a version of the file that has been converted to PDF format. A utility program grabs all of the newly added data files and does the conversion. The example below demonstrates roughly what an early version of the conversion utility looked like.

  using (WIN_DataEntities db = new WIN_DataEntities())
  {
      // Get the documents that need to be converted
      var q = from rd in db.ReviewDocument
              where rd.dt_PDFConverted.HasValue == false
              select rd;

      foreach (ReviewDocument doc in q)
      {
          if (doc.vc_FileExtension != string.Empty &&
              doc.vb_FileContents != null &&
              doc.vb_FileContents.Length > 0)
          {
              doc.vb_PDFContents = Converter.Convert(
                  doc.vb_FileContents,
                  doc.vc_FileExtension);

              doc.dt_PDFConverted = DateTime.Now;
          }
      }

      // Single large update issued at the end
      db.SaveChanges();
  }

There are a couple of challenges with the early version. The first is that it grabs all recently added documents, including the document content, in a single query. If there are 100 new documents, the query retrieves the content for all 100 of them before doing any processing. That's a lot to load across the network and into memory for what we're doing. The second issue is that the structure of the processing holds on to records for a very long time and issues one large update statement at the end.

The refactored version of the code is below. The first difference is that rather than grabbing all of the documents, a projection is used to get a list of just the ids of the documents that need to be converted. Then, one at a time, each document is retrieved from the database, converted, and written back. This keeps things at a much smaller unit of work. Yes, there are more calls to connect to the database, but we rely on connection pooling to help minimize that impact. The smaller unit of work gives the conversion process a much smaller resource footprint and a faster execution time.

  // Build the list of ids of documents that need to be converted
  List<long> fileIds = new List<long>();
  using (WIN_DataEntities db = new WIN_DataEntities())
  {
      fileIds = db.ReviewDocument
          .Where(rd => rd.dt_PDFConverted.HasValue == false)
          .Select(rd => rd.in_Id)
          .ToList();
  }

  // Process through the list one at a time
  foreach (long docId in fileIds)
  {
      using (WIN_DataEntities db = new WIN_DataEntities())
      {
          ReviewDocument doc = db.ReviewDocument.First(rd => rd.in_Id == docId);

          if (!string.IsNullOrEmpty(doc.vc_FileExtension) &&
              doc.vb_FileContents != null &&
              doc.vb_FileContents.Length > 0)
          {
              doc.vb_PDFContents = Converter.Convert(
                  doc.vb_FileContents,
                  doc.vc_FileExtension);
          }

          doc.dt_PDFConverted = DateTime.Now;
          db.SaveChanges();
      }
  }

Summary

We looked at the ability to shape the output results from LINQ statements and the syntax for projecting custom results. We explored an example using data files stored in a database to highlight the benefit of being choosy about which data values you retrieve and of keeping units of work small.

Future Columns

The topic of the next column is yet to be determined. If there is something in particular that you would like to see explained here, you can reach me at mark.strawmyer@crowehorwath.com.

About the Author

Mark Strawmyer

Mark Strawmyer is a Senior Architect of .NET applications for large and mid-size organizations. He specializes in architecture, design and development of Microsoft-based solutions. Mark was honored to be named a Microsoft MVP for application development with C# for the fifth year in a row. You can reach Mark at mark.strawmyer@crowehorwath.com.
