DLINQ: Bridging the Object Relational Divide

Are you a DataSet person or more of a business object person? What an awful choice to make! On the DataSet side, you lose the intelligence that you can stuff into your business objects. On the business object side, you lose searching, sorting, and other basic abilities to work with your data that the .NET Framework provides in the System.Data namespace.

Fortunately, Microsoft will soon offer an alternative: LINQ (Language Integrated Query), and specifically for database-related operations, DLINQ. DLINQ is the component of the LINQ project that provides a run-time infrastructure for managing relational data as objects—without giving up the ability to query.

In essence, DLINQ provides this solution by giving you a strongly typed representation of your database. In other words, if you had a Customers table, you could create a class very similar to the following:

[Table(Name="Customers")]
public class Customer
{
   [Column(Id=true)]
   public string CustomerID;

   [Column]
   public string City;
}

With some additional plumbing, this class then allows you to write queries in your .NET code as follows:

var q =
   from c in db.Customers
   where c.City == "London"
   select c;

So, you could use a relational, tabular representation of data in the database as objects that are strongly typed and checked at compile time in your .NET code. This allows you to both query your objects and stuff intelligence in them via mechanisms such as inheritance or partial classes. But isn't that what LINQ provides? Then, what does DLINQ add to the puzzle, besides simple query and persistence?
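To illustrate the "stuff intelligence in them" point, here is a minimal sketch of adding business logic through a partial class. It assumes the Customer class shown earlier is declared partial; the IsInCity helper is purely illustrative and not part of any generated code:

```csharp
// Hypothetical partial class extending the DLINQ-mapped Customer
// class shown above (assuming it is declared partial).
public partial class Customer
{
    // Illustrative business-logic helper living alongside the
    // mapped columns; the name is an assumption, not a DLINQ API.
    public bool IsInCity(string city)
    {
        return this.City == city;
    }
}
```

Because the mapping attributes stay in the generated half of the class, regenerating the mapping code never clobbers logic you add this way.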

On the surface, DLINQ's functions sound quite straightforward: "represent relational data as objects," and "provide a mechanism to do the translation." But, a closer look quickly reveals various other hairy monsters in the sewer that need to be tamed.

For instance, regular business objects don't have to worry about concurrency issues or transactional support. They don't have to worry about versioning, or about being disconnected from a huge data cache (your database). They also don't have to worry about preserving object identities. Objects backed by database rows have to worry about all of these. A query always returns a copy of the data, so what happens if the underlying row (say, a Customer) had been modified since the last time you read it? Unlike a regular business object, of which there is only one in-memory instance, an object that mirrors a database row is not instantaneously updated when the underlying data changes.

Also, how should your DLINQ queries be written so they execute efficient queries on the database and participate as good connection-pooling citizens in your architecture? Can you override the default behaviors?

This article attempts to answer a few of these questions. But first, how does DLINQ work?

The DataContext Class

At the heart of DLINQ is the DataContext class. Suppose you have a database that is created by the following T-SQL script:

Create Database Test
Go

Use Test
Go

Create Table Entity
(
   EntityID INT IDENTITY PRIMARY KEY,
   EntityName Varchar(100) NOT NULL
)
Go

Create Table EntityDetail
(
   EntityDetailID INT IDENTITY PRIMARY KEY,
   EntityID INT REFERENCES Entity(EntityID),
   EntityDetailName Varchar(100) NOT NULL
)
Go

The following figure represents the two resulting tables.

Next, insert some sample data using the following script:

BEGIN
   DECLARE @LastID INT
   Insert Into Entity (EntityName) Values ('TestEntity')
   Select @LastID = SCOPE_IDENTITY()
   Insert Into EntityDetail(EntityID, EntityDetailName)
               Values (@LastID, 'Detail 1') ;
   Insert Into EntityDetail(EntityID, EntityDetailName)
               Values (@LastID, 'Detail 2') ;
END
Go

You could easily create a plain, not-strongly-typed DataContext and use it to query the Entity table as follows:

DataContext db = new DataContext(connectionString);

Table<Entity> Entities = db.GetTable<Entity>();

var q =
   from e in Entities
   where e.EntityName == "TestEntity"
   select e;

foreach (var entity in q)
   Console.WriteLine("EntityID = " + entity.EntityID);

This code assumes that a class called Entity has already been set up with the following structure:

[Table]
public class Entity
{

   [Column(Id=true)]
   public int EntityID;

   [Column]
   public string EntityName;
}

But wouldn't it be a lot better to have a strongly typed DataContext instead? This would allow compile-time checking, and enable you to write more intuitive code like this:

TestDataContext db = new TestDataContext(connectionString);

var q =
   from e in db.Entity
   where e.EntityName == "TestEntity"
   select e;

You could use a tool such as SqlMetal to do this, but it is important to understand the structure and layout of a strongly typed DataContext. Much like a strongly typed DataSet, at the heart of a strongly typed DataContext is a class that inherits from DataContext:

public partial class TestDataContext : DataContext
{
   public Table<Entity> Entity;
   public Table<EntityDetail> EntityDetail ;
   public TestDataContext(string connStr) : base(connStr) {}
}

As you can see, you now need to create two classes that represent the "Entity" and "EntityDetail" tables. Let's start with Entity:

[Table(Name="Entity")]
public class Entity
{
   ...
}

The "Name" property is really not necessary, because in this case it matches the database. However, by specifying it anyway, you can explicitly map to a table. You then could put the name in a resource/satellite assembly, so that if the structure changes, you can update all names in one place.


The next thing you'd need to do is create definitions for the two columns inside the class. The following code does that:

[Column(Id=true,Name="EntityID")]
public Int32 EntityID;

[Column]
public string EntityName;

You'd still have one important thing left to do to finish setting up Entity. When you reference other tables in a database, you use joins; with objects, you use dot syntax, such as Entity.EntityDetails. So, you need to add a property called "EntityDetails" to the class representation of the Entity table, with the appropriate attributes so the framework understands the "one" and "many" sides of a one-to-many relationship.

This is where the EntitySet and EntityRef types and the "Association" attribute come in handy. The following is the implementation of Entity.EntityDetails:

private EntitySet<EntityDetail> _entityDetails ;
[Association(Storage="_entityDetails", OtherKey="EntityID")]
public EntitySet<EntityDetail> EntityDetails
{
   get
   {
      return this._entityDetails;
   }
   set
   {
      this._entityDetails.Assign(value) ;
   }
}

On the "many" side of the relationship, EntityDetail's Entity property implementation looks like this:

private EntityRef<Entity> _entity;
[Association(Storage="_entity", ThisKey="EntityID")]
public Entity Entity
{
   get
   {
      return this._entity.Entity ;
   }

   set
   {
      this._entity.Entity = value ;
   }
}

Note the slight difference between the two sides: the "one" side stores an EntitySet&lt;EntityDetail&gt; and specifies OtherKey, whereas the "many" side stores an EntityRef&lt;Entity&gt; and specifies ThisKey.

Interestingly, the "Storage" property of the Association attribute points to a private field that DLINQ can access directly for updating, should you choose to omit the setter in your object definitions. The following is the full code for the strongly typed DataContext:

public partial class TestDataContext : DataContext
{
   public Table<Entity> Entity;
   public Table<EntityDetail> EntityDetail;
   public TestDataContext(string connStr) : base(connStr) { }
}

[Table(Name = "Entity")]
public class Entity
{
   [Column(Id = true, Name = "EntityID")]
   public Int32 EntityID;

   [Column]
   public string EntityName;
   private EntitySet<EntityDetail> _entityDetails;

   [Association(Storage = "_entityDetails", OtherKey = "EntityID")]
   public EntitySet<EntityDetail> EntityDetails
   {
      get { return this._entityDetails; }
      set { this._entityDetails.Assign(value); }
   }
}

[Table(Name = "EntityDetail")]
public class EntityDetail
{
   [Column(Id = true)]
   public Int32 EntityDetailID;

   [Column]
   public string EntityDetailName;

   [Column]
   public Int32 EntityID;

   private EntityRef<Entity> _entity;

   [Association(Storage = "_entity", ThisKey = "EntityID")]
   public Entity Entity
   {
      get
      {
         return this._entity.Entity;
      }
      set
      {
         this._entity.Entity = value;
      }
   }
}

That's it! Your strongly typed DataContext is set up. Now, you can instantiate it by using the following code:

string connectionString =
   "Data Source=(local);Initial Catalog=Test;Integrated Security=true";

TestDataContext db = new TestDataContext(connectionString);

You even can execute .NET code that lets you do joins using an object syntax. For instance, you could execute a query that looks like this:

var q =
   from e in db.Entity
   from ed in db.EntityDetail
   where e.EntityID == ed.EntityID && e.EntityName == "TestEntity"
   select new { e.EntityName, ed.EntityDetailName };

foreach (var qPart in q)
{
   Console.WriteLine(qPart.ToString());
}

Interestingly, you could rewrite the above query and produce the very same results like this:

var p =
   from ed in db.EntityDetail
   where ed.Entity.EntityName == "TestEntity"
   select new { ed.Entity.EntityName, ed.EntityDetailName};

foreach (var pPart in p)
{
   Console.WriteLine(pPart.ToString());
}

Starting from the May LINQ CTP, both these queries produce similar T-SQL queries. Thus, the query translation engine in LINQ now fully understands joins.

The Concept of Object Identity

When you work with TestDataContext as in the previous examples, you access the Entity table as db.Entity, much as you would access any other object. But, if two variables access the same object, they refer to the same object instance: when one piece of code makes a change, the other immediately sees the latest data. Objects tied to database tables can't work this way, however. After all, their identity is determined by primary keys in the database, from which they are disconnected in .NET code. Thus, data queried out of the same rows in the database would not ordinarily correspond to the same object instances in your .NET client code.

This seeming discrepancy can be resolved by having the DataContext manage object identity. Whenever an object is retrieved from the database, it is logged in an identity table by its primary key. Whenever that same object is retrieved again, the original object instance is handed back to the application. This way, the DataContext translates the database's concept of identity (keys) into the language's concept (instances). The application sees the object only in the state that it was first retrieved. The new data, if different, is thrown away.
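A short sketch illustrates the idea, using the TestDataContext built earlier (the exact query shape is illustrative; the key point is the reference comparison):

```csharp
// Retrieve the same row twice through the same DataContext.
var first  = (from e in db.Entity where e.EntityID == 1 select e).First();
var second = (from e in db.Entity where e.EntityID == 1 select e).First();

// Because the DataContext logs retrieved objects by primary key,
// both variables should refer to the same in-memory instance, per
// the identity management described above.
Console.WriteLine(Object.ReferenceEquals(first, second));
```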

New data is thrown away? Oh no! Well, it really isn't as bad as it sounds. You expect your data to remain the same until you persist your changes back into the database. DLINQ manages this by giving you built-in support for optimistic concurrency. When an application attempts to save out-of-date data back into the database, DLINQ throws an OptimisticConcurrencyException, but your business logic will never get confused by data that magically changes because of other users' actions.
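In practice, that means wrapping SubmitChanges in a handler along the following lines (a sketch; the recovery strategy and entity names are illustrative):

```csharp
var entity = (from e in db.Entity
              where e.EntityName == "TestEntity"
              select e).First();
entity.EntityName = "RenamedEntity";

try
{
    // Attempts an optimistic update; fails if the row changed
    // underneath us since it was first read.
    db.SubmitChanges();
}
catch (OptimisticConcurrencyException)
{
    // Another user changed the row between our read and our write.
    // Recovery is up to you: re-query and retry, merge the values,
    // or surface the conflict to the caller.
}
```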

Submitting Changes and Transactions

DLINQ makes submitting changes in your objects back to the database easy. All you do is modify the object and then call DataContext.SubmitChanges(). DLINQ maintains enough tracking information to enable it to submit a change and use optimistic concurrency to check for concurrency issues. It also provides various facilities for changing optimistic concurrency behaviors.

If the auto-generated commands aren't what you want, you also can specify your own update commands by using attributes such as UpdateMethod, which decorate the appropriate methods that are supposed to be called during a SubmitChanges process.
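As a rough illustration, a custom update routine might look like the following. The method signature and the use of ExecuteCommand are assumptions based on the CTP-era documentation, so treat this strictly as a sketch:

```csharp
public partial class TestDataContext
{
    // Hypothetical custom update routine that DLINQ would invoke
    // during SubmitChanges instead of its auto-generated UPDATE.
    [UpdateMethod]
    public void UpdateEntity(Entity original, Entity current)
    {
        // Issue whatever T-SQL you prefer here, for example via
        // ExecuteCommand or a stored procedure call.
        this.ExecuteCommand(
            "UPDATE Entity SET EntityName = {0} WHERE EntityID = {1}",
            current.EntityName, current.EntityID);
    }
}
```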

You also can make your update transactional by using System.Transactions like this:

using(TransactionScope ts = new TransactionScope()) {
   db.SubmitChanges();
   ts.Complete();
}

Alternatively, you could use an ADO.NET transaction such as SqlTransaction, as follows:

db.LocalTransaction = db.Connection.BeginTransaction();
try {
   db.SubmitChanges();
   db.LocalTransaction.Commit();
   db.AcceptChanges();
}
catch {
   db.LocalTransaction.Rollback();
   throw;
}
finally {
   db.LocalTransaction = null;
}

A manually controlled transaction, the equivalent of SqlTransaction, is a little more involved and error-prone. It is also limited to a single local database. The ADO.NET 3.0 Entity Data Mapping framework and DLINQ generally have a very good story around objects that are formed from disparate data sources, and a manually controlled transaction cannot span those sources. So, prefer System.Transactions over manually controlled transactions whenever you can.

DataSet vs. Business Objects—Why Choose?

Even in its very early stages, DLINQ solves a very big problem in application architecture. It attacks that 60-80 percent of tedious code that developers have to write over and over again to ferry data back and forth between databases and objects. As time passes, I hope to see further guidance regarding what may be a good DLINQ query and what may not. Either way, this is a technology that will hopefully put the DataSet vs. business objects debate to rest and make our coding lives easier.

About the Author

Sahil Malik has worked for a number of top-notch clients in Microsoft technologies ranging from DOS to .NET. He is the author of Pro ADO.NET 2.0 and co-author of Pro ADO.NET with VB.NET 1.1. Also, Sahil is currently working on a multimedia series on ADO.NET 2.0 for Keystone Learning. For his community involvement, contributions, and speaking, he has also been awarded the Microsoft MVP award.


