Full Text Search: The Key to Better Natural Language Queries for NoSQL in Node.js
To write effective .NET data-driven applications, a deep understanding of the ADO.NET object model is essential. In version 2.0 of the .NET Framework, the set of classes that form the ADO.NET object model are nearly identical to those in version 1.1, but Microsoft has introduced a few key changes. This article examines what's new in the ADO.NET 2.0 classes for database developers.
Most of the enhancements fall in two main areas: .NET provider model and SQL Server managed provider. Some of the new features are common to all managed providers regardless of the database in the backend, which is the case with batch update and the factory model. Other refinements are specific of the SQL Server data provider and include asynchronous commands and support for new features available with SQL Server 2005.
Enhancements Common to All .NET Providers
A managed provider consists of a collection of related objects that provide the most common operations over a data storage system. Typically, a managed provider for a database exposes methods for the following purposes:
- To execute commands and stored procedures
- To open and close connections
- To read and fetch data
Each provider implements key functions in a way that fully leverages the characteristics of the underlying database. In terms of efficiency, this represents a quantum leap from the ActiveX Data Objects (ADO) model. However, it hinders the implementation of truly database-agnostic systems.
Managed data objects are based on interfaces, and each interface provides some "generic" members to make a call without knowing in advance the target database. In the .NET Framework 1.1, you have to create the connection object explicitly (in other words, through the new operator and knowing the exact type). Starting from there, you can create commands and get data readers but you cannot obtain a disconnected DataSet through an adapter. The .NET Framework 2.0 eliminates all these limitations.
The following code shows how to create a connection object in an indirect way through the factory pattern:
DbProviderFactory f; f = DbProviderFactories.GetFactory("System.Data.SqlClient"); DbConnection conn = fact.CreateConnection();
Another cross-provider enhancement regards the batch update process. Normally, during batch update operations, a data adapter does one roundtrip per changed row. This may result in uselessly large and expensive roundtrips and may impact performance. On the other hand, 1.x offers no way to control the batch size programmatically.
In the .NET Framework 2.0, the data adapter object comes with a new property (UpdateBatchSize), which determines the number of changed rows that are submitted to the server in a single shot. The property is set to 1 by default to preserve backward compatibility. Note that a batch size that is too large can clog the network and result in performance loss. Before deciding on a batch size greater than one, you should set a benchmark to determine the size that works in your production environment. Finally, batches can execute inside a transaction for added speed and consistency.
Enhancements to the SQL Server Provider
Asynchronous commands and the query notification mechanism (limited to SQL Server 2005) are two refinements that have been brought to a sole managed provider for SQL Server.
Any commands run against any version of SQL Server can execute asynchronously and be controlled via one of the following techniques: non-blocking, polling, or callback. In the first case, you run command, do something else, and then stop until the command terminates. The second alternative entails your running the command and then polling for completion. Finally, if you opt for the callback approach, your callback function will be invoked upon completion. No additional thread is created to permit this feature, mostly because ADO.NET is not thread safe and introducing thread synchronization would perhaps be a remedy worse than the disease.
The secret for asynchronous commands is in the network layer. You activate ad hoc support by specifying ASYNC=TRUE in the connection string. When this happens, the TCP socket is opened in overlapped mode and bound to the I/O completion port. Asynchronous commands are recommended when the client machine is different from the SQL Server machine. The ASYNC keyword generates an error if invoked on a local connection.
Of the various innovations SQL Server 2005 brings, one that finds a powerful implementation in the ADO.NET 2.0 object model is the notification API to track query results. You can bind an instance of the SqlDependency class to a command class and use it to monitor changes to the returned resultset. Here's an example:
SqlCommand cmd = new SqlCommand( "SELECT * FROM Authors", conn); SqlDependency dep = new SqlDependency(cmd); dep.OnChanged += new OnChangedEventHandler(OnDepChanged); cmd.ExecuteReader();
When the command executes, the results are cached internally and bound to the instance of the SqlDependency class. Whenever something happens throughout the database that modifies the results the query command returns, the OnChanged event is fired to notify registered listeners of the change.
This shrinkwrapped feature to detect changes to the whole table (not necessarily changes that involve the results of a query) is also available in ASP.NET for SQL Server 2000. In this case, the class is SqlCacheDependency and it is defined in the system.web assembly.
In addition, ADO.NET 2.0 improves two of the key classes that push disconnected programming: the DataSet and DataTable class. The two offer a richer set of methods and properties that fill most of the holes in the previous version. For example, both classes are now serializable to and from XML and can be used as arguments for Web services call. (In 1.1, this was only possible for DataSets.)
An even more urgent fix regards the binary serialization of both classes. Now you can choose between the classic XML DiffGram format (as in 1.1) and a truly binary format that significantly shrinks the size of the serialized stream for large sets of data.
Evolutionary, Not Revolutionary
Overall, ADO.NET 2.0 is not revolutionary, but it is the certification of an evolutionary path that goes in the direction of getting better support for specific database features within the boundaries of a common object model.
About the Author
Dino Esposito is Wintellect's ADO.NET expert and a trainer and consultant based in Rome, Italy. He runs the Cutting Edge column for MSDN Magazine and is a regular contributor to MSDN News, Visual C++ Developers Journal, and SQL Server Magazine.