We know from experience that one can write a stored procedure that returns the data joined into a single whole. This is not very object-oriented, though, because it leaves no place for behaviors to live alongside the data, for example, a regular expression that validates phone numbers. To resolve this disjunction one can define new objects that hold the data from the database and add behaviors to them. The trade-off is that one gains robust, coherent objects but loses the simplicity of talking directly to the database. Using cursors directly in the presentation layer is generally considered client-server, and adding a middle layer that captures the problem domain is called n-tier. Client-server is fast and convenient, but results in a weak object model. This disjunction is best illustrated by the vendors trying to promote object-oriented databases.
With Visual Basic .NET there may be a happy middle ground, where strong objects can be used without losing the simplicity of client-server. .NET contains a tool, xsd.exe, that will turn a schema describing a single table or several tables into a strongly typed class. This strongly typed representation of the database schema may then be extended to incorporate behaviors. The net effect is a strongly typed class representation of key relationships without a lot of manual exertion, and an opportunity to add new behaviors that are relevant to the problem domain.
Visually Defining the XML Schema
XML has many uses in .NET. One particular use is capturing a database's schema in an XML Schema Definition (XSD) file. An XSD tool named xsd.exe ships with Visual Studio .NET and the .NET Framework SDK, and Visual Studio .NET contains an XML Schema Designer that permits the visual definition of an XML schema. .NET's CodeDOM can then be used to convert the XML schema into code: a class that provides a strongly typed implementation of the schema.
As a refresher, weakly typed database operations index DataSets and DataTables to return columns or rows, whereas a strongly typed DataSet yields typed members. The difference in the code is its expressivity. Using weakly typed ADO.NET one might index a table with a row number and a column name, which may or may not yield a result, and that result will be typed generically as an Object. Using a strongly typed DataSet results in code that is expressive in the context of the solution domain. Instead of tables, indexes, and columns, one writes code in terms of entities in the solution domain, for example, customers and phone numbers.
While many good applications have been written using weakly typed client-server code, improvements are gained by employing typed, expressive code that conveys greater meaning, requires fewer explanatory comments, and is more robust due to its concision. (If typed and untyped objects seem foreign to you, a background on the benefits of strong types can be found in Grady Booch's "Object-Oriented Analysis and Design with Applications" from Addison-Wesley.)
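To make the weakly typed side of this comparison concrete, here is a minimal sketch that uses only a plain DataTable built in memory. The module name, the helper function, and the row values are invented for illustration; only the column names are borrowed from the Northwind Customers table.

```vbnet
Imports System
Imports System.Data

Module UntypedAccessSketch

    ' Builds a small in-memory table. The column names mirror Northwind's
    ' Customers table, but the row values are invented for illustration.
    Public Function BuildCustomersTable() As DataTable
        Dim table As New DataTable("Customers")
        table.Columns.Add("CompanyName", GetType(String))
        table.Columns.Add("City", GetType(String))
        table.Rows.Add(New Object() {"Example Traders", "Lansing"})
        Return table
    End Function

    Public Sub Main()
        Dim table As DataTable = BuildCustomersTable()

        ' Weakly typed: the column is located by a string at run time and
        ' the value comes back as Object, so it has to be converted.
        Dim city As String = CStr(table.Rows(0)("City"))
        Console.WriteLine(city)

        ' A misspelled column name compiles cleanly and fails only at run time.
        Try
            Console.WriteLine(table.Rows(0)("Cty"))
        Catch ex As ArgumentException
            Console.WriteLine("Lookup failed: " & ex.Message)
        End Try
    End Sub

End Module
```

With a typed DataSet the same mistake would normally surface as a compile error, because the column is exposed as a named property rather than looked up by a string.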
Preparation for the Example
To demonstrate typed DataSets with .NET I will be using the Northwind database because it is a ubiquitous sample database that is probably already on your PC or is readily available.
One can prepare to follow this example by adding a connection to an instance of nwind.mdb (the Northwind database) in Visual Studio .NET's Server Explorer. If the sample database is already in your Server Explorer you can skip to the next section. To add the Northwind database to your Server Explorer, follow the steps below:
1. Run Visual Studio .NET.
2. Open the Server Explorer from the View menu.
3. Select the Data Connections node and pick Add Connection from the context menu. (Click the right mouse button to display this menu.)
4. Step 3 will open the Data Link Properties applet.
5. On the Provider tab pick the Microsoft Jet 4.0 OLE DB Provider.
6. On the Connection tab browse to your copy of the nwind.mdb sample database.
7. Also on the Connection tab, click Test Connection to ensure that you are able to connect to the database.
8. Click OK.
When you have completed step 8 you should see a new data connection in the Server Explorer (refer to figure 1).
Figure 1: The expanded data connection to an instance of the Northwind sample database.
Defining the Schema
You always have the option of using the command-line xsd.exe utility to generate a typed DataSet, but the IDE exists to make such routine tasks much easier than they are with command-line tools.
To add an XML Schema to your solution, follow the steps below:
- In Visual Studio .NET select Project|Add New Item
- Select the XML Schema project item template
- Rename the .xsd file to Customers for our example
- Click OK.
This will add a file named Customers.xsd to your project. The XSD designer will also be displayed. Drag the Customers table from the Northwind connection in the Server Explorer and drop it onto the designer. Two XML elements representing the DataSet and the Customers table will be added to the Schema designer. Rename the DataSet element to Customers. (The default is Document.) The name you provide will be the name of the class containing the strongly typed DataSet when we generate the code (see figure 2).
Figure 2: A cutout from the XML Schema designer in Visual Studio .NET.
You have the option of adding additional tables to the schema and expressing relationships between these tables. (We'll save schema relationships for another day, but you are welcome to explore the designer.)
Generating the Typed DataSet
Now we are ready to generate the typed DataSet. The CodeDOM will generate it in a .vb file as a new class that inherits from System.Data.DataSet and contains a nested typed DataTable, a nested typed DataRow, and a nested class for DataSet events.
With the schema designer open select Schema|Generate DataSet from the main menu in Visual Studio .NET. This step will use the CodeDOM to generate code. You can see the generated file by clicking the Show All Files button in the Solution Explorer (see figure 3).
Figure 3: Showing all files in a solution.
Click on the Customers.vb file to explore the generated code. You will see about 650 lines of generated code. You have the option of writing this code from scratch, but multiplied over 100 tables and a wide variety of relationships, you can see that generating 65,000 lines of code yields significant savings over handcrafting 65,000 lines of code.
Note: To offer some perspective here, the national average is 9,000 lines of code per year per person. If this is true, then writing the database code for these typed DataSets manually would require upwards of 7 man-years (65,000 / 9,000 is roughly 7.2). A good developer could generate 100 typed DataSets in a couple of weeks. Although the average database may not have 100 tables, it is altogether possible that even a modestly sized database may have a dozen tables with a multitude of combinations. The permutations can be huge and, consequently, the savings equally dramatic.
If you explore the generated code you will see some new things. By now you know that classes and interfaces are different beasts; what you may not know is that VB.NET supports nested classes. A nested class is a class within a class. Using nested classes in this case makes sense.
In the generated typed DataSet you will find a typed DataTable, a typed DataRow, and a class for managing data events. These classes are nested because they really don't make sense outside of the typed DataSet. For example, the Customers DataSet is composed of the Customers DataTable, and the DataTable is composed of Customers DataRows. The dependency here is such that these classes only make sense as part of the whole. Technically, you will need to keep this nested relationship in mind when you start interacting with instances of the Customers typed DataSet.
As you explore the code using the code editor or IntelliSense you will see that the typed DataSet has strongly typed and named members that will yield more expressive code. For example, if we have a CustomersRow class we can refer directly to the City property rather than indexing the row to return the City column. In addition, basic initialization and checks for null fields are automatically coded for us. The net benefit is that all of the basic behaviors we would otherwise code manually exist in the typed DataSet, and our code will use named and typed objects and members.
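For readers who have not met nested classes before, a minimal sketch follows. The Library and Shelf names are hypothetical and have nothing to do with the generated file; the point is only that code outside the containing class must qualify the nested type with the container's name, which is the same shape as Customers.CustomersRow.

```vbnet
Imports System

' Hypothetical names chosen for illustration; not taken from the generated file.
Public Class Library

    ' A nested class: a Shelf only makes sense as part of a Library.
    Public Class Shelf
        Public ReadOnly Label As String

        Public Sub New(ByVal text As String)
            Me.Label = text
        End Sub
    End Class

    Public ReadOnly TopShelf As New Shelf("Top")

End Class

Module NestedClassSketch

    Public Sub Main()
        Dim building As New Library()

        ' Outside the containing class, the nested type is qualified by its
        ' container: Library.Shelf, not just Shelf.
        Dim first As Library.Shelf = building.TopShelf
        Console.WriteLine(first.Label)
    End Sub

End Module
```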
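The shape of the generated code can be approximated by hand. The sketch below is a heavily simplified stand-in, not the designer's actual output: the class name CustomersSketch, the CustomersTable property, and the single-argument AddCustomersRow method are assumptions made for illustration, and the real generated file is far longer and differs in detail.

```vbnet
Imports System
Imports System.Data

' A hand-written, simplified stand-in for a generated typed DataSet.
Public Class CustomersSketch
    Inherits DataSet

    Private ReadOnly tableCustomers As CustomersDataTable

    Public Sub New()
        MyBase.New("CustomersSketch")
        tableCustomers = New CustomersDataTable()
        Me.Tables.Add(tableCustomers)
    End Sub

    ' Typed access to the table, instead of Tables("Customers").
    Public ReadOnly Property CustomersTable() As CustomersDataTable
        Get
            Return tableCustomers
        End Get
    End Property

    Public Class CustomersDataTable
        Inherits DataTable

        Public Sub New()
            MyBase.New("Customers")
            Me.Columns.Add("CompanyName", GetType(String))
            Me.Columns.Add("City", GetType(String))
        End Sub

        ' These two overrides make the table create CustomersRow objects.
        Protected Overrides Function NewRowFromBuilder(ByVal builder As DataRowBuilder) As DataRow
            Return New CustomersRow(builder)
        End Function

        Protected Overrides Function GetRowType() As Type
            Return GetType(CustomersRow)
        End Function

        ' Adds a row with only CompanyName set; City stays DBNull.
        Public Function AddCustomersRow(ByVal companyName As String) As CustomersRow
            Dim row As CustomersRow = CType(Me.NewRow(), CustomersRow)
            row("CompanyName") = companyName
            Me.Rows.Add(row)
            Return row
        End Function
    End Class

    Public Class CustomersRow
        Inherits DataRow

        Friend Sub New(ByVal builder As DataRowBuilder)
            MyBase.New(builder)
        End Sub

        ' A named, String-typed member in place of row("City").
        Public Property City() As String
            Get
                Return CStr(Me("City"))
            End Get
            Set(ByVal value As String)
                Me("City") = value
            End Set
        End Property

        Public Function IsCityNull() As Boolean
            Return Me.IsNull("City")
        End Function
    End Class
End Class

Module TypedAccessSketch
    Public Sub Main()
        Dim data As New CustomersSketch()
        Dim row As CustomersSketch.CustomersRow = data.CustomersTable.AddCustomersRow("Example Traders")

        Console.WriteLine(row.IsCityNull())   ' City has not been set yet
        row.City = "Lansing"                  ' invented value, set through the typed property
        Console.WriteLine(row.City)
    End Sub
End Module
```

The nested row class is where the expressiveness comes from: callers see City and IsCityNull rather than string lookups and DBNull comparisons.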
Programming with Typed DataSets
Inheritance is a beautiful thing. Because a typed DataSet is still a DataSet we can use the existing ADO.NET connection and adapter to connect to the data source and fill the DataSet. Our typed DataSet behaves just like an untyped DataSet relative to the rest of ADO.NET, but yields a better total code result. Listing 1 demonstrates how to initialize and interact with a typed DataSet.
Listing 1: Initializing and interacting with a Typed DataSet
 1: Imports System.Data
 2: Imports System.Data.OleDb
 3:
 4: Module Startup
 5:
 6:   Public Sub Main()
 7:
 8:     Dim connectionString As String = _
 9:       "Provider=Microsoft.Jet.OLEDB.4.0;" & _
10:       "User ID=Admin;Data Source=C:\Program Files\" & _
11:       "Microsoft Visual Studio\VB98\NWIND.MDB;Mode=Share Deny None;"
12:
13:     Dim sql As String = "SELECT * FROM CUSTOMERS"
14:     Dim adapter As OleDbDataAdapter = _
15:       New OleDbDataAdapter(sql, connectionString)
16:
17:     Dim theCustomers As Customers = New Customers
18:
19:     ' Uses an implicit OleDbConnection object automatically
20:     adapter.Fill(theCustomers, "Customers")
21:
22:     Dim Row As Customers.CustomersRow
23:
24:     For Each Row In theCustomers.Tables(0).Rows
25:       Console.WriteLine(Row.CompanyName)
26:     Next
27:
28:     Console.ReadLine()
29:
30:   End Sub
31:
32: End Module
Lines 8 through 11 define a valid connection string. This can be extracted easily from the Data Link Properties tool or by selecting the Data Connection in the Server Explorer, pressing F4, and copying the connection string from the Properties Window. Line 13 defines a suitable SQL statement. Lines 14 and 15 declare and create an instance of an OleDbDataAdapter. Line 17 creates an instance of the typed DataSet. Line 20 fills the DataSet; the adapter uses an OleDbConnection that it creates implicitly from the connection string, opening and closing it as needed.
If we forget to name the table in line 20 (the second argument) then the typed DataTable will remain empty and the DataSet will contain a second, untyped table containing the data. The For Each loop (lines 24 through 26) shows how we can now refer to entities in the typed DataSet by name. In the example, I am specifically referring to CompanyName rather than indexing the Row object.
Extending Strongly Typed DataSets
During development there is a real possibility that you will need to regenerate the typed DataSet a couple of times. For this reason I would discourage you from modifying the typed DataSet directly. Instead, inherit from the typed DataSet, add new behaviors to the new type, and use the new subclass. In this way, even if you regenerate your typed DataSet you will avoid losing customizations.
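A minimal sketch of this approach follows, reusing the phone-number validation mentioned at the start of the article. GeneratedCustomers is a hand-written stand-in so that the example compiles on its own; in a real project the base class would be the generated Customers class. The subclass name, its methods, and the phone format are all assumptions made for illustration.

```vbnet
Imports System
Imports System.Data
Imports System.Text.RegularExpressions

' Stand-in for the generated class so that this sketch compiles on its own.
' In a real project this would be the designer-generated Customers class.
Public Class GeneratedCustomers
    Inherits DataSet

    Public Sub New()
        Dim table As DataTable = Me.Tables.Add("Customers")
        table.Columns.Add("CompanyName", GetType(String))
        table.Columns.Add("Phone", GetType(String))
    End Sub
End Class

' Hand-written behavior lives in a subclass, so regenerating the base class
' does not overwrite it.
Public Class CustomersWithRules
    Inherits GeneratedCustomers

    ' Assumed format for illustration only: (ddd) ddd-dddd.
    Private Shared ReadOnly PhonePattern As New Regex("^\(\d{3}\) \d{3}-\d{4}$")

    Public Shared Function IsValidPhone(ByVal phone As String) As Boolean
        Return Not (phone Is Nothing) AndAlso PhonePattern.IsMatch(phone)
    End Function

    ' Counts rows whose Phone value is missing or fails the pattern.
    Public Function CountInvalidPhones() As Integer
        Dim count As Integer = 0
        For Each row As DataRow In Me.Tables("Customers").Rows
            If row.IsNull("Phone") OrElse Not IsValidPhone(CStr(row("Phone"))) Then
                count += 1
            End If
        Next
        Return count
    End Function
End Class

Module ExtendingSketch
    Public Sub Main()
        Dim data As New CustomersWithRules()
        Dim table As DataTable = data.Tables("Customers")

        ' Invented sample rows.
        table.Rows.Add(New Object() {"Example Traders", "(517) 555-0100"})
        table.Rows.Add(New Object() {"Sample Imports", "555-0100"})

        Console.WriteLine(data.CountInvalidPhones())
    End Sub
End Module
```

Because the subclass is still a DataSet, it can be passed to a data adapter's Fill method in the same way as the generated class.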
In this article you learned the technical aspects of defining an XML schema, generating a typed DataSet, initializing, and using that typed DataSet. Because inheritance is supported in VB.NET, the new typed DataSet will behave the way any untyped DataSet will behave and can be used in the same ways too.
While the totality of this approach may seem modest, never underestimate the cumulative benefit of adopting best practices. As with employing an architect, refactoring, using design patterns, and other great strategies, the total result will be greater reliability, efficiency, and readability, along with a positive impact on your success.
About the Author
Paul Kimmel is a software architect, writer, and columnist for codeguru.com. Look for his recent book Visual Basic .NET Power Coding from Addison-Wesley. Paul Kimmel is available to help design and build your .NET solutions and can be contacted at email@example.com.