From Chapter 6: Disconnected Data via ADO.NET and DataSets of the book Extending MFC Applications with the .NET Framework.
As programmers, many of us like to jump right in and start using a technology with so much promise. However, logic dictates that we take a minute to familiarize ourselves with the terms and classes that implement this technology. Therefore, this section is meant as a primer for the rest of the chapter. I'll begin by quickly going over the terminology and main classes associated with disconnected data and then introduce a few tasks and concepts that will be used or referenced throughout the chapter.
ADO.NET Terminology and Main Classes
The first new term you'll hear quite often regarding ADO.NET is that of a managed provider. This is simply the .NET equivalent of terminology that was originally introduced with OLEDB (and later used by its COM interface, ADO). In OLEDB, code that provides a generic interface to data is referred to as a provider. Therefore, since code written to run on top of the CLR is called "managed," we are given yet another new database term to remember. As of the time of this writing, the .NET Framework defines five managed providers:
- OLEDB: Supports data stores that have an OLEDB provider.
- ODBC: Supports data stores that have an ODBC driver.
- Oracle: A set of classes optimized for the Oracle database product.
- SQL CE: A .NET Compact Framework managed provider that supports Microsoft SQL Server CE.
- SQL Server: A set of classes that are optimized to support the Microsoft SQL Server database product.
While we're on the topic, I'll also be using the familiar terms data source and data consumer. (Data source is the generic name for the data being provided for consumption by the consumer.) Obviously, the consumer is any code that retrieves, stores, and manipulates data represented by the managed provider.
Like many other frameworks that you've seen throughout this book, ADO.NET is comprised of many classes. However, this chapter will focus on the following classes:
- Connection: Functions much like the ADO object of the same name and represents a connection to a data source.
- Command: Another holdover from ADO, the Command object represents a query or a command that is to be executed by a data source.
- CommandBuilder: Used to automatically generate the insertion, update, and delete commands for the data adapter object based on the select command. It is also used to provide optimistic concurrency for disconnected DataSet objects.
- DataSet: One of the key elements with ADO.NET is the DataSet. A little too involved to be defined with a single sentence, the DataSet represents an in-memory model of disconnected data and has built-in support for XML serialization. That latter capability is covered in Chapter 8, "Combining ADO.NET and XML."
- DbDataAdapter: The abstract base class for all data store-specific classes such as SqlDataAdapter, OracleDataAdapter, OleDbDataAdapter, and so on.
- DataAdapter: The base class for the DbDataAdapter class.
- Data adapter: Not really a class, but a generic designation for one of the DbDataAdapter-derived classes.
- DataView: This class is most easily described to MFC developers as the data equivalent of a CView class. For example, in a standard MFC document/view application you can build multiple views that are built on (but work with different parts of) the same data. Likewise, multiple DataView objects represent different views on the same DataSet.
- XmlDataDocument: Enables you to treat DataSet data as XML data in order to support things like XPath search expressions, XSL (eXtensible Stylesheet Language) transformations, and so on.
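To make the DataView analogy a little more concrete, here is a minimal sketch in standard C++ (not Managed Extensions, and not the ADO.NET API). The names Employee, ToyView, and FirstNames are hypothetical and exist only for this illustration; the point is simply that several views can sit on top of one set of rows without copying them.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical row type standing in for one row of a table.
struct Employee {
    std::string firstName;
    std::string city;
};

// Hypothetical ToyView: holds a reference to the shared rows plus a filter.
// It never copies the rows, so every view sees later changes to the table.
class ToyView {
public:
    ToyView(const std::vector<Employee>& table,
            std::function<bool(const Employee&)> filter)
        : table_(table), filter_(std::move(filter)) {
        assert(static_cast<bool>(filter_));  // toy rule: a view needs a filter
    }

    // Returns the first names of the rows this view lets through.
    std::vector<std::string> FirstNames() const {
        std::vector<std::string> names;
        for (const Employee& e : table_) {
            if (filter_(e)) names.push_back(e.firstName);
        }
        return names;
    }

private:
    const std::vector<Employee>& table_;           // shared data, not owned
    std::function<bool(const Employee&)> filter_;  // what this view shows
};
```

Two ToyView objects built over the same vector, one filtering on city and one showing everything, behave like two views on one document: a row added to the vector shows up in both the next time they are queried.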
Now that you've been introduced to the terms, it's easier to define a managed provider as a group of classes that interface to the generic DataSet class to abstract you from the specifics of the data you are reading or modifying. For example, the System::Data::SqlClient namespace defines about 15 classes and several delegates that are optimized for use with the SQL Server database product. Among these classes are derived types of the base classes I mentioned in the previous list: SqlConnection, SqlDataAdapter, SqlCommand, and SqlCommandBuilder.
Let's now look at the DataSet class a bit more closely. The DataSet class is a collection of data structures (other classes) that are used to model relational data. The following list details the main classes that comprise either the DataSet class or one of its member classes:
- DataTable: If you're familiar with ADO, then at first glance you might be tempted to think of a DataSet class as being comparable to souped-up ADO Recordset objects. However, datasets are so encompassing that there is no equivalent in ADO for them. The DataTable class, on the other hand, is a more true ADO.NET equivalent of the ADO Recordset object, as it encapsulates a two-dimensional array (rectangle) of data organized into columns and rows.
- DataColumn: Within the DataTable class is a collection of DataColumn definitions. Whereas the DataRow class (described next) holds the actual data, the DataColumn class defines the data store column definitions. Example members of this class are ColumnName and DefaultValue as well as Boolean properties such as AllowDBNull, AutoIncrement, and ReadOnly.
- DataRow: The DataRow class encapsulates the data for a given DataTable object in addition to defining many members that support the disconnected capabilities of the DataSet/DataTable. These members include support for tracking the current and original values of each column, the current state of the row (a DataRowState enumeration with such values as Added, Deleted, Detached, Modified, and Unchanged) and a connection to the parent table to support DataRelation via the GetParentRows and GetChildRows methods.
- DataRelation: DataRelation objects are used to define how multiple DataTables are associated. For example, it is quite common to use this feature when dealing with tables that have a parent/child relationship, such as order header and order detail tables. Using this feature, you can more easily navigate the related data of these two tables. This class is covered in more detail in the next chapter.
- Constraint: Each DataTable defines a collection of constraints that specify rules for maintaining data integrity. For example, when you delete a value that is used in one or more related tables, a ForeignKeyConstraint determines whether the values in the related tables are also deleted, set to null values, set to default values, or whether no action occurs.
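The row-tracking idea described for DataRow (original values, current values, and a row state) is easier to picture with a small model. The following is a sketch in standard C++ under my own assumptions, not the ADO.NET implementation: ToyRow and its members are hypothetical names, every column is a string, and only the Unchanged-to-Modified transition is modeled.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Toy stand-in for the row states named above (not the real enumeration).
enum class RowState { Detached, Added, Unchanged, Modified, Deleted };

// Hypothetical ToyRow: keeps an original and a current copy of each column.
class ToyRow {
public:
    // A row loaded from the data store starts out Unchanged.
    explicit ToyRow(std::map<std::string, std::string> loaded)
        : original_(loaded),
          current_(std::move(loaded)),
          state_(RowState::Unchanged) {}

    // Editing touches only the current copy and marks the row Modified.
    void Set(const std::string& column, const std::string& value) {
        assert(current_.count(column) == 1);  // toy rule: known columns only
        current_[column] = value;
        if (state_ == RowState::Unchanged) state_ = RowState::Modified;
    }

    // After a successful commit, the current values become the originals.
    void AcceptChanges() {
        original_ = current_;
        state_ = RowState::Unchanged;
    }

    const std::string& Current(const std::string& column) const {
        return current_.at(column);
    }
    const std::string& Original(const std::string& column) const {
        return original_.at(column);
    }
    RowState State() const { return state_; }

private:
    std::map<std::string, std::string> original_;  // values as loaded
    std::map<std::string, std::string> current_;   // values after edits
    RowState state_;
};
```

Keeping both copies is what lets a disconnected row answer two questions later: what should be written to the data store, and what the data store held when the row was read.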
Constructing and Filling DataSet Objects
Now that you've been introduced to the main ADO.NET classes that will be used throughout this chapter, let's take a look at a code snippet that illustrates how to connect to and retrieve data from a data source. After the code snippet, I'll provide a walkthrough of the various classes that are being used here as well as a lot of not-so-obvious tasks that are being performed for us in order to facilitate a disconnected dataset.
SqlConnection* conn = new SqlConnection(S"Server=fantine;"
                                        S"Database=Northwind;"
                                        S"Integrated Security=true;");
SqlDataAdapter* adapter = new SqlDataAdapter(S"SELECT * FROM Employees", conn);
SqlCommandBuilder* cmd = new SqlCommandBuilder(adapter);

conn->Open();
DataSet* dataset = new DataSet();
adapter->Fill(dataset, S"AllEmployees");
conn->Close(); // No longer needed

DataTableCollection* tables = dataset->Tables;
employeesTable = tables->Item[S"AllEmployees"];
// ... Use employees table as needed.
While this code looks pretty straightforward, there's much more going on here than meets the eye.
- The first thing the code snippet does is to connect to the SQL Server sample database, Northwind, using the SqlConnection class.
|Specifying Connection Strings for Different Database Products
For this chapter, I chose to use the SQL Server database as it's the most commonly used database among Visual C++/MFC professionals. In addition, while much of the code that you'll see in this chapter can easily be massaged to work with any managed provider, the initialization of the Connection object is data source-specific. Therefore, if you are using another product, such as Oracle or Microsoft Access, or want to use the OLEDB or ODBC interfaces to these or other databases, the http://www.connectionstrings.com Web site is an invaluable resource, as it contains connection strings for virtually every data store.
- Once that is done, the code uses a DbDataAdapter-derived class (SqlDataAdapter) designed specifically for SQL Server access. As mentioned in the previous section, the data adapter is what connects a dataset to the underlying data store. However, what's really interesting here is that while I'm passing a "select" value to the SqlDataAdapter class's constructor, the various data adapter classes define four distinct commands (in the form of SqlCommand classes): SelectCommand, InsertCommand, UpdateCommand, and DeleteCommand. (From here on, the latter three commands will be referred to en masse as action commands.) One extremely important note to make here is that the data adapter does not automatically generate commands to reconcile changes made to a dataset based on the select statement used to construct the adapter. You must either set these commands yourself or use a command builder class, which segues nicely into the next items of interest from the code snippet.
- Once the data adapter has been constructed with the desired select command, an SqlCommandBuilder object is instantiated and associated with the data adapter. The command builder automatically generates the appropriate action commands (complete with the underlying SQL code, ADO.NET Command objects, and their associated Parameters collections) based on the adapter's SelectCommand.
- Next, the connection is opened. One thing to note here is that the data adapter is designed to minimize the time a connection stays open. As you see more code in this chapter, take note that the data adapter's associated connection is never explicitly opened or closed. Instead the adapter knows when it needs to connect and disconnect. For example, when calling the data adapter object's Update method in order to commit changes to the dataset, the data adapter will automatically use an already open connection to the data store or make the necessary connection and automatically disconnect when finished.
- After that, we're finally down to the DataSet object itself. To construct and fill the dataset, you can simply use the DataSet class's default constructor and then call the data adapter object's Fill method, passing the constructed DataSet object as the first parameter. The Fill method retrieves data from the underlying data store based on the data adapter's SelectCommand value. (In this example, that value was set when the SqlDataAdapter was constructed.) You'll also notice that I specified a literal value of "AllEmployees" for the second parameter to the Fill method. This value specifies the name that I wish to give the DataTable that will be constructed with the returned data. If I had not named the dataset's data table, it would have been named "Table" automatically. (When more than one data table is generated and not specifically named, the additional tables are assigned the names Table1, Table2, and so on.)
|Creating Multiple DataTables in a DataSet
While most of the chapter's code snippets and demo applications will only read and modify a single table, there might be times when you'll want a DataSet to contain multiple DataTable objects. The section entitled "Creating Multiple DataTables in a DataSet" will illustrate how to do this both by using multiple data adapters and also by combining multiple SELECT statements in a single data adapter in order to reduce round trips to the server.
- At this point, the requested data is in the DataRow members of the DataTable members of the DataSet object. Therefore, the code can safely disconnect from the data source and continue working until it wants to commit any changes made to the data!
- The last thing I'll illustrate here before moving on to the next code snippet is how to retrieve the desired DataTable objects from the DataSet object. As you can see from the code, the DataSet class has a public property called Tables that is simply a collection of DataTable objects. As with accessing any other .NET collection with Managed Extensions, you can use one of two overloaded Item indexers: one accepts the relative index and the other the named entity. Therefore, as the data adapter in this code snippet only constructed a single DataTable object that was named "AllEmployees" in the Fill method, it can be retrieved either by name or by passing its index value of 0.
|Different Ways to Construct Datasets
There are three distinct ways to construct and fill datasets. One way, used in this chapter, is from a data adapter (which is typically associated with a database). You can also construct a dataset programmatically from any data your application has access to, either read from another source or generated within the application. This technique, while not overly difficult, is not used very often and is beyond the scope of this chapter. Finally, you can also construct a DataSet object from an XML document in situations where you wish to treat XML data as you would any other data format. The topic of mixing ADO.NET and XML is covered in Chapter 8.
Untyped vs. Typed Datasets
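The command builder step in the walkthrough is the least visible one, so here is a rough sketch of the idea in standard C++. It is only a toy under stated assumptions: BuildActionCommands, ActionCommands, and the parameter naming are hypothetical, the first column is assumed to be the key, and the SQL that a real SqlCommandBuilder emits will differ in its details. What it does show is the general shape of the three action commands, including a WHERE clause that compares every column against its original value, which is one common way to implement optimistic concurrency.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical result type: the three action commands as SQL text.
struct ActionCommands {
    std::string insertSql;
    std::string updateSql;
    std::string deleteSql;
};

// Joins the parts with a separator between neighboring elements.
static std::string Join(const std::vector<std::string>& parts,
                        const std::string& separator) {
    std::string out;
    for (std::size_t i = 0; i < parts.size(); ++i) {
        if (i != 0) out += separator;
        out += parts[i];
    }
    return out;
}

// Builds toy action commands for one table.
// Assumption: columns[0] is the key and is never reassigned by UPDATE.
ActionCommands BuildActionCommands(const std::string& table,
                                   const std::vector<std::string>& columns) {
    assert(columns.size() >= 2);  // toy rule: a key plus one data column

    std::vector<std::string> parameters;      // @FirstName, ...
    std::vector<std::string> assignments;     // FirstName = @FirstName, ...
    std::vector<std::string> originalChecks;  // FirstName = @Original_FirstName
    for (std::size_t i = 0; i < columns.size(); ++i) {
        const std::string& column = columns[i];
        parameters.push_back("@" + column);
        originalChecks.push_back(column + " = @Original_" + column);
        if (i != 0) assignments.push_back(column + " = @" + column);
    }
    // The row is only touched if it still holds the values that were read.
    const std::string where = " WHERE " + Join(originalChecks, " AND ");

    ActionCommands commands;
    commands.insertSql = "INSERT INTO " + table + " (" + Join(columns, ", ") +
                         ") VALUES (" + Join(parameters, ", ") + ")";
    commands.updateSql =
        "UPDATE " + table + " SET " + Join(assignments, ", ") + where;
    commands.deleteSql = "DELETE FROM " + table + where;
    return commands;
}
```

The design point worth noticing is that everything is derived from the table and column names alone, which is why a builder of this kind needs the select command's schema and why it is limited to simple, single-table selects.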
There are two basic ways to use the DataSet objects: untyped and typed. When using untyped datasets, you use the base BCL-provided DataSet objects and pass the relevant information that specifies which table, column, row, and so on that you're working with. For example, let's say that you're working with a row of data (represented by a DataRow object) for a table that contains a column named FirstName. For each row, you could access and modify the FirstName column as follows:
// row is a DataRow object
// Retrieve value
String* firstName = row->Item[S"FirstName"]->ToString();
// Set value
row->Item[S"FirstName"] = S"Krista";
The DataRow—needing to be a generic interface for all data—provides methods for reading and updating column values, respectively, where you're responsible for specifying the column name and—if updating—an Object representing the value. This generic approach, which makes you responsible for the specifics, is used throughout all the DataSet classes. Therefore, the main drawback to untyped datasets is that the code is not type-safe. In other words, mistakes made in your code, such as misspelling the column name or passing an incompatible data type, will only be realized at runtime.
Typed datasets, on the other hand, are classes that are generated from a specified data store. It's important to realize that these classes are still directly derived from the ADO.NET base classes. However, they contain members specific to the data store schema and, as such, allow for compile-time error checking. To continue our Employees table example, a typed DataSet would include a DataRow-derived class called EmployeesRow. This class would then define members for each column in the Employees table, as shown in the following excerpt.
public:
  EmployeesDataSet::EmployeesRow* AddEmployeesRow (
    System::String* LastName,
    System::String* FirstName,
    System::DateTime HireDate,
    System::Byte Photo,
    System::String* Notes,
    System::Int32 ReportsTo
  );
Using the typed dataset, our read and update code becomes the following:
// row is an EmployeesRow object
// Retrieve value
String* firstName = row->FirstName;
// Set value
row->FirstName = S"Krista";
As you can see, the main benefits to typed datasets are better readability and compile-time type checking, as each column is a class member that is associated with its correct type within the class. To draw a parallel between typed datasets and our MFC world, you could say that typed datasets are analogous to using the MFC ODBC Consumer Wizard to generate a CRecordset class. The main difference is that while the various ADO.NET classes can be bound to .NET Windows Forms controls, they were designed for a managed world; thus, there's nothing akin to RFX that will automatically bind the data to our MFC dialogs/views and controls. That we have to do manually.
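The runtime-versus-compile-time difference can be demonstrated with a small model in standard C++. This is a sketch only: UntypedRow and EmployeesRowSketch are hypothetical stand-ins for DataRow and a generated EmployeesRow, and all values are strings to keep the example short. As with real typed datasets, the typed class derives from the untyped one and simply wraps the name-based access in named members.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>

// Toy untyped row: columns are looked up by name at runtime.
class UntypedRow {
public:
    void Set(const std::string& column, const std::string& value) {
        assert(!column.empty());  // toy rule: column names are never empty
        values_[column] = value;
    }

    // Throws std::out_of_range when the column name is misspelled or missing,
    // so the mistake is only discovered when this line actually runs.
    const std::string& Get(const std::string& column) const {
        return values_.at(column);
    }

private:
    std::map<std::string, std::string> values_;
};

// Toy typed row: still an UntypedRow underneath, but each column is a member.
// Misspelling FirstName() at a call site is rejected by the compiler.
class EmployeesRowSketch : public UntypedRow {
public:
    std::string FirstName() const { return Get("FirstName"); }
    void SetFirstName(const std::string& value) { Set("FirstName", value); }
};
```

Calling Get with a misspelled name such as "FirstNmae" compiles cleanly and fails only at runtime, whereas a misspelled call to a member of EmployeesRowSketch never gets past the compiler. The column name string still exists, but it lives in exactly one place inside the typed class.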
I'll get into more of the advantages and disadvantages of using typed datasets in the section entitled "Working with Typed Datasets." However, I at least wanted you to know at this point that they both exist and to understand the main differences between them. Also note that while typed datasets have some obvious advantages, this chapter will use mostly untyped datasets for the following reasons:
- Untyped datasets allow you to see more easily what is really going on in code snippets as the client code explicitly states table and column names, stored-procedure parameter names, and so on, as opposed to the actual database entity names being hidden in a class.
- Untyped datasets allow for shorter, more focused code snippets and demos. Otherwise, each demo would require extra steps to create the typed datasets and then would require a lot of cross referencing between the main code and the typed DataSet class code.
Basic Database Operations with ADO.NET
Whether you're working with a connected or disconnected data store, the majority of database operations involve NURD work—New, Update, Read, Delete. However, as this section will illustrate, many of the sometimes very tedious database operations are made much easier with the help of the various ADO.NET classes.
|Quick Note on This Section's Examples
This section's code snippets are all freestanding functions that can be plugged directly into your own test applications. They make the sole assumption that the