Managed Extensions: Using the .NET ODBC Classes to Read Text Data

Welcome to this week's installment of .NET Tips & Techniques! Each week, award-winning Architect and Lead Programmer Tom Archer demonstrates how to perform a practical .NET programming task using either C# or Managed C++ Extensions.

While data for professional-caliber applications is most typically stored in traditional database systems, sometimes the data your application must use is in text format. This includes situations where you are accessing a small amount of test data, as well as scenarios where another system is providing a text file for you to use and you have no control over its format. While most people naturally think of using streams to read and write text files, ODBC has for years provided a driver specifically for this purpose.

Why would you use the driver and incur the overhead of ODBC when you can easily stream the data? Well, for starters, ODBC provides a generic SQL-like interface to the data. Secondly, due to ODBC's generic interface, using it instead of directly accessing a file via a stream allows you to more easily use the same code to access data in other formats.

For example, let's say that the data your code will ultimately work with is stored in a traditional RDBMS (relational database management system), such as SQL Server or Oracle. However, you might want to test your application logic against a small amount of data that you can quickly enter into a text file via any editor (such as Notepad). Using ODBC, you would simply specify different DSNs or ODBC drivers (for a DSN-less connection) based on which file format you're using. That way, you wouldn't have to maintain two completely different code bases for accessing your data (one for streaming text files and one for reading from the RDBMS).

This article illustrates how easy it is to read text data using the .NET ODBC classes.

Reading Data from a DSN-less Text File

The ODBC Desktop Drivers include a driver for reading text called the Microsoft Text Driver. The easiest way to access text data is to simply use the ODBC Admin application (odbcad32.exe) and—specifying the text driver—create a DSN against the desired text file. However, I'll show you the basic steps for using the .NET ODBC classes to access a text file, such that you don't have to perform the extra step of creating a DSN. (The complete code—including basic error handling and clean-up—can be found in the dialog class of this article's demo application.)

  1. Create the Connection—Using the OdbcConnection class, you can pass a connection string that allows you to specify the ODBC driver (the Microsoft Text Driver, in this case) and the path of the files. Note that I said files—plural. When you use the Microsoft Text Driver, you don't specify in the connection string the file the application will be accessing. Instead, you specify the path to the file and, optionally, the valid file extensions for any files in that path that can be opened. The reason for this is that the text driver logically treats the specified directory as a relational database and then the specific files that your application works with as tables within that database. This was a great idea by the folks at Redmond, as it more closely mimics how your code will access data from a true RDBMS. The following example creates a connection to the folder that contains the specified file:
    StringBuilder* connString = new StringBuilder();
    
    // The driver name must be a single unbroken string literal
    connString->Append(
       S"Driver={Microsoft Text Driver (*.txt; *.csv)};");
    connString->AppendFormat(S"DBQ={0};", 
                             Path::GetDirectoryName(strFileName));
    
    OdbcConnection* connection =
       new OdbcConnection(connString->ToString());
    connection->Open();
    

    Note that when the Text Driver is used to make a connection to a given path (specified with the DBQ parameter), only files in that specific directory can be accessed—not files in any subdirectories.

  2. Create the Command—Once a connection is made, you then can create the desired command via the OdbcCommand class. This is where you would specify the file name. The following code snippet selects all rows from the specified file name, keeping in mind that the file must exist in the folder specified in the DBQ parameter of the connection string:
    CString strSelect;
    strSelect.Format(_T("SELECT * FROM [%s]"),
                     Path::GetFileName(strFileName));
    OdbcCommand* command = new OdbcCommand(strSelect, connection);
    
  3. Attach a Reader—Now that the command has been created, you can call the OdbcCommand::ExecuteReader method to execute the command and return an OdbcDataReader object that can be used to enumerate the returned data:
    OdbcDataReader* reader = command->ExecuteReader();
    while (reader->Read())
    {
      for (int iCurrCol = 0; iCurrCol < reader->FieldCount;
           iCurrCol++)
      {
        // retrieve values via the various OdbcDataReader methods
        // -- such as GetValue
        AfxMessageBox((CString)(reader->GetValue(iCurrCol)
                                      ->ToString()));
      }
    }
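
The split described in steps 1 and 2 (the folder plays the role of the database, the file plays the role of the table) can be sketched in isolation. The following is a minimal illustration in standard C++ rather than Managed Extensions, so that it compiles without the .NET Framework. The TextSource struct and the makeTextSource function are hypothetical names introduced here, not part of the demo application, and the sketch assumes the path contains at least one folder separator and does not handle a file that sits directly in a drive root.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper type: the two strings that steps 1 and 2 need.
struct TextSource {
    std::string connectionString;  // names the folder (the "database")
    std::string selectStatement;   // names the file (the "table")
};

// Splits a full file path into a DSN-less connection string and a SELECT.
// Assumption: fullPath has a folder part and a file part separated by
// '\' or '/'.
TextSource makeTextSource(const std::string& fullPath) {
    const std::string::size_type slash = fullPath.find_last_of("\\/");
    assert(slash != std::string::npos && slash + 1 < fullPath.size());

    const std::string folder = fullPath.substr(0, slash);
    const std::string file = fullPath.substr(slash + 1);

    TextSource src;
    // DBQ points at the folder, never at the file itself.
    src.connectionString =
        "Driver={Microsoft Text Driver (*.txt; *.csv)};DBQ=" + folder + ";";
    // The file name is bracketed because it contains a dot.
    src.selectStatement = "SELECT * FROM [" + file + "]";
    return src;
}
```

The two resulting strings correspond to the values passed to the OdbcConnection and OdbcCommand constructors in the steps above.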
    

Taking Control of the Process with the schema.ini File

Once you've started working with text files via a DSN-less connection, you might run into situations that will have you asking things like "How do I specify how the file is delimited (for example, tab vs. comma)?" or "Where can I specify the character set?" These settings and more can be specified via a very simple file named schema.ini that resides in the same directory as the data file. The schema.ini file is documented on the Microsoft Web site, so I won't attempt to cover every possible parameter that can be specified. However, I will cover the most popular question I see on the Internet: how to specify whether the data includes (as its first row) the column names of the data.
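
Before turning to column names, here is a brief sketch of how the delimiter and character set questions are typically answered. It assumes a tab-delimited file named data.txt that uses the ANSI character set. The file name is only an example, and the accepted values for each key should be checked against the Microsoft schema.ini documentation.

```ini
[data.txt]
Format=TabDelimited
CharacterSet=ANSI
```

Each section of the schema.ini file is named after the data file it describes, so one schema.ini file can describe several files in the same directory.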

By default, the text driver assumes that the data contains a column heading row. Therefore, if your data does not contain this row and you do not define a schema.ini file, the driver treats the first row of data as column names and does not return it as a record. For example, if your data looked like the following, the reader code above would display only the second and third records (leaving out your favorite author!):

Tom Archer,Archer Consulting Group
Bradley Jones,Jupitermedia
Bill Gates,Microsoft

In order to specify that the data does not include a column row and that you don't wish to name the columns, your schema.ini file would look like the following:

schema.ini file

[data.txt]
ColNameHeader=FALSE

In terms of specifying the column names for your data, you have two choices:

  • You can include the column names as the first row of the text file and then set the ColNameHeader attribute to TRUE in the schema.ini file. (You can also omit the schema.ini file, because the text driver defaults the ColNameHeader value to TRUE.)

    data file

    Name,Company
    Tom Archer,Archer Consulting Group
    Bradley Jones,Jupitermedia
    Bill Gates,Microsoft
    

    schema.ini file

    [data.txt]
    ColNameHeader=TRUE
    
  • If the data doesn't include a column row, you can manually set the column names such that the two files look as follows:

    data file

    Tom Archer,Archer Consulting Group
    Bradley Jones,Jupitermedia
    Bill Gates,Microsoft
    

    schema.ini file

    [data.txt]
    ColNameHeader=FALSE
    Col1=Name Char Width 255
    Col2=Company Char Width 255
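
If you create test data files often, the second form of schema.ini is easy to generate. The following is a minimal sketch in standard C++, not code from the demo application. The makeSchemaIni function is a hypothetical helper, and it assumes that every column is declared as Char Width 255, as in the example above, and that the column names contain no spaces.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Builds the text of a schema.ini section for a data file that has no
// column heading row, naming each column explicitly.
// Assumption: every column is text, declared as "Char Width 255".
std::string makeSchemaIni(const std::string& dataFileName,
                          const std::vector<std::string>& columnNames) {
    assert(!dataFileName.empty());

    std::ostringstream ini;
    ini << "[" << dataFileName << "]\n"
        << "ColNameHeader=FALSE\n";
    // Column entries are numbered from 1: Col1, Col2, and so on.
    for (std::size_t i = 0; i < columnNames.size(); ++i) {
        ini << "Col" << (i + 1) << "=" << columnNames[i]
            << " Char Width 255\n";
    }
    return ini.str();
}
```

Calling makeSchemaIni("data.txt", {"Name", "Company"}) produces the same four lines as the schema.ini file shown above. The result would still have to be written to a file named schema.ini in the same directory as the data file.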
    

The column name can be retrieved from the reader by using the OdbcDataReader::GetName method. This article's demo application uses the last technique and—while being very simple in scope—allows you to tinker with your data file and schema.ini file so that you can easily test the various configuration combinations until you get it right for your particular application.



About the Author

Tom Archer - MSFT

I am a Program Manager and Content Strategist for the Microsoft MSDN Online team managing the Windows Vista and Visual C++ developer centers. Before being employed at Microsoft, I was awarded MVP status for the Visual C++ product. A 20+ year veteran of programming with various languages - C++, C, Assembler, RPG III/400, PL/I, etc. - I've also written many technical books (Inside C#, Extending MFC Applications with the .NET Framework, Visual C++.NET Bible, etc.) and 100+ online articles.
