Managed Extensions: Using the .NET OLE DB Classes to Read Text Files

Welcome to this week's installment of .NET Tips & Techniques! Each week, award-winning Architect and Lead Programmer Tom Archer demonstrates how to perform a practical .NET programming task using either C# or Managed C++ Extensions.

A previous .NET Tips & Techniques article illustrated how to access comma-delimited text files using the .NET ODBC classes. This week's tip covers performing the same task using the .NET OLE DB classes.

Note: If you're trying to decide which database-access technology to use—ODBC or OLE DB—I posted an article on my consulting Web site that compares ODBC to OLE DB.

Using the OLE DB Classes

Much like any of the database-access classes provided by the .NET BCL (Base Class Library), the OLE DB classes that you'll use most often represent a connection to the data source (OleDbConnection), a command object that will be executed against the data source (OleDbCommand), and a reader that is used to enumerate the results of a command (OleDbDataReader). Therefore, the standard process for reading any type of file using OLE DB classes is the following:

  1. Instantiate an OleDbConnection object, passing to it a connection string that must include the name of the OLE DB provider for the underlying data source.
  2. Call the OleDbConnection::Open method.
  3. Instantiate an OleDbCommand object and specify to it the SQL that you want to be executed against the data store.
  4. Call one of the OleDbCommand execute methods—ExecuteNonQuery, ExecuteReader, or ExecuteNonScalar:
      ExecuteNonQuery
      Used for a command that will not return a result set, but will return a value indicating the number of rows affected
      ExecuteReader
      Returns a result set based on the executed command
      ExecuteNonScalar
      Returns only the first column of the first row of the result set, ignoring the balance of the result set
  5. In the case of using the OleDbCommand::ExecuteReader:
    1. A reader object is returned. It is enumerated by using the OleDbDataReader::Read method to read all rows returned (indicated by the OleDbDataReader::FieldCount property).
    2. For each row, the OleDbDataReader::GetName method is used to determine the current column name, and then you can either use the generic OleDbDataReader::GetValue method to get the actual data value or methods specific to the data being read—such as GetBytes, GetChar, GetDateTime, and so on.

Specifying the Connection String

When formatting an OLE DB connection string, be aware of the following three main parts:

  • Provider—Always set this to the OLE DB provider that you will be using. In the case of accessing text files, this value is Provider=Microsoft.Jet.OLEDB.4.0;
  • Data Source—When accessing text files, set this value to the fully qualified file name (e.g., Data Source=myData.txt;)
  • Extended Properties—This is where things get a bit interesting. This parameter allows you to specify several properties that are specific to the provider being used. Specifically, you can specify things such as whether the file has a column row (as its first row) and how the data is formatted. Here are some example Extended Properties settings:

    No header row, file is delimited
    Extended Properties='text;HDR=No;FMT=Delimited'

    Header row, file is delimited
    Extended Properties='text;HDR=Yes;FMT=Delimited'

Note that I didn't mention how to change the FMT parameter of the Extended Properties. This is because if your file is delimited by something other than commas, you'll need to define a schema.ini file.

The schema.ini File

Because the Jet OLE DB provider uses the Microsoft Text Driver to access text files, you can define a schema.ini file just as you would if you were using ODBC to access the data. This is covered in a previous article, so I won't repeat all that here. However, bear in mind that if a conflict exists between what you specify in the Extended Properties part of the OLE DB connection string and a parameter you specify in the schema.ini file, the schema.ini file will take precedence. For example, if you specify HDR=No in the connection string, but specify ColNameHeader=TRUE in the schema.ini file, the file will be parsed as if the first row is a column header.

Demo Application

This article includes a simple demo that reads and displays the text from a small text file. Here's a screen shot of that application:

Now, here's the actual code from the function in the application that uses much of what you've read in this article (with the addition of some basic error-handling and clean-up):

void CReadingCsvFilesWithOledbDlg::DisplayCsvFile()
{
#pragma push_macro("new")
#undef new
  CWaitCursor wc;

  m_lstCsvData.DeleteAllItems();
  DeleteAllColumns(m_lstCsvData);

  OleDbConnection* connection = NULL;
  OleDbDataReader* reader = NULL;
  try
  {
    StringBuilder* connString = new StringBuilder();

    connString->Append(S"Provider=Microsoft.Jet.OLEDB.4.0;");
    connString->AppendFormat(S"Data Source={0};",
                             Path::GetDirectoryName(m_strFileName));
    connString->Append(S"Extended Properties='text;HDR=No;
                         FMT=TabDelimited'");

    connection = new OleDbConnection(connString->ToString());
    connection->Open();

    CString strSelect;
    strSelect.Format(_T("SELECT * FROM [%s]"),
                     Path::GetFileName(m_strFileName));
    OleDbCommand* command = new OleDbCommand(strSelect, connection);

    reader = command->ExecuteReader();
    int iCurrRow = 0;
    while (reader->Read()) 
    {
      int idx;
      for (int iCurrCol = 0; iCurrCol < reader->FieldCount;
           iCurrCol++) 
      {
        if (0 == iCurrRow)
        {
          // assumes a schema.ini file
          m_lstCsvData.InsertColumn(iCurrCol, (CString)reader->
                                    GetName(iCurrCol));
        }

        if (0 == iCurrCol) 
        {
          idx = m_lstCsvData.InsertItem(iCurrRow, (CString)
                                       (reader->GetValue(iCurrCol)->
                                       ToString()));
        }
        else 
          m_lstCsvData.SetItemText(idx, iCurrCol, (CString)
                                  (reader->GetValue(iCurrCol)->
                                   ToString()));
      }
      iCurrRow++;
    }
    SizeAllColumns(m_lstCsvData);
  }
  catch(Exception* e)
  {
    m_strFileName = _T("");
    UpdateData(FALSE);

    AfxMessageBox((CString)e->Message);
  }
  __finally
  {
    if (NULL != reader) reader->Close();
    if (NULL != connection) connection->Close();
  }
#pragma pop_macro("new")
}

After looking at the screen shot and the following code, you should be able to surmise that I have a schema.ini file that specifies the names of the columns that are displayed in the listview control.

Using the Registry

If you do not want to use a schema.ini file—for example, if you wanted a given machine to always handle text files the same way without having to maintain a bunch of separate schema.ini files and without having to hard-code the Extended Properties parameter to each OLE DB connection string—you can control how the Microsoft Text Driver works with your text files by modifying the following registry key:

HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Jet\4.0\Engines\Text\Format

Because these values are to be used in lieu of having a schema.ini file, you can find the valid values on the Microsoft Web site.

The following figure shows what the Registry key looks like on my development machine:



About the Author

Tom Archer - MSFT

I am a Program Manager and Content Strategist for the Microsoft MSDN Online team managing the Windows Vista and Visual C++ developer centers. Before being employed at Microsoft, I was awarded MVP status for the Visual C++ product. A 20+ year veteran of programming with various languages - C++, C, Assembler, RPG III/400, PL/I, etc. - I've also written many technical books (Inside C#, Extending MFC Applications with the .NET Framework, Visual C++.NET Bible, etc.) and 100+ online articles.

Downloads

Comments

  • Senior Systems Engineer - DoD Navigation Database

    Posted by Terry Glagowski on 02/04/2013 09:18am

    By default the precedence for the delimiter information in a connection is: 1) Schema.ini 2) Registry 3) Connection String Is there a way to make the connection string have precedence over the registry entry in this regard? Terry Glagowski Terry.Glagowski@Hanscom.af.mil GlagowskiT@acm.org

    Reply
  • CString in .NET

    Posted by darwen on 09/28/2004 10:24am

    I don't think you should really be using CString - you should be using String::Format instead. You shouldn't really mix .NET and native unless you really have to. And then there should be clearly defined interfaces between one and the other.

    • Good point

      Posted by Tom Archer on 09/28/2004 10:44am

      Correct. As I explain in my book - Extending MFC Applications with the .NET Framework - you should only mix code when you need to due to conversion/performance issues. What you're seeing is simply a quick test of the main topic of the article - using OLEDB to read comma-delimited files. However, the following code will probably perform slightly better:

      StringBuilder* select = new StringBuilder();
      select->AppendFormat(S"SELECT * FROM [{0}]", 
                           Path::GetFileName(m_strFileName));
      OleDbCommand* command = new OleDbCommand(select->ToString(), 
                                               connection);

      Reply
    Reply
Leave a Comment
  • Your email address will not be published. All fields are required.

Top White Papers and Webcasts

  • Live Event Date: December 11, 2014 @ 1:00 p.m. ET / 10:00 a.m. PT Market pressures to move more quickly and develop innovative applications are forcing organizations to rethink how they develop and release applications. The combination of public clouds and physical back-end infrastructures are a means to get applications out faster. However, these hybrid solutions complicate DevOps adoption, with application delivery pipelines that span across complex hybrid cloud and non-cloud environments. Check out this …

  • On-demand Event Event Date: October 29, 2014 It's well understood how critical version control is for code. However, its importance to DevOps isn't always recognized. The 2014 DevOps Survey of Practice shows that one of the key predictors of DevOps success is putting all production environment artifacts into version control. In this webcast, Gene Kim discusses these survey findings and shares woeful tales of artifact management gone wrong! Gene also shares examples of how high-performing DevOps …

Most Popular Programming Stories

More for Developers

Latest Developer Headlines

RSS Feeds