CParser: A Simple File Parser

Environment: VC6

Introduction

When you need to parse a file and a “real” parser would be oversized for the job, this rather simple parser might be an alternative. As the two demo projects show, CParser is easy to use:


  • Construct a CParser

  • Add the tokens you want to search for

  • Reset() the parser each time before you start parsing

  • Either step through the file byte by byte and call CheckForToken(currentByte) for each byte

  • Or provide a callback for each token and call ParseFile(fileNameStr)

A token is a piece of text you search a file for; searching for tokens is what “parsing” means here. Example: Assume you have a text file that holds address information and each entry begins with “name = …”. The string “name = ” would be a token you search for when scanning the file for address entries.

I created this class when I needed to read information from a file that had been generated previously by my own code. Thus, there was no need to provide any syntax checking and so forth, because I could rely on the file-generating code not to produce faulty output. The files I had to scan were quite large (>80 MB), so reading the whole file into a string and parsing it with CString::Find() or similar methods was not an option.
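The byte-by-byte idea can be illustrated with a small, self-contained sketch. It is not the CParser source: the class name SketchParser, the helper ScanText, and the sliding-window matching strategy are assumptions made for illustration, written in standard C++ rather than MFC. It only mirrors the Add/Reset/CheckForToken call pattern described above and returns 0 when the current byte does not complete a token.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for illustration only; not the CParser implementation.
class SketchParser
{
public:
    enum { kNoToken = 0 };  // mirrors the convention that 0 means "no token"

    // Register a token ID (must not be 0) and the text to search for.
    void Add(int id, const std::string& text)
    {
        assert(id != kNoToken && !text.empty());
        m_tokens.push_back(std::make_pair(id, text));
        if (text.size() > m_maxLen)
            m_maxLen = text.size();
    }

    // Forget all bytes seen so far.
    void Reset() { m_window.clear(); }

    // Feed one byte; returns the ID of the token that this byte completes,
    // or kNoToken if no registered token ends here.
    int CheckForToken(char byte)
    {
        // Keep only as many trailing bytes as the longest token needs.
        m_window.push_back(byte);
        if (m_window.size() > m_maxLen)
            m_window.erase(0, m_window.size() - m_maxLen);

        for (std::size_t i = 0; i < m_tokens.size(); ++i)
        {
            const std::string& text = m_tokens[i].second;
            if (m_window.size() >= text.size() &&
                m_window.compare(m_window.size() - text.size(),
                                 text.size(), text) == 0)
            {
                m_window.clear();  // start fresh after a match
                return m_tokens[i].first;
            }
        }
        return kNoToken;
    }

private:
    std::vector<std::pair<int, std::string> > m_tokens;
    std::string m_window;      // the last m_maxLen bytes seen
    std::size_t m_maxLen = 0;  // length of the longest registered token
};

// Hypothetical helper: feed a whole buffer and collect the IDs found, in order.
std::vector<int> ScanText(SketchParser& parser, const std::string& data)
{
    std::vector<int> found;
    parser.Reset();
    for (std::size_t i = 0; i < data.size(); ++i)
    {
        int id = parser.CheckForToken(data[i]);
        if (id != SketchParser::kNoToken)
            found.push_back(id);
    }
    return found;
}
```

Because the sketch keeps only the last few bytes in memory, the input never has to be loaded as a whole, which is the property that matters for large files.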

Detailed Description

As mentioned before, the CParser class supports two different approaches for parsing a file. Both are described in detail below and there is a demo project for each. This section handles the aspects that are valid for both approaches.

In either case, you have to provide a set of token IDs. To do so, create an enum as shown below. Important: make sure to start with ‘1’ because ‘0’ is defined internally by the parser as NO_TOKEN. In your application, you would give the entries more meaningful names. The demo project that parses a file containing information about some virtual graphical objects uses entries such as TOKEN_COLOR or TOKEN_SIZE, for example.

Do not forget to #include the CParser interface header “Parser.h”.


#include "Parser.h"

enum T_TokenID
{
    TOKEN_MY_FIRST_TOKEN = 1,   // must start at 1; 0 is NO_TOKEN
    TOKEN_MY_SECOND_TOKEN,
    TOKEN_MY_THIRD_TOKEN
};

The parser-related headers and sources are:


  • Parser.h: CParser interface header

  • Parser.cpp: CParser implementation

  • Token.h: CToken interface header, included by Parser.h

  • Token.cpp: CToken implementation

These files are the same for both approaches and can be downloaded by “Download CParser sources only” in the Download section below. These files are also included in the demo projects.

To add the parser sources to your application, open Parser.cpp and Token.cpp, choose “Compile” from the “Build” menu for each file, and confirm that you want to add these sources to your application.

Parse a File Using Callbacks

For each token, you have to provide a callback function. If you use a member function of a class as the callback, it must be static. The declaration would resemble this:

static void CallBackForTokenMY_FIRST_TOKEN(CStdioFile* pFile);

The implementation could look like this:


void CParserDemoDlg::CallBackForTokenMY_FIRST_TOKEN(CStdioFile* pFile)
{
    // Place your code to handle a token TOKEN_MY_FIRST_TOKEN here.
    // pFile points at the file to parse. The file pointer points
    // at the first byte after the token just found. Thus, you can
    // read in some data that follows the token here.
}

Now, construct a CParser instance and add the tokens you want to search for. The parameters of the CParser::Add method are the token ID, the corresponding string you search for in the file, and the corresponding callback function/method.


CParser parser;

parser.Add((int)TOKEN_MY_FIRST_TOKEN, "name = ",
           CallBackForTokenMY_FIRST_TOKEN);
parser.Add((int)TOKEN_MY_SECOND_TOKEN, "street = ",
           CallBackForTokenMY_SECOND_TOKEN);
parser.Add((int)TOKEN_MY_THIRD_TOKEN, "phone = ",
           CallBackForTokenMY_THIRD_TOKEN);

The parser is now ready for use. You can parse a file simply by calling:

parser.ParseFile("file_to_parse.txt");

An important disadvantage of the callback approach results from the fact that callbacks have to be static methods: a static method cannot access non-static members of its own class directly. The parserDemoCB demo project shows how to work around this problem: the CListBox m_lst_itemsInFile cannot be accessed directly, so a pointer is used instead. However, if you need to access non-static members and dislike the pointer idea, you can use the alternative CheckForToken(…) approach.
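One common way to implement such a pointer workaround is to let the class store a static pointer to the instance that should receive the results. The sketch below is written in standard C++ under stated assumptions and may differ from what parserDemoCB actually does: std::istream stands in for CStdioFile, a std::vector stands in for the list box, and the names AddressDialog, OnNameToken, s_pThis, and TokenCallback are hypothetical.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Assumed callback shape: a plain function pointer that receives the open
// file (std::istream here is a stand-in for CStdioFile).
typedef void (*TokenCallback)(std::istream* pFile);

// Hypothetical dialog-like class that collects the values found in a file.
class AddressDialog
{
public:
    AddressDialog() { s_pThis = this; }  // publish this instance
    ~AddressDialog() { if (s_pThis == this) s_pThis = 0; }

    // Static callback: it has no 'this', so it reaches the instance
    // through the static pointer instead.
    static void OnNameToken(std::istream* pFile)
    {
        assert(pFile != 0);
        if (s_pThis == 0)
            return;  // no live instance to report to

        // The stream is positioned right after the token,
        // so the rest of the line is the value.
        std::string value;
        std::getline(*pFile, value);
        s_pThis->m_items.push_back(value);
    }

    const std::vector<std::string>& Items() const { return m_items; }

private:
    static AddressDialog* s_pThis;     // the instance the callback writes to
    std::vector<std::string> m_items;  // stand-in for the list box contents
};

AddressDialog* AddressDialog::s_pThis = 0;
```

The obvious limitation is that only one instance can receive callbacks at a time, which is usually acceptable for a single dialog but is worth keeping in mind.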

Parse a File Using CheckForToken(…) and a Switch-Case Block

To parse a file, construct a CParser instance and add the tokens you want to search for. Implementing the CheckForToken(…) approach does not make use of callbacks, so this time the CParser::Add method lacks the parameter pCallBack:


CParser parser;

parser.Add((int)TOKEN_MY_FIRST_TOKEN, "name = ");
parser.Add((int)TOKEN_MY_SECOND_TOKEN, "street = ");
parser.Add((int)TOKEN_MY_THIRD_TOKEN, "phone = ");

You can now open a file, step through it byte by byte, call CheckForToken(…) for each byte, and check whether a token was found. For better readability, no exception handling is included in the sample code shown below.


CFile file;
file.Open(fileNameStr, CFile::modeRead);
BYTE buffer;
parser.Reset();
while (file.Read(&buffer, 1) == 1)
{
    T_TokenID currentToken = (T_TokenID)parser.CheckForToken(buffer);

    switch ( currentToken )
    {
    case NO_TOKEN:
        break; // do nothing but continue searching for a token

    case TOKEN_MY_FIRST_TOKEN:
        // place your code to handle a token TOKEN_MY_FIRST_TOKEN here
        break;

    case TOKEN_MY_SECOND_TOKEN:
        // place your code to handle a token TOKEN_MY_SECOND_TOKEN here
        break;

    case TOKEN_MY_THIRD_TOKEN:
        // place your code to handle a token TOKEN_MY_THIRD_TOKEN here
        break;

    default:
        ASSERT(false); // CheckForToken(buffer) should always
                       // return a valid T_TokenID
        break;
    } // switch ( currentToken )
}
file.Close();
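What typically goes into a case branch is code that reads the data following the token from the same file. The sketch below shows that pattern in standard C++ for a single token; it does not use CParser, and the function name CollectValuesAfterToken and the assumption that each value runs to the end of the line are hypothetical choices for illustration.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: scans 'in' byte by byte and, each time 'token' is
// found, reads the rest of that line as the token's value.
std::vector<std::string> CollectValuesAfterToken(std::istream& in,
                                                 const std::string& token)
{
    assert(!token.empty());

    std::vector<std::string> values;
    std::string window;  // the last token.size() bytes seen
    char byte;

    while (in.get(byte))
    {
        window.push_back(byte);
        if (window.size() > token.size())
            window.erase(0, 1);

        if (window == token)
        {
            // The stream now points at the first byte after the token,
            // so the value can be read directly from it.
            std::string value;
            std::getline(in, value);
            if (!value.empty() && value[value.size() - 1] == '\r')
                value.erase(value.size() - 1);  // tolerate CR/LF line ends
            values.push_back(value);
            window.clear();
        }
    }
    return values;
}
```

With the address file described in the introduction, calling this helper with the token "name = " would return one string per entry, in file order.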

Downloads

Both demo projects parse the text file file_to_parse.txt, which also includes some explanation. When a token is found, the corresponding data is read and added to the dialog’s list box.

Download demo project, demonstrating the CheckForToken(…) approach – 13 Kb
Download demo project, demonstrating the callback approach – 13 Kb
Download CParser sources only – 3 Kb
