String Tokenizer

When you are writing a lexical analyzer, it helps to have a class like the StreamTokenizer
class from Sun's Java. I have written something similar: the CStringTokenizer class. It is
used in the same way as Java's StreamTokenizer, but it has a few additional features and
the function names are slightly different.

The CStringTokenizer class is contained in the files StringTokenizer.h and


The interface of the class:

class CStringTokenizer : public CObject
{
public:
    // Constructor: takes the string to tokenize and initializes the
    // tokenizer with the default settings (see implementation)
    CStringTokenizer(CString& string);
    virtual ~CStringTokenizer(); // Destructor

    // Private stuff for internal use (see the sample code)

    double GetNumValue(); // returns the numeric value of the last returned token
    void PascalComments(BOOL bFlag); // enable / disable Pascal comments
    CString GetStrValue(); // returns the string value of the last token

    void QuoteChar(int ch); // specifies that this char is used as a quote character
    int LineNo(); // returns the current line number
    void PushBack(); // pushes back a token (cannot be used twice in a row)
    int NextToken(); // parses the next token; returns a TT_ constant or a char value
    void LowerCaseMode(BOOL bFlag); // enable / disable lower case mode
    void SlSlComments(BOOL bFlag); // enable / disable "//" comments
    void SlStComments(BOOL bFlag); // enable / disable "/*" comments
    void EolIsSignificant(BOOL bFlag); // if set to TRUE, EOL is returned by NextToken as a token
    void ParseNumbers(); // enables number parsing (integer / double in normal format)
    void ResetSyntax(); // resets the syntax
    void WordChars(int cLow, int cHi); // the characters in the range are word characters
    void WhiteSpaceChars(int cLow, int cHi); // the characters in the range are white space characters
    void OrdinaryChars(int cLow, int cHi); // the characters in the range are ordinary characters
    void OrdinaryChar(int ch); // the character is an ordinary character
    void CommentChar(int ch); // specifies the comment char
};
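
To make the effect of WordChars, WhiteSpaceChars and OrdinaryChars more concrete, here is a minimal sketch of a table-driven tokenizer in standard C++. It is not the CStringTokenizer source: MiniTokenizer, CharClass, Next and Tokenize are hypothetical names made up for this illustration, and the default settings (letters are word characters, everything up to the space character is white space) are an assumption borrowed from the way Java's StreamTokenizer behaves.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for illustration only; not the CStringTokenizer code.
enum CharClass { CC_ORDINARY, CC_WORD, CC_WHITESPACE };

class MiniTokenizer {
public:
    explicit MiniTokenizer(const std::string& text) : m_text(text), m_pos(0) {
        // Assumed defaults: letters are word chars, 0..' ' is white space,
        // everything else is an ordinary (single character) token.
        m_table.fill(CC_ORDINARY);
        SetRange('a', 'z', CC_WORD);
        SetRange('A', 'Z', CC_WORD);
        SetRange(0, ' ', CC_WHITESPACE);
    }

    void WordChars(int lo, int hi)       { SetRange(lo, hi, CC_WORD); }
    void WhiteSpaceChars(int lo, int hi) { SetRange(lo, hi, CC_WHITESPACE); }
    void OrdinaryChars(int lo, int hi)   { SetRange(lo, hi, CC_ORDINARY); }

    // Stores the next token in 'token'; returns false at the end of the input.
    bool Next(std::string& token) {
        // Skip white space.
        while (m_pos < m_text.size() && ClassOf(m_text[m_pos]) == CC_WHITESPACE)
            ++m_pos;
        if (m_pos >= m_text.size())
            return false;

        std::size_t start = m_pos;
        if (ClassOf(m_text[m_pos]) == CC_WORD) {
            // A word is the longest run of word characters.
            while (m_pos < m_text.size() && ClassOf(m_text[m_pos]) == CC_WORD)
                ++m_pos;
        } else {
            ++m_pos; // an ordinary character is a token by itself
        }
        token = m_text.substr(start, m_pos - start);
        return true;
    }

private:
    CharClass ClassOf(char c) const {
        return m_table[static_cast<unsigned char>(c)];
    }

    void SetRange(int lo, int hi, CharClass cc) {
        assert(0 <= lo && lo <= hi && hi < 256);
        for (int c = lo; c <= hi; ++c)
            m_table[c] = cc;
    }

    std::string m_text;
    std::size_t m_pos;
    std::array<CharClass, 256> m_table;
};

// Collects all remaining tokens of 'tok' into a vector.
std::vector<std::string> Tokenize(MiniTokenizer& tok) {
    std::vector<std::string> out;
    std::string t;
    while (tok.Next(t))
        out.push_back(t);
    return out;
}
```

With these assumed defaults the input "ab_c + d" is split into ab, _, c, + and d. After calling WordChars('_', '_') on a fresh tokenizer, the same input is split into ab_c, + and d, because the underscore now extends a word instead of being returned as an ordinary character.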

How to use the CStringTokenizer class:

You must include the header in your file:

#include "StringTokenizer.h"

Sample code for using the string tokenizer class:

CString str;

// sample string (the inner quotes are escaped)
str = _T("cwsddde1231+-\"asdfgasd\"-{dfsdf}iwreu/*dsfghsdgf*/fgdfg//wejfshg");
str += TT_EOF; // add EOF to the string end

CStringTokenizer strtok(str); // String Tokenizer class

int val;
while ((val = strtok.NextToken()) != TT_EOF) // parse the string
{
    // display token code and str value
    CString msg;
    msg.Format(_T("%d %s"), val, (LPCTSTR)strtok.GetStrValue());
    AfxMessageBox(msg); // one way to show it; any other output works as well
}

This class is written to be usable in many kinds of lexical analyzers; you can derive
your own lexical analyzer class from the CStringTokenizer class.
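
As an illustration of that idea, the sketch below derives a small keyword-aware lexer from a tokenizer base class. Because the MFC class cannot be reproduced here, BaseTokenizer is a hypothetical, much simplified stand-in for CStringTokenizer written in standard C++. The value of TT_EOF, the TT_WORD constant, the keyword codes TK_BEGIN and TK_END, and the NextSymbol function are all assumptions made up for this example; only NextToken, GetStrValue and the name TT_EOF come from the article.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Assumed token codes. Only the name TT_EOF appears in the article;
// the values and TT_WORD follow Java's StreamTokenizer as a guess.
const int TT_EOF  = -1;
const int TT_WORD = -3;

// Hypothetical keyword codes for the derived lexer.
const int TK_BEGIN = 1000;
const int TK_END   = 1001;

// Simplified stand-in for CStringTokenizer (illustration only).
class BaseTokenizer {
public:
    explicit BaseTokenizer(const std::string& text) : m_text(text), m_pos(0) {}
    virtual ~BaseTokenizer() {}

    // Returns TT_WORD, TT_EOF, or the character code of an ordinary char.
    int NextToken() {
        assert(m_pos <= m_text.size());
        while (m_pos < m_text.size() &&
               std::isspace(static_cast<unsigned char>(m_text[m_pos])))
            ++m_pos;
        if (m_pos >= m_text.size()) {
            m_str.clear();
            return TT_EOF;
        }
        unsigned char c = static_cast<unsigned char>(m_text[m_pos]);
        if (std::isalpha(c)) {
            std::size_t start = m_pos;
            while (m_pos < m_text.size() &&
                   std::isalnum(static_cast<unsigned char>(m_text[m_pos])))
                ++m_pos;
            m_str = m_text.substr(start, m_pos - start);
            return TT_WORD;
        }
        m_str.assign(1, m_text[m_pos++]);
        return c;
    }

    std::string GetStrValue() const { return m_str; }

private:
    std::string m_text;
    std::size_t m_pos;
    std::string m_str;
};

// The derived lexer reuses NextToken and only adds keyword recognition.
class PascalLexer : public BaseTokenizer {
public:
    explicit PascalLexer(const std::string& text) : BaseTokenizer(text) {}

    // Like NextToken, but maps the words "begin" and "end" to keyword codes.
    int NextSymbol() {
        int tok = NextToken();
        if (tok == TT_WORD) {
            const std::string s = GetStrValue();
            if (s == "begin") return TK_BEGIN;
            if (s == "end")   return TK_END;
        }
        return tok;
    }
};
```

The derived class adds a new function instead of overriding NextToken, because the interface above does not declare NextToken as virtual. For the input "begin x ; end", NextSymbol returns TK_BEGIN, TT_WORD (with GetStrValue giving x), the character code of the semicolon, TK_END, and finally TT_EOF.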


Bug corrections:

1. String memory allocation error corrected

2. Pascal comments bug corrected

Sample project:

The sample project shows how you can use the String Tokenizer class and how you can
adjust it to your needs. The project also performs some pseudo-Pascal syntactic and
semantic analysis. The String Tokenizer should now be bug-free, but the Pascal lexical,
syntactic and semantic analyzers probably still have bugs (I know of two).

Download demo project – 46 KB
