CSyntaxColorizer: Syntax Highlighting Class

Environment: Visual C++ 6, MFC CRichEditCtrl class

Overview

CSyntaxColorizer is a fast, flexible class for syntax highlighting of code in a rich edit control, and it is simple to use. The default highlighting mode is VC++, with comments in green, strings in dark blue, and keywords in light blue. The class exposes several methods for changing these defaults, and color is not the only option: highlighted words can also be made bold, italic, underlined, and more. The style methods take a CHARFORMAT structure as a parameter, so any text formatting that can be expressed with a CHARFORMAT structure can be applied to the keyword, comment, and string formats in CSyntaxColorizer. You are also not limited to a predefined, fixed set of keyword groupings. Keywords can be grouped in any way you like and then manipulated by group. For example, you could assign all compiler directives the group ID 723 (or any other number) and then set all keywords with group ID 723 to red.

The default groupings for the CSyntaxColorizer class are as follows:

Group 0: VC++ keywords such as for, if, while, and void
Group 1: VC++ compiler directives such as #define and #include
Group 2: VC++ compiler pragmas such as once and auto_inline
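
The idea of manipulating keywords by group can be pictured with a small portable sketch. It is not the class's actual implementation: KeywordEntry, KeywordTable, MakeRgb, and RecolorGroup are hypothetical names, and a plain integer stands in for the COLORREF that the real class keeps inside a CHARFORMAT.

```cpp
#include <cstdint>
#include <map>
#include <string>

// Hypothetical stand-in for COLORREF (layout 0x00BBGGRR).
using Color = std::uint32_t;

constexpr Color MakeRgb(unsigned r, unsigned g, unsigned b) {
    return static_cast<Color>(r | (g << 8) | (b << 16));
}

// Hypothetical record: each keyword remembers its group ID and its color.
struct KeywordEntry {
    int   group;  // caller-chosen group ID, e.g. 723
    Color color;  // color currently assigned to the keyword
};

using KeywordTable = std::map<std::string, KeywordEntry>;

// Assign `color` to every keyword in `group`; returns how many keywords matched.
int RecolorGroup(KeywordTable &table, int group, Color color) {
    int matched = 0;
    for (auto &item : table) {
        if (item.second.group == group) {
            item.second.color = color;
            ++matched;
        }
    }
    return matched;
}
```

With a table holding for in group 0 and #define and #include in group 723, calling RecolorGroup(table, 723, MakeRgb(255, 0, 0)) turns only the two directives red and leaves for untouched.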

CSyntaxColorizer maintains three member variables of type CHARFORMAT: m_cfDefault, m_cfComment, and m_cfString. These defaults are used when the class initializes its internal lists, and whenever the keyword, comment, or string colors are changed using the methods that take a COLORREF parameter instead of a CHARFORMAT parameter. By default, these structures specify the Courier New font at 10 pt. Naturally, the class exposes methods for changing the defaults (see below).

Here are the declarations in CSyntaxColorizer.h for the exposed methods:

void Colorize(long StartChar,
              long nEndChar,
              CRichEditCtrl *pCtrl);
void Colorize(CHARRANGE cr,
              CRichEditCtrl *pCtrl);

void GetCommentStyle(CHARFORMAT &cf)
{
  cf = m_cfComment;
};

void GetStringStyle(CHARFORMAT &cf)
{
  cf = m_cfString;
};

void GetGroupStyle(int grp, CHARFORMAT &cf);

void GetDefaultStyle(CHARFORMAT &cf)
{
  cf = m_cfDefault;
};

void SetCommentStyle(CHARFORMAT cf)
{
  m_cfComment = cf;
};

void SetCommentColor(COLORREF cr);

void SetStringStyle(CHARFORMAT cf)
{
  m_cfString = cf;
};

void SetStringColor(COLORREF cr);

void SetGroupStyle(int grp, CHARFORMAT cf);

void SetGroupColor(int grp, COLORREF cr);

void SetDefaultStyle(CHARFORMAT cf)
{
  m_cfDefault = cf;
};

void AddKeyword(LPCTSTR Keyword,
                CHARFORMAT cf,
                int grp = 0);

void AddKeyword(LPCTSTR Keyword,
                COLORREF cr,
                int grp = 0);

void ClearKeywordList();

CString GetKeywordList();

CString GetKeywordList(int grp);

Using CSyntaxColorizer

The simplest and quickest way to use this class is to declare an instance and then call one of the overloaded Colorize member functions. For example, this code

CSyntaxColorizer sc;
sc.Colorize(0, -1, &m_cRichEditCtrl);

first creates the object, then calls its Colorize method. In this case the call colorizes all of the text in the specified rich edit control, using CSyntaxColorizer’s default font, keyword groupings, and colors described above.

If you don’t like the default colors, you can change them:

sc.SetCommentColor(RGB(255,0,0));
sc.SetStringColor(RGB(0,255,0));
sc.SetGroupColor(nMyGroup,RGB(0,0,255));

The preceding methods change the color using the CHARFORMAT structures that would be returned by their respective “Get…” methods.

If it’s more than just colors you don’t like, then you can set the CHARFORMAT structures using

sc.SetCommentStyle(cfMyStyle);
sc.SetStringStyle(cfMyStyle);
sc.SetGroupStyle(nMyGroup,cfMyStyle);

where cfMyStyle is a CHARFORMAT structure that you have created yourself from scratch, or have retrieved using one of the “Get…” methods and then modified to suit.

Adding keywords is easy too. Each of the two AddKeyword methods takes an LPCTSTR as its first parameter: a null-terminated string containing one or more words separated by commas. For example,

sc.AddKeyword("for,if,while", RGB(255,0,0), 4);

will add the three keywords to the sc object’s list, color them red, and place them in group 4; the rest of their formatting comes from the CHARFORMAT structure currently in m_cfDefault. You can also pass a single word as the LPCTSTR parameter. If a keyword already exists in the list, its color and group attributes are overwritten by those passed to AddKeyword. The AddKeyword method that takes a CHARFORMAT instead of a COLORREF works in a similar fashion.
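
The behavior just described (split the list at commas, then add or overwrite each word) can be sketched in portable C++. This is only an illustration under stated assumptions, not the code inside CSyntaxColorizer: KeywordInfo and AddKeywords are hypothetical names, and an unsigned long stands in for COLORREF.

```cpp
#include <map>
#include <sstream>
#include <string>

// Hypothetical record kept for each keyword.
struct KeywordInfo {
    unsigned long color;  // stand-in for COLORREF
    int group;            // caller-chosen group ID
};

// Hypothetical helper: add every word of a comma-separated list to the table.
// A word that is already present has its color and group overwritten.
void AddKeywords(std::map<std::string, KeywordInfo> &table,
                 const std::string &list, unsigned long color, int group) {
    std::istringstream in(list);
    std::string word;
    while (std::getline(in, word, ',')) {
        if (!word.empty()) {  // skip empty items such as those in "a,,b"
            table[word] = KeywordInfo{color, group};
        }
    }
}
```

Calling AddKeywords(table, "for,if,while", red, 4) and then AddKeywords(table, "if", green, 7) leaves three entries in the table, with if now green and in group 7. A single word without commas is handled by the same loop.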

A word about comments…

By default, CSyntaxColorizer handles C++ and Java multiline comments starting with /* and single line comments starting with //. If you want single line comments as in VB (starting with ' or REM), simply add “REM” as one of the keywords. For example:

sc.AddKeyword("REM",cfMyStyle,nMyGroup);

CSyntaxColorizer will ignore any style or color settings you specify for the REM keyword, and instead will apply whatever attributes you have set for comments. Once “REM” is in the keyword list, CSyntaxColorizer also treats ' as the start of a single line comment. Note: if you add only “REM”, then “rem”, “Rem”, etc. will not be recognized; you will have to add these spellings as well.
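
Because keyword matching is case-sensitive, one way to cover every spelling is to generate the list mechanically. The helper below is a hypothetical convenience, not part of CSyntaxColorizer; it assumes a short word (fewer than 32 characters) made of plain ASCII letters.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: returns every upper/lower-case spelling of `word`,
// joined by commas, in the list format that AddKeyword expects.
std::string AllCaseVariants(const std::string &word) {
    std::string result;
    const std::size_t n = word.size();
    // Bit i of `mask` decides whether letter i is upper case (1) or lower case (0).
    for (unsigned long mask = 0; mask < (1ul << n); ++mask) {
        std::string variant = word;
        for (std::size_t i = 0; i < n; ++i) {
            const unsigned char ch = static_cast<unsigned char>(word[i]);
            variant[i] = static_cast<char>(
                ((mask >> i) & 1ul) ? std::toupper(ch) : std::tolower(ch));
        }
        if (!result.empty()) {
            result += ',';
        }
        result += variant;
    }
    return result;
}
```

For “REM” this yields the eight spellings rem, Rem, rEm, REm, reM, ReM, rEM, and REM. In a non-Unicode build, where LPCTSTR is a const char pointer, the result could be passed along as AllCaseVariants("REM").c_str().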

Where a line of a // comment ends with '\' + '\n' (the line continuation character immediately followed by a newline character), CSyntaxColorizer recognizes the continuation and treats the next line as part of the comment.
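
The continuation rule can be illustrated with a short scanner. This is a sketch of the rule as stated above, not the class's own parsing code; FindLineCommentEnd is a hypothetical name, and the text is assumed to use '\n' alone as the line terminator.

```cpp
#include <cstddef>
#include <string>

// Given the index of the "//" that opens a comment, return the index one past
// the comment's last character. The newline that finally ends the comment is
// not included; newlines preceded by a backslash are part of the comment.
std::size_t FindLineCommentEnd(const std::string &text, std::size_t start) {
    std::size_t i = start;
    while (i < text.size()) {
        if (text[i] == '\n') {
            // A backslash immediately before the newline continues the comment.
            if (i > start && text[i - 1] == '\\') {
                ++i;
                continue;
            }
            break;
        }
        ++i;
    }
    return i;
}
```

For a comment whose first line ends in a backslash, the returned index lies at the end of the following line, so both lines fall inside the range that receives the comment format.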

Speed

CSyntaxColorizer is pretty fast. Small files under 20K or so are colorized practically instantaneously on my 466MHz machine. Larger files, over 100K, take four or five seconds. CSyntaxColorizer locates and identifies the words to be colorized fairly quickly; the big time hog is not the algorithm itself but the text formatting functions of CRichEditCtrl. If you comment out these two lines (lines 494 and 495)

pCtrl->SetSel(iStart,iOffset + x);
pCtrl->SetSelectionCharFormat(pskTemp->cf);

then a 100K file takes less than a second, but only comments and strings are colorized.

About the Demo

The demo project is a quick and dirty dialog box with a rich edit control. It has a rather crude editing capability, the minimum required to show off some of CSyntaxColorizer’s abilities. In particular, if you type /* to start a multiline comment, only the line containing the cursor is reformatted; to recolor the rest of the text, press the “Format” button. You can also load files, but you can’t save them.

Downloads

Download source – 5 Kb
Download demo project – 55 Kb
