CSyntaxColorizer: Syntax Highlighting Class

Environment: Visual C++ 6, MFC CRichEditCtrl class

Overview

CSyntaxColorizer is a class for syntax highlighting code in a rich edit control. It is simple to use, fast, and flexible. The default highlighting mode is VC++, with comments in green, strings in dark blue, and keywords in light blue. The class exposes several methods for changing these defaults, and color is not the only option: highlighted words can also be made bold, italic, underlined, and more. The exposed methods take a CHARFORMAT structure as a parameter, so any text formatting that can be expressed with a CHARFORMAT structure can be applied to the keyword, comment, and string formats in CSyntaxColorizer. Nor are you limited to a predefined, fixed set of keyword groupings. Keywords can be grouped in any way you like and then manipulated by group. For example, you could assign all compiler directives the group ID 723 (or any other number) and then set all keywords with group ID 723 to the color red.

The default groupings for the CSyntaxColorizer class are as follows:

Group 0: VC++ keywords such as for, if, while, void, etc.
Group 1: VC++ compiler directives such as #define, #include, etc.
Group 2: VC++ compiler pragmas such as once, auto_inline, etc.
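
The group idea can be illustrated with a small standalone sketch. It is not the class's actual implementation: KeywordEntry, KeywordTable, setGroupColor and makeSampleTable are hypothetical names, and a plain 32-bit integer stands in for the CHARFORMAT that the real class stores per keyword. It only shows how one call can restyle every keyword that shares a group ID.

```cpp
#include <cstdint>
#include <map>
#include <string>

// Hypothetical stand-in for one keyword entry; the real class keeps a CHARFORMAT.
struct KeywordEntry {
    std::uint32_t color;  // 0x00BBGGRR, the same byte layout as a Win32 COLORREF
    int group;            // user-chosen group ID, e.g. 723 for compiler directives
};

using KeywordTable = std::map<std::string, KeywordEntry>;

// Apply a color to every keyword in the given group.
// Returns the number of entries that belong to that group.
int setGroupColor(KeywordTable& table, int group, std::uint32_t color) {
    int matched = 0;
    for (auto& item : table) {
        if (item.second.group == group) {
            item.second.color = color;
            ++matched;
        }
    }
    return matched;
}

// Sample data: two directives in group 723 and one ordinary keyword in group 0.
KeywordTable makeSampleTable() {
    KeywordTable table;
    table["#define"] = KeywordEntry{0x00000000u, 723};
    table["#include"] = KeywordEntry{0x00000000u, 723};
    table["while"] = KeywordEntry{0x00000000u, 0};
    return table;
}
```

Calling setGroupColor(table, 723, 0x000000FF) on the sample table turns both directives red and leaves "while" untouched.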

CSyntaxColorizer maintains three member variables of type CHARFORMAT: m_cfDefault, m_cfComment, and m_cfString. These defaults are used when the class initializes its internal lists, and whenever the keyword, comment, or string colors are changed using the methods that take a COLORREF parameter instead of a CHARFORMAT parameter. These default structures have a font of Courier New, 10pt size. Naturally, the class exposes methods for changing the defaults (see below).

Here are the declarations in CSyntaxColorizer.h for the exposed methods:

void Colorize(long StartChar, 
              long nEndChar, 
              CRichEditCtrl *pCtrl);
void Colorize(CHARRANGE cr, 
              CRichEditCtrl *pCtrl);

void GetCommentStyle(CHARFORMAT &cf) 
{ 
  cf = m_cfComment; 
};

void GetStringStyle(CHARFORMAT &cf) 
{ 
  cf = m_cfString; 
};

void GetGroupStyle(int grp, CHARFORMAT &cf);

void GetDefaultStyle(CHARFORMAT &cf) 
{ 
  cf = m_cfDefault; 
};

void SetCommentStyle(CHARFORMAT cf) 
{ 
  m_cfComment = cf; 
};

void SetCommentColor(COLORREF cr);

void SetStringStyle(CHARFORMAT cf) 
{ 
  m_cfString = cf; 
};

void SetStringColor(COLORREF cr);

void SetGroupStyle(int grp, CHARFORMAT cf);

void SetGroupColor(int grp, COLORREF cr);

void SetDefaultStyle(CHARFORMAT cf) 
{ 
  m_cfDefault = cf; 
};

void AddKeyword(LPCTSTR Keyword, 
                CHARFORMAT cf, 
                int grp = 0); 

void AddKeyword(LPCTSTR Keyword, 
                COLORREF cr, 
                int grp = 0);

void ClearKeywordList();

CString GetKeywordList();

CString GetKeywordList(int grp);

Using CSyntaxColorizer

The simplest and quickest way to use this class is to first declare it, then call one of the overloaded Colorize member functions. For example, this code
CSyntaxColorizer sc;
sc.Colorize(0, -1, &m_cRichEditCtrl);
first creates the object, then calls its Colorize method, which in this case colorizes all of the text in the specified rich edit box, using CSyntaxColorizer's default font, keyword groupings, and colors described above.

If you don't like the default colors, you can change them:

sc.SetCommentColor(RGB(255,0,0));
sc.SetStringColor(RGB(0,255,0));
sc.SetGroupColor(nMyGroup,RGB(0,0,255));
The preceding methods change the color using the CHARFORMAT structures that would be returned by their respective "Get..." methods.

If it's more than just colors you don't like, then you can set the CHARFORMAT structures using

sc.SetCommentStyle(cfMyStyle);
sc.SetStringStyle(cfMyStyle);
sc.SetGroupStyle(nMyGroup,cfMyStyle);
where cfMyStyle is a CHARFORMAT structure that you have created yourself from scratch, or have retrieved using one of the "Get..." methods and then modified to suit.

Adding keywords is easy too. The two AddKeyword methods each take an LPCTSTR as their first parameter. This parameter is a null-terminated string containing a list of words separated by commas. For example,

sc.AddKeyword("for,if,while", RGB(255,0,0), 4);
will add the three keywords to the sc object's list, give them the color red and place them in group 4, using the CHARFORMAT structure currently in m_cfDefault. You can also send a single word as the LPCTSTR parameter. If the keyword already exists in the list, its color and group attributes are overwritten by those passed in the AddKeyword method. The AddKeyword method that takes a CHARFORMAT as a parameter instead of a COLORREF works in a similar fashion.
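
The list handling described above can be sketched in standalone C++. This is an illustration under assumptions, not the code inside CSyntaxColorizer: KeywordInfo, KeywordMap and addKeywords are hypothetical names, std::string replaces LPCTSTR, and an integer in COLORREF layout (0x00BBGGRR) replaces the CHARFORMAT.

```cpp
#include <cstdint>
#include <map>
#include <sstream>
#include <string>

// Hypothetical per-keyword record: a color in COLORREF layout plus a group ID.
struct KeywordInfo {
    std::uint32_t color;
    int group;
};

using KeywordMap = std::map<std::string, KeywordInfo>;

// Split a comma-separated list and store each word.
// A word that is already present has its color and group overwritten.
void addKeywords(KeywordMap& keywords, const std::string& list,
                 std::uint32_t color, int group = 0) {
    std::istringstream stream(list);
    std::string word;
    while (std::getline(stream, word, ',')) {
        if (!word.empty()) {  // skip empty items such as those produced by ",,"
            keywords[word] = KeywordInfo{color, group};
        }
    }
}
```

With this sketch, addKeywords(map, "for,if,while", 0x000000FF, 4) stores three red keywords in group 4, and a later addKeywords(map, "if", 0x0000FF00, 9) replaces only the entry for "if".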

A word about comments...

By default, CSyntaxColorizer handles C++ and Java multiline comments starting with /* and single line comments starting with //. If you want single line comments as in VB (starting with ' or REM), simply add "REM" as one of the keywords. For example:
sc.AddKeyword("REM",cfMyStyle,nMyGroup);
CSyntaxColorizer will ignore any style or color settings you specify for the REM keyword, and instead will set them to whatever attributes you have set for the comments. CSyntaxColorizer will automatically treat the ' as the start of a single line comment once "REM" is added to the keyword list. Note: keyword matching is case-sensitive, so if you add only "REM", then "rem", "Rem", etc. will not be recognized; you will have to add these as well.

Where a comment starts with // and ends with '\' + '\n' (the line continuation character immediately followed by a newline character) CSyntaxColorizer will recognize it, and treat the next line as a comment line.
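
The continuation rule can be sketched as a small scanning function. This is a minimal standalone illustration, not the class's table-driven parser: findLineCommentEnd is a hypothetical helper, and it assumes lines end with '\n' only.

```cpp
#include <cstddef>
#include <string>

// Given the index where a "//" comment starts, return the index of the
// newline that really ends it, or text.size() if the text ends first.
// A newline immediately preceded by a backslash does not end the comment.
std::size_t findLineCommentEnd(const std::string& text, std::size_t start) {
    std::size_t i = start;
    while (i < text.size()) {
        if (text[i] == '\n') {
            if (i > start && text[i - 1] == '\\') {
                ++i;  // line continuation: the next line is still comment text
                continue;
            }
            break;    // ordinary newline: the comment ends here
        }
        ++i;
    }
    return i;
}
```

For the text "// a \" followed by a newline and "still comment", the function skips the first newline and stops at the one after "still comment", so both lines would receive the comment format.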

Speed

CSyntaxColorizer is pretty fast. Small files under 20K or so are colorized practically instantaneously on my 466MHz machine. Larger files, over 100K, take four or five seconds. CSyntaxColorizer can locate and identify words to be colorized fairly quickly; the big time hog is not the algorithm itself but the text formatting functions of CRichEditCtrl. If you comment out the two lines (lines 494 and 495)
pCtrl->SetSel(iStart,iOffset + x);
pCtrl->SetSelectionCharFormat(pskTemp->cf);
then a 100K file takes less than a second, but only comments and strings are colorized.

About the Demo

The demo project is a quick and dirty dialog box with a rich edit control. It has a rather crude editing capability, the minimum required to show off some of CSyntaxColorizer's abilities. In particular, if you type in /* to start off a multiline comment, only the one line with the cursor on it will be reformatted. In this case, press the "Format" button. As well, you can load files, but you can't save them.

Downloads

Download source - 5 Kb
Download demo project - 55 Kb


Comments

  • good job

    Posted by wshcdr on 12/23/2009 10:54am

    well done

  • Updating syntax as text is entered.

    Posted by ahoodin on 06/17/2004 12:06pm

    Here is a project where I added functionality to update the text as you type. It is still a work in progress. Please feel free to comment. If you have an update, you can email me ahoodin@boardermail.com. http://www.codeguru.com/forum/showthread.php?s=f37f51f59cae3bcba7dadf91e159036b&threadid=298851&highlight=richedit

  • Stops working if RichEdit20 is used

    Posted by Legacy on 11/03/2003 12:00am

    Originally posted by: Mehak Lala

    I am trying to use RichEdit20, and once I use it everything turns into the comment color, as my starting line is a comment character. Why should RichEdit20 stop the parsing? Do I have to do some additional setting?

    Any inputs greatly appreciated

    Mehak

    • How to solve!!

      Posted by MrTommek on 03/27/2004 07:48pm

      Sorry, you have to change from *(m_pTableTwo + '\'') = SQEND; *(m_pTableThree + '\n') = SLEND; to *(m_pTableTwo + '\'') = SQEND; *(m_pTableThree + '\r') = SLEND;

      Tommek

    • How to solve!!

      Posted by MrTommek on 03/27/2004 07:43pm

      For all people who have/had the same problem, here is a solution: Change in the CSyntaxColorizer-class the line 98 from *(m_pTableTwo + '\'') = SQEND; *(m_pTableThree + '\r') = SLEND; to *(m_pTableTwo + '\'') = SQEND; *(m_pTableThree + '\n') = SLEND; !! Now the Colorizer-class will work for RichEdit20. Greez Tommek

  • Paste operation

    Posted by Legacy on 10/02/2003 12:00am

    Originally posted by: Tomasz Kulig

    There is problem with pasting text from html page to this control. (or another RTF text).

    1. Mark colorized line on html page.
    2. Paste it to CSyntaxColorizer. At this moment all attributes of the text (color, fonts, etc.) are similar to the html document.
    3. Move the cursor to the middle of the line and press <Enter>.
    Now the second part of the line is formatted, but the height of the line is improper.

    There is also a problem when you put an html table into this control.

    And my question: maybe one of you can help me. How do I check and translate the clipboard content when it differs from simple text?

  • Where to obtain TOM

    Posted by Legacy on 07/15/2003 12:00am

    Originally posted by: Jason Shelley

    Can anyone tell me where I can obtain the "tom.h" file and any other prerequisites for using TOM, please? I have VC++ 6.0 and Office 2000, but no TOM :-(

  • How to use it with CFontDialog class instead of CDialog

    Posted by Legacy on 05/06/2003 12:00am

    Originally posted by: Parinda Rivonkar

    It doesn't work with CFontDialog class.

  • Create the above as an ActiveX control

    Posted by Legacy on 04/08/2003 12:00am

    Originally posted by: Pritpal Singh Mudher

    The work I have seen here is excellent. Is it possible to see this as an ActiveX control?

  • Little bug with printf(" I said: "\bla bla\" \n");

    Posted by Legacy on 08/27/2002 12:00am

    Originally posted by: Dennis

    this is colorized wrong:
    printf(" I said: "\bla bla\" \n");
    The \" going wrong.
    I don't know how to fix it but I'm sure you can :)

    Greetings Dennis.

  • How to add line numbers on the left side of the document frame?

    Posted by Legacy on 07/13/2002 12:00am

    Originally posted by: Ahmad Sebastian

    Hi, I'm currently working on a compiler project. I created the IDE from a CRichEdit-derived class, and I want my editor to show line numbers on the left side of the document, in the same frame. The line numbers are not part of the document; they are only a guide.

    The illustration as follows (assume it's a window pic):

    ---------------------------
    | myprog.txt X |
    ---------------------------
    | 0001 | #include "ha.h" ||
    | 0002 | #define blabla ||
    | 0003 | class X ||
    | | ||
    | | ||
    ---------------------------
    ---------------------------

    Could anyone show me how to do that ?

  • Weird behaviors for file over 64K

    Posted by Legacy on 06/12/2002 12:00am

    Originally posted by: L2L

    Has anyone experienced any weird behavior with files larger than 64K? For instance, I can't insert new text from the keyboard after I have opened a large file of 180K. I can delete one char and type in another one, but not insert a new char. Strange indeed. Can someone suggest a workaround or point out anything I overlooked? Thanks.
