Environment: Visual Basic and Visual C++ forums on CodeGuru
Introduction
I wrote a small utility that allows you to do some basic syntax highlighting when pasting C++ or Visual Basic code into vBulletin groups, instead of using the ugly PHP codes.
Examples
Plain code:
    int f(float d)
    // Checks whether the number is positive or negative
    {
        if (d > 0) {
            printf("positive\n");
            return 1;
        } else {
            printf("negative or zero\n");
            return -1;
        }
    }
The listing looks different with PHP syntax highlighting. Here is the PHP code:
    int f(float d)
    // Checks whether the number is positive or negative
    {
        if (d > 0) {
            printf("positive\n");
            return 1;
        } else {
            printf("negative or zero\n");
            return -1;
        }
    }
And here’s how it looks with my small utility:
    int f(float d)
    // Checks whether the number is positive or negative
    {
        if (d > 0) {
            printf("positive\n");
            return 1;
        } else {
            printf("negative or zero\n");
            return -1;
        }
    }
User Interface
The usage is pretty simple: You select the text you want to convert, copy it to the Clipboard (for example, with Ctrl+C), and left-click on the taskbar icon for the program. Then, paste it into CodeGuru. Right-clicking brings up a few buttons that let you choose the current parser or exit the program. For best results, it is recommended that your taskbar be at the bottom of the screen.
Configuration Files
When you first run the program, it generates five configuration files. The main one, SynHlt.cfg, contains the names of the parsers and the file names of their respective configuration files. The defaults include parsers for C++ and VB, each for either vBulletin or HTML output.
The configuration files for the parsers are not complicated to change. Here is a simple example:
#Keywords
int float void char
#Rules
// \n [COLOR=green] [/COLOR] 1
/* */ [COLOR=green] [/COLOR] 3
" " [COLOR=red] [/COLOR] 3 \
#Symbols
\r \e
\t \w\w
#KeywordColorBegin
[COLOR=blue]
#KeywordColorEnd
[/COLOR]
Special Characters in the Input File
To specify special characters in the input file, the following tags are used:
Tag | Name | Character Number or Explanation |
\n | Newline | 10 |
\r | Carriage return | 13 |
\t | Tab | 9 |
\e | Empty character | This ends a string if it appears in a string, or maps an input sequence to nothing. In the example, the carriage return is mapped to nothing. |
\w | White space | 32 |
\# | # | This is used because comments and section starts in the input file begin with #. |
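To make the table concrete, here is a minimal sketch of how such tags could be decoded into raw characters. The function name DecodeTags is hypothetical and this is not the code from Highlighter.h. It also assumes that \e simply ends the decoded string and that an unknown tag or a lone trailing backslash is kept verbatim.

```cpp
#include <string>

// Hypothetical helper (not taken from the article's source): decodes the
// escape tags of the configuration file into raw characters.
std::string DecodeTags(const std::string& token)
{
    std::string out;
    for (std::size_t i = 0; i < token.size(); ++i) {
        // Ordinary character, or a lone backslash at the very end.
        if (token[i] != '\\' || i + 1 == token.size()) {
            out += token[i];
            continue;
        }
        switch (token[++i]) {
        case 'n': out += '\n'; break;   // newline, 10
        case 'r': out += '\r'; break;   // carriage return, 13
        case 't': out += '\t'; break;   // tab, 9
        case 'w': out += ' ';  break;   // white space, 32
        case '#': out += '#';  break;   // literal '#'
        case 'e': return out;           // assumption: \e ends the string here
        default:                        // assumption: unknown tags stay as-is
            out += '\\';
            out += token[i];
            break;
        }
    }
    return out;
}
```

With this sketch, the symbol line "\t \w\w" from the example would decode its second token to two spaces, and "\e" would decode to an empty string.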
Rules
Each line in this section defines one rule. The first string in the line is the start string for the rule; the second one is the end string. Then come the start code tag and the end code tag. The number that follows is a combination of 1 (include the start string inside the code tag) and 2 (include the end string inside the code tag), so it ranges from 0 to 3; the “/* */” rule in the example uses 3 to include both. For the “//” comments, it’s nicer not to include the newline inside the code tag, so that rule uses 1.
An optional final string can be used for escaping the end string. For example, if we find a “\” inside a string literal, we skip the next character, so the sequence \” does not end the string. One limitation is that each rule must have a unique start string. If two rules share a start string, the first one simply takes precedence.
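The following sketch shows one way a single rule could be applied to a piece of text. The Rule struct and ApplyRule function are hypothetical stand-ins, not the CRule class from Highlighter.h, and the handling of unterminated spans (the tag is closed at the end of the input) is an assumption.

```cpp
#include <string>

// Hypothetical stand-in for CRule; the field names are assumptions.
struct Rule {
    std::string start, end;         // delimiters, e.g. "//" and "\n"
    std::string openTag, closeTag;  // e.g. "[COLOR=green]" and "[/COLOR]"
    int include;                    // 1: start inside the tag, 2: end inside
    std::string escape;             // optional escape string; empty if unused
};

// Applies `rule` at `pos`, where rule.start is assumed to match already.
// Appends the tagged span to `out` and returns the position after the span.
std::size_t ApplyRule(const Rule& rule, const std::string& text,
                      std::size_t pos, std::string& out)
{
    std::size_t i = pos + rule.start.size();
    std::string body;
    bool closed = false;
    while (i < text.size()) {
        if (!rule.escape.empty() &&
            text.compare(i, rule.escape.size(), rule.escape) == 0) {
            // Copy the escape string and skip the character after it.
            body.append(text, i, rule.escape.size() + 1);
            i += rule.escape.size() + 1;
            continue;
        }
        if (text.compare(i, rule.end.size(), rule.end) == 0) {
            closed = true;
            break;
        }
        body += text[i++];
    }
    if (i > text.size()) i = text.size();  // escape at the very end of input

    const bool startInside = (rule.include & 1) != 0;
    const bool endInside   = (rule.include & 2) != 0;
    if (!startInside) out += rule.start;
    out += rule.openTag;
    if (startInside) out += rule.start;
    out += body;
    if (closed && endInside) out += rule.end;
    out += rule.closeTag;
    if (closed && !endInside) out += rule.end;
    return closed ? i + rule.end.size() : i;
}
```

For the “//” rule with the value 1, the comment text ends up inside the color tag while the newline stays outside it, which matches the behavior described above.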
Symbols
This section defines how special symbols are translated. In the example, a tab is converted to two spaces and a carriage return is mapped to nothing. Each mapping takes exactly one input character, though its replacement can be several characters long.
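A one-character-at-a-time translation like this can be sketched with a map from a character to its replacement. The function name TranslateSymbols is hypothetical, and the sketch uses std::string values for simplicity where the actual program stores char * values.

```cpp
#include <map>
#include <string>

// Hypothetical helper: replaces each input character that has an entry in
// `symbols` with its replacement string and copies all other characters.
std::string TranslateSymbols(const std::string& in,
                             const std::map<char, std::string>& symbols)
{
    std::string out;
    for (char c : in) {
        auto it = symbols.find(c);
        if (it != symbols.end())
            out += it->second;   // may be empty, e.g. for '\r'
        else
            out += c;
    }
    return out;
}
```

Because the lookup key is a single char, a sequence such as "\r\n" cannot be mapped as a unit in this design, which is the limitation mentioned above.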
Implementation Details
Win32-based program
The program is written entirely in straight Win32. I did this as an experiment and I’m pleased with the result. Unfortunately, it makes the source a bit harder to understand for people who are used to MFC. The file SyntaxHlt.cpp contains the Win32-specific code. It creates the main window and does the message handling for the main window and the About dialog. It also contains a few global variables, the most interesting of which is CSynHltButtons *g_Buttons. This class holds both the buttons and the parsers.
The Parser
The parser is implemented in Highlighter.h in the CSimpleParser class. It holds a list of rules (CRule), a list of keywords (CFSM), and a map for the special symbols (std::map<char, char *>). I actually use quite a few of the Standard Template Library (STL) containers to make life a little easier. The class CRule stores all the information about a rule (as described in the paragraph about the configuration files) and lets me check easily if the beginning of a certain string matches the beginning or end string of a rule. The class CFSM holds the keywords and is actually a simple tree where each node can have n children. Its main use is to let me check easily whether the beginning of a string is actually a keyword.
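To illustrate the idea behind CFSM, here is a minimal sketch of a keyword tree in which each node can have any number of children. The class name KeywordTree and its members are hypothetical and the real CFSM interface may differ. Note that the sketch only answers whether the text at a position starts with a keyword; deciding whether that keyword is a whole word is left to the caller.

```cpp
#include <cstddef>
#include <map>
#include <memory>
#include <string>

// Hypothetical sketch of a keyword tree; not the CFSM class itself.
class KeywordTree {
public:
    // Inserts a keyword, creating one node per character as needed.
    void Add(const std::string& word)
    {
        Node* node = &root_;
        for (char c : word) {
            std::unique_ptr<Node>& child = node->children[c];
            if (!child) child.reset(new Node());
            node = child.get();
        }
        node->isKeyword = true;
    }

    // Returns the length of the longest keyword that the text starts with
    // at position `pos`, or 0 if no keyword matches there.
    std::size_t MatchLength(const std::string& text, std::size_t pos) const
    {
        const Node* node = &root_;
        std::size_t best = 0;
        for (std::size_t i = pos; i < text.size(); ++i) {
            auto it = node->children.find(text[i]);
            if (it == node->children.end()) break;
            node = it->second.get();
            if (node->isKeyword) best = i - pos + 1;
        }
        return best;
    }

private:
    struct Node {
        bool isKeyword = false;
        std::map<char, std::unique_ptr<Node>> children;
    };
    Node root_;
};
```

The lookup walks the tree one input character at a time, so its cost depends on the length of the matched prefix and not on the number of keywords.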
The CSimpleParser::ParseString function is where the actual work is done. For simplicity, it uses a std::string for output. Only one CRule can be active at a time; this simplifies the design quite a bit. The function first checks whether a rule is active. If so, it only translates special symbols and checks whether the rule ends. If no rule is active, it checks whether a new rule begins, translates special symbols, and finally, if no rule has begun, it parses for keywords.
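The control flow described above can be sketched as follows. This is a simplified illustration and not the actual ParseString code: the names SketchRule and SketchParser are hypothetical, the include flag and escape strings are left out (both delimiters always go inside the tag), and keywords are looked up as whole words in a std::set where the real parser uses the CFSM tree.

```cpp
#include <cctype>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical, simplified rule: both delimiters are placed inside the tag.
struct SketchRule { std::string start, end, openTag, closeTag; };

struct SketchParser {
    std::vector<SketchRule> rules;
    std::map<char, std::string> symbols;
    std::set<std::string> keywords;
    std::string keywordOpen, keywordClose;

    static bool IsWordChar(char c)
    {
        return std::isalnum(static_cast<unsigned char>(c)) != 0 || c == '_';
    }

    std::string Parse(const std::string& in) const
    {
        std::string out;
        const SketchRule* active = nullptr;   // at most one rule is active
        std::size_t i = 0;
        while (i < in.size()) {
            if (active) {
                // Inside a rule: only look for its end string.
                if (in.compare(i, active->end.size(), active->end) == 0) {
                    out += active->end + active->closeTag;
                    i += active->end.size();
                    active = nullptr;
                    continue;
                }
            } else {
                // Outside a rule: the first rule whose start matches wins.
                bool started = false;
                for (const SketchRule& r : rules) {
                    if (in.compare(i, r.start.size(), r.start) == 0) {
                        out += r.openTag + r.start;
                        i += r.start.size();
                        active = &r;
                        started = true;
                        break;
                    }
                }
                if (started) continue;
                // No rule began: read a whole word and test it as a keyword.
                if (IsWordChar(in[i])) {
                    std::size_t j = i;
                    while (j < in.size() && IsWordChar(in[j])) ++j;
                    const std::string word = in.substr(i, j - i);
                    if (keywords.count(word) != 0)
                        out += keywordOpen + word + keywordClose;
                    else
                        out += word;
                    i = j;
                    continue;
                }
            }
            // In both states, single characters go through the symbol map.
            auto it = symbols.find(in[i]);
            if (it != symbols.end()) out += it->second; else out += in[i];
            ++i;
        }
        if (active) out += active->closeTag;  // assumption: close open spans
        return out;
    }
};
```

Keeping a single active-rule pointer is what keeps the loop this small: while a rule is active, neither keywords nor other rules are considered, so a keyword inside a comment is left uncolored.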
Writing/Reading a Parser from a File/the Registry
This functionality is provided by the CParseFromFile subclass (also defined in Highlighter.h). It reads the configuration for a parser from either a string (usually read from a configuration file) or from the Registry. The final program does not make use of the Registry functionality, but I left it in there for reference and maybe future use.
The SynHlt.cfg File
This file is handled by the CParserCollection class, which can either read in a configuration file and load the parsers accordingly or generate the default configuration files.
The Buttons
Because the program is straight Win32, I wanted to make it look a bit nicer than standard windows and buttons. So, I wrote the CButton class that draws a rounded rectangular button and does some simple message processing. It is defined in Buttons.h. The CButtonCollection handles a set of buttons and their message processing. Finally, the CSynHltButtons class is derived from both CButtonCollection and CParserCollection. This means that it can load a configuration file and set the text of its buttons accordingly. It also handles switching between the different parsers by clicking on their respective buttons.
Other Win32 Issues
The Clipboard functionality is provided by two functions, GetClipData and PutClipData. Both handle only the CF_TEXT clipboard format, and both are defined in SyntaxHlt.cpp.
The transparency of the buttons is achieved by setting the window region of the main window to include only the area occupied by the buttons. This is done inside ShowCfgButtons.
Finally, the taskbar icon is managed by the ShellIcon class defined in taskbar.h. It’s simplistic in that it loads the icon when an object of the class is created and unloads it when the object is destroyed.
Conclusion
This program demonstrates a few tricks of Win32 programming as well as a very simple design for a configurable parser. The program in itself is useful because it enables you to post nicely formatted code in vBulletin forums such as CodeGuru. Other languages and other color schemes can be handled by editing the configuration files.
Possible extensions include coloring numeric constants in the source code and an option to make keywords case-insensitive. The latter would be useful for Visual Basic programmers who don’t copy the source code directly from inside the VB environment.