Removing whitespace and rewriting comments, using regex searches

This article was contributed by Chad Loder.

Here are two very short macros that illustrate the regular
expression find-and-replace features available both from the GUI
and within the macro environment.

The first macro removes all trailing whitespace from all lines
in the file. A naive implementation would loop over each line in
the file, call RTrim() on the line, and then rewrite it. This
implementation uses regex search/replace to accomplish the same
task in a fraction of the time. It also lets you Undo in a single
step.

The regular expression used is:

":b+\($\)"

Reading left to right, this expression means find a:

  • whitespace char (space or tab), ":b",
  • actually one or more occurrences of it, "+",
    followed by:
  • start a grouping, "\("
  • end of line, "$"
  • end grouping, "\)"

or in other words, find (one or more occurrences of a
whitespace char) followed immediately by an end of line.

And the replace value (the second argument to
ActiveDocument.Selection.ReplaceText) is also special:

"\1"

which means replace the found pattern with everything inside
the first grouping. Note that the only thing in the grouping is
the end of line pattern, so this means that we replace the entire
pattern (whitespace + end of line) with just the end of line.


Sub RemoveTrailingWhitespace()
'DESCRIPTION: Remove all trailing whitespace from the document
 Dim curCol, curLine
 '-- save position
 curCol = ActiveDocument.Selection.CurrentColumn
 curLine = ActiveDocument.Selection.CurrentLine

 '-- replace all the trailing whitespace
 ActiveDocument.Selection.SelectAll
 ActiveDocument.Selection.ReplaceText ":b+\($\)", "\1", dsMatchRegExp

 '-- restore position
 ActiveDocument.Selection.MoveTo curLine, curCol

End Sub
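
To experiment with the same idea outside the IDE, here is a minimal
sketch that applies it to an in-memory string. It assumes the RegExp
object that ships with VBScript 5.5 or later and is meant to be run
under Windows Script Host (cscript), not as an editor macro. That
object uses a different syntax than the editor: ":b" is written as
"[ \t]", a grouping is plain parentheses, and the replacement refers
to the first grouping as "$1". The sample text and variable names are
made up for the example.

```vbscript
Option Explicit

Dim re, sample, cleaned

'-- hypothetical sample text: two lines with trailing blanks, one without
sample = "int x;   " & vbLf & "return x;" & vbTab & " " & vbLf & "}"

Set re = New RegExp
re.Pattern = "[ \t]+($)"   '-- one or more spaces/tabs, then (end of line)
re.Global = True           '-- replace every match, not just the first
re.MultiLine = True        '-- let $ match at the end of each line

'-- keep only the first grouping, i.e. the (empty) end-of-line match
cleaned = re.Replace(sample, "$1")

'-- show line breaks as | so the removed whitespace is visible
WScript.Echo Replace(sample, vbLf, "|")
WScript.Echo Replace(cleaned, vbLf, "|")
```

The sample uses bare line feeds on purpose: in this engine "$" matches
just before a line feed, so text with carriage return/line feed pairs
would need the carriage return handled separately.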

Here is another macro to illustrate the same concept. This
macro replaces all single-line comments of the type

/* this is a comment */ 

with

// this is a comment

Note some important things about this regular expression.

First of all, the * (asterisk) character is a special
character in regular expressions, so it has to be escaped by
putting a \ (backslash) before it. Also, there are two groupings
defined. The first grouping is all the text between the
"/*" and the "*/", minus leading and trailing
whitespace. The second grouping is the end of line.

Also, it is important to note that since a // comment makes
the entire remainder of the line part of the comment, we have to
be careful not to replace things like this:

for (int i = 1 /* start at second element */ ; i < n; i++) 

with the obviously incorrect:

for (int i = 1 // start at second element ; i < n; i++) 

so we make sure that the comment must have only whitespace
between the ending */ and the end of the line. That is
represented by the

":b*" 

before the end of line grouping. This says zero or more
whitespace characters.



Sub ConvertStarComments()
'DESCRIPTION: Converts single-line star comments to single-line slash comments.

 Dim curCol, curLine
 '-- save position
 curCol = ActiveDocument.Selection.CurrentColumn
 curLine = ActiveDocument.Selection.CurrentLine

 '-- replace all the single-line star comments
 ActiveDocument.Selection.SelectAll
 ActiveDocument.Selection.ReplaceText "/\*\(.+\):b*\*/:b*\($\)", "//\1\2", dsMatchRegExp

 '-- restore position
 ActiveDocument.Selection.MoveTo curLine, curCol

End Sub
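
The end-of-line guard can be tried outside the IDE as well. The sketch
below is an illustration only, again assuming the VBScript RegExp
object under Windows Script Host and its syntax ("[ \t]" for ":b",
plain parentheses for groupings, "$1" and "$2" in the replacement).
The two test lines are hypothetical: the first ends with a star
comment and should be rewritten, the second has code after the
comment and should be left alone.

```vbscript
Option Explicit

Dim re, lines, i

'-- hypothetical test lines: the first should change, the second should not
lines = Array( _
    "count = 0; /* reset the counter */", _
    "for (int i = 1 /* start at second element */ ; i < n; i++)")

Set re = New RegExp
'-- /* (comment text) optional blanks */ optional blanks (end of line)
re.Pattern = "/\*(.+)[ \t]*\*/[ \t]*($)"

For i = 0 To UBound(lines)
    '-- Replace returns the input unchanged when the pattern does not match
    WScript.Echo re.Replace(lines(i), "//$1$2")
Next
```

In this engine ".+" is greedy, so a blank just before the closing
"*/" ends up inside the first grouping. Writing the grouping as
"(.+?)" should leave that blank out.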
