Using Regular Expressions to Search and Replace Text

A common task when dealing with user input or text files is searching through that input and replacing literals, special characters (such as carriage-return/line-feed pairs in files), or patterns (such as phone numbers, contractions, and so forth). In fact, I recently finished a chatterbot (an artificial-intelligence application that verbally responds to voice or keyboard input) that needed exactly this: it had to "smooth out" the user's input into a form the bot could more readily understand and respond to.

This column shows how to perform basic search-and-replace tasks on user input with the .NET regular expression classes.

Replacing Literals

The simplest type of search-and-replace functionality is replacing literals, that is, having the regular expression engine scan an input string for a given substring and replace it with another. For this purpose, the Regex class provides several overloads of its Replace method, both instance and static. A couple of examples show how easy this is.

In the following code, the ReplaceSimple function performs several simple replacements: it collapses runs of whitespace into a single space and reverses personal pronouns, to-be verbs, and possessive pronouns. (The chatterbot I worked on always reverses the sentence so that its answers read more logically.)

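#using <mscorlib.dll>
using namespace System;
using namespace System::Text::RegularExpressions;

String* ReplaceSimple(String* input)
{
  String* result = input;
 
  try
  {
    // upper-case once so the patterns below only need to match one case
    result = result->ToUpper();
 
    // collapse runs of two or more whitespace characters into one space
    result = Regex::Replace(result, S"\\s{2,}", S" ");
 
    // reverse personal pronouns
    // (\s on both sides means a word at the very start or end of the input is not matched)
    result = Regex::Replace(result, S"\\sI\\s", S" YOU ");
 
    // reverse to-be verbs
    result = Regex::Replace(result, S"\\sAM\\s", S" ARE ");
 
    // reverse possessive pronouns
    result = Regex::Replace(result, S"\\sMY\\s", S" YOUR ");
  }
  catch(Exception* ex)
  {
    // Regex::Replace throws an ArgumentException if a pattern is invalid
    Console::WriteLine(ex->Message);
  }
 
  return result;
}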

Figure 1 shows an example of running this code snippet.

Figure 1: Example of Performing Simple Literal Replacement Using the Regex Class

As you can see, this is extremely easy. For the word swaps, you could get nearly the same result with the String::Replace method (for example, result->Replace(S" I ", S" YOU ")), whose syntax is almost identical except that the input string is not passed, because String::Replace is an instance method. Collapsing an arbitrary run of whitespace, however, requires a pattern, which String::Replace cannot express.

Next, take a look at a text-replacement task that genuinely requires regular expressions: one that uses groups and substitution patterns.
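Readers working outside .NET can get much the same behavior from standard C++. The following is a sketch, not the column's code: it assumes C++11's std::regex_replace (ECMAScript grammar, in which \s and {2,} mean the same as in .NET), uses a hypothetical function name, replaceSimple, and upper-cases with std::toupper, which handles only single-byte characters, unlike the culture-aware String::ToUpper.

```cpp
#include <algorithm>
#include <cctype>
#include <regex>
#include <string>

// Hypothetical standard-C++ counterpart of ReplaceSimple (not .NET code).
std::string replaceSimple(std::string input) {
    // Upper-case the input (byte by byte) so the patterns need only one case.
    std::transform(input.begin(), input.end(), input.begin(),
                   [](unsigned char c) { return static_cast<char>(std::toupper(c)); });

    // Collapse runs of two or more whitespace characters into one space.
    input = std::regex_replace(input, std::regex(R"(\s{2,})"), " ");

    // Reverse pronouns and verbs. As in the original, \s is required on both
    // sides, so a word at the very start or end of the string is not matched.
    input = std::regex_replace(input, std::regex(R"(\sI\s)"), " YOU ");
    input = std::regex_replace(input, std::regex(R"(\sAM\s)"), " ARE ");
    input = std::regex_replace(input, std::regex(R"(\sMY\s)"), " YOUR ");
    return input;
}
```

For example, replaceSimple("so  i am   fixing my car") returns "SO YOU ARE FIXING YOUR CAR", while replaceSimple("I am here") returns "I ARE HERE": the leading "I" has no whitespace before it, so only the verb is reversed. Using \b word boundaries instead of \s is one way to cover that case.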

Using Groups and Substitution Patterns

Previous columns discussed how to define groups while parsing an expression. One of the most powerful aspects of regular expressions is the ability to define a named group and then reuse its text in a search-and-replace operation. For example, say that you need to parse a document, locate all text formatted a certain way, and then reformat it. That is more involved than replacing the found text with a literal, because the replacement is built from the found text itself. With regular expressions, you can accomplish this via substitution patterns.

Substitution patterns are special character sequences, each beginning with a dollar sign, that you place in the replacement string to tell the engine how to build the replacement text. Table 1 lists the most commonly used substitution patterns.

Table 1: Common Substitution Patterns

Pattern     Meaning
${group}    Inserts the text captured by the group with the specified name
$n          Inserts the text captured by the group numbered n
$$          Inserts a literal dollar sign (needed because $ is the substitution prefix)
$&          Inserts the entire match
$`          Inserts all the input text before the match
$'          Inserts all the input text after the match
$+          Inserts the last group captured
$_          Inserts the entire input string
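To experiment with Table 1, the following sketch uses standard C++ rather than .NET. Its std::regex_replace follows ECMAScript format rules, which support $&, $`, $', $$, and $n but have no equivalent of ${group}, $+, or $_. The helper name substituteFirstNumber is hypothetical; it replaces only the first run of digits so that the effect of $` and $' is easy to see.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: applies the substitution pattern fmt to the first
// run of digits in text. format_first_only stops after one replacement;
// the rest of the input is copied unchanged.
std::string substituteFirstNumber(const std::string& text, const std::string& fmt) {
    static const std::regex digits(R"(\d+)");
    return std::regex_replace(text, digits, fmt,
                              std::regex_constants::format_first_only);
}
```

With the input "cost 42 units", the pattern "[$&]" yields "cost [42] units", "$$$&" yields "cost $42 units", "<$`>" yields "cost <cost > units", and "<$'>" yields "cost < units> units".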

While all of these patterns are useful to varying degrees, the first two are the ones you'll use most often in search-and-replace tasks. They let you refer to a group captured during the match, by name or by number, and insert its text into the replacement. To illustrate, consider a real-life scenario in which you need to reformat dates. Using regular expressions and the ${group} substitution pattern, the following function (ConvertDateFormat) converts between U.S. (month-day-year) and European (day-month-year) date formats:

String* ConvertDateFormat(String* input, bool USInput)
{
  String* result;

  if (USInput)
  {
    String* regexp1 = S"(?<month>\\d{1,2})-"
                      S"(?<day>\\d{1,2})-"
                      S"(?<year>\\d{2,4})";
    String* regexp2 = S"${day}-${month}-${year}";
    result = Regex::Replace(input, regexp1, regexp2);
  }
  else
  {
    String* regexp1 = S"(?<day>\\d{1,2})-"
                      S"(?<month>\\d{1,2})-"
                      S"(?<year>\\d{2,4})";
    String* regexp2 = S"${month}-${day}-${year}";
    result = Regex::Replace(input, regexp1, regexp2);
  }

  return result;
}

While far from an all-encompassing function, ConvertDateFormat shows how easy it is to use groups in replacement text. The first regular expression it builds, regexp1, is the following:

(?<month>\d{1,2})-(?<day>\d{1,2})-(?<year>\d{2,4})

This causes the parser to create three named groups: month, day, and year. The month and day groups each match one or two digits, the year group matches two to four digits, and the groups are separated by hyphens. The second string, regexp2, is not a regular expression but a replacement string whose substitution patterns refer to those groups to rearrange the date's components. Finally, the Regex::Replace method is called with the input string (the unformatted date), the regular expression, and the replacement string. Passing a U.S. date such as "8-11-64" returns the expected result of "11-8-64".

Technically, the function doesn't need the Boolean parameter: whichever value you pass, the first two sets of digits (month and day, or day and month) are simply swapped. I coded it this way only to make the processing logic more obvious. Here is how you could instead use the $n substitution pattern and drop the conditional logic:

String* ConvertDateFormat2(String* input)
{
  String* regexp1 = S"(?<first>\\d{1,2})-"
                    S"(?<second>\\d{1,2})-"
                    S"(?<year>\\d{2,4})";
  String* regexp2 = S"$2-$1-${year}";

  return Regex::Replace(input, regexp1, regexp2);
}

In ConvertDateFormat2, I've used the more generic group names first and second, because the function doesn't know which group represents the month and which represents the day. The regexp2 variable then specifies the substitution pattern $2-$1-${year}, which tells the parser to replace the found text with the second group, a hyphen, the first group, another hyphen, and then the group named year. This works because the .NET engine also numbers named groups from left to right (after any unnamed groups), so first is group 1 and second is group 2. I could have used the names ${first} and ${second} instead, but I wanted to show you how to use the group index value.
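The same swap can be sketched in standard C++ for comparison. This assumes std::regex, which does not support named groups, so every group, including the year, is referenced by number; the function name swapDayMonth is hypothetical. Like the .NET version, the pattern is not anchored, so it rewrites every date it finds in a longer string.

```cpp
#include <regex>
#include <string>

// Hypothetical std::regex counterpart of ConvertDateFormat2.
// Group 1 and group 2 are the first two fields; group 3 is the year.
std::string swapDayMonth(const std::string& input) {
    static const std::regex date(R"((\d{1,2})-(\d{1,2})-(\d{2,4}))");
    // "$2-$1-$3" swaps the first two fields and keeps the year in place.
    return std::regex_replace(input, date, "$2-$1-$3");
}
```

For instance, swapDayMonth("Born 8-11-64, moved 1-2-1999") returns "Born 11-8-64, moved 2-1-1999". Because the replacement simply swaps the first two fields, calling swapDayMonth on its own output restores the original string, which is the same symmetry that makes the Boolean parameter in ConvertDateFormat unnecessary.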

Looking Ahead

While intentionally simple, the examples in this column, along with the substitution patterns listed in Table 1, should show how easily you can add powerful search-and-replace functionality to your applications. The next, and final, column on regular expressions presents an extremely complex (and frequently requested) regular expression that parses a body of text for virtually any e-mail address format.



About the Author

Tom Archer - MSFT

I am a Program Manager and Content Strategist for the Microsoft MSDN Online team managing the Windows Vista and Visual C++ developer centers. Before being employed at Microsoft, I was awarded MVP status for the Visual C++ product. A 20+ year veteran of programming with various languages - C++, C, Assembler, RPG III/400, PL/I, etc. - I've also written many technical books (Inside C#, Extending MFC Applications with the .NET Framework, Visual C++.NET Bible, etc.) and 100+ online articles.

Comments

  • C++?

    Posted by brothdb on 12/04/2006 12:04pm

    Hello, Do you have a similiar example about replacing literals in C++? I need to replace pronounss like "he" to she or he and "him" to her or him. Mail addy is brothdb@yahoo.com. Thanks in advance
