Splitting Strings with Regex in Managed C++ Applications

People who are familiar with regular expressions tend to think of them only in conjunction with searching a string for specific literals or patterns (such as e-mail address formats). However, one very nice feature that they often overlook is the ability to split strings into substrings based on defined delimiters or tokens.

Before I started using regular expressions, I used the strtok function to tokenize strings, like many programmers who started in C. The following example uses strtok to split a comma-delimited string into its tokens:

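void strtoktest()
{
  // strtok writes null terminators into its input, so use a modifiable
  // char array (plain narrow literals; _T would break a Unicode build).
  char input[] = "Tom,Archer,Programmer/Trainer,CodeGuru";
  char delimiters[] = ",";

  // The first call passes the buffer; later calls pass NULL to continue.
  for (char* token = strtok(input, delimiters);
       token != NULL;
       token = strtok(NULL, delimiters))
  {
    // Wrap the char* in a managed String so the string overload is used.
    Console::WriteLine(new String(token));
  }
}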

The output is as follows:

Tom
Archer
Programmer/Trainer
CodeGuru

As you can see, strtok is pretty basic and easy to use. In fact, the strtok function is at the heart of a popular comma-delimited file class that I use much more than one would assume in this day and age. However, let's face facts. The function really is a holdover from the C days; it's not object-oriented and certainly not very intuitive (I've used it for years and have to look up the syntax every single time I need it). It also overwrites the delimiters in the input buffer as it goes. A more modern approach is to use the .NET Regex class and its Split method. Using the same input as the previous example, here's how you would split the string with the Regex class:

using namespace System::Text::RegularExpressions;
...

void regextest()
{
  String* input = S"Tom,Archer,Programmer/Trainer,CodeGuru";
  Regex* regex = new Regex(S",");          // the pattern that marks a delimiter
  String* tokens[] = regex->Split(input);  // all tokens come back in one array

  for (int i = 0; i < tokens->Length; i++)
  {
    Console::WriteLine(tokens[i]);
  }
}

As you can see, it only takes two lines of code: instantiating a Regex object (passing the pattern that matches the delimiter) and calling the Split method. Another advantage of Regex::Split over strtok is that a single call to Split returns an array of all the tokens. Obviously, it's not much work to stuff the strings into an array yourself, but this is one less step if the function doing the splitting is called by another function that needs all of the strings in an array.
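Because the delimiter is a pattern rather than a fixed character list, the same approach extends to several delimiters at once by using a character class such as [/:]. For readers who want to try the idea without a .NET toolchain, here is a minimal sketch in standard C++ using std::regex. It is only an analogue of Regex::Split, not the .NET API, and the helper name splitByRegex is hypothetical:

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper (not part of .NET or the article's code): splits
// `input` wherever `pattern` matches and returns the pieces in order.
std::vector<std::string> splitByRegex(const std::string& input,
                                      const std::string& pattern)
{
  const std::regex delimiter(pattern);

  // Submatch index -1 asks the iterator for the text between matches,
  // that is, the tokens rather than the delimiters themselves.
  std::sregex_token_iterator first(input.begin(), input.end(), delimiter, -1);
  std::sregex_token_iterator last;

  return std::vector<std::string>(first, last);
}
```

Calling splitByRegex("XTOP/X1:Z", "[/:]") yields the three tokens XTOP, X1, and Z. In the managed version, the equivalent change should be to pass the same character class to the Regex constructor, for example new Regex(S"[/:]"), and call Split as before.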

Looking Ahead

For various reasons (probably inertia more than anything else), I never really got into using regular expressions until I started programming with .NET back in 2000. However, regular expressions really do make a lot of basic chores so much easier. I sometimes kick myself for not having used them much sooner. In future articles, I'll cover more aspects of the Regex class, such as using the Match and MatchCollection classes, how to properly use captures and groups, and searching for complex patterns such as e-mail addresses.



About the Author

Tom Archer - MSFT

I am a Program Manager and Content Strategist for the Microsoft MSDN Online team managing the Windows Vista and Visual C++ developer centers. Before being employed at Microsoft, I was awarded MVP status for the Visual C++ product. A 20+ year veteran of programming with various languages - C++, C, Assembler, RPG III/400, PL/I, etc. - I've also written many technical books (Inside C#, Extending MFC Applications with the .NET Framework, Visual C++.NET Bible, etc.) and 100+ online articles.

Comments

  • Please Help

    Posted by Mahee on 10/22/2012 10:21pm

    C/C++ program to return a list which will contain each word and its number of occurrences in the sentence "cap is map then map did buzz cap many map".

  • Dividing a string of characters into parts

    Posted by lexito on 05/23/2008 11:06am

    How can you display, for example, 234578E in parts as
    23
    45
    78
    E
    when it is passed to a function?

  • good

    Posted by murongtianfeng on 12/06/2006 10:08pm

    It is useful.

  • Splitting strings with multiple delimiters

    Posted by Black Sabbath on 11/03/2005 04:25pm

    I have a string, e.g. XTOP/X1:Z, that I have to split into XTOP X1 Z. Can you suggest a way other than parsing each character and comparing it with each delimiter type (/, :, etc.)? -Black Sabbath

  • How about String::Split ?

    Posted by darwen on 02/23/2005 06:34pm

    What about String::Split ? It does the same as your example. In fact the RegEx class can be used to do far more complicated pattern-matched splits as you have said here.

    • If you have any suggestions...

      Posted by Tom Archer on 02/23/2005 07:10pm

      I've used regex quite a bit over the past few years and will definitely want to cover things like groups and captures, as that really seems to be an area that few people understand. If you have any specific areas that you think should be covered, let me know. I am a bit wary of getting into the specifics of the patterns themselves, as technically that has nothing to do with .NET or Managed C++. What's your opinion on that?

    • I look forward to the next one...

      Posted by darwen on 02/23/2005 06:58pm

      -

    • Absolutely

      Posted by Tom Archer on 02/23/2005 06:49pm

      That's why this is the first in a series. Not everyone is experienced with regex so I wanted to start with the basics and gradually work towards more complex examples.
