Working with Regular Expressions in .NET

Introduction

Regular expressions provide a standard and powerful way to match patterns in text data. The .NET Framework exposes its regular expression engine through the System.Text.RegularExpressions namespace. The Regex class is the primary way for developers to perform pattern matching, search and replace, and splitting operations on a string. Many beginners avoid regular expressions because of their apparently difficult syntax. However, if your application calls for heavy pattern matching, learning and using regular expressions instead of ordinary string manipulation functions is strongly recommended. This article gives beginners a quick overview of what the .NET Framework offers for pattern matching with regular expressions.

Note:
This article will not teach you how to write regular expressions. It focuses primarily on using the classes in the System.Text.RegularExpressions namespace. It is assumed that you are already familiar with regular expression syntax and are able to write basic regular expressions.

Basic Terminology

Before you go any further let's quickly glance over the basic terminology used in the context of regular expressions.

  • Capture : When you perform pattern matching using a regular expression, the result of a single sub-expression match is called a Capture. The Capture and CaptureCollection classes represent a single capture and a collection of captures respectively.
  • Group : A regular expression often consists of one or more Groups. A group is denoted by parentheses within a regular expression (the whole regular expression is itself treated as a group). A single group can have zero or more captures. The Group and GroupCollection classes represent a single group and a collection of groups respectively.
  • Match : The result of a single match of a regular expression is called a Match. A match contains one or more groups. The Match and MatchCollection classes represent a single match and a collection of matches respectively.

Thus the relation between the regular expression related objects is:

Regex class--> MatchCollection--> Match objects--> GroupCollection--> Group objects--> CaptureCollection--> Capture objects

The Regex Class

The Regex class, along with a few supporting classes, represents the regular expression engine of the .NET Framework. The Regex class allows you to perform pattern matching, search and replace, and splitting on source strings. You can use the Regex class in two ways: by calling its static methods, or by instantiating it and then calling instance methods. The difference between these two approaches is discussed in the Performance Considerations section. The following table lists some of the important methods of the Regex class along with the purpose of each:

Method     Description
IsMatch    Determines whether a string conforms to a specified regular expression. Returns true if the pattern is found in the string; otherwise returns false.
Match      Searches a string for a specified pattern and returns the first occurrence as a Match object.
Matches    Searches a string for all occurrences of a pattern and returns them as a MatchCollection object.
Replace    Replaces all occurrences of a pattern with a specified string value.
Split      Splits a string using a specified pattern as the delimiter and returns the parts of the string as an array.

In the following sections you are going to use many of the methods mentioned above.

Pattern Matching Using Regex Class

In this section you will use the pattern matching abilities of the Regex class. Begin by creating a new Console Application and importing the System.Text.RegularExpressions namespace at the top.

using System.Text.RegularExpressions;

Using IsMatch() Method

In this example you will check whether a string is a valid URL. Enter the following code in the Main() method.

static void Main(string[] args)
{
    string source = args[0];
    string pattern = @"http(s)?://([\w-]+\.)+[\w-]+(/[\w- ./?%&=]*)?";
 
    bool success = Regex.IsMatch(source, pattern);
    if (success)
    {
        Console.WriteLine("Entered string is a valid URL!");
    }
    else
    {
        Console.WriteLine("Entered string is not a valid URL!");
    }
    Console.ReadLine();
}

The Main() method receives the string to be tested as a command line argument. The pattern string variable holds the regular expression for verifying URLs. The code then calls the IsMatch() static method on the Regex class and passes the source and pattern strings to it. Depending on the returned boolean value a message is displayed to the user.

You could have achieved the same result by creating an instance of Regex class and then calling IsMatch() method on it, as shown below:

Regex ex = new Regex(pattern);
success = ex.IsMatch(source);
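
One caveat worth keeping in mind: IsMatch() reports success if the pattern is found anywhere in the input, so a sentence that merely contains a URL also passes the check. If the goal is to validate the whole string, one option is to anchor the pattern with ^ and $. The following is a minimal sketch under that assumption. The class name UrlValidation and the helper IsWholeStringUrl are hypothetical names chosen for illustration, and the pattern is the article's URL pattern with its \w and \. escapes written out.

```csharp
using System;
using System.Text.RegularExpressions;

public static class UrlValidation
{
    // The article's URL pattern, with the \w and \. escapes written out.
    public const string UrlPattern =
        @"http(s)?://([\w-]+\.)+[\w-]+(/[\w- ./?%&=]*)?";

    // Hypothetical helper: true only when the entire input is a URL.
    public static bool IsWholeStringUrl(string input)
    {
        return Regex.IsMatch(input, "^" + UrlPattern + "$");
    }

    public static void Main()
    {
        string embedded = "see http://site1.com for details";

        // Unanchored: the URL inside the sentence is enough for a match.
        Console.WriteLine(Regex.IsMatch(embedded, UrlPattern));   // True

        // Anchored: the sentence as a whole is not a URL.
        Console.WriteLine(IsWholeStringUrl(embedded));            // False
        Console.WriteLine(IsWholeStringUrl("http://site1.com"));  // True
    }
}
```

Whether anchoring is appropriate depends on the use case: scanning free text for URLs needs the unanchored form, while validating a form field usually needs the anchored one.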

Using Match() Method

In order to see how Match() method can be used, modify the Main() method as shown below:

static void Main(string[] args)
{
    string source = args[0];
    string pattern = @"http(s)?://([\w-]+\.)+[\w-]+(/[\w- ./?%&=]*)?";
    Match match = Regex.Match(source, pattern);
    if(match.Success)
    {
        Console.WriteLine("Entered string is a valid URL!");
        Console.WriteLine("{0} Groups", match.Groups.Count);
        for(int i=0;i<match.Groups.Count;i++)
        {
            Console.WriteLine("Group {0} Value = {1} Status = {2}",
                i, match.Groups[i].Value, match.Groups[i].Success);

            Console.WriteLine("\t{0} Captures", match.Groups[i].Captures.Count);

            for (int j = 0; j < match.Groups[i].Captures.Count; j++)
            {
                Console.WriteLine("\t\t Capture {0} Value = {1} Found at = {2}",
                    j, match.Groups[i].Captures[j].Value, match.Groups[i].Captures[j].Index);
            }
        }
    }
    else
    {
        Console.WriteLine("Entered string is not a valid URL!");
    }
    Console.ReadLine();
}

The code shown above uses the Match() method to perform pattern matching. As mentioned earlier, the Match() method returns an instance of the Match class that represents the first occurrence of the pattern. The Success property of the Match object tells you whether the pattern matching was successful. A for loop then iterates through the Groups collection (a GroupCollection object), printing the value and status of each group. The Captures collection of each group is iterated as well, and each captured value is printed along with its index in the source string. The following figure shows a sample run of the above application.

Figure 1: A sample run of the application

Observe the figure carefully. Our pattern contains four groups (the whole expression plus the three parenthesized sub-expressions), so the Count property of the Groups collection returns 4. The first group (the whole expression) has the value https://www.codeguru.com/. The second group has the value s (from https). The third group has two captures, www. and codeguru., and its Value property returns the last of them. Finally, the last group has the value / (the / at the end of the URL).
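
The values described above can also be checked in code. The following is a small sketch, not the article's original listing: the class name GroupInspection and the helper MatchUrl are hypothetical, and the input is the URL used in the sample run.

```csharp
using System;
using System.Text.RegularExpressions;

public static class GroupInspection
{
    // Hypothetical helper: matches the article's URL pattern against the input.
    public static Match MatchUrl(string input)
    {
        return Regex.Match(input, @"http(s)?://([\w-]+\.)+[\w-]+(/[\w- ./?%&=]*)?");
    }

    public static void Main()
    {
        Match m = MatchUrl("https://www.codeguru.com/");

        Console.WriteLine(m.Groups.Count);      // 4
        Console.WriteLine(m.Groups[0].Value);   // https://www.codeguru.com/
        Console.WriteLine(m.Groups[1].Value);   // s

        // A group that repeats keeps every capture it made.
        Console.WriteLine(m.Groups[2].Captures.Count);   // 2
        foreach (Capture c in m.Groups[2].Captures)
        {
            // Prints "www. at 8" and then "codeguru. at 12".
            Console.WriteLine("{0} at {1}", c.Value, c.Index);
        }

        // Group.Value returns the last capture of the group.
        Console.WriteLine(m.Groups[2].Value);   // codeguru.
        Console.WriteLine(m.Groups[3].Value);   // /
    }
}
```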

Using Matches() Method

Matches() method is similar to Match() method but returns a collection of Match objects (MatchCollection). You can then iterate through all of the Match instances and see various group and capture values. The following code illustrates how this is done:

MatchCollection matches = Regex.Matches(source, pattern);
 
foreach (Match match in matches)
{
    Console.WriteLine("Match Value = {0}",match.Value);
    Console.WriteLine("============");
    if (match.Success)
    {
        Console.WriteLine("Entered string is a valid URL!");
        Console.WriteLine("{0} Groups", match.Groups.Count);
        for (int i = 0; i < match.Groups.Count; i++)
        {
            Console.WriteLine("Group {0} Value = {1} Status = {2}",
                i, match.Groups[i].Value, match.Groups[i].Success);
            Console.WriteLine("\t{0} Captures", match.Groups[i].Captures.Count);
            for (int j = 0; j < match.Groups[i].Captures.Count; j++)
            {
                Console.WriteLine("\t\t Capture {0} Value = {1} Found at = {2}",
                    j, match.Groups[i].Captures[j].Value, match.Groups[i].Captures[j].Index);
            }
        }
    }
    else
    {
        Console.WriteLine("Entered string is not a valid URL!");
    }
 
}

The following figure shows a sample run of the above code:

Figure 2: Matches() method returns two Match objects

Notice how the Matches() method has returned two Match objects (one for http://site1.com and the other for http://site2.com).
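
When only the matched text is needed, the loop can be much shorter. The sketch below collects every URL found in a piece of text into a list. The class name UrlExtraction, the helper ExtractUrls, and the sample sentence are illustrative assumptions. Because every Match in a MatchCollection is a successful match, no Success check is needed inside the loop.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class UrlExtraction
{
    private const string UrlPattern =
        @"http(s)?://([\w-]+\.)+[\w-]+(/[\w- ./?%&=]*)?";

    // Hypothetical helper: returns the text of every URL found in the input.
    public static List<string> ExtractUrls(string text)
    {
        List<string> urls = new List<string>();
        foreach (Match m in Regex.Matches(text, UrlPattern))
        {
            urls.Add(m.Value);
        }
        return urls;
    }

    public static void Main()
    {
        List<string> urls = ExtractUrls("Visit http://site1.com and http://site2.com today");

        Console.WriteLine(urls.Count);   // 2
        foreach (string url in urls)
        {
            Console.WriteLine(url);      // http://site1.com, then http://site2.com
        }
    }
}
```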

Search and Replace Using Regex Class

The Regex class not only allows you to perform pattern matching but also allows you to search and replace strings. Consider, for example, that you are developing a discussion forum in ASP.NET. To reduce spam and promotional content, you want to scan forum posts made by new members for URLs and replace each URL with a placeholder such as [*** URLs not allowed ***]. Something like this can easily be done with the search and replace abilities of the Regex class. Let's see how.

static void Main(string[] args)
{
    string source = args[0];
    string pattern = @"http(s)?://([\w-]+\.)+[\w-]+(/[\w- ./?%&=]*)?";

    string result = Regex.Replace(source,pattern,"[*** URLs not allowed ***]");
    Console.WriteLine(result);

    Console.ReadLine();
}

In the code fragment shown above, the regular expression is intended to find URLs in the input string. You then call the Replace() method of the Regex class. The first parameter of the Replace() method is the string in which you wish to perform the replacement. The second parameter is the regular expression pattern to search for, and the third parameter is the replacement string. The Replace() method returns the resultant string after performing the replacement. If you run the above code you should see something like this in the console window:

Figure 3: The Replace() method of the Regex class

Notice how the URL has been replaced with the text you specify.
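
Replace() also has an overload that takes a MatchEvaluator delegate instead of a fixed replacement string, which is useful when the replacement depends on what was matched. The sketch below is one possible use, masking each URL with asterisks of the same length. The class name UrlMasking, the helper MaskUrls, and the sample post are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class UrlMasking
{
    private const string UrlPattern =
        @"http(s)?://([\w-]+\.)+[\w-]+(/[\w- ./?%&=]*)?";

    // Hypothetical helper: replaces every URL with '*' repeated to the URL's length.
    public static string MaskUrls(string text)
    {
        // The lambda is converted to a MatchEvaluator and is called once per match.
        return Regex.Replace(text, UrlPattern, m => new string('*', m.Length));
    }

    public static void Main()
    {
        // The 16-character URL is replaced by 16 asterisks; other text is untouched.
        Console.WriteLine(MaskUrls("Buy now at http://site1.com today"));
    }
}
```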

Splitting Strings Using Regex

The Regex class also allows you to split an input string wherever a regular expression matches. Say, for example, you wish to split a comma-separated list into its individual values, or a date in DD/MM/YYYY format at / to retrieve the day, month, and year. The Split() method of the Regex class allows you to do just that. The following example splits a comma-separated list:

string strFruits = "Apple,Mango,Banana";
string[] fruits = Regex.Split(strFruits, ",");
foreach(string s in fruits)
{
    Console.WriteLine(s);
}

In the above code the Split() method takes the source string and a regular expression for searching the delimiter (, in the above example). It then splits the string and returns an array of strings consisting of individual elements. A sample run of the above code is shown below:
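
For a single fixed delimiter, string.Split() would do the same job. Regex.Split() becomes useful when the delimiter itself is a pattern. The sketch below shows the date case mentioned earlier and a list whose separators vary. The class name SplitExamples, the helper names, and the sample inputs are assumptions made for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SplitExamples
{
    // Splits a DD/MM/YYYY date into day, month, and year strings.
    public static string[] SplitDate(string date)
    {
        return Regex.Split(date, "/");
    }

    // Splits on a comma or semicolon, swallowing any whitespace around it.
    public static string[] SplitList(string list)
    {
        return Regex.Split(list, @"\s*[,;]\s*");
    }

    public static void Main()
    {
        foreach (string part in SplitDate("25/12/2011"))
        {
            Console.WriteLine(part);    // 25, then 12, then 2011
        }

        foreach (string item in SplitList("Apple, Mango ;Banana"))
        {
            Console.WriteLine(item);    // Apple, then Mango, then Banana
        }
    }
}
```

Note that if the delimiter pattern contains capturing parentheses, the captured text is included in the resulting array, so the patterns above deliberately avoid capturing groups.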

Figure 4: Splitting Strings Using Regex

Regex Options

Most of the methods discussed above are overloaded to take a parameter of type RegexOptions enumeration. As the name suggests, the RegexOptions enumeration is used to indicate certain configuration options to the regular expression engine during the pattern matching process. The following table lists some of the important options of RegexOptions enumeration:

Option        Description
IgnoreCase    Indicates that the pattern matching operation should ignore character casing.
Multiline     Indicates that ^ and $ match at the beginning and end of each line, not just at the beginning and end of the entire source string.
Singleline    Indicates that the dot (.) should match every character, including a newline character.
RightToLeft   Indicates that pattern matching is performed from right to left instead of from left to right in the source string.
Compiled      Indicates that the regular expression is compiled to MSIL code instead of being interpreted from the engine's internal instructions.

To illustrate how the RegexOptions enumeration can be used, write the following code in the Main() method and observe the difference the RegexOptions value makes.

bool success1 = Regex.IsMatch(source, "hello");
Console.WriteLine("String found? {0}",success1);
bool success2 = Regex.IsMatch(source, "hello", RegexOptions.IgnoreCase);
Console.WriteLine("String found? {0}", success2);

As you can see, the second call to the IsMatch() method uses the RegexOptions enumeration to specify that case should be ignored during pattern matching. If you observe the output of the above code (see below), you will find that for an input that contains the word only in a different case (Hello, for example), IsMatch() without any RegexOptions returns false, whereas with RegexOptions.IgnoreCase it returns true.

Figure 5: IsMatch() method without RegexOptions returns false; with RegexOptions.IgnoreCase returns true

Note:
You can combine multiple RegexOptions values like this:

bool success2 = Regex.IsMatch(source, "hello", RegexOptions.IgnoreCase | RegexOptions.Compiled);
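
The Multiline and Singleline options are easy to confuse, so a short sketch of each may help. The class name OptionsDemo, the helpers, and the sample strings are hypothetical. The strings use \n alone as the line separator.

```csharp
using System;
using System.Text.RegularExpressions;

public static class OptionsDemo
{
    // Counts lines that consist only of word characters.
    public static int CountWordLines(string text, RegexOptions options)
    {
        return Regex.Matches(text, @"^\w+$", options).Count;
    }

    // Tests whether 'a', any single character, and 'b' occur in sequence.
    public static bool MatchesADotB(string text, RegexOptions options)
    {
        return Regex.IsMatch(text, "a.b", options);
    }

    public static void Main()
    {
        string lines = "one\ntwo\nthree";

        // Without Multiline, ^ and $ refer to the whole string, which contains newlines.
        Console.WriteLine(CountWordLines(lines, RegexOptions.None));       // 0
        // With Multiline, ^ and $ match at every line boundary.
        Console.WriteLine(CountWordLines(lines, RegexOptions.Multiline));  // 3

        // By default the dot does not match a newline; Singleline lifts that restriction.
        Console.WriteLine(MatchesADotB("a\nb", RegexOptions.None));        // False
        Console.WriteLine(MatchesADotB("a\nb", RegexOptions.Singleline));  // True
    }
}
```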

Performance Considerations

As mentioned earlier, the Regex class provides static as well as instance methods for pattern matching. The static methods accept the source string and the pattern as parameters, whereas the instance methods accept only the source string (since the pattern is specified when the instance is created). The following code fragment makes it clear:

//Using static method
bool success = Regex.IsMatch(source, pattern);
//Using instance method
Regex ex = new Regex(pattern);
success = ex.IsMatch(source);

When you use static methods, the regular expression engine caches the parsed regular expressions, so using the same pattern repeatedly does not incur the parsing cost each time. The cache is limited in size; the Regex.CacheSize property controls how many patterns are kept. Patterns passed to the Regex constructor, on the other hand, are not cached: every new Regex(pattern) call parses the pattern again. A Regex instance is immutable, however, so you can create it once and reuse it for any number of calls. What hurts performance is constructing a new instance for the same pattern every time you need it.

You should also be aware of the impact of RegexOptions.Compiled on performance. With this option, the regular expression is compiled to MSIL code instead of being interpreted from the engine's internal instructions. This makes matching faster, but the compilation takes extra time when the Regex is constructed and the generated code occupies additional memory, which can increase startup time. So, you should carefully evaluate the use of the RegexOptions.Compiled option: it tends to pay off for a pattern that is used many times, not for one that is used once or twice.
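
A common way to apply both points is to construct the Regex once, store it in a static readonly field, and reuse it for every call. The sketch below assumes a pattern that is used many times over the life of the application. The class name UrlScanner, the helper CountUrls, and the sample lines are hypothetical, and whether RegexOptions.Compiled is worth its cost should be measured for your own workload.

```csharp
using System;
using System.Text.RegularExpressions;

public static class UrlScanner
{
    // Constructed once: the pattern is parsed (and here compiled) a single time.
    private static readonly Regex UrlRegex = new Regex(
        @"http(s)?://([\w-]+\.)+[\w-]+(/[\w- ./?%&=]*)?",
        RegexOptions.Compiled);

    // Hypothetical helper: every call reuses the same Regex instance.
    public static int CountUrls(string text)
    {
        return UrlRegex.Matches(text).Count;
    }

    public static void Main()
    {
        string[] lines =
        {
            "http://site1.com and http://site2.com",
            "no links here",
            "https://www.codeguru.com/"
        };

        foreach (string line in lines)
        {
            Console.WriteLine(CountUrls(line));   // 2, then 0, then 1
        }
    }
}
```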

Summary

Regular expressions provide a standard and powerful way of pattern matching. The Regex class represents the .NET Framework's regular expression engine. The methods of the Regex class are exposed as static as well as instance methods. These methods allow you to perform search, replace, and splitting operations on input strings. The behavior of the regular expression engine can be configured with the help of the RegexOptions enumeration.



About the Author

Bipin Joshi

Bipin Joshi is a blogger and writes about apparently unrelated topics - Yoga & technology! A former Software Consultant by profession, Bipin has been programming since 1995 and has been working with the .NET framework ever since its inception. He has authored or co-authored half a dozen books and numerous articles on .NET technologies. He has also penned a few books on Yoga. He was a well known technology author, trainer and an active member of Microsoft developer community before he decided to take a backseat from the mainstream IT circle and dedicate himself completely to spiritual path. Having embraced Yoga way of life he now codes for fun and writes on his blogs. He can also be reached there.
