Working with Regular Expressions in .NET

Introduction

Regular expressions provide a standard and powerful way of matching patterns in text data. The .NET Framework exposes its regular expression engine through the System.Text.RegularExpressions namespace. The Regex class is the primary way for developers to perform pattern matching, search and replace, and splitting operations on a string. Many beginners avoid regular expressions because of the apparently difficult syntax. However, if your application calls for heavy pattern matching, then learning and using regular expressions instead of ordinary string manipulation functions is strongly recommended. This article is intended to give beginners a quick overview of the .NET Framework's offerings for pattern matching using regular expressions.

Note:
This article will not teach you how to write regular expressions. It focuses primarily on using classes from System.Text.RegularExpressions namespace. It is assumed that you are already familiar with regular expression syntax and are able to write basic regular expressions.

Basic Terminology

Before you go any further, let's quickly glance over the basic terminology used in the context of regular expressions.

  • Capture : When you perform pattern matching using a regular expression, the result of a single sub-expression match is called a Capture. The Capture and CaptureCollection classes represent a single capture and a collection of captures respectively.
  • Group : A regular expression often consists of one or more Groups. A group is denoted by parentheses within a regular expression (the whole regular expression itself is also considered a group). A single group can have zero or more captures. The Group and GroupCollection classes represent a single group and a collection of groups respectively.
  • Match : The result of a single match of a regular expression is termed a Match. A match contains one or more groups. The Match and MatchCollection classes represent a single match and a collection of matches respectively.

Thus the relation between the regular expression related objects is:

Regex class --> MatchCollection --> Match objects --> GroupCollection --> Group objects --> CaptureCollection --> Capture objects

The Regex Class

The Regex class, along with a few supporting classes, represents the regular expression engine of the .NET Framework. The Regex class allows you to perform pattern matching, search and replace, and splitting on source strings. You can use the Regex class in two ways: by calling its static methods, or by instantiating it and then calling instance methods. The difference between these two approaches is discussed in the section on performance. The following table lists some of the important methods of the Regex class along with the purpose of each:

Method     Description
-------    -----------
IsMatch    Determines whether a string conforms to a specified regular expression. Returns true if the string matches the pattern; otherwise returns false.
Match      Searches a string for a specified pattern and returns the first occurrence of the pattern as a Match object.
Matches    Searches a string for all the occurrences of a pattern. Returns a MatchCollection object.
Replace    Replaces all the occurrences of a pattern with a specified string value.
Split      Splits a string using a specified pattern as the delimiter and returns the parts of the string as an array.

In the following sections you are going to use many of the methods mentioned above.
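As a quick preview of the methods listed above, here is a minimal sketch that calls each of the five static methods on one small input. The sample string, the patterns, and the class name RegexOverviewDemo are made up for this illustration and are not part of the article's sample application.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RegexOverviewDemo
{
    // Hypothetical sample data, chosen only for this illustration.
    public const string Input = "apple 10, banana 25, cherry 7";
    public const string NumberPattern = @"\d+";

    public static void Main()
    {
        // IsMatch: does the input contain at least one run of digits?
        Console.WriteLine(Regex.IsMatch(Input, NumberPattern));       // True

        // Match: the first occurrence only.
        Console.WriteLine(Regex.Match(Input, NumberPattern).Value);   // 10

        // Matches: every occurrence.
        Console.WriteLine(Regex.Matches(Input, NumberPattern).Count); // 3

        // Replace: substitute every occurrence with a fixed string.
        Console.WriteLine(Regex.Replace(Input, NumberPattern, "#"));  // apple #, banana #, cherry #

        // Split: use a pattern (a comma plus optional whitespace) as the delimiter.
        string[] parts = Regex.Split(Input, @",\s*");
        Console.WriteLine(parts.Length);                              // 3
    }
}
```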

Pattern Matching Using Regex Class

In this section you will use the pattern matching abilities of the Regex class. Begin by creating a new Console Application and importing the System.Text.RegularExpressions namespace at the top.

using System.Text.RegularExpressions;

Using IsMatch() Method

In this example you will check whether a string is a valid URL. Key in the following code in the Main() method.

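static void Main(string[] args)
{
    // Guard against a missing command line argument.
    if (args.Length == 0)
    {
        Console.WriteLine("Please pass the string to test as a command line argument.");
        return;
    }

    string source = args[0];
    // Verbatim string (@"...") so that \w and \. reach the regex engine unchanged.
    string pattern = @"http(s)?://([\w-]+\.)+[\w-]+(/[\w- ./?%&=]*)?";
 
    bool success = Regex.IsMatch(source, pattern);
    if (success)
    {
        Console.WriteLine("Entered string is a valid URL!");
    }
    else
    {
        Console.WriteLine("Entered string is not a valid URL!");
    }
    Console.ReadLine();
}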

The Main() method receives the string to be tested as a command line argument. The pattern string variable holds the regular expression for verifying URLs. The code then calls the IsMatch() static method on the Regex class and passes the source and pattern strings to it. Depending on the returned boolean value a message is displayed to the user. Because the pattern is not anchored with ^ and $, IsMatch() returns true for any string that contains a URL, not only for a string that is a URL in its entirety.

You could have achieved the same result by creating an instance of Regex class and then calling IsMatch() method on it, as shown below:

Regex ex = new Regex(pattern);
success = ex.IsMatch(source);

Using Match() Method

In order to see how Match() method can be used, modify the Main() method as shown below:

static void Main(string[] args)
{
    // Guard against a missing command line argument.
    if (args.Length == 0)
    {
        Console.WriteLine("Please pass the string to test as a command line argument.");
        return;
    }

    string source = args[0];
    string pattern = @"http(s)?://([\w-]+\.)+[\w-]+(/[\w- ./?%&=]*)?";
    Match match = Regex.Match(source, pattern);
    if (match.Success)
    {
        Console.WriteLine("Entered string is a valid URL!");
        Console.WriteLine("{0} Groups", match.Groups.Count);
        for (int i = 0; i < match.Groups.Count; i++)
        {
            // Groups[0] is the whole match; Groups[1..n] are the parenthesized groups.
            Console.WriteLine("Group {0} Value = {1} Status = {2}",
                i, match.Groups[i].Value, match.Groups[i].Success);

            Console.WriteLine("\t{0} Captures", match.Groups[i].Captures.Count);

            // A group under a quantifier can capture more than once.
            for (int j = 0; j < match.Groups[i].Captures.Count; j++)
            {
                Console.WriteLine("\t\t Capture {0} Value = {1} Found at = {2}",
                    j, match.Groups[i].Captures[j].Value, match.Groups[i].Captures[j].Index);
            }
        }
    }
    else
    {
        Console.WriteLine("Entered string is not a valid URL!");
    }
    Console.ReadLine();
}

The code shown above makes use of the Match() method to perform pattern matching. As mentioned earlier, the Match() method returns an instance of the Match class that represents the first occurrence of the pattern. The Success property of the Match object tells you whether the pattern matching was successful or not. A for loop then iterates through the Groups collection (a GroupCollection object). Each iteration prints the value of the group and its status. The Captures collection of each group is iterated as well, printing each captured value and its index in the string. The following figure shows a sample run of the above application.

Figure 1: A sample run of the application

Observe the above figure carefully. Our pattern contains 4 groups (three in parentheses in the regular expression, plus the whole expression), so the Count property of the Groups collection returns 4. The first group (the whole expression) has the value https://www.codeguru.com/. The second group has the value s (from https). The third group has two captures, www. and codeguru., and its Value property reports the last of them (codeguru.). Finally, the last group has the value / (the / at the end of the URL).
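The relation between a group and its captures can also be checked without the loops. The following is a small sketch, assuming the same URL pattern and the sample input https://www.codeguru.com/; the class name GroupCaptureDemo and the helper HostGroup are hypothetical names used only here.

```csharp
using System;
using System.Text.RegularExpressions;

public static class GroupCaptureDemo
{
    public const string UrlPattern = @"http(s)?://([\w-]+\.)+[\w-]+(/[\w- ./?%&=]*)?";

    // Hypothetical helper: returns group 2, i.e. ([\w-]+\.), for the given input.
    public static Group HostGroup(string url)
    {
        return Regex.Match(url, UrlPattern).Groups[2];
    }

    public static void Main()
    {
        // Group 2 sits under a + quantifier, so it can capture repeatedly.
        Group hostParts = HostGroup("https://www.codeguru.com/");

        Console.WriteLine(hostParts.Captures.Count);    // 2
        Console.WriteLine(hostParts.Captures[0].Value); // www.
        Console.WriteLine(hostParts.Captures[1].Value); // codeguru.

        // Group.Value reports only the last capture.
        Console.WriteLine(hostParts.Value);             // codeguru.
    }
}
```

If you need every repetition of a quantified group, read its Captures collection rather than its Value property.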

Using Matches() Method

Matches() method is similar to Match() method but returns a collection of Match objects (MatchCollection). You can then iterate through all of the Match instances and see various group and capture values. The following code illustrates how this is done:

MatchCollection matches = Regex.Matches(source, pattern);

// Every Match in the collection is successful, so test Count to detect "no match".
if (matches.Count == 0)
{
    Console.WriteLine("Entered string does not contain a valid URL!");
}

foreach (Match match in matches)
{
    Console.WriteLine("Match Value = {0}", match.Value);
    Console.WriteLine("============");
    Console.WriteLine("{0} Groups", match.Groups.Count);
    for (int i = 0; i < match.Groups.Count; i++)
    {
        Console.WriteLine("Group {0} Value = {1} Status = {2}",
            i, match.Groups[i].Value, match.Groups[i].Success);
        Console.WriteLine("\t{0} Captures", match.Groups[i].Captures.Count);
        for (int j = 0; j < match.Groups[i].Captures.Count; j++)
        {
            Console.WriteLine("\t\t Capture {0} Value = {1} Found at = {2}",
                j, match.Groups[i].Captures[j].Value, match.Groups[i].Captures[j].Index);
        }
    }
}

The following figure shows a sample run of the above code:

Figure 2: Matches() method returns two Match objects

Notice how Matches() method has returned two Match objects (one for http://site1.com and other for http://site2.com).

Search and Replace Using Regex Class

The Regex class not only allows you to perform pattern matching but also allows you to search and replace strings. Consider, for example, that you are developing a discussion forum in ASP.NET. To reduce spam and promotional content, you want to scan forum posts made by new members for URLs and then replace each URL with a placeholder such as ****. Something like this can easily be done with the search and replace abilities of the Regex class. Let's see how.

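static void Main(string[] args)
{
    // Guard against a missing command line argument.
    if (args.Length == 0)
    {
        Console.WriteLine("Please pass the text to scan as a command line argument.");
        return;
    }

    string source = args[0];
    string pattern = @"http(s)?://([\w-]+\.)+[\w-]+(/[\w- ./?%&=]*)?";

    // Arguments: input string, pattern to find, replacement text.
    string result = Regex.Replace(source, pattern, "[*** URLs not allowed ***]");
    Console.WriteLine(result);

    Console.ReadLine();
}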

In the code fragment shown above the regular expression is intended to find URLs in the input string. You then call the Replace() method of the Regex class. The first parameter of the Replace() method is the string in which you wish to perform the replacement, the second is the pattern to search for, and the third is the replacement string. The Replace() method returns the resultant string after performing the replacement. If you run the above code you should see something like this in the console window:

Figure 3: The Replace() method of the Regex class

Notice how the URL has been replaced with the text you specify.
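For the forum scenario, a post will often contain more than one URL. The following sketch, with a made-up post text and a hypothetical helper named MaskUrls, shows that a single Replace() call substitutes every occurrence.

```csharp
using System;
using System.Text.RegularExpressions;

public static class UrlMaskDemo
{
    public const string UrlPattern = @"http(s)?://([\w-]+\.)+[\w-]+(/[\w- ./?%&=]*)?";

    // Hypothetical helper: replaces every URL found in the post with a fixed notice.
    public static string MaskUrls(string post)
    {
        return Regex.Replace(post, UrlPattern, "[*** URLs not allowed ***]");
    }

    public static void Main()
    {
        // Sample post text, invented for this illustration.
        string post = "Visit http://site1.com and http://site2.com today";

        // Prints: Visit [*** URLs not allowed ***] and [*** URLs not allowed ***] today
        Console.WriteLine(MaskUrls(post));
    }
}
```

One caveat with this particular pattern: the character class in the optional path group includes a space, so when a URL is followed by a path such as /page, the match can run on into the words after it. Tighten that class if you only want to mask the URL itself.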

Splitting Strings Using Regex

The Regex class also allows you to split an input string based on a regular expression. Say, for example, you wish to split a date in DD/MM/YYYY format at / so as to retrieve individual day, month and year values, or split a comma separated list into its items. The Split() method of the Regex class allows you to do just that. The following example splits a comma separated list of fruit names:

string strFruits = "Apple,Mango,Banana";
string[] fruits = Regex.Split(strFruits, ",");
foreach(string s in fruits)
{
    Console.WriteLine(s);
}

In the above code the Split() method takes the source string and a regular expression for searching the delimiter (, in the above example). It then splits the string and returns an array of strings consisting of individual elements. A sample run of the above code is shown below:

Figure 4: Splitting Strings Using Regex
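The date scenario mentioned at the start of this section works the same way. Below is a minimal sketch, assuming a date in DD/MM/YYYY format; the second helper shows where a regular expression does more than a fixed delimiter, by accepting a comma or a semicolon with optional surrounding whitespace. The class name, the helper names, and the sample values are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SplitDemo
{
    // Splits a DD/MM/YYYY date at each slash.
    public static string[] SplitDate(string date)
    {
        return Regex.Split(date, "/");
    }

    // Splits on a comma or semicolon, ignoring whitespace around the delimiter.
    public static string[] SplitList(string list)
    {
        return Regex.Split(list, @"\s*[,;]\s*");
    }

    public static void Main()
    {
        string[] dateParts = SplitDate("25/12/2023");
        Console.WriteLine("Day = {0}, Month = {1}, Year = {2}",
            dateParts[0], dateParts[1], dateParts[2]); // Day = 25, Month = 12, Year = 2023

        foreach (string item in SplitList("Apple ; Mango,Banana"))
        {
            Console.WriteLine(item); // Apple, Mango, Banana on separate lines
        }
    }
}
```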

Regex Options

Most of the methods discussed above are overloaded to take a parameter of type RegexOptions enumeration. As the name suggests, the RegexOptions enumeration is used to indicate certain configuration options to the regular expression engine during the pattern matching process. The following table lists some of the important options of RegexOptions enumeration:

Option         Description
-----------    -----------
IgnoreCase     Indicates that the pattern matching operation should ignore character casing.
Multiline      Indicates that ^ and $ characters are to be applied to the beginning and end of each line and not just the beginning and end of the entire source string.
Singleline     Indicates that dot (.) should match every character, including a newline character.
RightToLeft    Indicates that the pattern matching will be performed from right to left instead of left to right in a source string.
Compiled       Indicates that the regular expression is to be converted to MSIL code and not to regular expression internal instructions.

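Just to illustrate how the RegexOptions enumeration can be used, write the following code in the Main() method and observe the difference the RegexOptions value makes. The example assumes that source holds a string such as Hello World, in which the word is cased differently from the pattern.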

bool success1 = Regex.IsMatch(source, "hello");
Console.WriteLine("String found? {0}",success1);
bool success2 = Regex.IsMatch(source, "hello", RegexOptions.IgnoreCase);
Console.WriteLine("String found? {0}", success2);

As you can see, the second call to the IsMatch() method makes use of the RegexOptions enumeration and specifies that case should be ignored during pattern matching. If you observe the output of the above code (see below) you will find that the IsMatch() call without any RegexOptions returns false whereas the call with RegexOptions.IgnoreCase returns true.

Figure 5: IsMatch() method without RegexOptions returns false; with RegexOptions.IgnoreCase returns true

Note:
You can combine multiple RegexOptions values like this:

bool success2 = Regex.IsMatch(source, "hello", RegexOptions.IgnoreCase | RegexOptions.Compiled);
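The remaining options from the table can be observed in the same way. The following sketch contrasts each call with and without the option for Multiline, Singleline and RightToLeft; the input strings, the class name, and the helper CountWholeLineWords are invented for this illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RegexOptionsDemo
{
    // Hypothetical helper: counts matches of ^\w+$ under the given options.
    public static int CountWholeLineWords(string text, RegexOptions options)
    {
        return Regex.Matches(text, @"^\w+$", options).Count;
    }

    public static void Main()
    {
        string lines = "one\ntwo";

        // Without Multiline, ^ and $ anchor to the whole string, so nothing matches.
        Console.WriteLine(CountWholeLineWords(lines, RegexOptions.None));      // 0
        // With Multiline, each line is matched on its own.
        Console.WriteLine(CountWholeLineWords(lines, RegexOptions.Multiline)); // 2

        // Without Singleline, dot does not match the newline character.
        Console.WriteLine(Regex.IsMatch("a\nb", "a.b"));                          // False
        Console.WriteLine(Regex.IsMatch("a\nb", "a.b", RegexOptions.Singleline)); // True

        // RightToLeft starts scanning at the end of the input.
        Console.WriteLine(Regex.Match("abc123def456", @"\d+").Value);                           // 123
        Console.WriteLine(Regex.Match("abc123def456", @"\d+", RegexOptions.RightToLeft).Value); // 456
    }
}
```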

Performance Considerations

As mentioned earlier, the Regex class provides static as well as instance methods for pattern matching. The static methods accept the source string and the pattern as parameters, whereas the instance methods accept only the source string (since the pattern is specified while creating the instance itself). The following code fragment makes it clear:

//Using static method
bool success = Regex.IsMatch(source, pattern);
//Using instance method
Regex ex = new Regex(pattern);
success = ex.IsMatch(source);

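When you use static methods, the regular expression engine caches the parsed patterns (the Regex.CacheSize property controls how many are kept; the default is 15), so if the same regular expression is used multiple times the later calls are faster. Patterns passed to the Regex constructor are not placed in this cache, so every new Regex(pattern) call parses the pattern again, even when the pattern is identical. This does not make instance methods the slower choice: Regex instances are immutable (i.e. you cannot change them later) and thread safe, so you can construct one instance, keep it (for example, in a static readonly field), and reuse it for every call. The case to avoid is creating a new instance for the same pattern over and over, such as inside a loop.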

You should also be aware of the impact of RegexOptions.Compiled on performance. If you use the RegexOptions.Compiled option, the regular expression is converted to MSIL code and not to regular expression internal instructions. Matching is usually faster as a result, but the compilation happens when the Regex is created, which increases startup time and memory use, and on the .NET Framework the generated code stays loaded until the application domain is unloaded. So, you should carefully evaluate the use of the RegexOptions.Compiled option; it tends to pay off only for patterns that are used many times.
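One common way to act on both points is to build a single Regex instance and reuse it. The sketch below assumes a hypothetical UrlValidator class; whether RegexOptions.Compiled is worth its construction cost depends on how often the pattern runs, so treat that flag as something to measure rather than as a default.

```csharp
using System;
using System.Text.RegularExpressions;

public static class UrlValidator
{
    // Constructed once, when the class is first used, and reused by every call.
    // RegexOptions.Compiled trades a slower construction for faster matching.
    private static readonly Regex UrlRegex = new Regex(
        @"http(s)?://([\w-]+\.)+[\w-]+(/[\w- ./?%&=]*)?",
        RegexOptions.Compiled);

    // Hypothetical helper: true if the candidate contains a URL.
    public static bool ContainsUrl(string candidate)
    {
        return UrlRegex.IsMatch(candidate);
    }

    public static void Main()
    {
        // Sample inputs, invented for this illustration.
        string[] candidates = { "https://www.codeguru.com/", "not a url" };

        foreach (string candidate in candidates)
        {
            // The pattern is neither re-parsed nor re-compiled inside the loop.
            Console.WriteLine("{0} -> {1}", candidate, ContainsUrl(candidate));
        }
    }
}
```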

Summary

Regular expressions provide a standard and powerful way of pattern matching. The Regex class represents the .NET Framework's regular expression engine. The methods of the Regex class are exposed as static as well as instance methods. These methods allow you to perform search, replace and splitting operations on input strings. The behavior of the regular expression engine can be configured with the help of the RegexOptions enumeration.



About the Author

Bipin Joshi

Bipin Joshi is a blogger and writes about apparently unrelated topics - Yoga & technology! A former Software Consultant by profession, Bipin has been programming since 1995 and has been working with the .NET framework ever since its inception. He has authored or co-authored half a dozen books and numerous articles on .NET technologies. He has also penned a few books on Yoga. He was a well known technology author, trainer and an active member of Microsoft developer community before he decided to take a backseat from the mainstream IT circle and dedicate himself completely to spiritual path. Having embraced Yoga way of life he now codes for fun and writes on his blogs. He can also be reached there.
