An alternative Regular Expression Class

This is another regular expression library. Like Zafir's, it is based on the work of Henry Spencer. I started using it a long time ago and called my class Regexp (rather than CRegExp). I actually prefer Zafir's name, but I have too much code using the other name to want to change it, so for now my class is called Regexp (change it if you like).

So why put up another version, I hear you ask? Well, the two classes took the same base code and then developed to solve different problems. CRegExp is geared to search and replace operations, whereas Regexp was written to simplify tokenisation. I wanted a class that could be given a 'program' and, from that, return specific substrings from its input. Regular expressions may not be the fastest way to parse input (though with careful anchoring they can be made to fail quickly when they are going to fail), but once you have a working library they do allow for fairly rapid coding. On the whole this is good enough: worry about making it faster once you have it working and actually know that your optimization effort won't go unnoticed.

For example:

Regexp re( "^[\t ]*(.*)[\t ]*\\((.*)\\)" );
CString str( "!kelly (Kelly)\n" );
CString name, addr;

// The braces matter: without them only the first assignment is guarded by the if.
if ( re.Match( str ) && re.SubStrings() == 2 )
{
	name = re[2];
	addr = re[1];
}

Will give:

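name == "Kelly" and addr == "!kelly " (note the trailing space: field 1 is greedy, so it takes the space in front of the parenthesis before the second [\t ]* gets a chance to).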

If you decompose the regular expression you get:

^ Beginning of line anchor.
[\t ]* Any amount (that is zero or more characters) of tabs or spaces.
(.*) Field 1: A tagged expression matching any string of characters – this will be the longest string that will still allow the rest of the pattern to match.
[\t ]* Any amount of tabs or spaces.
\\( An escaped open parenthesis. The double backslash is a C/C++ convention: the backslash is the escape character there too, and we want a literal backslash to be passed through to the regular expression code. If the user were typing this sort of thing into your program they would only enter one backslash. We escape the parenthesis so that it doesn’t get interpreted as a regular expression special character.
(.*) Field 2: A tagged expression matching any string of characters.
\\) An escaped closing parenthesis.

BTW: the phrase tagged regular expression refers to any part of the regular expression that is surrounded by parentheses and is therefore accessible as a separate substring after a match has been made. See the Regular Expression Syntax section below for more information.

In English, we are looking for two fields. The first is all characters from the start of the line (after any leading white space) up to the open parenthesis; because (.*) is greedy, any white space directly before the parenthesis stays in this field. The second is all characters within the parentheses following the first field.
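The field extraction described above can be tried without the Regexp sources. What follows is only a sketch that swaps in the standard library's std::regex for the Regexp class; the Fields struct and the splitAddress() function are hypothetical names made up for this illustration. One difference worth knowing: in std::regex's default grammar '.' does not match a newline, which does not change the result for this input.

```cpp
#include <regex>
#include <string>

// Hypothetical holder for the two tagged fields (not part of the Regexp library).
struct Fields {
    std::string addr;  // field 1: text before the open parenthesis
    std::string name;  // field 2: text inside the parentheses
};

// Stand-in for Regexp::Match() followed by operator[], built on std::regex.
// Returns false and leaves `out` untouched when the line does not match.
bool splitAddress(const std::string& line, Fields& out) {
    // Same pattern as in the article; "\t" is a literal tab in the C++ string
    // and "\\(" reaches the regex engine as an escaped parenthesis.
    static const std::regex re("^[\t ]*(.*)[\t ]*\\((.*)\\)");
    std::smatch m;
    if (!std::regex_search(line, m, re)) {
        return false;
    }
    out.addr = m[1].str();  // greedy: keeps the space before '('
    out.name = m[2].str();
    return true;
}
```

Called with "!kelly (Kelly)\n", this fills name with "Kelly" and addr with "!kelly " (trailing space included), so trim field 1 afterwards if the white space is unwanted.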

The Class

The library itself comes as two source files, Regexp.cpp and Regexp.h. The header defines the Regexp class with the following members:


A constant defining how many subexpressions the library will support (usually 10). Attempting to use a regular expression with more than this number will generate an error.


Regexp::Regexp()

A boring default constructor. An object built this way must be initialized by assignment before anything useful can be done with it.

Regexp::Regexp( TCHAR * exp, BOOL iCase = 0 )

exp :

The regular expression itself, the format of which is defined later. The success or failure of the compilation can be discovered by using either GetErrorString() or CompiledOK().


iCase :

If TRUE, the regular expression is compiled so that differences in case are ignored when matching.

Regexp::Regexp( const Regexp &r )

Construct a new regular expression taking the compiled form from another Regexp.

const Regexp & Regexp::operator=( const Regexp & r );

Assign Regexp r to the current object.

bool Regexp::Match( const TCHAR * s );

Examine the TCHAR array s with this regular expression, returning true if there is a match. A match updates the state of this Regexp object so that the substrings of the match can be obtained. The 0th substring is the part of s that matched the whole regular expression. The others are the substrings that matched parenthesized expressions within the regular expression, with parenthesized expressions numbered in left-to-right order of their opening parentheses. If a parenthesized expression does not participate in the match at all, its length is 0. It is an error if this Regexp has not been successfully initialized.

int Regexp::SubStrings() const;

Return the number of substrings found after a successful Match().

const CString Regexp::operator[]( unsigned int i ) const;

Return the ith matched substring after a successful Match().

int Regexp::SubStart( unsigned int i ) const;

Return the starting offset of the ith matched substring from the beginning of the TCHAR array used in Match().

int Regexp::SubLength( unsigned int i ) const;

Return the length of the ith matched substring.

Using the same example Regexp as before:

Regexp re( "^[\t ]*(.*)[\t ]*\\((.*)\\)" );
CString str( "!kelly (Kelly)\n" );
if ( re.Match( str ) && re.SubStrings() == 2 )
{
	// Substring 0 is the whole match: "!kelly (Kelly)"
	ASSERT( re.SubStart(0) == 0 );
	ASSERT( re.SubLength(0) == 14 );

	// Field 1 is "!kelly " (the greedy match keeps the trailing space)
	ASSERT( re.SubStart(1) == 0 );
	ASSERT( re.SubLength(1) == 7 );

	// Field 2 is "Kelly"
	ASSERT( re.SubStart(2) == 8 );
	ASSERT( re.SubLength(2) == 5 );
}

CString Regexp::GetReplaceString( LPCTSTR source ) const;

After a successful Match you can retrieve a replacement string as an alternative to building up the various substrings by hand.

Each character in the source string will be copied to the return value except for the following special characters:

&   The complete matched string (sub-string 0).
\1  Sub-string 1
... and so on until...
\9 Sub-string 9

So, taking the now ubiquitous example:

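// The backslashes are doubled for the C++ compiler; the library sees \2 == \1
CString repl = re.GetReplaceString( "\\2 == \\1" );

Will give:

repl == "Kelly == !kelly " (the trailing space is the one captured in sub-string 1)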
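To make the substitution rules concrete, here is a small standalone sketch of the expansion step. It is not the library's implementation: expandReplacement() is a hypothetical helper that takes the already extracted substrings as a vector. It also assumes that a backslash not followed by 1 to 9 is copied through unchanged, which the article does not specify.

```cpp
#include <string>
#include <vector>

// Expand a replacement template against already-extracted substrings.
// subs[0] is the whole match, subs[1]..subs[9] are the tagged fields.
// Hypothetical helper for illustration; not taken from Regexp.cpp.
std::string expandReplacement(const std::string& tmpl,
                              const std::vector<std::string>& subs) {
    std::string out;
    for (std::size_t i = 0; i < tmpl.size(); ++i) {
        const char c = tmpl[i];
        if (c == '&') {
            // '&' stands for the complete matched string (sub-string 0).
            if (!subs.empty()) out += subs[0];
        } else if (c == '\\' && i + 1 < tmpl.size() &&
                   tmpl[i + 1] >= '1' && tmpl[i + 1] <= '9') {
            // "\1".."\9" stand for the corresponding tagged field.
            const std::size_t n = static_cast<std::size_t>(tmpl[i + 1] - '0');
            if (n < subs.size()) out += subs[n];  // a missing field expands to nothing
            ++i;  // skip the digit we just consumed
        } else {
            out += c;  // ordinary character: copied through unchanged
        }
    }
    return out;
}
```

Fed the three substrings from the running example ("!kelly (Kelly)", "!kelly " and "Kelly") and the template \2 == \1, it produces "Kelly == !kelly ".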

As an implementation note: the CRegExp version of a similarly named function returned a pointer to a newly allocated array. Whilst this is efficient, it puts the onus upon the user of the class to delete it (correctly, with delete [] ) once it’s done with. Considering how reference counting is implemented in the MFC CString class, passing CStrings around on the stack isn’t that expensive. The allocation only happens when the string data is initially created, and ownership of the actual string data is handed from one CString instance to another as needed. Finally, when the last CString goes out of scope, the data is deleted. This is efficient, and much more robust than having to keep track of which functions are allocators and which ones are not.

CString Regexp::GetErrorString() const;

Return a description of the most recent error caused on this Regexp. Errors include, but are not limited to, various forms of compilation errors, usually syntax errors, and calling Match when the Regexp hasn’t been initialized correctly (or at all). There are a fair number of these that should never occur if all of the Regexp use comes from your code, but where the user can type in regular expressions that you then have to compile, checking this can be very important.

bool Regexp::CompiledOK() const;

Return the status of the last regular expression compilation.
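When the pattern comes from the user, the compile-then-check step looks roughly like the sketch below. It uses std::regex as a stand-in, which reports a bad pattern by throwing std::regex_error rather than through CompiledOK() and GetErrorString(); compilePattern() is a hypothetical wrapper that turns the exception back into a status flag plus a message.

```cpp
#include <regex>
#include <string>

// Try to compile a user-supplied pattern with std::regex (a stand-in for
// constructing a Regexp and then calling CompiledOK()/GetErrorString()).
// On failure, returns false and stores the library's description in `error`.
bool compilePattern(const std::string& pattern, std::regex& re, std::string& error) {
    try {
        re.assign(pattern);  // throws std::regex_error on a syntax error
        error.clear();
        return true;
    } catch (const std::regex_error& e) {
        error = e.what();    // implementation-defined message text
        return false;
    }
}
```

A caller would show the error text to the user and refuse to run the match until the pattern compiles.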

Regular Expression Syntax

A regular expression is zero or more branches, separated by '|'. It matches anything that matches one of the branches.

A branch is zero or more pieces, concatenated. It matches a match for the first, followed by a match for the second, etc.

A piece is an atom possibly followed by '*', '+', or '?'. An atom followed by '*' matches a sequence of 0 or more matches of the atom. An atom followed by '+' matches a sequence of 1 or more matches of the atom. An atom followed by '?' matches a match of the atom, or the null string.

An atom is a regular expression in parentheses (matching a match for the regular expression), a range (see below), '.' (matching any single character), '^' (matching the null string at the beginning of the input string), '$' (matching the null string at the end of the input string), a '\' followed by a single character (matching that character), or a single character with no other significance (matching that character).

A range is a sequence of characters enclosed in '[]'. It normally matches any single character from the sequence. If the sequence begins with '^', it matches any single character not from the rest of the sequence. If two characters in the sequence are separated by '-', this is shorthand for the full list of ASCII characters between them (e.g. '[0-9]' matches any decimal digit). To include a literal ']' in the sequence, make it the first character (following a possible '^'). To include a literal '-', make it the first or last character.


If a regular expression could match two different parts of the input string, it will match the one which begins earliest. If both begin in the same place but match different lengths, or match the same length in different ways, life gets messier, as follows.

In general, the possibilities in a list of branches are considered in left-to-right order, the possibilities for '*', '+', and '?' are considered longest-first, nested constructs are considered from the outermost in, and concatenated constructs are considered leftmost-first. The match that will be chosen is the one that uses the earliest possibility in the first choice that has to be made. If there is more than one choice, the next will be made in the same manner (earliest possibility) subject to the decision on the first choice. And so forth.

For example, '(ab|a)b*c' could match 'abc' in one of two ways. The first choice is between 'ab' and 'a'; since 'ab' is earlier, and does lead to a successful overall match, it is chosen. Since the 'b' is already spoken for, the 'b*' must match its last possibility--the empty string--since it must respect the earlier choice.

In the particular case where the regular expression does not use '|' and does not apply '*', '+', or '?' to parenthesized subexpressions, the net effect is that the longest possible match will be chosen. So 'ab*', presented with 'xabbbby', will match 'abbbb'. Note that if 'ab*' is tried against 'xabyabbbz', it will match 'ab' just after 'x', due to the begins-earliest rule. (In effect, the decision on where to start the match is the first choice to be made, hence subsequent choices must respect it even if this leads them to less-preferred alternatives.)
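The three examples above can be checked with a few lines of code. This sketch again uses std::regex rather than the Regexp class; its default grammar also tries alternatives left to right and repetitions longest-first, so it gives the same answers for these patterns. The helper names firstMatch() and firstGroup() are invented for the illustration.

```cpp
#include <regex>
#include <string>

// Return the text of the earliest match of `pattern` in `input`,
// or an empty string if there is no match.
std::string firstMatch(const std::string& pattern, const std::string& input) {
    const std::regex re(pattern);
    std::smatch m;
    return std::regex_search(input, m, re) ? m.str(0) : std::string();
}

// Return the text captured by the first parenthesized subexpression,
// or an empty string if there is no match.
std::string firstGroup(const std::string& pattern, const std::string& input) {
    const std::regex re(pattern);
    std::smatch m;
    return std::regex_search(input, m, re) ? m.str(1) : std::string();
}
```

With these, firstMatch("ab*", "xabbbby") gives "abbbb", firstMatch("ab*", "xabyabbbz") gives "ab", and firstGroup("(ab|a)b*c", "abc") gives "ab".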

The Source

The accompanying archive contains the regexp library, as well as two separate test programs.

The first (originally enough called Test1) is a C++ port of the original test program that came with the C code. I’ve updated it to use the C++ constructs that the new library exposes. It acts as a useful sanity check and regression test when I’ve been modifying the source.

The second test is much simpler and uses the library's substring extraction function to chop fields out of an email header. This is less of a test program and more of a simple sample.

Download Source.

A Note about Character Size

This code (and the samples) works and has been tested pretty thoroughly under Single Byte Character Sets (SBCS) and UNICODE. It will NOT work under Multi Byte Character Sets (MBCS), though it will compile, which is very misleading. The problem (for anyone interested in fixing it) is that the internal representation of the ‘program’ requires a fixed size character. The code manipulates this using memcpy() and memmove() without any knowledge of whether a particular element in its array is some internal code or a character. Making this use variable width characters would be a real pain, since much more of the code would have to decode the program itself in order to determine whether a specific point in the program was looking at an operator or part of a character. Certainly this is doable, but it is more work than I want right now. The code works under UNICODE and that’s good enough for me. BTW, even if the code is compiled with _MBCS it will only fail when it’s actually presented with multi-byte text; it’ll work just fine with 8-bit ASCII.



  • It is very good but...

    Posted by DatVT81 on 05/15/2005 11:14pm

    I have used this class and it worked very well. But when I try to parse with more syntax (for example \d, \s, ...) it does not return an output string value. What should I do? Can you teach me? Thanks,

  • Inconsistent dll linkage

    Posted by Legacy on 09/17/2002 12:00am

    Originally posted by: Jim Willsher

    Can anyone tell me why I get 16 errors upon compilation, all referring to inconsistent dll linkage?:

    C:\Test\Regexp.cpp(1454) : warning C4273: 'Regexp::Regexp' : inconsistent dll linkage. dllexport assumed.
    C:\Test\Regexp.cpp(1460) : warning C4273: 'Regexp::Regexp' : inconsistent dll linkage. dllexport assumed.
    C:\Test\Regexp.cpp(1466) : warning C4273: 'Regexp::Regexp' : inconsistent dll linkage. dllexport assumed.
    C:\Test\Regexp.cpp(1475) : warning C4273: '=' : inconsistent dll linkage. dllexport assumed.
    C:\Test\Regexp.cpp(1492) : warning C4273: 'Regexp::~Regexp' : inconsistent dll linkage. dllexport assumed.
    C:\Test\Regexp.cpp(1498) : warning C4273: 'Match' : inconsistent dll linkage. dllexport assumed.
    C:\Test\Regexp.cpp(1525) : warning C4273: 'GetReplaceString' : inconsistent dll linkage. dllexport assumed.
    C:\Test\Regexp.cpp(1535) : warning C4273: 'SubStrings' : inconsistent dll linkage. dllexport assumed.
    C:\Test\Regexp.cpp(1546) : warning C4273: 'SubStart' : inconsistent dll linkage. dllexport assumed.
    C:\Test\Regexp.cpp(1557) : warning C4273: 'SubLength' : inconsistent dll linkage. dllexport assumed.
    C:\Test\Regexp.cpp(1571) : warning C4273: 'CompiledOK' : inconsistent dll linkage. dllexport assumed.
    C:\Test\Regexp.cpp(1588) : warning C4273: 'safeIndex' : inconsistent dll linkage. dllexport assumed.
    C:\Test\Regexp.cpp(1593) : warning C4273: '[]' : inconsistent dll linkage. dllexport assumed.
    C:\Test\Regexp.cpp(1729) : warning C4273: 'GetErrorString' : inconsistent dll linkage. dllexport assumed.
    C:\Test\Regexp.cpp(1736) : warning C4273: 'ClearErrorString' : inconsistent dll linkage. dllexport assumed.

    I can't see anything that declares dllexport linkage in the code.

    Many thanks,


    • inconsistent dll linkage

      Posted by avsam on 11/01/2005 12:05am

      Search for AFX_EXT_CLASS in the regexp.h header file and remove it from the class declaration (class AFX_EXT_CLASS Regexp).
