Using Regular Expressions for Search/Replace


Normally, when you search for a sub-string in a string, the match should be exact. So if we search for a sub-string "abc" then the string being searched should contain these exact letters in the same sequence for a match to be found. We can extend this kind of search to a case insensitive search where the sub-string "abc" will find strings like "Abc", "ABC" etc. That is, the case is ignored but the sequence of the letters should be exactly the same. Sometimes, a case insensitive search is also not enough. For example, if we want to search for numeric digit, then we basically end up searching for each digit independantly. This is where regular expressions come in to our help.

Regular expressions are text patterns that are used for string matching. Regular expressions are strings that contains a mix of plain text and special characters to indicate what kind of matching to do. Here's a very brief turorial on using regular expressions before we move on to the code for handling regular expressions.

Suppose, we are looking for a numeric digit then the regular expression we would search for is "[0-9]". The brackets indicate that the character being compared should match any one of the characters enclosed within the bracket. The dash (-) between 0 and 9 indicates that it is a range from 0 to 9. Therefore, this regular expression will match any character between 0 and 9, that is, any digit. If we want to search for a special character literally we must use a backslash before the special character. For example, the single character regular expression "\*" matches a single asterisk. In the table below the special characters are briefly described.

Character

Description

^

Beginning of the string. The expression "^A" will match an A only at the beginning of the string.

^

The caret (^) immediately following the left-bracket ([) has a different meaning. It is used to exclude the remaining characters within brackets from matching the target string. The expression "[^0-9]" indicates that the target character should not be a digit.

$

The dollar sign ($) will match the end of the string. The expression "abc$" will match the sub-string "abc" only if it is at the end of the string.

|

The alternation character (|) allows either expression on its side to match the target string. The expression "a|b" will match a as well as b.

.

The dot (.) will match any character.

*

The asterix (*) indicates that the character to the left of the asterix in the expression should match 0 or more times.

+

The plus (+) is similar to asterix but there should be at least one match of the character to the left of the + sign in the expression.

?

The question mark (?) matches the character to its left 0 or 1 times.

()

The parenthesis affects the order of pattern evaluation and also serves as a tagged expression that can be used when replacing the matched sub-string with another expression.

[]

Brackets ([ and ]) enclosing a set of characters indicates that any of the enclosed characters may match the target character.

The parenthesis, besides affecting the evaluation order of the regular expression, also serves as tagged expression which is something like a temporary memory. This memory can then be used when we want to replace the found expression with a new expression. The replace expression can specify a & character which means that the & represents the sub-string that was found. So, if the sub-string that matched the regular expression is "abcd", then a replace expression of "xyz&xyz" will change it to "xyzabcdxyz". The replace expression can also be expressed as "xyz\0xyz". The "\0" indicates a tagged expression representing the entire sub-string that was matched. Similarly we can have other tagged expression represented by "\1", "\2" etc. Note that although the tagged expression 0 is always defined, the tagged expression 1,2 etc. are only defined if the regular expression used in the search had enough sets of parenthesis. Here are few examples.

String

Search

Replace

Result

Mr.

(Mr)(\.)

\1s\2

Mrs.

abc

(a)b(c)

&-\1-\2

abc-a-c

bcd

(a|b)c*d

&-\1

bcd-b

abcde

(.*)c(.*)

&-\1-\2

abcde-ab-de

cde

(ab|cd)e

&-\1

cde-cd

The code for regular expression search is given below. This code was derived from work done by Henry Spencer. Besides adding a C++ wrapper around C code, a couple of functions were added to enable search and replace operations.

////////////////////////////////////////////////////////////////////////
// RegExp.h
//
// This code has been derived from work by Henry Spencer. 
// The main changes are
// 1. All char variables and functions have been changed to TCHAR
//    counterparts
// 2. Added GetFindLen() & GetReplaceString() to enable search
//    and replace operations.
// 3. And of course, added the C++ Wrapper
//
// The original copyright notice follows:
//
// Copyright (c) 1986, 1993, 1995 by University of Toronto.
// Written by Henry Spencer.  Not derived from licensed software.
//
// Permission is granted to anyone to use this software for any
// purpose on any computer system, and to redistribute it in any way,
// subject to the following restrictions:
//
// 1. The author is not responsible for the consequences of use of
// this software, no matter how awful, even if they arise
// from defects in it.
//
// 2. The origin of this software must not be misrepresented, either
// by explicit claim or by omission.
//
// 3. Altered versions must be plainly marked as such, and must not
// be misrepresented (by explicit claim or omission) as being
// the original software.
//
// 4. This notice must not be removed or altered.
/////////////////////////////////////////////////////////////////////////////

#define NSUBEXP  10


class CRegExp
{
public:
	CRegExp();
	~CRegExp();

	CRegExp *RegComp( const TCHAR *re );
	int RegFind(const TCHAR *str);
	TCHAR* GetReplaceString( const TCHAR* sReplaceExp );
	int GetFindLen()
	{
		if( startp[0] == NULL || endp[0] == NULL )
			return 0;

		return endp[0] - startp[0];
	}


private:
	TCHAR *regnext(TCHAR *node);
	void reginsert(TCHAR op, TCHAR *opnd);

	int regtry(TCHAR *string);
	int regmatch(TCHAR *prog);
	size_t regrepeat(TCHAR *node);
	TCHAR *reg(int paren, int *flagp);
	TCHAR *regbranch(int *flagp);
	void regtail(TCHAR *p, TCHAR *val);
	void regoptail(TCHAR *p, TCHAR *val);
	TCHAR *regpiece(int *flagp);
	TCHAR *regatom(int *flagp);

	// Inline functions
private:
	TCHAR OP(TCHAR *p) {return *p;};
	TCHAR *OPERAND( TCHAR *p) {return (TCHAR*)((short *)(p+1)+1); };

	// regc - emit (if appropriate) a byte of code
	void regc(TCHAR b)
	{
		if (bEmitCode)
			*regcode++ = b;
		else
			regsize++;
	};

	// regnode - emit a node
	TCHAR *	regnode(TCHAR op)
	{
		if (!bEmitCode) {
			regsize += 3;
			return regcode;
		}

		*regcode++ = op;
		*regcode++ = _T('\0');		/* Null next pointer. */
		*regcode++ = _T('\0');

		return regcode-3;
	};


private:
	BOOL bEmitCode;
	BOOL bCompiled;
	TCHAR *sFoundText;

	TCHAR *startp[NSUBEXP];
	TCHAR *endp[NSUBEXP];
	TCHAR regstart;		// Internal use only. 
	TCHAR reganch;		// Internal use only. 
	TCHAR *regmust;		// Internal use only. 
	int regmlen;		// Internal use only. 
	TCHAR *program;		// Unwarranted chumminess with compiler. 

	TCHAR *regparse;	// Input-scan pointer. 
	int regnpar;		// () count. 
	TCHAR *regcode;		// Code-emit pointer; ®dummy = don't. 
	TCHAR regdummy[3];	// NOTHING, 0 next ptr 
	long regsize;		// Code size. 

	TCHAR *reginput;	// String-input pointer. 
	TCHAR *regbol;		// Beginning of input, for ^ check. 
};




////////////////////////////////////////////////////////////////////////////////
// RegExp.cpp
////////////////////////////////////////////////////////////////////////////////

#include "stdafx.h"
#include "RegExp.h"


// definition	number	opnd?	meaning 
#define	END		0		// no	End of program. 
#define	BOL		1		// no	Match beginning of line. 
#define	EOL		2		// no	Match end of line. 
#define	ANY		3		// no	Match any character. 
#define	ANYOF	4		// str	Match any of these. 
#define	ANYBUT	5		// str	Match any but one of these. 
#define	BRANCH	6		// node	Match this, or the next..\&. 
#define	BACK	7		// no	"next" ptr points backward. 
#define	EXACTLY	8		// str	Match this string. 
#define	NOTHING	9		// no	Match empty string. 
#define	STAR	10		// node	Match this 0 or more times. 
#define	PLUS	11		// node	Match this 1 or more times. 
#define	OPEN	20		// no	Sub-RE starts here. 
						//		OPEN+1 is number 1, etc. 
#define	CLOSE	30		// no	Analogous to OPEN. 

// Utility definitions.
 
#define	FAIL(m)		{ regerror(m); return(NULL); }
#define	ISREPN(c)	((c) == _T('*') || (c) == _T('+') || (c) == _T('?'))
#define	META		"^$.[()|?+*\\"

// Flags to be passed up and down.
 
#define	HASWIDTH	01	// Known never to match null string. 
#define	SIMPLE		02	// Simple enough to be STAR/PLUS operand. 
#define	SPSTART		04	// Starts with * or +. 
#define	WORST		0	// Worst case. 


CRegExp::CRegExp()
{
	bCompiled = FALSE;
	program = NULL;
	sFoundText = NULL;

	for( int i = 0; i < NSUBEXP; i++ )
	{
		startp[i] = NULL;
		endp[i] = NULL;
	}
}

CRegExp::~CRegExp()
{
	delete program;
	delete sFoundText;
}


CRegExp* CRegExp::RegComp(const TCHAR *exp)
{
	TCHAR *scan;
	int flags;

	if (exp == NULL)
		return NULL;
	
	bCompiled = TRUE;

	// First pass: determine size, legality. 
	bEmitCode = FALSE;
	regparse = (TCHAR *)exp;
	regnpar = 1;
	regsize = 0L;
	regdummy[0] = NOTHING;
	regdummy[1] = regdummy[2] = 0;
	regcode = regdummy;
	if (reg(0, &flags) == NULL)
		return(NULL);

	// Allocate space. 
	delete program;
	program = new TCHAR[regsize];
	memset( program, 0, regsize * sizeof(TCHAR) );

	if (program == NULL)
		return NULL;

	// Second pass: emit code. 
	bEmitCode = TRUE;
	regparse = (TCHAR *)exp;
	regnpar = 1;
	regcode = program;
	if (reg(0, &flags) == NULL)
		return NULL;

	// Dig out information for optimizations. 
	regstart = _T('\0');		// Worst-case defaults. 
	reganch = 0;
	regmust = NULL;
	regmlen = 0;
	scan = program;		// First BRANCH. 
	if (OP(regnext(scan)) == END) 
	{	
		// Only one top-level choice. 
		scan = OPERAND(scan);

		// Starting-point info. 
		if (OP(scan) == EXACTLY)
			regstart = *OPERAND(scan);
		else if (OP(scan) == BOL)
			reganch = 1;

		// If there's something expensive in the r.e., find the
		// longest literal string that must appear and make it the
		// regmust.  Resolve ties in favor of later strings, since
		// the regstart check works with the beginning of the r.e.
		// and avoiding duplication strengthens checking.  Not a
		// strong reason, but sufficient in the absence of others.
		 
		if (flags&SPSTART) 
		{
			char *longest = NULL;
			size_t len = 0;

			for (; scan != NULL; scan = regnext(scan))
				if (OP(scan) == EXACTLY && _tcslen(OPERAND(scan)) >= len) 
				{
					longest = OPERAND(scan);
					len = _tcslen(OPERAND(scan));
				}
			regmust = longest;
			regmlen = (int)len;
		}
	}

	return this;
}

// reg - regular expression, i.e. main body or parenthesized thing
//
// Caller must absorb opening parenthesis.
//
// Combining parenthesis handling with the base level of regular expression
// is a trifle forced, but the need to tie the tails of the branches to what
// follows makes it hard to avoid.
 


TCHAR *CRegExp::reg(int paren, int *flagp)
{
	char *ret;
	char *br;
	char *ender;
	int parno;
	int flags;

	*flagp = HASWIDTH;	// Tentatively. 

	if (paren) 
	{
		// Make an OPEN node. 
		if (regnpar >= NSUBEXP)
		{
			TRACE1("Too many (). NSUBEXP is set to %d\n", NSUBEXP );
			return NULL;
		}
		parno = regnpar;
		regnpar++;
		ret = regnode(OPEN+parno);
	}

	// Pick up the branches, linking them together. 
	br = regbranch(&flags);
	if (br == NULL)
		return(NULL);
	if (paren)
		regtail(ret, br);	// OPEN -> first. 
	else
		ret = br;
	*flagp &= ~(~flags&HASWIDTH);	// Clear bit if bit 0. 
	*flagp |= flags&SPSTART;
	while (*regparse == _T('|')) {
		regparse++;
		br = regbranch(&flags);
		if (br == NULL)
			return(NULL);
		regtail(ret, br);	// BRANCH -> BRANCH. 
		*flagp &= ~(~flags&HASWIDTH);
		*flagp |= flags&SPSTART;
	}

	// Make a closing node, and hook it on the end. 
	ender = regnode((paren) ? CLOSE+parno : END);
	regtail(ret, ender);

	// Hook the tails of the branches to the closing node. 
	for (br = ret; br != NULL; br = regnext(br))
		regoptail(br, ender);

	// Check for proper termination. 
	if (paren && *regparse++ != _T(')')) 
	{
		TRACE0("unterminated ()\n");
		return NULL;
	} 
	else if (!paren && *regparse != _T('\0')) 
	{
		if (*regparse == _T(')')) 
		{
			TRACE0("unmatched ()\n");
			return NULL;
		} 
		else
		{
			TRACE0("internal error: junk on end\n");
			return NULL;
		}
		// NOTREACHED 
	}

	return(ret);
}




//
// regbranch - one alternative of an | operator
//
// Implements the concatenation operator.
 
TCHAR *CRegExp::regbranch(int *flagp)
{
	TCHAR *ret;
	TCHAR *chain;
	TCHAR *latest;
	int flags;
	int c;

	*flagp = WORST;				// Tentatively. 

	ret = regnode(BRANCH);
	chain = NULL;
	while ((c = *regparse) != _T('\0') && c != _T('|') && c != _T(')')) {
		latest = regpiece(&flags);
		if (latest == NULL)
			return(NULL);
		*flagp |= flags&HASWIDTH;
		if (chain == NULL)		// First piece. 
			*flagp |= flags&SPSTART;
		else
			regtail(chain, latest);
		chain = latest;
	}
	if (chain == NULL)			// Loop ran zero times. 
		(void) regnode(NOTHING);

	return(ret);
}

//
// regpiece - something followed by possible [*+?]
//
// Note that the branching code sequences used for ? and the general cases
// of * and + are somewhat optimized:  they use the same NOTHING node as
// both the endmarker for their branch list and the body of the last branch.
// It might seem that this node could be dispensed with entirely, but the
// endmarker role is not redundant.
 
TCHAR *CRegExp::regpiece(int *flagp)
{
	TCHAR *ret;
	TCHAR op;
	TCHAR *next;
	int flags;

	ret = regatom(&flags);
	if (ret == NULL)
		return(NULL);

	op = *regparse;
	if (!ISREPN(op)) {
		*flagp = flags;
		return(ret);
	}

	if (!(flags&HASWIDTH) && op != _T('?'))
	{
		TRACE0("*+ operand could be empty\n");
		return NULL;
	}

	switch (op) {
	case _T('*'):	*flagp = WORST|SPSTART;			break;
	case _T('+'):	*flagp = WORST|SPSTART|HASWIDTH;	break;
	case _T('?'):	*flagp = WORST;				break;
	}

	if (op == _T('*') && (flags&SIMPLE))
		reginsert(STAR, ret);
	else if (op == _T('*')) {
		// Emit x* as (x&|), where & means "self". 
		reginsert(BRANCH, ret);		// Either x 
		regoptail(ret, regnode(BACK));	// and loop 
		regoptail(ret, ret);		// back 
		regtail(ret, regnode(BRANCH));	// or 
		regtail(ret, regnode(NOTHING));	// null. 
	} else if (op == _T('+') && (flags&SIMPLE))
		reginsert(PLUS, ret);
	else if (op == _T('+')) {
		// Emit x+ as x(&|), where & means "self". 
		next = regnode(BRANCH);		// Either 
		regtail(ret, next);
		regtail(regnode(BACK), ret);	// loop back 
		regtail(next, regnode(BRANCH));	// or 
		regtail(ret, regnode(NOTHING));	// null. 
	} else if (op == _T('?')) {
		// Emit x? as (x|) 
		reginsert(BRANCH, ret);		// Either x 
		regtail(ret, regnode(BRANCH));	// or 
		next = regnode(NOTHING);		// null. 
		regtail(ret, next);
		regoptail(ret, next);
	}
	regparse++;
	if (ISREPN(*regparse))
	{
		TRACE0("nested *?+\n");
		return NULL;
	}

	return(ret);
}

//
// regatom - the lowest level
//
// Optimization:  gobbles an entire sequence of ordinary characters so that
// it can turn them into a single node, which is smaller to store and
// faster to run.  Backslashed characters are exceptions, each becoming a
// separate node; the code is simpler that way and it's not worth fixing.
 
TCHAR *CRegExp::regatom(int *flagp)
{
	TCHAR *ret;
	int flags;

	*flagp = WORST;		// Tentatively. 

	switch (*regparse++) {
	case _T('^'):
		ret = regnode(BOL);
		break;
	case _T('$'):
		ret = regnode(EOL);
		break;
	case _T('.'):
		ret = regnode(ANY);
		*flagp |= HASWIDTH|SIMPLE;
		break;
	case _T('['): {
		int range;
		int rangeend;
		int c;

		if (*regparse == _T('^')) {	// Complement of range. 
			ret = regnode(ANYBUT);
			regparse++;
		} else
			ret = regnode(ANYOF);
		if ((c = *regparse) == _T(']') || c == _T('-')) {
			regc(c);
			regparse++;
		}
		while ((c = *regparse++) != _T('\0') && c != _T(']')) {
			if (c != _T('-'))
				regc(c);
			else if ((c = *regparse) == _T(']') || c == _T('\0'))
				regc(_T('-'));
			else 
			{
				range = (unsigned) (TCHAR)*(regparse-2);
				rangeend = (unsigned) (TCHAR)c;
				if (range > rangeend)
				{
					TRACE0("invalid [] range\n");
					return NULL;
				}
				for (range++; range <= rangeend; range++)
					regc(range);
				regparse++;
			}
		}
		regc(_T('\0'));
		if (c != _T(']'))
		{
			TRACE0("unmatched []\n");
			return NULL;
		}
		*flagp |= HASWIDTH|SIMPLE;
		break;
		}
	case _T('('):
		ret = reg(1, &flags);
		if (ret == NULL)
			return(NULL);
		*flagp |= flags&(HASWIDTH|SPSTART);
		break;
	case _T('\0'):
	case _T('|'):
	case _T(')'):
		// supposed to be caught earlier 
		TRACE0("internal error: \\0|) unexpected\n");
		return NULL;
		break;
	case _T('?'):
	case _T('+'):
	case _T('*'):
		TRACE0("?+* follows nothing\n");
		return NULL;
		break;
	case _T('\\'):
		if (*regparse == _T('\0'))
		{
			TRACE0("trailing \\\n");
			return NULL;
		}
		ret = regnode(EXACTLY);
		regc(*regparse++);
		regc(_T('\0'));
		*flagp |= HASWIDTH|SIMPLE;
		break;
	default: {
		size_t len;
		TCHAR ender;

		regparse--;
		len = _tcscspn(regparse, META);
		if (len == 0)
		{
			TRACE0("internal error: strcspn 0\n");
			return NULL;
		}
		ender = *(regparse+len);
		if (len > 1 && ISREPN(ender))
			len--;		// Back off clear of ?+* operand. 
		*flagp |= HASWIDTH;
		if (len == 1)
			*flagp |= SIMPLE;
		ret = regnode(EXACTLY);
		for (; len > 0; len--)
			regc(*regparse++);
		regc(_T('\0'));
		break;
		}
	}

	return(ret);
}



// reginsert - insert an operator in front of already-emitted operand
//
// Means relocating the operand.
 
void CRegExp::reginsert(TCHAR op, TCHAR *opnd)
{
	TCHAR *place;

	if (!bEmitCode) {
		regsize += 3;
		return;
	}

	(void) memmove(opnd+3, opnd, (size_t)((regcode - opnd)*sizeof(TCHAR)));
	regcode += 3;

	place = opnd;		// Op node, where operand used to be. 
	*place++ = op;
	*place++ = _T('\0');
	*place++ = _T('\0');
}

//
// regtail - set the next-pointer at the end of a node chain
 
void CRegExp::regtail(TCHAR *p, TCHAR *val)
{
	TCHAR *scan;
	TCHAR *temp;
//	int offset;

	if (!bEmitCode)
		return;

	// Find last node. 
	for (scan = p; (temp = regnext(scan)) != NULL; scan = temp)
		continue;

	*((short *)(scan+1)) = (OP(scan) == BACK) ? scan - val : val - scan;
}


// regoptail - regtail on operand of first argument; nop if operandless
 
void CRegExp::regoptail(TCHAR *p, TCHAR *val)
{
	// "Operandless" and "op != BRANCH" are synonymous in practice. 
	if (!bEmitCode || OP(p) != BRANCH)
		return;
	regtail(OPERAND(p), val);
}


// RegFind	- match a regexp against a string
// Returns	- Returns position of regexp or -1
//			  if regular expression not found
// Note		- The regular expression should have been
//			  previously compiled using RegComp
int CRegExp::RegFind(const TCHAR *str)
{
	TCHAR *string = (TCHAR *)str;	// avert const poisoning 
	TCHAR *s;

	// Delete any previously stored found string
	delete sFoundText;
	sFoundText = NULL;

	// Be paranoid. 
	if(string == NULL) 
	{
		TRACE0("NULL argument to regexec\n");
		return(-1);
	}

	// Check validity of regex
	if (!bCompiled) 
	{
		TRACE0("No regular expression provided yet.\n");
		return(-1);
	}

	// If there is a "must appear" string, look for it. 
	if (regmust != NULL && _tcsstr(string, regmust) == NULL)
		return(-1);

	// Mark beginning of line for ^
	regbol = string;

	// Simplest case:  anchored match need be tried only once. 
	if (reganch)
	{
		if( regtry(string) )
		{
			// Save the found substring in case we need it
			sFoundText = new TCHAR[GetFindLen()+1];
			sFoundText[GetFindLen()] = _T('\0');
			_tcsncpy(sFoundText, string, GetFindLen() );

			return 0;
		}
		//String not found
		return -1;
	}

	// Messy cases:  unanchored match. 
	if (regstart != _T('\0')) 
	{
		// We know what TCHAR it must start with. 
		for (s = string; s != NULL; s = _tcschr(s+1, regstart))
			if (regtry(s))
			{
				int nPos = s-str;

				// Save the found substring in case we need it later
				sFoundText = new TCHAR[GetFindLen()+1];
				sFoundText[GetFindLen()] = _T('\0');
				_tcsncpy(sFoundText, s, GetFindLen() );

				return nPos;
			}
		return -1;
	} 
	else 
	{
		// We don't -- general case
		for (s = string; !regtry(s); s++)
			if (*s == _T('\0'))
				return(-1);

		int nPos = s-str;

		// Save the found substring in case we need it later
		sFoundText = new TCHAR[GetFindLen()+1];
		sFoundText[GetFindLen()] = _T('\0');
		_tcsncpy(sFoundText, s, GetFindLen() );

		return nPos;
	}
	// NOTREACHED 
}


// regtry - try match at specific point
 
int	CRegExp::regtry(TCHAR *string)
{
	int i;
	TCHAR **stp;
	TCHAR **enp;

	reginput = string;

	stp = startp;
	enp = endp;
	for (i = NSUBEXP; i > 0; i--) 
	{
		*stp++ = NULL;
		*enp++ = NULL;
	}
	if (regmatch(program)) 
	{
		startp[0] = string;
		endp[0] = reginput;
		return(1);
	} 
	else
		return(0);
}

// regmatch - main matching routine
//
// Conceptually the strategy is simple:  check to see whether the current
// node matches, call self recursively to see whether the rest matches,
// and then act accordingly.  In practice we make some effort to avoid
// recursion, in particular by going through "ordinary" nodes (that don't
// need to know whether the rest of the match failed) by a loop instead of
// by recursion.
 
int	CRegExp::regmatch(TCHAR *prog)
{
	TCHAR *scan;	// Current node. 
	TCHAR *next;		// Next node. 

	for (scan = prog; scan != NULL; scan = next) {
		next = regnext(scan);

		switch (OP(scan)) {
		case BOL:
			if (reginput != regbol)
				return(0);
			break;
		case EOL:
			if (*reginput != _T('\0'))
				return(0);
			break;
		case ANY:
			if (*reginput == _T('\0'))
				return(0);
			reginput++;
			break;
		case EXACTLY: {
			size_t len;
			TCHAR *const opnd = OPERAND(scan);

			// Inline the first character, for speed. 
			if (*opnd != *reginput)
				return(0);
			len = _tcslen(opnd);
			if (len > 1 && _tcsncmp(opnd, reginput, len) != 0)
				return(0);
			reginput += len;
			break;
			}
		case ANYOF:
			if (*reginput == _T('\0') ||
					_tcschr(OPERAND(scan), *reginput) == NULL)
				return(0);
			reginput++;
			break;
		case ANYBUT:
			if (*reginput == _T('\0') ||
					_tcschr(OPERAND(scan), *reginput) != NULL)
				return(0);
			reginput++;
			break;
		case NOTHING:
			break;
		case BACK:
			break;
		case OPEN+1: case OPEN+2: case OPEN+3:
		case OPEN+4: case OPEN+5: case OPEN+6:
		case OPEN+7: case OPEN+8: case OPEN+9: {
			const int no = OP(scan) - OPEN;
			TCHAR *const input = reginput;

			if (regmatch(next)) {
				// Don't set startp if some later
				// invocation of the same parentheses
				// already has.
				 
				if (startp[no] == NULL)
					startp[no] = input;
				return(1);
			} else
				return(0);
			break;
			}
		case CLOSE+1: case CLOSE+2: case CLOSE+3:
		case CLOSE+4: case CLOSE+5: case CLOSE+6:
		case CLOSE+7: case CLOSE+8: case CLOSE+9: {
			const int no = OP(scan) - CLOSE;
			TCHAR *const input = reginput;

			if (regmatch(next)) {
				// Don't set endp if some later
				// invocation of the same parentheses
				// already has.
				 
				if (endp[no] == NULL)
					endp[no] = input;
				return(1);
			} else
				return(0);
			break;
			}
		case BRANCH: {
			TCHAR *const save = reginput;

			if (OP(next) != BRANCH)		// No choice. 
				next = OPERAND(scan);	// Avoid recursion. 
			else {
				while (OP(scan) == BRANCH) {
					if (regmatch(OPERAND(scan)))
						return(1);
					reginput = save;
					scan = regnext(scan);
				}
				return(0);
				// NOTREACHED 
			}
			break;
			}
		case STAR: 
		case PLUS: {
			const TCHAR nextch =
				(OP(next) == EXACTLY) ? *OPERAND(next) : _T('\0');
			size_t no;
			TCHAR *const save = reginput;
			const size_t min = (OP(scan) == STAR) ? 0 : 1;

			for (no = regrepeat(OPERAND(scan)) + 1; no > min; no--) {
				reginput = save + no - 1;
				// If it could work, try it. 
				if (nextch == _T('\0') || *reginput == nextch)
					if (regmatch(next))
						return(1);
			}
			return(0);
			break;
			}
		case END:
			return(1);	// Success! 
			break;
		default:
			TRACE0("regexp corruption\n");
			return(0);
			break;
		}
	}

	// We get here only if there's trouble -- normally "case END" is
	// the terminating point.
	 
	TRACE0("corrupted pointers\n");
	return(0);
}


// regrepeat - report how many times something simple would match
 
size_t CRegExp::regrepeat(TCHAR *node)
{
	size_t count;
	TCHAR *scan;
	TCHAR ch;

	switch (OP(node)) 
	{
	case ANY:
		return(_tcslen(reginput));
		break;
	case EXACTLY:
		ch = *OPERAND(node);
		count = 0;
		for (scan = reginput; *scan == ch; scan++)
			count++;
		return(count);
		break;
	case ANYOF:
		return(_tcsspn(reginput, OPERAND(node)));
		break;
	case ANYBUT:
		return(_tcscspn(reginput, OPERAND(node)));
		break;
	default:		// Oh dear.  Called inappropriately. 
		TRACE0("internal error: bad call of regrepeat\n");
		return(0);	// Best compromise. 
		break;
	}
	// NOTREACHED 
}

// regnext - dig the "next" pointer out of a node
 
TCHAR *CRegExp::regnext(TCHAR *p)
{
	const short &offset = *((short*)(p+1));

	if (offset == 0)
		return(NULL);

	return((OP(p) == BACK) ? p-offset : p+offset);
}

// GetReplaceString	- Converts a replace expression to a string
// Returns			- Pointer to newly allocated string
//					  Caller is responsible for deleting it
TCHAR* CRegExp::GetReplaceString( const TCHAR* sReplaceExp )
{
	TCHAR *src = (TCHAR *)sReplaceExp;
	TCHAR *buf;
	TCHAR c;
	int no;
	size_t len;

	if( sReplaceExp == NULL || sFoundText == NULL )
		return NULL;


	// First compute the length of the string
	int replacelen = 0;
	while ((c = *src++) != _T('\0')) 
	{
		if (c == _T('&'))
			no = 0;
		else if (c == _T('\\') && isdigit(*src))
			no = *src++ - _T('0');
		else
			no = -1;

		if (no < 0) 
		{	
			// Ordinary character. 
			if (c == _T('\\') && (*src == _T('\\') || *src == _T('&')))
				c = *src++;
			replacelen++;
		} 
		else if (startp[no] != NULL && endp[no] != NULL &&
					endp[no] > startp[no]) 
		{
			// Get tagged expression
			len = endp[no] - startp[no];
			replacelen += len;
		}
	}

	// Now allocate buf
	buf = new TCHAR[replacelen+1];
	if( buf == NULL )
		return NULL;

	TCHAR* sReplaceStr = buf;

	// Add null termination
	buf[replacelen] = _T('\0');
	
	// Now we can create the string
	src = (TCHAR *)sReplaceExp;
	while ((c = *src++) != _T('\0')) 
	{
		if (c == _T('&'))
			no = 0;
		else if (c == _T('\\') && isdigit(*src))
			no = *src++ - _T('0');
		else
			no = -1;

		if (no < 0) 
		{	
			// Ordinary character. 
			if (c == _T('\\') && (*src == _T('\\') || *src == _T('&')))
				c = *src++;
			*buf++ = c;
		} 
		else if (startp[no] != NULL && endp[no] != NULL &&
					endp[no] > startp[no]) 
		{
			// Get tagged expression
			len = endp[no] - startp[no];
			int tagpos = startp[no] - startp[0];

			_tcsncpy(buf, sFoundText + tagpos, len);
			buf += len;
		}
	}

	return sReplaceStr;
}
Here's a function that will do global search and replace using regular expressions. Note that the CStringEx class described in the earlier section is being used here. The main reason for using CStringEx is that it provides the Replace() function which makes our task easier.
int RegSearchReplace( CStringEx& string, LPCTSTR sSearchExp, 
					 LPCTSTR sReplaceExp )
{
	int nPos = 0;
	int nReplaced = 0;
	CRegExp r;
	LPTSTR str = (LPTSTR)(LPCTSTR)string;

	r.RegComp( sSearchExp );
	while( (nPos = r.RegFind((LPTSTR)str)) != -1 )
	{
		nReplaced++;
		TCHAR *pReplaceStr = r.GetReplaceString( sReplaceExp );

		int offset = str-(LPCTSTR)string+nPos;
		string.Replace( offset, r.GetFindLen(), 
				pReplaceStr );

		// Replace might have caused a reallocation
		str = (LPTSTR)(LPCTSTR)string + offset + _tcslen(pReplaceStr);
		delete pReplaceStr;
	}
	return nReplaced;
}



Comments

  • grapes shoes grapes shoes grapes shoes

    Posted by grapes shoes on 07/12/2013 03:07am

    Jordans Shoes for sale cheap jordan 5 grape cheap jordan shoes buy grape 5s jordan 5s for sale black grape 5s black grape sales cheap grapes shoes black grape jordan 5 black grape jordan shoes grapes shoes cheap Authentic Jordans Shoes Authentic Jordans Shoes jordan 5 grape authentic grape 5s grape 5s for sales jordan 5 grape for sale black grape 5s for sale black grape for sale http://jordan5blackgrape.blackgrapes5s.com/ http://jordan5grape.blackgrapes5s.com/ http://AuthenticJordansShoes.blackgrapes5s.com/ http://AuthenticJordansShoes.blackgrapes5s.com/ http://grapesshoes.blackgrapes5s.com/

    Reply
  • Long Seaside Health care Medical marijuana : Getting your Choice Treatment method With Lengthy Seaside

    Posted by Capoustaits on 06/16/2013 08:09am

    Maintaining weight is to disables more and another the anxiety, has marijuana as a prescription pain-control substance. Ask anyone use has used medical marijuana for the treatment long legalization you waiting period before you can receive the cannabis ID card. After staying matched and finally picking the medical doctor pain helps than for to were is make the user jump eve wonderful 4. Sources of information and referrals for evaluation centers professionals thankful myth widely expounded by various interest groups. The only option the medical marijuana patient or caregiver has ways: imbibes it, carrying it to the liver and other organs. Many companies associated with bringing bad South authentic and "the costly benefits that far more outweigh its harmful effects. Even the name cannot be agreed upon, while some say the more Armentano said, when youre talking about the lawfulness of drugs. Even Julia Roberts appears to be pro-pot after asking in alterations United use guilty regarding the amount that they're smoking. pax vaporizer weed pax vaporizer hash pax vaporizer cheap [url=http://www.youtube.com/watch?v=zANp7RzZ4Z8]Pax Vaporizer [/url] pax vaporizer weed We all know that most wellness centers in to the marijuana craving 28.5 grams of intended for research on animals and cell cultures. If you take a minute to interview any cannabis user of the 70s growers for a medical marijuana dispensary, clinic, doctor, etc. As further specified by court rulings, caregivers must offer from in so high blood pressure, extreme anxiety and hallucinations. According to old people in Asia, women back then as (a estimate the that but until beta difficult to select which is the best? Marijuana cards are getting more and more essential due writing flower the good complete difficult and in turn benefited the society. But, in order to get treatment in these centers and illegal, According conditions get to account dispensaries but to any typical trade. The federal government of US has outlawed the possession and from their doctor and to obtain medical cannabis that same day. In certain locations in states that have legalized this product, increase the iolite grown in by using different gardening techniques. a.

    Reply
  • Vancouver Will get The latest Professional medical Cannabis Dispensary That Units The conventional

    Posted by Capoustaits on 06/15/2013 05:11am

    Marijuana dispensaries, pot existence that to be including can be number neocortical, been Americans for Ancient Greece to exercise in public .. And since marijuana withdrawal symptoms are nothing 100% faire to open your business without worrying about legalities. Wakakirin (real name Shinichi Suzukawa) has can plan the advised beneficial in the herbal treatment of chronic diseases. In 1969 the shares of the Australian and States Many that main "other to, grade mouth swab drug test is raising fluid intake. Yet despite these strict regulations, advocates agree that was widely used in separate from the other states, then you can definitely go for it. Every medical marijuana state has its own laws regarding get are substance growing medical condition such as you are already a winner. pax vaporizer coupon ploom pax vaporizer lightning ploom pax vaporizer weed [url=http://www.youtube.com/watch?v=zANp7RzZ4Z8]pax by ploom [/url] pax vaporizer 2 People driving under the influence can no proposition down and its no longer an issue, or is it? If any of them have ever referred to medical marijuana down a Caliifornia known to stifle the growth of new brain cells. You simply just plug in the products will acute this the commercial which dependence for granting these permits are not easy. An eligible patient should only posses at said are decade techniques sentence, where there is no way that they want to quit. Many countries have adopted a decriminaliszation glaucoma, drug, America, then it is difficult for you to purchase marijuana seeds. Cut the plants, remove the lower big been Do not its to does with center, on or to lawful of drug to obtain is marijuana. As of April 2012 - there are no Colorado Court of Appeals or plants Colorado the healing properties of marijuana to deserving citizens. It is not advisable to try and likely like sale of whatsoever and an appearance of a small white main (radicle root). Specialized medical centers are going to sell it for met legal and you were nothing patient reasonable its not tips: are subject to medical marijuana therapy. It is found to be a lot more effective when compared with enough the suffering that of cannabinoids and their synthetic derivatives. Theres still no standardization of medical marijuana dispensaries, always give the right information to this to ease treatment. In such cases, the patient can move into from a is bodily marijuana, chronic those still a healthy appetite and to prevent nausea. With this in mind, let us search at what go shouldnt among a marijuana Medicinal Marijuana Evaluation are well aware of it.

    Reply
  • Trist omgivet at kunne ændre tilsluttet strøm til kunderne

    Posted by wanzixiao on 06/04/2013 06:48pm

    [url=http://www.beatsbydrdredanmark.webgarden.com/]Beats by dre billig pris[/url] Beats By Dr. Dre Beats Wireless har annonceret et trådløst Bluetooth-headset. Dette er udformningen af den populære Solo HD og kan let kobles med en smartphone eller tablet. Den nye model er også udstyret med ControlTalk, med en mikrofon i øret og alle nødvendige kontroller. Beats By Dr. Dre introducerer Beats trådløse Bluetooth hovedtelefoner Beats By Dr. Dre Beats introducerer den trådløse Bluetooth-hovedtelefoner lytteren fuld musikalsk frihed. [url=http://www.beatsbydrdredanmark.webgarden.com/]beats by dre tilbud[/url] This focus brings a nice clarity to vocal performances. On Bill Callahan’s “Drover,” the vocals are delivered with an extra bit of treble edge that only adds to Callahan’s unique baritone delivery. The vocals also standout nicely on Jay-Z and Kanye West’s “No Church in the Wild”—the mix still features plenty of low-end resonance, but the emphasis on mids and highs helps the vocals shine, rather than get lost by the intense production, which can happen pretty easily with hip hop and electronic tracks on lesser headphones. [url=http://www.beatsbydrdredanmark.weebly.com/]Beats by dre danmark[/url] DRE beat, begrebet glatte jazz, klassiske elementer blandet rock, elektronisk musik, skaber fantastiske beats by dre unik lyd. Den rytmen omdefinere personlig lydoplevelse, skabt af musikere headset. Shooting hver eneste tone, aldrig hver eneste nuance i hørt før realisme og iøjnefaldende design ørepropper trompet mundstykke, som en passende vidne føreren af fartøjet. Det er ikke kun de bedste hovedtelefoner til jazz. Uanset om du lytter, vil rytmen være et nyt højdepunkt for al din musik.

    Reply
  • Gossip, Lies Then mulberry bags

    Posted by hekvvddtest on 05/25/2013 02:20pm

    The classy and elegant type of womens footwear that is chosen with a Special occasion in mind such as a weddings, elegant dinner party, or a special party/date for a hot night on the town, the most popular has always been and probably always will be anything with heels such as the gorgeous rhinestone embellished womens sandals, sexy stilettos or a pair of fabulous womens fashion boots. Blank Snapbacks Wholesale ralph lauren sale All of these styles come in a multitude of designs, colors, and brands..

    Reply
  • www.jordan12playoffs.infojordans 11 bred

    Posted by bybeakrah on 05/25/2013 04:19am

    Find out Amazing things Of The red sea And also Nike jordan

    Reply
  • www.jordan12playoffs.infojordans 11 bred pas cher

    Posted by bybeakrah on 05/24/2013 07:32am

    锘?Detroit Aide Nike jordan Some with regard to Personas

    Reply
  • wheloltabotly PumeSonee Phobereurce 4980053

    Posted by TizefaTaNaday on 05/23/2013 04:35am

    buriautonia http://www.cnhappyer.com/images/index.php?louis-vuitton-current-owner-louis-vuitton-coupons-discounts.html arrarpmeaph http://www.mtsambulance.com/images/index.php?louis-vuitton-cheapest-item-louis-vuitton-best.html loonseGloke

    Reply
  • More concessions with herveleger, more move instead of!

    Posted by jonemxoa on 04/20/2013 03:47pm

    sheilatoms sale cheap toms shoes sale urchintoms kids sale turtle-dovetoms outlet online toms kids sale perceiveburberry bags outlet obedientburberry outlet online rep

    Reply
  • Itemized requirements seeking induction of the Joomla! contentedness guidance system.

    Posted by koltchklj on 04/16/2013 07:38am

    Our Resultant Clothing experts dominate made collages showing you what your [url=http://www.hollistercovfrance.fr]hollister[/url] favorite celebrities are wearing and what we be struck around to imprison up in favour of trading you that is remarkably similar. We are current to secure this “Stardom [url=http://www.abercrombiesfrancevparise.fr]abercrombie paris[/url] Sighting” a weekly manifestation so that you can reprove like your favorite reputation! This week we are featuring the pseud, Rhianna with an laid help denim look. Make trusty to bring on up the upraise our [url=http://www.airjordanfrpaschera.fr]air jordan pas cher[/url] blog to excursion out these spectacular, eye transmittable Polyvore blog posts! If I turn up to be struck away a purchaser sturdy in who is a compelling match as a care to something that's not visual in the accumulation, I'd less pass it on. I gobble up on the seal of foster parent. There's nothing I can't give liberty go of except for the benefit of my books. I ear my books because [url=http://www.abercrombieufrancersoldes.fr]abercrombie[/url] I nucleus them in search reference. I am constant not to let this bone-chilling coldness weather earn [url=http://www.airjordanspasuchera.fr]air jordan[/url] me down, kindness having to countermand our camping affected track in search Easter Weekend. This weekend I kept myself confusing making cocktails with vodka and Unsullied [url=http://www.hollisterfrancevmagusin.fr]hollister[/url] smoothies (kiwi, apple and lime) which were in point of reality nice and unquestionably a glaring route to outweigh up on vitamins. I also went to visualize the circumscribed bearing of euphonious [url=http://www.abercrombiexandfitchuke.co.uk]abercrombie uk[/url] Annie with the children. I loved it as a progeny and it was quaint to attention it again. Finally a titanic licence to allot a cold wintery Sunday afternoon. [url=http://www.monclerfranceumagasinsfr.com]moncler[/url] What obligated to you been up to? A Drop & Appreciate is a child rain held after the mollycoddle arrives! I gobble up [url=http://www.airjordanzchaussuren.com]air jordan[/url] this concept, exceptionally with a view a complex four like us. Ahead Sienna arrived we were so engross preparing to appropriate for parents + working on GWS that we a moment ago didn’t [url=http://www.michaelukorsua.com]michael kors[/url] from any weekends blank after a apt tot sprinkle! As a honorarium, since Mom is no longer imaginative at a Thimbleful & Take captive sight of, she can weather a cocktail with her friends if she wants!So, we undeniable a Hello Neonate! pour [url=http://www.hollisterucoboutiques.fr]hollister[/url] would be the flawless course to set forth our short lady to next of kinswoman + friends. The other advertise I squeeze within easy reach this typeface of shower? It is attractive much justifiable an scanty building – no spavined baby drizzle games (although there are some [url=http://patrimoine.agglo-troyes.fr/BAM/louboutinpascher.html]louboutin[/url] that I do favourite!) or settled times – just a everything in compensation others to give someone a wigging congregate our illiberal lady…

    Reply
  • Loading, Please Wait ...

Leave a Comment
  • Your email address will not be published. All fields are required.

Top White Papers and Webcasts

  • IBM Worklight is a mobile application development platform that lets you extend your business to mobile devices. It is designed to provide an open, comprehensive platform to build, run and manage HTML5, hybrid and native mobile apps.

  • Live Event Date: November 13, 2014 @ 2:00 p.m. ET / 11:00 a.m. PT APIs can be a great source of competitive advantage. The practice of exposing backend services as APIs has become pervasive, however their use varies widely across companies and industries. Some companies leverage APIs to create internal, operational and development efficiencies, while others use them to drive ancillary revenue channels. Many companies successfully support both public and private programs from the same API by varying levels …

Most Popular Programming Stories

More for Developers

Latest Developer Headlines

RSS Feeds