Originally posted by: Ed Maher
One small niggle: a colleague and I ported the code to run under HP-UX and found that we kept getting a Bus Error in regnext, because an arbitrary pointer (not necessarily word-aligned) was being cast to a pointer to a short and then used to access the value as a short:
const short &offset = *((short*)(p+1));
Accessing 'offset' when p+1 is not word aligned causes the Bus Error. Word alignment seems to be required when manipulating integers on this architecture.
I replaced the code with explicit access to each character, as below:
static inline char* regnext( char* p )
{
    //ESM 20010330 Casting pointers, and using them is not permitted under some Unix
    //ESM 20010330 architectures that are fussy about alignment
    //ESM 20010330 -- const short &offset = *((short*)(p+1));
    short offset = (short)(((short)*(p+2)) + (256*((short)*(p+1))));
    if (offset == 0)
        return(NULL);
    return((OP(p) == BACK) ? p-offset : p+offset);
}
I also changed the regtail function to explicitly store the data in the same order as it is retrieved, so there are no problems moving between big and little endian architectures:
void CRegCompiler::regtail(char* p, char* val)
{
    char* scan;
    char* temp;
    // Find last node.
    for (scan = p; (temp = regnext(scan)) != NULL; scan = temp)
        continue;
    //ESM 20010330 Casting pointers, and using them is not permitted under some Unix
    //ESM 20010330 architectures that are fussy about alignment
    //ESM 20010330 -- *((short *)(scan+1)) = (short)((OP(scan) == BACK) ? scan - val : val - scan);
    int scanvalue=0;
    if (OP(scan) == BACK)
    {
        scanvalue=(scan - val);
    }
    else
    {
        scanvalue=(val - scan);
    }
    *(scan+1)=(char)(scanvalue / 256); //MSB
    *(scan+2)=(char)(scanvalue % 256); //LSB
Values=" << (int) *(scan+1) << "," << (int) *(scan+2) << endl;
}
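One caveat with byte-wise access like the above: where plain char is signed, a stored byte above 127 reads back as a negative number, so an offset whose low byte is 128 or more may decode incorrectly. A minimal sketch of a sign-safe variant follows. The helper names store_offset and load_offset are hypothetical and not part of the Regexp source, and the sketch assumes offsets are non-negative and fit in 16 bits, as in the posted code.

```cpp
#include <cassert>

// Hypothetical helpers (not from the Regexp source): store and load a
// 16-bit offset one byte at a time, most significant byte first. No
// aligned short access is needed and the byte order is fixed, so the
// compiled program reads the same on big- and little-endian machines.
static inline void store_offset(char* p, int value)
{
    assert(value >= 0 && value <= 0xFFFF);         // assumed range
    p[1] = static_cast<char>((value >> 8) & 0xFF); // MSB
    p[2] = static_cast<char>(value & 0xFF);        // LSB
}

static inline int load_offset(const char* p)
{
    // Go through unsigned char so bytes above 127 are not sign-extended.
    const unsigned char hi = static_cast<unsigned char>(p[1]);
    const unsigned char lo = static_cast<unsigned char>(p[2]);
    return (hi << 8) | lo;
}
```

With helpers like these, the two byte stores in regtail would become store_offset(scan, scanvalue), and regnext would read the offset with load_offset(p).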
Originally posted by: Julian W. Girouard Jr.
First off, let me just say I love this Regexp class - very useful!
To address the problem of Unicode command line parameters (for example, in the Test1 program), it looks like you have to use wmain instead of main if you want to access Unicode command line parameters using argv. This is easy to accomplish by changing this line of code:
void main (int argc, TCHAR ** argv)
to this ugly thing:
void
#ifdef _UNICODE
wmain
#else
main
#endif
(int argc, TCHAR ** argv)
Works for me! :)
May your RAM and good logic never fail you!
-Julian
Originally posted by: Dmitry
Is there any way to make this class not be greedy?
Originally posted by: Bill Bell
I appreciate having the use of 'regexp'. However, I seem to have detected a memory leak.
The following code is a console app generated using the VC++6 wizard and modified slightly to use one of the examples in the doc. I've pared it down almost as much as possible. When run in debug mode an assertion fails. Remove the line containing 're[1]' and 're[2]' (or eliminate debug calls) and all appears well.
Again, my thanks for the use of this code.
(Following stuff appears in output window:
"Detected memory leaks!
Dumping objects ->
strcore.cpp(118) : {44} normal block at 0x00A71DF0, 40 bytes long.
Data: < wyrd> 01 00 00 00 1B 00 00 00 1B 00 00 00 77 79 72 64
Object dump complete.
The thread 0xFFC59745 has exited with code 3 (0x3).
The program 'C:\AGENTPROJECT\PWAFilter\try regexp\Debug\try regexp.exe' has exited with code 3 (0x3).")
// try regexp.cpp : Defines the entry point for the console application.
//
#include "stdafx.h"
#include "try regexp.h"
#include "regexp.h"
#ifdef _DEBUG
#define new DEBUG_NEW
#undef THIS_FILE
static char THIS_FILE[] = __FILE__;
#endif
/////////////////////////////////////////////////////////////////////////////
// The one and only application object
CWinApp theApp;
using namespace std;
int _tmain(int argc, TCHAR* argv[], TCHAR* envp[])
{
    int nRetCode = 0;
    // initialize MFC and print an error on failure
    if (!AfxWinInit(::GetModuleHandle(NULL), NULL, ::GetCommandLine(), 0))
    {
        // TODO: change error code to suit your needs
        cerr << _T("Fatal Error: MFC initialization failed") << endl;
        nRetCode = 1;
    }
    else
    {
        CString str( "wyrdrune.com!kelly (Kelly)\n" );
        Regexp re( "^[\t ]*(.*)[\t ]*\\((.*)\\)" );
        char c;
        if ( re.Match( str ) && re.SubStrings() == 2 ) {
            cout << "name: " << ( LPCTSTR ) re[2] << ", addr: " << ( LPCTSTR ) re[1] << endl;
        }
        scanf ( "%s", &c );
    }
    return nRetCode;
}
Originally posted by: Mervyn Quah
It appears that escaped special characters (i.e., newline - "\n", tab - "\t", etc.) are not parsed the same way as escaped parentheses ("\(", "\)") and replace strings ("\1", "\2", etc.). When "\n" is entered from, say, an edit box, what is passed is actually the string "\n"; that is, two characters, "\" + "n". Regexp doesn't recognize these two characters as an escaped newline but rather as an escaped "n" (that is, "\\n"), and does a search for "n". Regexp does, however, recognize the two characters "\" + "(" or "\" + "1" as escape sequences.
The only workaround is to pre-process the string passed by the edit box and replace all occurrences of the two characters "\" + "n" with one newline character.
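The pre-processing step might look like the sketch below. The function name unescape_control is hypothetical, and the choice to translate only \n, \t and \r while passing every other backslash pair (such as \( or \1) through untouched for Regexp to interpret is an assumption about what is wanted.

```cpp
#include <string>

// Hypothetical helper: turn the two-character sequences \n, \t and \r
// typed into an edit box into the single control characters they name.
// Any other backslash pair, e.g. \( or \1, is copied through unchanged
// so the regex engine still sees it as an escape.
std::string unescape_control(const std::string& in)
{
    std::string out;
    out.reserve(in.size());
    for (std::string::size_type i = 0; i < in.size(); ++i) {
        if (in[i] == '\\' && i + 1 < in.size()) {
            const char next = in[i + 1];
            if (next == 'n') { out += '\n'; ++i; continue; }
            if (next == 't') { out += '\t'; ++i; continue; }
            if (next == 'r') { out += '\r'; ++i; continue; }
            // Unknown escape: keep both characters and skip the pair,
            // so an escaped backslash followed by 'n' is not rewritten.
            out += in[i];
            out += next;
            ++i;
            continue;
        }
        out += in[i];
    }
    return out;
}
```

The text read from the edit box would be passed through unescape_control before being handed to the Regexp constructor or to the replace call.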
Originally posted by: Nils Antonsen
There is a bug in regexp::regexec() in the for() clause at line 1100, which overruns the input string because the zero terminator is missed.
original:
for ( LPTSTR s = string; s != NULL; s = _tcschr( s+1 , regstart ) )
    if ( executor.regtry( s) ) return true;
return false;
changed to:
for ( LPTSTR s = string; s != NULL; s = _tcschr( s+1 , regstart ) )
{
    if ( executor.regtry( s) ) return true;
    if (*s == 0) return false;
}
return false;
This seems to correct the problem.
regards Nils
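To see why the extra test on *s matters: when s already points at the terminating zero (an empty input, for instance), s+1 lies past the end of the buffer, and the next _tcschr call reads memory the string does not own. A reduced sketch of the corrected loop in plain char terms follows. The names find_match_start and try_at are hypothetical stand-ins for regexec and executor.regtry, not the class's real interface.

```cpp
#include <cstring>

// Hypothetical stand-in for executor.regtry(): returns true when a
// match starts at s.
typedef bool (*TryFn)(const char* s);

// Try the start of the string, then every later position where the
// known first character occurs, stopping at the terminating zero
// instead of stepping past it.
bool find_match_start(const char* string, char regstart, TryFn try_at)
{
    for (const char* s = string; s != NULL; s = std::strchr(s + 1, regstart)) {
        if (try_at(s)) return true;
        if (*s == '\0') return false; // never form s + 1 beyond the terminator
    }
    return false;
}
```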
Originally posted by: Lothar A. Haensler
Just downloaded the source and compiled it to a DLL.
I have a problem with SubStrings.
Given a pattern "NLSGet(\(\".*\"\))"
and a string like "this is an nlsGet(\"test\")" everything works fine (re[1] returns "test").
As soon as there is more than one sub-pattern in my regexp, the substrings returned by the [] operator are all bogus.
e.g.
"(NLSGet|NLSGetv)(.*)"
and re.Match("this is an NLSGet(\"test\")")
SubStrings returns 2 (OK)
rgx[1] returns "N", which is wrong (it should be "NLSGet" or "NLSGetv"). I debugged operator[] and the length really is 6, but after the memcpy only 1 character is in the buffer (more specifically, there is a 0x0 in the 2nd position of the buffer).
I tried lots of things:
compiling different versions with and without _MBCS, with and without _UNICODE. Without the _MBCS I get link errors; your documentation says MBCS is not supported.
I couldn't get any of them to work.
Please help.
What are the "right" compiler switches? Is there a VC 6 compatible version?
Desperately,
Lothar
Originally posted by: Stefano Lazzaretto
Great class. It seems, however, that the sub-expressions "\0", "\1", "\2"... do not work with "word matching" patterns. For instance, with (\<foo\>) as a regular expression, I can specify "&", but not "\0" or "\1", in the replace expression. I'm not familiar enough with Henry Spencer's code to find out how to add it, but some comments in your source seem to confirm this gap. Any hints?
Originally posted by: Thomas Hill
Does this class work with 16-bit Visual C++? If it does not, can you make it work with a 16-bit compiler?
Thanks
Tom Hill