Case Insensitive strstr
Environment: C/C++
It's a frequent task to make a case insensitive search for a string into another string. I believe everyone is able to write code that does this. And I am sure everyone did this many times :) I don't want to bring up here a discussion about code reusability and the benefits of having functions to handle one task instead of pasting same code locally wherever it is needed. However, my experience with programs written by professional developers led me to implement case insensitive search with the same interface as strstr() and post it here, so that everyone can take it and put it to their utility library.
I am including two different ways how to do the case insensitive search. They show the classical memory/speed trade-off. I have the feeling that the reason why case insensitive search function is not included in the runtime library is because there's no solution which is clearly better than all others. The first technique allocates memory for copies of both strings, but it is significantly more time efficient for large strings. The other way has no memory overhead, but takes more CPU time. I didn't compare performance for small strings, where I expect the efficiency to be pretty much the same. If anyone wants to investigate, please post your results.
The second implementation has the additional benefit that it is possible to specify a substring of the first string that will be searched.
I wrote the functions as C++ templates, but they can be converted to C functions very easily.
A. stristr - makes internal copies, converts to lower case and uses strstr
#include <string.h>
#include <malloc.h>
#include <tchar.h>
template <typename CHAR_TYPE>
CHAR_TYPE *stristr
(
CHAR_TYPE * szStringToBeSearched,
const CHAR_TYPE * szSubstringToSearchFor
)
{
CHAR_TYPE * pPos = NULL;
CHAR_TYPE * szCopy1 = NULL;
CHAR_TYPE * szCopy2 = NULL;
// verify parameters
if ( szStringToBeSearched == NULL ||
szSubstringToSearchFor == NULL )
{
return szStringToBeSearched;
}
// empty substring - return input (consistent with strstr)
if ( _tcslen(szSubstringToSearchFor) == 0 ) {
return szStringToBeSearched;
}
szCopy1 = _tcslwr(_tcsdup(szStringToBeSearched));
szCopy2 = _tcslwr(_tcsdup(szSubstringToSearchFor));
if ( szCopy1 == NULL || szCopy2 == NULL ) {
// another option is to raise an exception here
free((void*)szCopy1);
free((void*)szCopy2);
return NULL;
}
pPos = strstr(szCopy1, szCopy2);
if ( pPos != NULL ) {
// map to the original string
pPos = szStringToBeSearched + (pPos - szCopy1);
}
free((void*)szCopy1);
free((void*)szCopy2);
return pPos;
} // stristr(...)
B. strnistr - uses strnicmp in a loop, and may search in a substring of the searched string
#include <string.h>
#include <malloc.h>
#include <tchar.h>
template <typename CHAR_TYPE>
CHAR_TYPE *strnistr
(
CHAR_TYPE * szStringToBeSearched,
const CHAR_TYPE * szSubstringToSearchFor,
const int nStringLen = -1
)
{
int nLen;
int nOffset;
int nMaxOffset;
CHAR_TYPE * pPos;
int nStringLenInt;
// verify parameters
if ( szStringToBeSearched == NULL ||
szSubstringToSearchFor == NULL )
{
return szStringToBeSearched;
}
// get length of the substring
nLen = _tcslen(szSubstringToSearchFor);
// empty substring-return input (consistent w/ strstr)
if ( nLen == 0 ) {
return szStringToBeSearched;
}
if ( nStringLen == -1 || nStringLen >
(int)_tcslen(szStringToBeSearched) )
{
nStringLenInt = _tcslen(szStringToBeSearched);
} else {
nStringLenInt = nStringLen;
}
nMaxOffset = nStringLenInt - nLen;
pPos = szStringToBeSearched;
for ( nOffset = 0; nOffset <= nMaxOffset; nOffset++ ) {
if ( _tcsnicmp(pPos, szSubstringToSearchFor, nLen) == 0 ) {
return pPos;
}
// move on to the next character
pPos++; //_tcsinc was causing problems :(
}
return NULL;
} // strnistr(...)
Here's a fragment of code I used to compare the performance of the functions:
{
int i;
char x1[3 * 100000];
char x2[3 * 10000];
for ( i = 0; i < sizeof(x1); i++ ) x1[i] = 'A'+ (i % 3);
for ( i = 0; i < sizeof(x2); i++ ) x2[i] = 'a'+ (i % 3);
x1[sizeof(x1) - 2] = x2[sizeof(x2) - 2] = 'y';
x1[sizeof(x1) - 1] = x2[sizeof(x2) - 1] = 0;
// use a profiler to compare
printf("%p\n", stristr(x1, x2));
printf("%p\n", strnistr(x1, x2));
}

Comments
May be it is more simple way?
Posted by Legacy on 01/28/2002 12:00amOriginally posted by: Alexander
char* strstri(char *s1, char *s2)
{
int i,j,k;
for(i=0;s1[i];i++)
for(j=i,k=0;tolower(s1[j])==tolower(s2[k]);j++,k++)
if(!s2[k+1])
return (s1+i);
return NULL;
}
-
ReplySimpler, quicker, but not practical for everyone...
Posted by GuruJudge on 04/24/2004 06:08amchar *strcasestr(const char *arg1, const char *arg2) { const char *a, *b; for(;*arg1;*arg1++) { a = arg1; b = arg2; while((*a++ | 32) == (*b++ | 32)) if(!*b) return (arg1); } return(NULL); }ReplyQuestion about CString format
Posted by Legacy on 11/01/2001 12:00amOriginally posted by: Yubo Dong
Given a float number,for instance, 1123.2345, anyone
know how to convert it a CString in the format of "#.######E+03", i.e. 1.123235E+03, using Format() function? Please notice that the only TWO digits after + sign, not default three digits.
Replydifferent code pages
Posted by Legacy on 10/14/2001 12:00amOriginally posted by: Robert
If you implement a simple weight table, you can easily modify the program to be a case insenstive, diacritic senstive/insensitive string search.
ReplyBoyer-Moore is even faster (maybe fastest)
Posted by Legacy on 10/12/2001 12:00amOriginally posted by: Paul McKenzie
Just to let you know, the Boyer-Moore search algorithm is even faster for large strings than simple loops, and is the method used if speed is the ultimate goal.
Reply