Case Insensitive strstr

Environment: C/C++

It's a frequent task to make a case insensitive search for a string into another string. I believe everyone is able to write code that does this. And I am sure everyone did this many times :) I don't want to bring up here a discussion about code reusability and the benefits of having functions to handle one task instead of pasting same code locally wherever it is needed. However, my experience with programs written by professional developers led me to implement case insensitive search with the same interface as strstr() and post it here, so that everyone can take it and put it to their utility library.

I am including two different ways how to do the case insensitive search. They show the classical memory/speed trade-off. I have the feeling that the reason why case insensitive search function is not included in the runtime library is because there's no solution which is clearly better than all others. The first technique allocates memory for copies of both strings, but it is significantly more time efficient for large strings. The other way has no memory overhead, but takes more CPU time. I didn't compare performance for small strings, where I expect the efficiency to be pretty much the same. If anyone wants to investigate, please post your results.

The second implementation has the additional benefit that it is possible to specify a substring of the first string that will be searched.

I wrote the functions as C++ templates, but they can be converted to C functions very easily.

A. stristr - makes internal copies, converts to lower case and uses strstr

#include <string.h>
#include <malloc.h>
#include <tchar.h>

template <typename CHAR_TYPE>
CHAR_TYPE *stristr
(
   CHAR_TYPE         *  szStringToBeSearched, 
   const CHAR_TYPE   *  szSubstringToSearchFor
)
{
   CHAR_TYPE   *  pPos = NULL;
   CHAR_TYPE   *  szCopy1 = NULL;
   CHAR_TYPE   *  szCopy2 = NULL;

   // verify parameters
   if ( szStringToBeSearched == NULL || 
        szSubstringToSearchFor == NULL ) 
   {
      return szStringToBeSearched;
   }

   // empty substring - return input (consistent with strstr)
   if ( _tcslen(szSubstringToSearchFor) == 0 ) {
      return szStringToBeSearched;
   }

   szCopy1 = _tcslwr(_tcsdup(szStringToBeSearched));
   szCopy2 = _tcslwr(_tcsdup(szSubstringToSearchFor));

   if ( szCopy1 == NULL || szCopy2 == NULL  ) {
      // another option is to raise an exception here
      free((void*)szCopy1);
      free((void*)szCopy2);
      return NULL;
   }

   pPos = strstr(szCopy1, szCopy2);

   if ( pPos != NULL ) {
      // map to the original string
      pPos = szStringToBeSearched + (pPos - szCopy1);
   }

   free((void*)szCopy1);
   free((void*)szCopy2);

   return pPos;
} // stristr(...)

B. strnistr - uses strnicmp in a loop, and may search in a substring of the searched string

#include <string.h>
#include <malloc.h>
#include <tchar.h>

template <typename CHAR_TYPE>
CHAR_TYPE *strnistr
(
   CHAR_TYPE         *  szStringToBeSearched,
   const CHAR_TYPE   *  szSubstringToSearchFor, 
   const int            nStringLen = -1
)
{
   int            nLen;
   int            nOffset;
   int            nMaxOffset;
   CHAR_TYPE   *  pPos;
   int            nStringLenInt;

   // verify parameters
   if ( szStringToBeSearched == NULL || 
        szSubstringToSearchFor == NULL ) 
   {
      return szStringToBeSearched;
   }

   // get length of the substring
   nLen = _tcslen(szSubstringToSearchFor);

   // empty substring-return input (consistent w/ strstr)
   if ( nLen == 0 ) {
      return szStringToBeSearched;
   }

   if ( nStringLen == -1 || nStringLen > 
               (int)_tcslen(szStringToBeSearched) ) 
   {
      nStringLenInt = _tcslen(szStringToBeSearched);
   } else {
      nStringLenInt = nStringLen;
   }

   nMaxOffset = nStringLenInt - nLen;

   pPos = szStringToBeSearched;

   for ( nOffset = 0; nOffset <= nMaxOffset; nOffset++ ) {

      if ( _tcsnicmp(pPos, szSubstringToSearchFor, nLen) == 0 ) {
         return pPos;
      }
      // move on to the next character
      pPos++; //_tcsinc was causing problems :(
   }

   return NULL;
} // strnistr(...)

Here's a fragment of code I used to compare the performance of the functions:

{
   int      i;
   char     x1[3 * 100000];
   char     x2[3 * 10000];

   for ( i = 0; i < sizeof(x1); i++ ) x1[i] = 'A'+ (i % 3);
   for ( i = 0; i < sizeof(x2); i++ ) x2[i] = 'a'+ (i % 3);
   x1[sizeof(x1) - 2] = x2[sizeof(x2) - 2] = 'y';
   x1[sizeof(x1) - 1] = x2[sizeof(x2) - 1] = 0;
   // use a profiler to compare
   printf("%p\n", stristr(x1, x2));
   printf("%p\n", strnistr(x1, x2));
}



Comments

  • May be it is more simple way?

    Posted by Legacy on 01/28/2002 12:00am

    Originally posted by: Alexander

    char* strstri(char *s1, char *s2)
    {
    int i,j,k;

    for(i=0;s1[i];i++)
    for(j=i,k=0;tolower(s1[j])==tolower(s2[k]);j++,k++)
    if(!s2[k+1])
    return (s1+i);

    return NULL;
    }

    • Simpler, quicker, but not practical for everyone...

      Posted by GuruJudge on 04/24/2004 06:08am

      char *strcasestr(const char *arg1, const char *arg2)
      {                  
         const char *a, *b;
                         
         for(;*arg1;*arg1++) {
                         
           a = arg1;
           b = arg2;
           
           while((*a++ | 32) == (*b++ | 32))
             if(!*b) 
               return (arg1);
           
         }
           
         return(NULL);
      }

      Reply
    Reply
  • Question about CString format

    Posted by Legacy on 11/01/2001 12:00am

    Originally posted by: Yubo Dong

    Given a float number,for instance, 1123.2345, anyone
    know how to convert it a CString in the format of "#.######E+03", i.e. 1.123235E+03, using Format() function? Please notice that the only TWO digits after + sign, not default three digits.

    Reply
  • different code pages

    Posted by Legacy on 10/14/2001 12:00am

    Originally posted by: Robert

    If you implement a simple weight table, you can easily modify the program to be a case insenstive, diacritic senstive/insensitive string search.

    Reply
  • Boyer-Moore is even faster (maybe fastest)

    Posted by Legacy on 10/12/2001 12:00am

    Originally posted by: Paul McKenzie

    Just to let you know, the Boyer-Moore search algorithm is even faster for large strings than simple loops, and is the method used if speed is the ultimate goal.

    Reply
Leave a Comment
  • Your email address will not be published. All fields are required.

Top White Papers and Webcasts

  • As all sorts of data becomes available for storage, analysis and retrieval - so called 'Big Data' - there are potentially huge benefits, but equally huge challenges...
  • The agile organization needs knowledge to act on, quickly and effectively. Though many organizations are clamouring for "Big Data", not nearly as many know what to do with it...
  • Cloud-based integration solutions can be confusing. Adding to the confusion are the multiple ways IT departments can deliver such integration...

Most Popular Programming Stories

More for Developers

RSS Feeds

Thanks for your registration, follow us on our social networks to keep up-to-date