Case Insensitive strstr

Environment: C/C++

It's a frequent task to make a case insensitive search for a string into another string. I believe everyone is able to write code that does this. And I am sure everyone did this many times :) I don't want to bring up here a discussion about code reusability and the benefits of having functions to handle one task instead of pasting same code locally wherever it is needed. However, my experience with programs written by professional developers led me to implement case insensitive search with the same interface as strstr() and post it here, so that everyone can take it and put it to their utility library.

I am including two different ways how to do the case insensitive search. They show the classical memory/speed trade-off. I have the feeling that the reason why case insensitive search function is not included in the runtime library is because there's no solution which is clearly better than all others. The first technique allocates memory for copies of both strings, but it is significantly more time efficient for large strings. The other way has no memory overhead, but takes more CPU time. I didn't compare performance for small strings, where I expect the efficiency to be pretty much the same. If anyone wants to investigate, please post your results.

The second implementation has the additional benefit that it is possible to specify a substring of the first string that will be searched.

I wrote the functions as C++ templates, but they can be converted to C functions very easily.

A. stristr - makes internal copies, converts to lower case and uses strstr

#include <string.h>
#include <malloc.h>
#include <tchar.h>

template <typename CHAR_TYPE>
CHAR_TYPE *stristr
(
   CHAR_TYPE         *  szStringToBeSearched, 
   const CHAR_TYPE   *  szSubstringToSearchFor
)
{
   CHAR_TYPE   *  pPos = NULL;
   CHAR_TYPE   *  szCopy1 = NULL;
   CHAR_TYPE   *  szCopy2 = NULL;

   // verify parameters
   if ( szStringToBeSearched == NULL || 
        szSubstringToSearchFor == NULL ) 
   {
      return szStringToBeSearched;
   }

   // empty substring - return input (consistent with strstr)
   if ( _tcslen(szSubstringToSearchFor) == 0 ) {
      return szStringToBeSearched;
   }

   szCopy1 = _tcslwr(_tcsdup(szStringToBeSearched));
   szCopy2 = _tcslwr(_tcsdup(szSubstringToSearchFor));

   if ( szCopy1 == NULL || szCopy2 == NULL  ) {
      // another option is to raise an exception here
      free((void*)szCopy1);
      free((void*)szCopy2);
      return NULL;
   }

   pPos = strstr(szCopy1, szCopy2);

   if ( pPos != NULL ) {
      // map to the original string
      pPos = szStringToBeSearched + (pPos - szCopy1);
   }

   free((void*)szCopy1);
   free((void*)szCopy2);

   return pPos;
} // stristr(...)

B. strnistr - uses strnicmp in a loop, and may search in a substring of the searched string

#include <string.h>
#include <malloc.h>
#include <tchar.h>

template <typename CHAR_TYPE>
CHAR_TYPE *strnistr
(
   CHAR_TYPE         *  szStringToBeSearched,
   const CHAR_TYPE   *  szSubstringToSearchFor, 
   const int            nStringLen = -1
)
{
   int            nLen;
   int            nOffset;
   int            nMaxOffset;
   CHAR_TYPE   *  pPos;
   int            nStringLenInt;

   // verify parameters
   if ( szStringToBeSearched == NULL || 
        szSubstringToSearchFor == NULL ) 
   {
      return szStringToBeSearched;
   }

   // get length of the substring
   nLen = _tcslen(szSubstringToSearchFor);

   // empty substring-return input (consistent w/ strstr)
   if ( nLen == 0 ) {
      return szStringToBeSearched;
   }

   if ( nStringLen == -1 || nStringLen > 
               (int)_tcslen(szStringToBeSearched) ) 
   {
      nStringLenInt = _tcslen(szStringToBeSearched);
   } else {
      nStringLenInt = nStringLen;
   }

   nMaxOffset = nStringLenInt - nLen;

   pPos = szStringToBeSearched;

   for ( nOffset = 0; nOffset <= nMaxOffset; nOffset++ ) {

      if ( _tcsnicmp(pPos, szSubstringToSearchFor, nLen) == 0 ) {
         return pPos;
      }
      // move on to the next character
      pPos++; //_tcsinc was causing problems :(
   }

   return NULL;
} // strnistr(...)

Here's a fragment of code I used to compare the performance of the functions:

{
   int      i;
   char     x1[3 * 100000];
   char     x2[3 * 10000];

   for ( i = 0; i < sizeof(x1); i++ ) x1[i] = 'A'+ (i % 3);
   for ( i = 0; i < sizeof(x2); i++ ) x2[i] = 'a'+ (i % 3);
   x1[sizeof(x1) - 2] = x2[sizeof(x2) - 2] = 'y';
   x1[sizeof(x1) - 1] = x2[sizeof(x2) - 1] = 0;
   // use a profiler to compare
   printf("%p\n", stristr(x1, x2));
   printf("%p\n", strnistr(x1, x2));
}



Comments

  • Discount Oakley Twenty sale for cheap

    Posted by yobhdsagw on 06/29/2013 10:25pm

    Fake RayBan ,A brand Oakley progress, it is by accepting reality of some duration and job definitely grab the development of new style. Because of this , each sunshade found in manufacturing the most up-to-date glasses trick. In most cases, the sunglasses have a particular affinity to decorate, they might be flamboyant, after which it staggering, to ensure the plan Oakley generation will flaunt it there isn't a treatment design. Fake ray bans ,Whether you are in a family, skating about the beach, or just nursing your hangover, cheap Oakley sunglasses to share with you your perfect accessories. Choice problems, you can get benefits. Select the specific type of sunglasses to choose a model, it truly is made for the actual required level of your taste and personality. Oakley Jawbone ,The many defending the clothing colors must protect yourself, and sometimes terrible events, for the reason that situation is like sunlight: unique. Whether you've experience of an oval-shaped or spherical, square, or extensive, you don't fret, Oakley collection, you'll discover this can be specifically for anyone. In the direct sun will never remove the threat, nevertheless expressed in skin colour and eyes can tolerate the expression with the extended spread of ultraviolet illumination damage. A person's eye are unable to view the usage of Oakley sunglasses as a way to reduce the photovoltaic UV. If even a fake Oakleys elderly, they in no way desire, loss of fresh design and style - it is, there exists a new eyes, every a month. They choose the hormone laboratory, together with wood products. They're usually used plus in winter sports activities, with boating. Product design or selection of materials, has become a combination of advanced scientific experiments and testing to be sure the comfort and high-quality, high level of integration and functionality and fashion. Oakley males and females, brand sunglasses brand leader, you wear the most effective art films, high-quality materials and better technology made sunglasses. The durable anti-allergy frame three-dimensional metal sculpture and built ultra-light titanium, making sure that any weather or light conditions, and comfy sunglasses. Oakley sunglasses for just a class itself, it's got proven to be an authentic fashion craze, and possess found their distance to some well-known Hollywood film. As a way to meet the different needs of shoppers, either now or in the future, Oakley will not stop the pace of progress.

    Reply
  • May be it is more simple way?

    Posted by Legacy on 01/28/2002 12:00am

    Originally posted by: Alexander

    char* strstri(char *s1, char *s2)
    {
    int i,j,k;

    for(i=0;s1[i];i++)
    for(j=i,k=0;tolower(s1[j])==tolower(s2[k]);j++,k++)
    if(!s2[k+1])
    return (s1+i);

    return NULL;
    }

    • Simpler, quicker, but not practical for everyone...

      Posted by GuruJudge on 04/24/2004 06:08am

      char *strcasestr(const char *arg1, const char *arg2)
      {                  
         const char *a, *b;
                         
         for(;*arg1;*arg1++) {
                         
           a = arg1;
           b = arg2;
           
           while((*a++ | 32) == (*b++ | 32))
             if(!*b) 
               return (arg1);
           
         }
           
         return(NULL);
      }

      Reply
    Reply
  • Question about CString format

    Posted by Legacy on 11/01/2001 12:00am

    Originally posted by: Yubo Dong

    Given a float number,for instance, 1123.2345, anyone
    know how to convert it a CString in the format of "#.######E+03", i.e. 1.123235E+03, using Format() function? Please notice that the only TWO digits after + sign, not default three digits.

    Reply
  • different code pages

    Posted by Legacy on 10/14/2001 12:00am

    Originally posted by: Robert

    If you implement a simple weight table, you can easily modify the program to be a case insenstive, diacritic senstive/insensitive string search.

    Reply
  • Boyer-Moore is even faster (maybe fastest)

    Posted by Legacy on 10/12/2001 12:00am

    Originally posted by: Paul McKenzie

    Just to let you know, the Boyer-Moore search algorithm is even faster for large strings than simple loops, and is the method used if speed is the ultimate goal.

    Reply
Leave a Comment
  • Your email address will not be published. All fields are required.

Top White Papers and Webcasts

  • Packaged application development teams frequently operate with limited testing environments due to time and labor constraints. By virtualizing the entire application stack, packaged application development teams can deliver business results faster, at higher quality, and with lower risk.

  • A modern mobile IT strategy is no longer an option, it is an absolute business necessity. Today's most productive employees are not tied to a desk, an office, or a location. They are mobile. And your company's IT strategy has to be ready to support them with easy, reliable, 24/7 access to the business information they need, from anywhere in the world, across a broad range of communication devices. Here's how some of the nation's most progressive corporations are meeting the many needs of their mobile workers …

Most Popular Programming Stories

More for Developers

Latest Developer Headlines

RSS Feeds