CString Extension for String Parsing [sscanf()]

Environment: Win98SE, Visual C++ 6

Introduction

The class provided here extends the CString class by one function: Scanf().

The CString class has a function Format(), which writes formatted data into the CString object (just like sprintf() formats data into a character array), but there is no equivalent to the sscanf() function. This is quite annoying, because every time you want to parse a string, you have to code something like this:

int      i;
CString  str("100");

if ( sscanf( str, "%d", &i ) == 1 )
{
   ...
}

Wouldn't it be much easier and clearer to write:

if ( str.Scanf( "%d", &i ) == 1 )
{
   ...
}

Or with integrated error detection:

if ( str.Scanf( 1, "%d", &i ) )
{
   ...
}

I came up with the solution presented here after some unsuccessful attempts to code this function and I think you may be interested in them.

First try

Simply take the argument list pointer and pass it through to sscanf():

int CStringEx::Scanf( LPCTSTR format, ... )
{
   va_list argList;
   va_start( argList, format );
   return _stscanf( GetBuffer(0), format, argList );
}

That does not work, because sscanf() expects the individual arguments rather than a va_list, so you cannot pass a variable argument list through like that. For text formatting there is a function that accepts a va_list (vsprintf()), but there is no equivalent function for sscanf()!

Note: some compilers do have the function vsscanf() in their runtime library, but not the one Microsoft ships.
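
For comparison, on a runtime that does provide vsscanf() (it is part of C99 and C++11, but not of the Visual C++ 6 library), the first try becomes workable by forwarding the va_list to vsscanf() instead of sscanf(). The following is only a sketch: it uses std::string rather than CString so that it is self-contained, and the function name ScanString is made up for this illustration.

```cpp
#include <cassert>
#include <cstdarg>
#include <cstdio>
#include <string>

// Hypothetical helper (not part of CStringEx): scans 'text' according to
// 'format'. Requires a runtime that provides vsscanf() (C99 / C++11).
int ScanString( const std::string &text, const char *format, ... )
{
   assert( format != nullptr );

   va_list argList;
   va_start( argList, format );
   // vsscanf() accepts the va_list itself; sscanf() does not.
   int numScanned = std::vsscanf( text.c_str(), format, argList );
   va_end( argList );
   return numScanned;
}
```

A call then looks just like the sscanf() call it wraps, for example ScanString( "100", "%d", &i ).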

Second try

Find out what sscanf() is doing and reprogram it.

sscanf() converts the string into a stream and then calls _input(). I reprogrammed that function to get the desired vsscanf() function:

int __cdecl vsscanf ( REG2 const char *string, 
                      const char *format, 
                      va_list arglist )
{
  FILE str;
  REG1 FILE *infile = &str;
  REG2 int retval;

  _ASSERTE(string != NULL);
  _ASSERTE(format != NULL);

  infile->_flag = _IOREAD|_IOSTRG|_IOMYBUF;
  infile->_ptr = infile->_base = (char *) string;
  infile->_cnt = strlen(string);

  retval = (_input(infile,format,arglist));

  return(retval);
}

This does not work either, because the _input() function is an internal runtime library function that is not exported by the DLL version of the library, which a program using the DLL version of MFC links against (as far as I know it is present in the static version of the library, but I haven't tested that). As a result you get an unresolved symbol when linking the program.

Third try

Manually adjust the required arguments for function sscanf().

I found an interesting article by Emmanuel Mogenet on dejanews: How to vfscanf in Win32.

He implemented the fscanf() function in his own class by manually creating the required arguments for the fscanf() call. He did that by inserting the necessary arguments in the stack frame using assembly instructions.

static uint32_t savedESP;
static uint32_t savedEAX;
static uint32_t savedRET;
static uint32_t savedThis;

static void *truefscanf= (void*) fscanf;

class File
{
public:
   ...
   int     scanf( const char *, ... );
   ...
private:
   void    *file;
};

__declspec(naked) int File::scanf( const char *, ... )
{
  _asm
  {
    mov   dword ptr savedESP,esp    // Save esp in static
    mov   eax, dword ptr [esp]      // eax <- return address
    mov   dword ptr savedRET,eax    // Save return address in static
    mov   eax,dword ptr [esp+4]     // eax <- this
    mov   dword ptr savedThis,eax   // Save this in static
    add   esp, 8                    // esp+= 8 (crush retaddr,this)
    mov   eax,dword ptr [eax]       // eax <- this->filePtr
    push  eax                       // Push filePtr on argStack
    call  truefscanf                // Call regular fscanf
    mov   dword ptr savedEAX,eax    // Save eax value in static
    mov   esp, dword ptr savedESP   // Restore esp
    mov   eax, dword ptr savedRET   // Restore return address
    mov   dword ptr [esp], eax
    mov   eax, dword ptr savedThis  // Restore this
    mov   dword ptr [esp+4], eax
    mov   eax, dword ptr savedEAX   // Restore eax
    ret
  }
}

I tried to get it to work for sscanf(), but stopped after half an hour for the following reasons:
  • I am no assembler freak.
  • I don't like using code that I don't understand in a core function.
  • It only works for SBCS.
  • It's not thread safe.

Solution

Finally I came up with a solution which is not perfect, but works. The key idea is that all arguments passed to sscanf() must be pointers. This is important, because to retrieve an argument with va_arg() you have to know its type (va_arg() needs to know the size it occupies on the stack). With this knowledge it is possible to retrieve all arguments, store them locally and call the sscanf() function manually with the string stored in the CString derived class. The only problem left is to find out how many arguments were passed to the function.

The easiest approach is to pass the number of arguments named in the format specifier as an extra argument to the Scanf() function. The implementation is then straightforward. It also makes the function interface clearer, because this version of Scanf() returns FALSE if sscanf() does not assign all of the expected fields.

BOOL CStringEx::Scanf( int numArgs, LPCTSTR format, ... )
{
  int       i;
  int       numScanned;
  void      *argPointer[CSTRINGEX_SCANF_MAX_ARGS];
  va_list   arglist;

  if ( numArgs > CSTRINGEX_SCANF_MAX_ARGS )
  {
    // This function can only handle <CSTRINGEX_SCANF_MAX_ARGS> 
    //(10) arguments!
    ASSERT( !TRUE );
    return FALSE;
  }

  va_start( arglist, format );
  {
    // Get all arguments from stack
    for ( i=0; i<numArgs; i++ )
    {
      argPointer[i] = va_arg( arglist, void * );
    }
  }
  va_end( arglist );

  // Call sscanf with correct no. of arguments
  numScanned = sscanfWrapper( numArgs, format, argPointer );

  return (numScanned == numArgs);
}

Don't worry about the sscanfWrapper() function. It is just a wrapper which calls sscanf() with the correct no. of arguments. The following code fragment shows how it works:

int CStringEx::sscanfWrapper( int numArgs, LPCTSTR format, void **p )
{
  switch ( numArgs )
  {
    case  0: return 0;
    case  1: return _stscanf( m_pchData, format, ADD_ARGS_1  );
    ...
    case  9: return _stscanf( m_pchData, format, ADD_ARGS_9  );
    case 10: return _stscanf( m_pchData, format, ADD_ARGS_10 );
  }
        
  // When extending max. no. of arguments [CSTRINGEX_SCANF_MAX_ARGS],
  // this function must be updated!
  ASSERT( !TRUE );
  return 0;
}

The macros ADD_ARGS_1 ... ADD_ARGS_10 expand the variable argument list:

#define ADD_ARGS_1  p[0]
#define ADD_ARGS_2  p[0], p[1]
   ...
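
To see the whole idea in one place, here is a minimal portable sketch of the same collect-and-dispatch technique, limited to three arguments. It is not the CStringEx code: it uses std::string instead of CString, and the names kMaxScanArgs, ScanWrapper and ScanCounted are invented for this example. Like the original, it assumes that all object pointers can be passed around as void *, which holds on mainstream platforms but is not guaranteed by the C++ standard.

```cpp
#include <cassert>
#include <cstdarg>
#include <cstdio>
#include <string>

// Hypothetical limit for this sketch; the article's code uses 10.
const int kMaxScanArgs = 3;

// Calls sscanf() with exactly 'numArgs' of the collected pointers.
static int ScanWrapper( const char *text, int numArgs,
                        const char *format, void **p )
{
   switch ( numArgs )
   {
      case 0: return 0;
      case 1: return std::sscanf( text, format, p[0] );
      case 2: return std::sscanf( text, format, p[0], p[1] );
      case 3: return std::sscanf( text, format, p[0], p[1], p[2] );
   }
   return 0;   // more than kMaxScanArgs: not supported
}

// Hypothetical free-function counterpart of Scanf( numArgs, format, ... ).
// Assumes all object pointers share one representation (see text above).
bool ScanCounted( const std::string &text, int numArgs,
                  const char *format, ... )
{
   assert( format != nullptr );
   if ( numArgs < 0 || numArgs > kMaxScanArgs )
      return false;

   void    *argPointer[kMaxScanArgs];
   va_list  argList;

   va_start( argList, format );
   for ( int i = 0; i < numArgs; i++ )
      argPointer[i] = va_arg( argList, void * );   // every argument is a pointer
   va_end( argList );

   return ScanWrapper( text.c_str(), numArgs, format, argPointer ) == numArgs;
}
```

With this sketch, ScanCounted( "12 34", 2, "%d %d", &a, &b ) succeeds, while the same call on the text "12 xx" reports failure because only one field can be assigned.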

When the number of arguments is not known, we extract that information from the format string. Note that the algorithm I use is very basic and will not work for all possible cases. However, most of the time all you want to do is scan a simple value from the string.

The format specifier string is parsed in the following way:

  1. Two consecutive percent characters ("%%") are ignored (they represent a literal percent character).
  2. If an asterisk immediately follows a percent character ("%*"), the field is scanned, but not stored, so no argument is retrieved.
  3. In all other cases, when a percent character appears, an argument is retrieved from the stack and stored.

int CStringEx::Scanf( LPCTSTR format, ... )
{
  int       numArgs;
  int       numScanned;
  void      *argPointer[CSTRINGEX_SCANF_MAX_ARGS];
  va_list   arglist;
  LPTSTR    currBuff;
  _TXCHAR   currChar;
  
  numArgs  = 0;
  currBuff = _tcschr( format, _TXCHAR( '%' ) );
  if ( currBuff == NULL )
  {
    // No valid format specifier!
    ASSERT( !TRUE );
    return 0;
  }
        
  va_start( arglist, format );
  {
    do
    {
      // Move pointer to next character
      currBuff = _tcsinc( currBuff );
      currChar = _TXCHAR( *currBuff );
             
      if ( currChar == _TXCHAR( '\0' ) )
      {
        // End of string
        //      -> processing will stop!
      }
      else if ( currChar == _TXCHAR( '*' ) )
      {
        // "%*" suppresses argument assignment
        //      -> do not get argument from stack!
      }
      else if ( currChar == _TXCHAR( '%' ) )
      {
        // "%%" substitutes "%" character!
        //      -> do not get argument from stack!
        //      -> Increment to next character
                
        currBuff = _tcsinc( currBuff );
      }
      else
      {
        if ( numArgs >= CSTRINGEX_SCANF_MAX_ARGS )
        {
          // This function can only handle
          // <CSTRINGEX_SCANF_MAX_ARGS> (10) arguments!
          ASSERT( !TRUE );
          va_end( arglist );
          return 0;
        }
                 
        argPointer[numArgs++] = va_arg( arglist, void * );
      }
              
      currBuff = _tcschr( currBuff, _TXCHAR( '%' ) );
          
    } while ( currBuff != NULL );
  }
  va_end( arglist );
        
  // Call sscanf with correct no. of arguments
  numScanned = sscanfWrapper( numArgs, format, argPointer );
        
  return numScanned;
}
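
The counting part of this algorithm can also be looked at in isolation. The sketch below is a simplified, char-only version that applies the three rules listed above and returns the number of arguments a format string expects, without touching the argument list. The function name CountScanArgs is made up for this illustration, and, like the function above, it does not understand scan sets that themselves contain a percent character (such as "%[^%]").

```cpp
#include <cassert>
#include <cstring>

// Hypothetical helper: returns the number of pointer arguments a
// scanf-style format string expects ("%%" and "%*" consume none).
int CountScanArgs( const char *format )
{
   assert( format != nullptr );

   int         numArgs = 0;
   const char *curr    = std::strchr( format, '%' );

   while ( curr != nullptr )
   {
      ++curr;                       // character after '%'
      if ( *curr == '\0' )
         break;                     // lone '%' at end of string
      if ( *curr == '%' )
         ++curr;                    // "%%": literal percent, skip both
      else if ( *curr != '*' )
         ++numArgs;                 // ordinary field: one argument
      // "%*": field is scanned but not stored, so no argument

      curr = std::strchr( curr, '%' );
   }
   return numArgs;
}
```

For the format "%d %*s %% %f" this counts two arguments: one for "%d" and one for "%f".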

Remarks

  • The maximum number of variable arguments the Scanf functions can handle is defined as a constant CSTRINGEX_SCANF_MAX_ARGS (currently 10). If this is a limitation for you, just increase the constant and adapt the sscanfWrapper() function. I used a constant so that the functions do not have to allocate or free any memory.

  • The code should work with SBCS, MBCS and UNICODE. However, I haven't tested the UNICODE version.

  • When you move the mouse over a CString object while debugging, a popup window appears showing the current value of the object. To get the same effect for a CStringEx object, do the following:

    1. Open the file ~\Common\MSDev98\Bin\autoexp.dat, where ~ is the path of your Developer Studio installation.

    2. Go to the section [AutoExpand] and add the following line:

      CStringEx =<m_pchData,st>

Note 1: You have to restart Developer Studio after saving the changed file.
Note 2: This very handy file is described in detail in Ramon de Klein's article Tune the debugger using AutoExp.dat.

Downloads

The source files contain only the functions described above and cannot be used as a replacement for CString. Personally, I use Zafir Anjum's CString Extension class, which can be found at www.codeguru.com. So if you want a complete working example, you should download his class and add my functions to it.

Download source - 2 Kb


