CString Extension for String Parsing [sscanf()]

Environment: Win98SE, Visual C++ 6

Introduction

The class provided here extends the CString class by one function: Scanf().

The CString class has a Format() function, which
writes formatted data into the CString object (just as sprintf()
formats data into a character array). There is no equivalent of the sscanf() function. This is quite annoying, because every time you want to parse a string, you have to code something like this:


int i;
CString str("100");

if ( sscanf( str, "%d", &i ) == 1 )
{
    // ... use i ...
}

Wouldn’t it be much easier and clearer to write (with str declared as a CStringEx):


if ( str.Scanf( "%d", &i ) == 1 )
{
    // ... use i ...
}

Or, with integrated error detection (the first argument is the number of values to scan):


if ( str.Scanf( 1, "%d", &i ) )
{
    // ... use i ...
}

I arrived at the solution presented here after several unsuccessful attempts to code
this function, and I think you may be interested in them.

First try

Simply take the argument list pointer and pass it through to sscanf():


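int CStringEx::Scanf( LPCTSTR format, ... )
{
    va_list argList;
    va_start( argList, format );

    // Does NOT work: argList is passed as ONE argument, not expanded
    int result = _stscanf( m_pchData, format, argList );

    va_end( argList );
    return result;
}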

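That does not work, because you cannot pass variable argument lists like that: sscanf() receives argList as a single argument and treats it as the first target pointer, not as the list of pointers.
There is a function for text formatting (vsprintf()) which accepts a va_list,
but there is no equivalent function for sscanf()!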

Note: some compilers do have the function vsscanf() in their runtime library, but not the one Microsoft ships with Visual C++ 6.

Second try

Find out what sscanf() is doing and reprogram it.

sscanf() wraps the string in a FILE structure (a string "stream") and then calls the internal function _input(). I adapted sscanf() accordingly to get the desired vsscanf() function:


int __cdecl vsscanf ( REG2 const char *string,
                      const char *format,
                      va_list arglist )
{
    FILE str;
    REG1 FILE *infile = &str;
    REG2 int retval;

    _ASSERTE(string != NULL);
    _ASSERTE(format != NULL);

    // Set up a FILE structure that reads from the string
    infile->_flag = _IOREAD|_IOSTRG|_IOMYBUF;
    infile->_ptr = infile->_base = (char *) string;
    infile->_cnt = strlen(string);

    retval = (_input(infile,format,arglist));

    return(retval);
}

This does not work either, because _input() is an internal C runtime function that is not exported from the DLL version of the runtime library, which the shared MFC DLL links against (as far as I know it is present in the static library, but I haven’t tested that). As a result, you get an unresolved external symbol when linking the program.

Third try

Manually set up the arguments for the sscanf() call.

I found an interesting article by Emmanuel Mogenet on DejaNews: “How to vfscanf in Win32”.

He implemented a scanf() member function in his own File class by rearranging the stack frame with inline assembly: the return address and the this pointer are saved and removed, the object's FILE pointer is pushed in their place, and the caller's remaining arguments are passed on to fscanf() unchanged.


static uint32_t savedESP;
static uint32_t savedEAX;
static uint32_t savedRET;
static uint32_t savedThis;

static void *truefscanf = (void*) fscanf;

class File
{
public:

    int scanf( const char *, ... );

private:
    void *file;
};

__declspec(naked) int File::scanf( const char *, ... )
{
    _asm
    {
        mov dword ptr savedESP, esp     // Save esp in static
        mov eax, dword ptr [esp]        // eax <- return address
        mov dword ptr savedRET, eax     // Save return address in static
        mov eax, dword ptr [esp+4]      // eax <- this
        mov dword ptr savedThis, eax    // Save this in static
        add esp, 8                      // esp += 8 (discard retaddr, this)
        mov eax, dword ptr [eax]        // eax <- this->file
        push eax                        // Push file on argStack
        call truefscanf                 // Call regular fscanf
        mov dword ptr savedEAX, eax     // Save eax value in static
        mov esp, dword ptr savedESP     // Restore esp
        mov eax, dword ptr savedRET     // Restore return address
        mov dword ptr [esp], eax
        mov eax, dword ptr savedThis    // Restore this
        mov dword ptr [esp+4], eax
        mov eax, dword ptr savedEAX     // Restore eax (fscanf result)
        ret
    }
}

I tried to adapt it to sscanf(), but stopped after half an hour for the following reasons:

  • I am no assembler freak.
  • I don’t like using code that I don’t understand in a core function.
  • It only works for SBCS.
  • It’s not thread safe (the saved registers are kept in static variables).

Solution

Finally I came up with a solution that is not perfect, but works. The key idea is that every argument passed to sscanf() after the format string must be a pointer. This is important, because to retrieve an argument with va_arg() you have to know its type (va_arg() needs to know the size it occupies on the stack); since all pointers have the same size, each argument can be retrieved as a void *. With this knowledge it is possible to retrieve all arguments, store them locally, and call sscanf() manually with the string stored in the CString-derived class. The only problem left is to find out how many arguments were passed to the function.

The easiest approach is to pass the number of arguments expected by the format string as an extra argument to the Scanf() function. The implementation is then straightforward. It also makes the function interface clearer, because this version of Scanf() returns FALSE unless
sscanf() fills all of the requested arguments.


BOOL CStringEx::Scanf( int numArgs, LPCTSTR format, ... )
{
    int i;
    int numScanned;
    void *argPointer[CSTRINGEX_SCANF_MAX_ARGS];
    va_list arglist;

    if ( numArgs > CSTRINGEX_SCANF_MAX_ARGS )
    {
        // This function can only handle <CSTRINGEX_SCANF_MAX_ARGS>
        // (10) arguments!

        ASSERT( !TRUE );
        return FALSE;
    }

    va_start( arglist, format );
    {
        // Get all arguments from stack
        for ( i=0; i<numArgs; i++ )
        {
            argPointer[i] = va_arg( arglist, void * );
        }
    }
    va_end( arglist );

    // Call sscanf with correct no. of arguments
    numScanned = sscanfWrapper( numArgs, format, argPointer );

    return (numScanned == numArgs);
}

Don’t worry about the sscanfWrapper() function: it simply calls sscanf() with the correct number of arguments. The following code fragment shows how it works:


int CStringEx::sscanfWrapper( int numArgs, LPCTSTR format, void **p )
{
    switch ( numArgs )
    {
    case 0:  return 0;
    case 1:  return _stscanf( m_pchData, format, ADD_ARGS_1 );
    // ... cases 2 to 8 follow the same pattern ...
    case 9:  return _stscanf( m_pchData, format, ADD_ARGS_9 );
    case 10: return _stscanf( m_pchData, format, ADD_ARGS_10 );
    }

    // When extending max. no. of arguments [CSTRINGEX_SCANF_MAX_ARGS],
    // this function must be updated!
    ASSERT( !TRUE );
    return 0;
}

The macros ADD_ARGS_1 … ADD_ARGS_10 expand the stored pointers into a comma-separated argument list:


#define ADD_ARGS_1 p[0]
#define ADD_ARGS_2 p[0], p[1]
// ... and so on, up to ADD_ARGS_10

When the number of arguments is not known, we extract that information from the format string. Note that the algorithm I use is very basic and will not work for all possible cases (for example, a percent character inside a %[...] scanset is counted as an extra argument). However, most of the time all you want to do is scan a simple value from the string.

The format specifier string is parsed in the following way:

  1. Two consecutive percent characters (“%%”) are skipped (they match a literal percent character).
  2. If an asterisk immediately follows a percent character (“%*”), then the field is scanned, but not stored.
  3. In all other cases, when a percent character appears, an argument is retrieved from the stack and stored.


int CStringEx::Scanf( LPCTSTR format, ... )
{
    int numArgs;
    int numScanned;
    void *argPointer[CSTRINGEX_SCANF_MAX_ARGS];
    va_list arglist;
    LPCTSTR currBuff;
    _TXCHAR currChar;

    numArgs = 0;
    currBuff = _tcschr( format, _TXCHAR( '%' ) );
    if ( currBuff == NULL )
    {
        // No valid format specifier!
        ASSERT( !TRUE );
        return 0;
    }

    va_start( arglist, format );
    {
        do
        {
            // Move pointer to next character
            currBuff = _tcsinc( currBuff );
            currChar = _TXCHAR( *currBuff );

            if ( currChar == _TXCHAR( '\0' ) )
            {
                // End of string
                // -> processing will stop!
            }
            else if ( currChar == _TXCHAR( '*' ) )
            {
                // "%*" suppresses argument assignment
                // -> do not get argument from stack!
            }
            else if ( currChar == _TXCHAR( '%' ) )
            {
                // "%%" substitutes "%" character!
                // -> do not get argument from stack!
                // -> Increment to next character

                currBuff = _tcsinc( currBuff );
            }
            else
            {
                if ( numArgs >= CSTRINGEX_SCANF_MAX_ARGS )
                {
                    // This function can only handle
                    // <CSTRINGEX_SCANF_MAX_ARGS> (10) arguments!
                    ASSERT( !TRUE );
                    va_end( arglist );   // clean up before leaving early
                    return 0;
                }

                argPointer[numArgs++] = va_arg( arglist, void * );
            }

            currBuff = _tcschr( currBuff, _TXCHAR( '%' ) );

        } while ( currBuff != NULL );
    }
    va_end( arglist );

    // Call sscanf with correct no. of arguments
    numScanned = sscanfWrapper( numArgs, format, argPointer );

    return numScanned;
}

Remarks

  • The maximum number of variable arguments the Scanf() functions can handle is defined by the constant
    CSTRINGEX_SCANF_MAX_ARGS (currently 10). If this is a limitation for you, just increase the constant and adapt the sscanfWrapper() function and the ADD_ARGS macros. I used a fixed-size constant so that the functions do not have to allocate or free any memory.
  • The code should work with SBCS, MBCS and UNICODE. However, I haven’t tested the UNICODE version.
  • When you hover the mouse over a CString object while debugging, a tooltip appears
    showing the current value of the object. To get the same effect for a CStringEx object, do the following:
    1. Open the file ~\Common\MSDev98\Bin\autoexp.dat, where ~ is
      the path of your Developer Studio installation.
    2. Go to the [AutoExpand] section and add the following line:

      CStringEx =<m_pchData,st>

Note 1: You have to restart Developer Studio after saving the changed file.

Note 2: This very handy file is described in detail in Ramon de Klein’s article
“Tune the debugger using AutoExp.dat”.

Downloads

The source files contain only the functions described above and cannot be used
as a replacement for CString. Personally, I use Zafir Anjum’s
CString Extension class, which can be found at www.codeguru.com.
So if you want a complete working example, you should download his class and add my functions to it.

Download source – 2 Kb
