Typical applications perform many string operations, and MFC includes the CString class for precisely that purpose. Unfortunately, it suffers from major problems. Perhaps the most important are:
- CStrings cannot be extended - their header file is buried within MFC
- CStrings are slow. Catenating even a simple value requires copying the string into a new buffer.
- CStrings internally call malloc/free so often that memory becomes very fragmented, and your application incurs a major performance hit.
- Reference counting (the ability to quickly assign one CString to another without copying the characters) was not implemented until the MFC library accompanying Visual C++ 5, and even there it is not that efficient.
This article describes a class named CStr, which in many respects is similar to CString and in most cases can be used interchangeably. However, the class improves considerably on CString in the following areas:
- The definition and implementation are open - you can easily edit its header file to include much-needed facilities.
- The class is compatible both with MFC and with simple Win32-based applications.
- The class includes a much better method for reference counting. It also supports a buffer larger than the number of characters in the string, so catenation (and assignment of longer strings) becomes a very fast operation.
- The class caches data blocks of commonly used sizes (typically 4, 8, 12, and so on, up to 320; this is configurable). When your program destroys a string object, the data is not returned to the memory manager, but is kept in a cache pool. The next time CStr needs a block of that size (and that happens very often), it gets the block very quickly. Memory fragmentation is also severely reduced in this way.
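To make the cache pool idea concrete, here is a minimal sketch of a size-bucketed block cache. It is not the CStr implementation: the name BlockPoolSketch, its methods, and the rounding rule are assumptions made for illustration, and only the 4-byte step and the 320-byte ceiling come from the description above. The sketch is also not thread-safe.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>
#include <vector>

// Hypothetical sketch of a size-bucketed block cache; not the CStr source.
class BlockPoolSketch {
public:
    static constexpr std::size_t kStep = 4;        // bucket granularity (from the text)
    static constexpr std::size_t kMaxCached = 320; // largest pooled block (from the text)

    BlockPoolSketch() : buckets_(kMaxCached / kStep + 1) {}
    ~BlockPoolSketch() { ReleaseAll(); }
    BlockPoolSketch(const BlockPoolSketch&) = delete;
    BlockPoolSketch& operator=(const BlockPoolSketch&) = delete;

    // Round a request up to the next multiple of kStep (assumed policy).
    static std::size_t RoundUp(std::size_t n) {
        if (n == 0) n = 1;
        return (n + kStep - 1) / kStep * kStep;
    }

    // Reuse a cached block of the right size if one exists, else malloc.
    void* Acquire(std::size_t n) {
        const std::size_t size = RoundUp(n);
        if (size <= kMaxCached) {
            std::vector<void*>& bucket = buckets_[size / kStep];
            if (!bucket.empty()) {
                void* cached = bucket.back();
                bucket.pop_back();
                return cached;
            }
        }
        void* fresh = std::malloc(size);
        if (fresh == nullptr) throw std::bad_alloc();
        return fresh;
    }

    // Small blocks go back into their bucket; large ones go straight to free.
    void Release(void* p, std::size_t n) {
        assert(p != nullptr);
        const std::size_t size = RoundUp(n);
        if (size <= kMaxCached) buckets_[size / kStep].push_back(p);
        else std::free(p);
    }

    // Return every cached block to the memory manager.
    void ReleaseAll() {
        for (std::vector<void*>& bucket : buckets_) {
            for (void* p : bucket) std::free(p);
            bucket.clear();
        }
    }

    std::size_t CachedCount() const {
        std::size_t total = 0;
        for (const std::vector<void*>& bucket : buckets_) total += bucket.size();
        return total;
    }

private:
    std::vector<std::vector<void*>> buckets_;
};
```

In this sketch a released 10-byte block lands in the 12-byte bucket, so a later request for 12 bytes is served without touching malloc.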
CStr supports most of the features of CString. The following snippet from CStr.h shows some of the more important features and friend functions:
// Construction, copying, assignment
CStr(const CStr& source);
CStr(const char* s, CPOS prealloc = 0);
void operator=(const CStr& source);
void operator=(const char* s);
CStr(const CString& source, CPOS prealloc = 0);
void operator=(const CString& source);
// Get attributes, get data, compare
BOOL IsEmpty() const;
CPOS GetLength() const;
operator const char* () const;
const char* GetString() const; // Same as above
char GetFirstChar() const;
char GetLastChar() const;
char operator[](CPOS idx) const;
char GetAt(CPOS idx) const; // Same as above
void GetLeft (CPOS chars, CStr& result);
void GetRight (CPOS chars, CStr& result);
void GetMiddle (CPOS start, CPOS chars, CStr& result);
int Find (char ch, CPOS startat = 0) const;
int ReverseFind (char ch, CPOS startat = (CPOS) -1) const;
int Compare (const char* match) const; // -1, 0 or 1
int CompareNoCase (const char* match) const; // -1, 0 or 1
// Operators == and != are also predefined
// Global modifications
void Empty(); // Sets length to 0, but keeps buffer around
void Reset(); // This also releases the buffer
void GrowTo(CPOS size);
void Compact(CPOS only_above = 0);
static void CompactFree();
void Format(const char* fmt, ...);
void FormatRes(UINT resid, ...);
BOOL LoadString(UINT resid);
// Catenation, truncation
void operator += (const CStr& obj);
void operator += (const char* s);
void AddString(const CStr& obj); // Same as +=
void AddString(const char* s); // Same as +=
void AddChar(char ch);
void AddChars(const char* s, CPOS startat, CPOS howmany);
void AddStringAtLeft(const CStr& obj);
void AddStringAtLeft(const char* s);
void AddInt(int value);
void AddDouble(double value, UINT after_dot);
void RemoveLeft(CPOS count);
void RemoveMiddle(CPOS start, CPOS count);
void RemoveRight(CPOS count);
void TruncateAt(CPOS idx);
friend CStr operator+(const CStr& s1, const CStr& s2);
friend CStr operator+(const CStr& s, const char* lpsz);
friend CStr operator+(const char* lpsz, const CStr& s);
// Window operations and other utilities
void GetWindowText (CWnd* wnd);
// Miscellaneous implementation methods
// These may be reimplemented by the user
static void ThrowIfNull(void* p);
static void ThrowPgmError();
static void ThrowNoUnicode();
static void ThrowBadIndex();
static void ThrowTooLarge();
BOOL operator ==(const CStr& s1, const CStr& s2);
BOOL operator ==(const CStr& s1, LPCTSTR s2);
BOOL operator ==(LPCTSTR s1, const CStr& s2);
BOOL operator !=(const CStr& s1, const CStr& s2);
BOOL operator !=(const CStr& s1, LPCTSTR s2);
BOOL operator !=(LPCTSTR s1, const CStr& s2);
Tech note: the CPOS type and length limitations
Normally, CStr supports strings with up to 65500 characters. This increases the speed a bit, and saves 4 bytes per string. In some cases you might need to work with very large strings. To do this, define the symbol CSTR_LARGE_STRINGS before including CStr.h in your project.
CPOS is a custom type identifying either the character length of a string, or a character position. It is defined as either a 16-bit WORD (with normal strings) or a 32-bit UINT (when supporting strings with up to 2^32 characters).
If you work at compiler warning level 3, you will be able to freely mix UINT with CPOS. If you work at level 4, you may need to typecast, or use the CPOS type to prevent warnings.
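The following sketch shows how a position type can be switched between 16 and 32 bits with a conditional symbol, and how a wider value can be narrowed explicitly to keep a level 4 build free of warnings. SketchPos, SKETCH_LARGE_STRINGS, ToSketchPos and CheckedPos are hypothetical stand-ins for illustration, not names from CStr.h, and the sketch checks only the range of the type, not the 65500-character limit mentioned above.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Define SKETCH_LARGE_STRINGS before this point to select the 32-bit variant.
#ifdef SKETCH_LARGE_STRINGS
typedef std::uint32_t SketchPos;
#else
typedef std::uint16_t SketchPos;
#endif

// Narrow a wider length to SketchPos; returns false if it does not fit.
inline bool ToSketchPos(unsigned long long length, SketchPos& out) {
    if (length > std::numeric_limits<SketchPos>::max()) return false;
    out = static_cast<SketchPos>(length);
    return true;
}

// Variant for values already known to fit: the explicit cast is what
// silences the "possible loss of data" warning at high warning levels.
inline SketchPos CheckedPos(unsigned long long length) {
    assert(length <= std::numeric_limits<SketchPos>::max());
    return static_cast<SketchPos>(length);
}
```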
Tech note: using CStr in place of LPCSTR (const char*)
Like CString, the class described here has a predefined conversion operator to LPCSTR. This is why you can use CStr where LPCSTR is expected. You can even pass a CStr to functions declared as requiring CString, but this is not very efficient, since the compiler will generate a temporary CString instance for you.
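As an illustration of the mechanism, the sketch below defines a tiny class with a conversion operator to const char* and passes it to a function that expects a plain C string. ConvSketch and CountChars are hypothetical names, and std::string is used for storage only to keep the example short; CStr manages its own buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical stand-in showing only the conversion operator.
class ConvSketch {
public:
    explicit ConvSketch(const char* s) : text_(s ? s : "") {}

    // Implicit conversion: lets the object be used where const char* is expected.
    operator const char*() const { return text_.c_str(); }

private:
    std::string text_;
};

// Any function taking const char* accepts a ConvSketch argument unchanged.
inline std::size_t CountChars(const char* s) {
    assert(s != nullptr);
    return std::strlen(s);
}
```

A call such as CountChars(ConvSketch("hello")) compiles without a cast. In this sketch the returned pointer is only valid while the object is alive and unmodified, so it should not be stored for later use.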
Tech note: managing buffer length
One of the most important advantages of CStr is that it allows you to specify the length of the buffer that will hold the string. If you anticipate a string will soon grow to 80 bytes, you can request a buffer of that size, even if its initial content is only 7 bytes long. This saves a huge amount of reallocation and copy operations if you add to the string later.
To specify a larger buffer when constructing the string, use the following definitions:
// No preallocation
CStr(const char* s, CPOS prealloc = 0); // Buffer chars as second param
CStr(CPOS prealloc); // Buffer chars as only param
To increase the buffer size for an existing string, call
void GrowTo(CPOS size); // If the buffer is smaller, increases it
Attempting to grow the buffer to a value smaller than what's currently allocated is harmless; the call simply does nothing.
At some point (particularly if you store many strings in memory) you may decide that a given string won't be changed, and that its originally allocated buffer is too large and wastes memory. On this occasion you may call
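To show why preallocation pays off, here is a minimal sketch of a string that tracks buffer capacity separately from length and counts how often it has to reallocate. GrowSketch and its members are hypothetical; the class grows to exactly the size needed, which is an assumption chosen to make the effect visible and not a statement about how CStr sizes its buffers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <new>

// Hypothetical sketch: a buffer whose capacity can exceed the string length.
class GrowSketch {
public:
    explicit GrowSketch(const char* s, std::size_t prealloc = 0)
        : data_(nullptr), length_(0), capacity_(0), reallocs_(0) {
        assert(s != nullptr);
        const std::size_t n = std::strlen(s);
        Reserve(n > prealloc ? n : prealloc);
        std::memcpy(data_, s, n + 1);
        length_ = n;
        reallocs_ = 0;  // the initial allocation is not counted
    }
    ~GrowSketch() { std::free(data_); }
    GrowSketch(const GrowSketch&) = delete;
    GrowSketch& operator=(const GrowSketch&) = delete;

    // Enlarges the buffer only if it is smaller than the requested size.
    void GrowTo(std::size_t size) {
        if (size > capacity_) Reserve(size);
    }

    // Appends s; s must not point into this object's own buffer.
    void Append(const char* s) {
        const std::size_t n = std::strlen(s);
        GrowTo(length_ + n);
        std::memcpy(data_ + length_, s, n + 1);
        length_ += n;
    }

    const char* c_str() const { return data_; }
    std::size_t Length() const { return length_; }
    std::size_t Capacity() const { return capacity_; }
    std::size_t Reallocations() const { return reallocs_; }

private:
    // Capacity is counted in characters; one extra byte holds the terminator.
    void Reserve(std::size_t chars) {
        void* p = std::realloc(data_, chars + 1);
        if (p == nullptr) throw std::bad_alloc();
        data_ = static_cast<char*>(p);
        capacity_ = chars;
        ++reallocs_;
    }

    char* data_;
    std::size_t length_;
    std::size_t capacity_;
    std::size_t reallocs_;
};
```

With this sketch, appending ten 7-character pieces to a string constructed with an 80-character buffer causes no reallocation, while the same sequence on a string constructed without preallocation reallocates on every append.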
void Compact(CPOS only_above = 0)
Passing only_above=4, for example, means "reallocate and copy to smaller buffer only if 4 or more bytes would be saved".
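The threshold rule can be written down as a small predicate. The helper below is a hypothetical illustration of the "only if N or more bytes would be saved" condition; the article does not show the exact test used inside CStr, so treat the details (in particular, doing nothing when no bytes would be saved) as assumptions.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: is shrinking the buffer worthwhile?
// capacity and length are both counted in characters.
inline bool ShouldCompact(std::size_t capacity, std::size_t length,
                          std::size_t only_above = 0) {
    assert(length <= capacity);
    const std::size_t saved = capacity - length;
    // Shrink only when something is saved and the saving meets the threshold.
    return saved > 0 && saved >= only_above;
}
```

For example, a 7-character string in an 80-character buffer qualifies with only_above=4, while the same string in a 10-character buffer does not, because only 3 bytes would be saved.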
It is important to know that the buffers for freed strings are not really deallocated. Thus, at certain points in your program (for example, after a large memory-consuming operation) you may wish to invoke a manual "garbage collection" that will return all pooled memory to the memory manager. To do this, call
CStr::CompactFree() // static method
You should always call this method in the ExitInstance() method of your CWinApp class; otherwise, MFC will complain about memory leaks.
Tech note: using Format and FormatRes
CStr::Format and FormatRes are sprintf-like functions. The only difference is that the first takes a pointer to a const character string describing the format parameters (like sprintf does), while the second loads the string from the resource table.
Format specifiers can be looked up in your C++ RTL documentation under "printf".
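For readers who want to see how a sprintf-like member can be built, the sketch below formats into a std::string using the standard vsnprintf function. FormatSketch is a hypothetical free function, not CStr::Format, and the two-pass approach (measure first, then write) is one common technique, not necessarily the one CStr uses.

```cpp
#include <cassert>
#include <cstdarg>
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical sketch of a printf-style formatter returning a std::string.
inline std::string FormatSketch(const char* fmt, ...) {
    assert(fmt != nullptr);

    va_list args;
    va_start(args, fmt);

    // First pass: ask vsnprintf how many characters the result needs.
    va_list measure;
    va_copy(measure, args);
    const int needed = std::vsnprintf(nullptr, 0, fmt, measure);
    va_end(measure);

    std::string result;
    if (needed > 0) {
        // Second pass: format into a buffer with room for the terminator.
        std::vector<char> buffer(static_cast<std::size_t>(needed) + 1);
        std::vsnprintf(buffer.data(), buffer.size(), fmt, args);
        result.assign(buffer.data(), static_cast<std::size_t>(needed));
    }
    va_end(args);
    return result;
}
```

As with sprintf, the arguments must match the format specifiers; passing a mismatched type is undefined behavior.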
Tech note: using Empty() or Reset()
Note that when you assign a number of characters to a CStr object, its buffer may be increased if necessary, but it will not be decreased. This holds even if you call Empty(): the string is left with zero length, but the allocated buffer stays intact for further use.
If you know that you will not assign to this empty string for a long time, it is better to call Reset() instead of Empty(). This will not only set the length to 0 characters, but also deallocate (or rather, return to the cache pool) the string buffer. This is especially important if you reset a large string (say, 512 bytes or more).
Note the presence of the CSTR_DITEMS constant in CStr.h. This constant identifies the maximum string size for which the cached-buffer mechanism is in effect. Strings larger than this size are always passed to malloc/free. Thus, if you load a 500-kilobyte text file into a CStr, you need not worry that the memory will not be released when you destroy the object.
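The difference between the two calls can be sketched as follows. ResetSketch is a hypothetical class written for this illustration; unlike CStr, it frees the buffer directly in Reset() and has no cache pool.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <new>

// Hypothetical sketch contrasting Empty() (keep buffer) with Reset() (release it).
class ResetSketch {
public:
    ResetSketch() : data_(nullptr), length_(0), capacity_(0) {}
    ~ResetSketch() { std::free(data_); }
    ResetSketch(const ResetSketch&) = delete;
    ResetSketch& operator=(const ResetSketch&) = delete;

    // The buffer grows when needed but never shrinks on assignment.
    void Assign(const char* s) {
        assert(s != nullptr);
        const std::size_t n = std::strlen(s);
        if (data_ == nullptr || n > capacity_) {
            void* p = std::realloc(data_, n + 1);
            if (p == nullptr) throw std::bad_alloc();
            data_ = static_cast<char*>(p);
            capacity_ = n;
        }
        std::memcpy(data_, s, n + 1);
        length_ = n;
    }

    // Length becomes 0; the buffer stays allocated for later reuse.
    void Empty() {
        length_ = 0;
        if (data_ != nullptr) data_[0] = '\0';
    }

    // Length becomes 0 and the buffer is released as well.
    void Reset() {
        std::free(data_);
        data_ = nullptr;
        length_ = 0;
        capacity_ = 0;
    }

    std::size_t Length() const { return length_; }
    std::size_t Capacity() const { return capacity_; }

private:
    char* data_;
    std::size_t length_;
    std::size_t capacity_;
};
```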
Tech note: using the string support in single-threaded applications
The supplied class is designed to be completely safe in a multithreaded application, and uses critical sections to achieve this. If you have a single-threaded app, or you are sure to use CStr from only one thread (and I really mean sure!), you can define the symbol CSTR_NOT_MT_SAFE.
This will omit any references to critical sections, and may speed up your string operations by between 5% (if you mostly manipulate the string data itself) and 30% (if you do a lot of string assignments and reassignments).
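The pattern of compiling the locking away can be sketched like this. The example guards a reference count with std::mutex as a portable stand-in for the Win32 critical sections mentioned above; RefCountSketch, its Guard helper, and the symbol SKETCH_NOT_MT_SAFE are hypothetical names, not part of CStr.

```cpp
#include <cassert>
#ifndef SKETCH_NOT_MT_SAFE
#include <mutex>
#endif

// Hypothetical sketch: a shared counter whose locking is removed at compile time
// when SKETCH_NOT_MT_SAFE is defined.
class RefCountSketch {
public:
    RefCountSketch() : count_(1) {}

    int AddRef() {
        Guard guard(*this);
        return ++count_;
    }

    int Release() {
        Guard guard(*this);
        assert(count_ > 0);
        return --count_;
    }

private:
#ifndef SKETCH_NOT_MT_SAFE
    // Thread-safe build: Guard holds the mutex for the duration of the call.
    struct Guard {
        explicit Guard(RefCountSketch& owner) : lock_(owner.mutex_) {}
        std::lock_guard<std::mutex> lock_;
    };
    std::mutex mutex_;
#else
    // Single-threaded build: Guard does nothing and costs nothing.
    struct Guard {
        explicit Guard(RefCountSketch&) {}
    };
#endif
    int count_;
};
```

Because the counter is touched on every assignment in a reference-counted string, this is the kind of code path where dropping the lock is most likely to show up in timings.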
How to use CStr
Remember that CStr is NOT compatible with UNICODE yet (if enough interest gathers, I will make a UNICODE version). When including the class in an MFC project, I suggest that you put the following references:
In stdafx.h: #include "CStr.h" // This, in turn, includes cstrimp.h
In stdafx.cpp: #include "CStrMgr.cpp" // Put this include after everything else
Thus, the string support headers will be precompiled, and you won't need to include them everywhere.
Note that CStrMgr.cpp (the implementation file) is designed to be included in another CPP file, not added to the project. If you don't like this, just insert a #include "stdafx.h" at its beginning, and add it to the project file.
There are some conditional symbols you may wish to define right before including CStr.h:
- #define CSTR_LARGE_STRINGS: normally, strings can hold up to 65500 characters. Define this conditional to increase the range to 2^32 characters. This incurs a 4 byte penalty per object, and probably some small speed hit.
- #define CSTR_OWN_ERRORS: you will probably want to define this symbol in larger applications. If you do, you will have to implement a couple of methods and functions that handle critical situations, such as out-of-memory conditions and program errors (e.g. an out-of-bounds character reference).
- If you do NOT use MFC, you will have to provide a body for the get_stringres() function. It should just return an instance handle so that CStr knows where to load string resources from. The sample application shows an implementation of this function.
- #define CSTR_NOT_MT_SAFE: If you have a single-threaded application, or are completely sure that only one of your threads will use the string support subsystem, this will improve speed significantly. Be very careful - defining this and using CStr from multiple threads might cause very hard-to-detect errors in your application!
Have fun using CStr! Any comments are welcome. Also, I will be glad to add features provided that they seem to be useful to at least 3-4 people, and they do not take too much time. Write to me at email@example.com
Download source - 11 KB Updated October 17, 1998