CString In A Nutshell
- Passing CString by value is bad
- Using CString causes memory fragmentation
- CString is slow
Inside CString
A CString object is 32 bits wide: its only data member is a single pointer. Passing a CString by value is therefore no bulkier than passing an int by value. You can verify this with an assertion: ASSERT(sizeof(CString) == 4);
class CString
{
    ...
    LPTSTR m_pchData; // pointer to ref counted string data
};
This is the "header" structure of every string:
struct CStringData
{
    long nRefs;       // reference count
    int nDataLength;  // length of data
    int nAllocLength; // length of allocation
    // TCHAR data[nAllocLength+1]
    TCHAR* data()     // TCHAR* to managed data
        { return (TCHAR*)(this+1); } // this+1 == ((void*)this)+12
};
CString str("hello");
First, CString calls CString::AllocBuffer(5). This actually allocates 5 + 1 + 12 bytes (chunk + EOS + CStringData). nAllocLength will be set to 5, as will nDataLength. You might think that nAllocLength should be 18, but since the extra 13 bytes are ALWAYS allocated, it's more efficient for CString to leave off those extra 13.
In release builds, your strings are allocated in blocks of 64, 128, 256, or 512 characters; this is where nAllocLength comes in handy. In the case of our 5-character string, nAllocLength would be 64. Using blocks reduces memory fragmentation and speeds up operations like appending, since a string can often grow inside the block it already owns. Reduction of memory fragmentation is achieved by the use of CFixedAlloc. This class never actually frees the memory allocated (until it is destroyed or explicitly told to), but returns freed blocks to its "free pool", so no memory fragmentation occurs. CFixedAlloc can be found in the MFC source directory in FIXEDALLOC.H and FIXEDALLOC.CPP if you're curious. For strings larger than 512 characters, the memory is allocated and freed the same as in debug builds.
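The block-size rule described above can be sketched as a pair of small standalone functions. This is only an illustration, not the MFC source: allocLengthFor and bytesFor are hypothetical names, the 12-byte header comes from the CStringData layout shown earlier, and the byte count assumes a non-Unicode build where a TCHAR is one byte.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: the nAllocLength that would be recorded for a string
// of nLen characters, following the 64/128/256/512 rule described in the text.
int allocLengthFor(int nLen, bool releaseBuild) {
    assert(nLen >= 0);
    if (releaseBuild) {
        if (nLen <= 64)  return 64;
        if (nLen <= 128) return 128;
        if (nLen <= 256) return 256;
        if (nLen <= 512) return 512;
    }
    return nLen;  // debug builds and long strings: exactly what was asked for
}

// Hypothetical helper: total bytes behind one string buffer, assuming a
// 12-byte header and one-byte characters (non-Unicode build).
std::size_t bytesFor(int nAllocLength) {
    const std::size_t kHeaderBytes = 12;
    return kHeaderBytes + static_cast<std::size_t>(nAllocLength) + 1;  // +1 for EOS
}
```

Under these assumptions, allocLengthFor(5, false) gives 5 and bytesFor(5) gives the 18 bytes mentioned above, while allocLengthFor(5, true) gives 64.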
nRefs is set to 1
m_pchData is set like this: m_pchData = pData->data(); where pData is the block of memory allocated by AllocBuffer, cast to CStringData*. So what we get looks like this:
1    5    5    h e l l o \0
---- ---- ---- - - - - - -   <-bytes
               ^m_pchData
Of course, to free the block of memory, CString cannot free m_pchData; instead it frees (BYTE*)GetData(). GetData() returns ((CStringData*)m_pchData)-1. Remember that this casts the pointer to a pointer to a 12-byte structure and subtracts one structure (12 bytes) from it.
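The header-in-front-of-the-data layout and the pointer arithmetic used to get back to the header can be sketched in portable C++. This is a simplified model rather than the MFC code: Header, allocString, headerOf and freeString are hypothetical names, the fields are fixed at 32 bits so the header stays 12 bytes on any platform, and plain char stands in for TCHAR.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>

// Hypothetical stand-in for CStringData: three 32-bit fields, 12 bytes total.
struct Header {
    std::int32_t nRefs;         // reference count
    std::int32_t nDataLength;   // length of data
    std::int32_t nAllocLength;  // length of allocation
    char* data() { return reinterpret_cast<char*>(this + 1); }  // first byte after the header
};
static_assert(sizeof(Header) == 12, "header is expected to be 12 bytes");

// Allocate header + characters + terminator as one block and return the
// data pointer (the role m_pchData plays in the article).
char* allocString(const char* text) {
    const std::size_t len = std::strlen(text);
    void* block = std::malloc(sizeof(Header) + len + 1);
    assert(block != nullptr);
    Header* h = static_cast<Header*>(block);
    h->nRefs = 1;
    h->nDataLength = static_cast<std::int32_t>(len);
    h->nAllocLength = static_cast<std::int32_t>(len);
    std::memcpy(h->data(), text, len + 1);  // copies the terminator too
    return h->data();
}

// Step back one header from the data pointer, as GetData() is described to do.
Header* headerOf(char* pchData) {
    return reinterpret_cast<Header*>(pchData) - 1;
}

// Free the whole block through the header address, never through the data pointer.
void freeString(char* pchData) {
    std::free(headerOf(pchData));
}
```

The point of the design is that the object itself only needs to store the data pointer; everything else is found at a fixed negative offset from it.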
Reference Counting
So how does reference counting help speed things up? Whenever you use the copy constructor or the operator=(const CString& stringSrc), the only thing that happens is this:
m_pchData = stringSrc.m_pchData;
GetData()->nRefs++;
If m_pchData is already equal to stringSrc.m_pchData, nothing happens at all.
So this bit of code is very fast:
void foo(CString strPassed)
{
}

CString str("Hello");
foo(str);
No string copy occurs, and no memory is allocated. A 32-bit value is pushed on the stack, that value is set (strPassed.m_pchData = str.m_pchData), and an integer is incremented (strPassed.GetData()->nRefs++). That's only one operation more than passing an int by value, where a 32-bit value is pushed on the stack and that value is set. Now granted, it's definitely quite a few more assembly instructions, but that's why we have 500MHz CPUs, so don't sweat cycles. When it comes to user interfaces, there's no reason to sweat CPU cycles: the computer is capable of executing billions of instructions in a time frame perceivable by a human. Obviously, if you're doing intensive graphics animation or massive quantities of data manipulation, you might want to look at your inner loops and optimize there.
The reason reference counts are kept is so that CString knows that it's "sharing" a string buffer with another CString object. If foo were to modify strPassed, CString would first allocate a new buffer and copy the string into that buffer (setting its ref count to 1). Of course, if foo never modifies strPassed, the allocation and copy never occur.
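The share-on-copy, copy-on-modify behaviour described above can be modelled with a small class. This is a minimal sketch of the general technique, not CString itself: SharedStr and refsSeenByCallee are hypothetical names, the buffer is a std::string behind a counted pointer instead of the header layout shown earlier, and no attempt is made at thread safety.

```cpp
#include <cassert>
#include <string>

// Hypothetical copy-on-write string, for illustration only.
class SharedStr {
    struct Rep {
        int refs;          // how many SharedStr objects point at this Rep
        std::string text;  // the shared characters
    };
    Rep* rep_;

    void release() {
        if (--rep_->refs == 0) delete rep_;
    }

    // Give this object a private buffer before it is modified.
    void detach() {
        if (rep_->refs > 1) {
            Rep* copy = new Rep{1, rep_->text};
            --rep_->refs;
            rep_ = copy;
        }
    }

public:
    explicit SharedStr(const char* s) : rep_(new Rep{1, s}) {}

    // Copying shares the buffer: one pointer assignment, one increment.
    SharedStr(const SharedStr& other) : rep_(other.rep_) { ++rep_->refs; }

    SharedStr& operator=(const SharedStr& other) {
        if (rep_ != other.rep_) {  // already sharing: nothing to do
            ++other.rep_->refs;
            release();
            rep_ = other.rep_;
        }
        return *this;
    }

    ~SharedStr() { release(); }

    // Modifying triggers the allocation and copy, but only if shared.
    void append(const char* s) {
        detach();
        rep_->text += s;
    }

    const std::string& text() const { return rep_->text; }
    int refs() const { return rep_->refs; }
    bool sharesBufferWith(const SharedStr& other) const { return rep_ == other.rep_; }
};

// Pass by value: the callee sees a second reference to the same buffer.
int refsSeenByCallee(SharedStr strPassed) { return strPassed.refs(); }
```

With this model, passing a SharedStr by value reports a count of 2 inside the callee and drops back to 1 afterwards, and a copy only stops sharing its buffer once append is called on it.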
Empty Strings
For an empty or uninitialized string, m_pchData is set to _afxPchNil, which looks like this:
-1   0    0    \0 (EOS)
---- ---- ---- -         (_afxInitData)
               ^_afxPchNil
Note that a -1 ref count means that the string is "locked", so modifying an empty string always results in a new allocation.
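A shared "nil" block of this kind can be sketched as follows. The names NilHeader, NilBlock, g_nil, nilData, headerOfNil and needsNewBuffer are all hypothetical, and the rule inside needsNewBuffer is a simplification inferred from the description above (a buffer is reused only when it is exclusively owned and large enough), not a copy of the MFC logic.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical 12-byte header, mirroring the three fields described earlier.
struct NilHeader {
    std::int32_t nRefs;
    std::int32_t nDataLength;
    std::int32_t nAllocLength;
};

// One static block shared by every empty string: header plus a lone terminator.
struct NilBlock {
    NilHeader header;
    char eos;
};
static NilBlock g_nil = { { -1, 0, 0 }, '\0' };

// What an empty string's data pointer would refer to in this model.
const char* nilData() { return &g_nil.eos; }
const NilHeader* headerOfNil() { return &g_nil.header; }

// Assumed rule: write in place only if the buffer is exclusively owned
// (nRefs == 1) and has room. A locked block (-1) or a shared one (> 1)
// always forces a new allocation.
bool needsNewBuffer(const NilHeader& h, int nLen) {
    return h.nRefs != 1 || nLen > h.nAllocLength;
}
```

Because the nil block never has a count of exactly 1, any write to an empty string in this model goes through a fresh allocation, and the static block itself is never modified or freed.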
Epilogue
Anyhow, that's CString in a nutshell. It's really a fun class to dig into. So if you've ever worried about passing CString objects all over the place, remember that you're essentially only passing a pointer around. It's quite efficient, and if you need to manage dynamic structured data, you might even consider this model.
Please note that this information is accurate as of VC++ 6.0. I've heard that not all of this is true for previous versions of MFC, but I have not personally verified this.
Comments
Just thought about this when I woke up this morning...
Posted by Legacy on 08/05/2003 12:00am
Originally posted by: Leonhardt Wille
Hi there, nice article!
I have just one point to add:
The CString's equality operator is very uncomfortable...
I think that NO (absolutely NO) professional programmer compares two strings without making them lower! Okay, for some purpose you MAY do so (cheap pwd protection etc.)...
I just tried to override the CString provided by .NET, then I thought about downgrading to VC6 again... This ATL-**** sucks balls!
Please, if anyone has a nice CMyString class, please email me... I just wanted to get rid of this idea :D
regards
leo
then someone should be able to explain this
Posted by Legacy on 05/08/2003 12:00am
Originally posted by: majoob
mk4vc60s_mfc.lib(CMk4DataContainer.obj) : error LNK2001: unresolved external symbol "char const * const _afxPchNil" (?_afxPchNil@@3PBDB)
Allocation question
Posted by Legacy on 12/17/2002 12:00am
Originally posted by: Larry Trussell
I have a question about heap usage with the CString class. The sample application is a debug build and the 64/128/... allocation buffers are not used. The buffer is allocated to the necessary size of the string. (BTW, the problem described below doesn't go away in the release build)
I have a test application (non-unicode) with a class containing an array of 40 CStrings. I then allocate 10,000 of those classes. When I look at the heap usage before and after this operation, there is a consumption of 4 bytes / string. That is what is expected.
Next, I set the value of each string to "A". Now my heap usage jumps up to 92 bytes / string! This exact same usage is seen for string values up to a 20 character string. At that point, usage jumps up to 108 bytes / string.
If I go into the test application and change the array of 40 CStrings to an array of 40 char[32], I see exactly 32 bytes per string in all cases with the test application.
Can you offer any suggestions as to why CStrings are consuming so much RAM?
Thanks,
LT
Very good article. Was helpful to me in debugging situation.
Posted by Legacy on 12/03/2002 12:00am
Originally posted by: JoeBrennan
I had a release-build library that yielded unresolved extern against my release task. No problem with debug-build equivalents.
Since unresolved extern was _AfxPchNil I was able to search on that term, find your article, and infer that an uninitialized CString (which occurs only in the library) was not being linked successfully to the empty string Kahuna.
Life being short, I simply initialize the string explicitly and problem gone. Thanks.
Converting CString to int
Posted by Legacy on 11/02/2001 12:00am
Originally posted by: Milk
Is there a way to Convert CStrings to Integers?
Passing CString by reference is faster
Posted by Legacy on 09/05/2001 12:00am
Originally posted by: Brangdon
Passing a CString by const reference will be faster, and take less code, than passing it by value.
This is mainly because the reference-counting code is out-of-line. (It is also fairly complex, with various tests for special cases etc.) Think about it: when you call:
void func( CString copy );
the compiler starts by calling the string's copy constructor, which is like:
CString::CString( const CString &rhs );
which necessarily passes by reference. So every pass by value has a pass by reference included. So it will be slower.
Reference counting means the difference isn't much. However, pass-by-const-reference is the norm for large objects. There is no reason to treat CString any differently to the norm. If you think pass-by-value is a worth-while optimisation, you're wrong: it's a pessimisation.
How to convert float to a CString?
Posted by Legacy on 08/07/2001 12:00am
Originally posted by: Henry Park
I'm trying to convert a float to a CString. I am using _fcvt(22.2, &decimal, &sign) to do the conversion. I plan to use the "decimal" value to place the "." in the proper string position.
CString in a Nutshell
Posted by Legacy on 07/11/2001 12:00am
Originally posted by: praveen nimbagiri
Excellent information about CString!
Well done douglas..
Slightly Confused
Posted by Legacy on 07/06/2001 12:00am
Originally posted by: Ben Slavin
I was looking for information on the use of the CString class and how it works. I thought "CString in a Nutshell" to be a fitting title for a page with such information. How wrong I was.
This page provides practically no information on how one would start using CString. It doesn't even mention the include calls which need to be made to use it! I was disappointed upon seeing this, and am saddened that I was so misguided. I'd like to see a REAL explanation of CString as opposed to someone trying to simply show the usefulness of it.
Regards,
--Ben
How to convert CString to char
Posted by Legacy on 06/15/2001 12:00am
Originally posted by: RRemzie
How can I convert a CString to a char?
Greetz RRemzie