CString In A Nutshell

I've heard several misconceptions about CString and thought it would help some of you to clear them up. In this document I will describe how CString works and address three key misconceptions:
  • Passing CString by value is bad
  • Using CString causes memory fragmentation
  • CString is slow

Inside CString

On Win32, the CString data type is 32 bits, because its only data member is a pointer. Passing a CString by value is therefore no more bulky than passing an int by value. You can verify this with an assertion: ASSERT(sizeof(CString) == 4);


class CString
{
   ...
   LPTSTR m_pchData;   // pointer to ref counted string data
};
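A quick way to convince yourself of the size claim without MFC is the sketch below. It is plain standard C++, and the class name PtrOnlyString is hypothetical: like CString, it stores nothing but a pointer into a separately allocated buffer, so the object is exactly one pointer wide. Note that this is 4 bytes only on a 32-bit build; on a 64-bit build the same layout is 8 bytes, so ASSERT(sizeof(CString) == 4) holds only for Win32.

```cpp
#include <cassert>

// Hypothetical stand-in for CString's layout (not MFC's class): the only
// data member is a pointer to the characters of a separately allocated,
// reference-counted block.
class PtrOnlyString
{
public:
    const char* c_str() const { return m_pchData; }

private:
    char* m_pchData = nullptr;
};

// One pointer wide: 4 bytes on a 32-bit build, 8 bytes on a 64-bit build.
static_assert(sizeof(PtrOnlyString) == sizeof(char*),
              "the object is nothing but a pointer");
```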

This is the "header" structure of every string:


struct CStringData
{
   long nRefs;             // reference count
   int nDataLength;        // length of data
   int nAllocLength;       // length of allocation
   // TCHAR data[nAllocLength+1]
   TCHAR* data()           // TCHAR* to managed data
      { return (TCHAR*)(this+1); } // this+1 == ((BYTE*)this)+12
};
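The data() trick relies on the characters starting immediately after the header. The following sketch reproduces that layout in portable C++. StringHeader and dataOffset are hypothetical names, and std::int32_t stands in for long and int so that the header is 12 bytes on any common platform (with a 64-bit long, as on most 64-bit Unix systems, it would not be).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical mirror of CStringData with fixed-width fields.
struct StringHeader
{
    std::int32_t nRefs;         // reference count
    std::int32_t nDataLength;   // length of data
    std::int32_t nAllocLength;  // length of allocation
    // The characters follow immediately: this + 1 skips exactly one header.
    char* data() { return reinterpret_cast<char*>(this + 1); }
};

static_assert(sizeof(StringHeader) == 12, "three 32-bit fields, no padding");

// Byte distance from the start of the header to its first character.
std::ptrdiff_t dataOffset(StringHeader& header)
{
    return header.data() - reinterpret_cast<char*>(&header);
}
```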

Let's say you create a CString object like this:

CString str("hello");

First, CString calls CString::AllocBuffer(5). This actually allocates 12 + (5 + 1) * sizeof(TCHAR) bytes: the CStringData header, the five characters, and the terminating EOS (18 bytes in an ANSI build). nDataLength and nAllocLength are both set to 5. You might think nAllocLength should be 18, but since the header and the EOS are ALWAYS allocated, it's more efficient for CString to leave those extra 13 bytes out of the count.

In release builds, your strings are allocated in blocks of 64, 128, 256, or 512 characters, and this is where nAllocLength comes in handy: for our five-character string, nAllocLength would be 64. Using blocks reduces memory fragmentation and speeds up operations like appending, since a string can grow within its block without a new allocation. The reduction in fragmentation is achieved with CFixedAlloc. This class never actually frees the memory it allocates (until it is destroyed or explicitly told to); instead, it returns freed blocks to its "free pool" for reuse, so the pooled blocks do not fragment the heap. CFixedAlloc can be found in the MFC source directory in FIXEDALLOC.H and FIXEDALLOC.CPP if you're curious. For strings larger than 512 characters, the memory is allocated and freed the same way as in debug builds.
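The size arithmetic above can be written out as two small functions. This is a hedged sketch of the rules as this article describes them, not MFC's AllocBuffer: the names exactBytes and releaseAllocLength are hypothetical, the 12-byte header assumes 32-bit fields, and a zero-length request is excluded because empty strings are handled separately, as described under Empty Strings.

```cpp
#include <cassert>
#include <cstddef>

// Assumes three 32-bit header fields: nRefs, nDataLength, nAllocLength.
const std::size_t kHeaderBytes = 12;

// Bytes for an exactly sized block (debug builds, and release builds for
// strings longer than 512 characters): header + characters + EOS.
std::size_t exactBytes(std::size_t nLen, std::size_t charSize)
{
    return kHeaderBytes + (nLen + 1) * charSize;
}

// Value stored in nAllocLength in a release build, per the rules above:
// round up to a 64/128/256/512-character pooled block, else size exactly.
std::size_t releaseAllocLength(std::size_t nLen)
{
    assert(nLen > 0);   // empty strings never get a block of their own
    const std::size_t blocks[] = {64, 128, 256, 512};
    for (std::size_t block : blocks)
        if (nLen <= block)
            return block;
    return nLen;
}
```

For "hello", exactBytes(5, sizeof(char)) gives the 18 bytes mentioned above, and releaseAllocLength(5) gives 64.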

AllocBuffer also sets nRefs to 1.

m_pchData is then set with m_pchData = pData->data(), where pData is the block of memory allocated by AllocBuffer, cast to CStringData*. So what we get looks like this:


   1    5    5 h e l l o \0
---- ---- ---- - - - - - -   <-bytes
               ^m_pchData

Of course, to free the block of memory, CString cannot free m_pchData; instead it frees (BYTE*)GetData(), where GetData() returns ((CStringData*)m_pchData)-1. Remember that this casts the pointer to a 12-byte structure and subtracts one structure from it (that is, 12 bytes).
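Putting the pieces together, here is a sketch in standard C++ of building a string block and later freeing it from its true start. StringHeader, allocString, getData, and freeString are hypothetical names that mirror CStringData, AllocBuffer, GetData, and the free described above; the real MFC code also handles TCHAR widths, the release-build pools, and thread-safe counting.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <new>

// Hypothetical mirror of CStringData. Fixed-width fields keep it 12 bytes.
struct StringHeader
{
    std::int32_t nRefs;         // reference count
    std::int32_t nDataLength;   // characters in use
    std::int32_t nAllocLength;  // characters the block can hold
    char* data() { return reinterpret_cast<char*>(this + 1); }
};

// Like AllocBuffer plus the copy: one block holds header, characters, EOS.
// Returns the character pointer, i.e. what CString stores in m_pchData.
char* allocString(const char* src)
{
    const std::int32_t nLen = static_cast<std::int32_t>(std::strlen(src));
    unsigned char* block = new unsigned char[sizeof(StringHeader) + nLen + 1];
    StringHeader* pData = new (block) StringHeader{1, nLen, nLen};
    std::memcpy(pData->data(), src, nLen + 1);   // copies the EOS as well
    return pData->data();
}

// Like GetData(): step back one whole header from the character pointer.
StringHeader* getData(char* pchData)
{
    return reinterpret_cast<StringHeader*>(pchData) - 1;
}

// Free from the true start of the block, never from pchData itself.
void freeString(char* pchData)
{
    delete[] reinterpret_cast<unsigned char*>(getData(pchData));
}
```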

Reference Counting

So how does reference counting help speed things up? Whenever you use the copy constructor or the operator=(const CString& stringSrc), the only thing that happens is this:


m_pchData = stringSrc.m_pchData;   // share the buffer (operator= first releases its old one)
GetData()->nRefs++;                // MFC uses InterlockedIncrement here

If m_pchData already equals stringSrc.m_pchData, nothing happens at all.
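The sharing behavior of the copy constructor and operator= can be sketched as follows. RcString is a hypothetical, single-threaded class written for illustration (MFC uses InterlockedIncrement and InterlockedDecrement for the count, and also checks for locked buffers); when the last reference is released, the whole block, header included, is freed.

```cpp
#include <cassert>
#include <cstring>
#include <new>

// Hypothetical single-threaded ref-counted string in the spirit of CString.
class RcString
{
public:
    explicit RcString(const char* s)
    {
        const int n = static_cast<int>(std::strlen(s));
        unsigned char* block = new unsigned char[sizeof(Header) + n + 1];
        Header* pData = new (block) Header{1, n};
        m_pchData = reinterpret_cast<char*>(pData + 1);
        std::memcpy(m_pchData, s, n + 1);
    }

    // Copy constructor: share the buffer and bump the count. No allocation.
    RcString(const RcString& src) : m_pchData(src.m_pchData)
    {
        ++GetData()->nRefs;
    }

    // Assignment: nothing to do if both already share a buffer; otherwise
    // drop our reference to the old buffer, then share the source's.
    RcString& operator=(const RcString& src)
    {
        if (m_pchData != src.m_pchData)
        {
            Release();
            m_pchData = src.m_pchData;
            ++GetData()->nRefs;
        }
        return *this;
    }

    ~RcString() { Release(); }

    const char* c_str() const { return m_pchData; }
    long refCount() const { return GetData()->nRefs; }

private:
    struct Header
    {
        long nRefs;        // reference count
        int  nDataLength;  // length of data
    };

    Header* GetData() const { return reinterpret_cast<Header*>(m_pchData) - 1; }

    // Drop one reference; the last owner frees the whole block.
    void Release()
    {
        if (--GetData()->nRefs == 0)
            delete[] reinterpret_cast<unsigned char*>(GetData());
    }

    char* m_pchData;
};
```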

So this bit of code is very fast:


void foo(CString strPassed)
{
}

CString str("Hello");

foo(str);

No string copy occurs, and no memory is allocated. A 32-bit value is pushed on the stack, that value is set (strPassed.m_pchData = str.m_pchData), and an integer is incremented (strPassed.GetData()->nRefs++). That's only one operation more than passing an int by value, where a 32-bit value is pushed on the stack and that value is set. Granted, it's definitely quite a few more assembly instructions, but that's why we have 500 MHz CPUs, so don't sweat the cycles. When it comes to user interfaces, there's no reason to sweat CPU cycles: the computer can execute millions of instructions before a human perceives any delay. Obviously, if you're doing intensive graphics animation or manipulating massive quantities of data, you might want to look at your inner loops and optimize there.
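Standard C++ offers a convenient way to watch this happen without MFC: std::shared_ptr is also reference counted, so passing one by value copies a pointer and bumps a count but never duplicates the string it points to. The sketch below is an analogy, not CString itself; SharedText and the three helper functions are hypothetical. It also shows that passing by const reference skips even the count increment.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Analogy for CString's sharing: copying a shared_ptr copies a pointer and
// bumps a reference count; the string itself is never duplicated.
using SharedText = std::shared_ptr<const std::string>;

// Pass by value: one extra reference exists for the duration of the call.
long useCountInside(SharedText strPassed)
{
    return strPassed.use_count();
}

// Pass by const reference: no copy, so the count is unchanged.
long useCountByRef(const SharedText& strPassed)
{
    return strPassed.use_count();
}

// True if the callee sees the very same characters as the caller.
bool sameBuffer(SharedText strPassed, const SharedText& original)
{
    return strPassed->data() == original->data();
}
```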

Reference counts are kept so that CString knows it is "sharing" a string buffer with another CString object. If foo were to modify strPassed, CString would first allocate a new buffer and copy the string into it (setting the new buffer's reference count to 1, while the shared buffer's count drops by one). Of course, if foo never modifies strPassed, the allocation and copy never occur.
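The copy-on-write rule can be sketched in a few lines. CowString and its copyBeforeWrite helper are hypothetical (the helper is loosely named after CString's own CopyBeforeWrite), the sharing is done with std::shared_ptr rather than a hand-rolled header, and the use_count() check is only reliable in single-threaded code.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical copy-on-write wrapper (not MFC code). Copies share one
// buffer; the first write through a shared copy clones the buffer, so the
// other owners never see the change.
class CowString
{
public:
    explicit CowString(const std::string& s)
        : m_data(std::make_shared<std::string>(s)) {}

    const std::string& str() const { return *m_data; }
    const char* buffer() const { return m_data->c_str(); }

    void append(const std::string& more)
    {
        copyBeforeWrite();
        *m_data += more;
    }

private:
    // Detach only if someone else still shares the buffer.
    void copyBeforeWrite()
    {
        if (m_data.use_count() > 1)
            m_data = std::make_shared<std::string>(*m_data);
    }

    std::shared_ptr<std::string> m_data;
};
```

Copying a CowString is as cheap as copying a pointer; the allocation is deferred until, and unless, a shared copy is written to.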

Empty Strings

For an empty or default-constructed string, m_pchData is set to _afxPchNil, which looks like this:


  -1    0    0 \0 (EOS)
---- ---- ---- - (_afxInitData)
               ^_afxPchNil

Note that a reference count of -1 means that the string is "locked". This shared block is never modified or freed, so modifying an empty string always results in a new allocation.
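Here is a sketch of the shared empty-string block. Header, g_nil, and NilString are hypothetical names; the layout mirrors the diagram above, with one static block whose nRefs is -1 and whose capacity is 0. Every empty NilString points at it without allocating, Release() never frees it, and because its capacity is 0 the first non-empty assignment must allocate. Copying is disabled to keep the sketch short.

```cpp
#include <cassert>
#include <cstring>
#include <new>

// Hypothetical header mirroring CStringData: count, length, capacity.
struct Header
{
    long nRefs;
    int  nDataLength;
    int  nAllocLength;
};

// One static block shared by every empty string: nRefs = -1 ("locked"),
// length 0, capacity 0, followed by a lone EOS character.
struct NilBlock
{
    Header h;
    char   eos;
};
static NilBlock g_nil = { { -1, 0, 0 }, '\0' };

class NilString
{
public:
    NilString() : m_pchData(&g_nil.eos) {}  // empty string: no allocation
    ~NilString() { Release(); }
    NilString(const NilString&) = delete;   // copying left out for brevity
    NilString& operator=(const NilString&) = delete;

    bool isNil() const { return GetData() == &g_nil.h; }
    long refCount() const { return GetData()->nRefs; }
    const char* c_str() const { return m_pchData; }

    void assign(const char* s)
    {
        const int n = static_cast<int>(std::strlen(s));
        if (n == 0)
        {
            Release();                      // back to the shared nil block
            return;
        }
        // The nil block has capacity 0, so it can never be written to:
        // a non-empty assignment to an empty string always allocates.
        if (refCount() != 1 || n > GetData()->nAllocLength)
        {
            Release();
            unsigned char* block = new unsigned char[sizeof(Header) + n + 1];
            Header* pData = new (block) Header{1, n, n};
            m_pchData = reinterpret_cast<char*>(pData + 1);
        }
        GetData()->nDataLength = n;
        std::memcpy(m_pchData, s, n + 1);   // includes the EOS
    }

private:
    Header* GetData() const
    {
        return reinterpret_cast<Header*>(m_pchData) - 1;
    }

    // Never free the nil block; free a real block when its count hits 0.
    void Release()
    {
        if (!isNil() && --GetData()->nRefs == 0)
            delete[] reinterpret_cast<unsigned char*>(GetData());
        m_pchData = &g_nil.eos;
    }

    char* m_pchData;
};
```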

Epilogue

Anyhow, that's CString in a nutshell. It's really a fun class to dig into. So if you've ever worried about passing CString objects all over the place, remember that you're essentially only passing a pointer around. It's quite efficient, and if you need to manage dynamic structured data, you might even consider this model.

Please note that this information is accurate as of VC++ 6.0. I've heard that not all of this is true for previous versions of MFC, but I have not personally verified this.



Comments

  • Just thought about this when I woke up this morning...

    Posted by Legacy on 08/05/2003 12:00am

    Originally posted by: Leonhardt Wille

    Hi there, nice article!

    I have just one point to add:
    The CString's equality operator is very uncomfortable...
    I think that NO (absolutely NO) professional programmer compares two strings without making them lower! Okay, for some purpose you MAY do so (cheap pwd protection etc.)...

    I just tried to override the CString provided by .NET, then I thought about downgrading to VC6 again... This ATL-**** sucks balls!

    Please, if anyone has a nice CMyString class, please email me... I just wanted to get rid of this idea :D

    regards
    leo

  • then someone should be able to explain this

    Posted by Legacy on 05/08/2003 12:00am

    Originally posted by: majoob

    mk4vc60s_mfc.lib(CMk4DataContainer.obj) : error LNK2001: unresolved external symbol "char const * const _afxPchNil" (?_afxPchNil@@3PBDB)

  • Allocation question

    Posted by Legacy on 12/17/2002 12:00am

    Originally posted by: Larry Trussell

    I have a question about heap usage with the CString class. The sample application is a debug build and the 64/128/... allocation buffers are not used. The buffer is allocated to the necessary size of the string. (BTW, the problem described below doesn't go away in the release build)

    I have a test application (non-unicode) with a class containing an array of 40 CStrings. I then allocate 10,000 of those classes. When I look at the heap usage before and after this operation, there is a consumption of 4 bytes / string. That is what is expected.

    Next, I set the value of each string to "A". Now my heap usage jumps up to 92 bytes / string! This exact same usage is seen for string values up to a 20 character string. At that point, usage jumps up to 108 bytes / string.

    If I go into the test application and change the array of 40 CStrings to an array of 40 char[32], I see exactly 32 bytes per string in all cases with the test application.

    Can you offer any suggestions as to why CStrings are consuming so much RAM?

    Thanks,
    LT

  • Very good article. Was helpful to me in debugging situation.

    Posted by Legacy on 12/03/2002 12:00am

    Originally posted by: JoeBrennan

    I had a release-build library that yielded unresolved
    extern against my release task. No problem with debug-
    build equivalents.
    Since unresolved extern was _AfxPchNil I was able to
    search on that term, find your article, and infer that
    an uninitialized CString (which occurs only in the
    library) was not being linked successfully to the
    empty string Kahuna.
    Life being short, I simply initialize the string
    explicitly and problem gone. Thanks.

  • Converting CString to int

    Posted by Legacy on 11/02/2001 12:00am

    Originally posted by: Milk

    Is there a way to Convert CStrings to Integers?

  • Passing CString by reference is faster

    Posted by Legacy on 09/05/2001 12:00am

    Originally posted by: Brangdon

    Passing a CString by const reference will be faster, and take less code, than passing it by value.

    This is mainly because the reference-counting code is out-of-line. (It is also fairly complex, with various tests for special cases etc.) Think about it: when you call:
    void func( CString copy );

    the compiler starts by calling the string's copy constructor, which is like:
    CString::CString( const CString &rhs );

    which necessarily passes by reference. So every pass by value has a pass by reference included. So it will be slower.

    Reference counting means the difference isn't much. However, pass-by-const-reference is the norm for large objects. There is no reason to treat CString any differently to the norm. If you think pass-by-value is a worth-while optimisation, you're wrong: it's a pessimisation.

  • How to convert float to a CString?

    Posted by Legacy on 08/07/2001 12:00am

    Originally posted by: Henry Park

    I'm trying to convert a float to a CString. I am using _fcvt(22.2, &decimal, &sign) to do the conversion. I plan to use the "decimal" value to place the "." in the proper string position.

  • CString in a Nutshell

    Posted by Legacy on 07/11/2001 12:00am

    Originally posted by: praveen nimbagiri

    Excellent information about CString!
    Well done douglas..


  • Slightly Confused

    Posted by Legacy on 07/06/2001 12:00am

    Originally posted by: Ben Slavin

    I was looking for information on the use of the CString class and how it works. I thought "CString in a Nutshell" to be a fitting title for a page with such information. How wrong I was.

    This page provides practically no information on how one would start using CString. It doesn't even mention the include calls which need to be made to use it! I was disappointed upon seeing this, and am saddened that I was so misguided. I'd like to see a REAL explanation of CString as opposed to someone trying to simply show the usefulness of it.

    Regards,
    --Ben

  • How to convert CString to char

    Posted by Legacy on 06/15/2001 12:00am

    Originally posted by: RRemzie

    How can i convert a CString to a char?


    Greetz RRemzie
