Function Static Variables in Multi-Threaded Environments

The instantiation process of function static variables isn't necessarily what you expect it to be. Raymond Chen posted a nice bit on this some time back (http://blogs.msdn.com/oldnewthing/archive/2004/03/08/85901.aspx). If you don't feel like reading it, I'll give a quick sum-up after the jump.

Consider the following function:

void foo()
{
   static int x = calcSomething();
}

It seems simple enough, and it is. The static variable will be initialized once, with the result of the calcSomething function. When the initializer is a compile-time constant, the compiler can place the value directly in the program image, and no run-time initialization is needed. Here the initializer is a function call, and a call to a function the compiler knows nothing about at that, so it doesn't have that luxury. Looking at the generated assembly code, you'll see something like this:

mov          eax,1
test         byte ptr [$S1],al
jne          foo+1Dh
or           dword ptr [$S1],eax
call         calcSomething
mov          dword ptr [x],eax

Loosely translated to pseudo C++, this will be:

void foo()
{
   static bool x_set = false;
   static int x;
   if(!x_set)
   {
      x_set = true;
      x = calcSomething();
   }
}

As you can see, there's no interlocking code here. This essentially means that the function will be anything but thread safe. One thread may pass the if(!x_set) test and reach, but not yet execute, x_set = true, only to be swapped out in favor of another thread that does the same. The result would be that calcSomething is executed two or more times, which is likely to be a bad thing.
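The interleaving described above can be replayed deterministically in a single thread. The following is only a sketch: FooStatics, FooCall, calc_calls, and the replay functions are hypothetical names introduced for illustration, and calcSomething is a stand-in that returns a fixed value and counts its calls.

```cpp
#include <cassert>

// Hypothetical stand-in for the article's calcSomething, with a call counter.
static int calc_calls = 0;
int calcSomething() { ++calc_calls; return 42; }

// The hidden flag and the static variable that every caller of foo() shares.
struct FooStatics {
    bool x_set = false;
    int x = 0;
};

// One caller of foo(), split at the point where a context switch can
// occur: after the flag has been tested, before it has been set.
struct FooCall {
    bool saw_unset = false;

    void test_flag(const FooStatics& s) { saw_unset = !s.x_set; }

    void finish(FooStatics& s) {
        if (saw_unset) {
            s.x_set = true;
            s.x = calcSomething();
        }
    }
};

// Caller A tests the flag, is "swapped out", and caller B tests it too.
int replay_bad_interleaving() {
    calc_calls = 0;
    FooStatics statics;
    FooCall a, b;
    a.test_flag(statics);
    b.test_flag(statics);   // still sees the flag unset
    a.finish(statics);
    b.finish(statics);
    assert(statics.x_set);
    return calc_calls;      // number of times calcSomething ran
}

// Caller A runs to completion before caller B starts.
int replay_sequential() {
    calc_calls = 0;
    FooStatics statics;
    FooCall a, b;
    a.test_flag(statics);
    a.finish(statics);
    b.test_flag(statics);
    b.finish(statics);
    assert(statics.x_set);
    return calc_calls;
}
```

In this model, replay_bad_interleaving() returns 2 and replay_sequential() returns 1: the only difference between the two is where the second caller's flag test falls.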

That's it for the recap. Now, if you'd like to fix this problem, what comes to mind? Interlocking, obviously. One very simple way to do this interlocking is to use the InterlockedCompareExchange function, which is guaranteed to be an atomic operation, like this:

void foo()
{
   volatile static long initialized = 0;
   static int x;

   if(InterlockedCompareExchange(&initialized, 1, 0) == 0)
   {
      x = calcSomething();
   }

   // ... Do whatever with x
}

The interlocked function takes three parameters: destination, exchange, and comperand. If the value pointed to by destination matches comperand, exchange is stored in destination; otherwise, nothing happens. Either way, the original value of destination is returned. This ensures that no two threads or calls will both get a return value of 0. That, of course, means that as long as initialized is never reset to 0, calcSomething will be called only once. And that's what you wanted, right?

Here's the disassembly:

push         offset initialized
call         dword ptr [__imp__InterlockedCompareExchange@12]
cmp          eax,0
jne          foo+1Ah
call         calcSomething
mov          dword ptr [x],eax

First, the return value of InterlockedCompareExchange is compared to 0. If it's equal, which means that this was the first run, calcSomething will be called, and x will be initialized to its return value.
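For readers without the Win32 headers at hand, the same guard can be sketched in standard C++11. This is an assumption-laden stand-in, not the article's code: std::atomic<long>::compare_exchange_strong plays the role of InterlockedCompareExchange, calc_calls is a hypothetical counter, and foo returns a bool only so the outcome can be observed.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical stand-in for calcSomething, with a call counter.
static int calc_calls = 0;
int calcSomething() { ++calc_calls; return 42; }

static std::atomic<long> initialized{0};
static int x = 0;

// Returns true if this call was the one that ran the initializer.
bool foo() {
    long expected = 0;  // plays the role of the comperand
    // Stores 1 only if the current value equals expected (0). On failure,
    // the current value is written back into expected instead.
    if (initialized.compare_exchange_strong(expected, 1)) {
        x = calcSomething();
        return true;
    }
    assert(expected == 1);  // some earlier call already claimed the flag
    return false;
}
```

Only the first call to foo() returns true, no matter how many threads call it, so calcSomething runs once. Like the version above, this sketch does nothing to make other callers wait until x has actually been assigned.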

A quick digression: Why does InterlockedCompareExchange work the way it does? It's all very simple. Consider the following disassembly:

mov          ecx,dword ptr [esp+4]  
mov          edx,dword ptr [esp+8]  
mov          eax,dword ptr [esp+0Ch]
lock cmpxchg dword ptr [ecx],edx   
ret          0Ch                    

The interesting part here is the lock cmpxchg instruction. The cmpxchg operation replaces the content of the destination (whatever ecx points to) with that of the source (edx), if and only if eax is equal to the destination value. No matter what happens, the initial value of the destination is returned to the caller through eax.

When this function is run in the example above, ecx will point to the static variable called initialized. In the first run, initialized will be 0. As cmpxchg is being executed, ecx points to initialized's 0, edx holds the exchange value of 1, and eax holds the comperand, 0. Because the destination value and eax match, the requirement for cmpxchg to act is met, and the 1 in edx will be placed in initialized. The eax register will hold the original destination value of 0, and the function returns. If you look back at the previous disassembly, namely cmp eax,0, you will see that eax equals 0 for the first run, so the jne is not taken and calcSomething will be called. For the second call to InterlockedCompareExchange, ecx will point to an initialized that now holds 1. This does not match the 0 in eax, so the exchange will not take place, and the current value of 1 will be returned.
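The two calls walked through above can be replayed with a plain C++ model of what cmpxchg computes. The model is deliberately not atomic; it only mirrors the compare, the conditional store, and the returned value. The names cmpxchg_model and replay_two_calls are hypothetical and exist only for this sketch.

```cpp
#include <cassert>

// Non-atomic model of the value flow in cmpxchg: store exchange into
// *destination only if it currently equals comperand, and always hand
// back the value that was there before.
long cmpxchg_model(long* destination, long exchange, long comperand) {
    long original = *destination;
    if (original == comperand) {
        *destination = exchange;
    }
    return original;  // what the caller finds in eax
}

// Replays the first and second call from the text and reports whether
// they behave as described.
bool replay_two_calls() {
    long initialized = 0;
    long first = cmpxchg_model(&initialized, 1, 0);   // match: 1 is stored
    long second = cmpxchg_model(&initialized, 1, 0);  // no match: no store
    assert(initialized == 1);
    return first == 0 && second == 1;
}
```

Here replay_two_calls() returns true: the first call gets 0 back and wins the right to initialize, and every later call gets 1.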

In a single-processor, single-core environment, the cmpxchg alone would be enough to do a "synchronized" compare-and-exchange. Only one instruction is needed to perform the comparison and the exchange, as well as to take a snapshot of the original value, and a thread cannot be swapped out in the middle of an instruction. If one thread manages to execute the cmpxchg and is then swapped out, and another thread also executes the cmpxchg, the value ecx points to will be set only once: the first thread will hold a value of 0 in its eax, and the second thread will have a value of 1. Under no circumstances can two threads both end up with 0 in eax, and that's what ensures that calcSomething is called only once.

In a multi-processor environment, you need the lock prefix on the cmpxchg above. This prefix makes the memory access synchronized as well. Without the lock, two processors, or cores, may very well execute the cmpxchg at the exact same time, which could lead to two threads both seeing 0 in eax. With the lock in place, two or more processors or cores cannot access the memory at the same time. Only one thread will be allowed to perform the cmpxchg instruction on a given piece of destination memory (such as the initialized variable) at any one time. Thus, you are assured that eax will have the expected value for each call to InterlockedCompareExchange. And again, that's what you want.

As commenter DaMagic's keen eye noticed, what's been discussed thus far isn't the whole truth. Yes, you're guaranteed that calcSomething is only called once, as was the initial goal. What isn't covered is that in the example solution, a thread may skip the if-block while x is still being calculated by another thread. In that case, x may be used before it has been initialized.

Given this additional requirement, you need to expand the locking to wait for an indication that the calculation is completed. While we're at it, the calculation will be replaced by a critical section, which gives further flexibility in successive runs.

void foo()
{
   volatile static long initialized = 0;
   static CRITICAL_SECTION cs;
   static int x;

   if(initialized != 2)
   {
      if(InterlockedCompareExchange(&initialized, 1, 0) == 0)
      {
         InitializeCriticalSection(&cs);
         InterlockedCompareExchange(&initialized, 2, 1);
      }
      else
      {
         while(initialized != 2)
         {
            Sleep(5);
            SwitchToThread();
         }
      }
   }

   EnterCriticalSection(&cs);
   // ... Do synchronized operations
   LeaveCriticalSection(&cs);

   // ... Do unsynchronized operations, if wanted.
   // At this point cs is guaranteed to be initialized. x is initialized
   // only if the synchronized block above has assigned it.
}

This time around, there's an outer if block, to minimize the impact on runs that occur after the initialization has been completed. The overhead of post-init runs will be a mere three instructions, which I'm sure you can live with. If initialized happens to be something other than 2, you know that the initialization is either in progress or hasn't begun at all. As in the previous example, InterlockedCompareExchange is used to ensure that only one thread steps into the block that does the initial processing on the static variable (in this case the critical section object "cs"). Unlike the previous example, this one also deals with the threads that reach the inner initialization check but aren't needed to do the actual initialization, and it does so in a very simple way.

As long as initialized isn't 2, waiting threads will briefly sleep, and then pass control on to other threads in the system (by using SwitchToThread). As soon as the initialization is completed, the loop will end, and you're free to do as you please with the critical section for the remainder of the function body; for example, to deal with your old x and calcSomething.
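The same three-state scheme (0 = untouched, 1 = in progress, 2 = done) can also guard x directly, without a critical section. The sketch below is a portable approximation under several assumptions: std::atomic replaces the Interlocked functions, std::this_thread::yield replaces Sleep and SwitchToThread, and calc_calls is a hypothetical counter. Both statics are initialized with constants, so they need no run-time guard of their own.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical stand-in for calcSomething, with a call counter.
static int calc_calls = 0;
int calcSomething() { ++calc_calls; return 42; }

int foo() {
    // 0 = untouched, 1 = initialization in progress, 2 = done.
    static std::atomic<long> initialized{0};
    static int x = 0;

    if (initialized.load(std::memory_order_acquire) != 2) {
        long expected = 0;
        if (initialized.compare_exchange_strong(expected, 1)) {
            // This caller won the race and does the one-time work.
            x = calcSomething();
            initialized.store(2, std::memory_order_release);
        } else {
            // Another caller is initializing: wait until it reports done.
            while (initialized.load(std::memory_order_acquire) != 2) {
                std::this_thread::yield();
            }
        }
    }

    assert(initialized.load() == 2);
    return x;  // x has been assigned by the time any caller gets here
}
```

The release store of 2 paired with the acquire loads is what lets waiting callers read x safely once they leave the loop. After the first call, every later call pays only for the outer load and comparison.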

As you may observe, the full-blown version of the synchronized initialization is somewhat bloated. It's by no means convenient to go through this procedure for each function that contains non-constant static variable initialization (such as calcSomething). In a real-world case, it would be a whole lot better to stick with externally initialized critical sections or mutexes, and perhaps leave the statics out completely. The point of this text, however, was to show a couple of the potential troubles if you actually should choose to use local static variables. Consider yourself warned.
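As a rough illustration of the "externally initialized mutex" alternative, here is a minimal sketch in standard C++11 rather than Win32. It assumes a std::mutex at namespace scope in place of a CRITICAL_SECTION; x_mutex, x_ready, and calc_calls are hypothetical names, and calcSomething is again a counting stand-in.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical stand-in for calcSomething, with a call counter.
static int calc_calls = 0;
int calcSomething() { ++calc_calls; return 42; }

namespace {
// Namespace-scope objects: these exist before main() starts any threads,
// so foo() never has to initialize its own lock.
std::mutex x_mutex;
bool x_ready = false;
int x = 0;
}

int foo() {
    std::lock_guard<std::mutex> lock(x_mutex);
    if (!x_ready) {
        x = calcSomething();  // runs once, under the lock
        x_ready = true;
    }
    assert(x_ready);
    return x;
}
```

Every call takes the lock, which costs more than the three-instruction fast path, but the code is short and there is no window in which a caller can see x before it has been assigned.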



About the Author

Einar Otto Stangvik

My name is Einar Otto Stangvik, and I'm a programmer based in Oslo, Norway. I mainly develop applications and software architectures targeting C++ on the Windows platform, but I also have experience doing the same on Unix and Linux. Lately, I've looked to C# for some projects, but native C++ is still my main focus. See my site, http://www.indev.no, for more information. My code blog can be found at http://einaros.blogspot.com.

Comments

  • Help to check my singleton mechanism

    Posted by coldstar on 05/21/2011 09:31pm

    Hi,
    I use local static variable to implement my singleton pattern. But I want it to have a thread-safe constructor. Following is my implementation. Could you help me to check if there is any problem?
    
    /*! Singleton is a class that initializes its member in constructor.
     */
    class Singleton
    {
        private:
            int     m_a;
            int     m_b;
            
        public:
            Singleton() : m_a(5), m_b(6) {}
            void doSomething() {...}
    };
    
    /*! Returns local static object
     */
    Singleton & GetSingleton()
    {
        static Singleton    s;
        return s;
    }
    
    Singleton & OneTimeInitialize()
    {
        volatile static long    initialized = 0;
        
        if (initialized != 2)
        {
            if(InterlockedCompareExchange(&initialized, 1, 0) == 0)
            {
                // I call GetSingleton() first time here for initialization.
                GetSingleton();
                /* I didn't call InterlockedCompareExchange(),
                   because I don't know why we need to do comparing here.
                */
                InterlockedExchange(&initialized, 2);
            }
            else
            {
                while (initialized != 2)
                {
                    SwitchToThread();
                }
            }
        }
        
        // Finally, we can safely call GetSingleton() again and return the initialized singleton object.
        return GetSingleton();
    }
    
    void main()
    {
        // call OneTimeInitialize() to get my singleton object.
        OneTimeInitialize().doSomething();
    }

    Reply
  • Externally initialized mutexes

    Posted by SyRenity on 02/13/2007 07:26am

    Hi. In the end of the article you mention that mutex'es should be initialized externally. Is there a particular reason that a static class member mutex can't be used? Regards, Stas.

    • Re: Externally initialized mutexes

      Posted by einaros on 02/13/2007 07:43am

      Poor choice of words on my part, I fear. The point I was trying to make was that function static mutexes should be avoided entirely. Class static mutex objects are perfectly safe, granted that the class, and indirectly the mutex, isn't referenced by the constructor of another static class (in which case you'd be at the mercy of the static initialization order).

      Reply
    Reply
  • A lighter weight alternative

    Posted by hyperbaric on 12/16/2006 06:18pm

    Each thread will have to wait at most once for the static to be initialised - except the first thread that gets to do the work.
    
    So why not extend the use of initialised and combine it with a spin lock?
    
    void foo()
    {
        volatile static long initialised = 0;
        static int x;
        while( 2 != ::InterlockedCompareExchange( &initialised ,2 ,2))
        {
            if( 0 == ::InterlockedCompareExchangeAcquire( &initialised, 1, 0))
            {
                x = calcSomething();
                ::InterlockedCompareExchangeRelease( &initialised, 2, 1);
            }
            else
            {
                Sleep(0);     //Spin - here
            }
        }
    }

    Reply
  • But x is still uninitialized for thread 2.

    Posted by DaMagic on 11/30/2006 12:57pm

    While the 1st thread is within the function "calcSomeThing()" which computes very slowly, the 2nd thread increments "initialized" to 2 and skips the if-block, sets "initialized" back to 1 after and continues within "foo()" with an uninitialized value of "x". So I think it's not the problem that "calcSomething()" is called multiple times. More problem is that one of the two threads works with an uninitialized value x.

    • RE: But x is still uninitialized for thread 2.

      Posted by einaros on 12/05/2006 08:18am

      The update was posted on November 30th, but there's still nothing here. Hopefully something will happen soon.

      Reply
    • RE: But x is still uninitialized for thread 2.

      Posted by einaros on 11/30/2006 03:21pm

      Yes, you're absolutely right. I wasn't going to take the article to full interlocking, as that really should be done by use of critical sections, mutexes or similar. Now that you've mentioned it, however, I've submitted an update with an added implementation. Feel free to check back for that :) Thanks for taking the time to comment!

      Reply
    Reply