Function Static Variables in Multi-Threaded Environments

The instantiation process of function static variables isn’t necessarily what you expect it to be. Raymond Chen posted a nice bit on this some time back (http://blogs.msdn.com/oldnewthing/archive/2004/03/08/85901.aspx). If you don’t feel like reading it, I’ll give a quick sum-up below.

Consider the following function:

void foo()
{
   static int x = calcSomething();
}

It seems simple enough, and it is. The static variable will be initialized once, based on the result of the calcSomething function. With a constant initializer, the compiler can simply bake the value into the image’s static data and no runtime work is needed. Here, where the initializer is a function call, and a call to a function the compiler may know nothing about at that, it doesn’t have that luxury. Looking at the generated assembly code, you’ll see something like this:

mov          eax,1
test         byte ptr [$S1],al
jne          foo+1Dh
or           dword ptr [$S1],eax
call         calcSomething
mov          dword ptr [x],eax

Loosely translated to pseudo C++, this will be:

void foo()
{
   static bool x_set = false;
   static int x;
   if(!x_set)
   {
      x_set = true;
      x = calcSomething();
   }
}

As you can see, there’s no interlocking code here. This essentially means that the function will be anything but thread safe. One thread may reach, but not execute, x_set = true, only to be swapped out in favor of another thread that does the same. The result would be that calcSomething is executed two or more times—which is likely to be a bad thing.
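
To make the race concrete, here is a minimal sketch of how it could be provoked. The body of calcSomething, the counter, and the thread harness are all illustrative, not from the original code, and the outcome assumes a compiler that does not generate thread-safe statics (pre-C++11 Visual C++, for example):

#include <windows.h>
#include <stdio.h>

static volatile LONG callCount = 0;

int calcSomething()
{
   InterlockedIncrement(&callCount); // count how many times this runs
   Sleep(10);                        // widen the race window
   return 42;
}

void foo()
{
   static int x = calcSomething();   // not thread safe without thread-safe statics
   (void)x;                          // silence unused-variable warnings
}

DWORD WINAPI worker(LPVOID)
{
   foo();
   return 0;
}

int main()
{
   HANDLE threads[2];
   for(int i = 0; i < 2; ++i)
      threads[i] = CreateThread(NULL, 0, worker, NULL, 0, NULL);
   WaitForMultipleObjects(2, threads, TRUE, INFINITE);

   printf("calcSomething ran %ld time(s)\n", callCount); // may well print 2
   return 0;
}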

That’s it for the recap. Now, if you’d like to fix this problem, what comes to mind? Interlocking, obviously. One very simple way to do this interlocking is to use the InterlockedCompareExchange function, which is guaranteed to execute as a single atomic operation:

void foo()
{
   volatile static long initialized = 0;
   static int x;

   if(InterlockedCompareExchange(&initialized, 1, 0) == 0)
   {
      x = calcSomething();
   }

   // ... Do whatever with x
}

The interlocked intrinsic takes three parameters: destination, exchange, and comparand. If the value pointed to by destination matches comparand, exchange is stored in destination; otherwise, nothing happens. Either way, the original value of destination is returned, which assures that no two threads or calls can both get a return value of 0. That, of course, means that as long as initialized is never reset to 0, calcSomething will be called only once. And that’s what you wanted, right?
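
In other words, its behavior can be pictured with the following pseudo-C. Note that this is only a sketch; the real function performs the whole body as one indivisible operation:

// Pseudo-C sketch of what InterlockedCompareExchange does; the real function
// performs all of this atomically, in one indivisible step.
LONG PseudoInterlockedCompareExchange(volatile LONG* destination,
                                      LONG exchange,
                                      LONG comparand)
{
   LONG original = *destination;
   if(original == comparand)
      *destination = exchange;   // store only when the current value matches
   return original;              // caller always gets the pre-call value
}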

Here’s the disassembly:

push         offset initialized
call         dword ptr [__imp__InterlockedCompareExchange@12]
cmp          eax,0
jne          foo+1Ah
call         calcSomething
mov          dword ptr [x],eax

First, the return value of InterlockedCompareExchange is compared to 0. If it’s equal, which means that this was the first run, calcSomething will be called, and x will be initialized to its return value.

A quick digression: Why does InterlockedCompareExchange work the way it does? It’s all very simple. Consider the following disassembly:

mov          ecx,dword ptr [esp+4]
mov          edx,dword ptr [esp+8]
mov          eax,dword ptr [esp+0Ch]
lock cmpxchg dword ptr [ecx],edx
ret          0Ch

The interesting part here is the lock cmpxchg. The cmpxchg instruction replaces the content of the destination (whatever ecx points to) with the source (edx), if and only if eax is equal to the destination value. No matter what happens, the initial value of the destination is returned to the caller through eax.

When this function is run in the example above, ecx will point to the static variable called initialized. In the first run, initialized will be 0. As cmpxchg executes, ecx points to initialized’s 0, edx holds the exchange value of 1, and eax holds the comparand, 0. Because the destination value and eax match, the requirement for cmpxchg to act is met, and the 1 in edx is placed in initialized. The eax register will hold the original destination value of 0, and the function returns. If you look back at the previous disassembly, namely cmp eax,0, you’ll see that this evaluates to true for the first run, and so calcSomething will be called. For the second InterlockedCompareExchange, ecx will point to a value of 1. Since that no longer matches the comparand 0 in eax, the exchange will not take place, and the original value of 1 will be returned.
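
Here is the same sequence expressed as two plain calls, with the return values described above spelled out (the variable names are only for illustration):

volatile LONG initialized = 0;

// First call: initialized is 0, which matches the comparand, so it becomes 1
// and the original value 0 is returned.
LONG first = InterlockedCompareExchange(&initialized, 1, 0);   // first == 0

// Second call: initialized is now 1, the comparand 0 no longer matches,
// nothing is stored, and the current value 1 is returned.
LONG second = InterlockedCompareExchange(&initialized, 1, 0);  // second == 1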

In a single-processor or single-core environment, the cmpxchg alone would be enough to do a “synchronized” compare-and-exchange: a single instruction performs the comparison and the exchange, and captures the original value at the same time. If one thread manages to execute the cmpxchg, then gets swapped out, and another thread also executes the cmpxchg, the value ecx points to will only be set once; the first thread will hold a value of 0 in its eax, and the second thread will have a value of 1. Under no circumstances can the two threads end up with the same value in eax, which is exactly what you need to make sure that calcSomething is called only once.

In a multi-processor environment, you also need the lock prefix on the cmpxchg above, which makes the memory access atomic across processors as well. Without the lock, two processors, or cores, may very well execute the cmpxchg at the exact same time, which could lead to two threads ending up with the same eax. With the lock in place, the processors or cores may not access the memory at the same time: only one of them is allowed to perform the cmpxchg instruction on the given piece of destination memory (such as the initialized variable) at any one time. Thus, you are assured that eax will have the expected value for each call to InterlockedCompareExchange. And again, that’s what you want.

As commenter DaMagic’s keen eye noticed, what’s been discussed thus far isn’t the whole truth. Yes, you’re guaranteed that calcSomething is only called once, as was the initial goal. What isn’t covered is the fact that, in the example solution, a thread may get past the if block while x is still being calculated by another thread. In that case, x may be used before it has been fully initialized.

Given this additional requirement, you need to expand the locking so that threads wait for an indication that the calculation has completed. While we’re at it, the one-off calculation is replaced with the initialization of a critical section, which gives further flexibility on successive runs.

void foo()
{
   volatile static long initialized = 0;
   static CRITICAL_SECTION cs;
   static int x;

   if(initialized != 2)
   {
      if(InterlockedCompareExchange(&initialized, 1, 0) == 0)
      {
         // This thread won the race: set up the critical section,
         // then mark the initialization as complete (1 -> 2)
         InitializeCriticalSection(&cs);
         InterlockedCompareExchange(&initialized, 2, 1);
      }
      else
      {
         // Another thread is doing the initialization; yield until it's done
         while(initialized != 2)
         {
            Sleep(5);
            SwitchToThread();
         }
      }
   }

   EnterCriticalSection(&cs);
   // ... Initialize x on the first pass, and do other synchronized operations
   LeaveCriticalSection(&cs);

   // ... Do unsynchronized operations, if wanted.
   // At this point x is guaranteed to be initialized.
}

This time around, there’s an outer if block to minimize the impact on runs that occur after the initialization has been completed. The overhead of post-init runs will be a mere three instructions, which I’m sure you can live with. If initialized happens to be something other than 2, you know that the initialization is either in progress or hasn’t begun at all. As in the previous example, InterlockedCompareExchange is used to assure that only one thread steps into the block that does the initial processing on the static variable (in this case, the critical section object “cs”). Unlike the previous example, the threads that reach the inner initialization check but aren’t the one doing the actual initialization are now dealt with, and in a very simple way.

As long as initialized isn’t 2, waiting threads will briefly sleep, and then pass control on to other threads in the system (by using SwitchToThread). As soon as the initialization is completed, the loop will end, and you’re free to do as you please with the critical section for the remainder of the function body; for example, to deal with your old x and calcSomething.
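
For instance, one way to fold the original x/calcSomething initialization into the synchronized region could look like the sketch below. The x_set flag is a hypothetical helper, not part of the code above; it has a constant initializer, so it carries none of the problems discussed earlier, and it is only ever touched while the critical section is held:

void foo()
{
   volatile static long initialized = 0;
   static CRITICAL_SECTION cs;
   static int x;
   static bool x_set = false;   // hypothetical once-flag, constant-initialized

   // ... same bootstrap of "initialized" and "cs" as above ...

   EnterCriticalSection(&cs);
   if(!x_set)                   // safe: we hold the critical section
   {
      x = calcSomething();
      x_set = true;
   }
   LeaveCriticalSection(&cs);

   // ... use x freely here ...
}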

As you may observe, the full-blown version of the synchronized initialization is somewhat bloated. It’s by no means convenient to go through this procedure for every function that contains a non-constant static variable initialization (such as the call to calcSomething). In a real-world case, it would be a whole lot better to stick with externally initialized critical sections or mutexes, and perhaps leave the statics out completely; a rough sketch of that approach follows below. The point of this text, however, was to show a couple of the potential troubles you face should you choose to use local static variables anyway. Consider yourself warned.
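
For reference, here is a rough sketch of that externally initialized alternative, assuming the critical section can be set up before any worker threads exist. The global name and the trivial calcSomething are, of course, only illustrative:

#include <windows.h>

int calcSomething() { return 42; }    // stand-in for the real calculation

CRITICAL_SECTION g_cs;                // illustrative global name

void foo()
{
   static int x;
   static bool x_set = false;

   EnterCriticalSection(&g_cs);
   if(!x_set)
   {
      x = calcSomething();
      x_set = true;
   }
   LeaveCriticalSection(&g_cs);

   // ... use x ...
}

int main()
{
   InitializeCriticalSection(&g_cs);  // no other threads are running yet
   // ... start worker threads that call foo() ...
   DeleteCriticalSection(&g_cs);
   return 0;
}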
