Win32 Thread Synchronization, Part I: Overview

Introduction

In a multithreaded Win32 environment, it is necessary to synchronize the activities of threads that access common data to prevent memory corruption. Part 1 of this article gives a general explanation of processes and threads and describes a couple of thread synchronization techniques. Part 2 introduces the thread synchronization helper classes, explains their implementation, and includes sample projects.

Multithreading Primer

To explain why synchronization is necessary, you must first take a moment to understand processes, threads, and thread scheduling. This is a basic overview, so more advanced readers are advised to skip over to the SlowCopy Example section or to jump ahead to Part 2.

What is a process?

A process is a 4 GB virtual address space that contains an instance of an executable program or application. The process itself doesn't perform any work and can be thought of as a container created by the system to hold the program EXE, any required DLLs, and program memory. The system tracks a process via a process handle, and when the program exits, the system frees up any resources associated with the process. If the process itself doesn't perform any work, how does any work get performed? This is where threads come in. Each process must have at least one thread of execution (or 'thread'). Fortunately, the system is nice enough to create this for you when the program executes, so you don't need to do any extra work. When the system creates the process space, it also creates a single, primary thread for the program. In fact, it is this primary thread that the main or WinMain function executes in.

What is a thread?

A thread is a path of execution—this is where the actual work is done. If you create a simple console app, and step through the main function in a debugger, you are stepping through the primary thread of the application. Threads are cool; they can be stopped, paused, started, and new threads can be created.

Thread scheduling

The system is very fair about which thread gets executed and will execute a thread in a round robin fashion. For example, say there are five applications running on the system, each with one thread. The system will execute the thread in app1 for a bit, then move on to app2 and execute its thread for a bit, and so on. The little bit that I refer to is called a time slice. As a side note, I'd like to mention that on multiprocessor machines, the round robin method is still used except the system doles out the workload to more than one processor. In other words, each thread still gets a time slice, but more than one thread can be executed simultaneously on different processors.

The algorithm Windows uses to determine thread scheduling is based on many factors, the details of which aren't really important to writing correct multithreaded applications. What is important is that Windows remains in control of how long each thread gets executed. The application itself doesn't have this control and this is a good thing; otherwise, a 'piggy app' will hog the processor time on a system.

Windows 3.1 used to behave this way. It relied on properly behaving apps to execute code in small chunks and then give control back to the operating system. Unfortunately, applications didn't always behave properly, so poorly written apps would tie up the system.

Thread priorities

Thread scheduling is slightly complicated by the fact that threads can have different priorities. Without going into too much detail, the system will always execute threads with the highest priority before moving on to lower-priority threads. A complete explanation of thread priorities is beyond the scope of this article, so for now I am going to assume that all your threads have equal priority. By default, a thread is created with normal priority, and this will be sufficient for our needs; in fact, most threads are created with normal priority.

Pre-emptive multitasking

On the NT-based operating systems such as Win2000 and XP, the system uses something called pre-emptive multitasking that keeps the system in control of the thread scheduling. With pre-emptive multitasking, the system uses the round robin scheme to give each thread its appropriate time slice. When a thread has used up its allocated time slice, the system puts that thread to sleep and moves on to giving the next thread a time slice. With pre-emptive multitasking, it is much more difficult for an errant application to take over the system (although not impossible).

Why care about thread scheduling?

As I've said earlier, threads exist to perform work, and this work often includes reading and writing data to memory. Thread scheduling and synchronization become important when more than one thread needs to access that memory (usually referred to as 'shared memory'). By the way, I keep referring to shared memory, but really I'm referring to shared resources, whether it's a file, memory, or some other resource.

Because of the pre-emptive nature of the OS, you are not guaranteed that a section of memory written by one thread has completed fully before the OS has interrupted the thread to run another thread. The problem occurs when this second thread needs to read the memory because the first thread may not have completed the write operation.

Article Source Code (Parts I & II)

The complete source code for Parts 1 and 2 is included in the ThreadSync.Zip file located in the link at the bottom of this article. The ThreadSync.sln consists of the following projects:

  • SlowCopy: This console example illustrates sharing memory between threads with 1) no synchronization (Part I); 2) native synchronization using a critical section (Part I); and 3) synchronization using the helper classes (Part II)
  • LogSend/LogRcv (Part I): These are two applications that illustrate using helper classes to protect an std::queue shared between threads and to use a mutex to protect resources shared between multiple processes.
  • OnlyOne (Part II): This is an MFC application that uses a mutex to limit itself to a single instance. In addition, this project uses a memory-mapped file to share the hWnd. The second instance uses this hWnd to bring the first instance into the foreground before exiting.

SlowCopy Example

To illustrate what can go wrong with sharing data between threads, you are going to need an example that creates a couple of threads and performs some operation with shared data that will exhibit memory corruption.

Enter the SlowCopy project in the ThreadSync solution. In the SlowCopy project, you create two threads, T1 and T2, that share a string. T2's job is to sit in a loop and display the string, whereas the job of T1 is to copy data into the string (while T2 is displaying the data).

SlowCopy Structure and Classes

The SlowCopy project is actually multiple projects in one. Rather than having separate projects—one that copies a string without synchronization, one that copies with native critical section synchronization, and one that copies using synchronization via the helper classes—I've created one project that uses three different classes derived from a common base class. To view the project without synchronization or with one of the synchronization methods, uncomment the appropriate class in the program main.

Table 1: SlowCopy Classes
Class Type Description
CSlowCopy Base Base class that handles creation of the secondary display thread. Also declares two virtual functions used to perform the string copy and display the string.
CSlowCopyNoSync Derived from CSlowCopy Virtual methods perform string copy and display of the shared string without any synchronization.
CSlowCopyNativeCS Derived from CSlowCopy Virtual methods perform string copy and display of the shared string using synchronization via the native Win32 critical section.
CSlowCopyAutoLockCS Derived from CSlowCopy Virtual methods perform string copy and display of the shared string using synchronization via the helper classes. This portion of the example is covered in Part 2 of the article.

Why use a base class in SlowCopy?

One may ask why use C++ inheritance and polymorphism in such a simple example? At first, this might seem as though it adds unnecessary complexity. However, because our goal is to take the reader through the synchronization levels from non-synchronized data sharing to synchronization using native Win32 to finally synchronization using the helper classes, it makes sense to pull all the thread creation and other common data and methods into a base class. I feel this is ultimately clearer to the reader because the reader only has to look at changes to the two virtual methods in each of the derived classes to understand the code changes of each synchronization level.


Running SlowCopy Without Synchronization

You will first look at running the SlowCopy example without any synchronization to demonstrate what occurs when unprotected shared memory is accessed from multiple threads.

Compiling the source

To compile the example, first unzip the accompanying zipped source file and open the ThreadSync.sln solution in VC .NET 2003. The SlowCopy project should be the 'Active' project; this can be verified by checking that the project is highlighted in bold in the Solution Explorer. If another project is displayed in bold, simply set the SlowCopy project active by right-clicking the 'SlowCopy' node and choosing the 'Set As Startup Project' item in the popup menu. Next, verify that you will be compiling the non-synchronization portion of the project: open the SlowCopy.cpp file, scroll to the bottom, and verify that the CSlowCopyNoSync class in the _tmain function is uncommented. Note: the lines containing the CSlowCopyNativeCS and CSlowCopyAutoLockCS classes should remain commented. The function should look similar to:

int _tmain(int argc, _TCHAR* argv[])
{
   {
      CSlowCopyNoSync sc;
      // CSlowCopyNativeCS sc;
      // CSlowCopyAutoLockCS sc;

      sc.PrintHeader();
      sc.CreateDisplayThread();
      sc.PerformCopy();
      sc.PrintFooter();
   }
   return 0;
}

Program description

When compiled using the CSlowCopyNoSync class, the program shares string data between two threads without any synchronization. This class creates a secondary thread (T2) that loops to display string data. The primary thread (T1) performs a string copy on the shared string while T2 is displaying the string. In this example, you expect to see memory corruption in the output.

Source code listing

The two interesting functions of the SlowCopy example are the SlowStrCpy function that executes in the primary thread and the DisplayThread function that executes in the secondary thread.

// DisplayThread - runs in secondary thread
UINT WINAPI DisplayThread()
{
   int nLineNumber = 1;

   while( TRUE )
   {
      std::cout
         << setiosflags( std::ios::right )
         << std::setw(3)
         << nLineNumber++
         << _T(") ")
         << m_szDest
         << std::endl;

      // Check if the shutdown event has been set; if so, exit
      // the thread
      if( WAIT_OBJECT_0 == WaitForSingleObject( m_hShutdownEvent, 0 ) )
      {
         return 0;
      }
   }

   // In this example, we never reach here
   return 1;
}

// SlowStrCpy - runs in primary thread
LPTSTR SlowStrCpy( LPTSTR szDest, LPCTSTR szSource )
{
   LPTSTR szStart = szDest;
   while( *szSource != _T('\0') )
   {
      *szDest++ = *szSource++;

      // To illustrate what happens when a thread gets pre-empted
      // before its work has completed, we are going to copy a char
      // and then give up our remaining time slice by calling Sleep(0).
      // This will cause the secondary thread to display the
      // [partially copied] string.
      Sleep( 0 );
   }

   *szDest = _T('\0');

   return szStart;
}

Program output

Here is the output from the SlowCopy program executing the SlowCopyNoSync class:

-------------------------------------------------------------------
-------------------------------------------------------------------
SlowCopy Example - Illustrates using non-synchronized shared memory
between two threads.

Source: The source string that will replace the dest string.
Dest: Original destination string that will be overwritten.

Secondary Thread Started
1) Original destination string that will be overwritten.
2) The snal destination string that will be overwritten.
3) The sourcdestination string that will be overwritten.
4) The source stination string that will be overwritten.
5) The source strintion string that will be overwritten.
6) The source string tn string that will be overwritten.
7) The source string that ring that will be overwritten.
8) The source string that will that will be overwritten.
9) The source string that will reat will be overwritten.
10) The source string that will replacill be overwritten.
11) The source string that will replace t be overwritten.
12) The source string that will replace the doverwritten.
13) The source string that will replace the dest written.
14) The source string that will replace the dest strtten.
15) The source string that will replace the dest string..
16) The source string that will replace the dest string.
Secondary Thread Exited

Destination string after SlowCopyNoSync:
The source string that will replace the dest string.

End of program!
-------------------------------------------------------------------
-------------------------------------------------------------------

Program interpretation

As you can see from the output, the destination string gets slowly overwritten. Keep in mind that this output is produced by the secondary thread (T2), and the primary thread (T1) is the thread doing the copying. Because both threads run at the same time, they both access the data at the same time. It works something like this: T1 partially copies the string and then gets pre-empted by the OS (remember, the OS pre-empts a thread by putting it to sleep). T2 runs and displays the whole string (the std::cout statement always displays the entire buffer contents). T1 runs again and copies more of the string before it gets pre-empted to let T2 run, and so on.

It may look cool in the output, but rarely would you ever want this to occur in an actual program; this type of behavior is commonly called memory corruption. Usually, you would want to synchronize the string access such that it is accessed either for copying or for displaying; one or the other, but not both simultaneously.

When properly protected, only one thread will be able to access the shared data at any one time. This means that the display thread (T2) will be blocked from accessing the string until the primary thread (T1) has finished the copy operation. When the shared memory has been synchronized properly, you will expect the output to be similar to:

Secondary Thread Started
1) Original destination string that will be overwritten.
2) The source string that will replace the dest string.
Secondary Thread Exited


How to Ensure Data Consistency with Data Shared Between Threads

The thread synchronization method used in Windows is very simple: Any thread that accesses shared data needs to tell the OS it is using the data via one of the synchronization primitives. The OS grants this usage on a first-come, first-served basis. When the OS grants access to a thread, the thread is said to have acquired a lock. So, the first thread to use the data gets access, and any subsequent thread that attempts to use the data will have to wait until the first thread has finished with it. When appropriate, the thread tells the OS it is finished with the data or has unlocked the data.

Table 2: Generic Thread Sequence
Sequence # Thread 1 Thread 2
0001 Request data lock (OS grants lock) ...
0002 Thread performs data operations Request data Lock (OS denies lock)
0003 Thread performs data operations Thread sleeps
0004 Thread performs data operations Thread sleeps
0005 Thread releases lock OS grants lock; OS wakes thread
0006 ... Thread performs data operations
0007 Request data lock (OS denies lock) Thread performs data operations
0008 Thread sleeps Thread releases lock
0009 OS grants lock; OS wakes thread ...
0010 Thread performs data operations ...
0011 Thread performs data operations ...
0012 Thread releases lock ...

Synchronization methods

Windows offers several synchronization primitives, sometimes called waitable objects. These are processes, threads, files, console input, file change notifications, mutexes, semaphores, events, waitable timers, and critical sections. With the exception of the user mode critical section, the others are all kernel objects and each can be used to synchronize threads.

However, for the purpose of this article, you are going to focus on critical sections, mutexes, and events.

Critical sections

Critical sections are user mode synchronization objects provided by the system. Because they operate in user mode, they are fast. Unfortunately, user-mode objects can't cross process boundaries, so critical sections won't work for synchronization tasks that run across processes. Although critical sections can't be used across process boundaries, they are very useful for in-process synchronization needs and are able to handle most simple synchronization tasks.

Mutexes

Mutexes are kernel-mode synchronization objects. As such, they are slower than critical sections because it takes time to travel from user mode to kernel mode, but they have the advantage of operating across process boundaries. They offer an additional advantage over critical sections in that they can be 'named objects,' whereby the mutex can be accessed by its handle or by its name. This 'named' ability is extremely handy when using the same mutex from different processes. I should mention that an alternative to using a named mutex is to create the mutex in one process and then use the DuplicateHandle function to allow its use in another process. This approach can incur additional overhead because it requires the handle value and process ID from the first process to be passed to the second process.

Events

Events are not really used to protect shared data per se; rather, they are used to signal that an action has occurred. For example, if T1 changes some data, it is useful for T1 to signal T2 that the data has changed. A common mistake among developers new to multithreaded programming is to wait using a time-based operation rather than an event. Events are frequently used with the WaitFor family of functions: WaitForSingleObject, WaitForMultipleObjects, and MsgWaitForMultipleObjects. Because of this signaling ability, events are very important when performing multithreaded programming.

Polling

Polling is an alternative to waiting on a synchronization object such as an event. The problem with polling is that, because the polling thread doesn't know when an action in another thread may occur, it must continuously check (or poll) for the change. Because of this, polling tends to waste processor time.

Running SlowCopy with Native Synchronization

The output from SlowCopy when compiled with the CSlowCopyNoSync class showed no synchronization and corrupted output. This time, you will look at protecting the data with a critical section via the native Win32 APIs.

Compiling the source

To change the SlowCopy example to synchronize using a native critical section, simply comment out the CSlowCopyNoSync line and uncomment the CSlowCopyNativeCS line in the _tmain function. The function should look similar to this:

int _tmain(int argc, _TCHAR* argv[])
{
   {
      // CSlowCopyNoSync sc;
      CSlowCopyNativeCS sc;
      // CSlowCopyAutoLockCS sc;

      sc.PrintHeader();
      sc.CreateDisplayThread();
      sc.PerformCopy();
      sc.PrintFooter();
   }
   return 0;
}

Program description

When compiled with CSlowCopyNativeCS, the SlowCopy program runs the same as before, except it now protects the shared data with a critical section.

To add a critical section, you need to make the following code changes:

  1. Declare a CRITICAL_SECTION class member variable (m_cs).
  2. Initialize the critical section variable in the constructor.
  3. Use the critical section to guard the SlowStrCopy method in the primary thread.
  4. Use the critical section to guard the DisplayThread method in the secondary thread.
  5. Delete the critical section in the destructor.

Source code listing

Looking at the DisplayThread and SlowStrCpy methods, you can see that the appropriate critical section calls have been added.

Note: The critical section variable declaration, initialization, and deletion code have been omitted from the text. Please refer to the source code for complete implementation.
UINT WINAPI DisplayThread()
{
   int nLineNumber = 1;

   while( TRUE )
   {
      // Prevent the primary thread from accessing
      // the m_szDest string
      EnterCriticalSection( &m_cs );

      std::cout
         << setiosflags( std::ios::right )
         << std::setw(3)
         << nLineNumber++
         << _T(") ")
         << m_szDest
         << std::endl;

      // Allow the primary thread to access the m_szDest string
      LeaveCriticalSection( &m_cs );

      // Check if the shutdown event has been set;
      // if so, exit the thread
      if( WAIT_OBJECT_0 == WaitForSingleObject( m_hShutdownEvent, 0 ) )
      {
         return 0;
      }
   }

   // In this example, we never reach here
   return 1;
}

LPTSTR SlowStrCpy( LPTSTR szDest, LPCTSTR szSource )
{
   // Prevent the secondary thread from accessing the m_szDest string
   EnterCriticalSection( &m_cs );

   LPTSTR szStart = szDest;
   while( *szSource != _T('\0') )
   {
      *szDest++ = *szSource++;

      // To illustrate what happens when a thread gets pre-empted before
      // its work has completed, we are going to copy a char and then
      // give up our remaining time slice. This will cause the
      // secondary thread to display the string.

      // NOTE: Unlike the SlowCopy example, the Sleep here doesn't
      // really do anything because access to the string is synchronized.
      Sleep( 0 );
   }

   *szDest = _T('\0');

   // Allow the secondary thread to access the m_szDest string
   LeaveCriticalSection( &m_cs );

   return szStart;
}


Program output

Here is the output from the SlowCopy program executing the CSlowCopyNativeCS class:

-------------------------------------------------------------------
-------------------------------------------------------------------
SlowCopyNativeCS Example - Illustrates synchronizing shared memory
between two threads using a native critical section.
Source: The source string that will replace the dest string.
Dest: Original destination string that will be overwritten.

Secondary Thread Started
1) Original destination string that will be overwritten.
2) Original destination string that will be overwritten.
3) Original destination string that will be overwritten.
4) The source string that will replace the dest string.
Secondary Thread Exited

Destination string after SlowCopy:
The source string that will replace the dest string.

End of program!
-------------------------------------------------------------------
-------------------------------------------------------------------

Program interpretation

The output for the SlowCopy example compiled using the CSlowCopyNativeCS class is quite different from the previous output. As you can see, once the string copy operation started, it completed entirely before the string was displayed in the secondary thread.

As you can see from the sequence table below, the OS guards the string resource by only allowing one thread access to the string at a time. It should be mentioned that the OS doesn't guard the string resource directly per se; rather, it protects any operations (and resources) that occur within the EnterCriticalSection/LeaveCriticalSection block. Because you are only accessing one piece of shared data (our shared string) within this block, the OS is protecting your shared string.

Table 3: CSlowCopyNativeCS Sequence
Sequence # Thread 1 Thread 2
0001 Sleeping (Thread induced sleep) Requests lock (OS grants lock)
0002 Sleeping Displays string
0003 Sleeping Releases lock
0004 Sleeping Requests lock (OS grants lock)
0005 Requests lock (OS denies lock) Displays string
0006 Sleeping (OS forced sleep) Releases lock
0007 OS wakes thread; grants lock Requests lock (OS denies lock)
0008 Slow string copy in progress Sleeping (OS forced sleep)
0009 Slow string copy in progress Sleeping
0010 Releases lock OS wakes thread; grants lock
0011 Thread sets shut down event then waits for secondary thread to complete Displays string
0012 Sleeping Releases lock
0013 Sleeping Checks shutdown event; thread exits
0014 Secondary thread exits; primary thread wakes ...
0015 Primary thread exits; program ends ...

Using a Mutex

For protecting shared memory between threads within a single process, mutexes really can't do anything more than a critical section. However, mutexes have a few properties that make them different from critical sections:

  1. They can be used with synchronization functions WaitForSingleObject, WaitForMultipleObjects, and MsgWaitForMultipleObjects. Critical sections cannot be used with these functions.
  2. Where critical sections are useful for protecting shared memory between threads within a single process, mutexes can do the same and can be used to protect resources shared across processes.
  3. Mutexes can be identified by name.

Mutex: A word on obtaining a lock

Lock and unlock operations for a mutex are slightly different than those of a critical section. Whereas a critical section uses EnterCriticalSection and LeaveCriticalSection to lock and unlock, a mutex can be locked using any of the WaitForXXX APIs and unlocked using the ReleaseMutex function. Table 4 outlines the initialization, lock, unlock, and close functions for critical sections and mutexes.

Table 4: Critical Section and Mutex Functions
Operation Synchronization Object
Critical Section Mutex
Initialize InitializeCriticalSection CreateMutex
Lock EnterCriticalSection WaitForSingleObject / WaitForMultipleObjects / MsgWaitForMultipleObjects
Unlock LeaveCriticalSection ReleaseMutex
Close DeleteCriticalSection CloseHandle


A Mutex code sample

To illustrate the cross-process benefits of a mutex, you would have to create a couple of applications that share data. Because the goal is to demonstrate the helper classes (which include a mutex class), and the creation of such an example would require quite a bit of work, you are going to defer creating a mutex example until the helper classes are introduced in Part 2. Just to tide the reader over, below is how to create, lock, unlock, and close a mutex.

// Create the mutex
HANDLE hMutex = ::CreateMutex( NULL, FALSE, _T("theMutex") );

// Lock it
::WaitForSingleObject( hMutex, INFINITE );

// Unlock it
::ReleaseMutex( hMutex );

// Close the mutex
::CloseHandle( hMutex );

What Can Go Wrong when Using Critical Sections and Mutexes

As you can see from the text and code examples, using critical sections and mutexes to protect shared memory seems pretty straightforward, so what can go wrong? Because they're simple to use, there aren't many things that can go wrong, but here are the most common:

  1. Forgetting to unlock the sync object. This may seem too obvious to mention, but it is probably the most common error. There are many ways for this to happen, such as the developer forgetting to call the unlock operation or, in the case of global objects, unlocking the wrong object. Program changes often cause sync objects not to get unlocked, such as exiting a function early before the unlock operation gets called.
  2. Forgetting to clean up the object by not calling DeleteCriticalSection or CloseHandle. On programs that run for long periods, these types of handle leaks can lead to excessive resource consumption.
  3. Locking too early or unlocking too late. A good general rule is to lock for as short a time as possible and lock the required resources only.

These common errors may not seem significant, but on complex systems they are often difficult to diagnose. It's not always apparent which thread has failed to unlock a resource when sharing resources over many threads or across processes.

Summary

In Part 1, I've briefly explained processes, threads, and thread scheduling. You have also explored the Win32 synchronization objects. I have included a few code examples describing the general problem of sharing data between threads and have offered a couple of solutions using the native synchronization APIs. Finally, you have seen several potential problems associated with using these APIs. In Part 2, I will introduce the synchronization helper classes and provide samples that demonstrate how they help overcome the common problems encountered when using the synchronization APIs directly.



Downloads
