Over the last year or two, processor speeds appear to have peaked. 4 GHz, which was expected last year, has still not made it out of the labs. In fact, the trend now appears to be towards doing more work in parallel, as seen in multi-core processors, rather than speeding up processors. However, multi-core processors do not translate directly into performance boosts for all software applications. Rather, multi-core, multi-processor systems would favour well-behaved, multi-threaded applications.
With this scenario in mind, applications in the future would need to make use of threads (and fibers) intelligently to make optimal use of available resources.
A Quick Look at Threading
"A thread is basically a path of execution through a program. It is also the smallest unit of execution that Win32 schedules. A thread consists of a stack, the state of the CPU registers, and an entry in the execution list of the system scheduler. Each thread shares all of the process's resources."
To create a thread, Windows provides the CreateThread function that creates a thread to execute within the virtual address space of the calling process, based on the LPTHREAD_START_ROUTINE parameter, which is the starting address of the thread, around which the thread's stack is created. This is a rather expensive call because a thread is a Kernel object, and CreateThread needs to drop into Kernel mode (briefly) and back into User mode, and requires a significant amount of memory (the default stack size is 1 Mb per thread).
So, even though you may need to paralellize various activities within your application, doing so in an ad-hoc manner by creating threads at whim would actually degrade your application performance.
Because creating a thread is expensive, the obvious solution is to create threads once and reuse them. In other words: Thread pooling.
To achieve this, the thread function—in other words, the LPTHREAD_START_ROUTINE parameter—would need to be a generic function that would service queued requests. So ideally, an application would create a pool of threads, based on the number of processors available. And then, based on the work to be performed, it would divide the work into various discrete activities or jobs, and parcel them out to the various threads in the queue. This parcelling out would be based on interdependancies amaong the jobs as well as the load on each individual thread in the pool.
Analyzing the interdependancies among various jobs, and coordinating them is an extremely tricky task and frequently error prone. To further compound these problems, there are very few modelling tools or techniques that one can apply to solve them. Whereas Erlang has made significant steps towards paralellization, there is still a lot of work to be done, and I am most certainly not qualified to advise the reader on how to solve his/her application-specific problems.
However, the thread pooling mechanism is a rather generic piece of code, and there are various sophisticated approaches to implementing them. The easiest approach (which this article illustrates) would make use of the facilities Windows offers—such as messages and message queues.
This approach would require wrapping a thread in a shallow wrapper class. The thread would be based on a static member function of this class, which then would run a message loop. Any jobs to be submitted to this thread would be posted as a message to this thread. Whenever the thread receives a message, it would break up the message received into certain pre-specified parts to get the job and the data, and then invoke the job.
The Approach in Depth
Now that you have looked at the need for pooling as well as had a bird's eye view of the implementation, take a deeper look at the implementation. The zip file with this article contains a ThreadPool project (that compiles into a static lib, with no MFC dependencies) and a Test stub (An MFC-based dialog application).
The ThreadPool lib contains three classes:
- CThreadPoolMgr: This class manages the creation, maintenance, and freeing of the thread pool.
- CWorkerThread: This class wraps a Win32 Thread. It has a static function that forms the base of the thread.
- CJob: This represents a job that needs to be performed on any thread in the pool.
From the application lifecycle perspective, the interaction among the instances of these classes is as follows:
- Startup: The CThreadPoolMgr creates a vector of CWorkerThread (one per thread in the pool). In the constructor of the CWorkerThread, CreateThread is invoked with the static method ThreadFunc as the LPTHREAD_START_ROUTINE.
- Submitting a job: Whenever a client wants to run a job, they need to create a CJob object and pass it to the CThreadPoolMgr. The CJob contains a function pointer that corresponds to the job, the parameters for the job, and a notification function pointer, to be called when the function completes. When the client submits a job to the CThreadPoolMgr, the CThreadPoolMgr locates a suitable thread from the pool, and then posts the CJob pointer to the thread. As the thread is running a message loop, it receives the CJob pointer as part of the MSG structure, which it then proceeds to unpack. After unpacking, it first calls the job function, and then the notification function.
- Shutdown: The CThreadPoolMgr iterates over the vector of threads, posting a WM_QUIT message to each of the threads and then waits on it. When a thread finishes processing all its jobs, it then would reach the WM_QUIT message, and return (that is, incidentally, the only clean way for a thread to terminate—TerminateThread, EndThread, and so forth—are not clean ways to terminate a thread).
The ThreadPool is NOT production-grade code because:
- It does not have reliable error handling/propagating mechanisms.
- It assumes all jobs are equal. No mechanism for weighting/prioritizing jobs is provided.
- The pool size is static. It should be dynamic, based on the number of jobs, processors, processor free time, and so on.
- The pool manager cannot clear jobs from the queue.
- There are no reporting mechanisms/statistics for jobs in each thread's queue.
- It does not support job cancellation directly.
- It is not object oriented. (Ideally, functors should be used for jobs as well as notifiers. I have retained them as function pointers to make it easy to "plug and play" your existing multi-threaded code).
- It has been kept simple and easy to read rather than reliable/bug free because it is for demonstration purposes only. (The fact that I am lazy has nothing to do with it whatsoever, of course.)
Also, take a look at Herb Sutter's great article on concurrency available online at http://www.gotw.ca/publications/concurrency-ddj.htm.