Posix or Boost


PredicateNormative
May 31st, 2007, 11:39 AM
Which is the best library to use for multi-threading?

MikeAThon
May 31st, 2007, 12:01 PM
Neither. Unless cross-platform compatibility is important to you, use the native APIs.

Mike

PredicateNormative
May 31st, 2007, 02:04 PM
Unfortunately, all code we are developing for my new project is required to be cross-platform compatible, which is a complete nightmare: it's image processing software, and the cross-platform requirement is a killer as far as optimisation goes! It also needs to be multi-threaded, hence the question.

JVene
May 31st, 2007, 04:44 PM
You're in my backyard :)

I thread frequently, and on a whim, and crossplatform.

The issues in threading are not well covered in any framework I've seen.

For cross platform work, as you've discovered I'm sure, the different platforms you target have similar but not identical API services for threading.

An object library of your own creation, possibly based on Boost somewhat, may be your best option.

You need to consider what things you want and need for dealing with threading in your work.

In my own library, a queue object is paramount. I use mine (1 of 3 flavors) everywhere. Instantiating one represents the launch of a thread, which then waits for work.

The key in cross platform work is to create an interface for your application(s) that is consistent, where the objects that provide that interface are built for each of the platforms.

Launching a thread in Windows vs Linux is conceptually similar, though the API calls differ. Once the thread gets launched, the way you wait on an event differs between platforms, but the results are nearly identical.

I couldn't develop threaded work easily without something that represents a pointer to a member function, or a pointer to a function, with permutations of parameters to be forwarded, which I supply to the queue. The Queue object can accept a large volume of such 'procs', which it services in turn.
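
A minimal sketch of the queue-plus-procs idea described above might look like the following. It is not JVene's library: the names Que, submit, Image and que_demo are hypothetical, and it leans on the C++11 standard library (std::thread, std::function), which postdates this thread, as a stand-in for the per-platform thread and event APIs.

```cpp
#include <cassert>
#include <condition_variable>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical work queue: the constructor launches one worker thread,
// the destructor drains pending work and then joins the thread.
class Que {
public:
    Que() : worker_([this] { run(); }) {}

    ~Que() {
        {
            std::lock_guard<std::mutex> guard(mutex_);
            stopping_ = true;
        }
        wake_.notify_one();
        worker_.join();  // graceful shutdown: wait for queued procs to finish
    }

    Que(const Que&) = delete;
    Que& operator=(const Que&) = delete;

    // Submit a 'proc': any callable that takes no arguments.
    void submit(std::function<void()> proc) {
        assert(proc != nullptr);  // an empty proc would throw when invoked
        {
            std::lock_guard<std::mutex> guard(mutex_);
            procs_.push_back(std::move(proc));
        }
        wake_.notify_one();
    }

private:
    void run() {
        for (;;) {
            std::function<void()> proc;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                wake_.wait(lock, [this] { return stopping_ || !procs_.empty(); });
                if (procs_.empty()) return;  // only reachable once stopping_ is set
                proc = std::move(procs_.front());
                procs_.pop_front();
            }
            proc();  // run outside the lock so submitters are never blocked by work
        }
    }

    std::mutex mutex_;
    std::condition_variable wake_;
    std::deque<std::function<void()>> procs_;
    bool stopping_ = false;
    std::thread worker_;  // declared last so the members above exist before it starts
};

// Hypothetical payload type used only for the demonstration.
struct Image {
    int brightness = 0;
    void brighten(int amount) { brightness += amount; }
};

int que_demo() {
    Image img;
    {
        Que que;
        que.submit(std::bind(&Image::brighten, &img, 5));  // member function + parameter
        que.submit([&img] { img.brighten(10); });           // lambda form of the same idea
    }  // the destructor returns only after both procs have run
    return img.brightness;
}
```

Here std::bind and the lambda play the role of the 'proc' objects: each packages a function pointer with the parameters to forward. Because a single worker services the queue in order, the two calls on img never overlap.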

Windows has the critical section, which you should use there because it's fast, lightweight and works well. It accepts re-entrant locks, which the default Linux (pthread) mutex doesn't. You have to decide which way you want things to work, and emulate that on the other platforms.

Linux will require a mutex in place of the critical section. In my own implementation, I allow re-entrant locks by emulation. So far, in over 100 applications, it's been relatively quick and reliable. Application code 'knows' no difference (it's written to one, common API).
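
One way to emulate re-entrant locking on top of a non-recursive mutex is to track the owning thread and a nesting count. The sketch below is an assumption about how such an emulation could work, not the poster's code; Sync, Lock, enter, leave and nested_depth are hypothetical names, and std::mutex stands in for the native mutex.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>

// Hypothetical 'Sync': a non-recursive mutex plus owner/depth bookkeeping, so
// the same thread may enter repeatedly, as with a Windows critical section.
class Sync {
public:
    void enter() {
        const std::thread::id self = std::this_thread::get_id();
        // A thread can only observe its own id here if it stored it itself,
        // which means it already holds mutex_.
        if (owner_.load(std::memory_order_relaxed) == self) {
            ++depth_;  // nested entry: just count it
            return;
        }
        mutex_.lock();
        owner_.store(self, std::memory_order_relaxed);
        depth_ = 1;
    }

    void leave() {
        assert(owner_.load(std::memory_order_relaxed) == std::this_thread::get_id());
        if (--depth_ == 0) {
            owner_.store(std::thread::id(), std::memory_order_relaxed);
            mutex_.unlock();  // outermost leave releases the real mutex
        }
    }

    int depth() const { return depth_; }  // only meaningful to the owning thread

private:
    std::mutex mutex_;
    std::atomic<std::thread::id> owner_{std::thread::id()};
    int depth_ = 0;  // touched only while mutex_ is held by the caller
};

// RAII guard: enters on construction, leaves on destruction.
class Lock {
public:
    explicit Lock(Sync& sync) : sync_(sync) { sync_.enter(); }
    ~Lock() { sync_.leave(); }
    Lock(const Lock&) = delete;
    Lock& operator=(const Lock&) = delete;

private:
    Sync& sync_;
};

int nested_depth() {
    Sync sync;
    Lock outer(sync);
    Lock inner(sync);  // locking a plain std::mutex twice here would be undefined behaviour
    return sync.depth();
}
```

Ready-made alternatives exist as well: POSIX offers the PTHREAD_MUTEX_RECURSIVE mutex type, and C++11 later added std::recursive_mutex. A hand-rolled version like this one mainly makes sense when the same wrapper must behave identically over several native primitives.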

Smart pointers with reference counting are another very important part of my own library. Boost's smart pointers are useful here, with a caveat.

In my own testing I discovered that SOME threaded work that generates lots of objects may end up queuing behind each other at the allocation of RAM. This is because most CRT allocation systems are not threaded - they're thread SAFE, but they lock the entire heap for every allocation. 8 and 16 core tasks that start out allocating lots of material lock against each other, stealing throughput. While this happens with or without smart pointers, I found that the penalty is heavier with smart pointers because there's an implied allocation of a 'node' interior to the reference counting mechanism that doubles the number of allocations.

Smart pointers are important because they solve the issue of containment in threading (the "which thread deletes what" question). They do exact a minor performance penalty, but it's usually smaller than that of their counterparts in Java and C#.

The solution to the allocation issue, for my own library, was to multiplex the allocation system for certain key objects. My own system creates a thread local storage of cached allocations, so that only about 1 in 300 lock against the general heap. I implement this with overloaded new and delete operators on those key objects. The allocation is faster even in single threaded work because of the allocation's specialization (when you can assume all of the objects are of the same size, allocation logic is simpler, faster and the side effects of fragmentation are controlled).
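
The general shape of a per-thread allocation cache behind class-level new and delete can be sketched as follows. This is a simplified illustration under stated assumptions, not the poster's allocator: ThreadCache, Pixel and reuses_block are hypothetical names, the cache is unbounded, and it refills one block at a time rather than in batches, so it does not reproduce the "1 in 300" figure.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Hypothetical per-thread cache of fixed-size blocks. Each thread keeps its
// own free list, so a cache hit never touches the shared heap or its lock.
template <std::size_t BlockSize>
class ThreadCache {
public:
    ~ThreadCache() {
        while (head_) {  // hand cached blocks back to the heap at thread exit
            Node* next = head_->next;
            ::operator delete(head_);
            head_ = next;
        }
    }

    void* take() {
        if (!head_) return ::operator new(BlockSize);  // cache miss: general heap
        Node* node = head_;
        head_ = node->next;
        return node;
    }

    void give(void* block) {
        // Reuse the dead object's storage as a free-list link.
        head_ = ::new (block) Node{head_};
    }

    static ThreadCache& local() {
        thread_local ThreadCache cache;  // one cache per thread, no locking needed
        return cache;
    }

private:
    struct Node { Node* next; };
    static_assert(BlockSize >= sizeof(Node), "block must be able to hold a link");
    Node* head_ = nullptr;
};

// Hypothetical 'key object' that routes its allocations through the cache.
struct Pixel {
    double r, g, b;

    static void* operator new(std::size_t size) {
        assert(size == sizeof(Pixel));  // a derived class would need its own cache
        return ThreadCache<sizeof(Pixel)>::local().take();
    }
    static void operator delete(void* block) noexcept {
        if (block) ThreadCache<sizeof(Pixel)>::local().give(block);
    }
};

bool reuses_block() {
    Pixel* first = new Pixel{1.0, 2.0, 3.0};
    void* address = first;
    delete first;  // the block goes to this thread's cache, not the heap
    Pixel* second = new Pixel{4.0, 5.0, 6.0};
    const bool same = (static_cast<void*>(second) == address);
    delete second;
    return same;
}
```

Because every block has the same size, take and give are a couple of pointer moves each. A block freed on a different thread than the one that allocated it simply joins the freeing thread's list, which is safe here only because all blocks are interchangeable.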

This latter point may easily be overkill for most applications. In some cases, though, certain algorithms on 8 and 16 core machines had literally starved all but 1 core into idling, and this change brought them back up to 100% usage of all 8 or 16 cores.

Applied to all smart pointers, it means that the 'weight' of the smart pointer is reduced by about 75% (that is, only 25% of the penalty one normally associates with smart pointers, small already, remained).

For a single application, just Queues (thread objects), a way to submit calls to the queue (function pointer template objects), and synchronization objects go a long way. I built a library for my own work over the course of 15 years of development, so lending a 1% gain here or there is now trivial. It accumulates, though, such that overall I see about a 5% to 15% performance gain over 'manual' approaches, and in development time I enjoy near trivial effort to deploy threads.

Summarized another way, I agree that you should work with the local API, but for cross platform, encapsulate that into a common framework that looks identical to your application code for all platforms.

PredicateNormative
June 1st, 2007, 04:34 AM
Thanks JVene, that is a really helpful post with some good pointers. Our section of the division has opted to use the gcc compiler for both Windows (MinGW) and Linux. Prior to the decision, I did some run-time performance comparison tests on the same single threaded code compiled with MinGW, Visual Studio 2003 and Visual Studio 2005. The Visual Studio 2005 compiler optimisation is light years ahead of the other two and yielded significant run-time performance improvements (the difference between MinGW and 2003 was six of one and half a dozen of the other overall). However, in spite of this, we opted to use MinGW for political reasons. Developing an in-house object library for threading based on Boost (as you have suggested) could prove to be a good use of time.

JVene
June 1st, 2007, 12:54 PM
During the development phase in particular, using GCC may be a good choice.

So far, I've found that some code which VS2K5 is comfortable with will not pass GCC. This has usually been associated with the order in which code appears relative to template classes. GCC appears to be a little stricter here, so by focusing on GCC first you should have fewer portability issues moving toward a VS2K5 build, if the team simply wants to pass it through VS for a faster Windows product.

On the other hand, the VS2k5 debugger is quite good :)

Checking over my own object layout, I thought I'd make a better list of cross platform threading support objects I use most frequently. Boost may have several of these.


Sync - represents the CriticalSection in Windows, or the Mutex in Linux

Lock - 'Enters' the critical section on creation, 'leaves' on exit, similar under the mutex. Emulates re-entrant lock counting on Linux (so Windows & Linux code function the same). This is RAII locking.

Proc - a family of template objects based on a pure virtual, non-template base, representing a pointer to function (member or non-member, with versions of handling parameter passing). Used with queues

Que - Represents a thread of execution (my own preference in design causes the thread to launch at construction, graceful shutdown with waiting on destruction). Stores a queue (list, stack, your options) of Procs for processing (essentially a container of function pointers with parameters).

Event - most often the AutoResetEvent - something a thread can wait on, and can be signaled (to release it from waiting).

Pex - Parallel execution coordinator. If a process can be naturally (easily) divided into sections and processed in parallel, Pex handles much of the work to coordinate the startup and release of the threaded sections. The task, basically, is first to schedule n threads to run, dishing out sections of the job to each thread (presumably each core). Pex can check for availability and dynamically choose alternatives (if, for example, an 8 core machine is 50% busy, then only schedule 4 cores).

Pex has options such that if, say, the job is going to 4 cores, it can send out 3 and run the 4th itself (on its own thread), send out 4 and wait, or send out 4 and continue.

Usually, application code is written such that a call so scheduled should appear as a simple blocking call (just a function that does something from the application code's viewpoint). However, it may be important that the call NOT return until all 4 threads have finished, completing the 'illusion' of a single threaded blocking call. As such, Pex watches the scheduled threads and waits on a signal for each of the threads it initiates, returning to the caller when all are completed.

Pex is a family of objects in my implementation - some internal, a few options at setup control the behavior.

Pex is one of the recent additions to my own library (about 9 years ago), when early multi-processor machines (say P2, PPro or non-x86 machines) appeared. If your use of threading isn't about scheduling multiple cores to work an algorithm in parallel, you won't need it.
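
The "send out n-1 sections, run the last one yourself, return when all are done" variant of Pex can be approximated as below. This is a sketch under assumptions rather than the poster's design: parallel_sections and sum_of_squares are hypothetical names, std::thread replaces the Que objects, and the dynamic check for how busy the machine is has been left out.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <numeric>
#include <thread>
#include <vector>

// Hypothetical coordinator: split [0, count) into contiguous sections, run all
// but the last on new threads, run the last on the caller, and return only
// when every section has finished, so the call looks like a blocking call.
void parallel_sections(std::size_t count, unsigned workers,
                       const std::function<void(std::size_t, std::size_t)>& section) {
    assert(section != nullptr);
    workers = std::max(1u, workers);  // hardware_concurrency() may report 0
    const std::size_t chunk = (count + workers - 1) / workers;  // ceiling division
    std::vector<std::thread> threads;
    std::size_t begin = 0;
    for (unsigned i = 0; i + 1 < workers && begin + chunk < count; ++i) {
        threads.emplace_back(section, begin, begin + chunk);
        begin += chunk;
    }
    section(begin, count);                    // the caller's own share of the job
    for (std::thread& t : threads) t.join();  // wait for every section sent out
}

long long sum_of_squares(std::size_t n) {
    std::vector<long long> partial(n);
    parallel_sections(n, std::thread::hardware_concurrency(),
        [&partial](std::size_t begin, std::size_t end) {
            // Each section writes a disjoint slice, so no locking is needed.
            for (std::size_t i = begin; i < end; ++i)
                partial[i] = static_cast<long long>(i) * static_cast<long long>(i);
        });
    return std::accumulate(partial.begin(), partial.end(), 0LL);
}
```

From the caller's side, sum_of_squares is an ordinary function: the threads exist only inside parallel_sections. The sketch assumes the section callable does not throw; a fuller version would need to join the threads it has sent out before propagating an exception.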



Smart Pointers - I made my own just after templates were introduced, long before the STL or boost existed, so I still use them (they're well tuned for parallel performance in ways even the boost pointers aren't).

Thread Protected 'handles' - This is a set of templates that represent some value or object that should be protected by a critical section for update/access. It's typical that a few member variables are manipulated by multiple threads, and as such a typical Get/Set pair is invoked in order to 'lock' the value, change/load the value, then 'release' the lock on the value. I simplify this repeated work by wrapping the concept into a template. As such, if I have a ThreadLocked< int > v; - it means that v is automatically locked against its own sync (critical section or mutex) when you alter it or read its contents.
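
A thread-protected handle along the lines of ThreadLocked< int > could be sketched like this. Only the name ThreadLocked comes from the post; the get, set and update members and the count_to helper are assumptions made for illustration, with std::mutex standing in for the Sync object.

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <utility>

// A value bundled with its own lock; every access goes through the lock.
template <typename T>
class ThreadLocked {
public:
    ThreadLocked() = default;
    explicit ThreadLocked(T initial) : value_(std::move(initial)) {}

    T get() const {
        std::lock_guard<std::mutex> guard(sync_);
        return value_;  // returns a copy made while the lock is held
    }

    void set(T next) {
        std::lock_guard<std::mutex> guard(sync_);
        value_ = std::move(next);
    }

    // Read-modify-write under a single lock. Calling get() and then set()
    // would leave a window in which another thread could change the value.
    template <typename Fn>
    void update(Fn&& change) {
        std::lock_guard<std::mutex> guard(sync_);
        change(value_);
    }

private:
    mutable std::mutex sync_;  // mutable so that get() can stay const
    T value_{};
};

int count_to(int per_thread) {
    assert(per_thread >= 0);
    ThreadLocked<int> v(0);
    auto work = [&v, per_thread] {
        for (int i = 0; i < per_thread; ++i) v.update([](int& n) { ++n; });
    };
    std::thread a(work), b(work);
    a.join();
    b.join();
    return v.get();
}
```

The update member is the part worth noticing: a bare Get/Set pair protects each individual access, but an increment needs the read and the write to happen under the same lock.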

PredicateNormative
June 1st, 2007, 05:10 PM
JVene, that is a well thought out set of tools that looks like it has come from years of experience! :) Thank you very much for such a detailed approach to tackling multi-threaded programming, I am very grateful. It might take me a while, but I wouldn't mind developing a similar set of tools. :)