I'd appreciate your help with this one because I think I'm out of my depth!
I've got 2 threads running in parallel, both using separate instances of a class (in theory) but the behaviour seems like both threads are accessing the same instance of the class.
Everything works fine while just one thread is running, but if more than one runs then everything grinds to a halt like one thread was trying to use the data from another.
When debugging, I noticed that each instance of my class has the same address for __vfptr. Is that normal? I thought that the vftable was a table of pointers to the functions of that class, but I'm too inexperienced to know if that's right or not.
I'm creating the instances of the class like this:
for (int k = 0; k<NUM_THREADS; k++) //=2
{
interface[k] = CDataInterface::create(0,Buffer[0]->GetSize(),3,300);
}
And the corresponding code is:
class __declspec(dllexport) CDataInterface
{
public:
static CDataInterface* create(int mode,int size, int channels, int new_size);
virtual ~CDataInterface(void);
...
CDataInterface* CDataInterface::create(int mode, int size, int channels, int new_size)
{
return new CData(mode, size,channels,new_size);
}
Sorry if this is a dumb question :-s
Ben.
JVene
August 9th, 2007, 03:20 PM
First, it's normal for the vtable pointer to be shared among instances of a class. There's no need for there to be one entry per instance, only one per class. The contents of that table are relative to classes, not to individual instances of a class. Put another way, two instances of the same class would have the same content for selecting the appropriate virtual functions, so why would a unique table per instance be anything but a waste of space?
The real question is, therefore, is the pointer to the instance(s) of the class different for each thread? If they are, your problem isn't about two threads sharing one instance.
This would turn our attention to what is happening in the function your threads are running, so we'll need to see that to be of help.
On the other hand, a few notions do come to mind which are typical to your inquiry. Static members, for example, are shared among instances of a class, and they can be an issue for threaded programming.
Next, how are you dividing the job to be performed by each thread? Parallel processing can provide considerable gains, but the design must take into account the architecture of the machine. If, for example, your design interleaved the items in a large vector, it's possible you're experiencing a kind of performance loss that can happen when two processors need to access close memory locations. Scalability is better when frequently accessed data is spaced some distance apart.
What you're reporting, however, sounds less like a scalability issue (where performance simply remains close to that of single thread), and more like a race condition, or something loosely related. When performance is actually slower in threaded versions, it's often due to synchronization problems starving the task for CPU time. Another overlooked problem is allocation. Generally, CRT memory allocations are serialized (that is, there's an overall lock on the heap such that only one thread at a time gets serviced). If your threaded functions frequently allocate RAM, that effectively creates a queue behind which threads wait for allocation services, all but starving scalability.
It could be, however, that you have some algorithmic issue. Perhaps the two threads are not synchronized against each other for some common resource, and are manipulating each other's control values (like a position in a container, for example) - such that the threaded version is actually working harder, repeating work on sections the opposing thread has already worked.
shoppinit
August 10th, 2007, 04:48 AM
Thanks for your reply. That helps me understand what's going on with the vf pointer. It's turns out that the problem of shared data was a ridiculous coding error on my part asking 2 different threads to use the same instances. :s No more crashes...
Still, it made me go through everything and try to understand better what was going on so that the positive side.
Performance is still worse with multiple threads then with with just one thread though, so I've obviously still got issues somewhere. In theory, any major memory allocation that I need to do is done in the class constructor and then the memory is reused. I'm not using any vectors either...
In terms of synchonisation, there are a couple of places where the threads read or write common variables, but these are protected with a critical section and on for the time it takes to read / write the variable.
I'll keep looking.
Thanks again.
Ben.
JVene
August 10th, 2007, 01:50 PM
In terms of synchonisation, there are a couple of places where the threads read or write common variables, but these are protected with a critical section and on for the time it takes to read / write the variable.
And that sounds at it should be, but it does suggest a line of thinking to follow.
First, what is the CPU usage like when this process is running in its threaded version, as opposed to CPU usage when it's non-threaded?
If the threaded version shows 100% CPU usage, as does the non-threaded version, then I suspect further coding/logic problems making the algorithm run afoul, wasting CPU resources.
However, if the CPU usage in the threaded version is less that 100%, closer to 50% or even lower (but this still applies if its under 90% or so), it could be synchronization problems.
Its true that synchronization is required for shared resources (typically - rare exceptions can be illustrated). If the resource is frequently contentious, where a second thread often attempts to lock the resource, but must wait, CPU usage drops, and scalability drops. This can be a vicious cycle because of the nature of releasing such a wait state.
When a critical section is locked and released without another thread having ever attempted a lock, the action is quick and the implication minimal. However, when a thread attempting to lock a critical section is forced to wait, it does not get released at the first opportunity after the 'owning' thread releases the critical section. The time delay from that moment when the critical section is released to the moment when the waiting thread jumps into action is based on the operating system's task scheduler switching speed, typically in the region of 150,000 switches per second on average modern Windows hardware.
If your process could run, say, 1 million iterations in a second in a single thread, and this contentious problem comes into play, the result of a threaded version could be in the region of 1/10th to 1/5th the speed of the single threaded version.
What makes this a vicious cycle is the fact that when the owning thread releases the critical section, should the process quickly cause it to tend to lock the critical section within the task scheduling delay time, it will enter into a queue waiting for the critical section BEHIND the thread that now, technically, owns the lock - but hasn't been scheduled for attention by a CPU. This can happen with any number of CPU's. Thread 1 locks the critical section, thread 2 waits behind it. Thread 1 releases, does more work, thread 2 now owns the lock. Thread 2 hasn't launched yet, so thread 1 'comes back around' to obtain the lock, entering into the queue behind thread 2. Thread 2 is scheduled (and jumps into action) some 300 microseconds later, processes some work, releases the lock and now thread 1 owns the lock but hasn't been scheduled for attention. Thread 2 now comes back around to get the lock - both threads are stalled, waiting on the task scheduler.
This means the task scheduler becomes the arbiter of the speed of the processing cycle. CPU usage might have dropped to 1%.
The only way out of this problem is to rethink your shared data, the algorithm itself and factor the work into something where the probability of locked contention is reduced significantly. If you genuinely find that's not possible, you may have identified that rare situation where threaded programming can't be applied.
shoppinit
August 11th, 2007, 08:59 AM
Hi Jvene,
Thanks for that interesting reply. It's certainly given me something to think about. The locking / queueing mechanism is something that didn't occur to me before and although I don't think that's what's happening in this case, it's certainly something that I'll bear in mind in the future.
When is it appropriate *not* to use a critical section? If a thread is only reading a common variable and not modifying it (although the variable could be modified elsewhere) is a critical section needed? This is a question that's been bugging me for a while. To be on the safe side I generally protect all reads and writes of common data with critical sections but I now wonder if this is strictly necessary.
Thanks again.
Ben.
JVene
August 11th, 2007, 02:53 PM
An example of a read only/modify common variable that doesn't require locking, perhaps a time value.
This is a contrived example, but let's say you have a value representing time, updated about 10 times per second, and that value is referenced from some static variable when 'stamping' some data.
A timer fires 10 times per second, causing an update to this value.
Various threads read from this value thousands of times each second.
Is it important that the value stamped on the objects, copied from this static variable, be absolutely in sync with that moment that the timer fires? Probably not.
If a few threads are reading from this value simultaneously, and the timer fires, a few of those threads might read the old value, while a few others might read the new value, 1/10th of a second off. Is that critical? It's a judgment call, and this design description hints that it's not critical, else a 1/10th second resolution would probably be inadequate.
There certainly could be designs where this is critical, only the designer could know with certainty, but as a contrived example this could be a design were locks on a critical section would be of not importance.
exterminator
August 21st, 2007, 06:28 AM
If you have quite many readers and few writers then a reader/writer lock might be a better choice than a critical section. Protecting the resource with a critical section by a reader when there are only active readers being blocked is a situation that reader/writer locks are better suited for and could be a performance improvement over a critical section.
Arjay
August 21st, 2007, 12:34 PM
If you have quite many readers and few writers then a reader/writer lock might be a better choice than a critical section. Protecting the resource with a critical section by a reader when there are only active readers being blocked is a situation that reader/writer locks are better suited for and could be a performance improvement over a critical section.Exactly. In the scenario what JVene describe where a variable is written to 10/sec, but read from 1000/sec across multiple thread, a reader/writer lock (sometimes called a single writer/multiple reader lock or SWMR) should help. Another area that helps performance is to keep the lock times down to an absolute minimum - only lock while accessing what you need protected. In sounds obvious, but it's easy to lock too early and include operations that don't need to be included in the lock. For example, say I need to access some data from a database. And this is going to be really obvious why not to do this, but you could:
lock
form database params
open db connection
retrieve data
assign it to the shared (locked) variable
close db connection
unlock
Of course who would ever do this? Well it happens all the time. The point is to lock for the least amount of time. The real sequence should be:
form params
open db
retrieve data
lock
assign data
unlock
close db
Many of us are already big into RAII, and it combined with accessor methods (that internally do the locking) really helps in these situations. Also threading frameworks that leverage RAII generally allow you to switch between different locking mechanisms. So you could easily switch between a cs and a reader/writer lock to test perf differences.
codeguru.com
Copyright Internet.com Inc., All Rights Reserved.