Click to See Complete Forum and Search --> : general question on threading speed


abcdefgqwerty
May 30th, 2007, 09:44 AM
Lets say you have a single CPU. Then you fork off another thread and have 2 threads both executing code. Each of those 2 threads gets a time slice in a round robin fashion right? So a single thread is really just as fast as 2 threads on a single CPU computer. With 2 threads do you get more CPU time total then because you have 2 threads getting time slices as opposed to just 1?
Can someone maybe explain the advantage speed wise on a single CPU machine in using multiple threads?

wildfrog
May 30th, 2007, 10:43 AM
The short answer is that you'll loose speed when using multiple threads on a single cpu/core machine. Some of the 'speed' is lost to context switching/switching among the threads, some is (usually) lost to synchronization among the threads...

- petter

MikeAThon
May 30th, 2007, 12:19 PM
Each of those 2 threads gets a time slice in a round robin fashion right?
Not exactly. In Windows, threads are scheduled based on a priority level that's determined dynamically by the thread scheduler. But conceptually, it's not bad to think about the scheduler as a type of round robin process.

With that in mind, your process can theoretically get a larger proportion of CPU time with more threads, relative to all processes running on the machine. For example, say that the machine has 5 CPU-bound processes executing concurrently, including your process, and each process has a single thread with the same base priority. Roughly speaking, your process will get 1/5 = 20% of the CPU. If your process spawns 5 threads, and the other 4 processes are kept to a single thread, then your process will get around 5/9 = 55% of the CPU.

This all assumes that your process is CPU-bound, and that it contends for CPU time with other CPU-bound processes. In the real world, that doesn't happen often. Usually, your process will be the only CPU-bound process running on the machine. In this situation, the scheduler does a pretty good job of giving your process almost 100% of the CPU, without any effort from you, even if your process has only a single thread. In this real-world situation, Petter's comment is absolutely correct, that on a single-processor machine multiple threads will slow you down due to context switches and synchronization between threads.

Mike

JVene
May 30th, 2007, 02:54 PM
Responses are all on point, I just wanted to add.....

There is a little gain on hyperthread CPU's (Intel P4). All modern x86 CPU's (since Pentium Pro) have multiple execution units. In some ways these 'look' a little like multiple cores, but they're not. They do the general work that we recognize as the assembler (or machine, if you prefer) instructions of the x86 language (colloquially).

Some designs only have 2 execution units, others have 3. I forget how many were in the HT P4 series, but on occasion the processor 'recognizes' it can't stuff all of the available general execution units with work, and has the ability to give a second thread some attention with the spare unit.

This gives the illusion of a dual core CPU (it's not, not even close, really). Still, there is some gain with that over a non HT CPU's (sometimes only 10%, occasionally as much as, I'm guessing, 30% - under ideal conditions).

Otherwise, this CPU qualifies as a single core device (and, HT can be turned off - so it might not even apply).

HT was also known to 'slow down' things a little, under certain conditions, so it's not really popular.

Your question hinges on another question which is a natural tangent - when and why thread, especially on the substantial population of single core machines in the market?

One reason to thread is GUI responsiveness. Any task which might take more than a very short time (some few ms, perhaps - it's a judgment call relative to the speed of the CPU, how busy it is already, etc) can be handed to a thread, such that the GUI is available to respond to user activity. This doesn't give faster performance on single core machines, but keeps the UI from being blocked by longer running processes. This has benefit on single core machines, and even more on multiple core machines.

Another reason to thread is parallel performance gain, but as you already have surmised, that can't achieve gain on single core machines (with the potential exception of minor gains on HT processors).

There is one more reason to thread - simulation design.

It can actually be helpful, on some occasions (not that often, I'll agree) to use threads to simulate simultaneous activity. This might be as simple as related to my first point - as in background printing, faxing, etc...

However, it can be that a process or algorithm naturally breaks down into several stages of operation, each completing at different rates, some interdependent and others not interdependent. Imagine, for example, a manufacturing plant where they make various products. Some products run through dozens of processes, were each product stops and nearly every station for some work.

Other products require only a few steps, and are only involved in a few processes.

It can be useful to create simulate this stream of activity as a set of threads, which breaks the problem apart into components (by thread) representing the various stations of the manufacturing plant. This isn't so much for the sake of performance gain (though on multiple core machines there certainly could be gain), but to aid in the modeling of the simulation.

abcdefgqwerty
May 31st, 2007, 08:29 AM
Ah ok thanks guys. I was just confused on when and why to thread on a single CPU and what the performance gains on a single CPU really are. I do occasionally use a thread or two for UI but try to avoid multithreading just out of complexity. So now with all this multicore stuff its going to be more important to multithread, but people with single CPUs will lose performance unless the program can detect this and change itself to use only 1 thread.

JVene
May 31st, 2007, 12:48 PM
So now with all this multicore stuff its going to be more important to multithread, but people with single CPUs will lose performance unless the program can detect this and change itself to use only 1 thread.


That's a key caveat. If your design purpose is not performance (like when tossing off material to relieve the GUI thread to respond to the user), you don't have a need to check, but for performance work (parallel execution of code to gain speed), you should definitely select an approach for the consumer's machine.

I'm not fond of DLL's myself. The primary reason I use them is to provide 'plug in' behavior, but even then if it imposes one minor hassle without much gain in executable RAM footprint, I don't bother. However, you could consider a DLL module as your means of switching between threaded and non-threaded versions, or you could rely on objects to mutate according to configuration.

TheCPUWizard
May 31st, 2007, 01:46 PM
It really comes down to some key decisions and specific knowledge of the application and usage domains.

Keeping the GUI responsive by not doing "work" in the GUI thread is almost always a good idea.

Multi-Threading of task which use multiple resources is also generally a good idea. Assume you have a task that is CPU bound 33% of the time, DISK bound 33% of the time, and Network bound 33% of the time.

Writing this as a single thread (no overlapped I/O etc), will take significantly longer than splitting this up into three threads (the nature of the division would depend on details beyond the scope of this post). Utilizing more threads (on a single core machine) may or may not provide additional benefit. Using more threads for the I/O, even on a multi-core machine may not provide more performance.

Writing multiple threads for pure CPU bound tasks (neglecting issues of concurrent simulation, ease of implementation, etc), will take more time if the number of CPU bound threads exceeds the number of processor cores. But in many cases this difference is almost imperceptable.

Assuming the memory working set is small enough that the data for all the threads fit's in the L1 cache (and I would say that for CPU bound operations L1/L2 cache management is probably the biggest performance issue), the state switch time can be as low as 10uS, with a timeslice on the order of 10mS. This means that a state switch "wastes" about 0.1% of the time.

Assume you have a task that is running on an otherwise 100% idle machine, and is single threaded 100% CPU bound for 1 hour. You divide the work up to 4 threads. For the single core machines this slows down the process by 3.6 seconds. Few users will notice that difference. But the same code on a dual or quad core box will run in 30 to 15 minutes respectively.

Is it really worth maintaining multiple code bases?????

Reminder: This is assuming 100% CPU bound. This is very very very rarely the case for true long running computationally intensive processes. Nearly all of them are working on more data than can fit in the L1 (or even L2) cache. The more cache misses (requiring access to the much slower main RAM) the bigger the impact. If you have to access the disk or other storage, this goes up even more.

ps: When MSFT released SQL-2005, it was discovered that it ran significantly (10-100 times) slower on certain machines than the equivilant SQL-2000. This was (fortunately) a fairly rare occurance, and only under (what appeared to be) sporatic and unrelated situations. It took the product team some time to realize that the effect was caused by a slight change in some of the internal "tight loops" of the Engine. These loops had grown slightly in size between the versions, but it was enough that the SQL-2000 version fit in the L!/L2 cache of a specific group of (older) Pentium processors, while the SQL-2005 version needed "just a few more bytes", but caused exteremely large fault counts to main ram....

Arjay
May 31st, 2007, 07:41 PM
Ah ok thanks guys. I was just confused on when and why to thread on a single CPU and what the performance gains on a single CPU really are. I do occasionally use a thread or two for UI but try to avoid multithreading just out of complexity.
Depending on the work, often it is easier to solve a problem using multiple threads than doing it all from a single thread.

To give an example, consider a program that needs to perform two independent tasks such as monitor a folder for incoming files and also retrieve data from a serial port. Say the incoming files need to be processed and stored in the db and the serial port data needs to be saved to a file. While you could do this in a single threaded app, it would be much easier to do theses tasks in two separate threads (and probably have a third thread control the other two).

I would estimate the performance of the multithreaded version an a single proc machine to be similar to that of a single threaded version.

But the code for the multithreaded app would be much simpler because, unlike in the single threaded version, you wouldn't need to figure out how to interrupt the file process so you could read the serial port. The multithreaded design would be cleaner and more maintainable.

Plus the multi-threaded code would take advantage of additional processors when run on a multicore/multiproc machine.


So now with all this multicore stuff its going to be more important to multithread, but people with single CPUs will lose performance unless the program can detect this and change itself to use only 1 thread.Unless the work is extremely CPU intensive, as others have mentioned, you aren't going to see much of a performance loss. It's probably not worth the effort to special case it. Plus, unless it's a hard requirement, why code for older, soon to be obsolete machines?

TheCPUWizard
May 31st, 2007, 07:44 PM
Depending on the work, often it is easier to solve a problem using multiple threads than doing it all from a single thread.

To give an example, consider a program that needs to perform two independent tasks such as monitor a folder for incoming files and also retrieve data from a serial port. Say the incoming files need to be processed and stored in the db and the serial port data needs to be saved to a file. While you could do this in a single threaded app, it would be much easier to do theses tasks in two separate threads (and probably have a third thread control the other two).

I would estimate the performance of the multithreaded version an a single proc machine to be similar to that of a single threaded version.

But the code for the multithreaded app would be much simpler because, unlike in the single threaded version, you wouldn't need to figure out how to interrupt the file process so you could read the serial port. The multithreaded design would be cleaner and more maintainable.

Plus the multi-threaded code would take advantage of additional processors when run on a multicore/multiproc machine.

Unless the work is extremely CPU intensive, as others have mentioned, you aren't going to see much of a performance loss. It's probably not worth the effort to special case it. Plus, unless it's a hard requirement, why code for older, soon to be obsolete machines?


Of course for this specific example I/O completion ports might be an even better solution (and probably best when used in conjuction with threads). :D