Multithreaded app slower than single-threaded on Linux - normal perf. on Windows


sakurazuka
November 6th, 2009, 05:29 PM
Hello there!
I have this little app of mine that does some heavy calculations, so I decided to split them into threads (the long triple loops are easily parallelized). On my 2-core notebook PC running Windows the result is as expected: with 2 threads the run time of the app is cut in half. The same app on the same PC under Linux (Ubuntu), however, crashes with a segmentation fault. On another Unix machine (I don't know exactly which flavor) with 8 Opteron cores, the fastest results come with only 1 thread, and it gets slower as the number of threads increases. I really have no idea why this happens, so I am asking for guidance in this matter :D.
Oh here are my compiler settings:
Unix:
g++ -fopenmp -c -Wall -O2 main.cpp custom.h
g++ -fopenmp -O2 main.o -o qdots -lgsl -lgslcblas

Under Windows I enable OpenMP support and /O2 optimization in the project properties in MS VS. I suppose giving -fopenmp and -O2 on both lines is redundant under Unix and it would be sufficient to state those options once, but I guess that can't harm anything, or can it?
Anyway - thanks!
Sakurazuka
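On the flags question: with GCC, -fopenmp is used at the compile step (so the pragmas are honored) and at the link step (so the OpenMP runtime library gets linked), so repeating it is not redundant. A quick way to confirm that a build really has OpenMP switched on is to check the standard _OPENMP macro. The sketch below is only an illustration; the function name openmp_status is made up for this example.

```cpp
#include <sstream>
#include <string>
#ifdef _OPENMP
#include <omp.h>   // only available/needed when OpenMP is enabled
#endif

// Hypothetical helper: reports whether this file was compiled with OpenMP
// enabled and, if so, the upper bound on threads for a parallel region.
std::string openmp_status()
{
    std::ostringstream out;
#ifdef _OPENMP
    // _OPENMP is defined by the compiler when OpenMP support is active.
    out << "OpenMP enabled (version macro " << _OPENMP
        << "), max threads: " << omp_get_max_threads();
#else
    // Without OpenMP the pragmas are ignored and the loop runs on one thread.
    out << "OpenMP disabled: pragmas are ignored";
#endif
    return out.str();
}
```

Printing this string at program start on each machine shows whether the thread count you think you set (for example through the OMP_NUM_THREADS environment variable) is the one the runtime actually sees.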

sakurazuka
November 7th, 2009, 11:13 AM
Ok - so maybe I'll add some code to my previous post. This is the only part that's parallel:
#pragma omp parallel for private(i) shared(g_iQdots, mu, y) schedule(static)
for(i = 0; i < g_iQdots; i++){
    iQ = i*g_iQdots;
    for(int j = 0; j < g_iQdots; j++){
        f[iQ+j] = (g_qdot[j].energy - g_qdot[i].energy)*y[iQ+j + g_iQQ]; //Re(x_ab)
        f[iQ+j + g_iQQ] = (g_qdot[i].energy - g_qdot[j].energy)*y[iQ+j]; //Im(x_ab)
        for(int k = 0; k < g_iQdots; k++){
            kQ = k*g_iQdots;
            f[iQ+j] -= mu.d_omega[k][i]*y[kQ+j+g_iQQ]; //f[i*g_iQdots+j] -= mu.d_omega[k][i]*y[k*g_iQdots+j+g_iQQ]; <-- was like that before
            f[iQ+j] += mu.d_omega[j][k]*y[iQ+k+g_iQQ];
            f[iQ+j] -= 0.5*(mu.d_gamma[k][i]*y[kQ+j]+mu.d_gamma[j][k]*y[iQ+k]);
            f[iQ+j+g_iQQ] += mu.d_omega[k][i]*y[kQ+j];
            f[iQ+j+g_iQQ] -= mu.d_omega[j][k]*y[iQ+k];
            f[iQ+j+g_iQQ] -= 0.5*(mu.d_gamma[k][i]*y[kQ+j+g_iQQ]+mu.d_gamma[j][k]*y[iQ+k+g_iQQ]);
        }
    }
}
- maybe there's something wrong with that? I just wanted to split the iterations over 'i' among different threads - am I doing it right? Thanks
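One thing that stands out in the loop above: iQ and kQ are assigned inside the parallel region but are not listed in the private clause. If they are declared outside the loop (an assumption, since the declarations are not shown), OpenMP treats them as shared, every thread overwrites them, and that could give wrong results and make the threads fight over the same memory. A minimal sketch of the usual remedy is to declare the index helpers inside the loop body so each thread gets its own copy. The Qdot and Mu types and the function name rhs are hypothetical stand-ins for the poster's g_qdot and mu, not the real definitions.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for the poster's data structures.
struct Qdot { double energy; };
struct Mu {
    std::vector<std::vector<double> > d_omega;
    std::vector<std::vector<double> > d_gamma;
};

// Fills f from y. Both hold 2*n*n values: the first n*n entries are the
// real parts, the next n*n the imaginary parts (nn plays the role of g_iQQ).
void rhs(const std::vector<Qdot>& qdot, const Mu& mu,
         const std::vector<double>& y, std::vector<double>& f)
{
    const int n = static_cast<int>(qdot.size());
    const int nn = n * n;
    assert(y.size() == static_cast<std::size_t>(2 * nn));
    assert(f.size() == static_cast<std::size_t>(2 * nn));

    // The loop index i is private automatically. iQ, kQ, re and im are
    // declared inside the loop body, so every thread has its own copies.
    #pragma omp parallel for schedule(static)
    for (int i = 0; i < n; i++) {
        const int iQ = i * n;
        for (int j = 0; j < n; j++) {
            // Accumulate in locals and write each f element once at the end.
            double re = (qdot[j].energy - qdot[i].energy) * y[iQ + j + nn];
            double im = (qdot[i].energy - qdot[j].energy) * y[iQ + j];
            for (int k = 0; k < n; k++) {
                const int kQ = k * n;
                re -= mu.d_omega[k][i] * y[kQ + j + nn];
                re += mu.d_omega[j][k] * y[iQ + k + nn];
                re -= 0.5 * (mu.d_gamma[k][i] * y[kQ + j] + mu.d_gamma[j][k] * y[iQ + k]);
                im += mu.d_omega[k][i] * y[kQ + j];
                im -= mu.d_omega[j][k] * y[iQ + k];
                im -= 0.5 * (mu.d_gamma[k][i] * y[kQ + j + nn] + mu.d_gamma[j][k] * y[iQ + k + nn]);
            }
            f[iQ + j] = re;        // Re(x_ab)
            f[iQ + j + nn] = im;   // Im(x_ab)
        }
    }
}
```

Each value of i writes only its own row of f, so the threads never write to the same element. Keeping the original globals and writing private(i, iQ, kQ) in the pragma should have the same effect on the sharing problem.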

JVene
November 11th, 2009, 12:48 PM
It's been some time since I've worked on this problem, but I'm vaguely familiar with it. Sorry I can't provide a solution, but I may be able to help by pointing you toward the problem.

I recall this being mentioned before, and my own work demonstrated this effect: on Linux, threads will sometimes run one after another where for loops (perhaps other loops too) drive an algorithm.

Not all implementations do this, but I have no data for telling which is which. My own targets didn't focus on Linux, so I never delved further to find the solution. It doesn't happen on Windows.

A simple test shows the problem. As an example, on a quad core (it's more convincing, but this works on single CPU machines too), create 4 threads (or more) that each use a for loop to count up to, say, several million and print every nth value (every 10,000th or something similar), including a value indicating which thread is printing, as in "(1) 10,000"...

In Windows you'd expect to see the results intermixed, as in:

(1) 10,000
(2) 10,000
(4) 10,000
(1) 20,000
(3) 10,000
.....

Something "like" that....

In Linux, I'd always see....

(1) 10,000
(1) 20,000
(1) 30,000
.....
(1) 10,000,000
(3) 10,000
(3) 20,000
(3) 30,000
...

And so on...in other words, the threads seemed to operate sequentially, not in parallel.
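For anyone who wants to try the counting test described above, here is a rough sketch of it. It uses OpenMP to start the workers, which is an assumption made to match the rest of the thread (the original test may have used another threading API), and the names run_counters and count_switches are made up for this example. Instead of printing, each worker appends its records to a shared log so the amount of interleaving can be counted afterwards. Note that a loop of only a few million increments may finish within milliseconds, possibly before the next thread has even started, so a sequential-looking log from a short run does not by itself prove the threads were serialized.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Each worker counts from 1 to limit and records every step-th value
// together with its worker id. Records are returned in logging order.
std::vector<std::pair<int, long> > run_counters(int workers, long limit, long step)
{
    std::vector<std::pair<int, long> > log;
    // One loop iteration per worker; schedule(static, 1) hands the
    // iterations out round-robin to the available threads.
    #pragma omp parallel for schedule(static, 1) num_threads(workers)
    for (int w = 1; w <= workers; w++) {
        for (long c = 1; c <= limit; c++) {
            if (c % step == 0) {
                // Only one thread at a time may append to the shared log.
                #pragma omp critical
                log.push_back(std::make_pair(w, c));
            }
        }
    }
    return log;
}

// Counts how often two consecutive records come from different workers.
// If every worker's records are contiguous (the sequential pattern),
// the result is workers - 1; interleaved runs give a larger number.
int count_switches(const std::vector<std::pair<int, long> >& log)
{
    int switches = 0;
    for (std::size_t i = 1; i < log.size(); i++) {
        if (log[i].first != log[i - 1].first) {
            switches++;
        }
    }
    return switches;
}
```

Calling run_counters(4, 10000000, 10000) and then count_switches on the result gives a single number to compare between Windows and Linux. Raising the limit makes the test less sensitive to thread start-up time.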

I sensed there is a way to specify the thread parameters to correct for this effect, but I don't know if that's a compilation parameter or a launch parameter, or an OS parameter. The only descriptive explanation I ever got on this question was a vague indication that this behavior was a kind of optimization concept for Linux, but I never understood more about it - never bothered.

It would be helpful to the community if you let us know, should you ever find the answer to this issue. I wish I had more to offer....