Help advance science by making this code faster


diehardii
November 12th, 2007, 07:01 PM
Hi everyone,

I'm a chemist who is writing code that is used to calculate the reflectivity of x-rays from films at the air-water interface. This has uses in fields such as materials science, nanoscience, and biology. I personally use it to study a peptide implicated in Alzheimer's disease. Now, this calculation is relatively fast, until you realize it has to be performed roughly 10 million times. I have threaded it with OpenMP, and I use the Intel compiler (which is ~2x faster than MSVC), but I'm not a professional. If anyone is looking for an interesting project and is interested in contributing to science (by making the calculation run faster) I would greatly appreciate it.

I've attached a console program which roughly benchmarks the calculation. The tunable areas are in mycomplex.h and multilayer.cpp. The vast majority of the calculation takes place in myrf (or transparentmyrf), and mkdensity (or mkdensitytrans).

It can be downloaded from here:

http://diehard2.googlepages.com/reflcode

Even a 10% speed increase would be enormously helpful. I've identified all of the places I can make the variables floats without skewing the calculation when the numbers are very small, and I've moved all calculations that are repetitive outside of major loops (I think). However, I have no concept of cache misses, and things of that nature. Also, since this really isn't my field, I don't know what tricks people use to speed up calculations.

If you can improve the speed, I will be more than happy to acknowledge your contribution to the algorithm. Thanks for any help.


~ Steve

Bluefox815
September 14th, 2008, 03:12 PM
If you want to optimize, a good choice would probably be the book "The Art of Assembly Language". It teaches you how to write optimized assembly code. I know you are working in C++, but the concepts in the book (such as cache misses) should help you in any language (I think Java might be an exception, as it uses its own runtime).

I was able to find an HTML copy of the book on the internet, but I lost the site's address.

TIP:

Decreasing the amount of code between two uses of the same variable, or between two calls to the same function, increases the chance that the variable or function is still in the CPU's cache, which makes access to it quicker.

Example (I don't expect anyone to actually have code like this; anyone could work out x + y in their head):


int x, y, z, a, b, c;

void example() {
    x = 5;
    y = 5;
    a = 4;
    b = 6;
    c = 12;
    z = 4;
    x = y + x;
}

// NOTE: I may be incorrect. This works perfectly in assembly, where you have full control over the instructions,
// but I cannot tell how the C++ compiler turns this code into CPU instructions, so I may be entirely wrong. Maybe you can try it and clock your program to test it?
// If it works, then you have a faster calculator.
// ALSO, hand-tuned assembly can be faster than C++. The two can be combined (inline assembly, or linking in separately assembled object files), so it might be something to look into.

// This function (example2()) should be faster, as the references to x and y are closer together.
// In example(), the other references (z, a, etc.) cause their values to be placed into the cache and possibly evict x and y,
// requiring the CPU to fetch their values all the way from memory instead of from the much faster cache.
// You can probably learn a lot more from the book I mentioned.

void example2() {
    x = 5;
    y = 5;
    x = y + x;
    a = 4;
    b = 6;
    c = 12;
    z = 4;
}
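Whether the reordering in example2() is measurable is something only a timing run can settle, since the compiler is free to rearrange independent assignments. A case where access order is widely known to matter is walking a large array, so here is a minimal sketch of that. The function names (sum_row_major, sum_col_major, time_ms) are hypothetical and are not from the attached program.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Sum a rows x cols matrix stored row by row in one contiguous vector,
// visiting the elements in the same order they sit in memory.
double sum_row_major(const std::vector<double>& m, std::size_t rows, std::size_t cols) {
    assert(m.size() == rows * cols);
    double total = 0.0;
    for (std::size_t r = 0; r < rows; ++r)
        for (std::size_t c = 0; c < cols; ++c)
            total += m[r * cols + c];
    return total;
}

// Same sum, but walking down each column, so consecutive accesses
// are `cols` elements apart in memory.
double sum_col_major(const std::vector<double>& m, std::size_t rows, std::size_t cols) {
    assert(m.size() == rows * cols);
    double total = 0.0;
    for (std::size_t c = 0; c < cols; ++c)
        for (std::size_t r = 0; r < rows; ++r)
            total += m[r * cols + c];
    return total;
}

// Hypothetical helper: wall-clock time of one call to f, in milliseconds.
template <typename F>
double time_ms(F&& f) {
    const auto start = std::chrono::steady_clock::now();
    f();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

To try it, fill a vector of a few thousand by a few thousand doubles, time both functions with time_ms, and store each returned sum in a variable you print afterwards so the compiler cannot drop the loops. On most machines I would expect the row-by-row version to be quicker for large matrices, but the actual numbers depend on the CPU, compiler, and optimization flags.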



If my view of cache misses is correct, you might do this to your code (my comments have asterisks to distinguish them from yours):

int _tmain(int argc, _TCHAR* argv[])
{
    //Setup parameters that affect the reflectivity calc speed
    BOOL m_busesurfabs = FALSE;
    double m_dresolution = 5;
    double m_dtotallength = 95;
    double m_dleftoffset = 35;
    double m_dlayerlength = 25;
    int m_iparratlayers = 60;
    // int totalnumberofcalc = 2000; // this statement is moved ***

    //Setup the reflectivity calc
    multilayer ml0;

    // break in code ***

    //Warm up
    ml0.mkdensity(&genome);
    ml0.myrf();

    t_on = clock();

    int totalnumberofcalc = 2000; // statement moved to here ***

    // *** note: I checked to make sure totalnumberofcalc was not referenced above new initialization point

    for (int i = 0; i < totalnumberofcalc; i++)
    {
        ml0.mkdensity(&genome);
        ml0.myrf();
    }

    // ... code continues ***
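
To check whether a change like this actually helps, the warm-up and timed loop above can be wrapped in a small helper so the same measurement can be repeated before and after each tweak. This is only a sketch: BenchResult and run_benchmark are hypothetical names, and the `work` callable stands in for the mkdensity() + myrf() pair rather than reproducing them.

```cpp
#include <cassert>
#include <ctime>

// Result of one benchmark run. All names here are hypothetical.
struct BenchResult {
    double total_seconds;     // time for all timed calls
    double seconds_per_call;  // average over the timed calls
    double checksum;          // sum of returned values, so the work has an observable result
};

// Calls work(0) once as an untimed warm-up, then calls work(i) `calls`
// times under the clock. `work` must take an int and return a double.
template <typename Work>
BenchResult run_benchmark(Work work, int calls) {
    assert(calls > 0);
    work(0);  // warm-up, not timed

    const std::clock_t t_on = std::clock();
    double checksum = 0.0;
    for (int i = 0; i < calls; ++i) {
        checksum += work(i);
    }
    const std::clock_t t_off = std::clock();

    const double total = static_cast<double>(t_off - t_on) / CLOCKS_PER_SEC;
    return BenchResult{total, total / calls, checksum};
}
```

One caveat: the C standard defines clock() as processor time, and on many platforms that adds up the time spent in every thread, so with OpenMP enabled the figure can look larger than the time you actually wait. If that turns out to be the case on your system, a wall-clock timer such as std::chrono::steady_clock may give a more meaningful number.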


I hope all this helps you. If it doesn't help you directly, I hope I gave you a few ideas.