Rotate a Bitmap at Any Angle Without GetPixel/SetPixel
Environment: Windows GDI / Win32 API
Reason For This Article
There is already an article on CodeGuru about rotating a bitmap at any angle. Unfortunately, it is very slow and produces some artifacts when rotating. The comments have focused on optimizing the floating point operations, but that is not the issue. The real issue is that GetPixel and SetPixel are abysmally slow. So, I wrote this program to demonstrate that it's not the fault of the floating point calculations.
Using GetDIBits
This code is much faster because it uses GetDIBits to get the 32-bit representation of the bitmap to rotate. All operations are done in local memory rather than in slow API calls such as GetPixel or even BitBlt. I use the 32-bit representation mainly for its ease of use. When working with other color depths, you have to add some padding bytes at the end of each scan line, which is not necessary in 32 bits. Moreover, it allows easier access to the memory.
pBGR MyGetDibBits(HDC hdcSrc, HBITMAP hBmpSrc, int nx, int ny)
{
BITMAPINFO bi;
BOOL bRes;
pBGR buf;
bi.bmiHeader.biSize = sizeof(bi.bmiHeader);
bi.bmiHeader.biWidth = nx;
bi.bmiHeader.biHeight = - ny;
bi.bmiHeader.biPlanes = 1;
bi.bmiHeader.biBitCount = 32;
bi.bmiHeader.biCompression = BI_RGB;
bi.bmiHeader.biSizeImage = nx * 4 * ny;
bi.bmiHeader.biClrUsed = 0;
bi.bmiHeader.biClrImportant = 0;
buf = (pBGR) malloc(nx * 4 * ny);
bRes = GetDIBits(hdcSrc, hBmpSrc, 0, ny, buf, &bi,
DIB_RGB_COLORS);
if (!bRes) {
free(buf);
buf = 0;
}
return buf;
}
RotateMemoryDC
This function does all the work. Apart from getting the source coordinate from a target coordinate, it is very straightforward. The formulas I use are these:
orgX = (cA * (((float) stepX + OfX) + CtX * (cA - 1)) + sA * (((float) stepY + OfY) + CtY * (sA - 1))) / cA*cA + sA*sA;orgY = CtY + (CtX - ((float) stepX + OfX)) * sA + cA *(((float) stepY + OfY) - CtY + (CtY - CtX) * sA);
cA is just cos(angle) and sA is sin(angle). CtX and CtY are the center of the source image and OfX and OfY are the offsets of the source image inside the target image. This is necessary because the target image has to be bigger than the source.
I actually make a point of using floating point operations in the inner loop to show that the performance does not depend on these. In the old days of DOS, when PCs were slow at floating point operations and graphics manipulation was fast because it consisted only of accessing a special part of the system memory, the bottleneck was indeed the floating point operations. With GDI, this is unfortunately not true anymore. Most of the time is taken up by getting the bitmaps and setting them.
Changing the calculations to fixed point math will only result in a marginal improvement in the overall speed.
General Notes
The program is based on the sample "Hello World" application generated by Visual C++ 6. I just added the new drawing code and a Timer so that the bitmap rotates continuously.
Downloads
Download demo project - 25 KbDownload source - 10 Kb

Comments
Smooth rotate
Posted by PaulO01 on 07/31/2009 01:24pmThe integer arithmatic makes jagged lines. If you want to rotate an image you will want the rotation to be smooth. If we think of the calculated orgX and orgY as pointing "into" a pixel then we can use the fractions to obtain additional information from the surrounding pixels: if ((orgX >= 0) && (orgY >= 0) && (orgX < SrcX-1) && (orgY < SrcY-1)) { // Inside the source bitmap -> copy the bits sBGR *psrc, *pdst; pdst= &(dst[stepY*DstX + DstX - stepX - 1]); psrc= &(src[iorgX + iorgY * SrcX]); float r,g,b; r = (psrc)->r * (1-(orgX-iorgX)) * (1-(orgY-iorgY)) + (psrc+1)->r * ( orgX-iorgX) * (1-(orgY-iorgY)) + (psrc+SrcX)->r * (1-(orgX-iorgX)) * ( orgY-iorgY) + (psrc+SrcX+1)->r * ( orgX-iorgX) * ( orgY-iorgY); g = (psrc)->g * (1-(orgX-iorgX)) * (1-(orgY-iorgY)) + (psrc+1)->g * ( orgX-iorgX) * (1-(orgY-iorgY)) + (psrc+SrcX)->g * (1-(orgX-iorgX)) * ( orgY-iorgY) + (psrc+SrcX+1)->g * ( orgX-iorgX) * ( orgY-iorgY); b = (psrc)->b * (1-(orgX-iorgX)) * (1-(orgY-iorgY)) + (psrc+1)->b * ( orgX-iorgX) * (1-(orgY-iorgY)) + (psrc+SrcX)->b * (1-(orgX-iorgX)) * ( orgY-iorgY) + (psrc+SrcX+1)->b * ( orgX-iorgX) * ( orgY-iorgY); pdst->r = (BYTE)(r+0.5); pdst->g = (BYTE)(g+0.5); pdst->b = (BYTE)(b+0.5); }ReplyWrong formulas to rotate
Posted by PaulO01 on 07/30/2009 02:50pm-
Reply
ReplySimpler correct forumlas...
Posted by avik on 01/02/2010 04:05amBug - Image Flipped Over Y-Axis
Posted by ThundrbltMN on 01/06/2009 03:58pmI noticed that the original image is horizontally flipped. The following line: dstLine[dstX - stepX - 1] = src[iorgX + iorgY * SrcX]; should be: dstLine[stepX + 1] = src[iorgX + iorgY * SrcX];
-
Replycorrection to the correction
Posted by avik on 01/02/2010 04:08amIt should be: dstLine[stepX] = src[iorgX+iorY*SrcX] but also the "for" loop should be changed to: for (stepY = dstY-1; stepY >= 0; stepY--)
ReplyVery cool article!
Posted by yura.kachanov on 05/16/2006 10:03pmmin4 and max4 functions can be as follow: float min4(float a, float b, float c, float d) { float min = a; if(min > b) min = b; if(min > c) min = c; if(min > d) min = d; return min; } It is not for speed but it is more clearly and shortly.ReplyBronken Link?
Posted by ksujeet83 on 12/15/2005 01:18amI am not able to download the files....it says file not found. Can someone at codeguru please look into it? Regards, Sujeet
ReplyA faster method to rotate the bitmap
Posted by soica_laurentiu on 07/07/2005 05:22amHow about this : I have this bitmap : a-----b | | | | | | c-----d I want to rotate it with 10 degree. i make an horizontal skew at 10 degree : a-----b \ \ ->\ \ \ \ --> c-----d i make a vertical skew at -10 degree : /\b / \ / \ a/ \d \ / \ / \ / c\/ and that's the bitmap rotated . we only have to calculate (worst case) the cos and sin once per every row and once per every column. I think is much faster. I will try it today. Good bye!ReplyMore performance improvent
Posted by Legacy on 10/17/2003 12:00amOriginally posted by: Ahmed
Although may not be directly related to the problem, but if we are looking for a performance improvement, there is a very critical performance penalty when we convert floats to int using compiler-default casts.
Google search reveals this text for the reason why default conversion is slow.
"
I've had a look at the code generated by a cast which looks something like
this:
fldcw -12(%ebp)
fistpl -16(%ebp)
movl -16(%ebp),%eax
fldcw -10(%ebp)
The first and last instruction in this group modifies the FPU control word
(specifically modifying the FPU rounding mode). It is this instruction which
causes the pain as an FPU pipeline flush is required each time it is executed.
Removing both instances of "fldcw" can result in significant execution speed
increases. The downside is a slight loss in accuracy. The maximum absolute
error between what is obtained with the cast and assembler code without the
"fldcw" instructions is 1 which is perfectly acceptable for audio and
video/graphics processing applications.
"
below is a simply routine that lets you cast from floats or doubles to int much more quickly at the cost of losing ansi conformance!
Try it in the RotateDIB... code where flaots are converted to int and you may see performance increases of more than 100% (i.e, rendering time 25 ms goes down to 11 ms or so!)
__inline long int
lrint (double flt)
{
int intgr;
_asm
{
fld flt
fistp intgr
} ;
return intgr ;
}
__inline long int
lrintf (float flt)
{
int intgr;
_asm
{
fld flt
fistp intgr
} ;
return intgr ;
}
-
ReplySlower
Posted by Oliver M. on 06/08/2005 07:42amI just made a test with a trivial DIB grayscale loop. The original loop used a static_cast(float) which was replaced by your version of inline lrintf(float).
The original version required an average of about 30ms while the version with your asm code required 50ms.
Compiler is MSVC 6.0 with all service releases. How could this be?
Replygood job~~~
Posted by Legacy on 09/15/2003 12:00amOriginally posted by: jscv
I am very appreciated for your program..
ReplyThanks.... Have a good program time...~
Nice job !!!
Posted by Legacy on 08/08/2003 12:00amOriginally posted by: niceguy
Very Good !!
ReplyAlgorithm is too complicated
Posted by Legacy on 05/09/2003 12:00amOriginally posted by: Sefi
Hi,
the algorithm is just too complitcated!!
Why dividing through cA*cA + sA*sA?
I hope you know that cA� + sA� = 1 so you dont need to divide at all!
If you solve the linear equations:
nx = x*cos - y*sin
ny = x*sin + y*cos
you come to the conclution that
x = nx * cos + ny * sin
y = ny * cos - nx * sin
its just as simple as that!
ReplyThis should speed up the algorithm very much :)
Loading, Please Wait ...