Rotate a Bitmap at Any Angle Without GetPixel/SetPixel

Environment: Windows GDI / Win32 API

Reason For This Article

There is already an article on CodeGuru about rotating a bitmap at any angle. Unfortunately, it is very slow and produces some artifacts when rotating. The comments have focused on optimizing the floating point operations, but that is not the issue. The real issue is that GetPixel and SetPixel are abysmally slow. So, I wrote this program to demonstrate that it's not the fault of the floating point calculations.

Using GetDIBits

This code is much faster because it uses GetDIBits to get the 32-bit representation of the bitmap to rotate. All operations are done in local memory rather than in slow API calls such as GetPixel or even BitBlt. I use the 32-bit representation mainly for its ease of use. When working with other color depths, you have to add some padding bytes at the end of each scan line, which is not necessary in 32 bits. Moreover, it allows easier access to the memory.

pBGR MyGetDibBits(HDC hdcSrc, HBITMAP hBmpSrc, int nx, int ny)
{
  BITMAPINFO bi;
  BOOL bRes;
  pBGR buf;

  bi.bmiHeader.biSize = sizeof(bi.bmiHeader);
  bi.bmiHeader.biWidth = nx;
  bi.bmiHeader.biHeight = - ny;
  bi.bmiHeader.biPlanes = 1;
  bi.bmiHeader.biBitCount = 32;
  bi.bmiHeader.biCompression = BI_RGB;
  bi.bmiHeader.biSizeImage = nx * 4 * ny;
  bi.bmiHeader.biClrUsed = 0;
  bi.bmiHeader.biClrImportant = 0;

  buf = (pBGR) malloc(nx * 4 * ny);
  bRes = GetDIBits(hdcSrc, hBmpSrc, 0, ny, buf, &bi,
                   DIB_RGB_COLORS);
  if (!bRes) {
    free(buf);
    buf = 0;
  }
  return buf;
}

RotateMemoryDC

This function does all the work. Apart from getting the source coordinate from a target coordinate, it is very straightforward. The formulas I use are these:

orgX = (cA * (((float) stepX + OfX) + CtX * (cA - 1)) + sA * (((float) stepY + OfY) + CtY * (sA - 1))) / cA*cA + sA*sA;

orgY = CtY + (CtX - ((float) stepX + OfX)) * sA + cA *(((float) stepY + OfY) - CtY + (CtY - CtX) * sA);

cA is just cos(angle) and sA is sin(angle). CtX and CtY are the center of the source image and OfX and OfY are the offsets of the source image inside the target image. This is necessary because the target image has to be bigger than the source.

I actually make a point of using floating point operations in the inner loop to show that the performance does not depend on these. In the old days of DOS, when PCs were slow at floating point operations and graphics manipulation was fast because it consisted only of accessing a special part of the system memory, the bottleneck was indeed the floating point operations. With GDI, this is unfortunately not true anymore. Most of the time is taken up by getting the bitmaps and setting them.

Changing the calculations to fixed point math will only result in a marginal improvement in the overall speed.

General Notes

The program is based on the sample "Hello World" application generated by Visual C++ 6. I just added the new drawing code and a Timer so that the bitmap rotates continuously.

Downloads

Download demo project - 25 Kb
Download source - 10 Kb


Comments

  • Smooth rotate

    Posted by PaulO01 on 07/31/2009 01:24pm

    The integer arithmatic makes jagged lines.
    If you want to rotate an image you will want the rotation
    to be smooth. If we think of the calculated orgX and orgY
    as pointing "into" a pixel then we can use the fractions to
    obtain additional information from the surrounding pixels:
    
    if ((orgX >= 0) && (orgY >= 0) && (orgX < SrcX-1) && (orgY < SrcY-1)) {
      // Inside the source bitmap -> copy the bits
      sBGR *psrc, *pdst;
      pdst=  &(dst[stepY*DstX + DstX - stepX - 1]);
      psrc=  &(src[iorgX + iorgY * SrcX]);
    
      float r,g,b;
      r = (psrc)->r        * (1-(orgX-iorgX)) * (1-(orgY-iorgY))
        + (psrc+1)->r      * (   orgX-iorgX)  * (1-(orgY-iorgY))
        + (psrc+SrcX)->r   * (1-(orgX-iorgX)) * (   orgY-iorgY)
        + (psrc+SrcX+1)->r * (   orgX-iorgX)  * (   orgY-iorgY);
    
      g = (psrc)->g        * (1-(orgX-iorgX)) * (1-(orgY-iorgY))
        + (psrc+1)->g      * (   orgX-iorgX)  * (1-(orgY-iorgY))
        + (psrc+SrcX)->g   * (1-(orgX-iorgX)) * (   orgY-iorgY)
        + (psrc+SrcX+1)->g * (   orgX-iorgX)  * (   orgY-iorgY);
    
      b = (psrc)->b        * (1-(orgX-iorgX)) * (1-(orgY-iorgY))
        + (psrc+1)->b      * (   orgX-iorgX)  * (1-(orgY-iorgY))
        + (psrc+SrcX)->b   * (1-(orgX-iorgX)) * (   orgY-iorgY)
        + (psrc+SrcX+1)->b * (   orgX-iorgX)  * (   orgY-iorgY);
    
      pdst->r = (BYTE)(r+0.5);
      pdst->g = (BYTE)(g+0.5);
      pdst->b = (BYTE)(b+0.5);
    }

    Reply
  • Wrong formulas to rotate

    Posted by PaulO01 on 07/30/2009 02:50pm

    The correct formulas are:
    // Calculate the source coordinate
    orgX= ( cA*(stepX-CtX2) + sA*(stepY-CtY2)) / divisor  + CtX;
    orgY= (-sA*(stepX-CtX2) + cA*(stepY-CtY2)) / divisor  + CtY;
    
    With CtX2 and CtY2 the centre of the destination:
    CtX2= ((float) SrcX) / 2 - OfX;
    CtY2= ((float) SrcY) / 2 - OfY;

    • Simpler correct forumlas...

      Posted by avik on 01/02/2010 04:05am

      orgX = CtX + cA * ((float)stepX + OfX - CtX) + 
      sA * ((float)stepY + OfY - CtY);
      orgY = CtY - sA * ((float)stepX + OfX - CtX) + cA * 
      ((float) stepY + OfY - CtY);

      Reply
    Reply
  • Bug - Image Flipped Over Y-Axis

    Posted by ThundrbltMN on 01/06/2009 03:58pm

    I noticed that the original image is horizontally flipped. The following line: dstLine[dstX - stepX - 1] = src[iorgX + iorgY * SrcX]; should be: dstLine[stepX + 1] = src[iorgX + iorgY * SrcX];

    • correction to the correction

      Posted by avik on 01/02/2010 04:08am

      It should be: dstLine[stepX] = src[iorgX+iorY*SrcX] but also the "for" loop should be changed to: for (stepY = dstY-1; stepY >= 0; stepY--)

      Reply
    Reply
  • Very cool article!

    Posted by yura.kachanov on 05/16/2006 10:03pm

    min4 and max4 functions can be as follow:
    float min4(float a, float b, float c, float d)
    {
       float min = a;
       if(min > b) min = b;
       if(min > c) min = c;
       if(min > d) min = d;
       return min;
    }
    
    It is not for speed but it is more clearly and shortly.

    Reply
  • Bronken Link?

    Posted by ksujeet83 on 12/15/2005 01:18am

    I am not able to download the files....it says file not found. Can someone at codeguru please look into it? Regards, Sujeet

    Reply
  • A faster method to rotate the bitmap

    Posted by soica_laurentiu on 07/07/2005 05:22am

    How about this : 
    I have this bitmap :
    
    a-----b
    |     |
    |     |
    |     |
    c-----d
    
    I want to rotate it with 10 degree.
    
    i make an horizontal skew at 10 degree : 
    a-----b
     \     \
    ->\     \
       \     \
    --> c-----d
    
    i make a vertical skew at -10 degree : 
    
        /\b
       /  \
      /    \
    a/      \d
     \      /
      \    /
       \  /
       c\/
    and that's the bitmap rotated .
    
    we only have to calculate (worst case) the cos and sin once per every row and once per every column.
    I think is much faster.
    I will try it today.
    Good bye!

    Reply
  • More performance improvent

    Posted by Legacy on 10/17/2003 12:00am

    Originally posted by: Ahmed

    Although may not be directly related to the problem, but if we are looking for a performance improvement, there is a very critical performance penalty when we convert floats to int using compiler-default casts.


    Google search reveals this text for the reason why default conversion is slow.

    "
    I've had a look at the code generated by a cast which looks something like
    this:

    fldcw -12(%ebp)
    fistpl -16(%ebp)
    movl -16(%ebp),%eax
    fldcw -10(%ebp)

    The first and last instruction in this group modifies the FPU control word
    (specifically modifying the FPU rounding mode). It is this instruction which
    causes the pain as an FPU pipeline flush is required each time it is executed.

    Removing both instances of "fldcw" can result in significant execution speed
    increases. The downside is a slight loss in accuracy. The maximum absolute
    error between what is obtained with the cast and assembler code without the
    "fldcw" instructions is 1 which is perfectly acceptable for audio and
    video/graphics processing applications.

    "


    below is a simply routine that lets you cast from floats or doubles to int much more quickly at the cost of losing ansi conformance!

    Try it in the RotateDIB... code where flaots are converted to int and you may see performance increases of more than 100% (i.e, rendering time 25 ms goes down to 11 ms or so!)


    __inline long int
    lrint (double flt)
    {
    int intgr;

    _asm
    {
    fld flt
    fistp intgr
    } ;

    return intgr ;
    }


    __inline long int
    lrintf (float flt)
    {
    int intgr;

    _asm
    {
    fld flt
    fistp intgr
    } ;

    return intgr ;
    }


    • Slower

      Posted by Oliver M. on 06/08/2005 07:42am

      I just made a test with a trivial DIB grayscale loop. The original loop used a static_cast(float) which was replaced by your version of inline lrintf(float). The original version required an average of about 30ms while the version with your asm code required 50ms. Compiler is MSVC 6.0 with all service releases. How could this be?

      Reply
    Reply
  • good job~~~

    Posted by Legacy on 09/15/2003 12:00am

    Originally posted by: jscv

    I am very appreciated for your program..
    Thanks.... Have a good program time...~

    Reply
  • Nice job !!!

    Posted by Legacy on 08/08/2003 12:00am

    Originally posted by: niceguy

    Very Good !!

    Reply
  • Algorithm is too complicated

    Posted by Legacy on 05/09/2003 12:00am

    Originally posted by: Sefi

    Hi,
    the algorithm is just too complitcated!!
    Why dividing through cA*cA + sA*sA?
    I hope you know that cA� + sA� = 1 so you dont need to divide at all!

    If you solve the linear equations:
    nx = x*cos - y*sin
    ny = x*sin + y*cos

    you come to the conclution that

    x = nx * cos + ny * sin
    y = ny * cos - nx * sin

    its just as simple as that!
    This should speed up the algorithm very much :)

    Reply
  • Loading, Please Wait ...

Leave a Comment
  • Your email address will not be published. All fields are required.

Top White Papers and Webcasts

  • Live Event Date: October 29, 2014 @ 11:00 a.m. ET / 8:00 a.m. PT Are you interested in building a cognitive application using the power of IBM Watson? Need a platform that provides speed and ease for rapidly deploying this application? Join Chris Madison, Watson Solution Architect, as he walks through the process of building a Watson powered application on IBM Bluemix. Chris will talk about the new Watson Services just released on IBM bluemix, but more importantly he will do a step by step cognitive …

  • Managing your company's financials is the backbone of your business and is vital to the long-term health and viability of your company. To continue applying the necessary financial rigor to support rapid growth, the accounting department needs the right tools to most efficiently do their job. Read this white paper to understand the 10 essentials of a complete financial management system and how the right solution can help you keep up with the rapidly changing business world.

Most Popular Programming Stories

More for Developers

Latest Developer Headlines

RSS Feeds