• #### Smooth rotate

Posted by PaulO01 on 07/31/2009 01:24pm

```The integer arithmatic makes jagged lines.
If you want to rotate an image you will want the rotation
to be smooth. If we think of the calculated orgX and orgY
as pointing "into" a pixel then we can use the fractions to
obtain additional information from the surrounding pixels:

if ((orgX >= 0) && (orgY >= 0) && (orgX < SrcX-1) && (orgY < SrcY-1)) {
// Inside the source bitmap -> copy the bits
sBGR *psrc, *pdst;
pdst=  &(dst[stepY*DstX + DstX - stepX - 1]);
psrc=  &(src[iorgX + iorgY * SrcX]);

float r,g,b;
r = (psrc)->r        * (1-(orgX-iorgX)) * (1-(orgY-iorgY))
+ (psrc+1)->r      * (   orgX-iorgX)  * (1-(orgY-iorgY))
+ (psrc+SrcX)->r   * (1-(orgX-iorgX)) * (   orgY-iorgY)
+ (psrc+SrcX+1)->r * (   orgX-iorgX)  * (   orgY-iorgY);

g = (psrc)->g        * (1-(orgX-iorgX)) * (1-(orgY-iorgY))
+ (psrc+1)->g      * (   orgX-iorgX)  * (1-(orgY-iorgY))
+ (psrc+SrcX)->g   * (1-(orgX-iorgX)) * (   orgY-iorgY)
+ (psrc+SrcX+1)->g * (   orgX-iorgX)  * (   orgY-iorgY);

b = (psrc)->b        * (1-(orgX-iorgX)) * (1-(orgY-iorgY))
+ (psrc+1)->b      * (   orgX-iorgX)  * (1-(orgY-iorgY))
+ (psrc+SrcX)->b   * (1-(orgX-iorgX)) * (   orgY-iorgY)
+ (psrc+SrcX+1)->b * (   orgX-iorgX)  * (   orgY-iorgY);

pdst->r = (BYTE)(r+0.5);
pdst->g = (BYTE)(g+0.5);
pdst->b = (BYTE)(b+0.5);
}```

• #### Wrong formulas to rotate

Posted by PaulO01 on 07/30/2009 02:50pm

```The correct formulas are:
// Calculate the source coordinate
orgX= ( cA*(stepX-CtX2) + sA*(stepY-CtY2)) / divisor  + CtX;
orgY= (-sA*(stepX-CtX2) + cA*(stepY-CtY2)) / divisor  + CtY;

With CtX2 and CtY2 the centre of the destination:
CtX2= ((float) SrcX) / 2 - OfX;
CtY2= ((float) SrcY) / 2 - OfY;```

• #### Simpler correct forumlas...

Posted by avik on 01/02/2010 04:05am

```orgX = CtX + cA * ((float)stepX + OfX - CtX) +
sA * ((float)stepY + OfY - CtY);
orgY = CtY - sA * ((float)stepX + OfX - CtX) + cA *
((float) stepY + OfY - CtY);```

• #### Bug - Image Flipped Over Y-Axis

Posted by ThundrbltMN on 01/06/2009 03:58pm

I noticed that the original image is horizontally flipped. The following line: dstLine[dstX - stepX - 1] = src[iorgX + iorgY * SrcX]; should be: dstLine[stepX + 1] = src[iorgX + iorgY * SrcX];

• #### correction to the correction

Posted by avik on 01/02/2010 04:08am

It should be: dstLine[stepX] = src[iorgX+iorY*SrcX] but also the "for" loop should be changed to: for (stepY = dstY-1; stepY >= 0; stepY--)

• #### Very cool article!

Posted by yura.kachanov on 05/16/2006 10:03pm

```min4 and max4 functions can be as follow:
float min4(float a, float b, float c, float d)
{
float min = a;
if(min > b) min = b;
if(min > c) min = c;
if(min > d) min = d;
return min;
}

It is not for speed but it is more clearly and shortly.```

Posted by ksujeet83 on 12/15/2005 01:18am

• #### A faster method to rotate the bitmap

Posted by soica_laurentiu on 07/07/2005 05:22am

```How about this :
I have this bitmap :

a-----b
|     |
|     |
|     |
c-----d

I want to rotate it with 10 degree.

i make an horizontal skew at 10 degree :
a-----b
\     \
->\     \
\     \
--> c-----d

i make a vertical skew at -10 degree :

/\b
/  \
/    \
a/      \d
\      /
\    /
\  /
c\/
and that's the bitmap rotated .

we only have to calculate (worst case) the cos and sin once per every row and once per every column.
I think is much faster.
I will try it today.
Good bye!```

• #### More performance improvent

Posted by Legacy on 10/17/2003 12:00am

Originally posted by: Ahmed

Although may not be directly related to the problem, but if we are looking for a performance improvement, there is a very critical performance penalty when we convert floats to int using compiler-default casts.

Google search reveals this text for the reason why default conversion is slow.

"
I've had a look at the code generated by a cast which looks something like
this:

fldcw -12(%ebp)
fistpl -16(%ebp)
movl -16(%ebp),%eax
fldcw -10(%ebp)

The first and last instruction in this group modifies the FPU control word
(specifically modifying the FPU rounding mode). It is this instruction which
causes the pain as an FPU pipeline flush is required each time it is executed.

Removing both instances of "fldcw" can result in significant execution speed
increases. The downside is a slight loss in accuracy. The maximum absolute
error between what is obtained with the cast and assembler code without the
"fldcw" instructions is 1 which is perfectly acceptable for audio and
video/graphics processing applications.

"

below is a simply routine that lets you cast from floats or doubles to int much more quickly at the cost of losing ansi conformance!

Try it in the RotateDIB... code where flaots are converted to int and you may see performance increases of more than 100% (i.e, rendering time 25 ms goes down to 11 ms or so!)

__inline long int
lrint (double flt)
{
int intgr;

_asm
{
fld flt
fistp intgr
} ;

return intgr ;
}

__inline long int
lrintf (float flt)
{
int intgr;

_asm
{
fld flt
fistp intgr
} ;

return intgr ;
}

• #### Slower

Posted by Oliver M. on 06/08/2005 07:42am

I just made a test with a trivial DIB grayscale loop. The original loop used a static_cast(float) which was replaced by your version of inline lrintf(float). The original version required an average of about 30ms while the version with your asm code required 50ms. Compiler is MSVC 6.0 with all service releases. How could this be?

• #### good job~~~

Posted by Legacy on 09/15/2003 12:00am

Originally posted by: jscv

I am very appreciated for your program..
Thanks.... Have a good program time...~

• #### Nice job !!!

Posted by Legacy on 08/08/2003 12:00am

Originally posted by: niceguy

Very Good !!

• #### Algorithm is too complicated

Posted by Legacy on 05/09/2003 12:00am

Originally posted by: Sefi

Hi,
the algorithm is just too complitcated!!
Why dividing through cA*cA + sA*sA?
I hope you know that cA� + sA� = 1 so you dont need to divide at all!

If you solve the linear equations:
nx = x*cos - y*sin
ny = x*sin + y*cos

you come to the conclution that

x = nx * cos + ny * sin
y = ny * cos - nx * sin

its just as simple as that!
This should speed up the algorithm very much :)