Super-Fast Two Pass Bi-Linear Scaling

Environment: VC6 SP4 + Processor Pack

I loved Eran Yariv's two pass scaling code. However, it wasn't quite fast enough for my needs even after implementing the posted corrections.

I bit the bullet and taught myself inline assembler to speed things up. This optimized code is between about 1.5 and 12 times as fast as the original C++ source depending on what CPU you are using and the bitmap sizes involved! The temporary bitmap was also eliminated, significantly reducing memory overhead for large bitmap scaling operations.

I also wrote MMX/SSE enabled versions of the scaling algorithms and achieved an additional 25% improvement for P3 users (although I don't have a good CPUID function, so you have to enable these functions yourself if you want to use them).
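For readers who want to switch the MMX/SSE paths on automatically, here is a minimal sketch of a feature check. It is not part of the posted source. It assumes a compiler newer than VC6 that provides the `__cpuid` intrinsic (MSVC) or `__get_cpuid` (GCC/Clang), and the names `CpuFeatures`, `DecodeLeaf1Edx` and `DetectCpuFeatures` are hypothetical. It relies only on the documented CPUID leaf 1 EDX flags (bit 23 = MMX, bit 25 = SSE) and does not check whether the operating system saves SSE state.

```cpp
#include <cstdint>

#if defined(_MSC_VER) && (defined(_M_IX86) || defined(_M_X64))
#include <intrin.h>   // __cpuid (not available in VC6)
#define HAVE_X86_CPUID 1
#elif (defined(__GNUC__) || defined(__clang__)) && \
      (defined(__i386__) || defined(__x86_64__))
#include <cpuid.h>    // __get_cpuid
#define HAVE_X86_CPUID 1
#else
#define HAVE_X86_CPUID 0
#endif

// Hypothetical helper type; not part of CFast2PassScale.
struct CpuFeatures {
    bool mmx;
    bool sse;
};

// Decode the EDX feature flags returned by CPUID leaf 1.
// Bit 23 reports MMX, bit 25 reports SSE.
inline CpuFeatures DecodeLeaf1Edx(std::uint32_t edx) {
    CpuFeatures f;
    f.mmx = (edx & (1u << 23)) != 0;
    f.sse = (edx & (1u << 25)) != 0;
    return f;
}

// Query the running CPU. On non-x86 builds this reports no features.
inline CpuFeatures DetectCpuFeatures() {
#if HAVE_X86_CPUID
  #if defined(_MSC_VER)
    int regs[4] = {0, 0, 0, 0};          // EAX, EBX, ECX, EDX
    __cpuid(regs, 1);
    return DecodeLeaf1Edx(static_cast<std::uint32_t>(regs[3]));
  #else
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;
    if (__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return DecodeLeaf1Edx(edx);
    return DecodeLeaf1Edx(0);            // leaf 1 not supported
  #endif
#else
    return DecodeLeaf1Edx(0);
#endif
}
```

A caller could run `DetectCpuFeatures()` once at startup and pick the plain, MMX or SSE scaling routine from the result.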

The new scaling class currently only supports bi-linear filtering and does not use templates, but these could be re-introduced by using Eran's original source!

Like most CodeGuru submissions, this class was developed by a programmer who was "working on something else", so it is designed to serve my needs more than it is intended to be foolproof. It may not scale exactly the same as Eran's even though the math is identical (rounding differences may introduce changes, probably improvements actually). It works only on bitmaps made out of COLORREF pixels and throws out the alignment byte (i.e. the COLORREF 02RRGGBB becomes 00RRGGBB when scaled). This served my needs but may not work so well for those who want to preserve or even include the extra byte in their scaling.
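To illustrate why the extra byte disappears, here is a small sketch that is not taken from the class itself. It assumes the 02RRGGBB layout used in the text above, and the names `Pixel32`, `Rgb`, `Unpack` and `Pack` are hypothetical. Any routine that splits a pixel into three colour channels and rebuilds it from those channels behaves this way.

```cpp
#include <cstdint>

typedef std::uint32_t Pixel32;   // stand-in for a 32-bit COLORREF-style pixel

// Hypothetical three-channel working representation.
struct Rgb {
    unsigned r;
    unsigned g;
    unsigned b;
};

// Split a ??RRGGBB pixel into its colour channels.
// The top byte is never read, so it cannot survive the round trip.
inline Rgb Unpack(Pixel32 p) {
    Rgb c;
    c.r = (p >> 16) & 0xFFu;
    c.g = (p >> 8) & 0xFFu;
    c.b = p & 0xFFu;
    return c;
}

// Rebuild a pixel from three channels; the top byte is always zero.
inline Pixel32 Pack(const Rgb& c) {
    return (static_cast<Pixel32>(c.r & 0xFFu) << 16) |
           (static_cast<Pixel32>(c.g & 0xFFu) << 8) |
            static_cast<Pixel32>(c.b & 0xFFu);
}
```

With these helpers, `Pack(Unpack(0x02112233))` yields `0x00112233`: the colour survives and the leading `02` is gone.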

For example, if you were scaling a 32-bit bitmap laid out as TTRRGGBB, where TT is a transparency mask, the scaling algorithms would toss out the transparency information. The SSE-enabled scaling functions could easily be updated to include the transparency value in the calculations with little or no speed penalty, since we are only doing 3 multiplications and 3 additions to scale an RGB pixel and SSE can do 4 operations in parallel. Since I am not a graphics whiz, I don't know if that would be useful to people, so I didn't do it myself.
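The idea of carrying the transparency byte as a fourth lane can be sketched in plain C++. This is only a scalar model of the arithmetic, not the SSE code from the download. The names `Accum4`, `AddWeighted` and `ToPixel` are hypothetical, and the weights are assumed to be the usual filter weights that sum to 1. Each contributing source pixel costs one multiply and one add per lane, so a fourth lane fits the four-wide operations that SSE offers.

```cpp
#include <cstdint>

// Four running sums, one per byte of a TTRRGGBB pixel.
// lane[0] = BB, lane[1] = GG, lane[2] = RR, lane[3] = TT.
struct Accum4 {
    float lane[4];
};

// Add one source pixel's contribution: one multiply and one add per lane.
inline void AddWeighted(Accum4& acc, std::uint32_t pixel, float weight) {
    for (int i = 0; i < 4; ++i) {
        float channel = static_cast<float>((pixel >> (8 * i)) & 0xFFu);
        acc.lane[i] += channel * weight;
    }
}

// Round each lane to the nearest integer, clamp to 0..255 and repack.
inline std::uint32_t ToPixel(const Accum4& acc) {
    std::uint32_t out = 0;
    for (int i = 0; i < 4; ++i) {
        float v = acc.lane[i] + 0.5f;
        if (v < 0.0f) v = 0.0f;
        if (v > 255.0f) v = 255.0f;
        out |= static_cast<std::uint32_t>(v) << (8 * i);
    }
    return out;
}
```

Blending `0x80102030` and `0x40304050` with a weight of 0.5 each gives `0x60203040`, so the transparency byte is interpolated along with the colour channels instead of being thrown away.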

You will need VC++6 and SP4 if you want to compile the SSE scaling functions.

As shown in the accompanying demo you just use:

CFast2PassScale ScaleEngine;   // Create an instance of 
                               //    the scaling class

          m_OriginalBitmapBits, // Original bitmap bits
          m_ScaledBitmapBits,   // Scaled bitmap bits


Download demo project - 20 Kb
Download source - 9 Kb