
Environment: VC6 SP4 + Processor Pack

I loved Eran Yariv's two pass scaling code. However, it wasn't quite fast enough for my needs even after implementing the posted corrections.

I bit the bullet and taught myself inline assembler to speed things up. This optimized code is between about 1.5 and 12 times as fast as the original C++ source depending on what CPU you are using and the bitmap sizes involved! The temporary bitmap was also eliminated, significantly reducing memory overhead for large bitmap scaling operations.

I also wrote MMX/SSE enabled versions of the scaling algorithms and achieved an additional 25% improvement for P3 users (although I don't have a good CPUID function, so you have to enable these functions yourself if you want to use them).
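For readers who want to pick the MMX/SSE path at run time, here is a minimal sketch of a feature check. It is not part of the posted class: `CpuFeatures` and `DetectCpuFeatures` are hypothetical names, and the code assumes a compiler newer than VC6 that provides the `__cpuid` intrinsic (MSVC, `<intrin.h>`) or `__get_cpuid` (GCC/Clang, `<cpuid.h>`). On anything else it simply reports no support.

```cpp
#if defined(_MSC_VER)
#include <intrin.h>
#elif defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
#include <cpuid.h>
#endif

// Hypothetical helper (not part of CFast2PassScale): reports MMX/SSE support.
struct CpuFeatures { bool mmx; bool sse; };

inline CpuFeatures DetectCpuFeatures()
{
    CpuFeatures f = { false, false };
    unsigned int edx = 0;   // feature flags from CPUID leaf 1, register EDX
#if defined(_MSC_VER) && (defined(_M_IX86) || defined(_M_X64))
    int regs[4] = { 0, 0, 0, 0 };        // EAX, EBX, ECX, EDX
    __cpuid(regs, 1);
    edx = static_cast<unsigned int>(regs[3]);
#elif defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    unsigned int eax = 0, ebx = 0, ecx = 0;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        edx = 0;                         // leaf 1 unavailable: assume nothing
#endif
    f.mmx = (edx & (1u << 23)) != 0;     // EDX bit 23 = MMX
    f.sse = (edx & (1u << 25)) != 0;     // EDX bit 25 = SSE
    return f;
}
```

A caller could test `DetectCpuFeatures().sse` once at start-up and fall back to the plain assembler version when it is false.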

The new scaling class currently only supports bi-linear filtering and does not use templates, but other filters and templates could be re-introduced by using Eran's original source!

Like most CodeGuru submissions, this class was developed by a programmer who was "working on something else", so it is designed to serve my needs more than it is intended to be foolproof. It may not scale exactly the same as Eran's even though the math is identical (rounding differences may introduce changes, probably improvements actually). It works on bitmaps made out of COLORREF pixels only and throws out the alignment byte (i.e. the COLORREF 02RRGGBB becomes 00RRGGBB when scaled). This served my needs but may not work so well for those who want to preserve or even include the extra byte in their scaling.

For example, if you were scaling a 32 bit bitmap that was TTRRGGBB where TT is a transparency mask, the scaling algorithms would toss out the transparency information. The SSE enabled scaling functions could be easily updated to include the transparency value in calculations with little or no speed penalty since we are only doing 3 multiplications and 3 additions to scale an RGB pixel and SSE can do 4 operations in parallel. Since I am not a graphics whiz, I don't know if that would be useful to people so I didn't do it myself.
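To illustrate what carrying the fourth byte through the calculation could look like, here is a minimal scalar sketch, not the SSE code from the download. `BlendPixel` is a hypothetical helper that linearly interpolates two TTRRGGBB pixels, treating the top byte exactly like the colour channels; the weight is assumed to be a fixed-point value in the range 0..256.

```cpp
#include <cstdint>

// Hypothetical helper: linear blend of two 0xTTRRGGBB pixels.
// w is the weight of pixel b in 1/256 units (0 = all a, 256 = all b).
inline std::uint32_t BlendPixel(std::uint32_t a, std::uint32_t b, unsigned w)
{
    std::uint32_t out = 0;
    // Four channels, including the top byte that the posted class discards.
    for (int shift = 0; shift < 32; shift += 8) {
        unsigned ca = (a >> shift) & 0xFF;
        unsigned cb = (b >> shift) & 0xFF;
        // Weighted sum with rounding; the result always fits in 8 bits.
        unsigned c = (ca * (256 - w) + cb * w + 128) >> 8;
        out |= static_cast<std::uint32_t>(c) << shift;
    }
    return out;
}
```

An SSE version would do the same four multiply-adds in a single register, which is why the extra channel should cost little or nothing there.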

You will need VC++6 and SP4 if you want to compile the SSE scaling functions.

As shown in the accompanying demo you just use:

CFast2PassScale ScaleEngine;   // Create an instance of 
                               //    the scaling class

          m_OriginalBitmapBits, // Original bitmap bits
          m_ScaledBitmapBits,   // Scaled bitmap bits


Download demo project - 20 Kb
Download source - 9 Kb


  • My Scaling function

    Posted by Legacy on 07/25/2003 07:00am

    Originally posted by: Richard Jones

    I have an application that requires occasional scaling of large images (converted to bitmaps) to display them accurately in a window.

    Although speed is not a serious issue, I would like to improve things. The result of the code below is simply the destination rectangle for StretchBlt.

    I have sped up the process somewhat by using StretchBlt with memory DCs and then using BitBlt to copy the result to the screen.

    But some of the images (JPEG) are over 1 MB in size. I use the ImageMagick DLLs to convert images into bitmaps. I am not sure which is taking the most time, ImageMagick or StretchBlt.

    This scaling function works well, but I am wondering what I could incorporate from your software.

    Any opinions appreciated.
    Richard Jones

    void MyTStatic::ScaleImageToFit(int W, int H, int rrW, int rrH)
    int Sx=3;
    int Sy=4;
    int DiffW=0;
    int DiffH=0;
    if( W >= rrW )DiffW=W-rrW;
    if( H >= rrH )DiffH=H-rrH;
    if(DiffW || DiffH)
    //Scale down with division (no distortion)
    int Prime=1;
    //Gets largest prime for both

    if(Prime > 1)

    if(W < rrW && H < rrH)break;


    //Scaled down may be way too small so now try to re-grow it
    //back to a little less than size of window
    (W >= (rrW-8))//-4 assures smaller

    || //or

    (H >= (rrH-8))



    //Assign to Destination rect
    //used by StretchBlt
    Destiny.left =0;
    Destiny.top =0;
    Destiny.right =W;

    //Now center it
    Destiny.left = (rrW-Destiny.right )/2;
    Destiny.top = (rrH-Destiny.bottom)/2;

    int MyTStatic::GetNextPrime(int ValX, int ValY)
    {
    //Returns the largest prime up to 17 that divides both values (1 if none)
    int Prime=1;
    if((ValX % 2)==0 && (ValY % 2)==0)Prime=2;
    if((ValX % 3)==0 && (ValY % 3)==0)Prime=3;
    if((ValX % 5)==0 && (ValY % 5)==0)Prime=5;
    if((ValX % 7)==0 && (ValY % 7)==0)Prime=7;
    if((ValX % 11)==0 && (ValY % 11)==0)Prime=11;
    if((ValX % 13)==0 && (ValY % 13)==0)Prime=13;
    if((ValX % 17)==0 && (ValY % 17)==0)Prime=17;
    return Prime;
    }
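
For comparison, a destination rectangle that fits a window without distortion can also be computed directly, without searching for common prime factors. The sketch below is not the poster's method: `FitRect` and `FitToWindow` are hypothetical names, the rectangle uses true right/bottom coordinates, and W and H are assumed to be positive.

```cpp
// Hypothetical stand-in for a Windows RECT so the sketch is self-contained.
struct FitRect { int left, top, right, bottom; };

// Scales a W x H image to the largest size that fits inside an
// rrW x rrH window while keeping the aspect ratio, then centres it.
inline FitRect FitToWindow(int W, int H, int rrW, int rrH)
{
    int outW, outH;
    // Cross-multiply to compare aspect ratios without floating point.
    if (static_cast<long long>(W) * rrH >= static_cast<long long>(H) * rrW) {
        // Image is relatively wider than the window: width is the limit.
        outW = rrW;
        outH = static_cast<int>(static_cast<long long>(H) * rrW / W);
    } else {
        // Image is relatively taller than the window: height is the limit.
        outH = rrH;
        outW = static_cast<int>(static_cast<long long>(W) * rrH / H);
    }
    FitRect r;
    r.left   = (rrW - outW) / 2;   // centre horizontally
    r.top    = (rrH - outH) / 2;   // centre vertically
    r.right  = r.left + outW;
    r.bottom = r.top + outH;
    return r;
}
```

For an 800 x 600 image in a 400 x 400 window this yields a 400 x 300 rectangle with a 50-pixel margin above and below.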

  • VC++ 7.0 & SSE Compile HOWTO

    Posted by Legacy on 11/28/2002 08:00am

    Originally posted by: Niki

    You will receive an error when compiling
    the SSE function with VC++ 7.0:
    "Unknown __m128 type" (or something like this)

    To get it to work, just include this file:
    #include <xmmintrin.h> /* Streaming SIMD Extensions Intrinsics include file */

    Then it will compile without error.


  • Ummmmmm

    Posted by Legacy on 08/27/2002 07:00am

    Originally posted by: Vinnie

    I hate to point this out, but the "optimized" version is approximately 50 times slower than what is possible to achieve.

    First of all, there is no need to use floating point numbers. Photoshop's implementation of bilinear scaling uses no floats.

    The correct implementation has two functions with the following prototypes:

    // shrinks a row or column of 8-bit pixels.
    // handles case where destPixels<srcPixels
    void blshrinkmap(
     long srcPixels
    ,long destPixels
    ,unsigned char *src
    ,long srcPixelBytes
    ,unsigned char *dest
    ,long destPixelBytes );

    You have to call this n*2 times, where n is the number of planes (RGB=3 planes). Call it first to shrink all the rows, and then again to shrink all the columns (for example). This requires intermediate storage.

    The expansion function is similar:

    void blexpandmap(
     long srcPixels
    ,long destPixels
    ,unsigned char *src
    ,long srcPixelBytes
    ,unsigned char *dest
    ,long destPixelBytes );

    Although I can't give out the actual implementation, I can tell you that a) it doesn't use floats, and b) it is almost identical to Bresenham's line drawing algorithm (with the "error" term being the equivalent of a remainder which must be passed into the next pixel).
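
Since the actual implementation was not posted, the following is only a guess at the general idea: an integer-only, area-averaging shrink of one row or column of 8-bit samples, where the unused part of a source pixel is carried into the next destination pixel much like a Bresenham error term. `ShrinkSamples` is a hypothetical name, its parameters merely mirror the prototype above, and it assumes 0 < destPixels <= srcPixels.

```cpp
#include <algorithm>

// Hypothetical sketch, not the commenter's code.
// Each source sample carries a weight of destPixels units; each
// destination sample consumes srcPixels units, so weights balance exactly.
inline void ShrinkSamples(long srcPixels, long destPixels,
                          const unsigned char* src, long srcPixelBytes,
                          unsigned char* dest, long destPixelBytes)
{
    long si = 0;                    // index of the current source sample
    long remaining = destPixels;    // weight still unused in that sample
    for (long d = 0; d < destPixels; ++d) {
        long need = srcPixels;      // weight this destination sample needs
        long long acc = 0;          // weighted sum of contributing samples
        while (need > 0) {
            long take = std::min(need, remaining);
            acc += static_cast<long long>(take) * src[si * srcPixelBytes];
            need -= take;
            remaining -= take;      // leftover carries to the next output
            if (remaining == 0) {   // source sample used up: advance
                ++si;
                remaining = destPixels;
            }
        }
        // Divide by the total weight, rounding to nearest.
        dest[d * destPixelBytes] =
            static_cast<unsigned char>((acc + srcPixels / 2) / srcPixels);
    }
}
```

For packed RGB data this would be called once per plane with a pixel stride of 3 or 4 bytes, first along rows and then along columns.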

  • Is there anywhere I can find a description of the bilinear algorithm?

    Posted by Legacy on 09/13/2001 07:00am

    Originally posted by: raser

    I am interested in how it was done. Could anyone tell me where I can get more information?
    Thanks a lot!

  • ok.. but

    Posted by Legacy on 08/23/2001 07:00am

    Originally posted by: sasquach

    WDJ had a faster, more flexible version a few issues ago...
    it even did rotation...

  • Pretty cool -- but too slow

    Posted by Legacy on 08/22/2001 07:00am

    Originally posted by: Clute

    I have a more specialized version which is more than twice as fast, and that's not even asm or SIMD'ed. But for generic bilinear bitmap scaling this is OK.

  • Function SSESuperScale does not compile

    Posted by Legacy on 08/21/2001 07:00am

    Originally posted by: Novak

    The function SSESuperScale does not compile. The error is "fatal error C1601: unsupported inline assembly opcode".
