Thunking in Win32

Introduction

There are times when the standard language call mechanics can’t quite meet your needs. More often than not, the cause is the calling convention or signature an API expects, and in particular its incompatibility with pointers to member functions. Consider the following example:


// PlatformSDK function, which takes a function pointer as callback
// (signature simplified; the real lpfn parameter has type HOOKPROC)
HHOOK SetWindowsHookEx(
    int idHook,
    int (__stdcall *lpfn)(),
    HINSTANCE hMod,
    DWORD dwThreadId
);

class MyClass
{
public:
    void ourCallbackTarget()
    {
        // .. do something
    }
};

// We want to pass a pointer to ourCallbackTarget to SetWindowsHookEx!

This is a pretty typical scenario when one is doing object-oriented programming against the Win32 API. There’s an API or third-party library function that takes a function pointer as a parameter, which is called, for example, when the API or library has completed some lengthy operation or enumeration. Because your program is object oriented, you’d like the passed callback function to be a member of a class, and therein lies the rub. The SetWindowsHookEx function expects a pointer to a non-member function, and you can’t simply cast a member function pointer to suit this.

This article shows what the problem is, explains why thunking can be a handy resort, and proposes an approach using simple calling convention adjustments in assembly language. Finally, a generic library, Thunk32, that takes care of all the low-level work is demonstrated and explained.

The Thunk32 Library Features


  • A very simple way of creating member function callbacks, passable to API-functions which expect non-member function pointers.

  • Callbacks can use any number of parameters, of any type.

  • Callbacks can return any value.

  • Very quick (only three instructions) calling convention adjustment.

The Thunk32 Library Requisites

  • The brilliant Boost libraries.

The Problem

Most Win32 API callbacks, such as the WndProc passed to RegisterClassEx, are declared as non-member functions with the WINAPI or CALLBACK decoration. This essentially boils down to the __stdcall calling convention. A calling convention is, in short, a description of how parameters are passed to a function, and how the stack is cleaned when the function returns. When a function bearing the __stdcall convention (arguments pushed right to left, callee pops its arguments off the stack) is called, the stack basically looks like this:


[return address to caller]
[argument 1]
[…]
[argument n]

Member functions bearing the same calling convention would, on the other hand, look like:


[return address to caller]
[pointer to class instance] <== “this” pointer here!
[argument 1]
[…]
[argument n]

The calls are clearly incompatible, and that’s why one isn’t allowed to pass pointers to non-static member functions when non-member functions are expected. There are, of course, ways to deal with this.

If the callback mechanism somehow enables you to pass a custom parameter, such as a MyClass instance pointer, back to the callback, MyClass::ourCallbackTarget could easily be declared static. Static member functions have the same stack signatures as non-member functions, so that would free you from the calling convention headache. The approach can be further adapted by having a static trampoline function that calls the “real” non-static member callback. A template class generalization of such is described at http://qxxy.com/people/roger/2005/12/08/member_callback. In some cases, as with SetWindowsHookEx above, and the WndProc of RegisterClassEx / SetWindowLong, there’s no such convenient way to pass along a pointer. You are thus left with other more or less hairy options, such as storing instance pointers in a global variable. The word “global” should immediately trigger blinking lights and deafening whistles in your code-oriented mind, and if it doesn’t, I’m certain “thread synchronization” will.
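
For the case where the API does hand a user parameter back, the static trampoline idea can be sketched as follows. This is a minimal illustration, not code from the Thunk32 library: enumerateItems, ItemCallback, and ItemCounter are hypothetical names invented for the example, and a real Win32 callback would additionally carry the CALLBACK decoration.

```cpp
// Hypothetical C-style API: calls `callback` once per item and hands the
// caller-supplied `userData` pointer back on every call.
typedef void (*ItemCallback)(int item, void* userData);

inline void enumerateItems(ItemCallback callback, void* userData)
{
    for (int item = 1; item <= 3; ++item)
        callback(item, userData);
}

class ItemCounter
{
public:
    ItemCounter() : sum_(0) {}

    // Registers the static trampoline and passes `this` as the user data.
    void run() { enumerateItems(&ItemCounter::trampoline, this); }

    int sum() const { return sum_; }

private:
    // Static member: same signature as a free function, no hidden `this`.
    static void trampoline(int item, void* userData)
    {
        static_cast<ItemCounter*>(userData)->onItem(item);
    }

    // The "real" non-static callback that works on the instance.
    void onItem(int item) { sum_ += item; }

    int sum_;
};
```

After calling run() on an ItemCounter, sum() holds the total of the items the hypothetical API reported. The whole scheme depends on the userData parameter, which is exactly what SetWindowsHookEx and RegisterClassEx lack.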

To sum it up, these are the reasons why thunks are needed:


  • You cannot pass a pointer to a non-static member function along to a function that expects a pointer to a non-member function; the C++ standard doesn’t allow you to. Because there’s no way to indicate an instance pointer (this) to non-member functions, you’d cause little more than a crash if you were allowed to do this conversion.

  • You cannot pass a pointer to a static member function because many of the API functions that require callbacks have no way of passing a user-defined parameter back to the callback—specifically an instance pointer. The static function would therefore have no proper way of doing anything on the class instance you’d like it to affect, so the solution wouldn’t really be an object-oriented one.

  • Function objects, such as those produced by boost::bind, cannot be used either, simply because the operator() used to invoke the functor is itself a non-static member function. Again, this cannot be cast to a non-member function!

  • Thunks are very fast. With just a few instructions used to adjust the calling convention, there should be little or no noticeable overhead in their use.
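
The first two points can be checked at compile time. The sketch below assumes a C++11 compiler (for static_assert and <type_traits>, which the article’s Boost-era code does not otherwise require); Widget and PlainCallback are hypothetical names used only for illustration.

```cpp
#include <type_traits>

struct Widget
{
    int onEvent(int code) { return code + 1; }            // non-static member
    static int onEventStatic(int code) { return code; }   // static member
};

// The kind of pointer a C-style API expects.
typedef int (*PlainCallback)(int);

// A pointer to a non-static member function is a different kind of type
// and does not convert to a plain function pointer.
constexpr bool memberConverts =
    std::is_convertible<decltype(&Widget::onEvent), PlainCallback>::value;

// A pointer to a static member function is an ordinary function pointer.
constexpr bool staticConverts =
    std::is_convertible<decltype(&Widget::onEventStatic), PlainCallback>::value;

static_assert(!memberConverts, "member function pointers must not convert");
static_assert(staticConverts, "static member functions convert like free functions");
```

The static member converts, but as the second point explains, it has no instance to work on unless the API passes one back.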

Ridding the Woe by Thunking

What you would like to do in the above example is to call SetWindowsHookEx with a member of the MyClass class as a parameter, so that the hook ends up calling MyClass::ourCallbackTarget. You’d also very much like the solution to extend to virtual functions. So, if class MyDerivedClass publicly inherits MyClass, and ourCallbackTarget is declared virtual, thunked calls to MyClass::ourCallbackTarget on an instance of MyDerivedClass should lead you to MyDerivedClass::ourCallbackTarget.

So, what do you know thus far? Let’s sum it up:


  • You want the thunk to be a data member of MyClass. Visual C++ lets you reinterpret_cast a pointer to data into a function pointer, although the C++ standard itself does not guarantee that this conversion is supported.

  • The thunk will be a set of machine code instructions, which should ship you to MyClass::ourCallbackTarget.

  • The thunk must be callable with any return type, any number of arguments, and the __stdcall calling convention.

  • The calling convention has to be changed within the thunk, to match the transition from non-member __stdcall to non-static member __thiscall.

Setting up the thunk as a pure machine-code member of MyClass is fairly simple. All you need is a struct with a couple of suitable variables inside. There are two key points to remember, though. First, for the struct to be one consecutive chunk of memory, with no padding between the members regardless of their sizes, you need to tell the compiler so. The pack pragma does just that, as the upcoming example will show. Second, for a chunk of memory to be executable even with DEP enabled (see http://en.wikipedia.org/wiki/Data_Execution_Prevention), you have to allocate the block and mark it for execution. In this example, VirtualAlloc with the PAGE_EXECUTE_READWRITE flag will do. In the attached library, Thunk32, a private heap object is used in place of VirtualAlloc. Each thunk allocates, and frees, its own space within this heap. The private heap sidesteps the fact that VirtualAlloc allocates blocks with a minimum size corresponding to the page size, which could lead to high memory usage when many thunks are present.

As for the example thus far, here goes:


#include <windows.h>

// Store the current packing and set 1 byte as the new one.
// This prevents padding. The pragma is placed around the struct
// so that it applies to the whole declaration.
#pragma pack(push, 1)
struct THUNK
{
    ULONG block1; // 4 bytes
    UCHAR block2; // 1 byte
    ULONG block3; // 4 bytes
};
#pragma pack(pop) // restore previous packing

class MyClass
{
public:
    MyClass()
    {
        // Reserve and commit an executable block for the thunk.
        callbackThunk = reinterpret_cast<THUNK*>(
            VirtualAlloc(NULL,
                         sizeof(THUNK),
                         MEM_RESERVE | MEM_COMMIT,
                         PAGE_EXECUTE_READWRITE));
        if(callbackThunk == NULL)
        {
            /* throw an exception to notify about
               the allocation problem. */
        }
    }

    ~MyClass()
    {
        // MEM_RELEASE frees the whole region; the size must be 0.
        VirtualFree(callbackThunk, 0, MEM_RELEASE);
    }

    THUNK* callbackThunk;
    void ourCallbackTarget()
    {
    }
};

This demonstrates the struct member packing, what a thunk structure could look like, and how to allocate/free the lot.
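
The effect of the packing can be verified in isolation. The following sketch assumes a compiler that honors #pragma pack (Visual C++, GCC, and Clang all do) and a C++11 compiler for static_assert; UnpackedThunk and PackedThunk are hypothetical names that mirror the member layout of the struct above.

```cpp
#include <cstddef>
#include <cstdint>

// Default packing: the compiler is free to pad after block2 so that
// block3 lands on its natural alignment (typically giving 12 bytes).
struct UnpackedThunk
{
    std::uint32_t block1;
    std::uint8_t  block2;
    std::uint32_t block3;
};

// 1-byte packing: members follow each other with no padding at all.
#pragma pack(push, 1)
struct PackedThunk
{
    std::uint32_t block1;
    std::uint8_t  block2;
    std::uint32_t block3;
};
#pragma pack(pop)

static_assert(sizeof(PackedThunk) == 9, "4 + 1 + 4 bytes, no padding");
static_assert(offsetof(PackedThunk, block3) == 5, "block3 directly follows block2");
```

Without the pragma, padding bytes would end up in the middle of the instruction stream and the processor would try to execute them.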

The next step is to fill the thunk with something meaningful. The default calling convention of non-static member functions in VC++ is __thiscall, which is essentially the same as __stdcall, except that the instance pointer is passed in ECX. That is what the thunk will adjust the callback to fit. This requires a tiny amount of assembly.


lea ecx, instance pointer
mov eax, address of callback
jmp eax

The first line puts the instance pointer in ECX. Next, the address of the callback is brought into EAX, and finally you do a jump to that address. Presto, your callback is executed. It’s simple, and quick.
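
On 32-bit x86, these three instructions encode to 13 bytes: 8D 0D followed by a 32-bit address (lea ecx), B8 followed by a 32-bit immediate (mov eax), and FF E0 (jmp eax). The sketch below only builds that byte sequence so the layout can be inspected; it does not execute anything. The names encodeThunk and putLittleEndian32 are hypothetical helpers, and std::array assumes a C++11 compiler.

```cpp
#include <array>
#include <cstdint>

// Writes a 32-bit value in little-endian byte order, as x86 expects.
inline void putLittleEndian32(std::uint8_t* out, std::uint32_t value)
{
    for (int i = 0; i < 4; ++i)
        out[i] = static_cast<std::uint8_t>(value >> (8 * i));
}

// Builds the 13-byte thunk for a given instance address and target address.
inline std::array<std::uint8_t, 13> encodeThunk(std::uint32_t instance,
                                                std::uint32_t target)
{
    std::array<std::uint8_t, 13> code = {};
    code[0] = 0x8D;                       // lea ecx, [disp32]
    code[1] = 0x0D;
    putLittleEndian32(&code[2], instance);
    code[6] = 0xB8;                       // mov eax, imm32
    putLittleEndian32(&code[7], target);
    code[11] = 0xFF;                      // jmp eax
    code[12] = 0xE0;
    return code;
}
```

Because the thunk jumps rather than calls, the return address and arguments the API pushed stay untouched on the stack, and the target member function returns straight to the original caller.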

Next on the agenda is placing this code in the thunk. The thunk will be stored as raw machine code. This means that the assembly instructions are translated into their numerical byte representation. To convert the above asm block to bytes, you can simply compile it in a dummy VC++ project, add a breakpoint, and bring up the disassembly. This should show you the instructions as well as the bytes that represent them. These bytes can be put in the thunk structure, along with a proper address for the class instance and destination function. All wrapped up and somewhat tightened, the thunk and initialization look like this:


// Pack to 1 byte so the members form one contiguous run of machine code.
#pragma pack(push, 1)
struct THUNK
{
    unsigned short stub1;      // lea ecx,
    unsigned long  nThisPtr;   // this
    unsigned char  stub2;      // mov eax,
    unsigned long  nJumpProc;  // pointer to destination function
    unsigned short stub3;      // jmp eax
};
#pragma pack(pop)

MyClass::MyClass()
{
    callbackThunk = reinterpret_cast<THUNK*>(
        VirtualAlloc(NULL,
                     sizeof(THUNK),
                     MEM_RESERVE | MEM_COMMIT,
                     PAGE_EXECUTE_READWRITE));
    if(callbackThunk == NULL)
    {
        /* throw an exception to notify about
           the allocation problem. */
    }

    // See declaration of the THUNK struct for a byte code explanation.
    // The 16-bit stubs are stored little-endian: 0x0D8D becomes 8D 0D.
    callbackThunk->stub1 = 0x0D8D;
    callbackThunk->nThisPtr = reinterpret_cast<ULONG>(this);

    callbackThunk->stub2 = 0xB8;
    // Fetch address to the destination function
    callbackThunk->nJumpProc =
        brute_cast<ULONG>(&MyClass::ourCallbackTarget);
    callbackThunk->stub3 = 0xE0FF;

    // Flush the instruction cache. This may be required on architectures
    // without strong cache coherency guarantees; x86 and x64 (AMD64)
    // do not strictly need it.
    FlushInstructionCache(GetCurrentProcess(), callbackThunk,
                          sizeof(THUNK));
}

The only thing left to describe of the above code is the call to brute_cast. This is simply a templated function that forces a cast between pretty much anything, here used to convert a pointer to a ULONG. This is wanted because there is no legal way to cast the address of a member function into an arbitrary pointer. You could cast the member function into a reference to a pointer, and subsequently dereference and cast this into a ULONG (phew), but oddly enough, this can’t be done from within the class itself (which is the example used in this article). The brute_cast function is a clean, and fairly good looking, way to deal with the conversion, with the same low cost. In both the case of the extra pointer address dereference and brute_cast, the compiler should (and will) optimize it down to a single instruction.

The actual address you get from &Class::MemberFunction points to one of two things: If you’ve got incremental linking enabled, it may point to another thunk, which jumps to either the function’s vcall entry, if the function is virtual, or the actual function otherwise. If incremental linking isn’t enabled, the pointer will either be for the vcall or the function itself. See http://en.wikipedia.org/wiki/VTBL for more information on virtual functions and vtbl/vcall. As for brute_cast, this is what it looks like:


#include <boost/static_assert.hpp>

template<typename Target, typename Source>
inline Target brute_cast(const Source s)
{
    // Refuse to compile if the two types differ in size.
    BOOST_STATIC_ASSERT(sizeof(Target) == sizeof(Source));
    union { Target t; Source s; } u;
    u.s = s;
    return u.t;
}

Before you ask, the BOOST_STATIC_ASSERT is a compile-time assert, which is used here to prevent casting between types of different sizes (a 32-bit int to an 8-bit char, for example).
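
Reading a union member other than the one last written is not sanctioned by the C++ standard, even though Visual C++ handles it as expected. A variant that avoids the union by copying the bytes with std::memcpy could look like the sketch below. It is an alternative illustration, not part of Thunk32; copy_cast is a hypothetical name, and static_assert assumes a C++11 compiler.

```cpp
#include <cstdint>
#include <cstring>

// Copies the object representation of `source` into a value of type Target.
// As with brute_cast, mismatched sizes are rejected at compile time.
template <typename Target, typename Source>
inline Target copy_cast(const Source& source)
{
    static_assert(sizeof(Target) == sizeof(Source),
                  "copy_cast requires types of equal size");
    Target target;
    std::memcpy(&target, &source, sizeof(Target));
    return target;
}
```

For example, on a platform with 32-bit IEEE 754 floats, copy_cast<std::uint32_t>(1.0f) yields 0x3F800000. Compilers commonly reduce the fixed-size memcpy to a plain register move, so the cost should be comparable to the union version.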

So, there it is. A simple, but effective, approach to call thunks. The attached library wraps the principles shown here into a typesafe framework, which is fairly simple to use from any object-oriented Win32 application.

Using the Code

Achieving the goal set out above with the attached Thunk32 library is a simple matter. A demonstration follows.

You have a dummy function, much like the ones found in the Win32 API and Platform SDK, that accepts a function pointer:


void someCallbackMechanism(int (__stdcall *func)(int, int))
{
    (*func)(10, 10);
}

The example class has gained an extra member variable, an instance of the Thunk32 class, which does absolutely all the work:


class MyClass
{
public:
    indev::Thunk32<MyClass, int(int, int)> simpleCallbackThunk;

    MyClass()
    {
        simpleCallbackThunk.initializeThunk(this, &MyClass::simpleCallback);
    }

    virtual int simpleCallback(int i1, int i2)
    {
        cout << "MyClass::simpleCallback hit" << endl;
        return 10;
    }
};

The thunk, simpleCallbackThunk, has two template parameters: the class type in which the target function resides, and the signature of that function. In the above declaration, the class type is MyClass, and the signature matches a function that returns an int and takes two ints as parameters. The initialization done in the constructor supplies the thunk with the instance pointer, this, and the address of your target function, &MyClass::simpleCallback.

You now expand the example shown earlier in the article to include a derived class. There’s really nothing fancy about this MyDerivedClass—all it does is provide a new implementation of the simpleCallback function:


class MyDerivedClass : public MyClass
{
public:
    virtual int simpleCallback(int i1, int i2)
    {
        cout << "MyDerivedClass::simpleCallback hit, "
             << "heading for parent class" << endl;
        return MyClass::simpleCallback(i1, i2);
    }
};

The following example first creates an instance of the base class and passes its thunked callback to someCallbackMechanism. The call made by someCallbackMechanism proceeds just as a direct call to MyClass::simpleCallback would, apart from the calling convention adjustments applied by the thunk. Next, an instance of the derived class is created and referenced through the base class. Through this base reference, someCallbackMechanism is called again. This second time, the callback in someCallbackMechanism first hits the thunk, then the vtable entry, and lastly the derived MyDerivedClass::simpleCallback, just as it would if the function were called directly on the base reference.


int main()
{
    /*
       First demo: Should have the callback mechanism call
       MyClass::simpleCallback.
    */
    cout << "First demo" << endl;
    MyClass myClassInstance;
    someCallbackMechanism(myClassInstance.simpleCallbackThunk.getCallback());
    cout << endl;

    /*
       Second demo: Should have the callback mechanism call
       MyDerivedClass::simpleCallback, which goes on to call
       MyClass::simpleCallback.
    */
    cout << "Second demo" << endl;
    MyDerivedClass myDerivedInstance;
    MyClass& myClassReference = myDerivedInstance;
    someCallbackMechanism(myClassReference.simpleCallbackThunk.getCallback());
    cout << endl;

    cout << "Press enter to exit" << endl;
    cin.get();

    return 0;
}

MyDerivedClass::simpleCallback just so happens to call the base simpleCallback after writing some text to standard output. This means that the printed text will be:


First demo
MyClass::simpleCallback hit

Second demo
MyDerivedClass::simpleCallback hit, heading for parent class
MyClass::simpleCallback hit

Press enter to exit
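
The virtual dispatch seen in the second demo is a property of pointers to member functions themselves: invoking &Base::f on an object whose dynamic type is a derived class reaches the override. The portable sketch below shows this together with the “class plus signature” template parameter style. It is not the Thunk32 implementation: BoundMember, Base, and Derived are hypothetical names, it produces no raw function pointer, and it uses C++11 variadic templates where Thunk32 relies on Boost preprocessor facilities.

```cpp
// Primary template, intentionally left undefined; only the
// specialization for function signatures is usable.
template <typename Class, typename Signature>
class BoundMember;

template <typename Class, typename Result, typename... Args>
class BoundMember<Class, Result(Args...)>
{
public:
    typedef Result (Class::*Method)(Args...);

    BoundMember(Class* instance, Method method)
        : instance_(instance), method_(method) {}

    // Calls the member function on the stored instance.
    Result operator()(Args... args) const
    {
        return (instance_->*method_)(args...);
    }

private:
    Class* instance_;
    Method method_;
};

struct Base
{
    virtual ~Base() {}
    virtual int simpleCallback(int a, int b) { return a + b; }
};

struct Derived : Base
{
    virtual int simpleCallback(int a, int b)
    {
        return 100 + Base::simpleCallback(a, b);
    }
};
```

Binding &Base::simpleCallback to a Derived instance and invoking it with (10, 10) returns 120 rather than 20, because the call goes through the vtable. A thunk adds the one thing this functor cannot offer: an address that can be handed to an API expecting a plain __stdcall function pointer.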

Points of Interest

Read through the source code. It’s not extensive by any standards, but it does use some of the handy preprocessor facilities supplied with the brilliant Boost libraries.
