ATL Under the Hood Part 4

Environment: ATL



Until now, we haven’t discussed assembly language at all. We can’t avoid it any longer if we really want to know what is going on under the hood of ATL, because ATL uses some low-level techniques, as well as some inline assembly language, to make itself as small and as fast as possible. I assume that readers already have a basic knowledge of assembly language, so I will concentrate on my topic and not try to write another assembly language tutorial. If you don’t know enough assembly language, I recommend taking a look at Matt Pietrek’s article “Under the Hood” in the February 1998 issue of Microsoft Systems Journal. It gives you enough information about assembly language to get started.

To start our tour, take a look at this simple program.

Program 55


void fun(int, int) {
}

int main() {
    fun(5, 10);
    return 0;
}

Now, compile it with the command-line compiler, cl.exe, using the -FAs switch. For example, if the program’s file name is prog55.cpp, compile it this way.


cl -FAs prog55.cpp

This generates a file with the same base name and an .asm extension (prog55.asm here) that contains the assembly language code for the program. Now, take a look at the generated output file. Let’s discuss the call to the function first. The assembly code that calls this function looks something like this.


push 10 ; 0000000aH
push 5
call ?fun@@YAXHH@Z ; fun

The parameters of the function are pushed onto the stack from right to left, and then the function is called. But the name of the function is a little different from the name we gave it. This is because the C++ compiler decorates the function name with information about its parameter types in order to support function overloading. Let’s change the program a little and overload the function to see how the code behaves.

Program 56


void fun(int, int) {
}

void fun(int, int, int) {
}

int main() {
    fun(5, 10);
    fun(5, 10, 15);
    return 0;
}

Now, the assembly language for calling both of the functions looks something like this:


push 10 ; 0000000aH
push 5
call ?fun@@YAXHH@Z ; fun

push 15 ; 0000000fH
push 10 ; 0000000aH
push 5
call ?fun@@YAXHHH@Z ; fun

Take a look at the names of the functions. We wrote both functions with the same name, but the compiler decorates each name differently (HH for two int parameters, HHH for three) so that the two overloads end up with distinct names.
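To make the pattern in the two decorated names easier to see, here is a small sketch that rebuilds them. It is only a toy model, not the compiler's real algorithm: the function name decorate_void_int_function is hypothetical, and it assumes a free function that returns void and takes one or more int parameters, which is all the listings above show.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Toy model of the decoration seen in the listings above. It covers only
// free functions that return void and take one or more int parameters;
// the real MSVC scheme encodes far more than this.
std::string decorate_void_int_function(const std::string& name, int int_param_count) {
    assert(int_param_count >= 1);       // other cases are outside this sketch
    std::string decorated = "?" + name; // decorated C++ names start with '?'
    decorated += "@@YA";                // as seen in the listings for a free function
    decorated += "X";                   // return type: void
    decorated.append(static_cast<std::size_t>(int_param_count), 'H'); // one 'H' per int
    decorated += "@Z";                  // terminator seen at the end of both names
    return decorated;
}
```

Calling decorate_void_int_function("fun", 2) reproduces the name used for the first call above, and passing 3 reproduces the second.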

If you don’t want the function name to be decorated, you can declare the function with extern "C". Let’s change the program a little.

Program 57


extern "C" void fun(int, int) {
}

int main() {
    fun(5, 10);
    return 0;
}

The assembly language code for calling this function is:


push 10 ; 0000000aH
push 5
call _fun

This means that you can’t overload a function that has C linkage. Take a look at the following program.

Program 58


extern "C" void fun(int, int) {
}

extern "C" void fun(int, int, int) {
}

int main() {
    fun(5, 10);
    return 0;
}

This program gives a compilation error. Function overloading is not supported in the C language, and here we define two functions with the same name while telling the compiler not to decorate their names, that is, to use C language linkage rather than C++ linkage.

Now, take a look at the code that the compiler generates for our do-nothing function.


push ebp
mov ebp, esp
pop ebp
ret 0

Before I go into further detail, take a look at the last statement of the function: ret 0. Why is it 0? Can it be something other than 0? As we have seen, all the parameters that we pass to the function are in fact pushed onto the stack. What effect does it have on the ESP register when you or the compiler pushes something onto the stack? Take a look at the following simple program to see this behavior. I use printf rather than cout to avoid the overhead of cout.

Program 59


#include <cstdio>

int g_iTemp;

int main() {

    // read ESP before the push
    _asm mov g_iTemp, esp
    printf("Before push %d\n", g_iTemp);

    // push a 32-bit register and read ESP again
    _asm push eax
    _asm mov g_iTemp, esp
    printf("After push %d\n", g_iTemp);
    _asm pop eax

    return 0;
}

The output of this program is:


Before push 1244980
After push 1244976

This program displays the value of the ESP register before and after pushing a value onto the stack. The value drops by 4, which clearly shows that when you push a 32-bit value onto the stack, the stack grows downward in memory.
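The same behavior can be sketched in portable C++ without inline assembly. This is a simulation only: ToyStack and its members are hypothetical names, and the starting address is simply the value printed in the output above.

```cpp
#include <cstdint>
#include <map>

// Simulated 32-bit stack: pushing moves the stack pointer to a lower address.
struct ToyStack {
    std::uint32_t esp;                             // simulated ESP register
    std::map<std::uint32_t, std::uint32_t> memory; // address -> stored value

    explicit ToyStack(std::uint32_t initial_esp) : esp(initial_esp) {}

    // push: decrement ESP by 4, then store the value at the new top of stack
    void push(std::uint32_t value) {
        esp -= 4;
        memory[esp] = value;
    }

    // pop: read the value at the top of stack, then increment ESP by 4
    std::uint32_t pop() {
        const std::uint32_t value = memory.at(esp);
        esp += 4;
        return value;
    }
};
```

Starting a ToyStack at 1244980 and pushing one value leaves esp at 1244976, matching the two numbers in the output; popping brings it back.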

Now, there is a question: who restores the stack pointer after parameters have been passed to a function, the function itself or the caller of that function? In fact, both cases are possible, and this is the difference between the standard calling convention (stdcall) and the C calling convention (cdecl). Take a look at the instruction that immediately follows the call to the function.


push 10 ; 0000000aH
push 5
call _fun
add esp, 8

Here, two parameters are passed to the function, so the stack pointer has been decreased by 8 bytes after pushing the two values onto the stack. In this program, it is the responsibility of the function’s caller to restore the stack pointer, which it does with add esp, 8. This is called the C calling convention. With this calling convention, you can pass a variable number of arguments, because the caller knows how many parameters it passed to the function and so can restore the stack pointer itself.

However, if the standard calling convention is selected, it is the responsibility of the callee to clean up the stack. In this case, a variable number of arguments can’t be passed to the function, because there is no way to tell the function how many parameters were passed so that it could adjust the stack pointer appropriately.
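The bookkeeping behind the two conventions can be sketched as plain arithmetic on a simulated stack pointer. The function names below are hypothetical, and the sketch assumes every argument and the return address occupy 4 bytes each.

```cpp
#include <cstdint>

constexpr std::uint32_t kSlotBytes = 4; // assumption: every stack slot is 4 bytes

// C calling convention: the caller removes the arguments after the call.
std::uint32_t esp_after_cdecl_call(std::uint32_t esp, std::uint32_t arg_count) {
    esp -= arg_count * kSlotBytes; // caller: push each argument
    esp -= kSlotBytes;             // call: push the return address
    esp += kSlotBytes;             // callee: ret 0 pops only the return address
    esp += arg_count * kSlotBytes; // caller: add esp, <bytes it pushed>
    return esp;
}

// Standard calling convention: the callee removes a fixed number of bytes.
std::uint32_t esp_after_stdcall_call(std::uint32_t esp, std::uint32_t arg_count,
                                     std::uint32_t callee_ret_bytes) {
    esp -= arg_count * kSlotBytes;        // caller: push each argument
    esp -= kSlotBytes;                    // call: push the return address
    esp += kSlotBytes + callee_ret_bytes; // callee: ret N pops return address plus N bytes
    return esp;
}
```

With two arguments, both sequences return the stack pointer to where it started. If a caller pushed three arguments at a callee that always removes 8 bytes, the stack pointer would end up 4 bytes too low, which is why the callee-cleans model does not fit a variable number of arguments.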

Take a look at the following program to see the behavior of the standard calling convention.

Program 60


extern "C" void _stdcall fun(int, int) {
}

int main() {

    fun(5, 10);
    return 0;
}

Now take a look at the calling of the function.


push 10 ; 0000000aH
push 5
call _fun@8

Here, the @ after the function name shows that this function uses the standard calling convention, and 8 is the number of bytes of arguments pushed onto the stack. When every argument occupies 4 bytes, as here, the number of arguments can be calculated by dividing this number by 4.
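As a small illustration of that naming pattern, the sketch below builds the decorated name from an argument count and recovers the count from the byte suffix. Both function names are hypothetical, and the sketch assumes C linkage and 4-byte arguments only, as in Program 60.

```cpp
#include <cassert>
#include <string>

// Builds a name in the pattern shown above: underscore, name, '@', argument bytes.
// Assumption: every argument occupies exactly 4 bytes on the stack.
std::string decorate_stdcall_c_name(const std::string& name, unsigned arg_count) {
    const unsigned arg_bytes = arg_count * 4;
    return "_" + name + "@" + std::to_string(arg_bytes);
}

// Recovers the argument count from the byte suffix under the same assumption.
unsigned arg_count_from_suffix(unsigned arg_bytes) {
    assert(arg_bytes % 4 == 0); // only whole 4-byte slots are modeled here
    return arg_bytes / 4;
}
```

For the function in Program 60, decorate_stdcall_c_name("fun", 2) gives _fun@8, and arg_count_from_suffix(8) gives 2.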

Here is the code of our do-nothing function.


push ebp
mov ebp, esp
pop ebp
ret 8

This function adjusts the stack pointer itself, with the help of the “ret 8” instruction, as it returns.

Now, let’s explore the code that the compiler generates for us. The compiler inserts this code to create a stack frame so that it can access the parameters and local variables in a standard way. A stack frame is a memory area reserved for a function to store its parameters, local variables, and return address. A stack frame is created when a function is called and is destroyed when the function returns. On the x86 architecture, the EBP register is used to store the address of the stack frame; it is sometimes called the frame pointer or base pointer.

So, the compiler first saves the address of the previous stack frame (push ebp) and then creates a new stack frame by using the value of ESP (mov ebp, esp). Before the function returns, the address of the old stack frame is restored (pop ebp).

Now, take a look at what is in the stack frame. The stack frame has all the parameters on the positive side of EBP and all the local variables on the negative side of EBP.

The address of the previous stack frame (the saved EBP) is stored at EBP, and the return address of the function is stored at EBP + 4, so the first parameter is found at EBP + 8. Now, take a look at an example that has two parameters and three local variables.

Program 61


extern "C" void fun(int a, int b) {
    int x = a;
    int y = b;
    int z = x + y;
    return;
}

int main() {
    fun(5, 10);
    return 0;
}

And now, take a look at the compiler-generated code of the function.


push ebp
mov ebp, esp
sub esp, 12 ; 0000000cH

; int x = a;
mov eax, DWORD PTR _a$[ebp]
mov DWORD PTR _x$[ebp], eax

; int y = b;
mov ecx, DWORD PTR _b$[ebp]
mov DWORD PTR _y$[ebp], ecx

; int z = x + y;
mov edx, DWORD PTR _x$[ebp]
add edx, DWORD PTR _y$[ebp]
mov DWORD PTR _z$[ebp], edx

mov esp, ebp
pop ebp
ret 0

Now, what are _a$, _x$, _y$, and so forth? They are defined just above the function definition, something like this:


_a$ = 8
_b$ = 12
_x$ = -4
_y$ = -8
_z$ = -12

This means you can read this code something like this:


; int x = a;
mov eax, DWORD PTR [ebp + 8]
mov DWORD PTR [ebp - 4], eax

; int y = b;
mov ecx, DWORD PTR [ebp + 12]
mov DWORD PTR [ebp - 8], ecx

; int z = x + y;
mov edx, DWORD PTR [ebp - 4]
add edx, DWORD PTR [ebp - 8]
mov DWORD PTR [ebp - 12], edx

This means the addresses of parameters a and b are EBP + 8 and EBP + 12, respectively, and the values of x, y, and z are stored at memory locations EBP - 4, EBP - 8, and EBP - 12, respectively.
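To see those offsets at work without a debugger, here is a portable sketch that models the frame of fun as seven 4-byte slots addressed relative to EBP. ToyFrame, toy_fun, and the kOffset constants are hypothetical names; the offset values are the ones from the listing, and unlike the original fun, the sketch returns z so that the result can be checked.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Offsets relative to EBP, copied from the generated listing.
constexpr int kOffsetA = 8;
constexpr int kOffsetB = 12;
constexpr int kOffsetX = -4;
constexpr int kOffsetY = -8;
constexpr int kOffsetZ = -12;

// Seven 4-byte slots covering [ebp - 12] through [ebp + 12].
class ToyFrame {
public:
    // Returns the slot at [ebp + ebp_offset].
    std::int32_t& at(int ebp_offset) {
        assert(ebp_offset >= -12 && ebp_offset <= 12 && ebp_offset % 4 == 0);
        return slots_[static_cast<std::size_t>((ebp_offset + 12) / 4)];
    }

private:
    std::array<std::int32_t, 7> slots_{}; // all slots start at zero
};

// Mirrors the three statements of fun, one frame access at a time.
std::int32_t toy_fun(std::int32_t a, std::int32_t b) {
    ToyFrame frame;
    frame.at(kOffsetA) = a;                  // caller pushed a: it sits at [ebp + 8]
    frame.at(kOffsetB) = b;                  // caller pushed b: it sits at [ebp + 12]
    frame.at(kOffsetX) = frame.at(kOffsetA); // int x = a;
    frame.at(kOffsetY) = frame.at(kOffsetB); // int y = b;
    frame.at(kOffsetZ) = frame.at(kOffsetX) + frame.at(kOffsetY); // int z = x + y;
    return frame.at(kOffsetZ);
}
```

The two unused middle slots stand for the saved EBP at [ebp] and the return address at [ebp + 4].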

Armed with this knowledge, let’s play a game with the parameters of a function. Take a look at this simple program.

Program 62


#include <cstdio>

extern "C" int fun(int a, int b) {
    return a + b;
}

int main() {

    printf("%d\n", fun(4, 5));
    return 0;
}

The output of this program is “9”, as expected. Now, let’s change the program a little.

Program 63


#include <cstdio>

extern "C" int fun(int a, int b) {
    _asm mov dword ptr[ebp+12], 15
    _asm mov dword ptr[ebp+8], 14
    return a + b;
}

int main() {

    printf("%d\n", fun(4, 5));
    return 0;
}

The output of this program is “29”. We now know the addresses of the parameters, and in this program we overwrite their values through those addresses. When the function then adds a and b, the new values, 14 and 15, are added.

Visual C++ has a naked attribute for functions. If you declare a function as naked, the compiler won’t generate prolog and epilog code for that function. Now, what is prolog and epilog code? Prolog is an English word meaning “opening.” Yes, it is also the name of a programming language used in AI, but there is no relation between that language and the prolog code generated by the compiler. Prolog code is the code that the compiler automatically inserts at the beginning of a function to set up the stack frame. Take a look at the assembly language code generated for Program 61. At the beginning of the function, the compiler automatically inserted the following code to set up the stack frame.


push ebp
mov ebp, esp
sub esp, 12 ; 0000000cH

This code is called prolog code. In the same way, the code inserted at the end of a function is called epilog code. In the same program, the epilog code generated by the compiler is:


mov esp, ebp
pop ebp
ret 0

Now, take a look at the function with the naked attribute:

Program 64


extern "C" void _declspec(naked) fun() {
    _asm ret
}

int main() {

    fun();
    return 0;
}

The code that the compiler generates for the function fun is something like this.


_asm ret

This means that there is no prolog or epilog code in this function. In fact, there are rules for naked functions. You can’t declare an automatic variable in a naked function, because the compiler would have to generate code to reserve stack space for it, and in a naked function the compiler won’t generate any such code for you. You have to write the ret instruction yourself; otherwise, the program will crash. You can’t even write a return statement in a naked function. Why? Because when you return a value from a function, the compiler puts that value in the eax register, which means the compiler would have to generate code for your return statement. Let’s take a look at this simple program to understand how a function’s return value works.

Program 64


#include <cstdio>

extern "C" int sum(int a, int b) {
    return a + b;
}

int main() {

    int iRetVal;
    sum(3, 7);
    _asm mov iRetVal, eax
    printf("%d\n", iRetVal);
    return 0;
}

The output of this program is “10”. Here we haven’t directly used the return value of the function; instead, we copy the value of eax into the variable just after calling the function.

Now, let’s write a complete naked function, including its own prolog and epilog code, that returns the sum of its two parameters.

Program 65


#include <cstdio>

extern "C" int _declspec(naked) sum(int a, int b) {

    // prolog code
    _asm push ebp
    _asm mov ebp, esp

    // add the two parameters; the result is left in eax
    _asm mov eax, dword ptr [ebp + 8]
    _asm add eax, dword ptr [ebp + 12]

    // epilog code
    _asm pop ebp
    _asm ret
}

int main() {

    int iRetVal;
    sum(3, 7);
    _asm mov iRetVal, eax
    printf("%d\n", iRetVal);
    return 0;
}

The output of this program is “10”; in other words, the sum of the two parameters, 3 and 7.
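For readers without a 32-bit Visual C++ compiler at hand, the instruction sequence of the naked sum function can be replayed on a simulated machine in portable C++. This is only a sketch: ToyCpu, toy_naked_sum, and toy_call_sum are hypothetical names, the starting register values are arbitrary, and the return address is a placeholder because no real jump takes place.

```cpp
#include <cstdint>
#include <map>

// Minimal simulated machine: three registers and a sparse 32-bit memory.
struct ToyCpu {
    std::map<std::uint32_t, std::int32_t> memory; // address -> 32-bit value
    std::uint32_t esp = 0x2000; // arbitrary starting stack pointer
    std::uint32_t ebp = 0x2100; // arbitrary frame pointer of the caller
    std::int32_t eax = 0;

    void push(std::int32_t value) { esp -= 4; memory[esp] = value; }
    std::int32_t pop() {
        const std::int32_t value = memory[esp];
        esp += 4;
        return value;
    }
};

// Replays the body of the naked function, one instruction per line.
void toy_naked_sum(ToyCpu& cpu) {
    cpu.push(static_cast<std::int32_t>(cpu.ebp));    // push ebp
    cpu.ebp = cpu.esp;                               // mov ebp, esp
    cpu.eax = cpu.memory[cpu.ebp + 8];               // mov eax, [ebp + 8]
    cpu.eax += cpu.memory[cpu.ebp + 12];             // add eax, [ebp + 12]
    cpu.ebp = static_cast<std::uint32_t>(cpu.pop()); // pop ebp
    cpu.pop();                                       // ret (discard return address)
}

// Replays the caller side: push arguments right to left, call, clean up.
std::int32_t toy_call_sum(ToyCpu& cpu, std::int32_t a, std::int32_t b) {
    cpu.push(b);        // push second argument
    cpu.push(a);        // push first argument
    cpu.push(0);        // call: placeholder return address
    toy_naked_sum(cpu);
    cpu.esp += 8;       // add esp, 8 (caller cleans up under the C convention)
    return cpu.eax;     // the result is read from eax
}
```

After toy_call_sum(cpu, 3, 7), eax holds 10 and both esp and ebp are back at their starting values, which is what the hand-written prolog and epilog are there to guarantee.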

This attribute is used in the ATLBASE.H file to implement members of the _QIThunk structure. This structure is used to debug reference counting in an ATL program when _ATL_DEBUG_INTERFACES is defined.

I hope to explore some other mysteries of ATL in the next article.
