Introduction and Series Recap
- In Part 1, you looked at the basic structure of code generated surrounding a function call.
- In Part 2, you looked at calling conventions and studied the code generated for two popular calling conventions, __stdcall and __cdecl.
- In Part 3, you looked at stack frame and studied the positioning of local variables and function arguments in the frame.
In this article, you will explore a slightly more complex data type: the class. You may wonder what this has to do with functions. Classes can have member functions, which are functions after all, and because this series is about functions, they deserve a closer look.
I will not go into what classes are; I assume readers are already familiar with them. This part concerns itself only with the member functions of a class.
Here's an example:
int result1 = funcA(1,2);
classA objA;
int result2 = objA.memberFuncA(1,2);
Looking at the code above, you see a striking similarity in how the function is called in the two cases, member and non-member. The similarity is real: member function calls are normal function calls. So, everything I talked about in the earlier parts (argument passing via the stack, return value passing, pushing the return address on the stack) works exactly the same way.
In the case of a member function call, the objA. prefix indicates that memberFuncA can access the data of the objA object. How does this work? How does the memberFuncA code know where objA's data is? It is not passed in as an argument, as you can see. Is the compiler silently passing all of objA's data as arguments? Or is it passing objA's address as an argument to memberFuncA? Or is it something else?
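To experiment with the two call forms, the snippet above can be completed into something that compiles. This is only a sketch: the bodies of funcA and memberFuncA, the m_nData member of this classA, its initial value, and the callBoth wrapper are all assumptions made for illustration, not the article's actual code.

```cpp
#include <cassert>

// Hypothetical free function; the article does not show funcA's body.
int funcA(int a, int b) { return a + b; }

// Hypothetical class; only the shape of the member call matters here.
class classA {
public:
    classA() : m_nData(10) {}   // initial value is an assumption
    // Uses the object's own data in addition to its two arguments.
    int memberFuncA(int a, int b) { return m_nData + a + b; }
    int m_nData;
};

// Makes both kinds of call, in the same order as the snippet above.
int callBoth() {
    int result1 = funcA(1, 2);             // non-member call
    classA objA;
    int result2 = objA.memberFuncA(1, 2);  // member call on objA
    assert(result1 == 3);
    assert(result2 == 13);
    return result1 + result2;
}
```

Calling callBoth() from main and stepping through it in the debugger gives a side-by-side view of the two call sites. Note that memberFuncA reads m_nData even though no argument carries it, which is exactly the question posed above.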
Digging into Disassembly
You have the background from the earlier parts on how to analyze this. Start digging in now.
- Fire up Visual Studio 2005. Choose Win32 as the project type and choose Win32 Console Application template. Enter a name, say classes. Click Finish to create the project.
- Press Alt+F7 to invoke project properties. Now, to make learning easy, turn off certain settings that cause the compiler to emit some code that will make it harder to understand the core concepts for this article. I will try to address the implications of these settings at a future time.
- Go to Configuration Properties->C/C++->General. Herein, set Debug Information Format to Program Database(/Zi).
- Go to Configuration Properties->C/C++->Code Generation. Herein, set Basic Runtime Checks to Default.
- Go to Configuration Properties->Linker->General. Herein, set Enable Incremental Linking to No.
- Hit OK.
- Modify the code as shown below and put a breakpoint on line 21 (place the caret on line 21 and press F9 to set a breakpoint):
- Press F7 to do a build.
- Press F5 to start debugging. The program execution stops at line 21 now.
- Press Alt+5. This brings up the Registers window.
- Press Alt+6. This brings up a memory watch window.
- Place the caret on line 21, right-click, and choose Go To Disassembly.
- In the disassembly view, right-click again and make sure you have the following checked:
- Show Address
- Show Source Code
- Show Code Bytes
- Now, start analyzing the portions of the code. The disassembly looks like this:
- Next, note the two push instructions marked blue. These correspond to the pushing of the function arguments prior to calling the functions. This is expected, as you learnt in Part 1.
- Now comes the interesting part. So far, you have seen that, for __stdcall and __cdecl calls, the arguments are pushed on the stack from right to left, and then an immediate call instruction is issued. However, that doesn't seem to be the case here. Before both of these call instructions, you see the following line:
lea ecx,[ebp-4]
This looks interesting, but what is it? From Part 3, you already know about EBP: it holds the frame pointer, and addresses below EBP hold the local variables. So, some local variable is being referenced here. The only local variable you have is objA, so the instruction must refer to objA. That makes sense: you are calling a member function of objA, so it is natural for the compiler to emit code that refers to it. But what is ECX doing here? That brings me to another Microsoft-specific calling convention called __thiscall.
__thiscall is a calling convention for calling class member functions. It is the default convention the compiler will use if nothing is specified explicitly. The __thiscall convention is characterized by the following:
- The arguments are passed from right to left.
- The address of the object for which the member function is called is passed in the ECX register.
- The stack cleanup responsibility is the callee's. This means that the compiler cannot use the __thiscall convention if the function takes a variable number of arguments, because a variable number of arguments would necessitate that the caller cleans up the stack.
- Coming back to the line lea ecx, [ebp-4], what you just learned is exactly what is being done. The address (location EBP-4) is being set into ECX before calling the member function.
- Now that the address of the object is passed via the ECX register, you may have guessed that the member function uses ECX to access the object's data. You guessed right. To confirm, step over the two lines by pressing F10 and step into the Expand function by pressing F11 on the call instruction. The disassembly now looks like this:
- Analyzing this, the first two lines are the typical stack frame preparation you learned in Part 3.
- The next two lines (circled) are the compiler storing the contents of ECX in the stack frame. From this point on, whenever the code needs to reach objA's member data, it goes to EBP-4 and reads the contents there. Those contents are the address of the objA object. Once the code has the address of objA, each data member is found at that address plus some offset n.
- That is what happens in the next few lines.
mov eax,dword ptr [ebp-4] gets the address of objA into EAX.
mov ecx,dword ptr [eax] loads the DWORD at offset 0 from the address of objA. Because objA has only one data member, this statement fetches the m_nData value.
imul ecx,dword ptr [ebp+8] is the multiplication. [EBP+8], as you know from Part 3, is the location of the first argument to the function Expand.
mov edx,dword ptr [ebp-4] and mov dword ptr [edx],ecx use [ebp-4] again, which means the compiler is once more reaching for objA's data. If you look at the source code, it makes perfect sense: the result of the multiplication must be stored back into m_nData, and that is what these two instructions do.