CLR Method Internals (.NET 2.0)

Simple tasks that you typically take for granted can be surprisingly complex when you peek under the hood. Method calls are the bread and butter of C# and VB programming, but a lot of moving pieces go into making them work. In this article, we'll take a quick look at what actually happens when you call a method.

When a method call on the CLR is made, the caller and callee must communicate a set of information with each other. The abstraction that contains this information is called an activation frame. The caller supplies the this pointer for instance methods, additional arguments for the method, and return address information, while the receiver must give back the return value of the method, ensure that the stack has been cleaned up, and return to the caller's address. All of this requires that a standard method-calling process be in place. This is referred to as a calling convention, of which there are several options on Windows.

Activation frames are implemented using a combination of registers and the physical OS stack, and are managed by the CLR's JIT Compiler. There isn't a single "activation frame object"; as noted above, a frame is just a convention followed by the caller and callee. In addition, the CLR manages its own stack of frames to mark transitions in the call stack (for example, managed-to-unmanaged calls and security asserts), and uses that information to find the addresses of GC roots that are active in the call stack. These frames are stored on the stack and referred to by the Thread Environment Block (TEB).

There are a number of ways to make method calls on the CLR. From entirely static to entirely dynamic and everywhere in between (e.g. call, callvirt, calli, delegates), we'll take a look at each. The primary difference between the various method calls is the mechanism used to find the target address to which the generated native code must call.

We'll use this set of types in our examples below:

using System;
using System.Runtime.CompilerServices;

class Foo
{
    public int f(string s, int x, int y)
    {
        Console.WriteLine("Foo::f({0},{1},{2})", s, x, y);
        return x * y;
    }

    public virtual int g(string s, int x, int y)
    {
        Console.WriteLine("Foo::g({0},{1},{2})", s, x, y);
        return x + y;
    }
}

class Bar : Foo
{
    public override int g(string s, int x, int y)
    {
        Console.WriteLine("Bar::g({0},{1},{2})", s, x, y);
        return x - y;
    }
}

delegate int Baz(string s, int x, int y);

Furthermore, we'll imagine the following variables are in scope for examples below:

Foo f = new Foo();
Bar b = new Bar();

The CLR's jitted code uses the fastcall Windows calling convention. This permits the caller to supply the first two arguments (including this in the case of instance methods) in the machine's ECX and EDX registers. Registers are significantly faster than using the machine's stack, which is where the remaining arguments are supplied, in right-to-left order (using the push instruction).


Ordinary Calls (call)

You might have already guessed the primary native-code difference between an ordinary call and a virtual call. Simply put, a virtual call consults the method-table of the object on which the method is being dispatched to determine the slot address to use for the call, while an ordinary call just uses the token supplied at the call-site to determine the method-table slot address at JIT compile time. Slot offsets for virtual calls are still determined statically at JIT time, so they are quite fast. Method-table layout is such that overridden virtual methods inherited from base classes occupy the same slots, ensuring the index for a particular method doesn't depend on the runtime type.

Normal method calls (i.e., the IL call instruction, or callvirts to non-virtual methods) are very fast. The JIT Compiler is able to burn the precise address of the target method-table slot at the call-site because it knows the location at compile time.

Let's consider an example:

int ff = f.f("Hi", 10, 10);
int bf = b.f("Hi", 10, 10);

In this case, we're calling the method f as defined on Foo. Although we use the b variable in the second line to make the call, f is non-virtual and thus the call always goes through Foo's definition. The jitted native code for both (in this example, IA-32 code) will be nearly identical:

mov   ecx,esi 
mov   edx,dword ptr ds:[01B4303Ch] 
push  0Ah  
push  0Ah  

Remember, the first two arguments are passed in ECX and EDX, respectively. Our this pointer (constructed above with the Foo f = new Foo() C# code) resides in ESI, so we simply mov it into ECX. Then we move the pointer to the string "Hi" into EDX; the exact address will of course vary from program to program. Since we are passing two additional parameters beyond the two carried in registers, we pass them on the machine's stack; 0Ah is hexadecimal for the integer 10, so we push it twice (once for each argument).

Lastly, we make a call to a statically known address. This address refers to the method-table slot, in this case Foo::f's, and is discovered at JIT compile time by matching the supplied method token with the internal CLR method-table data structure:

call FFFC0D28

The second call — through the b variable — differs only in that it passes b's value in the ECX register. The target address of the call is the same:

mov   ecx,edi 
mov   edx,dword ptr ds:[01B4303Ch] 
push  0Ah  
push  0Ah  
call  FFFC0D28 

After performing the call to FFFC0D28 in this example, the JIT stub will either jmp straight to the jitted code or invoke the JIT compiler (with a call) if the method's code has not yet been compiled.

Virtual Method Calls (callvirt)

A virtual method call is very much like an ordinary call, except that it must look up the target of the call at runtime based on the this object. For example, consider this code:

int fg = f.g("Hi", 10, 10);
int bg = b.g("Hi", 10, 10);

The manner in which the this pointer and its arguments are passed is identical to the call example above. ESI is moved into ECX for the dispatch on f and EDI is moved into ECX for the dispatch on b. The difference is that the call target can't be burned into the call-site. Instead, we indirectly go through the method-table to get at the address:

mov   eax,dword ptr [ecx] 
call  dword ptr [eax+38h]

We first dereference ECX, which holds the this pointer, and store the result (the address of the object's method-table) in EAX. Then we call through the address stored at offset 38h in that table, which is the correct slot for g. Because the table's address was discovered through the this pointer, we inspect a different method-table for f and b; the call through b therefore ends up going through its overridden version. Remember, we stated above that all classes in a hierarchy use the same slot offsets for a given method, so this same offset works for all derived classes.

The full IA-32 for this calling sequence (using the f variable) is:

mov   ecx,esi 
mov   edx,dword ptr ds:[01B4303Ch] 
push  0Ah  
push  0Ah  
mov   eax,dword ptr [ecx] 
call  dword ptr [eax+38h]

Again, the only difference when b is used is that EDI, instead of ESI, is moved into ECX.

Indirect Method Calls (calli)

C# doesn't supply a mechanism with which to emit a calli instruction in the IL. You can, of course, emit code using reflection, but an example would introduce more complexity than necessary. If you were to imagine a calli sequence being JIT compiled, the only difference would be that the native call instruction performs a call dword ptr [exx], where exx is the register holding the target address of the calli. That is, it calls the address to which the indirect pointer refers. All of the arguments are passed in accordance with the signature token supplied to the calli instruction.

Dynamic Method Calls (Delegates, Others)

There is a range of dynamic method calls available. Many of them are part of the dynamic programming infrastructure supplied by reflection, and thus won't be explored in depth here. They are all variants on the same basic premise, which is that some piece of runtime functionality is able to look up the method-table information at runtime to make a method dispatch. The runtime can then, of course, make calls to this code as requested, based on information supplied by the programmer.

Delegates are an interesting special case of this capability. A delegate is essentially just a strongly typed function pointer type, an instance of which has two pieces of information: the target object (to be passed as this), and the target method token. Each delegate type has a special Invoke method whose signature matches the function over which it has been formed. The CLR supplies the implementation of this method, which enables it to perform lightweight dispatch to the underlying method.

A call to a delegate looks identical to a call to a normal method. The difference is that the target is the delegate's Invoke method-table slot instead of the actual underlying function. Arguments are laid out as with any other type of call (i.e., fastcall). The implementation of Invoke simply patches the ECX register to contain the target object reference (supplied at delegate construction time) and uses the method token (also supplied at construction time) to jump to the appropriate method slot. There is very little overhead in this process, which puts the cost of delegate dispatch on the order of one to two times that of a simple virtual method call.

The various other styles of method dispatch, such as Type.InvokeMember and MethodInfo.Invoke, all add a certain level of overhead compared to delegates, because they must go through the process of binding to the target method: matching dynamic type, method name, and argument information against the list of known loaded types. Delegates typically don't suffer this penalty because the target method token is embedded in the IL. You may dynamically construct and invoke delegates (e.g., with DynamicInvoke), which adds a comparable level of overhead for the construction and binding process. Another penalty of pure dynamic invocation is that these mechanisms tend to pass arguments as object[]s. The dispatching code inside the CLR must unravel that array into the appropriate calling convention to perform the invocation, and then marshal the return value back.

Wrapping Up

This was a very brief overview of something that is incredibly deep. More details, including performance characteristics and how you can explore some of these implementation details by spelunking in the Visual Studio debugger, are outlined in the accompanying MSDN video.

This article is adapted from Professional .NET Framework 2.0 by Joe Duffy (Wrox, 2006, ISBN: 0-7645-7135-4), from chapter 3 "Inside the CLR."

Copyright 2006 by WROX. All rights reserved. Reproduced here by permission of the publisher.

About the Author

Joe Duffy

Joe Duffy is a Program Manager on the CLR Team at Microsoft, where he works on WinFX and the .NET Framework. He is also the author of Professional .NET Framework 2.0 (Wrox, 2006, ISBN: 0-7645-7135-4).

