Inside the .NET Managed Heap

Of all of the technologies to be found in .NET, the most controversial seems to be garbage collection. A key part of the .NET framework, the managed heap and the garbage collection mechanism are foreign ideas to many of us. This installment of my .NET column discusses the managed heap, and how you can take advantage of it.

Why a Managed Heap?

The .NET framework includes a managed heap that all .NET languages use when allocating reference type objects. Lightweight objects known as value types are allocated inline, either on the stack for local variables or inside the object that contains them, but all instances of classes and arrays are created from a pool of memory known as the managed heap.

The basic algorithm used by the garbage collector is quite simple:

  • Mark all managed memory as garbage
  • Look for used memory blocks, and mark them as valid
  • Discard all unused memory blocks
  • Compact the heap
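The four steps can be sketched with a toy heap. This is an illustration only and not how the runtime is implemented: ToyObject, ToyHeap, and the root list are hypothetical names, and the "heap" is simply a list whose order stands in for address order.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a heap object: a name, a mark bit, and outgoing references.
public class ToyObject
{
    public string Name;
    public bool Marked;
    public List<ToyObject> References = new List<ToyObject>();
    public ToyObject(string name) { Name = name; }
}

public class ToyHeap
{
    // Slot order stands in for address order in a real heap.
    public List<ToyObject> Slots = new List<ToyObject>();
    // Roots play the role of stack variables and statics.
    public List<ToyObject> Roots = new List<ToyObject>();

    public ToyObject Allocate(string name)
    {
        ToyObject obj = new ToyObject(name);
        Slots.Add(obj);
        return obj;
    }

    // Runs one collection and returns the number of objects discarded.
    public int Collect()
    {
        // Step 1: assume everything is garbage.
        foreach (ToyObject obj in Slots) obj.Marked = false;

        // Step 2: mark everything reachable from the roots.
        Stack<ToyObject> pending = new Stack<ToyObject>(Roots);
        while (pending.Count > 0)
        {
            ToyObject current = pending.Pop();
            if (current.Marked) continue;
            current.Marked = true;
            foreach (ToyObject child in current.References) pending.Push(child);
        }

        // Steps 3 and 4: drop unmarked objects; survivors slide together in order.
        int before = Slots.Count;
        Slots.RemoveAll(obj => !obj.Marked);
        return before - Slots.Count;
    }
}

public static class ToyHeapDemo
{
    public static void Main()
    {
        ToyHeap heap = new ToyHeap();
        ToyObject a = heap.Allocate("a");
        heap.Allocate("b");             // allocated but never rooted
        heap.Allocate("temp");          // allocated but never rooted
        ToyObject c = heap.Allocate("c");
        a.References.Add(c);            // c is reachable through a
        heap.Roots.Add(a);

        int freed = heap.Collect();
        Console.WriteLine("Freed {0}, {1} remain.", freed, heap.Slots.Count);
    }
}
```

Note that "c" survives even though it is not a root, because it is reachable from "a". Reachability, not reference counting, decides what is garbage.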

Managed Heap Optimizations

The high-level overview seems simple enough, but the actual steps taken by the garbage collector and other parts of the heap management system are not trivial, and often involve optimizations designed to improve performance. For example, a garbage collection pass of the entire memory pool can be quite expensive. However, studies show that most objects allocated from a managed heap have a very short lifetime, so the heap is divided into three sections, known as generations. Newly allocated objects are placed into generation zero, and this generation is always collected first—it is the most likely place to find some unused memory, and due to its small size (it's small enough to fit into the processor's L2 cache) a collection here is extremely fast and productive.

Another optimization found in the managed heap concerns a principle known as locality of reference. This principle says that objects allocated together are often used together. If objects can be located in close proximity in the heap, cache performance is improved. Due to the nature of the managed heap, objects are always allocated on contiguous addresses, and the heap is kept compacted, resulting in objects growing closer together, never further apart. This is in contrast to the standard heap offered to unmanaged code, where the heap can be easily fragmented, and objects allocated together are often located relatively far apart in the heap.

Still another optimization concerns large objects. In general, large objects tend to have very long lifetimes. When a large object is allocated in the .NET managed heap, it is allocated from a special portion of the heap that is never compacted. The performance hit incurred by moving a large object outweighs the performance gain found by compacting that portion of the heap.
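One way to observe the special treatment of large objects is to ask the runtime which generation a freshly allocated array belongs to. The sketch below is an experiment and not a specification. LargeObjectDemo and GenerationOfNewArray are names I made up. The two sizes are assumptions chosen to sit well on either side of the large-object threshold, which is an implementation detail of the runtime.

```csharp
using System;

public static class LargeObjectDemo
{
    // Allocates a byte array of the given size and reports its generation.
    public static int GenerationOfNewArray(int byteCount)
    {
        byte[] buffer = new byte[byteCount];
        return GC.GetGeneration(buffer);
    }

    public static void Main()
    {
        Console.WriteLine("Small array starts in generation {0}.",
            GenerationOfNewArray(1024));
        Console.WriteLine("Large array starts in generation {0}.",
            GenerationOfNewArray(1000000));
        Console.WriteLine("Oldest generation is {0}.", GC.MaxGeneration);
    }
}
```

On the Microsoft runtimes I have tried, the large array reports the oldest generation immediately, which matches the idea that large objects are treated as long-lived from the start. Other runtimes or settings may report something different.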

What about External Resources?

The garbage collector efficiently handles freeing resources from the managed heap, but a collection is initiated only in response to memory pressure. What about classes that manage limited external resources such as database connections or Windows handles? Waiting until a garbage collection is triggered to clean up database connections or file handles can severely degrade system performance.

Classes that hold external resources should implement a Close or Dispose method that is called by clients when the object is no longer used. Beginning in Beta 2, the Dispose pattern is formalized via the IDisposable interface, which is discussed later in this article.

Classes that need to clean up resources should also implement a finalizer. In C#, the preferred way to create a finalizer is to implement a destructor, while at the framework level, a finalizer is created by overriding the System.Object.Finalize method. These two methods of implementing a finalizer are equivalent:

~OverdueBookLocator()
{
    Dispose(false);
}

and:

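protected override void Finalize()
{
    // What the compiler generates for the destructor above: clean up first,
    // then always chain to the base class finalizer.
    try { Dispose(false); }
    finally { base.Finalize(); }
}

In C#, write the destructor form. It is an error to implement both a Finalize method and a destructor, and shipping C# compilers also reject a hand-written override of Object.Finalize.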

You should not create a destructor or finalization routine if you don't absolutely require finalization semantics. Finalizers decrease system performance, and increase the memory pressure on the runtime. Also, due to the way that finalizers are executed, you can't guarantee when or even if a finalizer will be called.

Allocation and Garbage Collection Details

With the GC overview out of the way, let's discuss the details of how allocation and collection work in the managed heap. The managed heap looks nothing like a traditional heap that you're accustomed to using in C++ programming. In a traditional heap, a data structure is used to track free chunks of memory. Searching for a block of memory of a particular size can be a time-consuming chore, especially if the heap becomes fragmented. By contrast, in the managed heap memory is arranged as a contiguous array; a pointer tracks the boundary between allocated and free memory. As memory is allocated, the pointer is simply incremented—resulting in much higher performance for allocations.
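The pointer-bump idea is easy to sketch. The BumpAllocator class below is a hypothetical toy that hands out offsets into a fixed-size block. It is not the runtime's allocator, but it shows why allocation needs no free-list search.

```csharp
using System;

// Hypothetical toy allocator: hands out offsets into a fixed-size block.
public class BumpAllocator
{
    private readonly int capacity;
    private int next;   // boundary between allocated and free space

    public BumpAllocator(int capacity) { this.capacity = capacity; }

    // Returns the offset of the new block, or -1 if there is not enough room
    // (the point at which a real managed heap would start a collection).
    public int Allocate(int size)
    {
        if (size <= 0) throw new ArgumentOutOfRangeException("size");
        if (size > capacity - next) return -1;
        int offset = next;
        next += size;   // allocation is a single pointer bump
        return offset;
    }
}

public static class BumpAllocatorDemo
{
    public static void Main()
    {
        BumpAllocator heap = new BumpAllocator(64);
        Console.WriteLine(heap.Allocate(16)); // 0
        Console.WriteLine(heap.Allocate(24)); // 16
        Console.WriteLine(heap.Allocate(32)); // -1, only 24 bytes are left
    }
}
```

The price of this simplicity is that free space must stay contiguous, which is why the managed heap compacts after a collection instead of leaving holes.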

When objects are allocated, they are initially placed in generation zero. When generation zero nears its maximum size, a collection is initiated against generation zero only. This is a very fast GC pass, due to the small size of generation zero. A generation zero collection always results in generation zero being completely flushed. Any objects that are discovered to be garbage are freed, and any objects actually in use are compacted and promoted to generation one.

As generation one nears its maximum size due to the number of objects aging into it from generation zero, a collection phase will be initiated against generations zero and one. As with a generation zero collection, all unused objects will be freed, and valid objects will be compacted and moved into the next generation. Most GC passes target generation zero, which is a target-rich environment due to the number of unused temporary objects found there. Generation two collection passes are the most costly, and are only initiated when generation zero and one collections don't free enough memory. If a generation two collection pass is not able to free enough memory for an allocation, an OutOfMemoryException is thrown.
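Promotion can be watched directly by forcing collections while holding on to an object. The sketch below uses only GC.GetGeneration, GC.Collect, and GC.KeepAlive. PromotionDemo and TrackSurvivor are hypothetical names, and the exact numbers printed depend on the runtime, so treat the output as an observation and not a guarantee.

```csharp
using System;

public static class PromotionDemo
{
    // Returns the generation of one surviving object before any collection
    // and after each of two forced collections.
    public static int[] TrackSurvivor()
    {
        object survivor = new object();
        int[] generations = new int[3];
        generations[0] = GC.GetGeneration(survivor);
        GC.Collect();
        generations[1] = GC.GetGeneration(survivor);
        GC.Collect();
        generations[2] = GC.GetGeneration(survivor);
        GC.KeepAlive(survivor);  // keep the reference live across both collections
        return generations;
    }

    public static void Main()
    {
        int[] g = TrackSurvivor();
        Console.WriteLine("Generation at allocation: {0}", g[0]);
        Console.WriteLine("After first collection:   {0}", g[1]);
        Console.WriteLine("After second collection:  {0}", g[2]);
    }
}
```

Forcing collections like this is useful for experiments, but it defeats the tuning the collector does on its own, so it does not belong in production code.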

Objects that require finalization complicate the collection process. When an object with a finalizer is identified as garbage, it is not immediately released. Instead it's placed into a finalization queue, which establishes a reference to the object, preventing its collection. A background thread executes the finalizer for each object, and removes the finalized object from the finalization queue. Only after finalization can the object be removed from memory during the next collection pass. As a side effect of surviving one collection pass, the object is promoted into a higher generation, further delaying its eventual collection.
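A small experiment shows the finalizer running on a thread other than the one that allocated the object. Tracked and FinalizationDemo are hypothetical names. The non-inlined helper is an assumption on my part about how to keep the JIT from holding a hidden reference, and even so the runtime does not promise that the finalizer will have run by the time the count is printed.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Threading;

public class Tracked
{
    public static int FinalizedCount;
    public static int FinalizerThreadId;

    ~Tracked()
    {
        // Record which thread ran the finalizer.
        FinalizerThreadId = Thread.CurrentThread.ManagedThreadId;
        Interlocked.Increment(ref FinalizedCount);
    }
}

public static class FinalizationDemo
{
    // Allocating in a separate, non-inlined method keeps the caller's frame
    // from holding a reference to the new object.
    [MethodImpl(MethodImplOptions.NoInlining)]
    static void AllocateAndAbandon()
    {
        new Tracked();
    }

    public static void Main()
    {
        AllocateAndAbandon();
        GC.Collect();                   // finds the object unreachable and schedules its finalizer
        GC.WaitForPendingFinalizers();  // blocks until the finalizer thread has caught up
        Console.WriteLine("Finalizers run: {0}", Tracked.FinalizedCount);
        Console.WriteLine("Main thread {0}, finalizer thread {1}",
            Thread.CurrentThread.ManagedThreadId, Tracked.FinalizerThreadId);
    }
}
```

Because the finalizer runs on a different thread, any state it touches must be safe to access from that thread.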

Classes that require finalization should implement the IDisposable interface in order to allow clients to short-circuit finalization. IDisposable declares a single method, Dispose. This interface, which was introduced for Beta 2, formalizes a pattern that was in wide use even prior to Beta 2. Essentially, an object requiring finalization exposes a Dispose method. This method is expected to free external resources and suppress finalization, as shown in this typical code fragment:

public class OverdueBookLocator: IDisposable
{
    ~OverdueBookLocator()
    {
        InternalDispose(false);
    }

    public void Dispose()
    {
        InternalDispose(true);
    }

    protected void InternalDispose(bool disposing)
    {
        if(disposing)
        {
            GC.SuppressFinalize(this);
            // Dispose of managed objects if disposing.
        }
        // free external resources here
        // ...
    }
}

Note that there are two ways that the object is cleaned up. The first way is through the Dispose method from the IDisposable interface. This method is called by client code when an object is explicitly terminated; this method calls InternalDispose(true). All objects are cleaned up in this case. If the destructor is called, InternalDispose(false) is called, and only external resources are released. If we are being finalized, managed objects that we own may have already been finalized themselves, and using them may cause an exception to be generated.

The call to GC.SuppressFinalize prevents the garbage collector from placing the object into the finalization queue. This reduces memory pressure because the object can be freed in a single GC pass, and improves performance because the finalizer does not need to be executed.
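The effect of GC.SuppressFinalize can be checked by counting finalizer runs for one disposed object and one abandoned object. This is a sketch under the same caveats as any finalization experiment: Resource and SuppressDemo are hypothetical names, and the runtime does not promise that the abandoned object's finalizer will have run by the time the count is printed.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Threading;

public class Resource : IDisposable
{
    public static int FinalizerRuns;

    ~Resource()
    {
        Interlocked.Increment(ref FinalizerRuns);
    }

    public void Dispose()
    {
        // Release external resources here, then tell the GC that the
        // finalizer no longer needs to run for this instance.
        GC.SuppressFinalize(this);
    }
}

public static class SuppressDemo
{
    // Non-inlined so the caller's frame holds no references to either object.
    [MethodImpl(MethodImplOptions.NoInlining)]
    static void CreateTwo()
    {
        new Resource().Dispose();   // disposed: finalizer suppressed
        new Resource();             // abandoned: finalizer still pending
    }

    public static void Main()
    {
        CreateTwo();
        GC.Collect();
        GC.WaitForPendingFinalizers();
        Console.WriteLine("Finalizers run: {0}", Resource.FinalizerRuns);
    }
}
```

I would expect the count to be 1 and never 2, since the disposed instance has opted out of finalization.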

Disposal Optimizations for C#

Using the IDisposable.Dispose() method to clean up resources is a great way to relieve some of the memory pressure placed on the managed heap, and it reduces the number of objects that must undergo finalization. However, it is cumbersome to use, especially when multiple temporary objects are created. In order to properly take advantage of the IDisposable interface, a C# client would need to write code like this:

OverdueBookLocator bookLocator = null;
try
{
    bookLocator = new OverdueBookLocator();
    // Use bookLocator here
    Book book = bookLocator.Find("Eiffel, the Language");
    // ...
}
finally
{
    // Runs whether or not an exception was thrown above.
    if(bookLocator != null)
    {
        ((IDisposable)bookLocator).Dispose();
    }
}

The finally block is required so that the proper cleanup is performed if an exception is thrown. In order to simplify the use of the Dispose pattern by clients, the using statement was introduced for Beta 2. The using statement allows you to simplify your code, so the earlier example can be boiled down to:

using(OverdueBookLocator bookLocator = new OverdueBookLocator())
{
   // Use bookLocator here
   Book book = bookLocator.Find("Eiffel, the Language");
}

You should employ the using statement whenever you allocate disposable objects that have a well-defined lifetime. It guarantees proper handling of the IDisposable interface, even in the presence of exceptions.
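The exception guarantee is easy to verify with a small probe. Probe and UsingDemo are hypothetical names introduced for this sketch; the probe simply records whether Dispose was called.

```csharp
using System;

// Hypothetical disposable type that records whether Dispose was called.
public class Probe : IDisposable
{
    public bool Disposed;
    public void Dispose() { Disposed = true; }
}

public static class UsingDemo
{
    // Throws from inside a using block and reports whether Dispose still ran.
    public static bool DisposedAfterException()
    {
        Probe probe = new Probe();
        try
        {
            using (probe)
            {
                throw new InvalidOperationException("simulated failure");
            }
        }
        catch (InvalidOperationException)
        {
            // Swallowed for the demo; Dispose has already run by this point.
        }
        return probe.Disposed;
    }

    public static void Main()
    {
        Console.WriteLine(DisposedAfterException()); // True
    }
}
```

The using block expands to a try/finally, so Dispose runs before the exception reaches the catch handler.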

Using the System.GC Class

The System.GC class is used to access the garbage collection mechanism exposed by the .NET framework. This class includes the following useful methods:

  • GC.SuppressFinalize was described earlier in the column; this method inhibits finalization for an object. Call this method if you have already released external resources owned by an object.
  • GC.Collect comes in two versions. The version that has no parameter performs a full collection on all generations in the managed heap. Another version accepts an integer value representing the generation to be collected. You'll rarely need to call this method, as the garbage collector automatically runs when needed.
  • GC.GetGeneration returns the generation number for an object passed as a parameter. This method is useful when debugging or tracing for performance reasons, but has limited value in most applications.
  • GC.GetTotalMemory returns the amount of memory allocated in the heap. This number is not exact due to the way the managed heap works, but a close approximation can be obtained if true is passed as a parameter. This causes a collection to be performed before the memory usage is calculated.

An example of an audit method that uses these functions is provided below:

/// <summary>
/// Displays current GC information
/// </summary>
/// <param name="generation">The generation to collect</param>
/// <param name="waitForGC">Run GC before calculating usage?</param>
public void CollectAndAudit(int generation, bool waitForGC)
{
  int myGeneration = GC.GetGeneration(this);
  long memoryBefore = GC.GetTotalMemory(waitForGC);
  Console.WriteLine("I am in generation {0}.", myGeneration);
  Console.WriteLine("Memory before collection {0}.", memoryBefore);
  GC.Collect(generation);
  // Measure again; reusing the earlier value would always show no change.
  long memoryAfter = GC.GetTotalMemory(false);
  Console.WriteLine("Memory after collection {0}.", memoryAfter);
}

Future Columns

In my next column I'll discuss using the XSL classes in .NET and some other XML goodies. Following that article I have plans for articles on .NET remoting, as well as articles on interop with existing code, and an article on multi-threaded programming in .NET.

About the Author

Mickey Williams is the founder of Codev Technologies, a provider of tools and consulting for Windows Developers. He is also on the staff at .NET Experts (www.dotnetexperts.com), where he teaches the .NET Framework course. He has spoken at conferences in the USA and Europe, and has written eight books on Windows programming. He currently claims to be writing "Microsoft Visual C#" for MS Press. Mickey can be reached at mw@codevtech.com.