E: A bit about garbage collection | CodeGuru

E: A bit about garbage collection

Bruce Eckel’s Thinking in Java Contents | Prev | Next It’s hard to believe that Java could possibly be as fast or faster than C++ . This assertion has yet to be proven to my satisfaction. However, I’ve begun to see that many of my doubts about speed come from early implementations that were not […]

Written By
CodeGuru Staff
CodeGuru Staff
Mar 1, 2001
8 minute read
CodeGuru content and product recommendations are editorially independent. We may make money when you click on links to our partners. Learn More

It’s


hard to believe that Java could possibly be as fast or faster than C++


. This
assertion has yet to be proven to my satisfaction. However, I’ve begun to
see that many of my doubts about
speed
come from early implementations that were not particularly efficient so there
was no model at which to point to explain how Java could be fast.

Part


of the way I’ve thought about speed has come from being cloistered with


the

C++
model. C++ is very focused on everything happening statically, at compile time,
so that the run-time image of the program is small and fast. C++ is also based
directly on the C model, primarily for backwards compatibility, but sometimes
simply because it worked a particular way in C so it was the easiest approach
in C++. One of the most important cases is the way memory is managed in C and
C++, and this has to do with one of my more fundamental assertions about why
Java must be slow: in Java, all objects must be created on the heap.

In


C++, creating objects on the

stack
is fast because when you enter a particular scope the stack pointer is moved
down once to allocate storage for all the stack-based objects created in that
scope, and when you leave the scope (after all the local destructors have been
called) the stack pointer is moved up once. However, creating heap objects in
C++ is typically much slower because it’s based on the C concept of a
heap as a big pool of memory that (and this is essential) must be recycled.
When you call
delete
in C++ the released memory leaves a hole in the heap, so when you call
new,
the storage allocation mechanism must go seeking to try to fit the storage for
your object into any existing holes in the heap or else you’ll rapidly
run out of heap storage. Searching for available pieces of memory is the reason
that allocating heap storage has such a performance impact in C++, so
it’s far faster to create stack-based objects.

Again,


because so much of C++ is based on doing everything at compile-time, this makes


sense. But in Java there are certain places where things happen more


dynamically and it changes the model. When it comes to creating objects, it


turns out that the garbage collector can have a significant impact on


increasing the speed of object creation. This might sound a bit odd at first


– that storage release affects storage allocation – but it’s


the way some JVMs work and it means that allocating storage for heap objects in


Java can be nearly as fast as creating storage on the stack in C++.

You


can think of the C++ heap (and a slow implementation of a Java heap) as a yard


where each object stakes out its own piece of turf. This real estate can become


abandoned sometime later and must be reused. In some JVMs, the Java heap is


quite different; it’s more like a conveyor belt that moves forward every


time you allocate a new object. This means that object storage allocation is


remarkably rapid. The “heap pointer” is simply moved forward into


virgin territory, so it’s effectively the same as C++’s stack


allocation. (Of course, there’s a little extra overhead for bookkeeping


but it’s nothing like searching for storage.)

Now


you might observe that the heap isn’t in fact a conveyor belt, and if you


treat it that way you’ll eventually start paging memory a lot (which is a


big performance hit) and later run out. The trick is that the garbage collector


steps in and while it collects the garbage it compacts all the objects in the


heap so that you’ve effectively moved the “heap pointer”


closer to the beginning of the conveyor belt and further away from a page


fault. The garbage collector rearranges things and makes it possible for the


high-speed, infinite-free-heap model to be used while allocating storage.

To


understand how this works, you need to get a little better idea of the way the


different garbage collector (GC) schemes work. A simple but slow GC technique


is reference counting. This means that each object contains a reference


counter, and every time a handle is attached to an object the reference count


is increased. Every time a handle goes out of scope or is set to


null

,


the reference count is decreased. Thus, managing reference counts is a small


but constant overhead that happens throughout the lifetime of your program. The


garbage collector moves through the entire list of objects and when it finds


one with a reference count of zero it releases that storage. The one drawback


is that if objects circularly refer to each other they can have non-zero


reference counts while still being garbage. Locating such self-referential


groups requires significant extra work for the garbage collector. Reference


counting is commonly used to explain one kind of garbage collection but it


doesn’t seem to be used in any JVM implementations.

In


faster schemes, garbage collection is not based on reference counting. Instead,


it is based on the idea that any non-dead object must ultimately be traceable


back to a handle that lives either on the stack or in static storage. The chain


might go through several layers of objects. Thus, if you start in the stack and


the static storage area and walk through all the handles you’ll find all


the live objects. For each handle that you find, you must trace into the object


that it points to and then follow all the handles in


that

object, tracing into the objects they point to, etc., until you’ve moved


through the entire web that originated with the handle on the stack or in


static storage. Each object that you move through must still be alive. Note


that there is no problem with detached self-referential groups – these


are simply not found, and are therefore automatically garbage.

In


the approach described here, the JVM uses an


adaptive

garbage-collection


scheme, and what it does with the live objects that it locates depends on the


variant currently being used. One of these variants is


stop-and-copy

.


This means that, for reasons that will become apparent, the program is first


stopped (this is not a background collection scheme). Then, each live object


that is found is copied from one heap to another, leaving behind all the


garbage. In addition, as the objects are copied into the new heap they are


packed end-to-end, thus compacting the new heap (and allowing new storage to


simply be reeled off the end as previously described).

Of


course, when an object is moved from one place to another, all handles that


point at (reference) that object must be changed. The handle that comes from


tracing to the object from the heap or the static storage area can be changed


right away, but there can be other handles pointing to this object that will be


encountered later during the “walk.” These are fixed up as they are


found (you could imagine a hash table mapping old addresses to new ones).

There


are two issues that make copy collectors inefficient. The first is the idea


that you have two heaps and you slosh all the memory back and forth between


these two separate heaps, maintaining twice as much memory as you actually


need. Some JVMs deal with this by allocating the heap in chunks as needed and


simply copying from one chunk to another.

The


second issue is the copying. Once your program becomes stable it might be


generating little or no garbage. Despite that, a copy collector will still copy


all the memory from one place to another, which is wasteful. To prevent this,


some JVMs detect that no new garbage is being generated and switch to a


different scheme (this is the “adaptive” part). This other scheme


is called


mark
and sweep

,


and it’s what Sun’s JVM uses all the time. For general use mark and


sweep is fairly slow, but when you know you’re generating little or no


garbage it’s fast.

Mark


and sweep follows the same logic of starting from the stack and static storage


and tracing through all the handles to find live objects. However, each time it


finds a live object that object is marked by setting a flag in it, but the


object isn’t collected yet. Only when the marking process is finished


does the sweep occur. During the sweep, the dead objects are released. However,


no copying happens, so if the collector chooses to compact a fragmented heap it


does so by shuffling objects around.

The


“stop-and-copy” refers to the idea that this type of garbage


collection is


not

done in the background; instead, the program is stopped while the GC occurs. In


the Sun literature you’ll find many references to garbage collection as a


low-priority background process, but it turns out that this was a theoretical


experiment that didn’t work out. In practice, the Sun garbage collector


is run when memory gets low. In addition, mark-and-sweep requires that the


program be stopped.

As


previously mentioned, in the JVM described here memory is allocated in big


blocks. If you allocate a large object, it gets its own block. Strict


stop-and-copy requires copying every live object from the source heap to a new


heap before you could free the old one, which translates to lots of memory.


With blocks, the GC can typically use dead blocks to copy objects to as it


collects. Each block has a


generation
count

to keep track of whether it’s alive. In the normal case, only the blocks


created since the last GC are compacted; all other blocks get their generation


count bumped if they have been referenced from somewhere. This handles the


normal case of lots of short-lived temporary objects. Periodically, a full


sweep is made – large objects are still not copied (just get their


generation count bumped) and blocks containing small objects are copied and


compacted. The JVM monitors the efficiency of GC and if it becomes a waste of


time because all objects are long-lived then it switches to mark-and-sweep.


Similarly, the JVM keeps track of how successful mark-and-sweep is, and if the


heap starts to become fragmented it switches back to stop-and-copy. This is


where the “adaptive” part comes in, so you end up with a mouthful:


“adaptive generational stop-and-copy mark-and-sweep.”

There


are a number of additional speedups possible in a JVM. An especially important


one involves the operation of the loader and Just-In-Time (JIT) compiler. When


a class must be loaded (typically, the first time you want to create an object


of that class), the


.class

file is located and the byte codes for that class are brought into memory. At


this point, one approach is to simply JIT all the code, but this has two


drawbacks: it takes a little more time, which, compounded throughout the life


of the program, can add up; and it increases the size of the executable (byte


codes are significantly more compact than expanded JIT code) and this might


cause paging, which definitely slows down a program. An alternative approach is


lazy
evaluation,

which means that the code is not JIT compiled until necessary. Thus, code that


never gets executed might never get JIT compiled.

Because


JVMs are external to browsers, you might expect that you could benefit from the


speedups of some JVMs while using any browser. Unfortunately, JVMs don’t


currently interoperate with different browsers. To get the benefits of a


particular JVM, you must either use the browser with that JVM built in or run


standalone Java applications.


E

Contents

|

Prev

|

Next
CodeGuru Logo

CodeGuru covers topics related to Microsoft-related software development, mobile development, database management, and web application programming. In addition to tutorials and how-tos that teach programmers how to code in Microsoft-related languages and frameworks like C# and .Net, we also publish articles on software development tools, the latest in developer news, and advice for project managers. Cloud services such as Microsoft Azure and database options including SQL Server and MSSQL are also frequently covered.

Property of TechnologyAdvice. © 2026 TechnologyAdvice. All Rights Reserved

Advertiser Disclosure: Some of the products that appear on this site are from companies from which TechnologyAdvice receives compensation. This compensation may impact how and where products appear on this site including, for example, the order in which they appear. TechnologyAdvice does not include all companies or all types of products available in the marketplace.