Object landscapes and lifetimes

Bruce Eckel’s Thinking in Java

Technically, OOP is just about abstract data typing, inheritance and polymorphism, but other
issues can be at least as important. The remainder of this section will cover
these issues.

One of the most important factors is the way objects are created and destroyed.
Where is the data for an object and how is the lifetime of the object
controlled? There are different philosophies at work here. C++ takes the
approach that control of efficiency is the most important issue, so it gives
the programmer a choice. For maximum run-time speed, the storage and lifetime
can be determined while the program is being written, by placing the objects on
the stack (these are sometimes called automatic or scoped variables) or in the
static storage area. This places a priority on the speed of storage allocation
and release, and control of these can be very valuable in some situations.
However, you sacrifice flexibility because you must know the exact quantity,
lifetime and type of objects while you’re writing the program. If you are
trying to solve a more general problem such as computer-aided design, warehouse
management or air-traffic control, this is too restrictive.

The second approach is to create objects dynamically in a pool of memory called the
heap. In this approach you don’t know until run time how many objects you need,
what their lifetime is or what their exact type is. Those are determined at the
spur of the moment while the program is running. If you need a new object, you
simply make it on the heap at the point that you need it. Because the storage
is managed dynamically, at run time, the amount of time required to allocate
storage on the heap is significantly longer than the time to create storage on
the stack. (Creating storage on the stack is often a single assembly
instruction to move the stack pointer down, and another to move it back up.)
The dynamic approach makes the generally logical assumption that objects tend
to be complicated, so the extra overhead of finding storage and releasing that
storage will not have an important impact on the creation of an object. In
addition, the greater flexibility is essential to solve the general programming
problem.

To give you control over efficiency, C++ lets you choose whether objects are
created while you write the program or at run time. You might think that
since it’s more flexible, you’d always want to create objects on
the heap rather than the stack. There’s another issue, however, and
that’s the lifetime of an object. If you create an object on the stack or
in static storage, the compiler determines how long the object lasts and can
automatically destroy it. However, if you create it on the heap the compiler
has no knowledge of its lifetime. There are two options for destroying such
objects: the programmer can determine programmatically when to destroy the
object, or the environment can provide a feature called a garbage collector
that automatically discovers when an object is no longer in use and destroys
it. Of course, a garbage collector is much more convenient, but it requires
that all applications be able to tolerate the existence of the garbage
collector and the overhead of garbage collection. This does not meet the
design requirements of the C++ language and so it was not included, but Java
does have a garbage collector (as does Smalltalk; Delphi does not, but one could
be added, and third-party garbage collectors exist for C++).

The rest of this section looks at additional factors concerning object lifetimes
and landscapes.

Collections and iterators

If you don’t know how many objects you’re going to need to solve a
particular problem, or how long they will last, you also don’t know how
to store those objects. How can you know how much space to create for those
objects? You can’t, since that information isn’t known until run
time.

The solution to most problems in object-oriented design seems flippant: you create
another type of object. The new type of object that solves this particular
problem holds handles to other objects. Of course, you can do the same thing
with an array, which is available in most languages. But there’s more.
This new object, generally called a collection (also called a container,
but the AWT uses that term in a different sense so this book will use
“collection”), will expand itself whenever necessary to accommodate
everything you place inside it. So you don’t need to know how many
objects you’re going to hold in a collection. Just create a collection
object and let it take care of the details.
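
As a minimal sketch of a collection that expands itself, the following assumes a modern JDK and uses java.util.ArrayList from the standard collections library. The class name GrowingCollectionDemo, the method makeFlights and the "Flight-" labels are hypothetical names invented for this illustration. The point is only that no size is stated when the collection is created.

```java
import java.util.ArrayList;
import java.util.List;

class GrowingCollectionDemo {
    // Builds a list whose final size is known only at run time.
    // The "Flight-N" labels are made-up sample data.
    static List<String> makeFlights(int count) {
        List<String> flights = new ArrayList<>(); // no capacity is specified
        for (int i = 0; i < count; i++) {
            flights.add("Flight-" + i); // the list grows as elements arrive
        }
        return flights;
    }

    public static void main(String[] args) {
        // The count may come from the command line; 1000 is an arbitrary default.
        int count = args.length > 0 ? Integer.parseInt(args[0]) : 1000;
        List<String> flights = makeFlights(count);
        System.out.println("Stored " + flights.size() + " objects");
    }
}
```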

Fortunately, a good OOP language comes with a set of collections as part of the package. In
C++, it’s the Standard Template Library (STL). Object Pascal has
collections in its Visual Component Library (VCL). Smalltalk has a very
complete set of collections. Java also has collections in its standard library.
In some libraries, a generic collection is considered good enough for all
needs, and in others (C++ in particular) the library has different types of
collections for different needs: a vector for consistent access to all
elements, and a linked list for consistent insertion at all elements, for
example, so you can choose the particular type that fits your needs. These may
include sets, queues, hash tables, trees, stacks, etc.

All collections have some way to put things in and get things out. The way that you
place something into a collection is fairly obvious. There’s a function
called “push” or “add” or a similar name. Fetching
things out of a collection is not always as apparent; if it’s an
array-like entity such as a vector, you might be able to use an indexing
operator or function. But in many situations this doesn’t make sense.
Also, a single-selection function is restrictive. What if you want to
manipulate or compare a set of elements in the collection instead of just one?

The solution is an iterator, which is an object whose job is to select the
elements within a collection and present them to the user of the iterator. As a
class, it also provides a level of abstraction. This abstraction can be used to
separate the details of the collection from the code that’s accessing that
collection. The collection, via the iterator, is abstracted to be simply a
sequence. The iterator allows you to traverse that sequence without worrying
about the underlying structure (that is, whether it’s a vector, a linked
list, a stack or something else). This gives you the flexibility to easily
change the underlying data structure without disturbing the code in your
program. Java began (in versions 1.0 and 1.1) with a standard iterator, called
Enumeration, for all of its collection classes. Java 1.2 has added a much more
complete collection library, which contains an iterator called Iterator that
does more than the older Enumeration.
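
The separation described above can be sketched as follows, assuming the java.util collections library and modern Java syntax. IteratorDemo and join are hypothetical names chosen for this example. The join method sees only a sequence delivered by an Iterator, so it does not care which kind of collection it is handed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Iterator;
import java.util.LinkedList;

class IteratorDemo {
    // Joins the elements of any collection, using only its Iterator.
    static String join(Collection<String> items) {
        StringBuilder sb = new StringBuilder();
        Iterator<String> it = items.iterator();
        while (it.hasNext()) {
            sb.append(it.next());
            if (it.hasNext()) {
                sb.append(", "); // separator between elements only
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Two different underlying structures, one traversal routine.
        Collection<String> vectorLike = new ArrayList<>(Arrays.asList("a", "b", "c"));
        Collection<String> linked = new LinkedList<>(vectorLike);
        System.out.println(join(vectorLike));
        System.out.println(join(linked));
    }
}
```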

From the design standpoint, all you really want is a sequence that can be
manipulated to solve your problem. If a single type of sequence satisfied all
of your needs, there’d be no reason to have different kinds. There are
two reasons that you need a choice of collections. First, collections provide
different types of interfaces and external behavior. A stack has a different
interface and behavior from that of a queue, which is different from that of a
set or a list. One of these might provide a more flexible solution to your
problem than the others. Second, different collections have different
efficiencies for certain operations. The best example is a vector and a list.
Both are simple sequences that can have identical interfaces and external
behaviors. But certain operations can have radically different costs. Randomly
accessing elements in a vector is a constant-time operation; it takes the same
amount of time regardless of the element you select. However, in a linked list
it is expensive to move through the list to randomly select an element, and it
takes longer to find an element if it is further down the list. On the other
hand, if you want to insert an element in the middle of a sequence, it’s
much cheaper in a list than in a vector. These and other operations have
different efficiencies depending upon the underlying structure of the sequence.
In the design phase, you might start with a list and, when tuning for
performance, change to a vector. Because of the abstraction via iterators, you
can change from one to the other with minimal impact on your code.
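
A small sketch of that swap, assuming java.util.List, ListIterator, ArrayList and LinkedList from the standard library. SequenceSwapDemo and insertAt are hypothetical names. The insertion code is written once against the sequence abstraction, and choosing the underlying structure is a matter of changing one constructor call.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedList;
import java.util.List;
import java.util.ListIterator;

class SequenceSwapDemo {
    // Inserts value so that it ends up at position index.
    // Works unchanged for any List implementation.
    static void insertAt(List<String> seq, int index, String value) {
        ListIterator<String> it = seq.listIterator(index);
        it.add(value); // inserted before the element that next() would return
    }

    public static void main(String[] args) {
        // Start with a linked list during design...
        List<String> linked = new LinkedList<>(Arrays.asList("a", "c"));
        insertAt(linked, 1, "b");
        System.out.println(linked);

        // ...and switch to an array-backed list when tuning; only this line differs.
        List<String> arrayBacked = new ArrayList<>(Arrays.asList("a", "c"));
        insertAt(arrayBacked, 1, "b");
        System.out.println(arrayBacked);
    }
}
```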

The singly-rooted hierarchy

One of the issues in OOP that has become especially prominent since the
introduction of C++ is whether all classes should ultimately be inherited from
a single base class. In Java (as with virtually all other OOP languages) the
answer is “yes” and the name of this ultimate base class is simply Object. It
turns out that the benefits of the singly-rooted hierarchy are many.

All objects in a singly-rooted hierarchy have an interface in common, so they are
all ultimately the same type. The alternative (provided by C++) is that you
don’t know that everything is the same fundamental type. From a
backwards-compatibility standpoint this fits the model of C better and can be
thought of as less restrictive, but when you want to do full-on object-oriented
programming you must then build your own hierarchy to provide the same
convenience that’s built into other OOP languages. And in any new class
library you acquire, some other incompatible interface will be used. It
requires effort (and possibly multiple inheritance) to work the new interface
into your design. Is the extra “flexibility” of C++ worth it? If
you need it – if you have a large investment in C – it’s
quite valuable. If you’re starting from scratch, other alternatives such
as Java can often be more productive.

All objects in a singly-rooted hierarchy (such as Java provides) can be guaranteed
to have certain functionality. You know you can perform certain basic
operations on every object in your system. A singly-rooted hierarchy, along
with creating all objects on the heap, greatly simplifies argument passing (one
of the more complex topics in C++).
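
To illustrate the guaranteed functionality, here is a brief sketch that relies only on toString() and equals(), two methods every Java class inherits from Object. RootObjectDemo, label and count are hypothetical names, and the sample values are arbitrary.

```java
class RootObjectDemo {
    // Accepts any object, because every class inherits toString() from Object.
    static String label(Object o) {
        return "<" + o.toString() + ">";
    }

    // Counts matches using equals(), which is also inherited from Object.
    static int count(Object[] items, Object target) {
        int n = 0;
        for (Object item : items) {
            if (item.equals(target)) {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        // Unrelated types can share one array because all of them are Objects.
        Object[] mixed = { "plane", Integer.valueOf(7), "plane", Double.valueOf(2.5) };
        for (Object o : mixed) {
            System.out.println(label(o));
        }
        System.out.println(count(mixed, "plane"));
    }
}
```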

A singly-rooted hierarchy makes it much easier to implement a garbage collector.
The necessary support can be installed in the base class, and the garbage
collector can thus send the appropriate messages to every object in the system.
Without a singly-rooted hierarchy and a system to manipulate an object via a
handle, it is difficult to implement a garbage collector.

Since run-time type information is guaranteed to be in all objects, you’ll
never end up with an object whose type you cannot determine. This is especially
important for system-level operations, such as exception handling, and it
allows greater flexibility in programming.
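
A short sketch of querying that run-time type information, using getClass() and the instanceof operator. TypeInfoDemo, typeName and classify are hypothetical names, and the three categories are an arbitrary choice for the example.

```java
class TypeInfoDemo {
    // Every object can report its own class at run time.
    static String typeName(Object o) {
        return o.getClass().getName();
    }

    // Chooses a category by testing the object's actual type.
    static String classify(Object o) {
        if (o instanceof Number) {
            return "number";
        }
        if (o instanceof CharSequence) {
            return "text";
        }
        return "other (" + typeName(o) + ")";
    }

    public static void main(String[] args) {
        System.out.println(classify(Integer.valueOf(3)));
        System.out.println(classify("hello"));
        System.out.println(classify(new Object()));
    }
}
```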

Collection libraries and support for easy collection use

Because a collection is a tool that you’ll use frequently, it makes sense to have
a library of collections that are built in a reusable fashion, so you can take
one off the shelf and plug it into your program. Java provides such a library,
although it is fairly limited in Java 1.0 and 1.1 (the Java 1.2 collections
library, however, satisfies most needs).


Downcasting vs. templates/generics

To make these collections reusable, they are written to hold the one universal
type in Java that was previously mentioned: Object. The singly-rooted hierarchy
means that everything is an Object, so a collection that holds Objects can hold
anything. This makes it easy to reuse.

To use such a collection, you simply add object handles to it, and later ask for
them back. But, since the collection holds only Objects, when you add your
object handle into the collection it is upcast to Object, thus losing its
identity. When you fetch it back, you get an Object handle, and not a handle to
the type that you put in. So how do you turn it back into something that has
the useful interface of the object that you put into the collection?

Here, the cast is used again, but this time you’re not casting up the
inheritance hierarchy to a more general type; you cast down the hierarchy to a
more specific type. This manner of casting is called downcasting. With
upcasting, you know, for example, that a Circle is a type of Shape so it’s
safe to upcast, but you don’t know that an Object is necessarily a Circle or a
Shape so it’s hardly safe to downcast unless you know that’s what
you’re dealing with.

It’s not completely dangerous, however, because if you downcast to the wrong
thing you’ll get a run-time error called an exception, which will be described
shortly. When you fetch object handles from a collection, though, you must have
some way to remember exactly what they are so you can perform a proper downcast.
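
The following sketch shows a collection of Objects, a downcast on retrieval, and the exception that results from a wrong downcast (java.lang.ClassCastException). The nested Shape and Circle classes, the radius value and the names DowncastDemo and tryCircle are all hypothetical, made up for this illustration.

```java
import java.util.ArrayList;
import java.util.List;

class DowncastDemo {
    static class Shape { }

    static class Circle extends Shape {
        double radius() { return 1.0; } // arbitrary sample value
    }

    // Attempts the downcast and reports what happened.
    static String tryCircle(Object o) {
        try {
            Circle c = (Circle) o; // checked at run time
            return "circle with radius " + c.radius();
        } catch (ClassCastException e) {
            return "not a Circle";
        }
    }

    public static void main(String[] args) {
        List<Object> bag = new ArrayList<>(); // holds only Objects
        bag.add(new Circle());                // upcast to Object on the way in
        bag.add("just a string");             // nothing stops this from going in
        for (Object o : bag) {
            System.out.println(tryCircle(o));
        }
    }
}
```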

Downcasting and the run-time checks require extra time for the running program,
and extra effort from the programmer. Wouldn’t it make sense to somehow create
the collection so that it knows the types that it holds, eliminating the need
for the downcast and possible mistake? The solution is parameterized types,
which are classes that the compiler can automatically customize to work with
particular types. For example, with a parameterized collection, the compiler
could customize that collection so that it would accept only Shapes and fetch
only Shapes.
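
Java versions later than those discussed in this text added parameterized types under the name generics (Java 5 onward). As a sketch under that assumption, a List<Shape> accepts only Shapes and hands back Shapes with no downcast. GenericsDemo, the nested Shape, Circle and Square classes and the names method are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class GenericsDemo {
    static class Shape {
        String name() { return "shape"; }
    }

    static class Circle extends Shape {
        @Override String name() { return "circle"; }
    }

    static class Square extends Shape {
        @Override String name() { return "square"; }
    }

    // Elements come out typed as Shape, so no cast is needed.
    static List<String> names(List<Shape> shapes) {
        List<String> out = new ArrayList<>();
        for (Shape s : shapes) {
            out.add(s.name());
        }
        return out;
    }

    public static void main(String[] args) {
        List<Shape> shapes = new ArrayList<>();
        shapes.add(new Circle());
        shapes.add(new Square());
        // shapes.add("text"); // rejected by the compiler: a String is not a Shape
        System.out.println(names(shapes));
    }
}
```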

The housekeeping dilemma: who should clean up?

Each object requires resources in order to exist, most notably memory. When an
object is no longer needed it must be cleaned up so that these resources are
released for reuse. In simple programming situations the question of how an
object is cleaned up doesn’t seem too challenging: you create the object,
use it for as long as it’s needed, and then it should be destroyed.
It’s not too hard, however, to encounter situations that are more complex.

Suppose, for example, you are designing a system to manage air traffic for an airport.
(The same model might also work for managing crates in a warehouse, or a video
rental system, or a kennel for boarding pets.) At first it seems simple: make a
collection to hold airplanes, then create a new airplane and place it in the
collection for each airplane that enters the air-traffic-control zone. For
cleanup, simply delete the appropriate airplane object when a plane leaves the
zone.

But perhaps you have some other system to record data about the planes; perhaps
data that doesn’t require such immediate attention as the main controller
function. Maybe it’s a record of the flight plans of all the small planes
that leave the airport. So you have a second collection of small planes, and
whenever you create a plane object you also put it in this collection if
it’s a small plane. Then some background process performs operations on
the objects in this collection during idle moments. Now the problem is more
difficult: when a plane leaves the zone, the background process may still need
its object, so it is no longer clear who should destroy the object, or when.

With Java, the garbage collector is designed to take care of the problem of
releasing the memory (although this doesn’t include other aspects of
cleaning up an object). The garbage collector “knows” when an
object is no longer in use, and it then automatically releases the memory for
that object. This, combined with the fact that all objects are inherited from
the single root class Object and that you can create objects only one way, on
the heap, makes the process of programming in Java much simpler than
programming in C++. You have far fewer decisions to make and hurdles to overcome.
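
The air-traffic scenario can be sketched as below. Everything here (AirTrafficDemo, Airplane, the field and method names, the tail number) is hypothetical and invented for the illustration. The point is that leaving the zone only drops one handle. The object remains usable through the second collection, and the garbage collector may reclaim it only after no handles remain, at a time the program does not control.

```java
import java.util.ArrayList;
import java.util.List;

class AirTrafficDemo {
    static class Airplane {
        final String id;
        final boolean small;

        Airplane(String id, boolean small) {
            this.id = id;
            this.small = small;
        }
    }

    final List<Airplane> inZone = new ArrayList<>();
    final List<Airplane> smallPlaneLog = new ArrayList<>(); // used by a background task

    void enterZone(Airplane p) {
        inZone.add(p);
        if (p.small) {
            smallPlaneLog.add(p); // a second handle to the same object
        }
    }

    void leaveZone(Airplane p) {
        inZone.remove(p); // no explicit delete; only this handle is dropped
    }

    public static void main(String[] args) {
        AirTrafficDemo atc = new AirTrafficDemo();
        Airplane p = new Airplane("N123", true); // made-up tail number
        atc.enterZone(p);
        atc.leaveZone(p);
        // The plane is gone from the zone but still reachable from the log.
        System.out.println(atc.inZone.size() + " in zone, "
                + atc.smallPlaneLog.size() + " in small-plane log");
        // Once the log is cleared and p goes out of scope, the object
        // becomes eligible for garbage collection.
        atc.smallPlaneLog.clear();
    }
}
```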


Garbage collectors vs. efficiency and flexibility

If all this is such a good idea, why didn’t they do the same thing in C++?
Well, of course there’s a price you pay for all this programming
convenience, and that price is run-time overhead. As mentioned before, in C++
you can create objects on the stack, and in this case they’re
automatically cleaned up (but you don’t have the flexibility of creating
as many as you want at run time). Creating objects on the stack is the most
efficient way to allocate storage for objects and to free that storage.
Creating objects on the heap can be much more expensive. Always inheriting from
a base class and making all function calls polymorphic also exacts a small
toll. But the garbage collector is a particular problem because you never quite
know when it’s going to start up or how long it will take. This means
that there’s an inconsistency in the rate of execution of a Java program,
so you can’t use it in certain situations, such as when the rate of
execution of a program is uniformly critical. (These are generally called
real-time programs, although not all real-time programming problems are this
stringent.) [7]

The designers of the C++ language, trying to woo C programmers (and most
successfully, at that), did not want to add any features to the language that
would impact the speed or the use of C++ in any situation where C might be
used. This goal was realized, but at the price of greater complexity when
programming in C++. Java is simpler than C++, but the tradeoff is in efficiency
and sometimes applicability. For a significant portion of programming problems,
however, Java is often the superior choice.


[6] Note that this is true only for objects that are created on the heap, with
new. However, the problem described, and indeed any general programming problem,
requires objects to be created on the heap.

[7] According to a technical reader for this book, one existing real-time Java
implementation (www.newmonics.com) has guarantees on garbage collector
performance.
