Object landscapes and lifetimes

Bruce Eckel’s Thinking in Java


Technically,

OOP is just about abstract data typing, inheritance and polymorphism, but other

issues can be at least as important. The remainder of this section will cover

these issues.

One

of the most important factors is the way objects are created and destroyed.

Where is the data for an object and how is the lifetime of the object

controlled? There are different philosophies at work here. C++ takes the

approach that control of efficiency is the most important issue, so it gives

the programmer a choice. For maximum run-time speed, the storage and lifetime

can be determined while the program is being written, by placing the objects on

the stack (these are sometimes called automatic or scoped variables) or in the static storage area. This places a priority on the speed

of storage allocation and release, and control of these can be very valuable in

some situations. However, you sacrifice flexibility because you must know the

exact quantity, lifetime and type of objects

while

you’re writing the program. If you are trying to solve a more general

problem such as computer-aided design, warehouse management or air-traffic

control, this is too restrictive.

The

second approach is to create objects dynamically in a pool of memory called the heap. In this approach you don’t know until run time how many objects you need,

what their lifetime is or what their exact type is. Those are determined at the

spur of the moment while the program is running. If you need a new object, you

simply make it on the heap at the point that you need it. Because the storage

is managed dynamically, at run time, the amount of time required to allocate

storage on the heap is significantly longer than the time to create storage on

the stack. (Creating storage on the stack is often a single assembly

instruction to move the stack pointer down, and another to move it back up.)

The dynamic approach makes the generally logical assumption that objects tend

to be complicated, so the extra overhead of finding storage and releasing that

storage will not have an important impact on the creation of an object. In

addition, the greater flexibility is essential to solve the general programming

problem.
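In Java, the dynamic approach is what the `new` keyword does. The following is a minimal sketch, not taken from the book: the `Crate` class and the `receive` method are hypothetical names chosen for illustration. It shows a quantity of objects that is decided only while the program runs.

```java
public class RuntimeCreation {
    // Hypothetical class, used only for this illustration.
    static class Crate {
        final int id;
        Crate(int id) { this.id = id; }
    }

    // 'count' is a parameter, so the number of objects is unknown until run time.
    static Crate[] receive(int count) {
        Crate[] crates = new Crate[count];
        for (int i = 0; i < count; i++) {
            crates[i] = new Crate(i); // each 'new' obtains storage at run time
        }
        return crates;
    }

    public static void main(String[] args) {
        // The count comes from the command line if one is given.
        int count = args.length > 0 ? Integer.parseInt(args[0]) : 3;
        System.out.println(receive(count).length + " crates created");
    }
}
```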

C++

allows you to determine whether the objects are created while you write the

program or at run time to allow the control of efficiency. You might think that

since it’s more flexible, you’d always want to create objects on

the heap rather than the stack. There’s another issue, however, and

that’s the lifetime of an object. If you create an object on the stack or

in static storage, the compiler determines how long the object lasts and can

automatically destroy it. However, if you create it on the heap the compiler

has no knowledge of its lifetime. A programmer has two options for destroying

objects: you can determine programmatically when to destroy the object, or the

environment can provide a feature called a garbage collector that automatically discovers when an object is no longer in use and destroys it. Of course, a garbage collector is much more convenient, but it requires that every application tolerate the existence of the garbage collector and the overhead of garbage collection. This does not meet the design requirements of the C++ language and so it was not included, but Java does have a garbage collector (as does Smalltalk; Delphi does not, but one could be added; third-party garbage collectors exist for C++).

The

rest of this section looks at additional factors concerning object lifetimes

and landscapes.

Collections and iterators

If you don’t know how many objects you’re going to need to solve a

particular problem, or how long they will last, you also don’t know how

to store those objects. How can you know how much space to create for those

objects? You can’t, since that information isn’t known until run

time.

The

solution to most problems in object-oriented design seems flippant: you create

another type of object. The new type of object that solves this particular

problem holds handles to other objects. Of course, you can do the same thing

with an array, which is available in most languages. But there’s more.

This new object, generally called a collection (also called a container, but the AWT uses that term in a different sense, so this book will use “collection”), will expand itself whenever necessary to accommodate

everything you place inside it. So you don’t need to know how many

objects you’re going to hold in a collection. Just create a collection

object and let it take care of the details.
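As a rough illustration of a collection that grows on demand, here is a minimal sketch. It assumes a current JDK, where `java.util.ArrayList` is a standard growable collection; the `<Object>` annotation only states that the collection holds `Object` handles. The `fill` method is a hypothetical helper.

```java
import java.util.ArrayList;
import java.util.List;

public class GrowingCollection {
    // Adds 'count' items without ever declaring a size up front.
    static List<Object> fill(int count) {
        List<Object> items = new ArrayList<Object>();
        for (int i = 0; i < count; i++) {
            items.add("item " + i); // the collection expands as needed
        }
        return items;
    }

    public static void main(String[] args) {
        List<Object> items = fill(1000);
        System.out.println(items.size()); // prints 1000
        System.out.println(items.get(0)); // prints "item 0"
    }
}
```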

Fortunately,

a good OOP language comes with a set of collections as part of the package. In

C++, it’s the Standard Template Library (STL). Object Pascal has

collections in its Visual Component Library (VCL). Smalltalk has a very

complete set of collections. Java also has collections in its standard library.

In some libraries, a generic collection is considered good enough for all

needs, and in others (C++ in particular) the library has different types of

collections for different needs: a vector for consistent access to all

elements, and a linked list for consistent insertion at all elements, for

example, so you can choose the particular type that fits your needs. These may

include sets, queues, hash tables, trees, stacks, etc.

All

collections have some way to put things in and get things out. The way that you

place something into a collection is fairly obvious. There’s a function

called “push” or “add” or a similar name. Fetching

things out of a collection is not always as apparent; if it’s an

array-like entity such as a vector, you might be able to use an indexing

operator or function. But in many situations this doesn’t make sense.

Also, a single-selection function is restrictive. What if you want to

manipulate or compare a set of elements in the collection instead of just one?

The

solution is an iterator, which is an object whose job is to select the elements within a collection and

present them to the user of the iterator. As a class, it also provides a level

of abstraction. This abstraction can be used to separate the details of the

collection from the code that’s accessing that collection. The

collection, via the iterator, is abstracted to be simply a sequence. The

iterator allows you to traverse that sequence without worrying about the

underlying structure – that is, whether it’s a vector, a linked

list, a stack or something else. This gives you the flexibility to easily

change the underlying data structure without disturbing the code in your

program. Java began (in versions 1.0 and 1.1) with a standard iterator, called Enumeration, for all of its collection classes. Java 1.2 has added a much more complete collection library, which contains an iterator called Iterator that does more than the older Enumeration.
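The separation described above can be sketched with `java.util.Iterator`. This is an illustration under assumptions rather than the book's own code: it targets a current JDK, and the `join` method is a hypothetical helper. Note that `join` receives only an iterator, so it cannot tell which kind of collection it is walking.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Iterator;
import java.util.LinkedList;

public class IteratorSketch {
    // Sees only a sequence; it never learns which collection produced the iterator.
    static String join(Iterator<?> it) {
        StringBuilder sb = new StringBuilder();
        while (it.hasNext()) {
            sb.append(it.next());
            if (it.hasNext()) {
                sb.append(", ");
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Collection<Object> vectorLike = new ArrayList<Object>();
        Collection<Object> linked = new LinkedList<Object>();
        for (String s : new String[] {"a", "b", "c"}) {
            vectorLike.add(s);
            linked.add(s);
        }
        System.out.println(join(vectorLike.iterator())); // prints "a, b, c"
        System.out.println(join(linked.iterator()));     // prints "a, b, c"
    }
}
```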

From

the design standpoint, all you really want is a sequence that can be

manipulated to solve your problem. If a single type of sequence satisfied all

of your needs, there’d be no reason to have different kinds. There are

two reasons that you need a choice of collections. First, collections provide

different types of interfaces and external behavior. A stack has a different

interface and behavior than that of a queue, which is different than that of a

set or a list. One of these might provide a more flexible solution to your

problem than the other. Second, different collections have different

efficiencies for certain operations. The best example is a vector and a list.

Both are simple sequences that can have identical interfaces and external

behaviors. But certain operations can have radically different costs. Randomly

accessing elements in a vector is a constant-time operation; it takes the same

amount of time regardless of the element you select. However, in a linked list

it is expensive to move through the list to randomly select an element, and it

takes longer to find an element if it is further down the list. On the other

hand, if you want to insert an element in the middle of a sequence, it’s

much cheaper in a list than in a vector. These and other operations have

different efficiencies depending upon the underlying structure of the sequence.

In the design phase, you might start with a list and, when tuning for

performance, change to a vector. Because of the abstraction via iterators, you

can change from one to the other with minimal impact on your code.
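To make the "minimal impact" claim concrete, here is a small sketch, again assuming a current JDK. The `interleave` and `sample` methods are hypothetical. The code that manipulates the sequence uses only a `ListIterator`, so switching between a linked list and an array-backed list is a change at the single point where the collection is created; only the cost of the insertions differs.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;
import java.util.ListIterator;

public class SwapImplementation {
    // Inserts 'value' before every element, using only the iterator.
    // Works unchanged for any List; only the cost differs by implementation.
    static void interleave(List<Object> seq, Object value) {
        ListIterator<Object> it = seq.listIterator();
        while (it.hasNext()) {
            it.add(value); // inserted before the element that next() will return
            it.next();
        }
    }

    // Fills the given sequence with two elements, then interleaves a separator.
    static List<Object> sample(List<Object> seq) {
        seq.add("x");
        seq.add("y");
        interleave(seq, "-");
        return seq;
    }

    public static void main(String[] args) {
        // Changing the implementation is an edit at the creation site only.
        System.out.println(sample(new LinkedList<Object>())); // prints [-, x, -, y]
        System.out.println(sample(new ArrayList<Object>()));  // prints [-, x, -, y]
    }
}
```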

In the end, remember that a collection is only a storage cabinet to put objects

in. If that cabinet solves all of your needs, it doesn’t really matter

how

it is implemented (a basic concept with most types of objects). If you’re

working in a programming environment that has built-in overhead due to other

factors (running under Windows, for example, or the cost of a garbage

collector), then the cost difference between a vector and a linked list might

not matter. You might need only one type of sequence. You can even imagine the

“perfect” collection abstraction, which can automatically change

its underlying implementation according to the way it is used.

The singly-rooted hierarchy

One

of the issues in OOP that has become especially prominent since the

introduction of C++ is whether all classes should ultimately be inherited from

a single base class. In Java (as with virtually all other OOP languages) the answer is “yes” and the name of this ultimate base class is simply Object. It turns out that the benefits of the singly-rooted hierarchy are many.

All

objects in a singly-rooted hierarchy have an interface in common, so they are

all ultimately the same type. The alternative (provided by C++) is that you

don’t know that everything is the same fundamental type. From a

backwards-compatibility standpoint this fits the model of C better and can be

thought of as less restrictive, but when you want to do full-on object-oriented

programming you must then build your own hierarchy to provide the same

convenience that’s built into other OOP languages. And in any new class

library you acquire, some other incompatible interface will be used. It

requires effort (and possibly multiple inheritance) to work the new interface

into your design. Is the extra “flexibility” of C++ worth it? If

you need it – if you have a large investment in C – it’s

quite valuable. If you’re starting from scratch, other alternatives such

as Java can often be more productive.

All

objects in a singly-rooted hierarchy (such as Java provides) can be guaranteed

to have certain functionality. You know you can perform certain basic

operations on every object in your system. A singly-rooted hierarchy, along

with creating all objects on the heap, greatly simplifies argument passing (one

of the more complex topics in C++).

A singly-rooted hierarchy makes it much easier to implement a garbage collector.

The necessary support can be installed in the base class, and the garbage

collector can thus send the appropriate messages to every object in the system.

Without a singly-rooted hierarchy and a system to manipulate an object via a

handle, it is difficult to implement a garbage collector.

Since

run-time type information is guaranteed to be in all objects, you’ll

never end up with an object whose type you cannot determine. This is especially

important with system level operations, such as exception handling, and to

allow greater flexibility in programming.
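A short sketch of what the common root buys you: any object at all can be passed as an `Object`, asked for its run-time type, and sent the basic messages that `Object` defines. The `Airplane` class and the `typeOf` helper are hypothetical names for this illustration.

```java
public class RootedHierarchy {
    // Hypothetical class; it extends Object implicitly, like every Java class.
    static class Airplane { }

    // Accepts any object, and can always discover its run-time type.
    static String typeOf(Object o) {
        return o.getClass().getSimpleName();
    }

    public static void main(String[] args) {
        Object[] things = { "hello", Integer.valueOf(42), new Airplane() };
        for (Object o : things) {
            // toString() and equals() are available on every object.
            System.out.println(typeOf(o) + ": " + o.toString()
                + ", equals itself: " + o.equals(o));
        }
    }
}
```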

You

may wonder why, if it’s so beneficial, a singly-rooted hierarchy isn’t in C++. It’s the old bugaboo of efficiency and control. A

singly-rooted hierarchy puts constraints on your program designs, and in

particular it was perceived to put constraints on the use of existing C code.

These constraints cause problems only in certain situations, but for maximum

flexibility there is no requirement for a singly-rooted hierarchy in C++. In

Java, which started from scratch and has no backward-compatibility issues with

any existing language, it was a logical choice to use the singly-rooted

hierarchy in common with most other object-oriented programming languages.

Collection libraries and support for easy collection use

Because

a collection is a tool that you’ll use frequently, it makes sense to have

a library of collections that are built in a reusable fashion, so you can take

one off the shelf and plug it into your program. Java provides such a library,

although it is fairly limited in Java 1.0 and 1.1 (the Java 1.2 collections

library, however, satisfies most needs).

Downcasting vs. templates/generics

To make these collections reusable, they contain the one universal type in Java that was previously mentioned: Object. The singly-rooted hierarchy means that everything is an Object, so a collection that holds Object can hold anything. This makes it easy to reuse.

To use such a collection, you simply add object handles to it, and later ask for them back. But, since the collection holds only Object, when you add your object handle into the collection it is upcast to Object, thus losing its identity. When you fetch it back, you get an Object handle, and not a handle to the type that you put in. So how do you turn it

back into something that has the useful interface of the object that you put

into the collection?

Here,

the cast is used again, but this time you’re not casting up the inheritance hierarchy to a more general type, you cast down the hierarchy to a more specific type. This manner of casting is called downcasting. With upcasting, you know, for example, that a Circle is a type of Shape, so it’s safe to upcast, but you don’t know that an Object is necessarily a Circle or a Shape, so it’s hardly safe to downcast unless you know that’s what you’re dealing with.

It’s

not completely dangerous, however, because if you downcast to the wrong thing

you’ll get a run-time error called an

exception,

which will be described shortly. When you fetch object handles from a

collection, though, you must have some way to remember exactly what they are so

you can perform a proper downcast.
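The upcast on the way in and the downcast on the way out can be sketched as follows. This is a minimal illustration assuming a current JDK: `Shape` and `Circle` echo the names used above, while the `name` method and the `sample` and `isShape` helpers are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class DowncastSketch {
    static class Shape { String name() { return "shape"; } }
    static class Circle extends Shape {
        @Override String name() { return "circle"; }
    }

    // True if the element at 'index' can safely be downcast to Shape.
    static boolean isShape(List<Object> items, int index) {
        return items.get(index) instanceof Shape;
    }

    static List<Object> sample() {
        List<Object> items = new ArrayList<Object>();
        items.add(new Circle());   // upcast to Object on the way in
        items.add("not a shape");  // the collection accepts anything
        return items;
    }

    public static void main(String[] args) {
        List<Object> items = sample();
        Shape s = (Shape) items.get(0); // correct downcast, so it succeeds
        System.out.println(s.name());   // prints "circle"
        try {
            Shape wrong = (Shape) items.get(1); // wrong downcast
            System.out.println(wrong.name());
        } catch (ClassCastException e) {
            System.out.println("bad downcast caught"); // the run-time check fired
        }
    }
}
```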

Downcasting

and the run-time checks require extra time for the running program, and extra

effort from the programmer. Wouldn’t it make sense to somehow create the

collection so that it knows the types that it holds, eliminating the need for

the downcast and possible mistake? The solution is parameterized types, which are classes that the compiler can automatically customize to work with particular types. For example, with a parameterized collection, the compiler could customize that collection so that it would accept only Shape and fetch only Shape.

Parameterized types are an important part of C++, partly because C++ has no singly-rooted hierarchy. In C++, the keyword that implements parameterized types is template. Java currently has no parameterized types since it is possible for it to get by

– however awkwardly – using the singly-rooted hierarchy. At one

point the word

generic

(the keyword used by Ada for its templates) was on a list of keywords that were

“reserved for future implementation.” Some of these seemed to have

mysteriously slipped into a kind of “keyword Bermuda Triangle” and

it’s difficult to know what might eventually happen.
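For readers on a newer JDK: releases of Java after the ones discussed here (starting with Java 5) did add parameterized types, under the name generics. The sketch below assumes such a JDK. It shows a collection that accepts only `Shape` and returns `Shape` without a downcast; the `name` method and `sample` helper are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class GenericSketch {
    static class Shape { String name() { return "shape"; } }
    static class Circle extends Shape {
        @Override String name() { return "circle"; }
    }

    static List<Shape> sample() {
        List<Shape> shapes = new ArrayList<Shape>();
        shapes.add(new Circle());
        // shapes.add("text"); // would be rejected by the compiler
        return shapes;
    }

    public static void main(String[] args) {
        Shape s = sample().get(0);    // no downcast needed
        System.out.println(s.name()); // prints "circle"
    }
}
```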

The housekeeping dilemma: who should clean up?

Each

object requires resources in order to exist, most notably memory. When an

object is no longer needed it must be cleaned up so that these resources are

released for reuse. In simple programming situations the question of how an

object is cleaned up doesn’t seem too challenging: you create the object,

use it for as long as it’s needed, and then it should be destroyed.

It’s not too hard, however, to encounter cases in which the situation is more complex.

Suppose,

for example, you are designing a system to manage air traffic for an airport.

(The same model might also work for managing crates in a warehouse, or a video

rental system, or a kennel for boarding pets.) At first it seems simple: make a

collection to hold airplanes, then create a new airplane and place it in the

collection for each airplane that enters the air-traffic-control zone. For

cleanup, simply delete the appropriate airplane object when a plane leaves the

zone.

But

perhaps you have some other system to record data about the planes; perhaps

data that doesn’t require such immediate attention as the main controller

function. Maybe it’s a record of the flight plans of all the small planes

that leave the airport. So you have a second collection of small planes, and

whenever you create a plane object you also put it in this collection if

it’s a small plane. Then some background process performs operations on

the objects in this collection during idle moments.

Now

the problem is more difficult: how can you possibly know when to destroy the

objects? When you’re done with the object, some other part of the system

might not be. This same problem can arise in a number of other situations, and

in programming systems (such as C++) in which you must explicitly delete an

object when you’re done with it, this can become quite complex. [6]

With

Java, the garbage collector is designed to take care of the problem of

releasing the memory (although this doesn’t include other aspects of

cleaning up an object). The garbage collector “knows” when an

object is no longer in use, and it then automatically releases the memory for

that object. This, combined with the fact that all objects are inherited from

the single root class

Object

and that you can create objects only one way, on the heap, makes the process of

programming in Java much simpler than programming in C++. You have far fewer

decisions to make and hurdles to overcome.
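The air-traffic scenario can be sketched in a few lines. This is an illustration under assumptions, written for a current JDK (the typed `List<Airplane>` notation arrived in later Java releases). The `Airplane` class, its fields, and the `enter` and `leave` methods are hypothetical names. The point is that `leave` only drops a handle: nobody has to decide whether the object may be destroyed, because the second collection still refers to it.

```java
import java.util.ArrayList;
import java.util.List;

public class SharedOwnership {
    // Hypothetical class for the illustration.
    static class Airplane {
        final String id;
        final boolean small;
        Airplane(String id, boolean small) {
            this.id = id;
            this.small = small;
        }
    }

    final List<Airplane> inZone = new ArrayList<Airplane>();
    final List<Airplane> smallPlanes = new ArrayList<Airplane>();

    void enter(Airplane p) {
        inZone.add(p);
        if (p.small) {
            smallPlanes.add(p); // same object, second handle
        }
    }

    void leave(Airplane p) {
        inZone.remove(p); // drops one handle; nothing is explicitly destroyed
    }

    public static void main(String[] args) {
        SharedOwnership atc = new SharedOwnership();
        Airplane plane = new Airplane("N123", true);
        atc.enter(plane);
        atc.leave(plane);
        System.out.println(atc.inZone.size());      // prints 0
        System.out.println(atc.smallPlanes.size()); // prints 1
    }
}
```

Once the background process also removes the plane from `smallPlanes` and no other handle remains, the object becomes eligible for garbage collection without any further action from the programmer.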

Garbage collectors vs. efficiency and flexibility

If all this is such a good idea, why didn’t they do the same thing in C++? Well, of course there’s a price you pay for all this programming

convenience, and that price is run-time overhead. As mentioned before, in C++

you can create objects on the stack, and in this case they’re

automatically cleaned up (but you don’t have the flexibility of creating

as many as you want at run-time). Creating objects on the stack is the most

efficient way to allocate storage for objects and to free that storage.

Creating objects on the heap can be much more expensive. Always inheriting from

a base class and making all function calls polymorphic also exacts a small

toll. But the garbage collector is a particular problem because you never quite

know when it’s going to start up or how long it will take. This means

that there’s an inconsistency in the rate of execution of a Java program,

so you can’t use it in certain situations, such as when the rate of

execution of a program is uniformly critical. (These are generally called real-time programs, although not all real-time programming problems are this stringent.) [7]

The

designers of the C++ language, trying to woo C programmers (and most

successfully, at that), did not want to add any features to the language that

would impact the speed or the use of C++ in any situation where C might be

used. This goal was realized, but at the price of greater complexity when

programming in C++. Java is simpler than C++, but the tradeoff is in efficiency

and sometimes applicability. For a significant portion of programming problems,

however, Java is often the superior choice.

[6] Note that this is true only for objects that are created on the heap, with new. However, the problem described, and indeed any general programming problem, requires objects to be created on the heap.

[7] According to a technical reader for this book, one existing real-time Java implementation (www.newmonics.com) has guarantees on garbage collector performance.
