Full Text Search: The Key to Better Natural Language Queries for NoSQL in Node.js
|Bruce Eckel's Thinking in Java||Contents | Prev | Next|
Technically, OOP is just about abstract data typing, inheritance and polymorphism, but other issues can be at least as important. The remainder of this section will cover these issues.
One of the most important factors is the way objects are created and destroyed. Where is the data for an object and how is the lifetime of the object controlled? There are different philosophies at work here. C++ takes the approach that control of efficiency is the most important issue, so it gives the programmer a choice. For maximum run-time speed, the storage and lifetime can be determined while the program is being written, by placing the objects on the stack (these are sometimes called automatic or scoped variables) or in the static storage area. This places a priority on the speed of storage allocation and release, and control of these can be very valuable in some situations. However, you sacrifice flexibility because you must know the exact quantity, lifetime and type of objects while you’re writing the program. If you are trying to solve a more general problem such as computer-aided design, warehouse management or air-traffic control, this is too restrictive.
The second approach is to create objects dynamically in a pool of memory called the heap. In this approach you don’t know until run time how many objects you need, what their lifetime is or what their exact type is. Those are determined at the spur of the moment while the program is running. If you need a new object, you simply make it on the heap at the point that you need it. Because the storage is managed dynamically, at run time, the amount of time required to allocate storage on the heap is significantly longer than the time to create storage on the stack. (Creating storage on the stack is often a single assembly instruction to move the stack pointer down, and another to move it back up.) The dynamic approach makes the generally logical assumption that objects tend to be complicated, so the extra overhead of finding storage and releasing that storage will not have an important impact on the creation of an object. In addition, the greater flexibility is essential to solve the general programming problem.
C++ allows you to determine whether the objects are created while you write the program or at run time to allow the control of efficiency. You might think that since it’s more flexible, you’d always want to create objects on the heap rather than the stack. There’s another issue, however, and that’s the lifetime of an object. If you create an object on the stack or in static storage, the compiler determines how long the object lasts and can automatically destroy it. However, if you create it on the heap the compiler has no knowledge of its lifetime. A programmer has two options for destroying objects: you can determine programmatically when to destroy the object, or the environment can provide a feature called a garbage collector that automatically discovers when an object is no longer in use and destroys it. Of course, a garbage collector is much more convenient, but it requires that all applications must be able to tolerate the existence of the garbage collector and the other overhead for garbage collection. This does not meet the design requirements of the C++ language and so it was not included, but Java does have a garbage collector (as does Smalltalk; Delphi does not but one could be added. Third-party garbage collectors exist for C++).
Collections and iterators
If you don’t know how many objects you’re going to need to solve a particular problem, or how long they will last, you also don’t know how to store those objects. How can you know how much space to create for those objects? You can’t, since that information isn’t known until run time.
The solution to most problems in object-oriented design seems flippant: you create another type of object. The new type of object that solves this particular problem holds handles to other objects. Of course, you can do the same thing with an array, which is available in most languages. But there’s more. This new object, generally called a collection (also called a container, but the AWT uses that term in a different sense so this book will use “collection”), will expand itself whenever necessary to accommodate everything you place inside it. So you don’t need to know how many objects you’re going to hold in a collection. Just create a collection object and let it take care of the details.
Fortunately, a good OOP language comes with a set of collections as part of the package. In C++, it’s the Standard Template Library (STL). Object Pascal has collections in its Visual Component Library (VCL). Smalltalk has a very complete set of collections. Java also has collections in its standard library. In some libraries, a generic collection is considered good enough for all needs, and in others (C++ in particular) the library has different types of collections for different needs: a vector for consistent access to all elements, and a linked list for consistent insertion at all elements, for example, so you can choose the particular type that fits your needs. These may include sets, queues, hash tables, trees, stacks, etc.
All collections have some way to put things in and get things out. The way that you place something into a collection is fairly obvious. There’s a function called “push” or “add” or a similar name. Fetching things out of a collection is not always as apparent; if it’s an array-like entity such as a vector, you might be able to use an indexing operator or function. But in many situations this doesn’t make sense. Also, a single-selection function is restrictive. What if you want to manipulate or compare a set of elements in the collection instead of just one?
The solution is an iterator, which is an object whose job is to select the elements within a collection and present them to the user of the iterator. As a class, it also provides a level of abstraction. This abstraction can be used to separate the details of the collection from the code that’s accessing that collection. The collection, via the iterator, is abstracted to be simply a sequence. The iterator allows you to traverse that sequence without worrying about the underlying structure – that is, whether it’s a vector, a linked list, a stack or something else. This gives you the flexibility to easily change the underlying data structure without disturbing the code in your program. Java began (in version 1.0 and 1.1) with a standard iterator, called Enumeration, for all of its collection classes. Java 1.2 has added a much more complete collection library which contains an iterator called Iterator that does more than the older Enumeration.
From the design standpoint, all you really want is a sequence that can be manipulated to solve your problem. If a single type of sequence satisfied all of your needs, there’d be no reason to have different kinds. There are two reasons that you need a choice of collections. First, collections provide different types of interfaces and external behavior. A stack has a different interface and behavior than that of a queue, which is different than that of a set or a list. One of these might provide a more flexible solution to your problem than the other. Second, different collections have different efficiencies for certain operations. The best example is a vector and a list. Both are simple sequences that can have identical interfaces and external behaviors. But certain operations can have radically different costs. Randomly accessing elements in a vector is a constant-time operation; it takes the same amount of time regardless of the element you select. However, in a linked list it is expensive to move through the list to randomly select an element, and it takes longer to find an element if it is further down the list. On the other hand, if you want to insert an element in the middle of a sequence, it’s much cheaper in a list than in a vector. These and other operations have different efficiencies depending upon the underlying structure of the sequence. In the design phase, you might start with a list and, when tuning for performance, change to a vector. Because of the abstraction via iterators, you can change from one to the other with minimal impact on your code.
In the end, remember that a collection is only a storage cabinet to put objects in. If that cabinet solves all of your needs, it doesn’t really matter how it is implemented (a basic concept with most types of objects). If you’re working in a programming environment that has built-in overhead due to other factors (running under Windows, for example, or the cost of a garbage collector), then the cost difference between a vector and a linked list might not matter. You might need only one type of sequence. You can even imagine the “perfect” collection abstraction, which can automatically change its underlying implementation according to the way it is used.
The singly-rooted hierarchy
One of the issues in OOP that has become especially prominent since the introduction of C++ is whether all classes should ultimately be inherited from a single base class. In Java (as with virtually all other OOP languages) the answer is “yes” and the name of this ultimate base class is simply Object. It turns out that the benefits of the singly-rooted hierarchy are many.
All objects in a singly-rooted hierarchy have an interface in common, so they are all ultimately the same type. The alternative (provided by C++) is that you don’t know that everything is the same fundamental type. From a backwards-compatibility standpoint this fits the model of C better and can be thought of as less restrictive, but when you want to do full-on object-oriented programming you must then build your own hierarchy to provide the same convenience that’s built into other OOP languages. And in any new class library you acquire, some other incompatible interface will be used. It requires effort (and possibly multiple inheritance) to work the new interface into your design. Is the extra “flexibility” of C++ worth it? If you need it – if you have a large investment in C – it’s quite valuable. If you’re starting from scratch, other alternatives such as Java can often be more productive.
All objects in a singly-rooted hierarchy (such as Java provides) can be guaranteed to have certain functionality. You know you can perform certain basic operations on every object in your system. A singly-rooted hierarchy, along with creating all objects on the heap, greatly simplifies argument passing (one of the more complex topics in C++).
A singly-rooted hierarchy makes it much easier to implement a garbage collector. The necessary support can be installed in the base class, and the garbage collector can thus send the appropriate messages to every object in the system. Without a singly-rooted hierarchy and a system to manipulate an object via a handle, it is difficult to implement a garbage collector.
Since run-time type information is guaranteed to be in all objects, you’ll never end up with an object whose type you cannot determine. This is especially important with system level operations, such as exception handling, and to allow greater flexibility in programming.
You may wonder why, if it’s so beneficial, a singly-rooted hierarchy isn’t it in C++. It’s the old bugaboo of efficiency and control. A singly-rooted hierarchy puts constraints on your program designs, and in particular it was perceived to put constraints on the use of existing C code. These constraints cause problems only in certain situations, but for maximum flexibility there is no requirement for a singly-rooted hierarchy in C++. In Java, which started from scratch and has no backward-compatibility issues with any existing language, it was a logical choice to use the singly-rooted hierarchy in common with most other object-oriented programming languages.
Collection libraries and support
for easy collection use
Because a collection is a tool that you’ll use frequently, it makes sense to have a library of collections that are built in a reusable fashion, so you can take one off the shelf and plug it into your program. Java provides such a library, although it is fairly limited in Java 1.0 and 1.1 (the Java 1.2 collections library, however, satisfies most needs).Downcasting vs. templates/generics
To make these collections reusable, they contain the one universal type in Java that was previously mentioned: Object. The singly-rooted hierarchy means that everything is an Object, so a collection that holds Objects can hold anything. This makes it easy to reuse.
To use such a collection, you simply add object handles to it, and later ask for them back. But, since the collection holds only Objects, when you add your object handle into the collection it is upcast to Object, thus losing its identity. When you fetch it back, you get an Object handle, and not a handle to the type that you put in. So how do you turn it back into something that has the useful interface of the object that you put into the collection?
Here, the cast is used again, but this time you’re not casting up the inheritance hierarchy to a more general type, you cast down the hierarchy to a more specific type. This manner of casting is called downcasting. With upcasting, you know, for example, that a Circle is a type of Shape so it’s safe to upcast, but you don’t know that an Object is necessarily a Circle or a Shape so it’s hardly safe to downcast unless you know that’s what you’re dealing with.
It’s not completely dangerous, however, because if you downcast to the wrong thing you’ll get a run-time error called an exception, which will be described shortly. When you fetch object handles from a collection, though, you must have some way to remember exactly what they are so you can perform a proper downcast.
Downcasting and the run-time checks require extra time for the running program, and extra effort from the programmer. Wouldn’t it make sense to somehow create the collection so that it knows the types that it holds, eliminating the need for the downcast and possible mistake? The solution is parameterized types , which are classes that the compiler can automatically customize to work with particular types. For example, with a parameterized collection, the compiler could customize that collection so that it would accept only Shapes and fetch only Shapes.
Parameterized types are an important part of C++, partly because C++ has no singly-rooted hierarchy. In C++, the keyword that implements parameterized types is template. Java currently has no parameterized types since it is possible for it to get by – however awkwardly – using the singly-rooted hierarchy. At one point the word generic (the keyword used by Ada for its templates) was on a list of keywords that were “reserved for future implementation.” Some of these seemed to have mysteriously slipped into a kind of “keyword Bermuda Triangle” and it’s difficult to know what might eventually happen.
The housekeeping dilemma:
Each object requires resources in order to exist, most notably memory. When an object is no longer needed it must be cleaned up so that these resources are released for reuse. In simple programming situations the question of how an object is cleaned up doesn’t seem too challenging: you create the object, use it for as long as it’s needed, and then it should be destroyed. It’s not too hard, however, to encounter situations in which the situation is more complex.
Suppose, for example, you are designing a system to manage air traffic for an airport. (The same model might also work for managing crates in a warehouse, or a video rental system, or a kennel for boarding pets.) At first it seems simple: make a collection to hold airplanes, then create a new airplane and place it in the collection for each airplane that enters the air-traffic-control zone. For cleanup, simply delete the appropriate airplane object when a plane leaves the zone.
But perhaps you have some other system to record data about the planes; perhaps data that doesn’t require such immediate attention as the main controller function. Maybe it’s a record of the flight plans of all the small planes that leave the airport. So you have a second collection of small planes, and whenever you create a plane object you also put it in this collection if it’s a small plane. Then some background process performs operations on the objects in this collection during idle moments.
Now the problem is more difficult: how can you possibly know when to destroy the objects? When you’re done with the object, some other part of the system might not be. This same problem can arise in a number of other situations, and in programming systems (such as C++) in which you must explicitly delete an object when you’re done with it this can become quite complex. 
With Java, the garbage collector is designed to take care of the problem of releasing the memory (although this doesn’t include other aspects of cleaning up an object). The garbage collector “knows” when an object is no longer in use, and it then automatically releases the memory for that object. This, combined with the fact that all objects are inherited from the single root class Object and that you can create objects only one way, on the heap, makes the process of programming in Java much simpler than programming in C++. You have far fewer decisions to make and hurdles to overcome.Garbage collectors
vs. efficiency and flexibility
If all this is such a good idea, why didn’t they do the same thing in C++? Well of course there’s a price you pay for all this programming convenience, and that price is run-time overhead. As mentioned before, in C++ you can create objects on the stack, and in this case they’re automatically cleaned up (but you don’t have the flexibility of creating as many as you want at run-time). Creating objects on the stack is the most efficient way to allocate storage for objects and to free that storage. Creating objects on the heap can be much more expensive. Always inheriting from a base class and making all function calls polymorphic also exacts a small toll. But the garbage collector is a particular problem because you never quite know when it’s going to start up or how long it will take. This means that there’s an inconsistency in the rate of execution of a Java program, so you can’t use it in certain situations, such as when the rate of execution of a program is uniformly critical. (These are generally called real time programs, although not all real-time programming problems are this stringent.) 
The designers of the C++ language, trying to woo C programmers (and most successfully, at that), did not want to add any features to the language that would impact the speed or the use of C++ in any situation where C might be used. This goal was realized, but at the price of greater complexity when programming in C++. Java is simpler than C++, but the tradeoff is in efficiency and sometimes applicability. For a significant portion of programming problems, however, Java is often the superior choice.
 Note that this is true only for objects that are created on the heap, with new. However, the problem described, and indeed any general programming problem, requires objects to be created on the heap.
 According to a technical reader for this book, one existing real-time Java implementation (www.newmonics.com) has guarantees on garbage collector performance.