Text processing

If you come from a C or C++ background, you might be skeptical at first of Java’s power when it comes to handling text. Indeed, one drawback is that execution speed is slower, and that could hinder some of your efforts. However, the tools (in particular the String class) are quite powerful, as the examples in this section show (and performance improvements have been promised for Java).

As you’ll see, these examples were created to solve problems that arose in the creation of this book. However, they are not restricted to that, and the solutions they offer can easily be adapted to other situations. In addition, they show the power of Java in an area that has not previously been emphasized in this book.

Extracting code listings

You’ve no doubt noticed that each complete code listing (not code fragment) in this book begins and ends with special comment tag marks ‘//:’ and ‘///:~’. This meta-information is included so that the code can be automatically extracted from the book into compilable source-code files. In my previous book, I had a system that allowed me to automatically incorporate tested code files into the book. In this book, however, I discovered that it was often easier to paste the code into the book once it was initially tested and, since it’s hard to get right the first time, to perform edits to the code within the book. But how to extract it and test the code? This program is the answer, and it could come in handy when you set out to solve a text processing problem. It also demonstrates many of the String class features.

You first save the entire book in ASCII text format into a separate file. The CodePackager program has two modes (which you can see described in usageString): if you use the -p flag, it expects to see an input file containing the ASCII text from the book. It will go through this file and use the comment tag marks to extract the code, and it uses the file name on the first line to determine the name of the file. In addition, it looks for the package statement in case it needs to put the file into a special directory (chosen via the path indicated by the package statement).

But that’s not all. It also watches for the change in chapters by keeping track of the package names. Since all packages for each chapter begin with c02, c03, c04, etc. to indicate the chapter where they belong (except for those beginning with com, which are ignored for the purpose of keeping track of chapters), as long as the first listing in each chapter contains a package statement with the chapter number, the CodePackager program can keep track of when the chapter changed and put all the subsequent files in the new chapter subdirectory.

As each file is extracted, it is placed into a SourceCodeFile object that is then placed into a collection. (This process will be more thoroughly described later.) These SourceCodeFile objects could simply be stored in files, but that brings us to the second use for this project. If you invoke CodePackager without the -p flag, it expects a “packed” file as input, which it will then extract into separate files. So the -p flag means that the extracted files will be found “packed” into this single file.

Why bother with the packed file? Because different computer platforms have different ways of storing text information in files. A big issue is the end-of-line character or characters, but other issues can also exist. However, Java has a special type of IO stream, the DataOutputStream, which promises that, regardless of what machine the data is coming from, the storage of that data will be in a form that can be correctly retrieved by any other machine by using a DataInputStream. That is, Java handles all of the platform-specific details, which is a large part of the promise of Java. So the -p flag stores everything into a single file in a universal format. You download this file and the Java program from the Web, and when you run CodePackager on this file without the -p flag, the files will all be extracted to appropriate places on your system. (You can specify an alternate subdirectory; otherwise the subdirectories will just be created in the current directory.) To ensure that no system-specific formats remain, File objects are used everywhere a path or a file is described. In addition, there’s a sanity check: an empty file is placed in each subdirectory, and the name of that file indicates how many files you should find in that subdirectory.
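To see what that promise means in practice, here is a minimal round trip in which values written by a DataOutputStream are read back by a DataInputStream in the same order. This is a sketch and not part of CodePackager. It writes to an in-memory byte array rather than a file, and it uses writeUTF( ) and writeInt( ) rather than the writeBytes( ) calls in the listing. The class name PackedRoundTrip is hypothetical.

```java
import java.io.*;

// Hypothetical demo: data written with DataOutputStream is read back
// unchanged with DataInputStream, because both sides use the same
// fixed, platform-independent byte format.
class PackedRoundTrip {
  // Write two strings and an int in DataOutputStream's portable format.
  static byte[] pack(String dir, String file, int lines) {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (DataOutputStream out = new DataOutputStream(bytes)) {
      out.writeUTF(dir);    // 2-byte length, then modified UTF-8
      out.writeUTF(file);
      out.writeInt(lines);  // always 4 bytes, big-endian
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return bytes.toByteArray();
  }
  // Read the values back in exactly the order they were written.
  static String unpack(byte[] data) {
    try (DataInputStream in = new DataInputStream(
           new ByteArrayInputStream(data))) {
      String dir = in.readUTF();
      String file = in.readUTF();
      int lines = in.readInt();
      return dir + "#" + file + "#" + lines;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
  public static void main(String[] args) {
    byte[] packed = pack("c17", "CodePackager.java", 42);
    System.out.println(packed.length + " bytes: " + unpack(packed));
  }
}
```

Because writeUTF( ) prefixes each string with a two-byte length, the reader never has to guess where one string ends. This holds no matter which end-of-line convention the writing machine used.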

Here is the code, which will be described in detail at the end of the listing:

//: CodePackager.java
// "Packs" and "unpacks" the code in "Thinking 
// in Java" for cross-platform distribution.
/* Commented so CodePackager sees it and starts
   a new chapter directory, but so you don't
   have to worry about the directory where this
   program lives:
package c17;
*/
import java.util.*;
import java.io.*;
 
class Pr {
  static void error(String e) {
    System.err.println("ERROR: " + e);
    System.exit(1);
  }
}
 
class IO {
  static BufferedReader disOpen(File f) {
    BufferedReader in = null;
    try {
      in = new BufferedReader(
        new FileReader(f));
    } catch(IOException e) {
      Pr.error("could not open " + f);
    }
    return in;
  }
  static BufferedReader disOpen(String fname) {
    return disOpen(new File(fname));
  }
  static DataOutputStream dosOpen(File f) {
    DataOutputStream in = null;
    try {
      in = new DataOutputStream(
        new BufferedOutputStream(
          new FileOutputStream(f)));
    } catch(IOException e) {
      Pr.error("could not open " + f);
    }
    return in;
  }
  static DataOutputStream dosOpen(String fname) {
    return dosOpen(new File(fname));
  }
  static PrintWriter psOpen(File f) {
    PrintWriter in = null;
    try {
      in = new PrintWriter(
        new BufferedWriter(
          new FileWriter(f)));
    } catch(IOException e) {
      Pr.error("could not open " + f);
    }
    return in;
  }
  static PrintWriter psOpen(String fname) {
    return psOpen(new File(fname));
  }
  static void close(Writer os) {
    try {
      os.close();
    } catch(IOException e) {
      Pr.error("closing " + os);
    }
  }
  static void close(DataOutputStream os) {
    try {
      os.close();
    } catch(IOException e) {
      Pr.error("closing " + os);
    }
  }
  static void close(Reader os) {
    try {
      os.close();
    } catch(IOException e) {
      Pr.error("closing " + os);
    }
  }
}
 
class SourceCodeFile {
  public static final String
    startMarker = "//:", // Start of source file
    endMarker = "} ///:~", // End of source
    endMarker2 = "}; ///:~", // C++ file end
    beginContinue = "} ///:Continued",
    endContinue = "///:Continuing",
    packMarker = "###", // Packed file header tag
    eol = // Line separator on current system
      System.getProperty("line.separator"),
    filesep = // System's file path separator
      System.getProperty("file.separator");
  public static String copyright = "";
  static {
    try {
      BufferedReader cr =
        new BufferedReader(
          new FileReader("Copyright.txt"));
      String crin;
      while((crin = cr.readLine()) != null)
        copyright += crin + "\n";
      cr.close();
    } catch(Exception e) {
      copyright = "";
    }
  }
  private String filename, dirname,
    contents = new String();
  private static String chapter = "c02";
  // The file name separator from the old system:
  public static String oldsep;
  public String toString() {
    return dirname + filesep + filename;
  }
  // Constructor for parsing from document file:
  public SourceCodeFile(String firstLine,
      BufferedReader in) {
    dirname = chapter;
    // Skip past marker:
    filename = firstLine.substring(
        startMarker.length()).trim();
    // Find space that terminates file name:
    if(filename.indexOf(' ') != -1)
      filename = filename.substring(
          0, filename.indexOf(' '));
    System.out.println("found: " + filename);
    contents = firstLine + eol;
    if(copyright.length() != 0)
      contents += copyright + eol;
    String s;
    boolean foundEndMarker = false;
    try {
      while((s = in.readLine()) != null) {
        if(s.startsWith(startMarker))
          Pr.error("No end of file marker for " +
            filename);
        // For this program, no spaces before 
        // the "package" keyword are allowed
        // in the input source code:
        else if(s.startsWith("package")) {
          // Extract package name:
          String pdir = s.substring(
            s.indexOf(' ')).trim();
          pdir = pdir.substring(
            0, pdir.indexOf(';')).trim();
          // Capture the chapter from the package
          // ignoring the 'com' subdirectories:
          if(!pdir.startsWith("com")) {
            int firstDot = pdir.indexOf('.');
            if(firstDot != -1)
              chapter =
                pdir.substring(0,firstDot);
            else
              chapter = pdir;
          }
          // Convert package name to path name:
          pdir = pdir.replace(
            '.', filesep.charAt(0));
          System.out.println("package " + pdir);
          dirname = pdir;
        }
        contents += s + eol;
        // Move past continuations:
        if(s.startsWith(beginContinue))
          while((s = in.readLine()) != null)
            if(s.startsWith(endContinue)) {
              contents += s + eol;
              break;
            }
        // Watch for end of code listing:
        if(s.startsWith(endMarker) ||
           s.startsWith(endMarker2)) {
          foundEndMarker = true;
          break;
        }
      }
      if(!foundEndMarker)
        Pr.error(
          "End marker not found before EOF");
      System.out.println("Chapter: " + chapter);
    } catch(IOException e) {
      Pr.error("Error reading line");
    }
  }
  // For recovering from a packed file:
  public SourceCodeFile(BufferedReader pFile) {
    try {
      String s = pFile.readLine();
      if(s == null) return;
      if(!s.startsWith(packMarker))
        Pr.error("Can't find " + packMarker
          + " in " + s);
      s = s.substring(
        packMarker.length()).trim();
      dirname = s.substring(0, s.indexOf('#'));
      filename = s.substring(s.indexOf('#') + 1);
      dirname = dirname.replace(
        oldsep.charAt(0), filesep.charAt(0));
      filename = filename.replace(
        oldsep.charAt(0), filesep.charAt(0));
      System.out.println("listing: " + dirname
        + filesep + filename);
      while((s = pFile.readLine()) != null) {
        // Watch for end of code listing:
        if(s.startsWith(endMarker) ||
           s.startsWith(endMarker2)) {
          contents += s;
          break;
        }
        contents += s + eol;
      }
    } catch(IOException e) {
      System.err.println("Error reading line");
    }
  }
  public boolean hasFile() {
    return filename != null;
  }
  public String directory() { return dirname; }
  public String filename() { return filename; }
  public String contents() { return contents; }
  // To write to a packed file:
  public void writePacked(DataOutputStream out) {
    try {
      out.writeBytes(
        packMarker + dirname + "#"
        + filename + eol);
      out.writeBytes(contents);
    } catch(IOException e) {
      Pr.error("writing " + dirname +
        filesep + filename);
    }
  }
  // To generate the actual file:
  public void writeFile(String rootpath) {
    File path = new File(rootpath, dirname);
    path.mkdirs();
    PrintWriter p =
      IO.psOpen(new File(path, filename));
    p.print(contents);
    IO.close(p);
  }
}
 
class DirMap {
  private Hashtable t = new Hashtable();
  private String rootpath;
  DirMap() {
    rootpath = System.getProperty("user.dir");
  }
  DirMap(String alternateDir) {
    rootpath = alternateDir;
  }
  public void add(SourceCodeFile f){
    String path = f.directory();
    if(!t.containsKey(path))
      t.put(path, new Vector());
    ((Vector)t.get(path)).addElement(f);
  }
  public void writePackedFile(String fname) {
    DataOutputStream packed = IO.dosOpen(fname);
    try {
      packed.writeBytes("###Old Separator:" +
        SourceCodeFile.filesep + "###\n");
    } catch(IOException e) {
      Pr.error("Writing separator to " + fname);
    }
    Enumeration e = t.keys();
    while(e.hasMoreElements()) {
      String dir = (String)e.nextElement();
      System.out.println(
        "Writing directory " + dir);
      Vector v = (Vector)t.get(dir);
      for(int i = 0; i < v.size(); i++) {
        SourceCodeFile f =
          (SourceCodeFile)v.elementAt(i);
        f.writePacked(packed);
      }
    }
    IO.close(packed);
  }
  // Write all the files in their directories:
  public void write() {
    Enumeration e = t.keys();
    while(e.hasMoreElements()) {
      String dir = (String)e.nextElement();
      Vector v = (Vector)t.get(dir);
      for(int i = 0; i < v.size(); i++) {
        SourceCodeFile f =
          (SourceCodeFile)v.elementAt(i);
        f.writeFile(rootpath);
      }
      // Add file indicating file quantity
      // written to this directory as a check:
      IO.close(IO.dosOpen(
        new File(new File(rootpath, dir),
          Integer.toString(v.size())+".files")));
    }
  }
}
 
public class CodePackager {
  private static final String usageString =
  "usage: java CodePackager packedFileName" +
  "\nExtracts source code files from packed \n" +
  "version of Tjava.doc sources into " +
  "directories off current directory\n" +
  "java CodePackager packedFileName newDir\n" +
  "Extracts into directories off newDir\n" +
  "java CodePackager -p source.txt packedFile" +
  "\nCreates packed version of source files" +
  "\nfrom text version of Tjava.doc";
  private static void usage() {
    System.err.println(usageString);
    System.exit(1);
  }
  public static void main(String[] args) {
    if(args.length == 0) usage();
    if(args[0].equals("-p")) {
      if(args.length != 3)
        usage();
      createPackedFile(args);
    }
    else {
      if(args.length > 2)
        usage();
      extractPackedFile(args);
    }
  }
  private static String currentLine;
  private static BufferedReader in;
  private static DirMap dm;
  private static void
  createPackedFile(String[] args) {
    dm = new DirMap();
    in = IO.disOpen(args[1]);
    try {
      while((currentLine = in.readLine())
          != null) {
        if(currentLine.startsWith(
            SourceCodeFile.startMarker)) {
          dm.add(new SourceCodeFile(
                   currentLine, in));
        }
        else if(currentLine.startsWith(
            SourceCodeFile.endMarker))
          Pr.error("file has no start marker");
        // Else ignore the input line
      }
    } catch(IOException e) {
      Pr.error("Error reading " + args[1]);
    }
    IO.close(in);
    dm.writePackedFile(args[2]);
  }
  private static void
  extractPackedFile(String[] args) {
    if(args.length == 2) // Alternate directory
      dm = new DirMap(args[1]);
    else // Current directory
      dm = new DirMap();
    in = IO.disOpen(args[0]);
    String s = null;
    try {
      s = in.readLine();
    } catch(IOException e) {
      Pr.error("Cannot read from " + in);
    }
    // Capture the separator used in the system
    // that packed the file:
    if(s.indexOf("###Old Separator:") != -1 ) {
      String oldsep = s.substring(
        "###Old Separator:".length());
      oldsep = oldsep.substring(
        0, oldsep.indexOf('#'));
      SourceCodeFile.oldsep = oldsep;
    }
    SourceCodeFile sf = new SourceCodeFile(in);
    while(sf.hasFile()) {
      dm.add(sf);
      sf = new SourceCodeFile(in);
    }
    dm.write();
  }
} ///:~

You’ll first notice the package statement that is commented out. Since this is the first program in the chapter, the package statement is necessary to tell CodePackager that the chapter has changed, but putting it in a package would be a problem. When you create a package, you tie the resulting program to a particular directory structure, which is fine for most of the examples in this book. Here, however, the CodePackager program must be compiled and run from an arbitrary directory, so the package statement is commented out. It will still look like an ordinary package statement to CodePackager, though, since the program isn’t sophisticated enough to detect multi-line comments. (It has no need for such sophistication, a fact that comes in handy here.)

The first two classes are support/utility classes designed to make the rest of the program more consistent to write and easier to read. The first, Pr, is similar to the ANSI C library perror, since it prints an error message (but also exits the program). The second class, IO, encapsulates the creation of files, a process that was shown in Chapter 10 as one that rapidly becomes verbose and annoying. In Chapter 10, the proposed solution created new classes, but here static method calls are used. Within those methods the appropriate exceptions are caught and dealt with. These methods make the rest of the code much cleaner to read.

The first class that helps solve the problem is SourceCodeFile, which represents all the information (including the contents, file name, and directory) for one source code file in the book. It also contains a set of String constants representing the markers that start and end a file, a marker used inside the packed file, the current system’s end-of-line separator and file path separator (notice the use of System.getProperty( ) to get the local version), and a copyright notice, which is extracted from the following file, Copyright.txt:

//////////////////////////////////////////////////
// Copyright (c) Bruce Eckel, 1998
// Source code file from the book "Thinking in Java"
// All rights reserved EXCEPT as allowed by the
// following statements: You may freely use this file
// for your own work (personal or commercial),
// including modifications and distribution in
// executable form only. Permission is granted to use
// this file in classroom situations, including its
// use in presentation materials, as long as the book
// "Thinking in Java" is cited as the source. 
// Except in classroom situations, you may not copy
// and distribute this code; instead, the sole
// distribution point is http://www.BruceEckel.com 
// (and official mirror sites) where it is
// freely available. You may not remove this
// copyright and notice. You may not distribute
// modified versions of the source code in this
// package. You may not use this file in printed
// media without the express permission of the
// author. Bruce Eckel makes no representation about
// the suitability of this software for any purpose.
// It is provided "as is" without express or implied
// warranty of any kind, including any implied
// warranty of merchantability, fitness for a
// particular purpose or non-infringement. The entire
// risk as to the quality and performance of the
// software is with you. Bruce Eckel and the
// publisher shall not be liable for any damages
// suffered by you or any third party as a result of
// using or distributing software. In no event will
// Bruce Eckel or the publisher be liable for any
// lost revenue, profit, or data, or for direct,
// indirect, special, consequential, incidental, or
// punitive damages, however caused and regardless of
// the theory of liability, arising out of the use of
// or inability to use software, even if Bruce Eckel
// and the publisher have been advised of the
// possibility of such damages. Should the software
// prove defective, you assume the cost of all
// necessary servicing, repair, or correction. If you
// think you've found an error, please email all
// modified files with clearly commented changes to:
// Bruce@EckelObjects.com. (please use the same
// address for non-code errors found in the book).

//////////////////////////////////////////////////

When extracting files from a packed file, the file separator of the system that packed the file is also noted, so it can be replaced with the correct one for the local system.

The subdirectory name for the current chapter is kept in the field chapter, which is initialized to c02. (You’ll notice that the listing in Chapter 2 doesn’t contain a package statement.) The only time that the chapter field changes is when a package statement is discovered in the current file.

Building a packed file

The first constructor is used to extract a file from the ASCII text version of this book. The calling code (which appears further down in the listing) reads each line in until it finds one that matches the beginning of a listing. At that point, it creates a new SourceCodeFile object, passing it the first line (which has already been read by the calling code) and the BufferedReader object from which to extract the rest of the source code listing.

At this point, you begin to see heavy use of the String methods. To extract the file name, the code calls the overloaded version of substring( ) that takes a starting offset and goes to the end of the String. This starting index is produced by finding the length( ) of the startMarker. trim( ) removes white space from both ends of the String. The first line can also have words after the name of the file. These are detected using indexOf( ), which returns -1 if it cannot find the character you’re looking for, and otherwise returns the index of the first instance of that character. Notice there is also an overloaded version of indexOf( ) that takes a String instead of a character.
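The same calls can be tried in isolation. This sketch moves the file-name logic from the first SourceCodeFile constructor into a standalone method. FileNameParse and fileName( ) are hypothetical names used only for illustration.

```java
// Hypothetical standalone version of the file-name parsing in the
// first SourceCodeFile constructor.
class FileNameParse {
  static final String startMarker = "//:";

  // Pull the file name out of a listing's first line, such as
  // "//: CodePackager.java" or "//: Foo.java extra words".
  static String fileName(String firstLine) {
    // substring(int) runs from the offset to the end of the String;
    // trim() drops white space from both ends.
    String name = firstLine.substring(startMarker.length()).trim();
    // indexOf(char) returns -1 if the character isn't present.
    int space = name.indexOf(' ');
    if (space != -1)
      name = name.substring(0, space);
    return name;
  }
  public static void main(String[] args) {
    System.out.println(fileName("//: CodePackager.java"));
    System.out.println(fileName("//:   ClassScanner.java  (revised)"));
    // indexOf(String) searches for a substring instead of a character:
    System.out.println("//: A.java".indexOf(".java"));
  }
}
```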

Once the file name is parsed and stored, the first line is placed into the contents String (which is used to hold the entire text of the source code listing). At this point, the rest of the lines are read and concatenated into the contents String. It’s not quite that simple, since certain situations require special handling. One case is error checking: if you run into a startMarker, it means that no end marker was placed at the end of the listing that’s currently being collected. This is an error condition that aborts the program.

The second special case is the package keyword. Although Java is a free-form language, this program requires that the package keyword be at the beginning of the line. When the package keyword is seen, the package name is extracted by looking for the space at the beginning and the semicolon at the end. (Note that this could also have been performed in a single operation by using the overloaded substring( ) that takes both the starting and ending indexes.) Then the dots in the package name are replaced by the file separator, although an assumption is made here that the file separator is only one character long. This is probably true on all systems, but it’s a place to look if there are problems.
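Here is a sketch of the package handling, this time using the single two-argument substring( ) call mentioned above. PackageToPath and its methods are hypothetical names. Like the listing, the sketch assumes that the package keyword starts the line and that the file separator is a single character.

```java
import java.io.File;

// Hypothetical standalone version of the package handling in the
// first SourceCodeFile constructor.
class PackageToPath {
  // "package c17.util;" -> "c17.util", using one substring() call
  // with both a starting and an ending index.
  static String packageName(String line) {
    return line.substring(line.indexOf(' '), line.indexOf(';')).trim();
  }
  // The chapter is the first component of the package name; names
  // starting with "com" leave the current chapter unchanged.
  static String chapter(String pkg, String current) {
    if (pkg.startsWith("com")) return current;
    int firstDot = pkg.indexOf('.');
    return firstDot != -1 ? pkg.substring(0, firstDot) : pkg;
  }
  // Replace the dots with a one-character separator.
  static String toPath(String pkg, char sep) {
    return pkg.replace('.', sep);
  }
  public static void main(String[] args) {
    String pkg = packageName("package c17.util;");
    System.out.println(pkg + " -> chapter " + chapter(pkg, "c02")
      + ", path " + toPath(pkg, File.separatorChar));
  }
}
```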

The default behavior is to concatenate each line to contents along with the end-of-line string, until the endMarker is discovered, which indicates that the constructor should terminate. If the end of the file is encountered before the endMarker is seen, that’s an error.
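The collecting loop can be sketched on its own as follows. ListingCollector is a hypothetical name. This version differs from the listing in three ways: it throws exceptions where CodePackager calls Pr.error( ), it uses a StringBuilder instead of repeated String concatenation, and it leaves out the package, continuation, and endMarker2 handling.

```java
import java.io.*;

// Hypothetical standalone version of the collecting loop in the
// first SourceCodeFile constructor.
class ListingCollector {
  static final String startMarker = "//:";
  static final String endMarker = "} ///:~";
  // CodePackager uses System.getProperty("line.separator");
  // "\n" keeps this demo's output the same on every platform.
  static final String eol = "\n";

  static String collect(String firstLine, BufferedReader in) {
    StringBuilder contents = new StringBuilder(firstLine).append(eol);
    try {
      String s;
      while ((s = in.readLine()) != null) {
        // A new start marker means the previous listing never ended:
        if (s.startsWith(startMarker))
          throw new IllegalStateException(
            "No end of file marker before: " + s);
        contents.append(s).append(eol);
        // The end marker terminates the listing:
        if (s.startsWith(endMarker))
          return contents.toString();
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    // Ran out of input without seeing the end marker:
    throw new IllegalStateException("End marker not found before EOF");
  }

  public static void main(String[] args) {
    BufferedReader in = new BufferedReader(new StringReader(
      "class A {\n} ///:~\ntext after the listing\n"));
    System.out.print(collect("//: A.java", in));
  }
}
```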

Extracting from a packed file

The second constructor is used to recover the source code files from a packed file. Here, the calling method doesn’t have to worry about skipping over the intermediate text. The file contains all the source-code files, placed end-to-end. All you need to hand to this constructor is the BufferedReader where the information is coming from, and the constructor takes it from there. There is some meta-information, however, at the beginning of each listing, and this is denoted by the packMarker. If the packMarker isn’t there, it means the caller is mistakenly trying to use this constructor where it isn’t appropriate.

Once the packMarker is found, it is stripped off and the directory name (terminated by a ‘#’) and the file name (which goes to the end of the line) are extracted. In both cases, the old separator character is replaced by the one that is current to this machine using the String replace( ) method. The old separator is placed at the beginning of the packed file, and you’ll see how that is extracted later in the listing.
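A standalone sketch of this header parsing follows. PackHeader and parse( ) are hypothetical names. The old and new separators are passed in explicitly rather than read from the static oldsep and filesep fields.

```java
// Hypothetical standalone version of the header parsing in the
// second SourceCodeFile constructor.
class PackHeader {
  static final String packMarker = "###";

  // Parse "###dir#file" into {dir, file}, converting the packing
  // system's separator (oldSep) to the local one (newSep).
  static String[] parse(String line, char oldSep, char newSep) {
    if (!line.startsWith(packMarker))
      throw new IllegalArgumentException(
        "Can't find " + packMarker + " in " + line);
    String s = line.substring(packMarker.length()).trim();
    int hash = s.indexOf('#'); // the directory name ends here
    String dir = s.substring(0, hash).replace(oldSep, newSep);
    String file = s.substring(hash + 1).replace(oldSep, newSep);
    return new String[] { dir, file };
  }
  public static void main(String[] args) {
    // A header written on a system whose separator is '\':
    String[] parts = parse("###c17\\util#Tool.java", '\\', '/');
    System.out.println(parts[0] + " | " + parts[1]);
  }
}
```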

The rest of the constructor is quite simple. It reads and concatenates each line to the contents until the endMarker is found.

Accessing and writing the listings

The next few methods are simple accessors: directory( ), filename( ) (notice the method can have the same spelling and capitalization as the field), and contents( ). There is also hasFile( ), which indicates whether this object contains a file or not. (The need for this will be seen later.)

The final two methods are concerned with writing this code listing into a file, either a packed file via writePacked( ) or a Java source file via writeFile( ).

All writePacked( ) needs is the DataOutputStream, which was opened elsewhere and represents the file that’s being written. It puts the header information on the first line and then calls writeBytes( ) to write contents in a “universal” format.

When writing the Java source file, the file must be created. This is done via IO.psOpen( ), handing it a File object that contains not only the file name but also the path. But the question now is: does this path exist? The user has the option of placing all the source code directories into a completely different subdirectory, which might not even exist. So before each file is written, File.mkdirs( ) is called with the path that you want to write the file into. This will make the entire path all at once.
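The effect of mkdirs( ) is easy to check with a temporary directory standing in for the user’s alternate root. This is a sketch modeled on writeFile( ). MakeDirsDemo is a hypothetical name, and the sketch uses try-with-resources (Java 7 or later) in place of the IO helper class.

```java
import java.io.*;
import java.nio.file.Files;

// Hypothetical sketch modeled on SourceCodeFile.writeFile().
class MakeDirsDemo {
  // Write contents to root/dirname/filename, creating any missing
  // directories along the way.
  static File write(File root, String dirname, String filename,
                    String contents) {
    File path = new File(root, dirname);
    // mkdirs() creates every missing directory on the path; it returns
    // false if the directory already existed, which is fine here.
    path.mkdirs();
    File target = new File(path, filename);
    try (PrintWriter p = new PrintWriter(
           new BufferedWriter(new FileWriter(target)))) {
      p.print(contents);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return target;
  }

  public static void main(String[] args) throws IOException {
    // A fresh temporary root stands in for an alternate directory.
    File root = Files.createTempDirectory("packdemo").toFile();
    File f = write(root,
      "c17" + File.separator + "util", "Tool.java", "class Tool {}\n");
    System.out.println(f + " exists: " + f.exists());
  }
}
```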

Containing the entire collection of listings

It’s convenient to organize the listings as subdirectories while the whole collection is being built in memory. One reason is another sanity check: as each subdirectory of listings is created, an additional file is added whose name contains the number of files in that directory.

The DirMap class produces this effect and demonstrates the concept of a “multimap.” This is implemented using a Hashtable whose keys are the subdirectories being created and whose values are Vector objects containing the SourceCodeFile objects in that particular directory. Thus, instead of mapping a key to a single value, the “multimap” maps a key to a set of values via the associated Vector. Although this sounds complex, it’s remarkably straightforward to implement. You’ll see that most of the size of the DirMap class is due to the portions that write to files, not to the “multimap” implementation.

There are two ways you can make a DirMap: the default constructor assumes that you want the directories to branch off of the current one, and the second constructor lets you specify an alternate absolute path for the starting directory.

The add( ) method is where quite a bit of dense action occurs. First, the directory( ) is extracted from the SourceCodeFile you want to add, and then the Hashtable is examined to see if it contains that key already. If not, a new Vector is added to the Hashtable and associated with that key. At this point, the Vector is there, one way or another, and it is extracted so the SourceCodeFile can be added. Because Vectors can be easily combined with Hashtables like this, the power of both is amplified.
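With the generic collections added in later versions of Java, the same multimap idea can be written without any casts. This sketch assumes Java 8 or later for computeIfAbsent( ). It is not the DirMap in the listing: MultiMap is a hypothetical name. It uses a LinkedHashMap of ArrayLists so that keys come back in insertion order, whereas Hashtable enumerates its keys in no particular order.

```java
import java.util.*;

// Hypothetical generic version of the DirMap "multimap": each key
// maps to a List of values instead of a single value.
class MultiMap<K, V> {
  private final Map<K, List<V>> map = new LinkedHashMap<>();

  // Same steps as DirMap.add(): create the list the first time a key
  // is seen, then append the value to whichever list is there.
  void add(K key, V value) {
    map.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
  }
  // Unknown keys yield an empty list rather than null.
  List<V> get(K key) {
    return map.getOrDefault(key, Collections.emptyList());
  }
  Set<K> keys() { return map.keySet(); }

  public static void main(String[] args) {
    MultiMap<String, String> dirs = new MultiMap<>();
    dirs.add("c17", "CodePackager.java");
    dirs.add("c17", "ClassScanner.java");
    dirs.add("c02", "HelloDate.java");
    for (String dir : dirs.keys())
      System.out.println(dir + ": " + dirs.get(dir));
  }
}
```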

Writing a packed file involves opening the file to write (as a DataOutputStream so the data is universally recoverable) and writing the header information about the old separator on the first line. Next, an Enumeration of the Hashtable keys is produced and stepped through to select each directory and to fetch the Vector associated with that directory so each SourceCodeFile in that Vector can be written to the packed file.

Writing the Java source files to their directories in write( ) is almost identical to writePackedFile( ), since both methods simply call the appropriate method in SourceCodeFile. Here, however, the root path is passed into SourceCodeFile.writeFile( ), and when all the files have been written, the additional file with the name containing the number of files is also written.
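The listing writes the sanity-check file but never reads it back. Checking it yourself could look something like the following sketch. CountCheck is a hypothetical helper that is not part of CodePackager. It assumes the marker name is a number followed by .files, as DirMap.write( ) produces.

```java
import java.io.File;

// Hypothetical checker for the "N.files" markers left by DirMap.write().
class CountCheck {
  static final String suffix = ".files";

  // True if dir contains an "N.files" marker and N other regular
  // files. Subdirectories (nested packages) are skipped, since DirMap
  // writes a separate marker for each directory key.
  static boolean countMatches(File dir) {
    File[] entries = dir.listFiles();
    if (entries == null) return false; // not a readable directory
    int expected = -1, actual = 0;
    for (File f : entries) {
      String name = f.getName();
      if (name.endsWith(suffix)) {
        // Assumes the part before ".files" is a number:
        expected = Integer.parseInt(
          name.substring(0, name.length() - suffix.length()));
      } else if (f.isFile()) {
        actual++;
      }
    }
    return expected == actual;
  }
  public static void main(String[] args) {
    File dir = new File(args.length > 0 ? args[0] : ".");
    System.out.println(dir + (countMatches(dir) ? ": OK" : ": count mismatch"));
  }
}
```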

The main program

The previously described classes are used within CodePackager. First you see the usage string that gets printed whenever the end user invokes the program incorrectly, along with the usage( ) method that prints it and exits the program. All main( ) does is determine whether you want to create a packed file or extract from one; then it ensures the arguments are correct and calls the appropriate method.

When a packed file is created, it’s assumed to be made in the current directory, so the DirMap is created using the default constructor. After the file is opened, each line is read and examined for particular conditions:

If the line starts with the starting marker for a source code listing, a new SourceCodeFile object is created. The constructor reads in the rest of the source listing. The handle that results is directly added to the DirMap.

If the line starts with the end marker for a source code listing, something has gone wrong, since end markers should be found only by the SourceCodeFile constructor.

When extracting a packed file, the extraction can be into the current directory or into an alternate directory, so the DirMap object is created accordingly. The file is opened and the first line is read. The old file path separator information is extracted from this line. Then the input is used to create the first SourceCodeFile object, which is added to the DirMap. New SourceCodeFile objects are created and added as long as they contain a file. (The last one created will simply return when it runs out of input, and then hasFile( ) will return false.)
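The header parsing can be sketched as a small method that returns null when the header is missing, which lets the caller fall back to the local separator. (The listing instead assumes the header is present. Without it, SourceCodeFile.oldsep stays null.) OldSeparator and parse( ) are hypothetical names.

```java
// Hypothetical sketch of reading the "###Old Separator:X###" header
// that DirMap.writePackedFile() puts on the first line.
class OldSeparator {
  static final String tag = "###Old Separator:";

  // Returns the recorded separator, or null if the line has no header.
  static String parse(String header) {
    int start = header.indexOf(tag);
    if (start == -1) return null;
    String rest = header.substring(start + tag.length());
    int end = rest.indexOf('#'); // the separator ends at the next '#'
    return end == -1 ? null : rest.substring(0, end);
  }

  public static void main(String[] args) {
    String sep = parse("###Old Separator:\\###");
    // Fall back to the local separator if the header is missing:
    if (sep == null) sep = System.getProperty("file.separator");
    System.out.println("old separator: " + sep);
  }
}
```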

Checking capitalization style

Although the previous example can come in handy as a guide for some project of your own that involves text processing, this project will be directly useful because it performs a style check to make sure that your capitalization conforms to the de facto Java style. It opens each .java file in the current directory and extracts all the class names and identifiers, then shows you if any of them don’t meet the Java style.
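What counts as “the Java style” is a matter of convention. As a sketch, assume the simplified rules that class and interface names begin with an uppercase letter, while other identifiers begin with a lowercase letter unless they are all-uppercase constants such as MAX_SIZE. StyleCheck and its methods are hypothetical. ClassScanner applies its own rules, which are explained after its listing.

```java
// Hypothetical, simplified capitalization rules (an assumption, not
// the exact checks ClassScanner performs).
class StyleCheck {
  // Class and interface names: first letter uppercase.
  static boolean isClassStyle(String name) {
    return !name.isEmpty() && Character.isUpperCase(name.charAt(0));
  }
  // Other identifiers: first letter lowercase, or an all-uppercase
  // constant such as MAX_SIZE.
  static boolean isIdentStyle(String name) {
    if (name.isEmpty()) return false;
    if (Character.isLowerCase(name.charAt(0))) return true;
    return name.equals(name.toUpperCase());
  }
  public static void main(String[] args) {
    for (String c : new String[] { "CodePackager", "dirMap" })
      System.out.println("class " + c + ": "
        + (isClassStyle(c) ? "ok" : "check"));
    for (String id : new String[] { "writeFile", "MAX_SIZE", "WriteFile" })
      System.out.println("identifier " + id + ": "
        + (isIdentStyle(id) ? "ok" : "check"));
  }
}
```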

For the program to operate correctly, you must first build a class name repository to hold all the class names in the standard Java library. You do this by moving into all the source code subdirectories for the standard Java library and running ClassScanner in each subdirectory. Provide as arguments the name of the repository file (using the same path and name each time) and the -a command-line option to indicate that the class names should be added to the repository.
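The listing below stores class names in a Properties object. Assuming the repository file is written and read with Properties.store( ) and load( ), adding names and reloading them could look like this sketch. RepositoryDemo and its methods are hypothetical, and a StringWriter stands in for the repository file.

```java
import java.io.*;
import java.util.Properties;

// Hypothetical sketch of a class name repository kept as Properties
// (the on-disk format is an assumption).
class RepositoryDemo {
  // Add class names, using each name as both key and value, as
  // ClassScanner does with classes.put(name, name).
  static void addNames(Properties repo, String... names) {
    for (String n : names)
      repo.setProperty(n, n);
  }
  // Save to text and load it back, as a repository file would be.
  static Properties roundTrip(Properties repo) {
    try {
      StringWriter w = new StringWriter();
      repo.store(w, "Class name repository");
      Properties loaded = new Properties();
      loaded.load(new StringReader(w.toString()));
      return loaded;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
  public static void main(String[] args) {
    Properties repo = new Properties();
    addNames(repo, "String", "Vector", "Hashtable");
    Properties reloaded = roundTrip(repo);
    System.out.println(reloaded.containsKey("Vector"));
  }
}
```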

To use the program to check your code, run it and hand it the path and name of the repository to use. It will check all the classes and identifiers in the current directory and tell you which ones don’t follow the typical Java capitalization style.

You should be aware that the program isn’t perfect; there are a few times when it will point out what it thinks is a problem, but on looking at the code you’ll see that nothing needs to be changed. This is a little annoying, but it’s still much easier than trying to find all these cases by staring at your code.

The explanation immediately follows the listing:

//: ClassScanner.java
// Scans all files in directory for classes
// and identifiers, to check capitalization.
// Assumes properly compiling code listings.
// Doesn't do everything right, but is a very
// useful aid.
import java.io.*;
import java.util.*;
 
class MultiStringMap extends Hashtable {
  public void add(String key, String value) {
    if(!containsKey(key))
      put(key, new Vector());
    ((Vector)get(key)).addElement(value);
  }
  public Vector getVector(String key) {
    if(!containsKey(key)) {
      System.err.println(
        "ERROR: can't find key: " + key);
      System.exit(1);
    }
    return (Vector)get(key);
  }
  public void printValues(PrintStream p) {
    Enumeration k = keys();
    while(k.hasMoreElements()) {
      String oneKey = (String)k.nextElement();
      Vector val = getVector(oneKey);
      for(int i = 0; i < val.size(); i++)
        p.println((String)val.elementAt(i));
    }
  }
}
 
public class ClassScanner {
  private File path;
  private String[] fileList;
  private Properties classes = new Properties();
  private MultiStringMap
    classMap = new MultiStringMap(),
    identMap = new MultiStringMap();
  private StreamTokenizer in;
  public ClassScanner() {
    path = new File(".");
    fileList = path.list(new JavaFilter());
    for(int i = 0; i < fileList.length; i++) {
      System.out.println(fileList[i]);
      scanListing(fileList[i]);
    }
  }
  void scanListing(String fname) {
    try {
      in = new StreamTokenizer(
          new BufferedReader(
            new FileReader(fname)));
      // Doesn't seem to work:
      // in.slashStarComments(true);
      // in.slashSlashComments(true);
      in.ordinaryChar('/');
      in.ordinaryChar('.');
      in.wordChars('_', '_');
      in.eolIsSignificant(true);
      while(in.nextToken() !=
            StreamTokenizer.TT_EOF) {
        if(in.ttype == '/')
          eatComments();
        else if(in.ttype ==
                StreamTokenizer.TT_WORD) {
          if(in.sval.equals("class") ||
             in.sval.equals("interface")) {
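            // Get class name (the next word token):
            while(in.nextToken() !=
                  StreamTokenizer.TT_EOF
                  && in.ttype !=
                  StreamTokenizer.TT_WORD)
              ;
            // Stop at end of file, where sval
            // is null and can't be stored:
            if(in.ttype != StreamTokenizer.TT_WORD)
              break;
            classes.put(in.sval, in.sval);
            classMap.add(fname, in.sval);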
          }
          if(in.sval.equals("import") ||
             in.sval.equals("package"))
            discardLine();
          else // It's an identifier or keyword
            identMap.add(fname, in.sval);
        }
      }
    } catch(IOException e) {
      e.printStackTrace();
    }
  }
  void discardLine() {
    try {
      while(in.nextToken() !=
            StreamTokenizer.TT_EOF
            && in.ttype !=
            StreamTokenizer.TT_EOL)
        ; // Throw away tokens to end of line
    } catch(IOException e) {
      e.printStackTrace();
    }
  }
  // StreamTokenizer's comment removal seemed
  // to be broken. This extracts them:
  void eatComments() {
    try {
      if(in.nextToken() !=
         StreamTokenizer.TT_EOF) {
        if(in.ttype == '/')
          discardLine();
        else if(in.ttype != '*')
          in.pushBack();
        else
          while(true) {
            if(in.nextToken() ==
              StreamTokenizer.TT_EOF)
              break;
            if(in.ttype == '*') {
              if(in.nextToken() ==
                   StreamTokenizer.TT_EOF
                 || in.ttype == '/')
                break;
              // Not "*/" yet; re-examine this token,
              // which may be another '*' (as in "**/"):
              in.pushBack();
            }
          }
      }
    } catch(IOException e) {
      e.printStackTrace();
    }
  }
  public String[] classNames() {
    String[] result = new String[classes.size()];
    Enumeration e = classes.keys();
    int i = 0;
    while(e.hasMoreElements())
      result[i++] = (String)e.nextElement();
    return result;
  }
  public void checkClassNames() {
    Enumeration files = classMap.keys();
    while(files.hasMoreElements()) {
      String file = (String)files.nextElement();
      Vector cls = classMap.getVector(file);
      for(int i = 0; i < cls.size(); i++) {
        String className =
          (String)cls.elementAt(i);
        if(Character.isLowerCase(
             className.charAt(0)))
          System.out.println(
            "class capitalization error, file: "
            + file + ", class: "
            + className);
      }
    }
  }
  public void checkIdentNames() {
    Enumeration files = identMap.keys();
    Vector reportSet = new Vector();
    while(files.hasMoreElements()) {
      String file = (String)files.nextElement();
      Vector ids = identMap.getVector(file);
      for(int i = 0; i < ids.size(); i++) {
        String id =
          (String)ids.elementAt(i);
        if(!classes.contains(id)) {
          // Ignore identifiers of length 3 or
          // longer that are all uppercase
          // (probably static final values):
          if(id.length() >= 3 &&
             id.equals(
               id.toUpperCase()))
            continue;
          // Check to see if first char is upper:
          if(Character.isUpperCase(id.charAt(0))){
            if(reportSet.indexOf(file + id)
                == -1){ // Not reported yet
              reportSet.addElement(file + id);
              System.out.println(
                "Ident capitalization error in:"
                + file + ", ident: " + id);
            }
          }
        }
      }
    }
  }
  static final String usage =
    "Usage: \n" +
    "ClassScanner classnames -a\n" +
    "\tAdds all the class names in this \n" +
    "\tdirectory to the repository file \n" +
    "\tcalled 'classnames'\n" +
    "ClassScanner classnames\n" +
    "\tChecks all the java files in this \n" +
    "\tdirectory for capitalization errors, \n" +
    "\tusing the repository file 'classnames'";
  private static void usage() {
    System.err.println(usage);
    System.exit(1);
  }
  public static void main(String[] args) {
    if(args.length < 1 || args.length > 2)
      usage();
    ClassScanner c = new ClassScanner();
    File old = new File(args[0]);
    if(old.exists()) {
      try {
        // Try to open an existing 
        // properties file:
        InputStream oldlist =
          new BufferedInputStream(
            new FileInputStream(old));
        c.classes.load(oldlist);
        oldlist.close();
      } catch(IOException e) {
        System.err.println("Could not open "
          + old + " for reading");
        System.exit(1);
      }
    }
    if(args.length == 1) {
      c.checkClassNames();
      c.checkIdentNames();
    }
    // Write the class names to a repository:
    if(args.length == 2) {
      if(!args[1].equals("-a"))
        usage();
      try {
        BufferedOutputStream out =
          new BufferedOutputStream(
            new FileOutputStream(args[0]));
        c.classes.save(out,
          "Classes found by ClassScanner.java");
        out.close();
      } catch(IOException e) {
        System.err.println(
          "Could not write " + args[0]);
        System.exit(1);
      }
    }
  }
}
 
class JavaFilter implements FilenameFilter {
  public boolean accept(File dir, String name) {
    // Strip path information:
    String f = new File(name).getName();
    return f.trim().endsWith(".java");
  }
} ///:~

The class MultiStringMap is a tool that allows you to map a group of strings onto each key entry. As in the previous example, it uses a Hashtable (this time through inheritance), with each key being a single string that’s mapped onto a Vector value. The add( ) method checks to see if the key is already in the Hashtable; if not, it puts the key there with a new, empty Vector, and then it appends the value to that key’s Vector. The getVector( ) method produces the Vector for a particular key (printing an error and exiting if the key isn’t present), and printValues( ), which is primarily useful for debugging, prints out all the values, Vector by Vector.
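For comparison, the following is a minimal sketch of the same idea (one key mapped to many strings) written with Java 5+ generics and Java 8’s computeIfAbsent( ). It is not the book’s code. The class name MultiMap is hypothetical, the class wraps a map instead of inheriting from Hashtable, and get( ) returns an empty list for an unknown key rather than exiting the program.

```java
import java.util.*;

// Hypothetical generic counterpart to MultiStringMap (not the book's code).
class MultiMap {
  // LinkedHashMap keeps keys in insertion order, unlike Hashtable.
  private final Map<String, List<String>> map = new LinkedHashMap<>();

  // Create the list the first time a key is seen, then append the value.
  public void add(String key, String value) {
    map.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
  }

  // Unknown keys yield an empty list instead of ending the program.
  public List<String> get(String key) {
    return map.getOrDefault(key, Collections.emptyList());
  }

  public Set<String> keys() {
    return map.keySet();
  }

  public static void main(String[] args) {
    MultiMap m = new MultiMap();
    m.add("Shape.java", "Shape");
    m.add("Shape.java", "Circle");
    m.add("Main.java", "Main");
    System.out.println(m.keys());            // [Shape.java, Main.java]
    System.out.println(m.get("Shape.java")); // [Shape, Circle]
    System.out.println(m.get("Other.java")); // []
  }
}
```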

To keep life simple, the class names from the standard Java libraries are all put into a Properties object (from java.util). Remember that a Properties object is a Hashtable that holds only String objects for both the key and value entries. However, it can be saved to disk and restored from disk in one method call, so it’s ideal for the repository of names. Actually, we need only a list of names, but a Hashtable can’t accept null for either its key or its value entry, so the same string is used for both the key and the value.
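The key-equals-value trick can be sketched as follows, assuming Java 8 or later. The sketch uses store( ) and load( ) with a StringWriter and a StringReader in place of the repository file and the book’s save( ) call. The class name NameRepository is hypothetical.

```java
import java.io.*;
import java.util.*;

// Hypothetical sketch: a Properties object used as a set of names.
class NameRepository {
  // Store each name as both key and value, then write the list out.
  static String save(String... names) {
    Properties p = new Properties();
    for (String n : names)
      p.setProperty(n, n);
    StringWriter out = new StringWriter();
    try {
      p.store(out, "Classes found by ClassScanner.java");
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return out.toString();
  }

  // Read a list written by save() back into a new Properties object.
  static Properties load(String text) {
    Properties p = new Properties();
    try {
      p.load(new StringReader(text));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return p;
  }

  public static void main(String[] args) {
    String text = save("Vector", "Hashtable");
    // A "#" comment line, a "#" date line, then name=name entries
    // in no particular order.
    System.out.print(text);
    Properties back = load(text);
    System.out.println(back.getProperty("Vector")); // Vector
  }
}
```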

For the classes and identifiers that are discovered in the files in a particular directory, two MultiStringMaps are used: classMap and identMap. Also, when the program starts up it loads the standard class name repository into the Properties object called classes, and when a new class name is found in the local directory, it is added to classes as well as to classMap. This way, classMap can be used to step through all the classes in the local directory, and classes can be used to see whether a token is a class name. checkIdentNames( ) skips any identifier found in classes, so class names aren’t reported as badly capitalized identifiers.

The default constructor for ClassScanner creates a list of file names (using the JavaFilter implementation of FilenameFilter, as described in Chapter 10). Then it calls scanListing( ) for each file name.
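A FilenameFilter such as JavaFilter is simply passed to File.list( ). The sketch below assumes Java 8, so the filter can be written as a lambda. It creates a throwaway temporary directory with made-up file names to show the effect. ListJavaFiles and javaFiles( ) are hypothetical names.

```java
import java.io.*;
import java.nio.file.*;
import java.util.*;

// Hypothetical sketch of the JavaFilter idea using a lambda.
class ListJavaFiles {
  // Accept only names ending in ".java".
  static final FilenameFilter JAVA_FILES =
    (dir, name) -> name.endsWith(".java");

  static String[] javaFiles(File dir) {
    String[] names = dir.list(JAVA_FILES);
    if (names == null)        // dir is missing or unreadable
      return new String[0];
    Arrays.sort(names);       // list() promises no particular order
    return names;
  }

  public static void main(String[] args) throws IOException {
    Path dir = Files.createTempDirectory("scan");
    for (String f : new String[] {"B.java", "notes.txt", "A.java"})
      Files.createFile(dir.resolve(f));
    System.out.println(Arrays.toString(javaFiles(dir.toFile())));
    // [A.java, B.java]
  }
}
```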

Inside scanListing( ), the source code file is opened and turned into a StreamTokenizer. According to the documentation, passing true to slashStarComments( ) and slashSlashComments( ) is supposed to strip those comments out, but this seems to be a bit flawed (it doesn’t quite work in Java 1.0). Instead, those lines are commented out and the comments are extracted by another method. To do this, the ‘/’ must be captured as an ordinary character rather than letting the StreamTokenizer absorb it as part of a comment, and the ordinaryChar( ) method tells the StreamTokenizer to do this. The same is done for dots (‘.’), since we want method calls pulled apart into individual identifiers. However, the underscore, which StreamTokenizer ordinarily treats as an individual character, should be left as part of identifiers, since it appears in such static final values as TT_EOF, used in this very program. The wordChars( ) method takes a range of characters to add to those that are kept inside a token being parsed as a word. Finally, when parsing one-line comments or discarding a line, we need to know when an end of line occurs, so calling eolIsSignificant(true) makes each end of line show up as a token rather than being absorbed by the StreamTokenizer.
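To see what these settings do, the following sketch applies the same four calls to a single line of Java held in a string and lists the tokens that come back. TokenDemo and tokens( ) are hypothetical names. Word tokens are shown as their text, an end of line as EOL, and ordinary characters as themselves.

```java
import java.io.*;
import java.util.*;

// Hypothetical demo of the StreamTokenizer settings used in scanListing().
class TokenDemo {
  static List<String> tokens(String src) {
    StreamTokenizer in = new StreamTokenizer(new StringReader(src));
    in.ordinaryChar('/');       // return '/' as a token, not a comment
    in.ordinaryChar('.');       // split a.b into a, '.', b
    in.wordChars('_', '_');     // keep TT_EOF together as one word
    in.eolIsSignificant(true);  // report line ends as TT_EOL
    List<String> out = new ArrayList<>();
    try {
      while (in.nextToken() != StreamTokenizer.TT_EOF) {
        switch (in.ttype) {
          case StreamTokenizer.TT_WORD:   out.add(in.sval); break;
          case StreamTokenizer.TT_EOL:    out.add("EOL"); break;
          case StreamTokenizer.TT_NUMBER: out.add("NUM"); break;
          // Ordinary (or quote) character: show the character itself.
          default: out.add(String.valueOf((char) in.ttype));
        }
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return out;
  }

  public static void main(String[] args) {
    System.out.println(tokens("in.ttype = TT_EOF; // done\n"));
    // [in, ., ttype, =, TT_EOF, ;, /, /, done, EOL]
  }
}
```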

The rest of scanListing( ) reads and reacts to tokens until the end of the file, signified when nextToken( ) returns the final static value StreamTokenizer.TT_EOF. If the token is a ‘/’, it is potentially a comment, so eatComments( ) is called to deal with it. The only other situation we’re interested in here is if it’s a word, of which there are some special cases.

If the word is class or interface, then the next word token represents a class or interface name, and it is put into classes and classMap. If the word is import or package, then we don’t want the rest of the line. Anything else must be an identifier (which we’re interested in) or a keyword (which we’re not, but keywords are all lowercase anyway, so it won’t spoil things to put them in). These are added to identMap.
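The class and interface rule can be tried in isolation with a sketch like the one below. The names FindTypeNames and typeNames( ) are hypothetical, and comments and import lines are ignored for brevity. The sketch also shows one way a false report can arise: a class literal such as Foo.class triggers the same rule, so whatever word follows it is taken as a class name.

```java
import java.io.*;
import java.util.*;

// Hypothetical sketch of scanListing()'s "class"/"interface" handling.
class FindTypeNames {
  static List<String> typeNames(String src) {
    StreamTokenizer in = new StreamTokenizer(new StringReader(src));
    in.ordinaryChar('/');
    in.ordinaryChar('.');
    in.wordChars('_', '_');
    List<String> names = new ArrayList<>();
    try {
      while (in.nextToken() != StreamTokenizer.TT_EOF) {
        if (in.ttype == StreamTokenizer.TT_WORD
            && (in.sval.equals("class") || in.sval.equals("interface"))) {
          // Skip non-word tokens; the next word is taken as the name.
          while (in.nextToken() != StreamTokenizer.TT_EOF
                 && in.ttype != StreamTokenizer.TT_WORD)
            ;
          if (in.ttype == StreamTokenizer.TT_WORD)
            names.add(in.sval);
        }
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return names;
  }

  public static void main(String[] args) {
    System.out.println(typeNames(
      "public class Shape { }\ninterface drawable { }"));
    // [Shape, drawable]
    System.out.println(typeNames("Object o = Foo.class; int count;"));
    // [int]  (a false positive caused by the class literal)
  }
}
```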

The discardLine( ) method is a simple tool that throws away tokens until it reaches the end of a line. Note that any time you get a new token, you must check for the end of the file.

The eatComments( ) method is called whenever a forward slash is encountered in the main parsing loop. However, that doesn’t necessarily mean a comment has been found, so the next token must be extracted to see if it’s another forward slash (in which case the line is discarded) or an asterisk (in which case tokens are discarded up to the closing */). But if it’s neither of those, it means the token you’ve just pulled out is needed back in the main parsing loop! Fortunately, the pushBack( ) method allows you to “push back” the current token onto the input stream, so that when the main parsing loop calls nextToken( ) it will get the one you just pushed back.
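The peek-and-push-back logic can be sketched on its own as follows, assuming the same ordinaryChar('/') and eolIsSignificant(true) settings. StripComments and words( ) are hypothetical names. As in the approach described above, a ‘/’ that doesn’t start a comment hands the following token back to the main loop. The block-comment loop also pushes back the token after a ‘*’ when it isn’t ‘/’, so a comment ending in **/ is still recognized.

```java
import java.io.*;
import java.util.*;

// Hypothetical sketch of the eatComments() approach using pushBack().
class StripComments {
  static List<String> words(String src) {
    StreamTokenizer in = new StreamTokenizer(new StringReader(src));
    in.ordinaryChar('/');       // see '/' instead of losing the line
    in.eolIsSignificant(true);  // needed to find the end of "//" comments
    List<String> out = new ArrayList<>();
    try {
      while (in.nextToken() != StreamTokenizer.TT_EOF) {
        if (in.ttype == '/') {
          in.nextToken();
          if (in.ttype == '/') {
            // "//" comment: discard tokens up to the end of the line.
            while (in.nextToken() != StreamTokenizer.TT_EOF
                   && in.ttype != StreamTokenizer.TT_EOL)
              ;
          } else if (in.ttype == '*') {
            // "/*" comment: discard tokens up to "*/".
            while (in.nextToken() != StreamTokenizer.TT_EOF) {
              if (in.ttype == '*') {
                if (in.nextToken() == '/'
                    || in.ttype == StreamTokenizer.TT_EOF)
                  break;
                in.pushBack(); // re-examine it (it may be another '*')
              }
            }
          } else {
            // Not a comment: hand the token back to the main loop.
            in.pushBack();
          }
        } else if (in.ttype == StreamTokenizer.TT_WORD) {
          out.add(in.sval);
        }
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return out;
  }

  public static void main(String[] args) {
    System.out.println(words("a / b // gone\nc /* x */ d"));
    // [a, b, c, d]
  }
}
```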

For convenience, the classNames( ) method produces an array of all the names in the classes collection. This method is not used in the program but is helpful for debugging.
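On Java 6 or later, the same array can be produced with Properties.stringPropertyNames( ), as in this sketch. ClassNamesDemo is a hypothetical name, and the sort is added only because Hashtable keys come back in no particular order.

```java
import java.util.*;

// Hypothetical alternative to classNames() using stringPropertyNames().
class ClassNamesDemo {
  static String[] classNames(Properties classes) {
    String[] result =
      classes.stringPropertyNames().toArray(new String[0]);
    Arrays.sort(result); // only for predictable output
    return result;
  }

  public static void main(String[] args) {
    Properties classes = new Properties();
    classes.setProperty("Vector", "Vector");
    classes.setProperty("Hashtable", "Hashtable");
    System.out.println(Arrays.toString(classNames(classes)));
    // [Hashtable, Vector]
  }
}
```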

The next two methods are the ones in which the actual checking takes place. In checkClassNames( ), the class names are extracted from classMap (which, remember, contains only the names in this directory, organized by file name so the file name can be printed along with the errant class name). This is accomplished by pulling out each associated Vector and stepping through it, checking whether the first character is lowercase. If so, the appropriate error message is printed.
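The class-name rule itself is a single character test. The following sketch isolates it. ClassNameCheck and badClassName( ) are hypothetical names, and the file and class names in main( ) are made up.

```java
// Hypothetical sketch of the test applied in checkClassNames().
class ClassNameCheck {
  // A class name is flagged when its first character is lowercase.
  static boolean badClassName(String className) {
    return Character.isLowerCase(className.charAt(0));
  }

  public static void main(String[] args) {
    String[][] found = { {"Shape.java", "Shape"}, {"Shape.java", "helper"} };
    for (String[] f : found)
      if (badClassName(f[1]))
        System.out.println("class capitalization error, file: "
          + f[0] + ", class: " + f[1]);
    // class capitalization error, file: Shape.java, class: helper
  }
}
```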

In checkIdentNames( ), a similar approach is taken: each identifier name is extracted from identMap. If the name is not in the classes list, it’s assumed to be an identifier or keyword. A special case is checked: if the identifier is 3 or more characters long and contains no lowercase letters (that is, it equals its own toUpperCase( ) form), it is ignored, because it’s probably a static final value such as TT_EOF. Of course, this is not a perfect algorithm, but it assumes that you’ll eventually notice any all-uppercase identifiers that are out of place. Any remaining identifier whose first character is uppercase is reported as a capitalization error.
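The per-identifier decision can be sketched as a single method. The sketch uses a Set<String> of class names in place of the classes Properties object, and IdentCheck and badIdent( ) are hypothetical names. Note the consequence of the length test: a two-letter constant such as PI is not skipped, so it gets reported.

```java
import java.util.*;

// Hypothetical sketch of the checks made in checkIdentNames().
class IdentCheck {
  static boolean badIdent(String id, Set<String> classNames) {
    if (classNames.contains(id))
      return false;                      // a known class name
    if (id.length() >= 3 && id.equals(id.toUpperCase()))
      return false;                      // probably a static final value
    return Character.isUpperCase(id.charAt(0));
  }

  public static void main(String[] args) {
    Set<String> classes = new HashSet<>(Arrays.asList("Vector"));
    for (String id : new String[] {"Vector", "TT_EOF", "count", "Count", "PI"})
      System.out.println(id + " -> " + badIdent(id, classes));
    // Vector -> false, TT_EOF -> false, count -> false,
    // Count -> true, PI -> true (one per line)
  }
}
```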

Rather than reporting every occurrence of an identifier that starts with an uppercase character, this method keeps track of which ones have already been reported in a Vector called reportSet. This treats the Vector as a “set” that tells you whether an item is already present. The item is produced by concatenating the file name and identifier, so each identifier is reported at most once per file. If the element isn’t in the set, it’s added and then the report is made.
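A HashSet expresses the same bookkeeping more directly, and its lookups don’t scan the whole collection the way Vector.indexOf( ) does. The following sketch is an alternative, not the book’s code. ReportOnce and firstTime( ) are hypothetical names.

```java
import java.util.*;

// Hypothetical sketch: "report each identifier once per file" with a HashSet.
class ReportOnce {
  private final Set<String> reported = new HashSet<>();

  // Set.add() returns false when the element is already present,
  // so this is true only the first time a (file, id) pair is seen.
  // The ':' separator is an added precaution, not from the original.
  boolean firstTime(String file, String id) {
    return reported.add(file + ":" + id);
  }

  public static void main(String[] args) {
    ReportOnce r = new ReportOnce();
    System.out.println(r.firstTime("A.java", "Count")); // true
    System.out.println(r.firstTime("A.java", "Count")); // false
    System.out.println(r.firstTime("B.java", "Count")); // true
  }
}
```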

The rest of the listing consists of main( ), which busies itself with handling the command-line arguments and figuring out whether you’re building a repository of class names from the standard Java library or checking the validity of code you’ve written. In both cases it makes a ClassScanner object.

Whether you’re building a repository or using one, you must try to open the existing repository. By making a File object and testing for existence, you can decide whether to open the file and load( ) the Properties list classes inside ClassScanner. (The classes from the repository add to, rather than overwrite, the classes found by the ClassScanner constructor.) If you provide only one command-line argument, it means that you want to perform a check of the class names and identifier names, but if you provide two arguments (the second being “-a”), you’re building a class name repository. In this case, an output file is opened and the method Properties.save( ) is used to write the list into the file, along with a string that is written at the top of the file as a header comment.
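The claim that the repository adds to, rather than overwrites, the scanned names follows from how Properties.load( ) behaves. It puts each entry it reads into the existing table without clearing it first. The following is a minimal sketch, with a string standing in for the repository file. MergeDemo and merged( ) are hypothetical names.

```java
import java.io.*;
import java.util.*;

// Hypothetical sketch: load() adds entries to a Properties object
// without removing the ones already there.
class MergeDemo {
  static Properties merged(Properties existing, String repository) {
    try {
      existing.load(new StringReader(repository));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return existing;
  }

  public static void main(String[] args) {
    Properties classes = new Properties();
    classes.setProperty("Shape", "Shape"); // as if found by scanning
    merged(classes, "Vector=Vector\nHashtable=Hashtable\n");
    System.out.println(classes.size()); // 3
  }
}
```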

CodeGuru Staff
