Text processing

Bruce Eckel’s Thinking in Java

Extracting code listings

You’ve no doubt noticed that each complete code listing (not code fragment) in this book begins and ends with the special comment tag marks ‘//:’ and ‘///:~’. This meta-information is included so that the code can be automatically extracted from the book into compilable source-code files. In my previous book, I had a system that allowed me to automatically incorporate tested code files into the book. In this book, however, I discovered that it was often easier to paste the code into the book once it was initially tested and, since code is hard to get right the first time, to perform edits to the code within the book. But how do you then extract the code and test it? This program is the answer, and it could come in handy when you set out to solve a text processing problem. It also demonstrates many of the String class features.

I first save the entire book in ASCII text format into a separate file. The CodePackager program has two modes (which you can see described in usageString): if you use the -p flag, it expects to see an input file containing the ASCII text from the book. It will go through this file and use the comment tag marks to extract the code, and it uses the file name on the first line of each listing to determine the name of the file. In addition, it looks for the package statement in case it needs to put the file into a special directory (chosen via the path indicated by the package statement).
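The first-line rule can be sketched in isolation. The class below (ListingHeader is a hypothetical name, not part of CodePackager) skips the //: marker, trims the rest, and stops at the first space, which is what the SourceCodeFile constructor in the listing does.

```java
// Hypothetical helper that pulls the file name out of a listing's
// first line the same way SourceCodeFile's first constructor does.
class ListingHeader {
  static final String START_MARKER = "//:";

  // Skip the marker, trim, and stop at the first space; anything
  // after that space is treated as commentary.
  static String fileName(String firstLine) {
    if (!firstLine.startsWith(START_MARKER))
      throw new IllegalArgumentException(
          "not a listing header: " + firstLine);
    String name = firstLine.substring(START_MARKER.length()).trim();
    int space = name.indexOf(' ');
    return (space == -1) ? name : name.substring(0, space);
  }

  public static void main(String[] args) {
    System.out.println(fileName("//: CodePackager.java"));
    System.out.println(fileName("//: Hello.java (first version)"));
  }
}
```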

But that’s not all. It also watches for the change in chapters by keeping track of the package names. The packages for each chapter begin with c02, c03, c04, etc. to indicate the chapter where they belong (except for those beginning with com, which are ignored for the purpose of keeping track of chapters). So as long as the first listing in each chapter contains a package statement with the chapter number, the CodePackager program can tell when the chapter has changed and put all the subsequent files in the new chapter subdirectory.
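Here’s a minimal sketch of the chapter-tracking rule, assuming package names shaped like c03 or c03.gui. ChapterTracker is a hypothetical class that only isolates the logic found in the SourceCodeFile constructor; like CodePackager, it skips any package name that merely starts with the letters com.

```java
// Hypothetical class isolating CodePackager's chapter-tracking rule.
class ChapterTracker {
  // Same starting value as SourceCodeFile.chapter.
  private String chapter = "c02";

  // Updates the chapter from a package name such as "c10.util" and
  // returns the chapter now in effect. As in CodePackager, any name
  // starting with "com" leaves the chapter unchanged.
  String observePackage(String packageName) {
    if (!packageName.startsWith("com")) {
      int firstDot = packageName.indexOf('.');
      chapter = (firstDot == -1)
          ? packageName
          : packageName.substring(0, firstDot);
    }
    return chapter;
  }

  String current() { return chapter; }

  public static void main(String[] args) {
    ChapterTracker t = new ChapterTracker();
    System.out.println(t.observePackage("c03"));                 // c03
    System.out.println(t.observePackage("c03.gui"));             // c03
    System.out.println(t.observePackage("com.bruceeckel.util")); // c03
    System.out.println(t.observePackage("c04.tools"));           // c04
  }
}
```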

As each file is extracted, it is placed into a SourceCodeFile object that is then placed into a collection. (This process will be described more thoroughly later.) These SourceCodeFile objects could simply be stored in files, but that brings us to the second use for this project. If you invoke CodePackager without the -p flag, it expects a “packed” file as input, which it will then extract into separate files. So the -p flag means that the extracted files will be “packed” into this single file.

Why bother with the packed file? Because different computer platforms have different ways of storing text information in files. A big issue is the end-of-line character or characters, but other issues can also exist. However, Java has a special type of IO stream, the DataOutputStream, which promises that, regardless of what machine the data is coming from, the storage of that data will be in a form that can be correctly retrieved by any other machine by using a DataInputStream. That is, Java handles all of the platform-specific details, which is a large part of the promise of Java. So the -p flag stores everything into a single file in a universal format. You download this file and the Java program from the Web, and when you run CodePackager on this file without the -p flag, the files will all be extracted to appropriate places on your system. (You can specify an alternate subdirectory; otherwise the subdirectories will just be created in the current directory.) To ensure that no system-specific formats remain, File objects are used everywhere a path or a file is described. In addition, there’s a sanity check: an empty file is placed in each subdirectory, and the name of that file indicates how many files you should find in that subdirectory.
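To illustrate the portability promise in isolation, here is a small round trip through DataOutputStream and DataInputStream using an in-memory buffer. It relies on writeUTF( ) and readUTF( ), which store each string in a length-prefixed form that the reading side decodes exactly. Note that this illustrates the stream pair rather than CodePackager itself, which writes with writeBytes( ) and reads the packed file back line by line; PortableRoundTrip is a hypothetical name.

```java
import java.io.*;

// Hypothetical demo: strings written by DataOutputStream come back
// unchanged through DataInputStream, whatever platform did the writing.
class PortableRoundTrip {
  static byte[] pack(String... items) {
    ByteArrayOutputStream buf = new ByteArrayOutputStream();
    try (DataOutputStream out = new DataOutputStream(buf)) {
      out.writeInt(items.length); // count first, so the reader knows when to stop
      for (String s : items)
        out.writeUTF(s);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return buf.toByteArray();
  }

  static String[] unpack(byte[] data) {
    try (DataInputStream in =
           new DataInputStream(new ByteArrayInputStream(data))) {
      String[] items = new String[in.readInt()];
      for (int i = 0; i < items.length; i++)
        items[i] = in.readUTF();
      return items;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    String[] back = unpack(pack("c10", "Worm.java", "line 1\nline 2"));
    System.out.println(back.length + " strings recovered");
    System.out.println(back[2]);
  }
}
```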

Here is the code, which will be described in detail at the end of the listing:

//: CodePackager.java
// "Packs" and "unpacks" the code in "Thinking 
// in Java" for cross-platform distribution.
/* Commented so CodePackager sees it and starts
   a new chapter directory, but so you don't
   have to worry about the directory where this
   program lives:
package c17;
*/
import java.util.*;
import java.io.*;
 
class Pr {
  static void error(String e) {
    System.err.println("ERROR: " + e);
    System.exit(1);
  }
}
 
class IO {
  static BufferedReader disOpen(File f) {
    BufferedReader in = null;
    try {
      in = new BufferedReader(
        new FileReader(f));
    } catch(IOException e) {
      Pr.error("could not open " + f);
    }
    return in;
  }
  static BufferedReader disOpen(String fname) {
    return disOpen(new File(fname));
  }
  static DataOutputStream dosOpen(File f) {
    DataOutputStream in = null;
    try {
      in = new DataOutputStream(
        new BufferedOutputStream(
          new FileOutputStream(f)));
    } catch(IOException e) {
      Pr.error("could not open " + f);
    }
    return in;
  }
  static DataOutputStream dosOpen(String fname) {
    return dosOpen(new File(fname));
  }
  static PrintWriter psOpen(File f) {
    PrintWriter in = null;
    try {
      in = new PrintWriter(
        new BufferedWriter(
          new FileWriter(f)));
    } catch(IOException e) {
      Pr.error("could not open " + f);
    }
    return in;
  }
  static PrintWriter psOpen(String fname) {
    return psOpen(new File(fname));
  }
  static void close(Writer os) {
    try {
      os.close();
    } catch(IOException e) {
      Pr.error("closing " + os);
    }
  }
  static void close(DataOutputStream os) {
    try {
      os.close();
    } catch(IOException e) {
      Pr.error("closing " + os);
    }
  }
  static void close(Reader os) {
    try {
      os.close();
    } catch(IOException e) {
      Pr.error("closing " + os);
    }
  }
}
 
class SourceCodeFile {
  public static final String
    startMarker = "//:", // Start of source file
    endMarker = "} ///:~", // End of source
    endMarker2 = "}; ///:~", // C++ file end
    beginContinue = "} ///:Continued",
    endContinue = "///:Continuing",
    packMarker = "###", // Packed file header tag
    eol = // Line separator on current system
      System.getProperty("line.separator"),
    filesep = // System's file path separator
      System.getProperty("file.separator");
  public static String copyright = "";
  static {
    try {
      BufferedReader cr =
        new BufferedReader(
          new FileReader("Copyright.txt"));
      String crin;
      while((crin = cr.readLine()) != null)
        copyright += crin + "\n";
      cr.close();
    } catch(Exception e) {
      copyright = "";
    }
  }
  private String filename, dirname,
    contents = new String();
  private static String chapter = "c02";
  // The file name separator from the old system:
  public static String oldsep;
  public String toString() {
    return dirname + filesep + filename;
  }
  // Constructor for parsing from document file:
  public SourceCodeFile(String firstLine,
      BufferedReader in) {
    dirname = chapter;
    // Skip past marker:
    filename = firstLine.substring(
        startMarker.length()).trim();
    // Find space that terminates file name:
    if(filename.indexOf(' ') != -1)
      filename = filename.substring(
          0, filename.indexOf(' '));
    System.out.println("found: " + filename);
    contents = firstLine + eol;
    if(copyright.length() != 0)
      contents += copyright + eol;
    String s;
    boolean foundEndMarker = false;
    try {
      while((s = in.readLine()) != null) {
        if(s.startsWith(startMarker))
          Pr.error("No end of file marker for " +
            filename);
        // For this program, no spaces before 
        // the "package" keyword are allowed
        // in the input source code:
        else if(s.startsWith("package")) {
          // Extract package name:
          String pdir = s.substring(
            s.indexOf(' ')).trim();
          pdir = pdir.substring(
            0, pdir.indexOf(';')).trim();
          // Capture the chapter from the package
          // ignoring the 'com' subdirectories:
          if(!pdir.startsWith("com")) {
            int firstDot = pdir.indexOf('.');
            if(firstDot != -1)
              chapter =
                pdir.substring(0,firstDot);
            else
              chapter = pdir;
          }
          // Convert package name to path name:
          pdir = pdir.replace(
            '.', filesep.charAt(0));
          System.out.println("package " + pdir);
          dirname = pdir;
        }
        contents += s + eol;
        // Move past continuations:
        if(s.startsWith(beginContinue))
          while((s = in.readLine()) != null)
            if(s.startsWith(endContinue)) {
              contents += s + eol;
              break;
            }
        // Watch for end of code listing:
        if(s.startsWith(endMarker) ||
           s.startsWith(endMarker2)) {
          foundEndMarker = true;
          break;
        }
      }
      if(!foundEndMarker)
        Pr.error(
          "End marker not found before EOF");
      System.out.println("Chapter: " + chapter);
    } catch(IOException e) {
      Pr.error("Error reading line");
    }
  }
  // For recovering from a packed file:
  public SourceCodeFile(BufferedReader pFile) {
    try {
      String s = pFile.readLine();
      if(s == null) return;
      if(!s.startsWith(packMarker))
        Pr.error("Can't find " + packMarker
          + " in " + s);
      s = s.substring(
        packMarker.length()).trim();
      dirname = s.substring(0, s.indexOf('#'));
      filename = s.substring(s.indexOf('#') + 1);
      dirname = dirname.replace(
        oldsep.charAt(0), filesep.charAt(0));
      filename = filename.replace(
        oldsep.charAt(0), filesep.charAt(0));
      System.out.println("listing: " + dirname
        + filesep + filename);
      while((s = pFile.readLine()) != null) {
        // Watch for end of code listing:
        if(s.startsWith(endMarker) ||
           s.startsWith(endMarker2)) {
          contents += s;
          break;
        }
        contents += s + eol;
      }
    } catch(IOException e) {
      System.err.println("Error reading line");
    }
  }
  public boolean hasFile() {
    return filename != null;
  }
  public String directory() { return dirname; }
  public String filename() { return filename; }
  public String contents() { return contents; }
  // To write to a packed file:
  public void writePacked(DataOutputStream out) {
    try {
      out.writeBytes(
        packMarker + dirname + "#"
        + filename + eol);
      out.writeBytes(contents);
    } catch(IOException e) {
      Pr.error("writing " + dirname +
        filesep + filename);
    }
  }
  // To generate the actual file:
  public void writeFile(String rootpath) {
    File path = new File(rootpath, dirname);
    path.mkdirs();
    PrintWriter p =
      IO.psOpen(new File(path, filename));
    p.print(contents);
    IO.close(p);
  }
}
 
class DirMap {
  private Hashtable t = new Hashtable();
  private String rootpath;
  DirMap() {
    rootpath = System.getProperty("user.dir");
  }
  DirMap(String alternateDir) {
    rootpath = alternateDir;
  }
  public void add(SourceCodeFile f){
    String path = f.directory();
    if(!t.containsKey(path))
      t.put(path, new Vector());
    ((Vector)t.get(path)).addElement(f);
  }
  public void writePackedFile(String fname) {
    DataOutputStream packed = IO.dosOpen(fname);
    try {
      packed.writeBytes("###Old Separator:" +
        SourceCodeFile.filesep + "###\n");
    } catch(IOException e) {
      Pr.error("Writing separator to " + fname);
    }
    Enumeration e = t.keys();
    while(e.hasMoreElements()) {
      String dir = (String)e.nextElement();
      System.out.println(
        "Writing directory " + dir);
      Vector v = (Vector)t.get(dir);
      for(int i = 0; i < v.size(); i++) {
        SourceCodeFile f =
          (SourceCodeFile)v.elementAt(i);
        f.writePacked(packed);
      }
    }
    IO.close(packed);
  }
  // Write all the files in their directories:
  public void write() {
    Enumeration e = t.keys();
    while(e.hasMoreElements()) {
      String dir = (String)e.nextElement();
      Vector v = (Vector)t.get(dir);
      for(int i = 0; i < v.size(); i++) {
        SourceCodeFile f =
          (SourceCodeFile)v.elementAt(i);
        f.writeFile(rootpath);
      }
      // Add file indicating file quantity
      // written to this directory as a check:
      IO.close(IO.dosOpen(
        new File(new File(rootpath, dir),
          Integer.toString(v.size())+".files")));
    }
  }
}
 
public class CodePackager {
  private static final String usageString =
  "usage: java CodePackager packedFileName" +
  "\nExtracts source code files from packed \n" +
  "version of Tjava.doc sources into " +
  "directories off current directory\n" +
  "java CodePackager packedFileName newDir\n" +
  "Extracts into directories off newDir\n" +
  "java CodePackager -p source.txt packedFile" +
  "\nCreates packed version of source files" +
  "\nfrom text version of Tjava.doc";
  private static void usage() {
    System.err.println(usageString);
    System.exit(1);
  }
  public static void main(String[] args) {
    if(args.length == 0) usage();
    if(args[0].equals("-p")) {
      if(args.length != 3)
        usage();
      createPackedFile(args);
    }
    else {
      if(args.length > 2)
        usage();
      extractPackedFile(args);
    }
  }
  private static String currentLine;
  private static BufferedReader in;
  private static DirMap dm;
  private static void
  createPackedFile(String[] args) {
    dm = new DirMap();
    in = IO.disOpen(args[1]);
    try {
      while((currentLine = in.readLine())
          != null) {
        if(currentLine.startsWith(
            SourceCodeFile.startMarker)) {
          dm.add(new SourceCodeFile(
                   currentLine, in));
        }
        else if(currentLine.startsWith(
            SourceCodeFile.endMarker))
          Pr.error("file has no start marker");
        // Else ignore the input line
      }
    } catch(IOException e) {
      Pr.error("Error reading " + args[1]);
    }
    IO.close(in);
    dm.writePackedFile(args[2]);
  }
  private static void
  extractPackedFile(String[] args) {
    if(args.length == 2) // Alternate directory
      dm = new DirMap(args[1]);
    else // Current directory
      dm = new DirMap();
    in = IO.disOpen(args[0]);
    String s = null;
    try {
       s = in.readLine();
    } catch(IOException e) {
      Pr.error("Cannot read from " + in);
    }
    // Capture the separator used in the system
    // that packed the file:
    if(s != null &&
       s.indexOf("###Old Separator:") != -1) {
      String oldsep = s.substring(
        "###Old Separator:".length());
      oldsep = oldsep.substring(
        0, oldsep.indexOf('#'));
      SourceCodeFile.oldsep = oldsep;
    }
    SourceCodeFile sf = new SourceCodeFile(in);
    while(sf.hasFile()) {
      dm.add(sf);
      sf = new SourceCodeFile(in);
    }
    dm.write();
  }
} ///:~ 

You’ll first notice the package statement that is commented out. Since this is the first program in the chapter, the package statement is necessary to tell CodePackager that the chapter has changed, but actually putting the program in a package would be a problem. When you create a package, you tie the resulting program to a particular directory structure, which is fine for most of the examples in this book. Here, however, the CodePackager program must be compiled and run from an arbitrary directory, so the package statement is commented out. It will still look like an ordinary package statement to CodePackager, though, since the program isn’t sophisticated enough to detect multi-line comments. (It has no need for such sophistication, a fact that comes in handy here.)
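The following sketch shows why the trick works: a purely line-based test, like the startsWith("package") check in SourceCodeFile, has no notion of comments. NaivePackageScan is a hypothetical class for illustration only.

```java
import java.util.*;

// Hypothetical class: applies the same line-based test as SourceCodeFile.
class NaivePackageScan {
  // Collects every line that starts with "package". Nothing here tracks
  // block comments, so a commented-out statement is reported too, while
  // an indented one is missed.
  static List<String> packageLines(List<String> lines) {
    List<String> found = new ArrayList<>();
    for (String s : lines)
      if (s.startsWith("package"))
        found.add(s);
    return found;
  }

  public static void main(String[] args) {
    List<String> listing = Arrays.asList(
        "//: CodePackager.java",
        "/* Commented so CodePackager sees it",
        "package c17;",
        "*/",
        "import java.util.*;");
    System.out.println(packageLines(listing)); // [package c17;]
  }
}
```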

The first two classes are support/utility classes designed to make the rest of the program more consistent to write and easier to read. The first, Pr, is similar to the ANSI C library function perror, since it prints an error message (but also exits the program). The second class, IO, encapsulates the opening and closing of files, a process that was shown in Chapter 10 to become verbose and annoying rapidly. In Chapter 10, the proposed solution created new classes, but here static method calls are used. Within those methods the appropriate exceptions are caught and dealt with, which makes the rest of the code much cleaner to read.
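For comparison, here is a sketch of the same idea written with try-with-resources (Java 7 and later), which closes each stream automatically. It is not the book’s IO class: QuietIO is a hypothetical name, and instead of calling System.exit( ) as Pr.error( ) does, it reports the problem and rethrows it as an UncheckedIOException, so callers still need no try/catch of their own.

```java
import java.io.*;
import java.util.*;

// Hypothetical helper in the spirit of the IO class, using
// try-with-resources so every stream is closed automatically.
class QuietIO {
  // Reads all lines of a file. An IOException is reported and
  // rethrown unchecked, so callers need no try/catch.
  static List<String> readLines(File f) {
    List<String> lines = new ArrayList<>();
    try (BufferedReader in = new BufferedReader(new FileReader(f))) {
      String s;
      while ((s = in.readLine()) != null)
        lines.add(s);
    } catch (IOException e) {
      System.err.println("ERROR: could not read " + f);
      throw new UncheckedIOException(e);
    }
    return lines;
  }

  // Creates or overwrites a file with the given text.
  static void writeText(File f, String text) {
    try (PrintWriter out = new PrintWriter(
           new BufferedWriter(new FileWriter(f)))) {
      out.print(text);
    } catch (IOException e) {
      System.err.println("ERROR: could not write " + f);
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) throws IOException {
    File tmp = File.createTempFile("quietio", ".txt");
    tmp.deleteOnExit();
    writeText(tmp, "first\nsecond\n");
    System.out.println(readLines(tmp)); // [first, second]
  }
}
```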

The first class that helps solve the problem is SourceCodeFile, which represents all the information (including the contents, file name, and directory) for one source code file in the book. It also contains a set of String constants representing the markers that start and end a file, a marker used inside the packed file, the current system’s end-of-line separator and file path separator (notice the use of System.getProperty( ) to get the local version), and a copyright notice. The static initializer reads the notice from the following file, Copyright.txt, and leaves it empty if the file can’t be read:

//////////////////////////////////////////////////
// Copyright (c) Bruce Eckel, 1998
// Source code file from the book "Thinking in Java"
// All rights reserved EXCEPT as allowed by the
// following statements: You may freely use this file
// for your own work (personal or commercial),
// including modifications and distribution in
// executable form only. Permission is granted to use
// this file in classroom situations, including its
// use in presentation materials, as long as the book
// "Thinking in Java" is cited as the source. 
// Except in classroom situations, you may not copy
// and distribute this code; instead, the sole
// distribution point is http://www.BruceEckel.com 
// (and official mirror sites) where it is
// freely available. You may not remove this
// copyright and notice. You may not distribute
// modified versions of the source code in this
// package. You may not use this file in printed
// media without the express permission of the
// author. Bruce Eckel makes no representation about
// the suitability of this software for any purpose.
// It is provided "as is" without express or implied
// warranty of any kind, including any implied
// warranty of merchantability, fitness for a
// particular purpose or non-infringement. The entire
// risk as to the quality and performance of the
// software is with you. Bruce Eckel and the
// publisher shall not be liable for any damages
// suffered by you or any third party as a result of
// using or distributing software. In no event will
// Bruce Eckel or the publisher be liable for any
// lost revenue, profit, or data, or for direct,
// indirect, special, consequential, incidental, or
// punitive damages, however caused and regardless of
// the theory of liability, arising out of the use of
// or inability to use software, even if Bruce Eckel
// and the publisher have been advised of the
// possibility of such damages. Should the software
// prove defective, you assume the cost of all
// necessary servicing, repair, or correction. If you
// think you've found an error, please email all
// modified files with clearly commented changes to:
// Bruce@EckelObjects.com. (please use the same
// address for non-code errors found in the book).
//////////////////////////////////////////////////

When files are extracted from a packed file, the file separator of the system that packed the file is also noted (it’s recorded on the first line of the packed file and stored in the static field oldsep), so it can be replaced with the correct one for the local system.
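Here’s a minimal sketch of that conversion, assuming (as CodePackager does) that both separators are single characters. SeparatorFix is a hypothetical class: oldSeparator( ) parses the first line that writePackedFile( ) produces, and toLocal( ) swaps separators with String.replace( ).

```java
import java.io.File;

// Hypothetical helper for the separator fix-up done when unpacking.
class SeparatorFix {
  static final String TAG = "###Old Separator:";

  // Pulls the packing system's separator out of a first line such as
  // "###Old Separator:/###".
  static String oldSeparator(String header) {
    int start = header.indexOf(TAG);
    if (start == -1)
      throw new IllegalArgumentException("No separator tag in " + header);
    String rest = header.substring(start + TAG.length());
    return rest.substring(0, rest.indexOf('#'));
  }

  // Replaces the old separator with this system's separator.
  // Like CodePackager, it assumes both are a single character.
  static String toLocal(String path, String oldSep) {
    return path.replace(oldSep.charAt(0), File.separatorChar);
  }

  public static void main(String[] args) {
    // A path packed on a system that uses backslash as its separator:
    String old = oldSeparator("###Old Separator:\\###");
    System.out.println(toLocal("c10\\gui", old));
  }
}
```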

The subdirectory name for the current chapter is kept in the static field chapter, which is initialized to c02. (You’ll notice that the listings in Chapter 2 don’t contain a package statement, so they depend on this default.) The only time that the chapter field changes is when a package statement is discovered in the current file.


Building a packed file

Once the file name is parsed and stored, the first line is placed into the contents String (which is used to hold the entire text of the source code listing), followed by the copyright notice if one was found. At this point, the rest of the lines are read and concatenated into the contents String. It’s not quite that simple, since certain situations require special handling. One case is error checking: if you run into a startMarker, it means that no end marker was placed at the end of the listing that’s currently being collected. This is an error condition that aborts the program.
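The collection loop and its two error checks can be sketched on their own. ListingCollector is a hypothetical class; for brevity it uses a StringBuilder and "\n" where SourceCodeFile concatenates Strings with the system’s eol, it throws exceptions rather than exiting, and it leaves out package handling and continuation markers.

```java
import java.io.*;

// Hypothetical sketch of SourceCodeFile's line-collecting loop.
class ListingCollector {
  static final String START_MARKER = "//:";
  static final String END_MARKER = "} ///:~";

  // Starting just after a "//:" line, accumulates lines up to and
  // including the end marker. Meeting another start marker, or running
  // out of input, means the end marker was missing.
  static String collect(String firstLine, BufferedReader in) {
    StringBuilder contents = new StringBuilder(firstLine).append('\n');
    try {
      String s;
      while ((s = in.readLine()) != null) {
        if (s.startsWith(START_MARKER))
          throw new IllegalStateException(
              "No end of file marker before: " + s);
        contents.append(s).append('\n');
        if (s.startsWith(END_MARKER))
          return contents.toString();
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    throw new IllegalStateException("End marker not found before EOF");
  }

  public static void main(String[] args) {
    String book = "class A {\n} ///:~\nMore book text.\n";
    System.out.print(collect("//: A.java",
        new BufferedReader(new StringReader(book))));
  }
}
```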

The second special case is the package keyword. Although Java is a free-form language, this program requires that the package keyword be at the beginning of the line. When the package keyword is seen, the package name is extracted by looking for the space at the beginning and the semicolon at the end. (Note that this could also have been performed in a single operation by using the overloaded substring( ) that takes both the starting and ending indexes.) Unless the package name begins with com, the part before the first dot also becomes the new chapter. Then the dots in the package name are replaced by the file separator, although an assumption is made here that the file separator is only one character long. This is probably true on all systems, but it’s a place to look if there are problems.
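Here’s the single-operation version that the note mentions, combined with the dot-to-separator replacement. PackageToPath is a hypothetical class; like CodePackager, it assumes the line holds a complete package statement ending in a semicolon and that the file separator is a single character.

```java
import java.io.File;

// Hypothetical helper showing the one-call substring() extraction.
class PackageToPath {
  // "package c10.util;" becomes "c10.util": the first space and the
  // semicolon bound the name, so a single substring() call suffices.
  static String packageName(String statement) {
    return statement.substring(
        statement.indexOf(' '), statement.indexOf(';')).trim();
  }

  // "c10.util" becomes "c10/util" on Unix-like systems, using the
  // local single-character file separator.
  static String toPath(String pkg) {
    return pkg.replace('.', File.separatorChar);
  }

  public static void main(String[] args) {
    System.out.println(toPath(packageName("package c10.util;")));
  }
}
```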

The default behavior is to concatenate each line to contents, along with the end-of-line string, until the endMarker (or endMarker2, the variant used for C++ listings) is discovered, which indicates that the constructor should terminate. If the end of the file is encountered before the endMarker is seen, that’s an error.


Extracting from a packed file

The second constructor is used to recover the source code files from a packed file. Here, the calling method doesn’t have to worry about skipping over intermediate text, because the packed file contains only the source-code files, placed end-to-end. All you need to hand to this constructor is the BufferedReader where the information is coming from, and the constructor takes it from there. There is some meta-information, however, at the beginning of each listing, and this is denoted by the packMarker. If the packMarker isn’t there, it means the caller is mistakenly trying to use this constructor where it isn’t appropriate.

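The rest of the constructor is quite simple. After splitting the header into the directory and file names and converting the old file separator to the local one, it reads each line and concatenates it to contents until the endMarker is found.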
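As a sketch of the header check, the hypothetical PackedHeader class below parses a line of the form ###directory#filename, which is what writePacked( ) emits, and rejects lines that lack the packMarker. Separator conversion is omitted here.

```java
// Hypothetical parser for one packed-file header line.
class PackedHeader {
  static final String PACK_MARKER = "###";
  final String dir;
  final String file;

  // Splits "###<dir>#<file>" at the first '#' after the marker.
  PackedHeader(String line) {
    if (!line.startsWith(PACK_MARKER))
      throw new IllegalArgumentException(
          "Can't find " + PACK_MARKER + " in " + line);
    String s = line.substring(PACK_MARKER.length()).trim();
    int hash = s.indexOf('#');
    if (hash == -1)
      throw new IllegalArgumentException("No '#' separator in " + line);
    dir = s.substring(0, hash);
    file = s.substring(hash + 1);
  }

  public static void main(String[] args) {
    PackedHeader h = new PackedHeader("###c10#Worm.java");
    System.out.println(h.dir + " / " + h.file);
  }
}
```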


Accessing and writing the listings

The next set of methods consists of simple accessors: directory( ), filename( ) (notice the method can have the same spelling and capitalization as the field), and contents( ), plus hasFile( ) to indicate whether this object contains a file. (The need for this will be seen later.)

The final two methods are concerned with writing this code listing into a file, either a packed file via writePacked( ) or a Java source file via writeFile( ). All writePacked( ) needs is the DataOutputStream, which was opened elsewhere and represents the file that’s being written. It puts the header information on the first line and then calls writeBytes( ) to write contents in a “universal” format. writeFile( ) creates the listing’s directory (along with any missing parent directories) under the root path it’s given and then writes contents there through a PrintWriter.
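One detail worth knowing about writeBytes( ): it writes only the low-order eight bits of each char, so the “universal” format is reliable for ASCII text such as the book’s listings, while any other character loses its high byte. The hypothetical WriteBytesDemo below builds a packed entry in memory the way writePacked( ) does (using "\n" in place of eol) so you can inspect the bytes.

```java
import java.io.*;

// Hypothetical demo of how writePacked() lays out one entry, and of what
// DataOutputStream.writeBytes() does with each char.
class WriteBytesDemo {
  // Header line "###dir#file", then the contents, as raw bytes.
  // A plain newline stands in for the eol string writePacked() uses.
  static byte[] packOne(String dir, String file, String contents) {
    ByteArrayOutputStream buf = new ByteArrayOutputStream();
    try (DataOutputStream out = new DataOutputStream(buf)) {
      out.writeBytes("###" + dir + "#" + file + "\n");
      out.writeBytes(contents);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return buf.toByteArray();
  }

  public static void main(String[] args) {
    String header = "###c10#A.java\n", body = "class A {}\n";
    byte[] packed = packOne("c10", "A.java", body);
    // ASCII text: exactly one byte per char.
    System.out.println(packed.length == (header + body).length());
    // The euro sign (code point 0x20AC) keeps only its low byte, 0xAC.
    byte[] euro = packOne("c10", "B.java", "\u20AC");
    System.out.println(Integer.toHexString(euro[euro.length - 1] & 0xFF));
  }
}
```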


Containing the entire collection of listings

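It’s convenient to organize the listings as subdirectories while the whole collection is being built in memory. DirMap does this with a Hashtable that maps each directory name to a Vector of the SourceCodeFile objects that belong in it. One reason for this organization is another sanity check: as each subdirectory of listings is created, an additional file is added whose name contains the number of files in that directory.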
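The grouping and the count file can be sketched with newer collections. This is a hypothetical stand-in for DirMap that uses a TreeMap of Lists instead of a Hashtable of Vectors (so directories come out in sorted order) and works with plain file names rather than SourceCodeFile objects.

```java
import java.util.*;

// Hypothetical stand-in for DirMap: groups file names by directory and
// computes the name of each directory's empty "count" file.
class DirGrouping {
  // Each pair is {directory, fileName}. TreeMap keeps directories sorted.
  static Map<String, List<String>> group(String[][] pairs) {
    Map<String, List<String>> byDir = new TreeMap<>();
    for (String[] p : pairs)
      byDir.computeIfAbsent(p[0], k -> new ArrayList<>()).add(p[1]);
    return byDir;
  }

  // DirMap.write() names the marker file after the count, e.g. "2.files".
  static String countFileName(List<String> files) {
    return files.size() + ".files";
  }

  public static void main(String[] args) {
    Map<String, List<String>> dirs = group(new String[][] {
        {"c03", "A.java"}, {"c02", "B.java"}, {"c03", "C.java"}});
    for (Map.Entry<String, List<String>> e : dirs.entrySet())
      System.out.println(e.getKey() + ": " + e.getValue()
          + " -> " + countFileName(e.getValue()));
  }
}
```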

There are two ways you can make a DirMap: the default constructor assumes that you want the directories to branch off the current one (taken from the user.dir system property), and the second constructor lets you specify an alternate absolute path for the starting directory.

Writing the Java source files to their directories in write( ) is almost identical to writePackedFile( ), since both methods simply call the appropriate method in SourceCodeFile. Here, however, the root path is passed into SourceCodeFile.writeFile( ), and when all the files in a directory have been written, the additional file with the name containing the number of files is also written.


The main program

The previously described classes are used within CodePackager. First you see the usage string that gets printed whenever the end user invokes the program incorrectly, along with the usage( ) method that prints it and exits the program. All main( ) does is determine whether you want to create a packed file or extract from one; then it ensures the arguments are correct and calls the appropriate method.

When a packed file is created, it’s assumed to be made in the current directory, so the DirMap is created using the default constructor. After the file is opened, each line is read and examined for particular conditions:

  1. If the line starts with the starting marker for a source code listing, a new SourceCodeFile object is created. The constructor reads in the rest of the source listing. The handle that results is directly added to the DirMap.
  2. If the line starts with the end marker for a source code listing, something has gone wrong, since end markers should be found only by the SourceCodeFile constructor.

Any other line is ignored. Once the input is exhausted, writePackedFile( ) writes all the collected listings to the packed file.

Checking capitalization style

Although the previous example can come in handy as a guide for some project of your own that involves text processing, this project will be directly useful because it performs a style check to make sure that your capitalization conforms to the de facto Java style. It opens each .java file in the current directory and extracts all the class names and identifiers, then shows you any of them that don’t follow the Java style.
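The two rules the checker applies can be written as small predicates. CapitalizationRules is a hypothetical class that restates the tests in checkClassNames( ) and checkIdentNames( ) from the listing below: a class name must not start with a lowercase letter, and any other identifier must not start with an uppercase letter unless it is at least three characters long and entirely uppercase (probably a static final constant). Filtering out known class names is assumed to happen beforehand.

```java
// Hypothetical restatement of ClassScanner's two capitalization tests.
class CapitalizationRules {
  // checkClassNames(): a class name starting lowercase is an error.
  static boolean classNameOk(String name) {
    return !Character.isLowerCase(name.charAt(0));
  }

  // checkIdentNames(): identifiers of three or more characters that are
  // entirely uppercase are skipped; otherwise a leading uppercase letter
  // is an error. Known class names are assumed to be filtered out first.
  static boolean identOk(String id) {
    if (id.length() >= 3 && id.equals(id.toUpperCase()))
      return true;
    return !Character.isUpperCase(id.charAt(0));
  }

  public static void main(String[] args) {
    String[] ids = {"fileList", "MAX_SIZE", "FileList", "ID"};
    for (String id : ids)
      System.out.println(id + (identOk(id) ? " ok" : " flagged"));
  }
}
```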

For the program to operate correctly, you must first build a class name repository to hold all the class names in the standard Java library. You do this by moving into all the source code subdirectories for the standard Java library and running ClassScanner in each subdirectory. Provide as arguments the name of the repository file (using the same path and name each time) and the -a command-line option to indicate that the class names should be added to the repository.
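Here’s a sketch of how such a repository can accumulate across runs, assuming a java.util.Properties file that maps each class name to itself, as ClassScanner does. ClassRepository is a hypothetical class, and it uses store( ), the replacement for the deprecated save( ) method that appears in the listing below.

```java
import java.io.*;
import java.util.*;

// Hypothetical sketch of a class-name repository kept in a Properties
// file, with each class name stored as both key and value.
class ClassRepository {
  // Loads the repository, or returns an empty one if the file is missing.
  static Properties load(File repo) {
    Properties p = new Properties();
    if (repo.exists()) {
      try (InputStream in =
             new BufferedInputStream(new FileInputStream(repo))) {
        p.load(in);
      } catch (IOException e) {
        throw new UncheckedIOException(e);
      }
    }
    return p;
  }

  // Merges new names into whatever the file already holds, then rewrites
  // it, so running once per library directory accumulates every name.
  static void add(File repo, Collection<String> names) {
    Properties p = load(repo);
    for (String n : names)
      p.setProperty(n, n);
    try (OutputStream out =
           new BufferedOutputStream(new FileOutputStream(repo))) {
      p.store(out, "Classes found by ClassScanner.java");
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) throws IOException {
    File repo = File.createTempFile("classnames", ".properties");
    repo.deleteOnExit();
    add(repo, Arrays.asList("String", "Vector"));
    add(repo, Arrays.asList("Hashtable"));
    System.out.println(load(repo).size() + " class names stored");
  }
}
```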

To use the program to check your code, run it and hand it the path and name of the repository to use. It will check all the classes and identifiers in the current directory and tell you which ones don’t follow the typical Java capitalization style.

You should be aware that the program isn’t perfect; there are a few times when it will point out what it thinks is a problem, but on looking at the code you’ll see that nothing needs to be changed. This is a little annoying, but it’s still much easier than trying to find all these cases by staring at your code.

The explanation immediately follows the listing:

//: ClassScanner.java
// Scans all files in directory for classes
// and identifiers, to check capitalization.
// Assumes properly compiling code listings.
// Doesn't do everything right, but is a very
// useful aid.
import java.io.*;
import java.util.*;
 
class MultiStringMap extends Hashtable {
  public void add(String key, String value) {
    if(!containsKey(key))
      put(key, new Vector());
    ((Vector)get(key)).addElement(value);
  }
  public Vector getVector(String key) {
    if(!containsKey(key)) {
      System.err.println(
        "ERROR: can't find key: " + key);
      System.exit(1);
    }
    return (Vector)get(key);
  }
  public void printValues(PrintStream p) {
    Enumeration k = keys();
    while(k.hasMoreElements()) {
      String oneKey = (String)k.nextElement();
      Vector val = getVector(oneKey);
      for(int i = 0; i < val.size(); i++)
        p.println((String)val.elementAt(i));
    }
  }
}
 
public class ClassScanner {
  private File path;
  private String[] fileList;
  private Properties classes = new Properties();
  private MultiStringMap
    classMap = new MultiStringMap(),
    identMap = new MultiStringMap();
  private StreamTokenizer in;
  public ClassScanner() {
    path = new File(".");
    fileList = path.list(new JavaFilter());
    for(int i = 0; i < fileList.length; i++) {
      System.out.println(fileList[i]);
      scanListing(fileList[i]);
    }
  }
  void scanListing(String fname) {
    try {
      in = new StreamTokenizer(
          new BufferedReader(
            new FileReader(fname)));
      // Doesn't seem to work:
      // in.slashStarComments(true);
      // in.slashSlashComments(true);
      in.ordinaryChar('/');
      in.ordinaryChar('.');
      in.wordChars('_', '_');
      in.eolIsSignificant(true);
      while(in.nextToken() !=
            StreamTokenizer.TT_EOF) {
        if(in.ttype == '/')
          eatComments();
        else if(in.ttype ==
                StreamTokenizer.TT_WORD) {
          if(in.sval.equals("class") ||
             in.sval.equals("interface")) {
            // Get class name:
            while(in.nextToken() !=
                  StreamTokenizer.TT_EOF
                  && in.ttype !=
                  StreamTokenizer.TT_WORD)
              ;
            classes.put(in.sval, in.sval);
            classMap.add(fname, in.sval);
          }
          if(in.sval.equals("import") ||
             in.sval.equals("package"))
            discardLine();
          else // It's an identifier or keyword
            identMap.add(fname, in.sval);
        }
      }
    } catch(IOException e) {
      e.printStackTrace();
    }
  }
  void discardLine() {
    try {
      while(in.nextToken() !=
            StreamTokenizer.TT_EOF
            && in.ttype !=
            StreamTokenizer.TT_EOL)
        ; // Throw away tokens to end of line
    } catch(IOException e) {
      e.printStackTrace();
    }
  }
  // StreamTokenizer's comment removal seemed
  // to be broken. This extracts them:
  void eatComments() {
    try {
      if(in.nextToken() !=
         StreamTokenizer.TT_EOF) {
        if(in.ttype == '/')
          discardLine();
        else if(in.ttype != '*')
          in.pushBack();
        else
          while(true) {
            if(in.nextToken() ==
              StreamTokenizer.TT_EOF)
              break;
            if(in.ttype == '*')
              if(in.nextToken() !=
                StreamTokenizer.TT_EOF
                && in.ttype == '/')
                break;
          }
      }
    } catch(IOException e) {
      e.printStackTrace();
    }
  }
  public String[] classNames() {
    String[] result = new String[classes.size()];
    Enumeration e = classes.keys();
    int i = 0;
    while(e.hasMoreElements())
      result[i++] = (String)e.nextElement();
    return result;
  }
  public void checkClassNames() {
    Enumeration files = classMap.keys();
    while(files.hasMoreElements()) {
      String file = (String)files.nextElement();
      Vector cls = classMap.getVector(file);
      for(int i = 0; i < cls.size(); i++) {
        String className =
          (String)cls.elementAt(i);
        if(Character.isLowerCase(
             className.charAt(0)))
          System.out.println(
            "class capitalization error, file: "
            + file + ", class: "
            + className);
      }
    }
  }
  public void checkIdentNames() {
    Enumeration files = identMap.keys();
    Vector reportSet = new Vector();
    while(files.hasMoreElements()) {
      String file = (String)files.nextElement();
      Vector ids = identMap.getVector(file);
      for(int i = 0; i < ids.size(); i++) {
        String id =
          (String)ids.elementAt(i);
        if(!classes.contains(id)) {
          // Ignore identifiers of length 3 or
          // longer that are all uppercase
          // (probably static final values):
          if(id.length() >= 3 &&
             id.equals(
               id.toUpperCase()))
            continue;
          // Check to see if first char is upper:
          if(Character.isUpperCase(id.charAt(0))){
            if(reportSet.indexOf(file + id)
                == -1){ // Not reported yet
              reportSet.addElement(file + id);
              System.out.println(
                "Ident capitalization error in:"
                + file + ", ident: " + id);
            }
          }
        }
      }
    }
  }
  static final String usage =
    "Usage: \n" +
    "ClassScanner classnames -a\n" +
    "\tAdds all the class names in this \n" +
    "\tdirectory to the repository file \n" +
    "\tcalled 'classnames'\n" +
    "ClassScanner classnames\n" +
    "\tChecks all the java files in this \n" +
    "\tdirectory for capitalization errors, \n" +
    "\tusing the repository file 'classnames'";
  private static void usage() {
    System.err.println(usage);
    System.exit(1);
  }
  public static void main(String[] args) {
    if(args.length < 1 || args.length > 2)
      usage();
    ClassScanner c = new ClassScanner();
    File old = new File(args[0]);
    if(old.exists()) {
      try {
        // Try to open an existing 
        // properties file:
        InputStream oldlist =
          new BufferedInputStream(
            new FileInputStream(old));
        c.classes.load(oldlist);
        oldlist.close();
      } catch(IOException e) {
        System.err.println("Could not open "
          + old + " for reading");
        System.exit(1);
      }
    }
    if(args.length == 1) {
      c.checkClassNames();
      c.checkIdentNames();
    }
    // Write the class names to a repository:
    if(args.length == 2) {
      if(!args[1].equals("-a"))
        usage();
      try {
        BufferedOutputStream out =
          new BufferedOutputStream(
            new FileOutputStream(args[0]));
        c.classes.save(out,
          "Classes found by ClassScanner.java");
        out.close();
      } catch(IOException e) {
        System.err.println(
          "Could not write " + args[0]);
        System.exit(1);
      }
    }
  }
}
 
class JavaFilter implements FilenameFilter {
  public boolean accept(File dir, String name) {
    // Strip path information:
    String f = new File(name).getName();
    return f.trim().endsWith(".java");
  }
} ///:~ 

For the classes and identifiers discovered in the files of a particular directory, two MultiStringMaps are used: classMap and identMap. Also, when the program starts up it loads the standard class name repository into the Properties object called classes, and whenever a new class name is found in the local directory it is added to classes as well as to classMap. This way, classMap can be used to step through all the classes in the local directory, and classes can be used to see whether the current token is a class name (which indicates that a definition of an object or method is beginning, so the next tokens, up to a semicolon, are grabbed and put into identMap).
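MultiStringMap itself is defined earlier in the listing and isn’t shown here. Assuming it behaves essentially as a map from a key (a file name) to a list of strings, which is how classMap.getVector( ) is used above, a minimal modern sketch of the same bookkeeping might look like this. The class name StringListMap and its methods are hypothetical, not the book’s API.

```java
import java.util.*;

// Hypothetical stand-in for the book's MultiStringMap: each key (a file
// name) maps to the list of strings (class names or identifiers) found in it.
public class StringListMap {
  private final Map<String, List<String>> map = new TreeMap<>();

  // Append a value to the list for this key, creating the list on demand.
  public void add(String key, String value) {
    map.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
  }

  // Return the list for this key, or an empty list if the key is unknown.
  public List<String> getList(String key) {
    return map.getOrDefault(key, Collections.emptyList());
  }

  // The keys (file names), in sorted order because a TreeMap is used.
  public Set<String> keys() {
    return map.keySet();
  }

  public static void main(String[] args) {
    StringListMap classMap = new StringListMap();
    classMap.add("Shapes.java", "Shape");
    classMap.add("Shapes.java", "Circle");
    classMap.add("Main.java", "Main");
    for (String file : classMap.keys())
      System.out.println(file + ": " + classMap.getList(file));
  }
}
```

Grouping values per file is what lets the checking methods later print the file name next to each offending name.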

Inside scanListing( ) the source code file is opened and turned into a StreamTokenizer. According to the documentation, passing true to slashStarComments( ) and slashSlashComments( ) is supposed to strip those comments out, but this seems to be a bit flawed (it doesn’t quite work in Java 1.0). Instead, the lines that call those methods are commented out, and the comments are extracted by another method. To do this, the ‘/’ must be captured as an ordinary character rather than letting the StreamTokenizer absorb it as part of a comment, and the ordinaryChar( ) method tells the StreamTokenizer to do this. This is also true for dots (‘.’), since we want method calls pulled apart into individual identifiers. However, the underscore, which StreamTokenizer ordinarily treats as an individual character, should be kept as part of identifiers, since it appears in static final values such as TT_EOF that are used in this very program. The wordChars( ) method takes a range of characters you want to add to those that are kept inside a token being parsed as a word. Finally, when parsing one-line comments or discarding a line we need to know when an end-of-line occurs, so calling eolIsSignificant(true) makes each end-of-line show up as a token rather than being absorbed by the StreamTokenizer.
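To see what each of these settings does, here is a small sketch that configures a StreamTokenizer the way the paragraph describes and runs it over a string. It is not the book’s scanListing( ); the class and method names are hypothetical. Note that the text of a // comment comes back as ordinary words, which is exactly why the comments must be handled by separate code.

```java
import java.io.*;
import java.util.*;

// Sketch of a StreamTokenizer set up as the text describes. The names
// configure( ), words( ) and countLineEnds( ) are hypothetical.
public class TokenizerSetup {
  static StreamTokenizer configure(Reader r) {
    StreamTokenizer in = new StreamTokenizer(r);
    in.ordinaryChar('/');      // return '/' as a token instead of starting a comment
    in.ordinaryChar('.');      // by default '.' is a number character, which
                               // would also keep "a.b" together as one word
    in.wordChars('_', '_');    // keep underscores inside words, e.g. TT_EOF
    in.eolIsSignificant(true); // report each line end as a TT_EOL token
    return in;
  }

  // Collect the word tokens found in a piece of source text.
  public static List<String> words(String src) {
    StreamTokenizer in = configure(new StringReader(src));
    List<String> result = new ArrayList<>();
    try {
      while (in.nextToken() != StreamTokenizer.TT_EOF)
        if (in.ttype == StreamTokenizer.TT_WORD)
          result.add(in.sval);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return result;
  }

  // Count the TT_EOL tokens, which appear only because of eolIsSignificant(true).
  public static int countLineEnds(String src) {
    StreamTokenizer in = configure(new StringReader(src));
    int count = 0;
    try {
      while (in.nextToken() != StreamTokenizer.TT_EOF)
        if (in.ttype == StreamTokenizer.TT_EOL)
          count++;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return count;
  }

  public static void main(String[] args) {
    // Prints [int, x, StreamTokenizer, TT_EOF, end]
    System.out.println(words("int x = StreamTokenizer.TT_EOF; // end"));
    System.out.println(countLineEnds("a\nb\n")); // 2
  }
}
```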

The rest of scanListing( ) reads and reacts to tokens until the end of the file, signified when nextToken( ) returns the final static value StreamTokenizer.TT_EOF.

If the token is a /, it is potentially a comment, so eatComments( ) is called to deal with it. The only other situation we’re interested in here is a word, which has some special cases.

If the word is class or interface, then the next token represents a class or interface name, and it is put into classes and classMap. If the word is import or package, then we don’t want the rest of the line. Anything else must be an identifier (which we’re interested in) or a keyword (which we’re not, but keywords are all lowercase anyway, so it won’t spoil things to put them in). These are added to identMap.

The discardLine( ) method is a simple tool that looks for the end of a line. Note that any time you get a new token, you must check for the end of the file.
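Pulling the last few paragraphs together, the scanning loop can be sketched as follows. This is a simplified, hypothetical reconstruction, not the book’s scanListing( ): it collects names into plain lists instead of classes, classMap and identMap, handles comments in a single helper in place of eatComments( ), and omits other details of the original. Note that every call to nextToken( ) either checks for TT_EOF directly or produces a result that ends the loop.

```java
import java.io.*;
import java.util.*;

// Hypothetical, simplified version of the scanning loop described above.
// The names scan( ), skipComment( ) and discardLine( ) are illustrative only.
public class ScanSketch {
  public final List<String> classNames = new ArrayList<>();
  public final List<String> identifiers = new ArrayList<>();
  private final StreamTokenizer in;

  public ScanSketch(String source) {
    in = new StreamTokenizer(new StringReader(source));
    in.ordinaryChar('/');
    in.ordinaryChar('.');
    in.wordChars('_', '_');
    in.eolIsSignificant(true);
  }

  public ScanSketch scan() {
    try {
      while (in.nextToken() != StreamTokenizer.TT_EOF) {
        if (in.ttype == '/') {
          skipComment();
        } else if (in.ttype == StreamTokenizer.TT_WORD) {
          String s = in.sval;
          if (s.equals("class") || s.equals("interface")) {
            // The next word names the class; nextToken( ) may hit EOF instead.
            if (in.nextToken() == StreamTokenizer.TT_WORD)
              classNames.add(in.sval);
          } else if (s.equals("import") || s.equals("package")) {
            discardLine();
          } else {
            identifiers.add(s); // an identifier or a (lowercase) keyword
          }
        }
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return this;
  }

  // Called after a '/': skip a // or /* */ comment.
  private void skipComment() throws IOException {
    int t = in.nextToken();
    if (t == '/') {
      discardLine();
    } else if (t == '*') {
      int prev = 0;
      // Stop at "*/" or at end of file, whichever comes first.
      while ((t = in.nextToken()) != StreamTokenizer.TT_EOF) {
        if (prev == '*' && t == '/') return;
        prev = t;
      }
    } else {
      in.pushBack(); // not a comment: let the main loop see this token
    }
  }

  // Skip tokens through the end of the line, stopping at end of file too.
  private void discardLine() throws IOException {
    int t;
    do {
      t = in.nextToken();
    } while (t != StreamTokenizer.TT_EOL && t != StreamTokenizer.TT_EOF);
  }

  public static void main(String[] args) {
    String src = "package c99;\n"
        + "import java.util.*;\n"
        + "// a comment with Words\n"
        + "/* Block Comment */\n"
        + "public class Demo_One {\n"
        + "  int count = MAX_SIZE;\n"
        + "}\n";
    ScanSketch s = new ScanSketch(src).scan();
    System.out.println(s.classNames);  // [Demo_One]
    System.out.println(s.identifiers); // [public, int, count, MAX_SIZE]
  }
}
```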

For convenience, the classNames( ) method produces an array of all the names in the classes collection. This method is not used in the program but is helpful for debugging.
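With the later collections API, the same debugging helper can be written without an Enumeration. This sketch is hypothetical and adds sorting for readable output; it also assumes that keys and values in classes are Strings, because stringPropertyNames( ) returns only entries whose key and value are both Strings.

```java
import java.util.*;

// Sketch of a classNames( )-style helper using stringPropertyNames( )
// instead of an Enumeration; sorting the result is an addition.
public class ClassNamesSketch {
  static String[] classNames(Properties classes) {
    String[] result = classes.stringPropertyNames().toArray(new String[0]);
    Arrays.sort(result);
    return result;
  }

  public static void main(String[] args) {
    Properties classes = new Properties();
    classes.setProperty("Vector", "Vector");
    classes.setProperty("Hashtable", "Hashtable");
    System.out.println(Arrays.toString(classNames(classes))); // [Hashtable, Vector]
  }
}
```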

The next two methods are the ones in which the actual checking takes place. In checkClassNames( ), the class names are extracted from the classMap (which, remember, contains only the names in this directory, organized by file name so the file name can be printed along with the errant class name). This is accomplished by pulling out each associated Vector and stepping through it, checking whether the first character is lowercase. If so, the appropriate error message is printed.
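The same check is easy to express with a for-each loop over map entries. In this sketch, which is hypothetical and not the listing’s code, a Map from file name to a list of class names stands in for classMap, and the messages are returned rather than printed so they can be inspected.

```java
import java.util.*;

// Sketch of the checkClassNames( ) logic. A Map of file name to list of
// class names stands in for classMap; messages are collected, not printed.
public class ClassNameCheck {
  static List<String> check(Map<String, List<String>> classMap) {
    List<String> errors = new ArrayList<>();
    for (Map.Entry<String, List<String>> entry : classMap.entrySet())
      for (String className : entry.getValue())
        // A class name should start with an uppercase letter.
        if (Character.isLowerCase(className.charAt(0)))
          errors.add("class capitalization error, file: "
              + entry.getKey() + ", class: " + className);
    return errors;
  }

  public static void main(String[] args) {
    Map<String, List<String>> classMap = new TreeMap<>();
    classMap.put("Shapes.java", Arrays.asList("Shape", "circle"));
    classMap.put("Main.java", Arrays.asList("Main"));
    check(classMap).forEach(System.out::println);
  }
}
```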

In checkIdentNames( ), a similar approach is taken: each identifier name is extracted from identMap. If the name is not in the classes list, it’s assumed to be an identifier or keyword. A special case is checked: if the identifier length is 3 or more and all the characters are uppercase (that is, the identifier is unchanged by toUpperCase( )), this identifier is ignored because it’s probably a static final value such as TT_EOF. Of course, this is not a perfect algorithm, but it assumes that you’ll eventually notice any all-uppercase identifiers that are out of place.
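The two tests applied to each identifier can be isolated as below; the preceding lookup in classes is left out, and the method names are hypothetical. Trying a few inputs shows the imperfection the paragraph admits: a two-letter constant such as PI is still reported.

```java
// Sketch of the identifier heuristic used in checkIdentNames( ).
public class ConstantHeuristic {
  // True if id looks like a static final constant: at least 3 characters
  // and unchanged by toUpperCase( ), the same test the listing uses.
  static boolean looksLikeConstant(String id) {
    return id.length() >= 3 && id.equals(id.toUpperCase());
  }

  // True if id would be reported: not constant-like, but starts uppercase.
  static boolean shouldReport(String id) {
    return !looksLikeConstant(id) && Character.isUpperCase(id.charAt(0));
  }

  public static void main(String[] args) {
    for (String id : new String[] {"TT_EOF", "MAX", "PI", "Count", "count"})
      System.out.println(id + " -> " + shouldReport(id));
  }
}
```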

Instead of reporting every identifier that starts with an uppercase character, this method keeps track of which ones have already been reported in a Vector called reportSet. This treats the Vector as a “set” that tells you whether an item is already in the set. The item is produced by concatenating the file name and identifier. If the element isn’t in the set, it’s added and then the report is made.
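A java.util.Set does the same job directly: Set.add( ) returns false when the element is already present, so one call both tests and records, and a HashSet lookup doesn’t scan the whole collection the way Vector.indexOf( ) does. In this hypothetical sketch, the key is the (file, identifier) pair held in a two-element List rather than a concatenated String; that is a design choice added here, since plain concatenation could in principle make two different pairs look identical.

```java
import java.util.*;

// Sketch of the reportSet idea using a real Set.
public class ReportOnce {
  // Each key is a (file, identifier) pair stored as a List, which has
  // value-based equals( ) and hashCode( ).
  private final Set<List<String>> reported = new HashSet<>();

  // Returns true the first time a pair is seen, false afterwards.
  boolean firstTime(String file, String id) {
    return reported.add(Arrays.asList(file, id));
  }

  public static void main(String[] args) {
    ReportOnce r = new ReportOnce();
    String[][] hits = { {"A.java", "Foo"}, {"A.java", "Foo"}, {"B.java", "Foo"} };
    for (String[] h : hits)
      if (r.firstTime(h[0], h[1]))
        System.out.println("Ident capitalization error in:" + h[0] + ", ident: " + h[1]);
  }
}
```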

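The rest of the listing consists of main( ), which busies itself by handling the command-line arguments and figuring out whether you’re building a repository of class names from the standard Java library or checking the validity of code you’ve written. In both cases it makes a ClassScanner object. With a single argument (the repository file name) it checks the local files against that repository; with the additional -a flag it saves the class names it has found into the repository file.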
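The repository handling in main( ) is a plain Properties round trip. Here is a minimal sketch of that round trip in current Java, offered as a variant rather than a correction of the listing: it uses try-with-resources, and it calls store( ) in place of save( ), which is deprecated in favor of store( ) because save( ) does not throw an IOException when an I/O error occurs. The class and method names are hypothetical, and a temporary file stands in for the classnames repository.

```java
import java.io.*;
import java.util.*;

// Sketch of writing class names to a repository file and reading them back.
public class RepositoryRoundTrip {
  static Properties roundTrip(Properties classes, File file) throws IOException {
    // store( ) reports write failures by throwing IOException.
    try (OutputStream out = new BufferedOutputStream(new FileOutputStream(file))) {
      classes.store(out, "Classes found by ClassScanner.java");
    }
    Properties loaded = new Properties();
    try (InputStream in = new BufferedInputStream(new FileInputStream(file))) {
      loaded.load(in); // the comment lines written by store( ) are ignored
    }
    return loaded;
  }

  public static void main(String[] args) throws IOException {
    Properties classes = new Properties();
    classes.setProperty("Vector", "Vector");
    classes.setProperty("Hashtable", "Hashtable");
    File tmp = File.createTempFile("classnames", ".properties");
    tmp.deleteOnExit();
    Properties back = roundTrip(classes, tmp);
    System.out.println(back.getProperty("Vector")); // Vector
  }
}
```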
