
StreamTokenizer

Bruce Eckel’s Thinking in Java | Contents | Prev | Next


Although StreamTokenizer is not derived from InputStream or OutputStream, it works only with InputStream objects, so it rightfully belongs in the IO portion of the library.

The StreamTokenizer class is used to break any InputStream into a sequence of “tokens,” which are bits of text delimited by whatever you choose. For example, your tokens could be words, and then they would be delimited by white space and punctuation.

Consider a program to count the occurrence of words in a text file:

//: SortedWordCount.java
// Counts words in a file, outputs
// results in sorted form.
import java.io.*;
import java.util.*;
import c08.*; // Contains StrSortVector
 
class Counter {
  private int i = 1;
  int read() { return i; }
  void increment() { i++; }
}
 
public class SortedWordCount {
  private FileInputStream file;
  private StreamTokenizer st;
  private Hashtable counts = new Hashtable();
  SortedWordCount(String filename)
    throws FileNotFoundException {
    try {
      file = new FileInputStream(filename);
      st = new StreamTokenizer(file);
      st.ordinaryChar('.');
      st.ordinaryChar('-');
    } catch(FileNotFoundException e) {
      System.out.println(
        "Could not open " + filename);
      throw e;
    }
  }
  void cleanup() {
    try {
      file.close();
    } catch(IOException e) {
      System.out.println(
        "file.close() unsuccessful");
    }
  }
  void countWords() {
    try {
      while(st.nextToken() !=
        StreamTokenizer.TT_EOF) {
        String s;
        switch(st.ttype) {
          case StreamTokenizer.TT_EOL:
            s = new String("EOL");
            break;
          case StreamTokenizer.TT_NUMBER:
            s = Double.toString(st.nval);
            break;
          case StreamTokenizer.TT_WORD:
            s = st.sval; // Already a String
            break;
          default: // single character in ttype
            s = String.valueOf((char)st.ttype);
        }
        if(counts.containsKey(s))
          ((Counter)counts.get(s)).increment();
        else
          counts.put(s, new Counter());
      }
    } catch(IOException e) {
      System.out.println(
        "st.nextToken() unsuccessful");
    }
  }
  Enumeration values() {
    return counts.elements();
  }
  Enumeration keys() { return counts.keys(); }
  Counter getCounter(String s) {
    return (Counter)counts.get(s);
  }
  Enumeration sortedKeys() {
    Enumeration e = counts.keys();
    StrSortVector sv = new StrSortVector();
    while(e.hasMoreElements())
      sv.addElement((String)e.nextElement());
    // This call forces a sort:
    return sv.elements();
  }
  public static void main(String[] args) {
    try {
      SortedWordCount wc =
        new SortedWordCount(args[0]);
      wc.countWords();
      Enumeration keys = wc.sortedKeys();
      while(keys.hasMoreElements()) {
        String key = (String)keys.nextElement();
        System.out.println(key + ": "
                 + wc.getCounter(key).read());
      }
      wc.cleanup();
    } catch(Exception e) {
      e.printStackTrace();
    }
  }
} ///:~ 

It makes sense to present these in a sorted form, but since Java 1.0 and Java 1.1 don’t have any sorting methods, that will have to be mixed in. This is easy enough to do with a StrSortVector. (This was created in Chapter 8, and is part of the package created in that chapter. Remember that the starting directory for all the subdirectories in this book must be in your class path for the program to compile successfully.)
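Since StrSortVector itself isn’t shown in this chapter, here is a rough sketch of the kind of thing it does. The class name SimpleStrSortVector and the insert-in-order approach are my own illustration, not the Chapter 8 implementation (which defers the sort until elements( ) is called):

import java.util.*;

// A minimal stand-in: keeps Strings in sorted order as they are added,
// and hands back an Enumeration just as the book's class does.
class SimpleStrSortVector {
  private Vector v = new Vector();
  void addElement(String s) {
    // Find the first element greater than s and insert before it:
    int i = 0;
    while(i < v.size() &&
          ((String)v.elementAt(i)).compareTo(s) < 0)
      i++;
    v.insertElementAt(s, i);
  }
  Enumeration elements() { return v.elements(); }
}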

To open the file, a FileInputStream is used, and to turn the file into words a StreamTokenizer is created from the FileInputStream. In StreamTokenizer, there is a default list of separators, and you can add more with a set of methods. Here, ordinaryChar( ) is used to say “This character has no significance that I’m interested in,” so the parser doesn’t include it as part of any of the words that it creates. For example, saying st.ordinaryChar('.') means that periods will not be included as parts of the words that are parsed. You can find more information in the online documentation that comes with Java.
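As a quick stand-alone illustration of these configuration calls (this small demo is my own sketch rather than part of the book’s example, and the class name TokenizeDemo is invented), the following prints the words and numbers found in a file named on the command line:

import java.io.*;

public class TokenizeDemo {
  public static void main(String[] args) throws IOException {
    StreamTokenizer st =
      new StreamTokenizer(new FileInputStream(args[0]));
    st.ordinaryChar('.');      // '.' no longer glues onto words
    st.ordinaryChar('-');      // neither does '-'
    st.eolIsSignificant(true); // report line ends as TT_EOL tokens
    while(st.nextToken() != StreamTokenizer.TT_EOF) {
      if(st.ttype == StreamTokenizer.TT_WORD)
        System.out.println("word: " + st.sval);
      else if(st.ttype == StreamTokenizer.TT_NUMBER)
        System.out.println("number: " + st.nval);
    }
  }
}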

In countWords( ), the tokens are pulled one at a time from the stream, and the ttype information is used to determine what to do with each token, since a token can be an end-of-line, a number, a string, or a single character.

Once a token is found, the Hashtable counts is queried to see if it already contains the token as a key. If it does, the corresponding Counter object is incremented to indicate that another instance of this word has been found. If not, a new Counter is created; since the Counter constructor initializes its value to one, this also acts to count the word.

SortedWordCount is not a type of Hashtable, so it wasn’t inherited. It performs a specific type of functionality, so even though the keys( ) and values( ) methods must be re-exposed, that still doesn’t mean that inheritance should be used, since a number of Hashtable methods are inappropriate here. In addition, other methods like getCounter( ), which gets the Counter for a particular String, and sortedKeys( ), which produces an Enumeration, finish the change in the shape of SortedWordCount’s interface.

In main( ) you can see the use of a SortedWordCount to open and count the words in a file; it just takes two lines of code. Then an enumeration to a sorted list of keys (words) is extracted, and this is used to pull out each key and its associated Counter. Note that the call to cleanup( ) is necessary to ensure that the file is closed.

A second example using StreamTokenizer can be found in Chapter 17.


StringTokenizer

Although it isn’t part of the IO library, the StringTokenizer has sufficiently similar functionality to StreamTokenizer that it will be described here.

The StringTokenizer returns the tokens within a string one at a time. These tokens are consecutive characters delimited by tabs, spaces, and newlines. Thus, the tokens of the string “Where is my cat?” are “Where”, “is”, “my”, and “cat?” Like the StreamTokenizer, you can tell the StringTokenizer to break up the input in any way that you want, but with StringTokenizer you do this by passing a second argument to the constructor, which is a String of the delimiters you wish to use. In general, if you need more sophistication, use a StreamTokenizer.
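For instance (a small sketch of my own, not from the book; the class name DelimDemo is invented), passing ", " as the second argument makes both commas and spaces act as delimiters:

import java.util.*;

public class DelimDemo {
  public static void main(String[] args) {
    StringTokenizer st =
      new StringTokenizer("milk, eggs,bread ,cheese", ", ");
    while(st.hasMoreTokens())
      System.out.println(st.nextToken());
    // Prints milk, eggs, bread and cheese, one per line.
  }
}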

You ask a StringTokenizer object for the next token in the string using the nextToken( ) method. Strictly speaking, nextToken( ) throws a NoSuchElementException when no tokens remain, so you normally guard each call with hasMoreTokens( ); the next( ) helper in the example below wraps that check and returns an empty String instead when the tokens run out.

As an example, the following program performs a limited analysis of a sentence, looking for key phrase sequences to indicate whether happiness or sadness is implied.

//: AnalyzeSentence.java
// Look for particular sequences
// within sentences.
import java.util.*;
 
public class AnalyzeSentence {
  public static void main(String[] args) {
    analyze("I am happy about this");
    analyze("I am not happy about this");
    analyze("I am not! I am happy");
    analyze("I am sad about this");
    analyze("I am not sad about this");
    analyze("I am not! I am sad");
    analyze("Are you happy about this?");
    analyze("Are you sad about this?");
    analyze("It's you! I am happy");
    analyze("It's you! I am sad");
  }
  static StringTokenizer st;
  static void analyze(String s) {
    prt("nnew sentence >> " + s);
    boolean sad = false;
    st = new StringTokenizer(s);
    while (st.hasMoreTokens()) {
      String token = next();
      // Look until you find one of the
      // two starting tokens:
      if(!token.equals("I") &&
         !token.equals("Are"))
        continue; // Top of while loop
      if(token.equals("I")) {
        String tk2 = next();
        if(!tk2.equals("am")) // Must be after I
          break; // Out of while loop
        else {
          String tk3 = next();
          if(tk3.equals("sad")) {
            sad = true;
            break; // Out of while loop
          }
          if (tk3.equals("not")) {
            String tk4 = next();
            if(tk4.equals("sad"))
              break; // Leave sad false
            if(tk4.equals("happy")) {
              sad = true;
              break;
            }
          }
        }
      }
      if(token.equals("Are")) {
        String tk2 = next();
        if(!tk2.equals("you"))
          break; // Must be after Are
        String tk3 = next();
        if(tk3.equals("sad"))
          sad = true;
        break; // Out of while loop
      }
    }
    if(sad) prt("Sad detected");
  }
  static String next() {
    if(st.hasMoreTokens()) {
      String s = st.nextToken();
      prt(s);
      return s;
    }
    else
      return "";
  }
  static void prt(String s) {
    System.out.println(s);
  }
} ///:~ 

For each string being analyzed, a while loop is entered and tokens are pulled off the string. Notice the first if statement, which says to continue (go back to the beginning of the loop and start again) if the token is neither an “I” nor an “Are.” This means that it will get tokens until an “I” or an “Are” is found. You might think to use == instead of the equals( ) method, but that won’t work correctly, since == compares handle values while equals( ) compares contents.
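To see the difference in isolation (my own sketch; EqualsDemo is an invented name), compare two String objects that hold the same characters:

public class EqualsDemo {
  public static void main(String[] args) {
    String a = new String("I"); // a distinct String object
    String b = "I";
    System.out.println(a == b);      // false: different handles
    System.out.println(a.equals(b)); // true: same contents
  }
}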

The logic of the rest of the analyze( ) method is that the pattern that’s being searched for is “I am sad,” “I am not happy,” or “Are you sad?” Without the break statement, the code for this would be even messier than it is. You should be aware that a typical parser (this is a primitive example of one) normally has a table of these tokens and a piece of code that moves through the states in the table as new tokens are read.
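To make that last remark concrete, here is a rough, entirely hypothetical sketch of the table-driven idea (it is not how the book implements anything, and the matching is deliberately naive): each row of the table is a phrase, and an index into the row acts as the current state.

import java.util.*;

public class TableMatch {
  // The "table of tokens": each row is one phrase that implies sadness.
  static String[][] sadPatterns = {
    { "I", "am", "sad" },
    { "I", "am", "not", "happy" },
    { "Are", "you", "sad" },
  };
  static boolean isSad(String sentence) {
    for(int p = 0; p < sadPatterns.length; p++) {
      StringTokenizer st = new StringTokenizer(sentence, " !?.");
      int pos = 0; // state: how much of this row has matched so far
      while(st.hasMoreTokens() && pos < sadPatterns[p].length) {
        String tok = st.nextToken();
        if(tok.equals(sadPatterns[p][pos]))
          pos++; // advance to the next state
        else // restart, but let this token begin a new match
          pos = tok.equals(sadPatterns[p][0]) ? 1 : 0;
      }
      if(pos == sadPatterns[p].length)
        return true; // reached the accepting state for this row
    }
    return false;
  }
  public static void main(String[] args) {
    System.out.println(isSad("I am not happy about this")); // true
    System.out.println(isSad("I am happy about this"));     // false
  }
}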

You should think of the StringTokenizer only as shorthand for a simple and specific kind of StreamTokenizer. However, if you have a String that you want to tokenize and StringTokenizer is too limited, all you have to do is turn it into a stream with StringBufferInputStream and then use that to create a much more powerful StreamTokenizer.
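A minimal sketch of that conversion (my own example; note that StringBufferInputStream and the InputStream constructor of StreamTokenizer were deprecated in Java 1.1 in favor of StringReader and the Reader constructor, shown in the commented-out lines):

import java.io.*;

public class StringToStream {
  public static void main(String[] args) throws IOException {
    // The Java 1.0 route described above:
    StreamTokenizer st = new StreamTokenizer(
      new StringBufferInputStream("3.14 is not an integer"));
    // The later, non-deprecated equivalent:
    // StreamTokenizer st = new StreamTokenizer(
    //   new StringReader("3.14 is not an integer"));
    while(st.nextToken() != StreamTokenizer.TT_EOF) {
      if(st.ttype == StreamTokenizer.TT_NUMBER)
        System.out.println("number: " + st.nval);
      else if(st.ttype == StreamTokenizer.TT_WORD)
        System.out.println("word: " + st.sval);
    }
  }
}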

Contents | Prev | Next
