Compression

Bruce Eckel’s Thinking in Java

One aspect of these Java 1.1 compression classes stands out: they are not derived from the new Reader and Writer classes, but are instead part of the InputStream and OutputStream hierarchies. So you might be forced to mix the two types of streams. (Remember that you can use InputStreamReader and OutputStreamWriter to provide easy conversion between one type and another.)
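As a rough illustration of that conversion, the sketch below layers a Writer over a GZIPOutputStream with OutputStreamWriter, and a Reader over a GZIPInputStream with InputStreamReader. The class name GzipTextBridge and its helper methods are hypothetical, and the code uses try-with-resources and StandardCharsets, which come from Java versions later than the 1.1 release discussed here.

```java
import java.io.*;
import java.nio.charset.StandardCharsets;
import java.util.zip.*;

// Hypothetical helper class, for illustration only.
class GzipTextBridge {
  // Compresses text by layering a Writer over a GZIPOutputStream.
  static byte[] compress(String text) throws IOException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (Writer out = new OutputStreamWriter(
           new GZIPOutputStream(bytes), StandardCharsets.UTF_8)) {
      out.write(text);
    } // closing the Writer also finishes the GZIP stream
    return bytes.toByteArray();
  }

  // Decompresses by layering a Reader over a GZIPInputStream.
  static String decompress(byte[] data) throws IOException {
    StringBuilder sb = new StringBuilder();
    try (Reader in = new InputStreamReader(
           new GZIPInputStream(new ByteArrayInputStream(data)),
           StandardCharsets.UTF_8)) {
      char[] buf = new char[1024];
      int n;
      while ((n = in.read(buf)) != -1)
        sb.append(buf, 0, n);
    }
    return sb.toString();
  }

  public static void main(String[] args) throws IOException {
    String original = "caf\u00e9 caf\u00e9 caf\u00e9";
    byte[] packed = compress(original);
    // Reports whether the text survived the round trip
    System.out.println(decompress(packed).equals(original));
  }
}
```

Naming the character encoding explicitly on both sides keeps the round trip independent of the platform default.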

CheckedInputStream: getChecksum( ) produces a checksum for any InputStream (not just decompression)
CheckedOutputStream: getChecksum( ) produces a checksum for any OutputStream (not just compression)
DeflaterOutputStream: Base class for compression classes
ZipOutputStream: A DeflaterOutputStream that compresses data into the Zip file format
GZIPOutputStream: A DeflaterOutputStream that compresses data into the GZIP file format
InflaterInputStream: Base class for decompression classes
ZipInputStream: An InflaterInputStream that decompresses data that has been stored in the Zip file format
GZIPInputStream: An InflaterInputStream that decompresses data that has been stored in the GZIP file format
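To show that the checked streams are independent of compression, here is a minimal sketch that wraps an ordinary in-memory InputStream in a CheckedInputStream and reads the checksum once the stream is drained. The class and method names are hypothetical; CRC32 is used here, and Adler32 would be wired up the same way.

```java
import java.io.*;
import java.nio.charset.StandardCharsets;
import java.util.zip.*;

// Hypothetical helper class, for illustration only.
class ChecksumAnyStream {
  // Drains the stream and returns the CRC-32 of every byte read.
  static long checksumOf(InputStream source) throws IOException {
    try (CheckedInputStream in =
           new CheckedInputStream(source, new CRC32())) {
      byte[] buf = new byte[4096];
      while (in.read(buf) != -1) {
        // Nothing to do: reading updates the checksum
      }
      return in.getChecksum().getValue();
    }
  }

  public static void main(String[] args) throws IOException {
    byte[] data = "123456789".getBytes(StandardCharsets.US_ASCII);
    long sum = checksumOf(new ByteArrayInputStream(data));
    System.out.println(Long.toHexString(sum));
  }
}
```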

Simple compression with GZIP

The GZIP interface is simple and thus is probably more appropriate when you have a single stream of data that you want to compress (rather than a collection of dissimilar pieces of data). Here’s an example that compresses a single file:

//: GZIPcompress.java
// Uses Java 1.1 GZIP compression to compress
// a text file whose name is passed on the
// command line.
import java.io.*;
import java.util.zip.*;

public class GZIPcompress {
  public static void main(String[] args) {
    if(args.length < 1) {
      System.err.println(
        "Usage: java GZIPcompress file");
      return;
    }
    try {
      BufferedReader in =
        new BufferedReader(
          new FileReader(args[0]));
      BufferedOutputStream out =
        new BufferedOutputStream(
          new GZIPOutputStream(
            new FileOutputStream("test.gz")));
      System.out.println("Writing file");
      // Each char is written as a single byte,
      // so this suits only 8-bit text files
      int c;
      while((c = in.read()) != -1)
        out.write(c);
      in.close();
      // Closing also finishes the GZIP stream
      out.close();
      System.out.println("Reading file");
      BufferedReader in2 =
        new BufferedReader(
          new InputStreamReader(
            new GZIPInputStream(
              new FileInputStream("test.gz"))));
      String s;
      while((s = in2.readLine()) != null)
        System.out.println(s);
      in2.close();
    } catch(Exception e) {
      e.printStackTrace();
    }
  }
} ///:~

The use of the compression classes is straightforward: you simply wrap your output stream in a GZIPOutputStream or ZipOutputStream and your input stream in a GZIPInputStream or ZipInputStream. All else is ordinary IO reading and writing. This is, however, a good example of when you’re forced to mix the old IO streams with the new: in uses the Reader classes, whereas GZIPOutputStream’s constructor can accept only an OutputStream object, not a Writer object.
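For data that is not text, one way to sidestep the Reader/OutputStream mix is to stay with byte streams on both sides. The following is a sketch under that assumption, not the book's approach; the class GzipBytes and its methods are hypothetical names, and try-with-resources comes from a later Java version than 1.1.

```java
import java.io.*;
import java.util.Arrays;
import java.util.zip.*;

// Hypothetical helper class, for illustration only.
class GzipBytes {
  // Copies every byte from in to out through a small buffer.
  static void copy(InputStream in, OutputStream out)
      throws IOException {
    byte[] buf = new byte[4096];
    int n;
    while ((n = in.read(buf)) != -1)
      out.write(buf, 0, n);
  }

  static byte[] compress(byte[] raw) throws IOException {
    ByteArrayOutputStream sink = new ByteArrayOutputStream();
    try (OutputStream out = new GZIPOutputStream(sink)) {
      copy(new ByteArrayInputStream(raw), out);
    }
    return sink.toByteArray();
  }

  static byte[] decompress(byte[] packed) throws IOException {
    ByteArrayOutputStream sink = new ByteArrayOutputStream();
    try (InputStream in = new GZIPInputStream(
           new ByteArrayInputStream(packed))) {
      copy(in, sink);
    }
    return sink.toByteArray();
  }

  public static void main(String[] args) throws IOException {
    // Every possible byte value, including non-text ones
    byte[] raw = new byte[256];
    for (int i = 0; i < raw.length; i++)
      raw[i] = (byte) i;
    byte[] restored = decompress(compress(raw));
    System.out.println(Arrays.equals(raw, restored));
  }
}
```

Because no character conversion takes place, arbitrary binary content passes through unchanged.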

Multi-file storage with Zip

//: ZipCompress.java
// Uses Java 1.1 Zip compression to compress
// any number of files whose names are passed
// on the command line.
import java.io.*;
import java.util.*;
import java.util.zip.*;

public class ZipCompress {
  public static void main(String[] args) {
    try {
      FileOutputStream f =
        new FileOutputStream("test.zip");
      CheckedOutputStream csum =
        new CheckedOutputStream(
          f, new Adler32());
      ZipOutputStream out =
        new ZipOutputStream(
          new BufferedOutputStream(csum));
      out.setComment("A test of Java Zipping");
      // Can't read the above comment, though
      for(int i = 0; i < args.length; i++) {
        System.out.println(
          "Writing file " + args[i]);
        // A Reader is used, so each char is
        // written as one byte (text files only)
        BufferedReader in =
          new BufferedReader(
            new FileReader(args[i]));
        out.putNextEntry(new ZipEntry(args[i]));
        int c;
        while((c = in.read()) != -1)
          out.write(c);
        in.close();
      }
      out.close();
      // Checksum valid only after the file
      // has been closed!
      System.out.println("Checksum: " +
        csum.getChecksum().getValue());
      // Now extract the files:
      System.out.println("Reading file");
      FileInputStream fi =
        new FileInputStream("test.zip");
      CheckedInputStream csumi =
        new CheckedInputStream(
          fi, new Adler32());
      ZipInputStream in2 =
        new ZipInputStream(
          new BufferedInputStream(csumi));
      ZipEntry ze;
      while((ze = in2.getNextEntry()) != null) {
        System.out.println("Reading file " + ze);
        int x;
        while((x = in2.read()) != -1)
          System.out.write(x);
      }
      // The input checksum covers only the bytes
      // read so far, so print it after reading
      System.out.println("Checksum: " +
        csumi.getChecksum().getValue());
      in2.close();
      // Alternative way to open and read
      // zip files:
      ZipFile zf = new ZipFile("test.zip");
      Enumeration e = zf.entries();
      while(e.hasMoreElements()) {
        ZipEntry ze2 = (ZipEntry)e.nextElement();
        System.out.println("File: " + ze2);
        // ... and extract the data as before
      }
      zf.close();
    } catch(Exception e) {
      e.printStackTrace();
    }
  }
} ///:~

To extract files, ZipInputStream has a getNextEntry( ) method that returns the next ZipEntry if there is one. As a more succinct alternative, you can read the file using a ZipFile object, which has a method entries( ) to return an Enumeration of the ZipEntry objects.
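The ZipFile route can be sketched as follows: each ZipEntry obtained from entries( ) is passed to getInputStream( ) to read its contents. The class ZipFileRead, its methods, and the entry names are hypothetical, and the generics and try-with-resources used here come from Java versions later than 1.1.

```java
import java.io.*;
import java.nio.charset.StandardCharsets;
import java.util.*;
import java.util.zip.*;

// Hypothetical helper class, for illustration only.
class ZipFileRead {
  // Writes each map entry as one file in the archive.
  static void writeZip(File zip, Map<String, String> files)
      throws IOException {
    try (ZipOutputStream out =
           new ZipOutputStream(new FileOutputStream(zip))) {
      for (Map.Entry<String, String> f : files.entrySet()) {
        out.putNextEntry(new ZipEntry(f.getKey()));
        out.write(f.getValue().getBytes(StandardCharsets.UTF_8));
        out.closeEntry();
      }
    }
  }

  // Reads every entry back through ZipFile.getInputStream().
  static Map<String, String> readAll(File zip)
      throws IOException {
    Map<String, String> result = new LinkedHashMap<>();
    try (ZipFile zf = new ZipFile(zip)) {
      Enumeration<? extends ZipEntry> e = zf.entries();
      while (e.hasMoreElements()) {
        ZipEntry ze = e.nextElement();
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (InputStream in = zf.getInputStream(ze)) {
          byte[] buf = new byte[4096];
          int n;
          while ((n = in.read(buf)) != -1)
            bytes.write(buf, 0, n);
        }
        result.put(ze.getName(),
          new String(bytes.toByteArray(), StandardCharsets.UTF_8));
      }
    }
    return result;
  }

  public static void main(String[] args) throws IOException {
    File zip = File.createTempFile("demo", ".zip");
    zip.deleteOnExit();
    Map<String, String> files = new LinkedHashMap<>();
    files.put("a.txt", "first file");
    files.put("dir/b.txt", "second file");
    writeZip(zip, files);
    System.out.println(readAll(zip));
  }
}
```

Unlike ZipInputStream, ZipFile needs a real file on disk, which is why the sketch writes to a temporary file.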

In order to read the checksum you must somehow have access to the associated Checksum object. Here, a handle to the CheckedOutputStream and CheckedInputStream objects is retained, but you could also just hold onto a handle to the Checksum object.
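The second option, holding onto the Checksum object itself, might look like the sketch below. The Adler32 instance is created first, handed to the CheckedOutputStream, and queried directly once the stream is closed. The class and method names are hypothetical.

```java
import java.io.*;
import java.nio.charset.StandardCharsets;
import java.util.zip.*;

// Hypothetical helper class, for illustration only.
class SharedChecksum {
  // Writes data to sink and returns the Adler-32 of the bytes
  // written, read from the Checksum object rather than the stream.
  static long writeAndSum(byte[] data, OutputStream sink)
      throws IOException {
    Adler32 sum = new Adler32();
    try (CheckedOutputStream out =
           new CheckedOutputStream(sink, sum)) {
      out.write(data);
    }
    // The stream is closed, but sum still holds the result
    return sum.getValue();
  }

  public static void main(String[] args) throws IOException {
    byte[] data =
      "some bytes to check".getBytes(StandardCharsets.US_ASCII);
    long value = writeAndSum(data, new ByteArrayOutputStream());
    System.out.println("Checksum: " + value);
  }
}
```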

A baffling method in Zip streams is setComment( ). As shown above, you can set a comment when you’re writing a file, but there’s no way to recover the comment in the ZipInputStream. Comments appear to be supported fully on an entry-by-entry basis only via ZipEntry.
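A small sketch of entry-level comments follows: the comment is attached to a ZipEntry before it is written, then read back through a ZipFile. It also calls ZipFile.getComment( ) for the archive-level comment, but that method was added in a later Java release (Java 7) and is not part of the Java 1.1 library described here. The class ZipComments and the file and comment strings are hypothetical.

```java
import java.io.*;
import java.nio.charset.StandardCharsets;
import java.util.zip.*;

// Hypothetical helper class, for illustration only.
class ZipComments {
  // Writes one entry with both an archive and an entry comment.
  static void write(File zip) throws IOException {
    try (ZipOutputStream out =
           new ZipOutputStream(new FileOutputStream(zip))) {
      out.setComment("archive comment");
      ZipEntry entry = new ZipEntry("a.txt");
      entry.setComment("entry comment");
      out.putNextEntry(entry);
      out.write("hello".getBytes(StandardCharsets.UTF_8));
      out.closeEntry();
    }
  }

  // Entry comment, as recorded for the named entry.
  static String entryComment(File zip, String name)
      throws IOException {
    try (ZipFile zf = new ZipFile(zip)) {
      return zf.getEntry(name).getComment();
    }
  }

  // Archive comment; ZipFile.getComment() needs Java 7 or later.
  static String archiveComment(File zip) throws IOException {
    try (ZipFile zf = new ZipFile(zip)) {
      return zf.getComment();
    }
  }

  public static void main(String[] args) throws IOException {
    File zip = File.createTempFile("comments", ".zip");
    zip.deleteOnExit();
    write(zip);
    System.out.println(entryComment(zip, "a.txt"));
    System.out.println(archiveComment(zip));
  }
}
```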

The Java archive (jar) utility

JAR files are particularly helpful when you deal with the Internet. Before JAR files, your Web browser would have to make repeated requests of a Web server in order to download all of the files that make up an applet. In addition, each of these files was uncompressed. By combining all of the files for a particular applet into a single JAR file, only one server request is necessary and the transfer is faster because of compression. And each entry in a JAR file can be digitally signed for security (refer to the Java documentation for details).

The jar utility that comes with Sun’s JDK automatically compresses the files of your choice. You invoke it on the command line:

jar [options] destination [manifest] inputfile(s)

The options are simply a collection of letters (no hyphen or any other indicator is necessary). These are:

c: Creates a new or empty archive.
t: Lists the table of contents.
x: Extracts all files.
x file: Extracts the named file.
f: Says: “I’m going to give you the name of the file.” If you don’t use this, jar assumes that its input will come from standard input, or, if it is creating a file, its output will go to standard output.
m: Says that a user-created manifest file is named among the arguments.
v: Generates verbose output describing what jar is doing.
0 (zero): Only stores the files; doesn’t compress the files (use to create a JAR file that you can put in your classpath).
M: Doesn’t automatically create a manifest file.

If a subdirectory is included in the files to be put into the JAR file, that subdirectory is automatically added, including all of its subdirectories, etc. Path information is also preserved.

Here are some typical ways to invoke jar:

jar cf myJarFile.jar *.class

This creates a JAR file called myJarFile.jar that contains all of the class files in the current directory, along with an automatically generated manifest file.

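jar cfm myJarFile.jar myManifestFile.mf *.class

Like the previous example, but adding a user-created manifest file called myManifestFile.mf. The f and m letters appear in the same order as the archive and manifest file names that follow them.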

jar tf myJarFile.jar

Produces a table of contents of the files in myJarFile.jar.

jar tvf myJarFile.jar

Adds the “verbose” flag to give more detailed information about the files in myJarFile.jar.

jar cvf myApp.jar audio classes image

Assuming audio, classes, and image are subdirectories, this combines all of the subdirectories into the file myApp.jar. The “verbose” flag is also included to give extra feedback while the jar program is working.
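A listing similar to what jar tf prints can also be produced from a program. The sketch below relies on the java.util.jar package, which the text above does not cover and which arrived after Java 1.1, so treat it as an aside. The class JarContents, its methods, and the entry names are hypothetical.

```java
import java.io.*;
import java.nio.charset.StandardCharsets;
import java.util.*;
import java.util.jar.*;

// Hypothetical helper class, for illustration only.
class JarContents {
  // Creates a JAR holding a minimal manifest plus the named entries.
  static void writeJar(File jar, String... names)
      throws IOException {
    Manifest mf = new Manifest();
    mf.getMainAttributes().put(
      Attributes.Name.MANIFEST_VERSION, "1.0");
    try (JarOutputStream out =
           new JarOutputStream(new FileOutputStream(jar), mf)) {
      for (String name : names) {
        out.putNextEntry(new JarEntry(name));
        // Placeholder bytes, not real class files
        out.write(("contents of " + name)
          .getBytes(StandardCharsets.UTF_8));
        out.closeEntry();
      }
    }
  }

  // Returns the entry names, much like "jar tf" lists them.
  static List<String> list(File jar) throws IOException {
    List<String> names = new ArrayList<>();
    try (JarFile jf = new JarFile(jar)) {
      Enumeration<JarEntry> e = jf.entries();
      while (e.hasMoreElements())
        names.add(e.nextElement().getName());
    }
    return names;
  }

  public static void main(String[] args) throws IOException {
    File jar = File.createTempFile("demo", ".jar");
    jar.deleteOnExit();
    writeJar(jar, "A.class", "pkg/B.class");
    for (String name : list(jar))
      System.out.println(name);
  }
}
```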

If you create a JAR file using the 0 (zero) option, that file can be placed in your CLASSPATH:

CLASSPATH="lib1.jar;lib2.jar;"

Then Java can search lib1.jar and lib2.jar for class files.

The jar tool isn’t as useful as a zip utility. For example, you can’t add or update files to an existing JAR file; you can create JAR files only from scratch. Also, you can’t move files into a JAR file, erasing them as they are moved. However, a JAR file created on one platform will be transparently readable by the jar tool on any other platform (a problem that sometimes plagues zip utilities).
