Compound File Stream and Storage Manipulation


Introduction

Did you ever want to group a bunch of files together into a
single file for run-time read/write access but didn’t want to
bother with a file format structure for accessing the files?
There are many uses for this technology, such as revision/undo
storage, dynamic access to resources, incremental updates, WAD
files, etc. Microsoft’s answer to implementing these types of
solutions is a technology known as Compound Files (CF). Note
that Compound Files are an implementation of the ActiveX
structured storage model (from the MSDN article "Containers:
Compound Files").

CFs may be viewed as a file system within a file. They allow
you to create files (known as streams) and directories (known as
sub-storages) within a single file. Compound files offer some
advantages of a database (such as transactions with rollback)
together with general file system functionality. Streams within
the CF may be read and written incrementally, just as files are
within a normal file system.
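
To make the "file system within a file" idea concrete, here is a
minimal in-memory sketch in standard C++. It is only a conceptual
model, not the on-disk format and not the IStorage/IStream API;
the names Stream, Storage, subStorage() and totalBytes() are
invented for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <string>

// Conceptual model only: a stream is a named run of bytes (a "file").
struct Stream {
    std::string bytes;

    // Append data incrementally, as with a write to a normal file.
    void write(const std::string& data) { bytes += data; }
};

// A storage holds streams and nested storages (a "directory").
struct Storage {
    std::map<std::string, Stream> streams;
    std::map<std::string, std::unique_ptr<Storage>> subStorages;

    // Return the named sub-storage, creating it on first use.
    Storage& subStorage(const std::string& name) {
        assert(!name.empty());
        std::unique_ptr<Storage>& slot = subStorages[name];
        if (!slot) slot = std::make_unique<Storage>();
        return *slot;
    }

    // Total payload size of every stream at this level and below.
    std::size_t totalBytes() const {
        std::size_t total = 0;
        for (const auto& entry : streams) total += entry.second.bytes.size();
        for (const auto& entry : subStorages) total += entry.second->totalBytes();
        return total;
    }
};
```

With this model, root.subStorage("images").streams["a.bmp"].write(data)
plays the role of writing a file inside a directory, all held by
one root object.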

The Problem

Application code that accesses a CF usually requires quite a bit
of manipulation of the IStorage and IStream interfaces, which
many programmers find daunting. In addition, the interfaces must
be acquired and released at the right times, which causes
problems if not handled correctly.

Solutions

What is needed is another model for accessing streams and
sub-storages within a CF. A very simple model that every
application programmer is familiar with is the concept of files.
In MFC, files are managed by the CFile class. Using this model,
we can extend that class to accommodate CFs.

This project presents the following solutions.

  • An MFC CFile derived class (CStgFile) that allows simple
    CFile-style access to a stream within a CF.
  • Methods for the creation of CFs (CreateStg()) and the
    creation of single-level sub-storages (MkStg()).
  • An OLE Automation (COM) class ("gstg.core") for
    manipulating CFs from scripting languages (and/or a
    CDispatch derived MFC class).
  • JavaScript examples of copying files in/out of a CF.

The project also includes code to provide the following
external file-system support.

  • An MFC class (CScanDir) that is used to scan a
    file-system directory for a file specification and return
    the results in a string array. Support for overriding the
    default behavior is also provided.
  • An OLE Automation (COM) class ("gstg.dir") for
    accessing directory information from scripting languages
    (and/or a CDispatch derived MFC class).
  • JavaScript examples of scanning a directory for files and
    sub-directories.
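
The interface of CScanDir is not shown in this article, so the
following is only a sketch of the same idea in standard C++17:
scan one directory for files matching a specification and return
the names in a string array. The names matchesSpec() and scanDir()
are hypothetical, and unlike the Windows file system the matching
here is case-sensitive.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <filesystem>
#include <string>
#include <vector>

namespace fs = std::filesystem;

// '*' matches any run of characters (including none), '?' matches exactly one.
bool matchesSpec(const std::string& pattern, const std::string& name) {
    const std::size_t npos = std::string::npos;
    std::size_t p = 0, n = 0, starP = npos, starN = 0;
    while (n < name.size()) {
        if (p < pattern.size() && pattern[p] == '*') {
            starP = p++;          // remember the star and try matching nothing
            starN = n;
        } else if (p < pattern.size() &&
                   (pattern[p] == '?' || pattern[p] == name[n])) {
            ++p;
            ++n;
        } else if (starP != npos) {
            p = starP + 1;        // backtrack: let the last star absorb one more
            n = ++starN;
        } else {
            return false;
        }
    }
    while (p < pattern.size() && pattern[p] == '*') ++p;
    return p == pattern.size();
}

// Return the sorted names of regular files in dir that match spec.
std::vector<std::string> scanDir(const fs::path& dir, const std::string& spec) {
    assert(!spec.empty());
    std::vector<std::string> names;
    for (const fs::directory_entry& entry : fs::directory_iterator(dir)) {
        const std::string name = entry.path().filename().string();
        if (entry.is_regular_file() && matchesSpec(spec, name)) {
            names.push_back(name);
        }
    }
    std::sort(names.begin(), names.end());
    return names;
}
```

A call such as scanDir("bmps", "*.bmp") would then yield the list
of names to copy into a sub-storage, one stream per file.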

And finally, an example to demonstrate the functionality:

  • Copy a sub-directory of files from a file system into a
    sub-storage in a CF.

Development Methodology

  • The core code (CStgFile, CScanDir) is first developed as
    reusable MFC classes.
  • They are then "wrapped" with an OLE Automation
    (COM) layer that may be used by OLE Automation scripting
    engines (VB, VBA, WSH, etc.) and/or other MFC applications
    via a CDispatch derived interface (using the TLB).
  • Finally, JavaScript test scripts are developed for
    exercising the basic functionality of the code before
    integration into a more thorough test.

Limitations

To reduce the complexities of illustrating these concepts, the
following limitations were imposed.

  • CFs do not use TRANSACTED file semantics. All accesses
    are DIRECT.
  • The CF implementation is limited to one level of
    sub-storages.
  • Very little error (return code) checking is performed in
    the OLE Automation wrappers.
  • Methods are not "friendly" to errant programming
    practices; they do little to guard against misuse.
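
The difference between DIRECT and TRANSACTED access can be
illustrated with a toy model. This is a sketch under simplifying
assumptions, not the real IStorage API: the class ToyStorage and
its members are invented here, and a single string stands in for
the contents of the whole file.

```cpp
#include <cassert>
#include <string>

// Toy model of the two access modes; not the real structured storage API.
class ToyStorage {
public:
    explicit ToyStorage(bool transacted) : transacted_(transacted) {}

    // DIRECT: the change is persisted immediately.
    // TRANSACTED: the change is only staged until commit() is called.
    void write(const std::string& data) {
        working_ += data;
        if (!transacted_) committed_ = working_;
    }

    void commit() { committed_ = working_; }  // make staged changes permanent
    void revert() { working_ = committed_; }  // discard staged changes

    const std::string& persisted() const { return committed_; }

private:
    bool transacted_;
    std::string working_;    // what the application currently sees
    std::string committed_;  // what would survive closing the file
};

// Usage check: a reverted transacted write leaves nothing behind.
inline void demoRevert() {
    ToyStorage stg(true);
    stg.write("draft");
    stg.revert();
    assert(stg.persisted().empty());
}
```

In DIRECT mode, which is what this project uses, revert() has
nothing to discard because every write is persisted at once.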

Examples of Usage

MFC Example

To illustrate how simple it is to use the CStgFile MFC class
for copying an external file to a newly created CF, the following
MFC code may be used.



CFile    FileSrc( "tmp.tmp", CFile::modeRead );  // open the source file
CStgFile FileStg;                                // instantiate the CF wrapper

FileStg.CreateStg( "tmp.stg" );                  // create the storage (CF)
FileStg.Open( "tmp.tmp", CFile::modeCreate | CFile::modeWrite );  // create the stream

UINT cB = 0;                                     // bytes returned by each read
BYTE rgB[512*8];                                 // 4 KB copy buffer

// Copy all bytes to the stream; Read() returns 0 at end of file.
while( (cB = FileSrc.Read( rgB, sizeof(rgB) )) > 0 )
{
    FileStg.Write( rgB, cB );
}

FileStg.Close();     // close the stream
FileStg.CloseStg();  // close the CF file

Notice that in this example the single call to CreateStg()
redirects all access through a CF. If this call is omitted, the
object is accessed through the normal CFile methods. This may be
useful in debugging when you wish to access the streams as normal
files.
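
The copy loop in the MFC example does not depend on MFC itself.
As a sketch, the same chunked copy can be written against
standard C++ streams; copyInChunks() and copyThroughBuffer() are
hypothetical helper names, and in-memory string streams stand in
for the source file and the CF stream.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Copy src to dst in fixed-size chunks until end of input.
// Returns the total number of bytes copied.
std::size_t copyInChunks(std::istream& src, std::ostream& dst,
                         std::size_t chunkSize = 512 * 8) {
    assert(chunkSize > 0);
    std::vector<char> buffer(chunkSize);
    std::size_t total = 0;
    while (true) {
        src.read(buffer.data(), static_cast<std::streamsize>(buffer.size()));
        const std::streamsize got = src.gcount();  // may be a partial chunk
        if (got <= 0) break;                       // nothing left to copy
        dst.write(buffer.data(), got);
        total += static_cast<std::size_t>(got);
    }
    return total;
}

// Usage: round-trip a string through the chunked copy.
std::string copyThroughBuffer(const std::string& input, std::size_t chunkSize) {
    std::istringstream src(input);
    std::ostringstream dst;
    copyInChunks(src, dst, chunkSize);
    return dst.str();
}
```

As in the MFC version, the loop ends on the first read that
returns no bytes, so a final partial chunk is still written.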

JavaScript Example

To perform the same operation in JavaScript, the solution is
even simpler:

var objStg = WScript.CreateObject( "gstg.core" ); // create object
objStg.Create( "tmp.stg" );                       // create the CF
objStg.CopyTo( "tmp.tmp", "tmp.tmp" );            // copy the external file
objStg.Close();                                   // close the CF

Other Examples

Other script examples are in the scr, scr/stg and scr/dir
sub-directories. These include an example to copy a whole
directory of files from the file system into a sub-storage of a
CF (cp_bmps.js).

Conclusion

Working with Compound Files becomes much easier with the CStgFile
class for accessing streams and sub-storages. There are many
other uses and advantages of Compound Files that are beyond the
scope of this document. Please refer to MSDN for further reading
about OLE Compound Files.

Notes

  • In order to run the OLE Automation examples, you must
    register the DLL[s].
  • In order to run the JavaScript example, you must use the
    Windows Scripting Host CScript application. It is
    available for download from Microsoft and ships with
    Windows 98.
  • Use the DFView application from Microsoft to view CFs
    created with the CStgFile class.
  • These classes were developed with MS Visual C++ V5.0 and
    should be compatible with previous releases of the
    compiler.
  • There are some characters that are considered invalid for
    stream names (e.g. ‘!’).
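
A name check before creating a stream can catch such problems
early. The sketch below assumes the structured storage naming
rules as commonly documented (at most 31 characters, none of the
characters / \ : !, and not "." or ".."); please verify them
against MSDN before relying on them. Both function names are
hypothetical, and narrow strings are used for brevity although
the real API takes wide-character names.

```cpp
#include <cassert>
#include <string>

// Assumed rules (verify against the structured storage documentation):
// 1 to 31 characters, none of / \ : !, and not "." or "..".
bool isValidElementName(const std::string& name) {
    if (name.empty() || name.size() > 31) return false;
    if (name == "." || name == "..") return false;
    return name.find_first_of("/\\:!") == std::string::npos;
}

// Replace forbidden characters with '_' and truncate to the length limit.
std::string sanitizeElementName(std::string name) {
    for (char& c : name) {
        if (c == '/' || c == '\\' || c == ':' || c == '!') c = '_';
    }
    if (name.size() > 31) name.resize(31);
    if (name.empty() || name == "." || name == "..") name = "_";
    assert(isValidElementName(name));  // postcondition of this helper
    return name;
}
```

Note that sanitizing can map two different file names to the same
stream name, so a caller copying a whole directory would still
need to handle collisions.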

Other Uses

After understanding how CFs store streams of data, other uses
become apparent:

  • A Web-Site in a file.
  • Resources for localization.
  • BLOB type storage insertion/retrieval without the
    database overhead.
  • Archival of data with direct access.
  • Etc.

Future

The class for accessing streams within a CF should be extended
to support N levels of sub-storages. In addition, support for
selecting TRANSACTED access should be added.

Download demo project – 46 KB
