Compound File Stream and Storage Manipulation


Introduction

Have you ever wanted to group a bunch of files together into a single file for run-time read/write access, without the bother of designing a file format for accessing them? There are many uses for this technology, such as storing revision/undo history, dynamic access to resources, incremental updates, WAD files, etc. Microsoft's answer to these problems is a technology known as Compound Files (CF). Compound files are an implementation of the ActiveX structured storage model (from the MSDN article "Containers: Compound Files").

CFs may be viewed as a file system within a file. They allow you to create files (known as streams) and directories (known as sub-storages) within a single file. Compound files offer some of the advantages of a database (such as transactions with rollback) along with general file-system functionality. Streams within a CF can be read and written incrementally, just like files in a normal file system.
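To make the "file system within a file" idea concrete, here is a toy in-memory model in C++: a storage holds named streams and named sub-storages, and a stream can be written and read at any offset. It is a conceptual sketch only. The names Storage, Stream, write_at and read_at are hypothetical, and real compound files keep all of this in a sector-based on-disk format, not in memory.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Conceptual model only (hypothetical types, not the compound file format).
// A stream is a named, growable byte array.
struct Stream {
    std::vector<std::uint8_t> bytes;

    // Write `len` bytes at `offset`, growing the stream if necessary.
    void write_at(std::size_t offset, const std::uint8_t* data, std::size_t len) {
        if (offset + len > bytes.size()) bytes.resize(offset + len);
        std::copy(data, data + len, bytes.data() + offset);
    }

    // Read up to `len` bytes starting at `offset`; returns the count actually read.
    std::size_t read_at(std::size_t offset, std::uint8_t* out, std::size_t len) const {
        if (offset >= bytes.size()) return 0;
        std::size_t n = std::min(len, bytes.size() - offset);
        std::copy(bytes.data() + offset, bytes.data() + offset + n, out);
        return n;
    }
};

// A storage plays the role of a directory: named streams plus named sub-storages.
struct Storage {
    std::map<std::string, Stream> streams;
    std::map<std::string, std::unique_ptr<Storage>> storages;

    Stream& stream(const std::string& name) { return streams[name]; }  // open or create

    Storage& storage(const std::string& name) {                        // open or create
        auto& child = storages[name];
        if (!child) child = std::make_unique<Storage>();
        return *child;
    }
};
```

Overwriting one byte of an existing stream touches only that byte, which is the incremental access the paragraph describes.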

The Problem

Application programming against a CF usually requires quite a bit of manipulation of the IStorage and IStream interfaces, which many programmers find daunting. In addition, these interfaces must be acquired and released at the right time, and mishandling their lifetimes causes problems such as leaked or prematurely released interfaces.
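The lifetime problem is easiest to see in code. IStorage and IStream follow the COM reference-counting convention: every pointer you obtain must be Release()d exactly once, on every exit path. Below is a minimal sketch of a scoped holder that automates this. RefHolder is a hypothetical name; in real COM code ATL's CComPtr or the compiler's _com_ptr_t fill this role. Because it only needs AddRef() and Release(), the sketch can be exercised without Windows.

```cpp
#include <utility>

// Hypothetical RAII holder for any reference-counted object exposing
// AddRef()/Release(), the COM convention used by IStorage and IStream.
template <class T>
class RefHolder {
public:
    RefHolder() = default;

    // Adopts a pointer that already carries one reference (as returned by a Create call).
    explicit RefHolder(T* p) : p_(p) {}

    // Copying shares the object, so it takes an additional reference.
    RefHolder(const RefHolder& o) : p_(o.p_) { if (p_) p_->AddRef(); }

    // Moving transfers the reference without touching the count.
    RefHolder(RefHolder&& o) noexcept : p_(std::exchange(o.p_, nullptr)) {}

    // Copy-and-swap: the old pointer is released when `o` goes out of scope.
    RefHolder& operator=(RefHolder o) noexcept { std::swap(p_, o.p_); return *this; }

    // Released exactly once, on every exit path, including exceptions.
    ~RefHolder() { if (p_) p_->Release(); }

    T* get() const { return p_; }
    T* operator->() const { return p_; }

private:
    T* p_ = nullptr;
};
```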

Solutions

What is needed is another model for accessing streams and sub-storages within a CF, and a very simple one is already familiar to every application programmer: the file. In MFC, files are managed by the CFile class, and by extending this class we can accommodate CFs.

This project presents the following solutions.

  • An MFC CFile-derived class (CStgFile) that allows simple CFile-style access to a stream within a compound file.
  • Methods for creating CFs (CreateStg()) and single-level sub-storages (MkStg()).
  • An OLE Automation (COM) class ("gstg.core") for manipulating CFs from scripting languages (and/or via a CDispatch-derived MFC class).
  • JavaScript examples of copying files into and out of a CF.

The project also includes code that provides the following external file-system support:

  • An MFC class (CScanDir) that scans a file-system directory for a file specification and returns the results in a string array. The default behavior can be overridden.
  • An OLE Automation (COM) class ("gstg.dir") for accessing directory information from scripting languages (and/or via a CDispatch-derived MFC class).
  • JavaScript examples of scanning a directory for files and sub-directories.
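For readers who want the CScanDir idea outside MFC, the following C++17 sketch uses std::filesystem to list the regular files in one directory whose names match a '*'/'?' wildcard specification, returning them in a string vector. It is a portable analogue, not the author's CScanDir: scan_dir and wildcard_match are hypothetical names, and the match is case-sensitive, unlike Windows file-name matching.

```cpp
#include <algorithm>
#include <cstddef>
#include <filesystem>
#include <string>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical wildcard matcher: '*' matches any run of characters (including
// none), '?' matches exactly one. Case-sensitive. Backtracks to the last star.
bool wildcard_match(const std::string& pat, const std::string& s) {
    std::size_t p = 0, i = 0, star = std::string::npos, mark = 0;
    while (i < s.size()) {
        if (p < pat.size() && pat[p] == '*') {
            star = p++;          // remember the star; first try matching zero chars
            mark = i;
        } else if (p < pat.size() && (pat[p] == '?' || pat[p] == s[i])) {
            ++p;
            ++i;
        } else if (star != std::string::npos) {
            p = star + 1;        // backtrack: let the last star swallow one more char
            i = ++mark;
        } else {
            return false;
        }
    }
    while (p < pat.size() && pat[p] == '*') ++p;   // trailing stars match nothing
    return p == pat.size();
}

// Hypothetical portable analogue of a directory scan: names of regular files
// directly inside `dir` that match `spec`, sorted so the output order is stable.
std::vector<std::string> scan_dir(const fs::path& dir, const std::string& spec) {
    std::vector<std::string> names;
    for (const auto& entry : fs::directory_iterator(dir)) {
        if (!entry.is_regular_file()) continue;    // skip sub-directories, links to dirs, etc.
        const std::string name = entry.path().filename().string();
        if (wildcard_match(spec, name)) names.push_back(name);
    }
    std::sort(names.begin(), names.end());
    return names;
}
```

For example, scan_dir("bitmaps", "*.bmp") would return the .bmp file names in a bitmaps directory, which is the kind of list a script like cp_bmps.js needs before copying files into a sub-storage.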

And finally, an example to demonstrate the functionality:

  • Copy a sub-directory of files from a file system into a sub-storage in a CF.

Development Methodology

  • The core code (CStgFile, CScanDir) is first developed as reusable MFC classes.
  • They are then "wrapped" with an OLE Automation (COM) layer that can be used by OLE Automation scripting engines (VB, VBA, WSH, etc.) and/or by other MFC applications via a CDispatch-derived interface (using the TLB).
  • Finally, JavaScript test scripts are developed to exercise the basic functionality of the code before it is integrated into more thorough testing.

Limitations

To keep the illustration of these concepts simple, the following limitations were imposed:

  • CFs do not use TRANSACTED file semantics; all accesses are DIRECT.
  • The CF implementation is limited to one level of sub-storages.
  • Very little error checking (of return codes) is performed in the OLE Automation wrappers.
  • Methods are not "friendly" to errant programming practices.
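To make the one-level limitation concrete, the hypothetical helper below accepts either a bare stream name or a storage/stream pair and rejects anything nested deeper. The '/' separator is an assumption of this sketch, not necessarily the naming convention CStgFile uses; it is a convenient choice because '/' is not allowed inside compound-file element names.

```cpp
#include <optional>
#include <string>

// Where a stream lives in a compound file limited to one level of sub-storages.
struct StgPath {
    std::string storage;  // empty means the root storage
    std::string stream;
};

// Hypothetical helper: accepts "stream" or "storage/stream"; returns
// std::nullopt for empty components or for paths nested more than one level.
std::optional<StgPath> split_one_level(const std::string& path) {
    const auto slash = path.find('/');
    if (slash == std::string::npos) {
        if (path.empty()) return std::nullopt;
        return StgPath{"", path};                        // stream in the root storage
    }
    if (path.find('/', slash + 1) != std::string::npos)
        return std::nullopt;                             // two or more levels: not supported
    StgPath p{path.substr(0, slash), path.substr(slash + 1)};
    if (p.storage.empty() || p.stream.empty()) return std::nullopt;
    return p;
}
```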

Examples of Usage

MFC Example

To illustrate how simple it is to use the CStgFile MFC class, the following MFC code copies an external file into a newly created CF.



// Requires MFC (<afx.h>) and the demo project's CStgFile declaration.
CFile	FileSrc( "tmp.tmp", CFile::modeRead );	// open the source file
CStgFile	FileStg;	// instance the CF wrapper

FileStg.CreateStg( "tmp.stg" );	// create the compound file
FileStg.Open( "tmp.tmp", CFile::modeCreate | CFile::modeWrite );	// create a stream of the same name in it

UINT	cB = 0;
BYTE	rgB[512*8];	// 4 KB copy buffer
while( (cB = FileSrc.Read( rgB, sizeof(rgB) )) > 0 )	// Read() returns 0 at end of file
{
	FileStg.Write( rgB, cB );	// copy the chunk into the stream
}

FileSrc.Close();	// close the source file
FileStg.Close();	// close the stream
FileStg.CloseStg();	// close the CF file

Notice in this example that the single call to CreateStg() is what redirects access from the plain file system to the CF. If this call is omitted, the object uses the normal CFile methods. This can be useful when debugging, because the streams can then be inspected as ordinary files.
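Because CStgFile derives from CFile, the copy loop depends only on Read() and Write(), and it ends when Read() returns 0. The same chunked-copy pattern in portable C++, with standard streams standing in for CFile, is sketched below; copy_chunked and copy_file_chunked are hypothetical names chosen for this example, and the destination is an ordinary file rather than a stream inside a CF.

```cpp
#include <cstddef>
#include <fstream>
#include <istream>
#include <ostream>
#include <string>

// Copy all bytes from `in` to `out` in fixed-size chunks, mirroring the
// Read()/Write() loop in the MFC example. Returns the number of bytes copied.
std::size_t copy_chunked(std::istream& in, std::ostream& out) {
    char buf[512 * 8];                          // 4 KB buffer, as in the MFC example
    std::size_t total = 0;
    while (in) {
        in.read(buf, sizeof(buf));              // a short read sets eof/fail and ends the loop
        std::streamsize got = in.gcount();      // bytes actually read this pass
        if (got > 0) {
            out.write(buf, got);
            total += static_cast<std::size_t>(got);
        }
    }
    return total;
}

// Open the source for reading and create/truncate the destination, then copy.
// Both streams are closed by their destructors when the function returns.
bool copy_file_chunked(const std::string& src_path, const std::string& dst_path) {
    std::ifstream src(src_path, std::ios::binary);                     // like CFile::modeRead
    if (!src) return false;
    std::ofstream dst(dst_path, std::ios::binary | std::ios::trunc);   // like modeCreate | modeWrite
    if (!dst) return false;
    copy_chunked(src, dst);
    return static_cast<bool>(dst);              // false if any write failed
}
```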

JavaScript Example

To perform the same operation in JavaScript, the solution is even simpler:

var objStg = WScript.CreateObject( "gstg.core" ); // create object
objStg.Create( "tmp.stg" );                       // create the CF
objStg.CopyTo( "tmp.tmp", "tmp.tmp" );            // copy the external file
objStg.Close();                                   // close the CF

Other Examples

Other script examples are in the scr, scr/stg and scr/dir sub-directories. These include an example to copy a whole directory of files from the file system into a sub-storage of a CF (cp_bmps.js).

Conclusion

The CStgFile class makes Compound Files much easier to use for accessing streams and sub-storages. There are many other uses and advantages of Compound Files that are beyond the scope of this document; please refer to MSDN for further reading about OLE Compound Files.

Notes

  • To run the OLE Automation examples, you must first register the DLL(s) (for example, with regsvr32).
  • To run the JavaScript examples, you must use the Windows Scripting Host CScript application. It is available as a download from Microsoft and ships with Windows 98.
  • Use Microsoft's DFView application to view CFs created with the CStgFile class.
  • These classes were developed with Microsoft Visual C++ 5.0 and should also be compatible with earlier releases of the compiler.
  • Some characters are invalid in stream names (e.g. '!').
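Beyond '!', the compound file format restricts names in a few other ways. The sketch below checks the rules as described in Microsoft's [MS-CFB] specification and the IStorage::CreateStream reference: at most 31 UTF-16 code units (32 including the terminating null), none of '/', '\', ':' or '!', and no first character below 0x20, since 0x01 through 0x1F in that position are reserved for OLE. valid_stream_name is a hypothetical helper; check the current specification before relying on it.

```cpp
#include <string>

// Hypothetical check of compound-file element naming rules (per MS-CFB):
//  - 1 to 31 UTF-16 code units (32 including the terminating null)
//  - no '/', '\\', ':' or '!' anywhere, and no embedded null
//  - a first character in 0x01-0x1F is reserved for OLE's own streams,
//    such as "\x05SummaryInformation"
bool valid_stream_name(const std::u16string& name) {
    if (name.empty() || name.size() > 31) return false;
    if (name[0] < 0x20) return false;                    // reserved (or null) first character
    for (char16_t c : name) {
        if (c == 0 || c == u'/' || c == u'\\' || c == u':' || c == u'!') return false;
    }
    return true;
}
```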

Other Uses

After understanding how CFs store streams of data, other uses become apparent:

  • A web site in a file.
  • Resources for localization.
  • BLOB-type storage and retrieval without database overhead.
  • Archival of data with direct access.
  • Etc.

Future

The class for accessing streams within a CF should be extended to support N levels of sub-storages. In addition, support for selecting TRANSACTED mode should be included.

Download demo project - 46 KB



Comments

  • Compound File .NET

    Posted by Fransoa on 11/29/2012 12:25am

    Compound File .NET is compound file API for .NET Framework. The API allows you to create/read/parse/edit compound documents (OLE files). Supports all versions of .NET Framework, .NET Compact Framework and Mono. http://www.independentsoft.de

  • Nice....

    Posted by Legacy on 12/02/2003 12:00am

    Originally posted by: Leonhardt Wille

    Hi!
    Good job...
    how about managing date/time information for the Streams?
    Do you know anything about that?
    In the reference it says that the STATSTG structure holds the create/modify/access times for every stream, but it seems as if you have to set them yourself...
    Any idea how this can be done?

    regards
    Leo
