Designing File Formats

Suppose you are designing a program to produce a graph. The height, width, limits, and scales are to be defined in a graph configuration file. You are also assigned to write a user-friendly program that asks the operator questions and writes a configuration file so he or she does not have to learn the text editor. How should you design a configuration file?

One way would be as follows:

height (in inches)
width (in inches)
x lower limit
x upper limit
y lower limit
y upper limit
x-scale
y-scale

A typical plotter configuration file might look like:

10.0
7.0
0
100
30
300
0.5
2.0

This file does contain all the data, but in looking at it, you have trouble identifying what, for example, is the value of the Y lower limit. A solution is to comment the file so the configuration program writes out not only the data, but also a string describing the data.

10.0			height (in inches) 
7.0			width (in inches) 
0			x lower limit 
100			x upper limit 
30			y lower limit 
300			y upper limit 
0.5			x-scale  
2.0			y-scale 
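A program that reads this layout only needs the leading number on each line and can ignore the description that follows it. The sketch below shows one way to do that with std::sscanf; the helper name read_value is hypothetical and not part of the plot program described here.

```cpp
#include <cstdio>

// Hypothetical helper: extract the leading number from one line of the
// commented configuration file and ignore the description after it.
// Returns false when the line does not begin with a number.
bool read_value(const char *line, double &value)
{
    return std::sscanf(line, "%lf", &value) == 1;
}
```

Checking the return value matters: a line that does not start with a number is reported as a failure instead of silently producing a garbage value.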

Now the file is human-readable. But suppose a user runs the plot program and types in the wrong filename, and the program gets the lunch menu for today instead of a plot configuration file. The program is probably going to get very upset when it tries to construct a plot whose dimensions are "BLT on white" versus "Meatloaf and gravy."

The result is that you wind up with egg on your face. There should be some way of identifying a file as a plot configuration file. One method of doing this is to put the words "Plot Configuration File" on the first line of the file. Then, when someone tries to give your program the wrong file, the program will print an error message.

This takes care of the wrong file problem, but what happens when you are asked to enhance the program and add optional logarithmic plotting? You could simply add another line to the configuration file, but what about all those old files? It's not reasonable to ask everyone to throw them away. The best thing to do (from a user's point of view) is to accept old format files. You can make this easier by putting a version number in the file.

A typical file now looks like:

Plot Configuration File V1.0 
log			Logarithmic or normal plot 
10.0			height (in inches) 
7.0			width (in inches) 
0			x lower limit 
100			x upper limit 
30			y lower limit 
300			y upper limit 
0.5			x-scale  
2.0			y-scale 
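A reader for this format might check the first line before touching anything else. The following is a minimal sketch, assuming the header is exactly the text "Plot Configuration File V" followed by a major.minor version; the function name parse_header is hypothetical.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: verify the identification line and extract the
// version number. Returns false if the line is not a plot header.
bool parse_header(const std::string &line, int &major, int &minor)
{
    const std::string prefix = "Plot Configuration File V";

    // The line must begin with the identification text
    if (line.compare(0, prefix.size(), prefix) != 0)
        return false;

    // What follows the prefix must look like "1.0"
    return std::sscanf(line.c_str() + prefix.size(),
                       "%d.%d", &major, &minor) == 2;
}
```

With the version in hand, the program can decide whether to expect the newer lines (such as the logarithmic setting) or fall back to the old layout.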

In binary files, it is common practice to put an identification number in the first four bytes of the file. This is called the magic number. The magic number should be different for each type of file.

One method for choosing a magic number is to start with the first four letters of the program name (e.g., list) and convert them to hex: 0x6C697374. Then add 0x80808080 to the number: 0xECE9F3F4.

This generates a magic number that is probably unique. The high bit is set on each byte to make the byte non-ASCII and avoid confusion between ASCII and binary files. On most Unix and Linux systems, you'll find a file called /etc/magic, which contains information on other magic numbers used by various programs.
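The recipe above can be written as a small function. This is only a sketch: the name make_magic is hypothetical, and it assumes the four letters are 7-bit ASCII, so adding 0x80808080 simply sets the high bit of each byte.

```cpp
#include <cstdint>

// Hypothetical helper: build a magic number from the first four
// letters of a program name (assumed to be 7-bit ASCII).
std::uint32_t make_magic(const char *name)
{
    std::uint32_t magic = 0;

    // Pack the four characters, first letter in the highest byte
    for (int i = 0; i < 4; ++i)
        magic = (magic << 8) | static_cast<unsigned char>(name[i]);

    // Set the high bit of every byte so no byte is plain ASCII
    return magic + 0x80808080u;
}
```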

When reading and writing a binary file containing many different types of structures, it is easy to get lost. For example, you might read a name structure when you expected a size structure. This is usually not detected until later in the program. To locate this problem early, you can put magic numbers at the beginning of each structure. Then if the program reads the name structure and the magic number is not correct, it knows something is wrong.

Magic numbers for structures do not need to have the high bit set on each byte. Making the magic number just four ASCII characters makes it easy to pick out the beginning of structures in a file dump.
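As an illustration, a structure can carry its magic number as its first member, and the reading code can refuse any record whose magic number does not match. The structure layout, the constant NAME_MAGIC, and the two helper functions below are hypothetical, and the sketch uses std::fread and std::fwrite from the C I/O library for the binary transfer.

```cpp
#include <cstdint>
#include <cstdio>
#include <cstring>

// The ASCII letters "NAME" packed into 32 bits. How the four bytes
// appear in a file dump depends on the byte order of the machine.
const std::uint32_t NAME_MAGIC = 0x4E414D45;

struct name_record {
    std::uint32_t magic;    // Must be NAME_MAGIC
    char name[32];          // Null-terminated name
};

// Write one name record, magic number first
bool write_name(std::FILE *out_file, const char *name)
{
    name_record rec;
    rec.magic = NAME_MAGIC;
    std::strncpy(rec.name, name, sizeof(rec.name) - 1);
    rec.name[sizeof(rec.name) - 1] = '\0';
    return std::fwrite(&rec, sizeof(rec), 1, out_file) == 1;
}

// Read one name record. Fails on a short read or a wrong magic number.
bool read_name(std::FILE *in_file, name_record &rec)
{
    if (std::fread(&rec, sizeof(rec), 1, in_file) != 1)
        return false;
    return rec.magic == NAME_MAGIC;
}
```

If the file position has drifted and the program is actually looking at some other structure, read_name reports the failure at once instead of handing back garbage.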

C-Style I/O Routines

C++ allows you to use the C I/O library in C++ programs. Many times this occurs because someone took a C program, translated it to C++, and didn't want to bother translating the I/O calls. In some cases, the old C library is better and easier to use than the new C++ library. For example, C string-conversion routines such as std::sscanf and std::sprintf use a far more compact formatting specification system than their C++ counterparts. (Note that it is a matter of taste whether or not compact is better.)

The declarations for the structures and functions used by the C I/O functions are stored in the standard include file <cstdio>.

The declaration for a file variable is:

std::FILE *file_variable;      /* Comment */ 

For example:

#include <cstdio> 
std::FILE *in_file;  /* File containing the input data */ 

Before a file can be used, it must be opened using the function std::fopen. std::fopen returns a pointer to the file structure for the file. The format for std::fopen is:

file_variable = std::fopen(name, mode); 

file_variable
    A file variable.

name
    Actual name of the file ("data.txt", "temp.dat", etc.).

mode
    Indicates whether the file is to be read or written. Mode is "w" for writing and "r" for reading.

The function std::fclose closes the file. The format of std::fclose is:

status = std::fclose(file_variable); 

The variable status will be zero if the std::fclose was successful or nonzero for an error.
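Putting the two calls together, a minimal sketch of opening a file, checking for failure, and closing it again might look like this. The function name can_open is hypothetical; it relies on std::fopen returning NULL when the file cannot be opened.

```cpp
#include <cstdio>

// Hypothetical helper: report whether a file can be opened for reading.
bool can_open(const char *name)
{
    std::FILE *in_file = std::fopen(name, "r");

    if (in_file == NULL)
        return false;       // The open failed

    // std::fclose returns zero on success
    return std::fclose(in_file) == 0;
}
```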

C provides three preopened files. These are listed in Table 16-8.

Table 16-8: Standard files

File      Description
stdin     Standard input (open for reading). Equivalent to C++'s cin.
stdout    Standard output (open for writing). Equivalent to C++'s cout.
stderr    Standard error (open for writing). Equivalent to C++'s cerr.

(There is no C file equivalent to C++'s clog.)

The function std::fgetc reads a single character from a file. If there is no more data in the file, the function returns the constant EOF (EOF is defined in cstdio). Note that std::fgetc returns an integer, not a character. This is necessary because the EOF flag must be a noncharacter value.

Example 16-6 counts the number of characters in the file input.txt.

Example 16-6: copy/copy.cpp

#include <cstdio>
#include <cstdlib>      /* ANSI Standard C file */
#include <iostream>

const char FILE_NAME[] = "input.txt";   // Name of the input file

int main(  )
{
    int  count = 0;  // number of characters seen 
    std::FILE *in_file;   // input file 
    int ch;          // character or EOF flag from input 

    in_file = std::fopen(FILE_NAME, "rb");
    if (in_file == NULL) {
        std::cerr << "Can not open " << FILE_NAME << '\n';
        std::exit(8);
    }

    while (true) {
        ch = std::fgetc(in_file);
        if (ch == EOF)
            break;
        ++count;
    }
    std::cout << "Number of characters in " << FILE_NAME << 
        " is " << count << '\n';

    std::fclose(in_file);
    return (0);
}

A similar function, std::fputc, exists for writing a single character. Its format is:

std::fputc(character,  file); 
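Combining std::fgetc and std::fputc gives a character-by-character copy. The sketch below assumes both files are already open; the function name copy_stream and its return convention (the number of characters copied, or -1 on a write error) are hypothetical.

```cpp
#include <cstdio>

// Hypothetical helper: copy in_file to out_file one character at a
// time. Returns the number of characters copied, or -1 on write error.
long copy_stream(std::FILE *in_file, std::FILE *out_file)
{
    long count = 0;     // Characters copied so far
    int ch;             // int, not char, so EOF can be told from data

    while ((ch = std::fgetc(in_file)) != EOF) {
        if (std::fputc(ch, out_file) == EOF)
            return -1;
        ++count;
    }
    return count;
}
```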

The functions std::fgets and std::fputs work on one line at a time. The format of the std::fgets call is:

line_ptr = std::fgets(line, size, file); 

line_ptr
    Equal to line if the read was successful, or NULL if EOF or an error is detected.

line
    A character array where the function places the line.

size
    The size of the character array. std::fgets reads until it gets a line (complete with ending \n) or it reads size - 1 characters. It then ends the string with a null (\0).

For example:

        char    line[100]; 
        . . . 
        std::fgets(line, sizeof(line), in_file); 

std::fputs is similar to std::fgets except that it writes a line instead of reading one. The format of the std::fputs function is:

status = std::fputs(line, file); 

The parameters to std::fputs are similar to the ones for std::fgets, but the return value is not: std::fputs returns a nonnegative integer on success and EOF on error, not a pointer. std::fputs needs no size because it writes characters until it reaches the null character ('\0') that ends the string. It does not add a newline of its own.
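For instance, the identification line of the plot configuration file could be written with std::fputs as sketched below. The function name write_header is hypothetical. Note that the '\n' has to be part of the string, and that failure is signaled by a return value of EOF.

```cpp
#include <cstdio>

// Hypothetical helper: write the identification line of a plot
// configuration file. Returns false if the write failed.
bool write_header(std::FILE *out_file)
{
    return std::fputs("Plot Configuration File V1.0\n", out_file) != EOF;
}
```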

TIP:   The C++ function getline reads and discards the end-of-line character ('\n'). The C std::fgets reads the entire line, including the end-of-line and stores it in the buffer. So the '\n' is put in the buffer when you use std::fgets. This can sometimes cause surprising results.
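One common way to avoid the surprise is to remove the trailing newline right after the read. The following is a minimal sketch; the helper names strip_newline and read_line are hypothetical.

```cpp
#include <cstdio>
#include <cstring>

// Hypothetical helper: cut the string at the first '\n', if any
void strip_newline(char *line)
{
    line[std::strcspn(line, "\n")] = '\0';
}

// Hypothetical helper: read one line without its end-of-line
// character. Returns false at end of file or on error.
bool read_line(std::FILE *in_file, char *line, int size)
{
    if (std::fgets(line, size, in_file) == NULL)
        return false;
    strip_newline(line);
    return true;
}
```

Without the strip, a comparison such as std::strcmp(line, "sam") fails for the input line "sam" because the buffer actually holds "sam\n".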
