Designing File Formats

Suppose you are designing a program to produce a graph. The height, width, limits, and scales are to be defined in a graph configuration file. You are also assigned to write a user-friendly program that asks the operator questions and writes a configuration file so he or she does not have to learn the text editor. How should you design a configuration file?

One way would be as follows:

height (in inches)
width (in inches)
x lower limit
x upper limit
y lower limit
y upper limit
x-scale
y-scale

A typical plotter configuration file might look like:

10.0 
7.0 
0 
100 
30 
300 
0.5 
2.0 

This file does contain all the data, but in looking at it, you have trouble identifying what, for example, is the value of the Y lower limit. A solution is to comment the file so the configuration program writes out not only the data, but also a string describing the data.

10.0			height (in inches) 
7.0			width (in inches) 
0			x lower limit 
100			x upper limit 
30			y lower limit 
300			y upper limit 
0.5			x-scale  
2.0			y-scale 

Now the file is human-readable. But suppose a user runs the plot program and types in the wrong filename, and the program gets the lunch menu for today instead of a plot configuration file. The program is probably going to get very upset when it tries to construct a plot whose dimensions are "BLT on white" versus "Meatloaf and gravy."

The result is that you wind up with egg on your face. There should be some way of identifying a file as a plot configuration file. One method of doing this is to put the words "Plot Configuration File" on the first line of the file. Then, when someone tries to give your program the wrong file, the program will print an error message.

This takes care of the wrong file problem, but what happens when you are asked to enhance the program and add optional logarithmic plotting? You could simply add another line to the configuration file, but what about all those old files? It's not reasonable to ask everyone to throw them away. The best thing to do (from a user's point of view) is to accept old format files. You can make this easier by putting a version number in the file.

A typical file now looks like:

Plot Configuration File V1.0 
log			Logarithmic or normal plot 
10.0			height (in inches) 
7.0			width (in inches) 
0			x lower limit 
100			x upper limit 
30			y lower limit 
300			y upper limit 
0.5			x-scale  
2.0			y-scale 

In binary files, it is common practice to put an identification number in the first four bytes of the file. This is called the magic number. The magic number should be different for each type of file.

One method for choosing a magic number is to start with the first four letters of the program name (e.g., list) and convert them to hex: 0x6c607374. Then add 0x80808080 to the number: 0xECE0F3F4.

This generates a magic number that is probably unique. The high bit is set on each byte to make the byte non-ASCII and avoid confusion between ASCII and binary files. On most Unix systems and Linux, you'll find a file called /etc/magic, which contains information on other magic numbers used by various programs.

When reading and writing a binary file containing many different types of structures, it is easy to get lost. For example, you might read a name structure when you expected a size structure. This is usually not detected until later in the program. To locate this problem early, you can put magic numbers at the beginning of each structure. Then if the program reads the name structure and the magic number is not correct, it knows something is wrong.

Magic numbers for structures do not need to have the high bit set on each byte. Making the magic number just four ASCII characters makes it easy to pick out the beginning of structures in a file dump.

C-Style I/O Routines

C++ allows you to use the C I/O library in C++ programs. Many times this occurs because someone took a C program, translated it to C++, and didn't want to bother translating the I/O calls. In some cases, the old C library is better and easier to use than the new C++ library. For example, C string-conversion routines such as std::sscanf and std::sprintf use a far more compact formatting specification system than their C++ counterparts. (Note that it is a matter of taste whether or not compact is better.)

The declarations for the structures and functions used by the C I/O functions are stored in the standard include file <cstdio>.

The declaration for a file variable is:

std::FILE *file_variable;      /* Comment */ 

For example:

#include <cstdio> 
 
std::FILE *in_file;  /* File containing the input data */ 

Before a file can be used, it must be opened using the function std::fopen. std::fopen returns a pointer to the file structure for the file. The format for std::fopen is:

file_variable = std::fopen(name, mode); 
file_variable
A file variable.

name
Actual name of the file ("data.txt", "temp.dat", etc.).

mode
Indicates whether the file is to be read or written. Mode is "w" for writing and "r" for reading.

The function std::fclose closes the file. The format of std::fclose is:

status = std::fclose(file_variable); 

The variable status will be zero if the std::fclose was successful or nonzero for an error.

C provides three preopened files. These are listed in Table 16-8.

Table 16-8: Standard files

File

Description

stdin

Standard input (open for reading). Equivalent to C++'s cin.

stdout

Standard output (open for writing). Equivalent to C++'s cout.

stderr

Standard error (open for writing). Equivalent to C++'s cerr.

 

(There is no C file equivalent to C++'s clog.)

The function std::fgetc reads a single character from a file. If there is no more data in the file, the function returns the constant EOF (EOF is defined in cstdio). Note that std::fgetc returns an integer, not a character. This is necessary because the EOF flag must be a noncharacter value.

Example 16-6 counts the number of characters in the file input.txt.

Example 16-6: copy/copy.cpp

#include <cstdio>
#include <cstdlib>      /* ANSI Standard C file */
#include <iostream>
 
const char FILE_NAME[] = "input.txt";   // Name of the input file
 
int main(  )
{
    int  count = 0;  // number of characters seen 
    std::FILE *in_file;   // input file 
 
    int ch;          // character or EOF flag from input 
 
    in_file = std::fopen(FILE_NAME, "rb");
    if (in_file == NULL) {
        std::cerr << "Can not open " << FILE_NAME << '\n';
        exit(8);
    }
 
    while (true) {
        ch = std::fgetc(in_file);
        if (ch == EOF)
            break;
        ++count;
    }
    std::cout << "Number of characters in " << FILE_NAME << 
        " is " << count << '\n';
 
    std::fclose(in_file);
    return (0);
}

A similar function, std::fputc, exists for writing a single character. Its format is:

std::fputc(character,  file); 

The functions std::fgets and std::fputs work on one line at a time. The format of the std::fgets call is:

line_ptr = std::fgets(line, size, file); 
line_ptr
Equal to line if the read was successful, or NULL if EOF or an error is detected.

line
A character array where the function places the line.

size
The size of the character array. std::fgets reads until it gets a line (complete with ending \n) or it reads size - 1 characters. It then ends the string with a null (\0).

For example:

        char    line[100]; 
        . . . 
        std::fgets(line, sizeof(line), in_file); 

std::fputs is similar to std::fgets except that it writes a line instead of reading one. The format of the std::fputs function is:

line_ptr = std::fputs(line, file); 

The parameters to std::fputs are similar to the ones for std::fgets. std::fputs needs no size because it gets the size of the line to write from the length of the line. (It keeps writing until it hits a null character, '\0').

TIP:   The C++ function getline reads and discards the end-of-line character ('\n'). The C std::fgets reads the entire line, including the end-of-line and stores it in the buffer. So the '\n' is put in the buffer when you use std::fgets. This can sometimes cause surprising results.

Page:  1   2   3   4   5   6   7   8   9   Next 



Comments

  • There are no comments yet. Be the first to comment!

Leave a Comment
  • Your email address will not be published. All fields are required.

Top White Papers and Webcasts

  • Live Event Date: October 29, 2014 @ 11:00 a.m. ET / 8:00 a.m. PT Are you interested in building a cognitive application using the power of IBM Watson? Need a platform that provides speed and ease for rapidly deploying this application? Join Chris Madison, Watson Solution Architect, as he walks through the process of building a Watson powered application on IBM Bluemix. Chris will talk about the new Watson Services just released on IBM bluemix, but more importantly he will do a step by step cognitive …

  • Live Event Date: October 23, 2014 @ 12:00 p.m. ET / 9:00 a.m. PT Despite the current "virtualize everything" mentality, there are advantages to utilizing physical hardware for certain tasks. This is especially true for backups. In many cases, it is clearly in an organization's best interest to make use of physical, purpose-built backup appliances rather than relying on virtual backup software (VBA - Virtual Backup Appliances). Join us for this eSeminar to learn why physical appliances are preferable to …

Most Popular Programming Stories

More for Developers

Latest Developer Headlines

RSS Feeds