Binary and ASCII Files

So far we have limited ourselves to ASCII files. "ASCII" stands for American Standard Code for Information Interchange. It is a set of 95 printable characters and 33 control codes. (A complete list of ASCII codes can be found in Appendix A.) ASCII files are human-readable. When you write a program, the prog.cc file is ASCII.

Terminals, keyboards, and printers deal with character data. When you want to write a number like 1234 to the screen, it must be converted to four characters ("1", "2", "3", and "4") and written.

Similarly, when you read a number from the keyboard, the data must be converted from characters to integers. This is done by the >> operator.

The ASCII character "0" has the value 48, "1" the value 49, and so on. When you want to convert a single digit from ASCII to integer, you must subtract 48:

int integer; 
char ch; 
 
ch = '5'; 
integer = ch - 48; 
std::cout << "Integer " << integer << '\n'; 

Rather than remember that the character "0" is 48, you can just subtract '0':

integer = ch - '0'; 

Computers work on binary data. When reading numbers from an ASCII file, the program must process the character data through a conversion routine like the integer conversion routine just defined. This is expensive. Binary files require no conversion. They also generally take up less space than ASCII files. The drawback is that they cannot be directly printed on a terminal or printer. (If you've ever seen a long printout coming out of the printer displaying pages with a few characters at the top that look like "!E#(@$%@^Aa^AA^^JHC%^X", you know what happens when you try to print a binary file.)

ASCII files are portable (for the most part). They can be moved from machine to machine with very little trouble. Binary files are almost certainly nonportable. Unless you are an expert programmer, it is almost impossible to make a portable binary file.

Which file type should you use? In most cases, ASCII is best. If you have small to medium amounts of data, the conversion time does not seriously affect the performance of your program. (Who cares if it takes 0.5 seconds to start up instead of 0.3?) ASCII files also make it easy to verify the data.

Only when you are using large amounts of data will the space and performance problems force you to use the binary format.

The End-of-Line Puzzle

Back in the dark ages BC (Before Computers), there existed a magical device called a Teletype Model 33. This amazing machine contained a shift register made out of a motor and a rotor as well as a keyboard ROM consisting solely of levers and springs.

The Teletype contained a keyboard, printer, and paper tape reader/punch. It could transmit messages over telephones using a modem at the blazing rate of 10 characters per second.

But Teletype had a problem. It took 0.2 seconds to move the printhead from the right side to the left. 0.2 seconds is two character times. If a second character came while the printhead was in the middle of a return, that character was lost.

The Teletype people solved this problem by making end-of-line two characters: <carriage return> to position the printhead at the left margin, and <line feed> to move the paper up one line. That way the <line feed> "printed" while the printhead was racing back to the left margin.

When the early computers came out, some designers realized that using two characters for end-of-line wasted storage (at this time storage was very expensive). Some picked <line feed> for their end-of-line, and some chose <carriage return>. Some of the die-hards stayed with the two-character sequence.

Unix uses <line feed> for end-of-line. The newline character \n is code 0xA (LF or <line feed>).

MS-DOS/Windows uses the two characters <carriage return><line feed>. Compiler designers had problems dealing with the old C programs that thought newline was just <line feed>. The solution was to add code to the I/O library that stripped out the <carriage return> characters from ASCII input files and changed <line feed> to <carriage return><line feed> on output.

In MS-DOS/Windows, whether or not a file is opened as ASCII or binary is important to note. The flag std::ios::binary is used to indicate a binary file:

// Open ASCII file for reading
ascii_file.open("name", std::ios::in);         
 
// Open binary file for reading 
binary_file.open("name", std::ios::in|std::ios::binary);       

Unix programmers don't have to worry about the C++ library automatically fixing their ASCII files. In Unix, a file is a file, and ASCII is no different from binary. In fact, you can write a half-ASCII/half-binary file if you want to.

The member function put can be used to write out a single byte of a binary file. The following program (shown in Example 16-4) writes numbers 0 to 127 to a file called test.out. It works just fine in Unix, creating a 128-byte long file; however, in MS-DOS/Windows, the file contains 129 bytes. Why?

Example 16-4: wbin/wbin.cpp

#include <iostream>
#include <fstream>
#include <cstdlib>
 
int main(  )
{
    int cur_char;   // current character to write 
    std::ofstream out_file; // output file 
 
    out_file.open("test.out", std::ios::out);
    if (out_file.bad(  )) {
        (std::cerr << "Can not open output file\n");
        exit (8);
    }
 
    for (cur_char = 0; cur_char < 128; ++cur_char) {
        out_file << cur_char;
    }
    return (0);
}

Hint: Here is a hex dump of the MS-DOS/Windows file:

000:0001 0203 0405 0607 0809 0d0a 0b0c 0d0e 
010:0f10 1112 1314 1516 1718 191a 1b1c 1d1e 
020:1f20 2122 2324 2526 2728 292a 2b2c 2d2e 
030:2f30 3132 3334 3536 3738 393a 3b3c 3d3e 
040:3f40 4142 4344 4546 4748 494a 4b4c 4d4e 
050:4f50 5152 5354 5556 5758 595a 5b5c 5d5e 
060:5f60 6162 6364 6566 6768 696a 6b6c 6d6e 
070:6f70 7172 7374 7576 7778 797a 7b7c 7d7e 
080:7f 

Page:  1   2   3   4   5   6   7   8   9   Next 



Comments

  • There are no comments yet. Be the first to comment!

Leave a Comment
  • Your email address will not be published. All fields are required.

Top White Papers and Webcasts

  • Live Event Date: December 11, 2014 @ 1:00 p.m. ET / 10:00 a.m. PT Market pressures to move more quickly and develop innovative applications are forcing organizations to rethink how they develop and release applications. The combination of public clouds and physical back-end infrastructures are a means to get applications out faster. However, these hybrid solutions complicate DevOps adoption, with application delivery pipelines that span across complex hybrid cloud and non-cloud environments. Check out this …

  • VMware vCloud® Government Service provided by Carpathia® is an enterprise-class hybrid cloud service that delivers the tried and tested VMware capabilities widely used by government organizations today, with the added security and compliance assurance of FedRAMP authorization. The hybrid cloud is becoming more and more prevalent – in fact, nearly three-fourths of large enterprises expect to have hybrid deployments by 2015, according to a recent Gartner analyst report. Learn about the benefits of …

Most Popular Programming Stories

More for Developers

RSS Feeds