PCM Audio and Wave Files

This article is intended to give an overview of how PCM audio is stored and processed on a PC. It also explains the structure of wave files and shows you how to work with them using code written in C++. But, the actual theory behind PCM, such as aliasing and so forth, is beyond the scope of this article. I may try to address those in a separate, future article.

In Brief, What Is PCM?

In the digital domain, PCM (Pulse Code Modulation) is the most straightforward mechanism to store audio. The analog audio is sampled in accordance with the Nyquist theorem and the individual samples are stored sequentially in binary format.

The wave file is the most common format for storing PCM data. But, before you dig into the internals of wave files, it helps to get some insight into the Interchange Format Files that form the basis for the wave file format.

Interchange Format Files (IFF)

IFF is a "meta" file format developed by a company named Electronic Arts. The full name of this format is Electronic Arts Interchange File Format 1985 (EA IFF 85). IFF lays down a top-level protocol on what the structure of IFF-compliant files should look like. It targets issues such as versioning, compatibility, portability, and so forth. It helps specify standardized file formats that aren't tied to a particular product.

For the purpose of explaining wave files, it suffices to say that the wave file format is based on the generic IFF format. If you are interested in digging deeper into IFF, here is a wonderful link:


The WAVE File Format

The WAVE File Format supports a variety of bit resolutions, sample rates, and channels of audio. I would say that this is the most popular format for storing PCM audio on the PC and has become synonymous with the term "raw digital audio."

The WAVE file format is based on RIFF (Resource Interchange File Format), Microsoft's version of the Electronic Arts Interchange File Format method for storing data. In keeping with the dictums of IFF, data in a Wave file is stored in many different "chunks." So, if a vendor wants to store additional information in a Wave file, he just adds info to new chunks instead of trying to tweak the base file format or come up with his own proprietary file format. That is the primary goal of the IFF.

As mentioned earlier, a WAVE file is a collection of a number of different types of chunks. But, there are three chunks that are required to be present in a valid wave file:

  1. 'RIFF', 'WAVE' chunk
  2. 'fmt ' chunk
  3. 'data' chunk

All other chunks are optional. The RIFF/WAVE chunk is the identifier chunk that tells us that this is a wave file. The fmt chunk contains important parameters describing the waveform, such as its sample rate, bits per sample, and so forth. The data chunk contains the actual waveform data.

An application that uses a WAVE file must be able to read the three required chunks although it can ignore the optional chunks. But, all applications that perform a copy operation on wave files should copy all of the chunks in the WAVE.

The RIFF chunk is always the first chunk. The fmt chunk must appear before the data chunk. Apart from this, there are no restrictions upon the order of the chunks within a WAVE file.
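To make the "read the required chunks, skip the rest" rule concrete, here is a minimal sketch of a chunk walker. It is not the code from the sample application; `ChunkInfo`, `readLE32`, and `listChunks` are hypothetical names, and the file is assumed to be already loaded into a byte buffer. It relies on two RIFF conventions: every chunk starts with a 4-byte ID and a 4-byte little-endian size, and chunks are padded to an even number of bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// One chunk found inside the RIFF container (hypothetical helper type).
struct ChunkInfo {
    std::string id;       // four-character chunk ID, e.g. "fmt " or "data"
    std::size_t offset;   // offset of the chunk's payload within the buffer
    std::uint32_t size;   // payload size in bytes, as stored in the file
};

// Read a 32-bit little-endian value regardless of the host byte order.
inline std::uint32_t readLE32(const std::uint8_t* p) {
    return static_cast<std::uint32_t>(p[0]) |
           (static_cast<std::uint32_t>(p[1]) << 8) |
           (static_cast<std::uint32_t>(p[2]) << 16) |
           (static_cast<std::uint32_t>(p[3]) << 24);
}

// List the chunks that follow the 12-byte 'RIFF' <size> 'WAVE' header.
// Returns an empty list if the buffer does not start with that header.
inline std::vector<ChunkInfo> listChunks(const std::vector<std::uint8_t>& file) {
    std::vector<ChunkInfo> chunks;
    if (file.size() < 12 ||
        std::memcmp(file.data(), "RIFF", 4) != 0 ||
        std::memcmp(file.data() + 8, "WAVE", 4) != 0) {
        return chunks;
    }
    std::size_t pos = 12;
    while (pos + 8 <= file.size()) {
        ChunkInfo c;
        c.id.assign(reinterpret_cast<const char*>(file.data() + pos), 4);
        c.size = readLE32(file.data() + pos + 4);
        c.offset = pos + 8;
        if (c.size > file.size() - c.offset) break;  // truncated chunk: stop
        chunks.push_back(c);
        // RIFF pads every chunk to an even number of bytes.
        pos = c.offset + c.size + (c.size & 1u);
    }
    return chunks;
}
```

A reader would call `listChunks`, pick out the entries whose `id` is "fmt " and "data", and ignore any other entry. A copy tool would instead write every entry back out unchanged.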

A minimal WAVE file consists of a single WAVE containing the three required chunks, in this order: the 12-byte RIFF header, the fmt chunk, and the data chunk.
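The layout of such a minimal file can also be spelled out in code. The following is a sketch under stated assumptions rather than the author's implementation: the helper names (`putLE16`, `putLE32`, `putTag`, `buildMinimalWave`) are made up for illustration, the audio is uncompressed PCM (format tag 1), and the fmt chunk carries the basic 16-byte payload.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Append a 16-bit value, least significant byte first.
inline void putLE16(std::vector<std::uint8_t>& out, std::uint16_t v) {
    out.push_back(static_cast<std::uint8_t>(v & 0xFF));
    out.push_back(static_cast<std::uint8_t>(v >> 8));
}

// Append a 32-bit value, least significant byte first.
inline void putLE32(std::vector<std::uint8_t>& out, std::uint32_t v) {
    for (int shift = 0; shift < 32; shift += 8)
        out.push_back(static_cast<std::uint8_t>((v >> shift) & 0xFF));
}

// Append a four-character chunk ID such as "RIFF" or "fmt ".
inline void putTag(std::vector<std::uint8_t>& out, const char* tag) {
    for (int i = 0; i < 4; ++i)
        out.push_back(static_cast<std::uint8_t>(tag[i]));
}

// Build a WAVE file image holding only the three required chunks.
inline std::vector<std::uint8_t> buildMinimalWave(std::uint16_t channels,
                                                  std::uint32_t sampleRate,
                                                  std::uint16_t bitsPerSample,
                                                  const std::vector<std::uint8_t>& pcm) {
    assert(channels > 0 && bitsPerSample > 0 && bitsPerSample % 8 == 0);
    const std::uint16_t blockAlign =
        static_cast<std::uint16_t>(channels * (bitsPerSample / 8));
    const std::uint32_t dataSize = static_cast<std::uint32_t>(pcm.size());
    const std::uint32_t pad = dataSize & 1u;  // chunks are padded to even length

    std::vector<std::uint8_t> out;
    putTag(out, "RIFF");
    putLE32(out, 4 + (8 + 16) + (8 + dataSize + pad));  // bytes after this field
    putTag(out, "WAVE");

    putTag(out, "fmt ");
    putLE32(out, 16);                       // size of the fmt payload
    putLE16(out, 1);                        // format tag 1 = uncompressed PCM
    putLE16(out, channels);
    putLE32(out, sampleRate);
    putLE32(out, sampleRate * blockAlign);  // average bytes per second
    putLE16(out, blockAlign);               // bytes per sample frame
    putLE16(out, bitsPerSample);

    putTag(out, "data");
    putLE32(out, dataSize);
    out.insert(out.end(), pcm.begin(), pcm.end());
    if (pad) out.push_back(0);
    return out;
}
```

With these assumptions the three headers add up to 44 bytes (12 + 24 + 8), so the waveform data of a minimal file starts at offset 44.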

While interpreting WAVE files, the unit of measurement used is a "sample." It is literally what it says: a sample represents the data captured during a single sampling cycle. So, if you are sampling at 44 kHz, you will have 44,000 samples for every second of audio. Each sample could be represented as 8 bits, 16 bits, 24 bits, or 32 bits. (There is no restriction on how many bits you use for a sample except that the storage for it has to be a multiple of 8 bits.) To some extent, the more bits in a sample, the better the quality of the audio.

One annoying detail to note is that 8-bit samples are represented as "unsigned" values whereas 16-bit and higher are represented by "signed" values. I don't know why this discrepancy exists; that's just the way it is.
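In practice, this discrepancy mostly matters when you convert between bit depths. Here is a small sketch of the conversion, assuming the usual convention that an 8-bit sample of 128 represents silence. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Convert an unsigned 8-bit WAVE sample (0..255, silence at 128)
// to a signed 16-bit sample (silence at 0).
inline std::int16_t pcm8ToPcm16(std::uint8_t s) {
    return static_cast<std::int16_t>((static_cast<int>(s) - 128) * 256);
}

// Convert a signed 16-bit sample back to unsigned 8-bit by dropping
// the low-order byte and re-centering around 128.
inline std::uint8_t pcm16ToPcm8(std::int16_t s) {
    return static_cast<std::uint8_t>((s / 256) + 128);
}
```

For example, `pcm8ToPcm16(128)` yields 0 and `pcm8ToPcm16(0)` yields -32768, while `pcm16ToPcm8(32767)` yields 255.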

The data bits for each sample should be left-justified and padded with 0s. For example, consider the case of a 10-bit sample (as samples must be multiples of 8, we need to represent it as 16 bits). The 10 bits should be left-justified so that they become bits 6 to 15 inclusive, and bits 0 to 5 should be set to zero.

As an example, a 10-bit sample with a value of 0100001111, left-justified as a 16-bit word, becomes 0100001111000000 (0x43C0). The most significant bit of the sample lands in bit 15, and bits 0 to 5 are zero.

Given the fact that the WAVE format uses Intel's little endian byte order, the least significant byte is stored first. The 16-bit word 0x43C0 is therefore written to the file as the byte 0xC0 followed by the byte 0x43.
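The same two steps, left-justifying and then splitting into little endian bytes, can be sketched as follows. The function names are hypothetical, and the sketch only covers samples that fit in a 16-bit container.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Left-justify a sample of `validBits` significant bits inside a
// 16-bit container; the unused low-order bits become zero.
inline std::uint16_t leftJustify16(std::uint16_t sample, unsigned validBits) {
    assert(validBits >= 1 && validBits <= 16);
    return static_cast<std::uint16_t>(sample << (16 - validBits));
}

// Split a 16-bit word into the two bytes in the order they appear
// in a WAVE file: least significant byte first.
inline std::array<std::uint8_t, 2> toLittleEndian(std::uint16_t word) {
    std::array<std::uint8_t, 2> bytes = {{
        static_cast<std::uint8_t>(word & 0xFF),
        static_cast<std::uint8_t>(word >> 8)
    }};
    return bytes;
}
```

Calling `leftJustify16(0x10F, 10)` on the 10-bit value 0100001111 gives 0x43C0, and `toLittleEndian(0x43C0)` gives the bytes 0xC0 and 0x43, in that order.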

The description so far is for mono audio, meaning that you have just one "channel." When you deal with stereo audio, 3D audio, and so forth, you are in effect dealing with multiple channels, meaning you have multiple samples describing the audio at any given moment in time. For example, for stereo audio, at any given point in time you need to know what the audio signal was for the left channel as well as the right channel. So, you will have to read and write two samples at a time.

Say you sample stereo audio at 44 kHz; then effectively, you will have 44,000 * 2 samples per second. If you are using 16 bits per sample, then given the duration of the audio, you can calculate the total size of the waveform data (the chunk headers come on top of this) as:

Size in bytes = sampling rate * number of channels * (bits per sample / 8) * duration in seconds

Number of samples per second = sampling rate * number of channels
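As a quick check of the two formulas, here they are as C++ functions. The names are made up for this sketch, and 64-bit arithmetic is used so that long recordings do not overflow.

```cpp
#include <cassert>
#include <cstdint>

// Size in bytes = sampling rate * channels * (bits per sample / 8) * seconds
inline std::uint64_t pcmDataBytes(std::uint32_t samplingRate,
                                  std::uint16_t channels,
                                  std::uint16_t bitsPerSample,
                                  std::uint32_t seconds) {
    assert(bitsPerSample % 8 == 0);
    return static_cast<std::uint64_t>(samplingRate) * channels *
           (bitsPerSample / 8u) * seconds;
}

// Number of samples per second = sampling rate * channels
inline std::uint64_t samplesPerSecond(std::uint32_t samplingRate,
                                      std::uint16_t channels) {
    return static_cast<std::uint64_t>(samplingRate) * channels;
}
```

One minute of 16-bit stereo audio at 44,100 Hz, for instance, works out to 44,100 * 2 * 2 * 60 = 10,584,000 bytes of waveform data.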

When you are dealing with such multi-channel sounds, single sample points from each channel are interleaved. Instead of storing all of the sample points for the left channel first, and then storing all of the sample points for the right channel next, you "interleave" the two channels' samples together. You would store the first sample of the left channel. Then, you would store the first sample of the right channel, and so on.


When a device needs to reproduce the stored stereo audio (or any multi-channel audio), it will process the left and right channels (or however many channels there are) simultaneously. The set of samples, one per channel, that are meant to be played at the same instant is called a sample frame.


Note: Although the example shows just stereo audio, it is the same principle for 3-channel, 5-channel, and so forth. So, for 3-channel audio, the layout would be left, right, center, left, right, center, and so on.
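Interleaving is easy to get wrong by one index, so here is a minimal sketch of both directions for 16-bit samples. The function names are hypothetical, and every channel is assumed to hold the same number of samples.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Merge per-channel sample lists into one interleaved stream:
// frame 0 of every channel, then frame 1 of every channel, and so on.
inline std::vector<std::int16_t> interleave(
        const std::vector<std::vector<std::int16_t>>& channels) {
    std::vector<std::int16_t> out;
    if (channels.empty()) return out;
    const std::size_t frames = channels[0].size();
    out.reserve(frames * channels.size());
    for (std::size_t f = 0; f < frames; ++f) {
        for (std::size_t c = 0; c < channels.size(); ++c) {
            assert(channels[c].size() == frames);
            out.push_back(channels[c][f]);
        }
    }
    return out;
}

// Split an interleaved stream back into one sample list per channel.
inline std::vector<std::vector<std::int16_t>> deinterleave(
        const std::vector<std::int16_t>& data, std::size_t channelCount) {
    assert(channelCount > 0 && data.size() % channelCount == 0);
    std::vector<std::vector<std::int16_t>> channels(channelCount);
    for (std::size_t i = 0; i < data.size(); ++i)
        channels[i % channelCount].push_back(data[i]);
    return channels;
}
```

For stereo input with left samples 1, 2, 3 and right samples 10, 20, 30, `interleave` produces 1, 10, 2, 20, 3, 30. Each adjacent left/right pair is one sample frame.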

So far, you have covered the very basics of PCM audio and how it is represented in a wave file. It is time to take a look at some code and see how you can use C++ to manage wave files. Start by laying out the structures for the different chunks of a wave file.

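The first chunk is the RIFF header chunk and can be represented as follows. The chunk IDs are always stored in the file as four single-byte ASCII characters, so declare them as char rather than TCHAR; a TCHAR becomes a wide character when the UNICODE directive is set on your compiler, and the structure would then no longer match the file layout.

   struct RIFF_HEADER
   {
      char  szRiffID[4];        // 'R','I','F','F'
      DWORD dwRiffSize;         // size of everything that follows this field
      char  szRiffFormat[4];    // 'W','A','V','E'
   };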

I guess it is self-explanatory. The second chunk is the fmt chunk. It describes the properties of the wave file, such as bits per sample, number of channels, and the like. You can use a helper structure to neatly represent the chunk as:

   struct WAVE_FORMAT
   {
      WORD  wFormatTag;         // 1 for uncompressed PCM
      WORD  wChannels;
      DWORD dwSamplesPerSec;
      DWORD dwAvgBytesPerSec;
      WORD  wBlockAlign;        // bytes per sample frame
      WORD  wBitsPerSample;
   };

   struct FMT_BLOCK
   {
      char  szFmtID[4];         // 'f','m','t',' ' please note the
                                // space character at the fourth location.
      DWORD dwFmtSize;
      WAVE_FORMAT wavFormat;
   };

Lastly, you describe the data block that contains the actual waveform data:

   struct DATA_BLOCK
   {
      char  szDataID[4];    // 'd','a','t','a' (single-byte ASCII)
      DWORD dwDataSize;     // number of bytes of waveform data that follow
   };
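Two fields of the fmt chunk, wBlockAlign and dwAvgBytesPerSec, are derived from the others for plain PCM. The sketch below shows that relationship. It uses fixed-width standard types in place of WORD and DWORD so that it compiles outside Windows, and `PcmFormat` and `makePcmFormat` are hypothetical names, not part of the sample application.

```cpp
#include <cassert>
#include <cstdint>

// Mirror of the 16-byte fmt payload, with std::uint16_t standing in
// for WORD and std::uint32_t standing in for DWORD.
struct PcmFormat {
    std::uint16_t wFormatTag;
    std::uint16_t wChannels;
    std::uint32_t dwSamplesPerSec;
    std::uint32_t dwAvgBytesPerSec;
    std::uint16_t wBlockAlign;
    std::uint16_t wBitsPerSample;
};

// The structure is only useful for file I/O if it has no padding.
static_assert(sizeof(PcmFormat) == 16, "fmt payload must be 16 bytes");

// Fill in a format description for uncompressed PCM audio.
inline PcmFormat makePcmFormat(std::uint16_t channels,
                               std::uint32_t samplesPerSec,
                               std::uint16_t bitsPerSample) {
    assert(channels > 0 && bitsPerSample > 0 && bitsPerSample % 8 == 0);
    PcmFormat f;
    f.wFormatTag = 1;  // 1 = uncompressed PCM
    f.wChannels = channels;
    f.dwSamplesPerSec = samplesPerSec;
    f.wBitsPerSample = bitsPerSample;
    // One sample frame holds one sample for every channel.
    f.wBlockAlign = static_cast<std::uint16_t>(channels * (bitsPerSample / 8));
    // Bytes consumed per second of playback.
    f.dwAvgBytesPerSec = samplesPerSec * f.wBlockAlign;
    return f;
}
```

For 16-bit stereo at 44,100 Hz this gives a block alignment of 4 bytes and 176,400 bytes per second.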

That's it. That's all you need to describe a wave file. Of course, there are a lot of optional chunks that you can have (they should be before the data block and after the fmt block). Just as an example, here is an optional chunk that you could use:

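Note chunk, used to store "comments" about the wave data:

   struct NOTE_CHUNK
   {
      char ID[4];          // 'n','o','t','e'
      long chunkSize;
      long dwIdentifier;
      char dwText[];       // variable-length text (a flexible array member,
                           // which is a compiler extension in C++)
   };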

Please note that I used "long" here. On Windows, a long and a DWORD are both 32 bits wide, so you can use either one for these fields.

I have put together a small application that will help you understand a lot about how to use C++ to work with wave files. Here is what the source code does:

  1. Takes a wave file and reverses it (stores the reversed wave in the destination file).
  2. Shows how you can use memory mapped files to handle wave files efficiently. Wave files tend to become huge beasts; using a memory-mapped file could turn out to be a life saver.
  3. Shows how you can do duplex operation (read and write) at the same time using just one wave handling class.
  4. Shows how you can use "progressive" writes, so you can terminate the app halfway through and you will still have a "valid" wave file. You can use this feature to immediately write a voice recording app almost effortlessly. Give it a try.
  5. And so on. The sample should get you started in building lots of fun wave applications.

What the source code does not try to do:

  1. It does not try to teach coding.
  2. It leaves a lot of room for optimizations. The code just shows you one way of working with wave files to reverse them; it will take a LONG time to reverse a big wave file. A better approach would be to use an intermediate buffer to swap chunks of data in memory, but I leave that as an exercise for you.
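As a starting point for that exercise, here is one possible sketch of an in-memory reversal. It is not the approach used in the sample application. It assumes the waveform data (the payload of the data chunk) has already been read into a buffer, and it swaps whole sample frames so that the channels of a multi-channel file stay in order. `reverseFrames` is a hypothetical name, and `blockAlign` is the number of bytes per sample frame (wBlockAlign in the fmt chunk).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Reverse PCM data in place, one sample frame at a time. The bytes
// inside each frame keep their order, so byte order and channel
// order are preserved; only the playback order of frames changes.
inline void reverseFrames(std::vector<std::uint8_t>& pcm, std::size_t blockAlign) {
    assert(blockAlign > 0 && pcm.size() % blockAlign == 0);
    if (pcm.empty()) return;
    std::uint8_t* base = pcm.data();
    std::size_t lo = 0;                        // first frame not yet swapped
    std::size_t hi = pcm.size() - blockAlign;  // last frame not yet swapped
    while (lo < hi) {
        std::swap_ranges(base + lo, base + lo + blockAlign, base + hi);
        lo += blockAlign;
        hi -= blockAlign;
    }
}
```

For a file too large to hold in memory at once, the same idea could be applied to one buffer-sized block at a time, writing the blocks to the destination in reverse order.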

This article was originally published on January 18th, 2005

About the Author

Vinayak Raghuvamshi

I am known as the Carpal Tunnel Coder - You probably guessed a lot about me already :-)

