This article covers the following topics:
- A short explanation of sound and sound formats.
- How to read and write a wave file.
- How to play short sounds with DirectSound buffers (loading the entire sound file at once).
- How to record and play very long sounds with DirectSound, using a split buffer (streaming).
- How to display the wave data (oscilloscope).
- How to convert from one format to another (like 8-bit to 16-bit, 11000 Hz to 8000 Hz, stereo to mono, and so forth).
- How to split a stereo sound into two mono sounds.
- How to mix sounds and change the volume digitally.
What Is Sound?
In computers, sound is digital: it is represented by a long array of numbers, each one a measurement (a sample) of the sound wave at a moment in time. (That is not the official definition, by the way.) Therefore, if you want to modify the sound (its volume, for example), you have to change those numbers in one way or another; I'll explain how to do that later.
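As a minimal illustration that "changing the sound" means "changing the numbers" (a Python sketch, not the Visual Basic technique this article develops later), the hypothetical helper below scales every 16-bit sample by a factor. The rounding and the clamping to the signed 16-bit range are assumptions of this sketch, added so that loud samples do not wrap around.

```python
def scale_volume(samples, factor):
    """Return a new list of 16-bit samples multiplied by `factor`.

    Each result is rounded and clamped to the signed 16-bit range
    (-32768..32767) so that amplified samples cannot overflow.
    """
    out = []
    for s in samples:
        v = int(round(s * factor))
        out.append(max(-32768, min(32767, v)))
    return out


# A tiny "sound": five 16-bit samples, played at half volume.
print(scale_volume([0, 1000, -2000, 30000, -32768], 0.5))
# [0, 500, -1000, 15000, -16384]
```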
Sound Format: What Is ... ?
- Bit Rate: In a wave file, the bit rate (more precisely, the number of bits per sample) can be 8-bit or 16-bit. An 8-bit sample is one byte, so 8-bit sound is stored in an array of Byte type; a 16-bit sample is 2 bytes, so 16-bit sound is stored in an array of Integer type. However, you can also store 16-bit sound in a Byte array, as long as its length is always divisible by 2. Don't get me wrong; this does not mean that you can't store sound in other data types. In fact, sound can be stored in almost any data type, even strings; as long as you know how to handle the data buffer, it really does not matter. Still, it is better to store the sound in the simple types that match the sample size, Byte for 8-bit and Integer for 16-bit, because that makes the sound much easier to manipulate (for example, to change the volume).
- Stereo, Mono: Anyone familiar with music probably knows these terms. They describe how many channels the wave file has: mono means one channel, stereo means two channels.
- Samples Per Second: When the sound card converts from analog to digital, it takes "samples" of the wave, and it does so very fast, thousands of times per second. Sample rates usually range from 8,000 Hz to 44,100 Hz, although I've seen rates from 4,000 Hz to 110,000 Hz. When the sound is mono, the card reads one number per sample; when the sound is stereo, it reads two numbers per sample. So, a sample holds either one or two channels of data.
- Block Align: This is the most complicated one. Block Align tells the program how many bytes it must read to get one complete sample, whether that sample is 8- or 16-bit, stereo or mono.
It is calculated using this formula: BlockAlign = BitsPerSample * Channels / 8
So, if you have a sound that is 8 bits, and mono, the block align should be 1 byte; if your sound is 16 bits, and stereo, it should be 4 bytes.
- Average Bytes Per Second: Yes, you guessed it: this tells the program how many bytes one second of sound takes in this format.
The easiest way to calculate this number is with this formula: AvgBytesPerSec = SamplesPerSec * BlockAlign
- Frequency: Frequency is the number of impulses per second, measured in hertz (Hz). For example, a frequency of 1000 Hz means that the sound has 1000 impulses per second.
- Impulse: An impulse is one full cycle of the wave; when represented graphically, it looks like a sine (or cosine) curve.
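The format fields above can be checked with a few lines of arithmetic. The following Python sketch (all function names are hypothetical, chosen for illustration) computes BlockAlign and AvgBytesPerSec with the formulas given, generates one second of a 1,000 Hz tone as signed 16-bit samples, and packs those samples two bytes each, little-endian, as wave files store them; that is why a 16-bit byte buffer always has an even length. Keep in mind that 8-bit wave samples are unsigned (128 is silence), while 16-bit samples are signed, which is why this sketch uses the signed `"h"` struct format.

```python
import math
import struct


def block_align(bits_per_sample, channels):
    # Bytes needed for one complete sample (all channels together).
    return bits_per_sample * channels // 8


def avg_bytes_per_sec(samples_per_sec, bits_per_sample, channels):
    # Bytes taken by one second of sound in this format.
    return samples_per_sec * block_align(bits_per_sample, channels)


def sine_tone(frequency, samples_per_sec, seconds, amplitude=10000):
    # Signed 16-bit mono samples of a sine wave; `frequency` is the
    # number of impulses (cycles) per second.
    count = int(samples_per_sec * seconds)
    return [int(round(amplitude * math.sin(2 * math.pi * frequency * n / samples_per_sec)))
            for n in range(count)]


def to_bytes_16(samples):
    # Pack signed 16-bit samples little-endian: 2 bytes per sample.
    return struct.pack("<%dh" % len(samples), *samples)


def from_bytes_16(data):
    # Inverse of to_bytes_16; the byte length must be divisible by 2.
    if len(data) % 2:
        raise ValueError("16-bit data must have an even number of bytes")
    return list(struct.unpack("<%dh" % (len(data) // 2), data))


print(block_align(8, 1))                # 1
print(block_align(16, 2))               # 4
print(avg_bytes_per_sec(44100, 16, 2))  # 176400

tone = sine_tone(1000, 8000, 1.0)       # 8,000 samples, 1,000 impulses
raw = to_bytes_16(tone)
print(len(tone), len(raw))              # 8000 16000
```

At 8,000 samples per second, each 1,000 Hz impulse spans exactly 8 samples, so the tone repeats every 8 numbers in the array.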
A wave file consists of a header, made up of three small individual headers, followed by the wave data (raw data). These are the headers:
Private Type FileHeader
    lRiff As Long
    lFileSize As Long
    lWave As Long
    lFormat As Long
    lFormatLength As Long
End Type

Private Type WaveFormat
    wFormatTag As Integer
    nChannels As Integer
    nSamplesPerSec As Long
    nAvgBytesPerSec As Long
    nBlockAlign As Integer
    wBitsPerSample As Integer
End Type

Private Type ChunkHeader
    lType As Long
    lLen As Long
End Type
They are written in the wave file in the same order you see in the code above.
First is the File Header, which tells the program reading the file that this file IS a wave file. It contains the "RIFF" identifier, the file size, the "WAVE" identifier, the "fmt " (format) identifier, and the length of the wave format structure, which is followed by the wave format structure itself. Strictly speaking, the file size field holds the number of bytes that follow it, which is the total file size minus 8.
The last part of the header is the Chunk Header. The wave data (raw data) does not have to be in one big chunk; it can be split into multiple chunks, as long as each chunk is preceded by its own chunk header.
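The header layout and the chunk rule can be sketched in Python with the `struct` module. In this sketch (the names `FILE_HEADER`, `WAVE_FORMAT`, `CHUNK_HEADER`, and `read_wave` are hypothetical), each VB `Long` maps to 4 little-endian bytes and each VB `Integer` to 2, so the three headers take 20 + 16 + 8 = 44 bytes. As an assumption of this sketch, the reader walks every chunk after the 12-byte "RIFF"/size/"WAVE" preamble generically, so it finds the "fmt " chunk the same way it finds the "data" chunks, concatenates all "data" payloads in file order, and skips the padding byte that RIFF adds after odd-length chunks.

```python
import struct

# Byte layouts mirroring the three VB types (little-endian, no padding).
# VB Long -> 4 bytes ("4s" for identifiers, "I"/"i" for numbers); VB Integer -> "h".
FILE_HEADER = struct.Struct("<4sI4s4sI")  # lRiff, lFileSize, lWave, lFormat, lFormatLength
WAVE_FORMAT = struct.Struct("<hhiihh")    # wFormatTag, nChannels, nSamplesPerSec,
                                          # nAvgBytesPerSec, nBlockAlign, wBitsPerSample
CHUNK_HEADER = struct.Struct("<4sI")      # lType, lLen


def read_wave(data):
    """Return (format_fields, sample_bytes) from a RIFF/WAVE byte string."""
    riff, _size, wave = struct.unpack_from("<4sI4s", data, 0)
    if riff != b"RIFF" or wave != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")
    fmt = None
    samples = bytearray()
    pos = 12  # the first chunk header follows "RIFF", size, "WAVE"
    while pos + CHUNK_HEADER.size <= len(data):
        ctype, clen = CHUNK_HEADER.unpack_from(data, pos)
        body = data[pos + CHUNK_HEADER.size:pos + CHUNK_HEADER.size + clen]
        if ctype == b"fmt ":
            fmt = WAVE_FORMAT.unpack_from(body, 0)
        elif ctype == b"data":
            samples += body  # multiple data chunks are joined in order
        # Advance by the chunk's own length, plus one pad byte if it is odd.
        pos += CHUNK_HEADER.size + clen + (clen & 1)
    return fmt, bytes(samples)


print(FILE_HEADER.size, WAVE_FORMAT.size, CHUNK_HEADER.size)  # 20 16 8
```

Because the loop relies on each chunk's own length field, a "fmt " chunk longer than 16 bytes, or an unrelated chunk, is stepped over without breaking the parse.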
Writing a wave file is easy. You just fill in the values of the structures and then write them to the file one by one, like this:
Private Sub WaveWriteHeader(ByVal OutFileNum As Integer, _
                            WaveFmt As WaveFormat)
    Dim header As FileHeader
    Dim chunk As ChunkHeader

    With header
        .lRiff = &H46464952           ' "RIFF"
        .lFileSize = 0                ' not known yet; filled in after the data is written
        .lWave = &H45564157           ' "WAVE"
        .lFormat = &H20746D66         ' "fmt "
        .lFormatLength = Len(WaveFmt) ' size of the WaveFormat structure
    End With

    chunk.lType = &H61746164          ' "data"
    chunk.lLen = 0                    ' not known yet; filled in after the data is written

    ' Write the three headers back to back, starting at the first byte.
    Put #OutFileNum, 1, header
    Put #OutFileNum, , WaveFmt
    Put #OutFileNum, , chunk
End Sub
If you look at the code closely, you will see that the lFileSize field of the FileHeader structure and the lLen field of the ChunkHeader structure are both set to 0. This is because the wave data has not been written yet, so these sizes are not known.
Right after you write the header (with those two fields still set to 0), you have to write the wave data. When you are done, you will know the exact length of the file and the length of the wave data, so you can go back and fill in the two fields.
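The whole sequence (headers with zero sizes, then the data, then going back to fill in the sizes) can be sketched in Python. The function `write_wave` and its parameters are hypothetical; the byte offsets follow the 44-byte header layout above (the RIFF size field at offset 4, the data chunk's lLen at offset 40), and the RIFF size is written as the total length minus 8, following the RIFF convention. Writing a padding byte after odd-length data is an extra RIFF rule this sketch adds; it does not come from the VB code.

```python
import io
import struct


def write_wave(f, sample_bytes, channels=1, samples_per_sec=8000, bits_per_sample=16):
    """Write a PCM wave to the binary file object `f`, then patch the sizes."""
    block_align = bits_per_sample * channels // 8
    avg_bytes = samples_per_sec * block_align

    start = f.tell()
    # FileHeader: "RIFF", size placeholder (0), "WAVE", "fmt ", WaveFormat length (16).
    f.write(struct.pack("<4sI4s4sI", b"RIFF", 0, b"WAVE", b"fmt ", 16))
    # WaveFormat: format tag 1 means uncompressed PCM.
    f.write(struct.pack("<hhiihh", 1, channels, samples_per_sec,
                        avg_bytes, block_align, bits_per_sample))
    # ChunkHeader: "data", length placeholder (0).
    f.write(struct.pack("<4sI", b"data", 0))
    f.write(sample_bytes)
    if len(sample_bytes) % 2:
        f.write(b"\x00")  # RIFF pads odd-length chunks to an even size
    end = f.tell()

    # Now the sizes are known: go back and fill in the two placeholders.
    f.seek(start + 4)
    f.write(struct.pack("<I", end - start - 8))      # bytes after the size field
    f.seek(start + 40)
    f.write(struct.pack("<I", len(sample_bytes)))    # length of the wave data
    f.seek(end)


buf = io.BytesIO()
write_wave(buf, struct.pack("<4h", 0, 1000, 0, -1000))
wav = buf.getvalue()
print(len(wav), wav[:4], struct.unpack_from("<I", wav, 4)[0])  # 52 b'RIFF' 44
```

With a real file, `f` would be opened in binary mode (for example `open("out.wav", "wb")`); the same seek-and-patch steps apply.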