Sound fingerprinting and .NET Part 1

Wikipedia defines an acoustic fingerprint as a condensed digital summary (a fingerprint, if you will) that is deterministically generated from an audio signal and can be used to identify an audio sample or quickly locate similar items in an audio database.

Think of an audio fingerprint as you would a human fingerprint. A human fingerprint can identify a person because it is unique to that individual. The same holds for audio recordings: a recording's fingerprint may resemble those of other recordings, but it is distinctive enough to identify that specific piece of audio.

A well-known example of audio fingerprinting in the wild is the popular Shazam app. A person can be anywhere (on the street, at home, at a restaurant) and hear a song. They can have the app “listen” to it and then receive information about the song, such as its title and performer. This is achieved using audio fingerprinting.

To better understand what an acoustic fingerprint is, let me explain some common terms:

A fingerprinting algorithm is a procedure that maps a large data item to a much shorter bit string, its fingerprint, which for practical purposes uniquely identifies the original data. A deterministic algorithm always produces the same output for a given input, with the underlying machine always passing through the same sequence of states. An audio signal is simply a representation of sound.
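The first two terms can be illustrated with a minimal sketch, written here in Python for brevity. It assumes a truncated SHA-256 digest as the fingerprint, and the `fingerprint` function and its `bits` parameter are hypothetical names chosen for illustration. Note that this is a general-purpose data fingerprint, not an acoustic one: a hash changes completely when a single byte changes, whereas an acoustic fingerprint has to survive noise and re-encoding.

```python
import hashlib


def fingerprint(data: bytes, bits: int = 64) -> str:
    """Map arbitrarily large data to a short, fixed-length bit string (as hex).

    Assumption for illustration: the fingerprint is the first `bits` bits
    of the SHA-256 digest of the data.
    """
    if bits <= 0 or bits > 256 or bits % 8 != 0:
        raise ValueError("bits must be a multiple of 8 between 8 and 256")
    digest = hashlib.sha256(data).digest()
    return digest[: bits // 8].hex()


# A large data item (about 1 MB) reduced to a 64-bit fingerprint.
large_item = bytes(range(256)) * 4096

# Deterministic: the same input always yields the same fingerprint.
first = fingerprint(large_item)
second = fingerprint(large_item)

# A different input yields a different fingerprint (for practical purposes).
changed = fingerprint(large_item + b"\x00")

print(len(large_item), "bytes ->", first)
print("same input, same output:", first == second)
print("changed input differs:", first != changed)
```

The design choice to truncate the digest simply makes the "much shorter bit string" visible; a shorter fingerprint is cheaper to store and compare, at the cost of a higher chance of two items colliding.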

What is Acoustic Fingerprinting?

Acoustic fingerprinting can be used to identify songs, melodies, tunes, or even advertisements. Acoustic fingerprinting can also be used for sound effect library management, as well as video file identification.

Beyond identification, acoustic fingerprints can be used to monitor the use of specific musical works and performances in radio broadcasts, on records and CDs, in streaming media, and on peer-to-peer networks.

For any of this to work, you need to generate a signature from the audio; without one, you cannot search by sound. A common starting point is a time-frequency graph called a spectrogram. A spectrogram (also known as a sonograph, voiceprint, or voicegram) is a visual representation of the spectrum of frequencies of a signal as it varies with time. It is usually drawn as a two-dimensional image with time on one axis, frequency on the other, and intensity shown by color or brightness, although it can also be rendered as a three-dimensional plot.

To build one, the audio is split into short segments. Adjacent segments either share a common time boundary or overlap. The frequency spectrum of each segment is then computed, resulting in a graph that plots three dimensions of audio: frequency vs. amplitude (or intensity) vs. time.
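The segmenting step can be sketched as follows, again in Python for brevity. This is a minimal illustration under stated assumptions, not the implementation used later in this series: the function names (`frames`, `magnitude_spectrum`, `spectrogram`), the Hann window, the segment size, and the synthetic 1000 Hz test tone are all choices made here for the example. It uses a naive discrete Fourier transform so that it runs with the standard library only; real code would normally use an FFT library.

```python
import cmath
import math


def frames(samples, size, hop):
    """Split samples into segments of `size` samples.

    hop == size: adjacent segments share a common time boundary.
    hop <  size: adjacent segments overlap by (size - hop) samples.
    """
    return [samples[i:i + size] for i in range(0, len(samples) - size + 1, hop)]


def magnitude_spectrum(frame):
    """Return DFT magnitudes for bins 0..N/2 of a Hann-windowed segment."""
    n = len(frame)
    # The Hann window tapers the segment edges to reduce spectral leakage.
    windowed = [
        x * (0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)))
        for i, x in enumerate(frame)
    ]
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * i / n)
                for i, x in enumerate(windowed)))
        for k in range(n // 2 + 1)
    ]


def spectrogram(samples, size=64, hop=32):
    """Rows are time segments; columns are frequency bins; values are amplitude."""
    return [magnitude_spectrum(f) for f in frames(samples, size, hop)]


# Synthetic input (an assumption for the demo): a pure 1000 Hz tone at 8 kHz.
SAMPLE_RATE = 8000
TONE_HZ = 1000
SEGMENT_SIZE = 64
signal = [math.sin(2 * math.pi * TONE_HZ * t / SAMPLE_RATE) for t in range(512)]

spec = spectrogram(signal, size=SEGMENT_SIZE, hop=32)  # 50% overlap

# Find the strongest frequency bin in the first segment and convert it to Hz.
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
peak_hz = peak_bin * SAMPLE_RATE / SEGMENT_SIZE

print(len(spec), "segments x", len(spec[0]), "frequency bins")
print("strongest frequency in first segment:", peak_hz, "Hz")
```

Each row of `spec` is one time slice, each column one frequency band, and each value an amplitude, which are the three quantities a spectrogram plots. Overlapping segments (a hop smaller than the segment size) give finer time resolution at the cost of more computation.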

An example of a spectrogram is shown below:

[Image: example spectrogram]

Image credit: Wikipedia

In the next article in this series, we will start building our project to identify audio input. You can read it in Sound Fingerprinting and .NET Part 2.

Hannes DuPreez
Ockert J. du Preez is a passionate coder and always willing to learn. He has written hundreds of developer articles over the years detailing his programming quests and adventures. He has written the following books: Visual Studio 2019 In-Depth (BpB Publications) and JavaScript for Gurus (BpB Publications). He was the Technical Editor for Professional C++, 5th Edition (Wiley), and a Microsoft Most Valuable Professional for .NET (2008–2017).
