By Hiren Hindocha, CEO, Digital Nirvana
Audio fingerprinting-based content tracking has evolved rapidly and gained wider business adoption, thanks to underlying technologies that are simpler than those of digital video fingerprinting.
Audio fingerprinting as a means of detecting and tracking advertisements and music albums on broadcast and internet radio has many applications. For example, businesses can monitor their own ad campaigns and those of their competitors to gain in-depth business intelligence.
The music industry can take advantage of audio fingerprinting in many ways, including album tracking, ratings and intellectual property protection. This provides music producers with an extensive opportunity to monetize their content.
Fingerprinting is based on signal analysis that dates back to the Fourier transform, a mathematical concept from the early 19th century. The Fourier transform represents a signal in time as a combination of the component frequencies that comprise it. Fingerprinting therefore starts by converting the time-domain signal into a frequency-domain signal. To illustrate the concept of signal decomposition, see Figure 1 and Figure 2 below, which show the spectral density of a 1 kHz sine wave and of a music piece, respectively.
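To make the time-to-frequency conversion concrete, here is a minimal sketch that decomposes a synthetic 1 kHz sine wave and reports its strongest frequency component. The sample rate, the sample count and the function names are assumptions chosen for illustration. The naive transform below is written for readability; a production system would more likely use an optimized FFT library.

```python
import cmath
import math

SAMPLE_RATE = 8000   # samples per second (assumed for this sketch)
NUM_SAMPLES = 400    # 50 ms of audio, which gives 20 Hz frequency resolution


def magnitude_spectrum(samples):
    """Naive discrete Fourier transform: magnitude of each bin up to Nyquist."""
    n = len(samples)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(samples))
        spectrum.append(abs(acc))
    return spectrum


def dominant_frequency(samples, sample_rate):
    """Return the frequency in Hz of the strongest spectral component."""
    spectrum = magnitude_spectrum(samples)
    peak_bin = max(range(len(spectrum)), key=spectrum.__getitem__)
    return peak_bin * sample_rate / len(samples)


# A pure 1 kHz tone, comparable to the signal described for Figure 1.
tone = [math.sin(2 * math.pi * 1000 * i / SAMPLE_RATE)
        for i in range(NUM_SAMPLES)]
peak_hz = dominant_frequency(tone, SAMPLE_RATE)
print(peak_hz)
```

For a pure tone the spectrum has a single dominant bin. For music, the same transform applied to successive short windows yields the many time-varying components described for Figure 2.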
Figure 1: A 2D color plot highlights a straight line with significantly high amplitude at 1 kHz across a 10-second time scale, and a simple 2D plot shows a peak at 1 kHz with ambient noise peaks around it.
Figure 2: (a) A 2D color plot with respect to time shows the spectral density of a music piece captured over 10 seconds. There are multiple frequency components with varying amplitudes across time. (b) A simple 2D spectral density plot shows several high frequency peaks around 1 kHz.
The next step in building a fingerprint is to identify the peaks and lows in the time-frequency data and save them in a compact format. Figure 3 below illustrates the peaks and lows marked for our sample music piece. The more min-max points a fingerprint contains, the better the precision and recall of matching will be. But more data points demand more storage, and matching these larger sets consumes more computing resources, so there is a trade-off between the desired accuracy and the computational and storage requirements. For that reason, data compaction and pattern matching are the other key components of the system once the spectrogram has been built.
Following spectral analysis and data compaction, the binary data is stored as a fingerprint in a database for future comparison and matching. There are multiple techniques and efficient algorithms for spectral analysis and pattern matching, but detailing them lies outside the scope of this article. For simplicity, consider the min-max points in the spectrogram as data points in a three-dimensional space of time, frequency and amplitude; these points, together with their relative positions, are saved as the fingerprint of the audio piece.
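The idea of marking min-max points and then trimming them can be sketched as follows. This is an illustrative simplification, not the algorithm used by any particular product: the spectrogram is assumed to be a small grid of amplitudes (rows are time frames, columns are frequency bins), and the function names and the "keep the strongest N" compaction rule are hypothetical.

```python
def local_extrema(spectrogram):
    """Return (peaks, lows) as lists of (time_idx, freq_idx, amplitude).

    A point is a peak if it is strictly greater than every neighbouring
    cell in the grid, and a low if it is strictly smaller than all of them.
    """
    peaks, lows = [], []
    rows, cols = len(spectrogram), len(spectrogram[0])
    for t in range(rows):
        for f in range(cols):
            value = spectrogram[t][f]
            neighbours = [
                spectrogram[tt][ff]
                for tt in range(max(0, t - 1), min(rows, t + 2))
                for ff in range(max(0, f - 1), min(cols, f + 2))
                if (tt, ff) != (t, f)
            ]
            if all(value > n for n in neighbours):
                peaks.append((t, f, value))
            elif all(value < n for n in neighbours):
                lows.append((t, f, value))
    return peaks, lows


def strongest(points, max_points):
    """Compaction step (assumed): keep only the highest-amplitude points.

    Fewer points make a smaller fingerprint that is cheaper to match,
    at the cost of less information to match on.
    """
    return sorted(points, key=lambda p: p[2], reverse=True)[:max_points]


# Toy 4x4 "spectrogram": rows are time frames, columns are frequency bins.
grid = [
    [1, 2, 1, 3],
    [2, 9, 2, 1],
    [1, 2, 0, 2],
    [7, 1, 2, 8],
]
peaks, lows = local_extrema(grid)
compact_peaks = strongest(peaks, 2)
print(peaks, lows, compact_peaks)
```

Raising or lowering `max_points` is one simple way to see the accuracy-versus-storage trade-off described above.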
The RAD System
A RAD system, such as Digital Nirvana's, can automatically detect commercials, promos and songs that are broadcast repeatedly across multiple radio stations and add that information to its fingerprint database for future detection. Its highly efficient fingerprinting and detection algorithm guarantees 99.99% accuracy. The RAD system greatly reduces bulky set-up and configuration and provides a comprehensive cloud-based solution for tracking broadcast radio.
Because each capture station simply publishes its fingerprints to the cloud, hundreds of radio stations can be added with great ease. Considering the importance of gaining business intelligence and protecting intellectual property, small and medium-sized businesses are showing great interest in professional audio tracking solutions.
A rich set of web APIs enables users to design and deploy custom information dashboards, statistical summaries and alerts on monitored data.
The RAD workflow has three main modules: the capture and processing module, the central fingerprint server and the user-facing application server. The capture station records and processes broadcast radio by segmenting it into overlapping parts. Typically each segment is 15 seconds long, with an overlap of five seconds on either side. These segment and overlap durations are chosen to give the highest possible detection accuracy. Each 15-second raw audio segment goes through multi-stage processing for fingerprint extraction. This typically consists of audio filtering, followed by spectral analysis and statistical data analysis to extract min-max points, and then data compaction. The resulting compact binary data, called the fingerprint, is pushed to the central database with additional metadata, such as the recording time, channel and custom user data. The capture station records, segments and publishes continuously, and in a real-world setup multiple capture stations send fingerprint data to the central database server for processing at the same time.
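The segmentation step can be sketched as below. The sketch assumes that "an overlap of five seconds on either side" means each 15-second segment shares its first five seconds with the previous segment and its last five seconds with the next one, so a new segment starts every 10 seconds. The function name and that interpretation are assumptions, not a description of the product's actual implementation.

```python
def segment_bounds(total_seconds, segment_seconds=15, overlap_seconds=5):
    """Return (start, end) times in seconds for overlapping segments.

    Consecutive segments share `overlap_seconds` of audio, so a new
    segment starts every `segment_seconds - overlap_seconds` seconds.
    A trailing piece shorter than a full segment is dropped in this sketch.
    """
    if not 0 <= overlap_seconds < segment_seconds:
        raise ValueError("overlap must be non-negative and shorter than a segment")
    hop = segment_seconds - overlap_seconds
    bounds = []
    start = 0
    while start + segment_seconds <= total_seconds:
        bounds.append((start, start + segment_seconds))
        start += hop
    return bounds


# 45 seconds of recorded audio yields four overlapping 15-second segments.
bounds = segment_bounds(45)
print(bounds)
```

Overlapping the segments means that a commercial or song straddling a segment boundary still appears largely intact in at least one neighbouring segment.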
The fingerprint server forms the brains of the entire RAD system. For each fingerprint posted by a capture station, matching fingerprints already in the system are tracked and identified. The matching algorithm typically works by comparing the spatio-temporal properties of fingerprints and assigning numeric probability scores. The full list of candidates is sent to a data-filtering module, where high-probability matches that carry a user tag are presented as identified. Those without tag information are presented to users for a one-time review and tagging. The newly tagged data and fingerprint information are saved to the database for future comparison and identification.
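One simple way to picture score-based matching and filtering is sketched below. It is a toy stand-in, not the matching algorithm of any real system: fingerprints are assumed to be sets of (time index, frequency bin) points, the score is the fraction of query points that line up with a stored fingerprint under a single best time shift, and the 0.8 threshold, the function names and the sample tags are all hypothetical.

```python
from collections import Counter


def match_score(query, reference):
    """Fraction of query points aligned with reference points under the
    single most common time shift. Points are (time_idx, freq_bin) tuples."""
    query_points = set(query)
    if not query_points:
        return 0.0
    ref_times_by_freq = {}
    for t, f in set(reference):
        ref_times_by_freq.setdefault(f, []).append(t)
    shifts = Counter()
    for t, f in query_points:
        for ref_t in ref_times_by_freq.get(f, []):
            shifts[ref_t - t] += 1
    if not shifts:
        return 0.0
    return max(shifts.values()) / len(query_points)


def classify(query, database, threshold=0.8):
    """Split high-scoring matches into tagged ones (identified) and
    untagged ones (queued for one-time user review).

    `database` maps a fingerprint id to (points, tag), where tag may be None.
    """
    identified, needs_review = [], []
    for fp_id, (points, tag) in database.items():
        score = match_score(query, points)
        if score < threshold:
            continue
        target = identified if tag else needs_review
        target.append((fp_id, tag, score))
    return identified, needs_review


ad = [(0, 10), (1, 20), (2, 15), (3, 30)]
database = {
    "ad-1": (ad, "Acme spot"),                                   # tagged
    "song-7": ([(0, 11), (1, 21), (2, 15), (4, 30)], "Some song"),
    "new-3": (ad + [(9, 40)], None),                             # untagged
}
# The same ad, captured five time steps later in a new segment.
query = [(t + 5, f) for t, f in ad]
identified, needs_review = classify(query, database)
print(identified, needs_review)
```

Scoring on relative positions rather than absolute times is what lets the same content be recognized regardless of where it falls within a captured segment.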
Once data is processed by the fingerprint server, it is sent to the application server, where users can organize and filter reports by station, content type or custom fields. Users can also run keyword searches on tags or custom fields to retrieve data quickly and effectively from a large dataset.
Professional audio tracking applications have added a new dimension to data gathering and IP protection for the radio industry as a whole. Over the last decade, numerous businesses have deployed these technologies to continuously monitor a staggering number of radio stations across the globe. Audio fingerprinting offers an efficient answer to the unauthorized distribution of music, and it serves businesses eager to obtain immediate, actionable business intelligence on advertising expenditures.
RBR + TVBR
About the author: Hiren Hindocha is the co-founder and President/CEO of Fremont, Calif.-based Digital Nirvana. Prior to co-founding the company, he served as VP of eCommerce at Go.com. He holds an MS in Computer Science from Cleveland State University and a BE in Electrical Engineering from India’s Jawaharlal Nehru Technological University, Hyderabad.