Power-law Distribution in Encoded MFCC Frames of Speech, Music, and Environmental Sound Signals

Autor/a

Haro, M.

Serrà, J.

Corral, A.

Herrera, P.

Fecha de publicación

2012-01-01



Resumen

Many sound-related applications use Mel-Frequency Cepstral Coefficients (MFCC) to describe audio timbral content. Most of the research efforts dealing with MFCCs have been focused on the study of different classification and clustering algorithms, the use of complementary audio descriptors, or the effect of different distance measures. The goal of this paper is to focus on the statistical properties of the MFCC descriptor itself. For that purpose, we use a simple encoding process that maps a short-time MFCC vector to a dictionary of binary code-words. We study and characterize the rank-frequency distribution of such MFCC code-words, considering speech, music, and environmental sound sources. We show that, regardless of the sound source, MFCC code-words follow a shifted power-law distribution. This implies that there are a few code-words that occur very frequently and many that happen rarely. We also observe that the inner structure of the most frequent code-words has characteristic patterns. For instance, close MFCC coefficients tend to have similar quantization values in the case of music signals. Finally, we study the rank-frequency distributions of individual music recordings and show that they present the same type of heavy-tailed distribution as found in the large-scale databases. This fact is exploited in two supervised semantic inference tasks: genre and instrument classification. In particular, we obtain similar classification results as the ones obtained by considering all frames in the recordings by just using 50 (properly selected) frames. Beyond this particular example, we believe that the fact that MFCC frames follow a power-law distribution could potentially have important implications for future audio-based applications.

Tipo de documento

Artículo
Versión publicada

Lengua

Inglés

Materias CDU

51 - Matemáticas

Palabras clave

Matemàtiques

Páginas

8 p.

Es versión de

Proceedings of the 21st international conference on World Wide Web

Documentos

ACorral11MaRcAt.pdf

1.762Mb

 

Derechos

L'accés als continguts d'aquest document queda condicionat a l'acceptació de les condicions d'ús establertes per la següent llicència Creative Commons:http://creativecommons.org/licenses/by-nc-nd/4.0/

Este ítem aparece en la(s) siguiente(s) colección(ones)

CRM Articles [656]