Power-law Distribution in Encoded MFCC Frames of Speech, Music, and Environmental Sound Signals

dc.contributor.author
Haro, M.
dc.contributor.author
Serrà, J.
dc.contributor.author
Corral, A.
dc.contributor.author
Herrera, P.
dc.date.accessioned
2020-11-12T10:47:57Z
dc.date.accessioned
2024-09-19T14:34:22Z
dc.date.available
2020-11-12T10:47:57Z
dc.date.available
2024-09-19T14:34:22Z
dc.date.issued
2012-01-01
dc.identifier.uri
http://hdl.handle.net/2072/377748
dc.description.abstract
Many sound-related applications use Mel-Frequency Cepstral Coefficients (MFCCs) to describe audio timbral content. Most research efforts dealing with MFCCs have focused on the study of different classification and clustering algorithms, the use of complementary audio descriptors, or the effect of different distance measures. The goal of this paper is to focus on the statistical properties of the MFCC descriptor itself. For that purpose, we use a simple encoding process that maps a short-time MFCC vector to a dictionary of binary code-words. We study and characterize the rank-frequency distribution of such MFCC code-words, considering speech, music, and environmental sound sources. We show that, regardless of the sound source, MFCC code-words follow a shifted power-law distribution. This implies that a few code-words occur very frequently while many occur rarely. We also observe that the inner structure of the most frequent code-words has characteristic patterns. For instance, in music signals, nearby MFCC coefficients tend to have similar quantization values. Finally, we study the rank-frequency distributions of individual music recordings and show that they present the same type of heavy-tailed distribution found in the large-scale databases. This fact is exploited in two supervised semantic inference tasks: genre and instrument classification. In particular, using just 50 (properly selected) frames, we obtain classification results similar to those obtained by considering all frames in the recordings. Beyond this particular example, we believe that the fact that MFCC frames follow a power-law distribution could have important implications for future audio-based applications.
eng
dc.format.extent
8 p.
cat
dc.language.iso
eng
cat
dc.relation.ispartof
Proceedings of the 21st international conference on World Wide Web
cat
dc.rights
Access to the contents of this document is subject to acceptance of the terms of use established by the following Creative Commons license: http://creativecommons.org/licenses/by-nc-nd/4.0/
dc.source
RECERCAT (Dipòsit de la Recerca de Catalunya)
dc.subject.other
Mathematics
cat
dc.title
Power-law Distribution in Encoded MFCC Frames of Speech, Music, and Environmental Sound Signals
cat
dc.type
info:eu-repo/semantics/article
cat
dc.type
info:eu-repo/semantics/publishedVersion
cat
dc.subject.udc
51
cat
dc.embargo.terms
none
cat
dc.identifier.doi
10.1145/2187980.2188220
cat
dc.rights.accessLevel
info:eu-repo/semantics/openAccess


Documents

ACorral11MaRcAt.pdf

1.762Mb PDF

This item appears in the following Collection(s)

CRM Articles [656]