TASE: Task-Aware Speech Enhancement for Wake-Up Word Detection in Voice Assistants

dc.contributor.author
Cámbara, Guillermo
dc.contributor.author
López, Fernando
dc.contributor.author
Bonet, David
dc.contributor.author
Gómez, Pablo
dc.contributor.author
Segura, Carlos
dc.contributor.author
Farrús, Mireia
dc.contributor.author
Luque, Jordi
dc.date.issued
2022-03-09T13:27:10Z
dc.date.issued
2022-02-14
dc.identifier
2076-3417
dc.identifier
https://hdl.handle.net/2445/183953
dc.identifier
719156
dc.description.abstract
Wake-up word spotting in noisy environments is critical to a good user experience with voice assistants. Unwanted activation of the device is often caused by noise from background conversations, TVs, or other domestic appliances. In this work, we propose a speech enhancement convolutional autoencoder, coupled with on-device keyword spotting, to improve trigger word detection in noisy environments. The end-to-end system learns by optimizing a linear combination of losses: a reconstruction loss, computed both on the log-mel spectrogram and on the waveform, and a task-specific loss given by the cross-entropy error of the keyword spotting detector. We experiment with several neural network classifiers and report that deeply coupling the speech enhancement with the wake-up word detector, e.g., by jointly training them, significantly improves performance in the noisiest conditions. Additionally, we introduce a new publicly available speech database recorded for Telefónica's voice assistant, Aura. The OK Aura Wake-up Word Dataset incorporates rich metadata, such as speaker demographics or room conditions, and includes hard negative examples that were carefully selected to present different levels of phonetic similarity to the trigger words 'OK Aura'. Keywords: speech enhancement; wake-up word; keyword spotting; deep learning; convolutional neural network
dc.format
16 p.
dc.format
application/pdf
dc.language
eng
dc.publisher
MDPI
dc.relation
Reproduction of the document published at: https://doi.org/10.3390/app12041974
dc.relation
Applied Sciences, 2022, vol. 12, num. 4, p. 1974
dc.relation
https://doi.org/10.3390/app12041974
dc.relation
info:eu-repo/grantAgreement/EC/H2020/101042315/EU//INGENIOUS
dc.relation
info:eu-repo/grantAgreement/EC/H2020/871793/EU//ACCORDION
dc.relation
info:eu-repo/grantAgreement/EC/H2020/833435/EU//INGENIOUS
dc.rights
cc-by (c) Cámbara, Guillermo et al., 2022
dc.rights
https://creativecommons.org/licenses/by/4.0/
dc.rights
info:eu-repo/semantics/openAccess
dc.source
Articles published in journals (Filologia Catalana i Lingüística General)
dc.subject
Automatic speech recognition
dc.subject
Computational linguistics
dc.subject
Machine learning
dc.subject
Convolutional neural networks
dc.title
TASE: Task-Aware Speech Enhancement for Wake-Up Word Detection in Voice Assistants
dc.type
info:eu-repo/semantics/article
dc.type
info:eu-repo/semantics/publishedVersion

