UNFOLD: a memory-efficient speech recognizer using on-the-fly WFST composition

Home | About RECERCAT | Contact

Català | Castellano

All of RECERCAT

By Communities &
Collections By Defense Date By Authors By Titles By Subject

This Collection

By Defense Date By Authors By Titles By Subject

Statistics

View Statistics All RECERCAT

My RECERCAT

Other repositories directory

RECERCAT Home > Universitat Politècnica de Catalunya > Documents de recerca > View document

To access the full text documents, please follow this link: http://hdl.handle.net/2117/113149

Title:	UNFOLD: a memory-efficient speech recognizer using on-the-fly WFST composition
Author:	Yazdani Aminabadi, Reza; Arnau Montañés, José María; González Colás, Antonio María
Other authors:	Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors; Universitat Politècnica de Catalunya. ARCO - Microarquitectura i Compiladors
Abstract:	Accurate, real-time Automatic Speech Recognition (ASR) requires huge memory storage and computational power. The main bottleneck in state-of-the-art ASR systems is the Viterbi search on a Weighted Finite State Transducer (WFST). The WFST is a graph-based model created by composing an Acoustic Model (AM) and a Language Model (LM) offline. Offline composition simplifies the implementation of a speech recognizer as only one WFST has to be searched. However, the size of the composed WFST is huge, typically larger than a Gigabyte, resulting in a large memory footprint and memory bandwidth requirements. In this paper, we take a completely different approach and propose a hardware accelerator for speech recognition that composes the AM and LM graphs on-the-fly. In our ASR system, the fully-composed WFST is never generated in main memory. On the contrary, only the subset required for decoding each input speech fragment is dynamically generated from the AM and LM models. In addition to the direct benefits of this on-the-fly composition, the resulting approach is more amenable to further reduction in storage requirements through compression techniques. The resulting accelerator, called UNFOLD, performs the decoding in real-time using the compressed AM and LM models, and reduces the size of the datasets from more than one Gigabyte to less than 40 Megabytes, which can be very important in small form factor mobile and wearable devices. Besides, UNFOLD improves energy-efficiency by orders of magnitude with respect to CPUs and GPUs. Compared to a state-of-the-art Viterbi search accelerators, the proposed ASR system outperforms by providing 31x reduction in memory footprint and 28% energy savings on average.
Abstract:	Peer Reviewed
Subject(s):	-Àrees temàtiques de la UPC::Enginyeria de la telecomunicació::Processament del senyal::Processament de la parla i del senyal acústic -Automatic speech recognition -Speech recognition -Special purpose systems -Reconeixement automàtic de la parla
Rights:
Document type:	Article - Published version Conference Object
Published by:	Association for Computing Machinery (ACM)
Share:

Show full item record

All of RECERCAT

This Collection

Statistics

My RECERCAT

Related documents

Other documents of the same author