Automatic viseme vocabulary construction to enhance continuous lip-reading


To access the full text documents, please follow this link: http://hdl.handle.net/10230/32161

Title:	Automatic viseme vocabulary construction to enhance continuous lip-reading
Author:	Fernandez-Lopez, Adriana; Sukno, Federico Mateo
Abstract:	Paper presented at: 12th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications (VISIGRAPP 2017), held from 27 February to 1 March 2017 in Porto, Portugal.
Abstract:	Speech is the most common communication method between humans and involves the perception of both auditory and visual channels. Automatic speech recognition focuses on interpreting audio signals, but it has been demonstrated that video can provide information that is complementary to the audio. Thus, the study of automatic lip-reading is important and remains an open problem. One of the key challenges is the definition of the elementary visual units (the visemes) and their vocabulary. Many researchers have analyzed the importance of the phoneme-to-viseme mapping and have proposed viseme vocabularies containing between 11 and 15 visemes. These vocabularies have usually been defined manually from linguistic properties and, in some cases, using decision trees or clustering techniques. In this work, we focus on the automatic construction of an optimal viseme vocabulary based on the association of phonemes with similar appearance. To this end, we construct an automatic system that uses local appearance descriptors to extract the main characteristics of the mouth region and HMMs to model the statistical relations of both viseme and phoneme sequences. To compare the performance of the system, different descriptors (PCA, DCT and SIFT) are analyzed. We test our system on a Spanish corpus of continuous speech. Our results indicate that we are able to recognize approximately 58% of the visemes, 47% of the phonemes and 23% of the words in a continuous speech scenario, and that the optimal viseme vocabulary for Spanish is composed of 20 visemes.
Abstract:	This work is partly supported by the Spanish Ministry of Economy and Competitiveness under the Ramon y Cajal fellowships and the Maria de Maeztu Units of Excellence Programme (MDM-2015-0502), and the Kristina project funded by the European Union Horizon 2020 research and innovation programme under grant agreement No 645012.
Subject(s):	Lip-reading; Speech recognition; Visemes; Confusion matrix
Rights:	© 2017 by SCITEPRESS – Science and Technology Publications, Lda. All rights reserved
Document type:	Conference Object Article - Published version
Published by:	SCITEPRESS
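
The abstract describes building a viseme vocabulary by associating phonemes with similar appearance, and the subject terms mention a confusion matrix. The record does not give the authors' actual procedure, so the following is only a minimal sketch of one plausible approach: greedily merging the two classes that a recognizer confuses most often until a target vocabulary size is reached. The function name `merge_phonemes`, the toy matrix, and the merging criterion (symmetrised row-normalised confusion) are all assumptions made for illustration.

```python
import numpy as np


def merge_phonemes(confusion, labels, n_visemes):
    """Greedily merge the most mutually confused classes (hypothetical helper).

    confusion[i][j] counts how often true class i was recognised as class j.
    Returns a list of phoneme groups, each standing for one candidate viseme.
    This is an illustrative criterion, not the method of the paper.
    """
    conf = np.asarray(confusion, dtype=float).copy()
    groups = [[label] for label in labels]
    while len(groups) > n_visemes:
        # Row-normalise so each row approximates P(recognised | true class).
        rows = conf.sum(axis=1, keepdims=True)
        prob = np.divide(conf, rows, out=np.zeros_like(conf), where=rows > 0)
        # Symmetrise and ignore the diagonal (correct recognitions).
        sym = prob + prob.T
        np.fill_diagonal(sym, -1.0)
        i, j = np.unravel_index(np.argmax(sym), sym.shape)
        i, j = int(min(i, j)), int(max(i, j))
        # Fold class j into class i: add its row and column, then drop them.
        conf[i, :] += conf[j, :]
        conf[:, i] += conf[:, j]
        conf = np.delete(np.delete(conf, j, axis=0), j, axis=1)
        groups[i] = groups[i] + groups.pop(j)
    return groups


# Toy data (invented): /p/ and /b/ are often confused, /a/ and /o/ less so.
toy_labels = ["p", "b", "a", "o"]
toy_confusion = [
    [5, 4, 1, 0],
    [4, 5, 0, 1],
    [0, 1, 8, 1],
    [1, 0, 2, 7],
]
three_classes = merge_phonemes(toy_confusion, toy_labels, 3)
two_classes = merge_phonemes(toy_confusion, toy_labels, 2)
print(three_classes)
print(two_classes)
```

In a setting like the one the abstract describes, the confusion counts would come from a phoneme-level recognizer run on held-out data, and the stopping size would be chosen by comparing recognition rates across vocabulary sizes rather than fixed in advance.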

Related documents

Other documents of the same author

Towards estimating the upper bound of visual-speech recognition: the visual lip-reading feasibility databas

Fernandez-Lopez, Adriana; Martinez, Oriol; Sukno, Federico Mateo

Survey on automatic lip-reading in the era of deep learning

Fernandez-Lopez, Adriana; Sukno, Federico Mateo

Bilinear models for spatio-temporal point distribution analysis: application to extrapolation of left ventricular, biventricular and whole heart cardiac dynamics

Hoogendoorn, Corné; Sukno, Federico Mateo; Ordás, Sebastián; Frangi Caregnato, Alejandro

The Multiscenario Multienvironment BioSecure Multimodal Database (BMDB)

Ortega-Garcia, Javier; Fierrez, Julian; Alonso-Fernández, Fernando; Galbally, Javier; Freire, Manuel R.; González-Rodríguez, Joaquín; García-Mateo, Carmen; Alba-Castro, José-Luís; González-Agulla, Elisardo; Otero-Muras, Enrique; García-Salicetti, Sonia; Allano, Lorene; Ly-Van, Bao; Dorizzi, Bernadette; Kittler, Josef; Bourlai, Thirimachos; Poh, Norman; Deravi, Farzin; Ng, Ming W.R.; Fairhurst, Michael; Hennebert, Jean; Humm, Andreas; Tistarelli, Massimo; Brodo, Linda; Richiardi, Jonas; Drygajlo, Andrzej; Ganster, Harald; Sukno, Federico Mateo; Pavani, Sri-Kaushik; Akarun, Lale; Savran, Arman; Frangi Caregnato, Alejandro

A multimodal annotation schema for non-verbal affective analysis in the health-care domain

Sukno, Federico Mateo; Domínguez Bajo, Mónica; Ruiz, Adrià; Schiller, Dominik; Lingenfelser, Florian; Pragst, Louisa; Kamateri, Eleni; Vrochidis, Stefanos
