Robust feature extraction for multimodal speaker ID system – The experts’ room

To access the full text documents, please follow this link: http://hdl.handle.net/2099.1/8362

Title:	Robust feature extraction for multimodal speaker ID system – The experts’ room
Author:	Hernanz Nogueras, Sergi
Other authors:	Narayanan, Shrikanth
Abstract:	Final degree project carried out in collaboration with the University of Southern California
Abstract:	This project reviews speaker recognition. The first simulations use current state-of-the-art algorithms, and later ones introduce new approaches and numerous modifications. Multimodality is the main strategy for achieving better results: the new data supplied to the speaker recognition system are articulatory features and combined video and voice source localization in a meeting room scenario. Some articulatory features have not been widely used in speech analysis, so reliable methods for extracting them have not yet been developed. Algorithms for voice source and video spatial localization, by contrast, are well known, and only the integration methods remain to be defined. A theoretical review and a study of integration precede the final choice of algorithm. Machine learning techniques are applied to extract articulatory features, and the resulting classification is surprisingly accurate. How useful these extractor outputs are for speaker recognition is less clear. However, important conclusions are drawn about how the extraction process affects the subsequent use of the features and how other extraction methods could be approached. Articulatory features prove less affected by noise than the baseline MFCC+GMM approach, although suitable extraction methods are still not available. Even with the baseline MLP-based extraction methods, classification using articulatory features is possible, and their complementarity with the baseline methods is demonstrated. Adding articulatory features improves the overall system only slightly, but this demonstrates their usefulness. The whole articulatory feature integration process could be revisited, with good prospects of success in the future. An extended analysis of how noise corrupts the speech features leads to concrete conclusions about noise rejection and the effects of noise.
Plotting system performance under different SNR conditions helps explain the behavior of some methods. At low SNR, very simple changes to the algorithms improve overall performance and reveal that the baseline was not designed with noise in mind. Most of the methods developed in this work were finally applied to the meeting room scenario at USC. A small but encouraging performance increase was achieved, so the aim of the work was considered fulfilled. Whether the small improvement justifies the effort spent remains to be assessed through further approaches and work.
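The record does not state the thesis's actual feature configuration, number of mixture components, or noise types. The following is therefore only a generic sketch, assuming NumPy, of two ingredients of the kind of noise-robustness evaluation the abstract describes:

- mixing noise into a clean signal at a target SNR;
- scoring a feature matrix against per-speaker diagonal-covariance GMMs, as in the back end of an MFCC+GMM baseline.

The function names (`mix_at_snr`, `gmm_log_likelihood`, `identify_speaker`) and the toy models are hypothetical illustrations, not the author's code.

```python
import numpy as np

# Hypothetical helpers for illustration only; not taken from the thesis.


def mix_at_snr(clean, noise, snr_db):
    """Add `noise` to `clean`, scaled so that the mixture has the requested SNR (dB)."""
    if len(noise) < len(clean):
        raise ValueError("noise must be at least as long as the clean signal")
    noise = noise[: len(clean)]
    p_clean = np.mean(clean ** 2)
    p_noise = np.mean(noise ** 2)
    # SNR_dB = 10*log10(p_clean / (gain**2 * p_noise)), solved for gain
    gain = np.sqrt(p_clean / (p_noise * 10 ** (snr_db / 10)))
    return clean + gain * noise


def gmm_log_likelihood(features, weights, means, variances):
    """Average per-frame log-likelihood of `features` (T x D) under a
    diagonal-covariance GMM: weights (K,), means (K x D), variances (K x D)."""
    diff = features[:, None, :] - means[None, :, :]                  # T x K x D
    log_norm = -0.5 * np.sum(np.log(2 * np.pi * variances), axis=1)  # K
    log_comp = np.log(weights) + log_norm - 0.5 * np.sum(diff ** 2 / variances, axis=2)  # T x K
    # Numerically stable log-sum-exp over the K components
    peak = log_comp.max(axis=1, keepdims=True)
    frame_ll = peak[:, 0] + np.log(np.exp(log_comp - peak).sum(axis=1))
    return float(frame_ll.mean())


def identify_speaker(features, speaker_models):
    """Closed-set identification: pick the speaker whose GMM scores highest."""
    return max(speaker_models,
               key=lambda spk: gmm_log_likelihood(features, *speaker_models[spk]))


# Toy usage with made-up single-component models in a 2-D feature space
rng = np.random.default_rng(0)
clean = np.sin(2 * np.pi * 5 * np.linspace(0, 1, 8000))
noisy = mix_at_snr(clean, rng.standard_normal(8000), snr_db=0.0)  # 0 dB mixture

models = {
    "spk_a": (np.array([1.0]), np.zeros((1, 2)), np.ones((1, 2))),
    "spk_b": (np.array([1.0]), np.full((1, 2), 5.0), np.ones((1, 2))),
}
feats = rng.normal(5.0, 1.0, size=(100, 2))
print(identify_speaker(feats, models))  # spk_b
```

To obtain a performance-versus-SNR curve like the one described above, sweep `snr_db` over a range and record identification accuracy at each level. Feature extraction (MFCCs or articulatory features) and GMM training (for example with EM) are deliberately left out of this sketch.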
Subject(s):	-UPC subject areas::Telecommunication engineering::Signal processing::Speech and acoustic signal processing -Speech processing systems -Speech processing
Rights:	Attribution-NonCommercial-NoDerivs 3.0 Spain http://creativecommons.org/licenses/by-nc-nd/3.0/es/
Document type:	Bachelor Thesis
Published by:	Universitat Politècnica de Catalunya