Short- and long-term speech features for hybrid HMM-i-vector based speaker diarization system

To access the full-text documents, please follow this link: http://hdl.handle.net/2117/101681

dc.contributor	Universitat Politècnica de Catalunya. Departament de Teoria del Senyal i Comunicacions
dc.contributor	Universitat Politècnica de Catalunya. VEU - Grup de Tractament de la Parla
dc.contributor.author	Zewoudie, Abraham Woubie
dc.contributor.author	Luque, Jordi
dc.contributor.author	Hernando Pericás, Francisco Javier
dc.date	2016
dc.identifier.citation	Zewoudie, A., Luque, J., Hernando, J. Short- and long-term speech features for hybrid HMM-i-vector based speaker diarization system. In: The Speaker and Language Recognition Workshop. "ODYSSEY 2016 - The Speaker and Language Recognition Workshop". Bilbao: 2016, p. 400-406.
dc.identifier.citation	10.21437/Odyssey.2016-58
dc.identifier.uri	http://hdl.handle.net/2117/101681
dc.description.abstract	i-vectors have been successfully applied in recent years to speaker recognition tasks. This work assesses the suitability of i-vector modeling within the framework of the speaker diarization task. In this context, a weighted cosine distance between two different sets of i-vectors is proposed for speaker clustering. Speech clusters generated by Viterbi segmentation are first modeled by two different i-vectors. While the first i-vector represents the distribution of the commonly used short-term Mel Frequency Cepstral Coefficients, the second one represents a selection of voice quality and prosodic features. In order to combine both short- and long-term speech statistics, the cosine-distance scores of those two i-vectors are linearly weighted to obtain a single similarity score. The final fused score is then used as the speaker clustering distance. Our experimental results on two different evaluation sets of the Augmented Multi-party Interaction corpus show the suitability of combining both sources of information within the i-vector space. They also show that the i-vector based clustering technique provides a significant improvement, in terms of diarization error rate, over clustering based on Gaussian Mixture Modeling. Furthermore, this work reports a significant speaker error reduction when short-term based i-vector clustering is augmented with a second i-vector estimated from voice quality and prosody related speech features.
dc.description.abstract	Peer Reviewed
dc.language.iso	eng
dc.relation	http://www.isca-speech.org/archive/Odyssey_2016/pdfs/18.pdf
dc.relation	info:eu-repo/grantAgreement/EC/H2020/645323/EU/BIg Speech data analytics for cONtact centres/BISON
dc.rights	info:eu-repo/semantics/openAccess
dc.subject	UPC subject areas::Telecommunications engineering::Signal processing::Speech and acoustic signal processing
dc.subject	Automatic speech recognition
dc.subject	i-vectors
dc.subject	Speaker recognition
dc.subject	Speaker error reduction
dc.subject	Reconeixement automàtic de la parla
dc.title	Short- and long-term speech features for hybrid HMM-i-vector based speaker diarization system
dc.type	info:eu-repo/semantics/publishedVersion
dc.type	info:eu-repo/semantics/conferenceObject
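
The score fusion described in the abstract can be illustrated with a minimal sketch. The abstract states only that the cosine-distance scores of the two i-vectors are linearly weighted; the convex form w * s_short + (1 - w) * s_long, the function names, and the default weight below are assumptions made for illustration and are not taken from the paper.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def fused_score(
    short_a: Sequence[float],
    short_b: Sequence[float],
    long_a: Sequence[float],
    long_b: Sequence[float],
    weight: float = 0.5,  # hypothetical default; the paper's value is not given here
) -> float:
    """Linearly weight two cosine scores into one similarity score.

    short_* stand for the i-vectors estimated from short-term (MFCC)
    features of clusters A and B; long_* stand for the i-vectors estimated
    from voice quality and prosodic features of the same two clusters.
    The convex combination is an assumed form of the linear weighting.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    s_short = cosine_similarity(short_a, short_b)
    s_long = cosine_similarity(long_a, long_b)
    return weight * s_short + (1.0 - weight) * s_long
```

In a clustering loop, a score like this would be computed for each pair of candidate clusters, with the pair scoring highest being the most likely to belong to the same speaker. How the weight is tuned and when merging stops are not specified in this record.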

Related documents

Other documents by the same author

The use of long-term features for GMM- and i-vector-based speaker diarization systems

Zewoudie, Abraham Woubie; Luque, Jordi; Hernando Pericás, Francisco Javier

Using voice-quality measurements with prosodic and spectral features for speaker diarization

Zewoudie, Abraham Woubie; Luque, Jordi; Hernando Pericás, Francisco Javier

Jitter and Shimmer measurements for speaker diarization

Zewoudie, Abraham Woubie; Luque, Jordi; Hernando Pericás, Francisco Javier

Simultaneous speech detection with spatial features for speaker diarization

Zelenak, Martin; Segura Perales, Carlos; Luque, Jordi; Hernando Pericás, Francisco Javier

Multimodal identification and localization of users in a smart environment

Salah, Albert Ali; Morros Rubió, Josep Ramon; Luque, Jordi; Segura Perales, Carlos; Hernando Pericás, Francisco Javier; Ambekar, Onkar; Schouten, Ben; Pauwels, Eric
