To access the full-text documents, please follow this link: http://hdl.handle.net/2117/101681
dc.contributor | Universitat Politècnica de Catalunya. Departament de Teoria del Senyal i Comunicacions |
---|---|
dc.contributor | Universitat Politècnica de Catalunya. VEU - Grup de Tractament de la Parla |
dc.contributor.author | Zewoudie, Abraham Woubie |
dc.contributor.author | Luque, Jordi |
dc.contributor.author | Hernando Pericás, Francisco Javier |
dc.date | 2016 |
dc.identifier.citation | Zewoudie, A., Luque, J., Hernando, J. Short- and long-term speech features for hybrid HMM-i-vector based speaker diarization system. In: The Speaker and Language Recognition Workshop. "ODYSSEY 2016 - The Speaker and Language Recognition Workshop". Bilbao: 2016, p. 400-406. |
dc.identifier.citation | 10.21437/Odyssey.2016-58 |
dc.identifier.uri | http://hdl.handle.net/2117/101681 |
dc.description.abstract | i-vectors have been successfully applied in speaker recognition tasks in recent years. This work assesses the suitability of i-vector modeling for the speaker diarization task. In this context, a weighted cosine distance between two different sets of i-vectors is proposed for speaker clustering. Speech clusters generated by Viterbi segmentation are first modeled by two different i-vectors. Whilst the first i-vector represents the distribution of the commonly used short-term Mel Frequency Cepstral Coefficients, the second one represents a selection of voice quality and prosodic features. In order to combine both short- and long-term speech statistics, the cosine-distance scores of those two i-vectors are linearly weighted to obtain a single similarity score. The final fused score is then used as the speaker clustering distance. Our experimental results on two different evaluation sets of the Augmented Multi-party Interaction corpus show the suitability of combining both sources of information within the i-vector space. They also show that the i-vector based clustering technique provides a significant improvement, in terms of diarization error rate, over clustering based on Gaussian Mixture Modeling. Furthermore, this work reports a significant speaker error reduction from augmenting short-term based i-vector clustering with a second i-vector estimated from voice quality and prosody related speech features. |
dc.description.abstract | Peer Reviewed |
dc.language.iso | eng |
dc.relation | http://www.isca-speech.org/archive/Odyssey_2016/pdfs/18.pdf |
dc.relation | info:eu-repo/grantAgreement/EC/H2020/645323/EU/BIg Speech data analytics for cONtact centres/BISON |
dc.rights | info:eu-repo/semantics/openAccess |
dc.subject | Àrees temàtiques de la UPC::Enginyeria de la telecomunicació::Processament del senyal::Processament de la parla i del senyal acústic |
dc.subject | Automatic speech recognition |
dc.subject | i-vectors |
dc.subject | Speaker recognition |
dc.subject | Speaker error reduction |
dc.subject | Reconeixement automàtic de la parla |
dc.title | Short- and long-term speech features for hybrid HMM-i-vector based speaker diarization system |
dc.type | info:eu-repo/semantics/publishedVersion |
dc.type | info:eu-repo/semantics/conferenceObject |
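
The abstract describes fusing two cosine scores, one from the short-term (MFCC) i-vectors and one from the voice quality and prosodic i-vectors, through a linear weighting. The following is a minimal sketch of that idea only, not the authors' implementation. The function names, the use of plain Python lists for i-vectors, and the default weight of 0.9 are all assumptions made for illustration; the record does not state the weight used in the paper.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def fused_score(short_a, short_b, long_a, long_b, weight=0.9):
    """Linearly weighted fusion of two cosine scores for a pair of clusters.

    short_a, short_b: hypothetical short-term (MFCC) i-vectors of clusters A and B.
    long_a, long_b:   hypothetical voice quality / prosodic i-vectors of A and B.
    weight:           assumed fusion weight in [0, 1]; not taken from the paper.
    """
    s_short = cosine_similarity(short_a, short_b)
    s_long = cosine_similarity(long_a, long_b)
    return weight * s_short + (1.0 - weight) * s_long


if __name__ == "__main__":
    # Toy two-dimensional vectors, purely illustrative.
    print(fused_score([1.0, 0.0], [1.0, 0.0], [1.0, 0.0], [0.0, 1.0], weight=0.75))
```

In a clustering loop, a score like this would be computed for every pair of candidate clusters, and the pair with the highest fused similarity would be the natural candidate for merging; how the paper selects and stops merges is not described in this record.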