On the use of agglomerative and spectral clustering in speaker diarization of meetings


To access the full-text documents, please follow this link: http://hdl.handle.net/2117/18147

Title:	On the use of agglomerative and spectral clustering in speaker diarization of meetings
Author:	Hernando Pericás, Francisco Javier
Other authors:	Universitat Politècnica de Catalunya. Departament de Teoria del Senyal i Comunicacions; Universitat Politècnica de Catalunya. VEU - Grup de Tractament de la Parla
Abstract:	In this paper, we present a clustering algorithm for speaker diarization based on spectral clustering. State-of-the-art diarization systems are based on agglomerative hierarchical clustering (AHC) using the Bayesian Information Criterion (BIC) and other statistical metrics among clusters, which results in a high computational cost and a time-demanding approach. Our proposal avoids the use of such metrics by applying Euclidean distances to the eigenvectors computed from the normalized graph Laplacian. A hybrid system is proposed in which HMM/GMM modelling and Viterbi alignment are still applied, but the BIC-based merging and stopping criteria are replaced by a spectral clustering algorithm. Once an initial segmentation is obtained and the clustering alignment is computed using the Viterbi algorithm, the remaining clusters are modeled by stacking the means of the Gaussians in a supervector. In this space, the singular value decomposition of the associated normalized graph Laplacian is computed. The most similar clusters are merged based on the Euclidean distances in the resulting eigenspace. The number of clusters is estimated by analyzing the eigenstructure of the similarity matrix and selecting a threshold on the eigenvalue gap. In experiments, this approach has obtained performance comparable to the traditional AHC+BIC approach on the Rich Transcription conference evaluation data. Although it still relies on Gaussian modelling of clusters and Viterbi alignment, the proposed approach leads to a system that runs several times faster than the traditional one.
Abstract:	Peer Reviewed
Subject(s):	-Àrees temàtiques de la UPC::Enginyeria de la telecomunicació::Processament del senyal::Processament de la parla i del senyal acústic -Automatic speech recognition -Reconeixement automàtic de la parla
Rights:
Document type:	Article - Published version; Conference object
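
The spectral step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the Gaussian-kernel affinity, the `sigma` and `gap_threshold` parameters, the row normalization of the embedding, and all function names are assumptions introduced here for illustration. Each row of `supervectors` stands in for one cluster's stacked Gaussian means.

```python
import numpy as np


def affinity_eigendecomposition(supervectors, sigma=1.0):
    """Eigen-decompose the normalized affinity D^{-1/2} A D^{-1/2}.

    The normalized graph Laplacian is I minus this matrix, so both share
    eigenvectors. The Gaussian kernel and sigma are assumed choices.
    """
    X = np.asarray(supervectors, dtype=float)
    # Pairwise squared Euclidean distances between cluster supervectors.
    sq_dist = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    affinity = np.exp(-sq_dist / (2.0 * sigma ** 2))
    np.fill_diagonal(affinity, 0.0)  # no self-similarity
    degree = affinity.sum(axis=1)
    inv_sqrt = 1.0 / np.sqrt(np.maximum(degree, 1e-12))
    normalized = inv_sqrt[:, None] * affinity * inv_sqrt[None, :]
    # eigh returns ascending eigenvalues; reverse to descending order.
    eigvals, eigvecs = np.linalg.eigh(normalized)
    return eigvals[::-1], eigvecs[:, ::-1]


def estimate_num_clusters(eigvals, gap_threshold=0.1):
    """Pick k at the first eigenvalue gap above a threshold (assumed rule)."""
    gaps = eigvals[:-1] - eigvals[1:]
    above = np.nonzero(gaps > gap_threshold)[0]
    return int(above[0]) + 1 if above.size else len(eigvals)


def closest_pair_in_eigenspace(eigvecs, k):
    """Return the two clusters closest in the k-dimensional eigenspace."""
    emb = eigvecs[:, :k]
    # Row normalization is an assumption, common in spectral clustering.
    norms = np.linalg.norm(emb, axis=1, keepdims=True)
    emb = emb / np.maximum(norms, 1e-12)
    dist = np.linalg.norm(emb[:, None, :] - emb[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)  # ignore zero self-distances
    i, j = np.unravel_index(np.argmin(dist), dist.shape)
    return int(i), int(j)


# Toy usage: six hypothetical supervectors forming two separated groups.
toy = np.array([[0, 0], [0.1, 0], [0, 0.1],
                [10, 10], [10.1, 10], [10, 10.1]])
vals, vecs = affinity_eigendecomposition(toy, sigma=1.0)
k = estimate_num_clusters(vals, gap_threshold=0.1)
pair = closest_pair_in_eigenspace(vecs, k)
```

In a full diarization loop, the pair returned by `closest_pair_in_eigenspace` would be merged, the models retrained and realigned, and the step repeated until the number of clusters reaches the estimate `k`; that outer loop is omitted here because the record does not specify its details.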

Related documents

Other documents by the same author

On the improvement of speaker diarization by detecting overlapped speech

Hernando Pericás, Francisco Javier

The detection of overlapping speech with prosodic features for speaker diarization

Zelenak, Martin; Hernando Pericás, Francisco Javier

Two-source acoustic event detection and localization: online implementation in a smart-room

Butko, Taras; Gonzalez Pla, Fran; Segura Perales, Carlos; Nadeu Camprubí, Climent; Hernando Pericás, Francisco Javier

Real-time GPU-based face detection in HD video sequences

Oro, David; Fernández, Carles; Rodriguez Saeta, Javier; Martorell Bofill, Xavier; Hernando Pericás, Francisco Javier

A conversation analysis framework using speech recognition and naïve bayes classification for construction process monitoring

Zhang, T.; Lee, Y. C.; Zhu, Y.; Hernando Pericás, Francisco Javier
