2026-04-14T06:55:54Z
https://recercat.cat/oai/request

oai:recercat.cat:2117/112988
2026-01-14T05:51:24Z
com_2072_1033
col_2072_452950

LSTM neural network-based speaker segmentation using acoustic and language modelling

India Massana, Miquel Àngel; Rodríguez Fonollosa, José Adrián; Hernando Pericás, Francisco Javier

Universitat Politècnica de Catalunya. Departament de Teoria del Senyal i Comunicacions
Universitat Politècnica de Catalunya. VEU - Grup de Tractament de la Parla

UPC subject areas::Telecommunications engineering::Signal processing::Speech and acoustic signal processing
Automatic speech recognition; Neural networks (Neurobiology); Speaker segmentation; Neural language modelling; I-vectors; Speaker factors; LSTM neural networks

This paper presents a new speaker change detection system based on Long Short-Term Memory (LSTM) neural networks using acoustic data and linguistic content. Language modelling is combined with two different Joint Factor Analysis (JFA) acoustic approaches: i-vectors and speaker factors. Both are compared with a baseline algorithm that uses cosine distance to detect speaker turn changes. LSTM neural networks with both linguistic and acoustic features have been able to produce a robust speaker segmentation. The experimental results show that our proposal clearly outperforms the baseline system.

Peer Reviewed
Postprint (published version)
2017
Conference lecture

India, M., Fonollosa, José A. R., Hernando, J. LSTM neural network-based speaker segmentation using acoustic and language modelling. In: Annual Conference of the International Speech Communication Association. "INTERSPEECH 2017: 20-24 August 2017: Stockholm". Stockholm: International Speech Communication Association (ISCA), 2017, p. 2834-2838.

1990-9772
https://hdl.handle.net/2117/112988
10.21437/Interspeech.2017
eng
http://www.isca-speech.org/archive/Interspeech_2017/pdfs/0407.PDF
info:eu-repo/grantAgreement/EC/H2020/115902/EU/Remote Assessment of Disease and Relapse in Central Nervous System Disorders/RADAR-CNS
Open Access
5 p.
application/pdf
International Speech Communication Association (ISCA)