<?xml version="1.0" encoding="UTF-8"?><?xml-stylesheet type="text/xsl" href="static/style.xsl"?><OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/ http://www.openarchives.org/OAI/2.0/OAI-PMH.xsd"><responseDate>2026-04-18T05:17:36Z</responseDate><request verb="GetRecord" identifier="oai:recercat.cat:2117/180457" metadataPrefix="oai_dc">https://recercat.cat/oai/request</request><GetRecord><record><header><identifier>oai:recercat.cat:2117/180457</identifier><datestamp>2026-02-07T09:30:50Z</datestamp><setSpec>com_2072_1033</setSpec><setSpec>col_2072_452950</setSpec></header><metadata><oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:doc="http://www.lyncode.com/xoai" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd">
   <dc:title>Time-domain speech enhancement using generative adversarial networks</dc:title>
   <dc:creator>Pascual de la Puente, Santiago</dc:creator>
   <dc:creator>Serra, Joan</dc:creator>
   <dc:creator>Bonafonte Cávez, Antonio</dc:creator>
   <dc:contributor>Universitat Politècnica de Catalunya. Doctorat en Teoria del Senyal i Comunicacions</dc:contributor>
   <dc:contributor>Universitat Politècnica de Catalunya. Departament de Teoria del Senyal i Comunicacions</dc:contributor>
   <dc:contributor>Universitat Politècnica de Catalunya. VEU - Grup de Tractament de la Parla</dc:contributor>
   <dc:subject>Àrees temàtiques de la UPC::Enginyeria de la telecomunicació</dc:subject>
   <dc:subject>Speech processing systems</dc:subject>
   <dc:subject>Neural networks (Computer science)</dc:subject>
   <dc:subject>Speech enhancement</dc:subject>
   <dc:subject>Audio transformation</dc:subject>
   <dc:subject>Generative adversarial network</dc:subject>
   <dc:subject>Neural networks</dc:subject>
   <dc:subject>Processament de la parla</dc:subject>
   <dc:subject>Reconeixement automàtic de la parla</dc:subject>
   <dc:subject>Xarxes neuronals (Informàtica)</dc:subject>
   <dc:description>Speech enhancement improves recorded voice utterances to eliminate noise that might be impeding their intelligibility or compromising their quality. Typical speech enhancement systems are based on regression approaches that subtract noise or predict clean signals. Most of them do not operate directly on waveforms. In this work, we propose a generative approach to regenerate corrupted signals into a clean version by using generative adversarial networks on the raw signal. We also explore several variations of the proposed system, obtaining insights into proper architectural choices for an adversarially trained, convolutional autoencoder applied to speech. We conduct both objective and subjective evaluations to assess the performance of the proposed method. The former helps us choose among variations and better tune hyperparameters, while the latter is used in a listening experiment with 42 subjects, confirming the effectiveness of the approach in the real world. We also demonstrate the applicability of the approach for more generalized speech enhancement, where we have to regenerate voices from whispered signals.</dc:description>
   <dc:description>Peer Reviewed</dc:description>
   <dc:description>Postprint (author's final draft)</dc:description>
   <dc:date>2019-11-01</dc:date>
   <dc:type>Article</dc:type>
   <dc:identifier>Pascual, S.; Serra, J.; Bonafonte, A. Time-domain speech enhancement using generative adversarial networks. "Speech communication", 1 Novembre 2019, vol. 114, p. 10-21.</dc:identifier>
   <dc:identifier>0167-6393</dc:identifier>
   <dc:identifier>https://hdl.handle.net/2117/180457</dc:identifier>
   <dc:identifier>10.1016/j.specom.2019.09.001</dc:identifier>
   <dc:language>eng</dc:language>
   <dc:relation>https://www.sciencedirect.com/science/article/abs/pii/S0167639319301359</dc:relation>
   <dc:relation>info:eu-repo/grantAgreement/MINECO//TEC2015-69266-P/ES/TECNOLOGIAS DE APRENDIZAJE PROFUNDO APLICADAS AL PROCESADO DE VOZ Y AUDIO/</dc:relation>
   <dc:rights>Open Access</dc:rights>
   <dc:format>12 p.</dc:format>
   <dc:format>application/pdf</dc:format>
</oai_dc:dc></metadata></record></GetRecord></OAI-PMH>