Wav2Pix: speech-conditioned face generation using generative adversarial networks

To access the full text documents, please follow this link: http://hdl.handle.net/2117/167073

Title:	Wav2Pix: speech-conditioned face generation using generative adversarial networks
Author:	Cardoso Duarte, Amanda; Roldan, Francisco; Tubau, Miquel; Escur, Janna; Pascual de la Puente, Santiago; Salvador Aguilera, Amaia; Mohedano, Eva; McGuinness, Kevin; Torres Viñals, Jordi; Giró Nieto, Xavier
Other authors:	Universitat Politècnica de Catalunya. Doctorat en Teoria del Senyal i Comunicacions; Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors; Universitat Politècnica de Catalunya. Departament de Teoria del Senyal i Comunicacions; Universitat Politècnica de Catalunya. VEU - Grup de Tractament de la Parla; Universitat Politècnica de Catalunya. CAP - Grup de Computació d'Altes Prestacions; Universitat Politècnica de Catalunya. GPI - Grup de Processament d'Imatge i Vídeo
Abstract:	Speech is a rich biometric signal that contains information about the identity, gender and emotional state of the speaker. In this work, we explore its potential to generate face images of a speaker by conditioning a Generative Adversarial Network (GAN) on raw speech input. We propose a deep neural network that is trained from scratch in an end-to-end fashion, generating a face directly from the raw speech waveform without any additional identity information (e.g., a reference image or one-hot encoding). Our model is trained in a self-supervised manner by exploiting the audio and visual signals naturally aligned in videos. To train from video data, we present a novel dataset collected for this work, with high-quality videos of YouTubers with notable expressiveness in both the speech and visual signals.
Description:	Peer Reviewed
Subject(s):	UPC subject areas::Computer science::Artificial intelligence::Machine learning; UPC subject areas::Telecommunications engineering::Signal processing::Pattern recognition; Machine learning; Computer vision; Face; Videos; Generators; Visualization; Feature extraction; Generative adversarial networks; Deep learning; Adversarial learning; Face synthesis
Rights:
Document type:	Article - Published version; Conference Object
Published by:	Institute of Electrical and Electronics Engineers (IEEE)