<?xml version="1.0" encoding="UTF-8"?><?xml-stylesheet type="text/xsl" href="static/style.xsl"?><OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/ http://www.openarchives.org/OAI/2.0/OAI-PMH.xsd"><responseDate>2026-04-17T06:48:14Z</responseDate><request verb="GetRecord" identifier="oai:www.recercat.cat:2117/20204" metadataPrefix="oai_dc">https://recercat.cat/oai/request</request><GetRecord><record><header><identifier>oai:recercat.cat:2117/20204</identifier><datestamp>2025-07-17T00:55:18Z</datestamp><setSpec>com_2072_1033</setSpec><setSpec>col_2072_452950</setSpec></header><metadata><oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:doc="http://www.lyncode.com/xoai" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd">
   <dc:title>Modelling the effects of spontaneous speech in speech recognition</dc:title>
   <dc:creator>Shulz, Henrik</dc:creator>
   <dc:creator>Rodríguez Fonollosa, José Adrián</dc:creator>
   <dc:contributor>Universitat Politècnica de Catalunya. Departament de Teoria del Senyal i Comunicacions</dc:contributor>
   <dc:contributor>Universitat Politècnica de Catalunya. VEU - Grup de Tractament de la Parla</dc:contributor>
   <dc:subject>Àrees temàtiques de la UPC::Enginyeria de la telecomunicació::Processament del senyal::Processament de la parla i del senyal acústic</dc:subject>
   <dc:subject>Automatic speech recognition</dc:subject>
   <dc:subject>Reconeixement automàtic de la parla</dc:subject>
   <dc:description>Intrinsic variability of the speaker in spontaneous speech
remains a challenge to state-of-the-art automatic speech
recognition (ASR). While planned speech exhibits
moderate variability, the significant variability of spontaneous
speech is caused by situation, context, intention,
emotion and listeners. This conditioning of speech is observable
in terms of speaking rate and in feature space.
We analysed broadcast news (BN) and broadcast conversational
(BC) speech in terms of phoneme rate (PR) and
feature space reduction (FSR), and contrasted both with
the planned speech data. The comparison revealed strong, statistically
significant differences. We cluster the speech segments
with respect to their degree of PR and FSR, forming
a set of variability classes, and incorporate the variability
classes into the hidden Markov model (HMM) based
acoustic model (AM).
In recognition we follow two approaches: the first
considers the variability class as a context variable, the second
relies on prior estimation of the variability class after
the first pass of a multi-pass recognition system. Besides
explicitly modelling the intrinsic speech variability
of the speaker, we furthermore segregate the general
speaker-specific characteristics by means of speaker
adaptive training (SAT) into feature space transforms using
Constrained Maximum Likelihood Linear Regression
(CMLLR), and apply the adaptive approach in third-pass
recognition.
By modelling both within-speaker variation
and between-speaker variation in spontaneous
speech, we address two fundamental sources of speech variability that determine the performance of ASR systems.</dc:description>
   <dc:description>Peer Reviewed</dc:description>
   <dc:description>Postprint (published version)</dc:description>
   <dc:date>2013</dc:date>
   <dc:type>Conference report</dc:type>
   <dc:identifier>Shulz, H.; Fonollosa, José A. R. Modelling the effects of spontaneous speech in speech recognition. A: Speech Processing Conference. "2013 Speech Processing Conference: conference proceedings: July 1-2, 2013: AFEKA, Tel-Aviv Academic College of Engineering". Tel-Aviv: 2013.</dc:identifier>
   <dc:identifier>https://hdl.handle.net/2117/20204</dc:identifier>
   <dc:language>eng</dc:language>
   <dc:relation>https://events.eventact.com/afeka/aclp2012/Modelling%20the%20Effects%20of%20Spontaneous%20Speech%20in%20Speech%20Recognition_Schulz%20et%20al.pdf</dc:relation>
   <dc:rights>http://creativecommons.org/licenses/by-nc-nd/3.0/es/</dc:rights>
   <dc:rights>Open Access</dc:rights>
   <dc:rights>Attribution-NonCommercial-NoDerivs 3.0 Spain</dc:rights>
   <dc:format>application/pdf</dc:format>
</oai_dc:dc></metadata></record></GetRecord></OAI-PMH>