
oai:recercat.cat:2117/103813 2025-07-17T08:51:59Z com_2072_1033 col_2072_452950

Nadeu Camprubí, Climent (author); Macho, D (author); Hernando Pericás, Francisco Javier (author)

2000

All speech recognition systems require some form of signal representation that parametrically models the temporal evolution of the spectral envelope. Current parameterizations involve, either explicitly or implicitly, a set of energies from frequency bands, which are often distributed on a mel scale. The computation of those filterbank energies (FBEs) always includes smoothing of basic spectral measurements and non-linear amplitude compression. A variety of linear transformations are typically applied to this time-frequency representation prior to the Hidden Markov Model (HMM) pattern-matching stage of recognition. In this paper, we discuss some robustness issues involved in both the computation of the FBEs and the subsequent linear transformations, presenting alternative techniques that can improve robustness in additive noise conditions. In particular, the root non-linearity, a voicing-dependent FBE computation technique, and a time and frequency filtering (tiffing) technique are considered. Recognition results for the Aurora database are shown to illustrate the potential application of these alternative techniques for enhancing the robustness of speech recognition systems.

Peer Reviewed

Postprint (published version)

UPC subject areas::Telecommunications engineering

Telecommunication

Improving the robustness of the usual FBE-based ASR front-end