Abstract:
|
The globalization of mass media introduces the challenge of multilingualism into the most popular speech applications, such as text-to-speech synthesis and automatic speech recognition. In Spain, as in other countries, the usage of English words is
growing rapidly; however, owing to the linguistic diversity of the languages spoken across the country, Spanish is no less influenced by inclusions from the four official languages. This work focuses on the pronunciation of Catalan inclusions in Spanish utterances. Our goal was to approach the nativization phenomenon with data-driven methods, making the approach easily transferable
to other languages without loss in performance. For this task, training and test nativization corpora were manually crafted, and the task itself was approached using pronunciation by analogy. The results were encouraging and showed that even a small corpus of 1000 words is enough to capture the analogy in the nativization process. The resulting pronunciations yielded significant improvements in the intelligibility of Catalan inclusions
in Spanish utterances. |