Record identifier: oai:recercat.cat:2117/444429 (datestamp 2026-02-04T04:21:54Z; sets com_2072_1033, col_2072_452950)

Title: A linguistic features-based approach for the functional analysis of disinformation in Spanish
Authors: Puraivan Huenumás, Eduardo; Riquelme Csori, Fabian Rolando; Venegas Velásquez, René
Contributors: Universitat Politècnica de Catalunya. Departament de Ciències de la Computació; Universitat Politècnica de Catalunya. ALBCOM - Algorísmia, Bioinformàtica, Complexitat i Mètodes Formals
Subjects: UPC subject areas::Computer science::Artificial intelligence::Natural language; UPC subject areas::Computer science::Artificial intelligence::Machine learning
Keywords: Information disorder; Disinformation; Natural language processing; Linguistic feature; Machine learning

Abstract: Information disorder has significant negative impacts on contemporary societies. This study presents a hybrid methodology that combines machine learning and natural language processing to analyze corpora of disinformation texts in Spanish. The approach adapts linguistic features originally developed for English to Spanish, a major but less-researched language, and incorporates 251 features organized into six categories, surpassing previous methods in both the number and the organization of features. Applied to the CLNews dataset of Spanish rumors, the analysis identified 17 features with statistically significant differences between false and real rumors. The linguistic analysis reveals that false rumors are characterized by more emotional language, greater sentence fragmentation, frequent use of auxiliary verbs, and lower information density, which creates an appearance of detail. Additionally, BERT, a large language model (LLM), was used to identify five topics among the false rumors, each exhibiting a different strategy in terms of fragmentation, grammatical complexity, and information density. Finally, the linguistic features were used to build machine learning classifiers, with a linear SVM achieving 86% accuracy.
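This record does not say which statistical test was used to find the 17 features that differ between false and real rumors. Purely as an illustration, the sketch below assumes a two-sided Mann-Whitney U test (normal approximation, average ranks for ties, no tie correction in the variance) applied to each feature independently, with an optional Bonferroni adjustment. The functions `mann_whitney_u` and `screen_features`, the feature names, and the toy values are hypothetical and are not taken from the paper or from CLNews.

```python
import math
from typing import Sequence


def mann_whitney_u(x: Sequence[float], y: Sequence[float]) -> tuple[float, float]:
    """Two-sided Mann-Whitney U test using the normal approximation.

    Returns (U statistic of sample x, two-sided p-value). Tied values get
    average ranks, but the variance is not tie-corrected (a simplification).
    """
    n1, n2 = len(x), len(y)
    # Pool both samples, remembering each value's group (0 = x, 1 = y).
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        # Find the run of tied values starting at position i.
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tied run
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1
    rank_sum_x = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u_x = rank_sum_x - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u_x - mean_u) / sd_u
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return u_x, p_value


def screen_features(
    false_rows: Sequence[Sequence[float]],
    real_rows: Sequence[Sequence[float]],
    names: Sequence[str],
    alpha: float = 0.05,
    bonferroni: bool = True,
) -> list[tuple[str, float, float]]:
    """Return (name, U, p) for each feature whose p-value is below the threshold."""
    threshold = alpha / len(names) if bonferroni else alpha
    selected = []
    for j, name in enumerate(names):
        u, p = mann_whitney_u([row[j] for row in false_rows],
                              [row[j] for row in real_rows])
        if p < threshold:
            selected.append((name, u, p))
    return selected


# Hypothetical toy data: three features per text, six false and six real rumors.
names = ["emotion_ratio", "aux_verb_ratio", "word_count"]
false_rows = [[0.30, 0.12, 40], [0.35, 0.15, 42], [0.28, 0.11, 39],
              [0.40, 0.14, 41], [0.33, 0.13, 40], [0.37, 0.16, 43]]
real_rows = [[0.10, 0.05, 41], [0.12, 0.06, 40], [0.08, 0.04, 42],
             [0.15, 0.07, 39], [0.11, 0.05, 43], [0.09, 0.06, 40]]

for name, u, p in screen_features(false_rows, real_rows, names):
    print(f"{name}: U={u:.1f}, p={p:.4f}")
```

On this toy data the two hypothetical ratio features are flagged and `word_count` is not, because its values have the same distribution in both groups. With 251 features, as in the study, a Bonferroni threshold would be 0.05 / 251 ≈ 0.0002; whether the authors applied any multiple-comparison correction is not stated in this record.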
This methodology offers a replicable framework for future research on disinformation and text analysis in Spanish and enhances the interpretability of results. It shows that classical machine learning models trained on carefully chosen linguistic features can deliver competitive results, surpassing BETO (57%) and RoBERTa-BNE (64%) in accuracy on the CLNews dataset. Moreover, these models perform well when the same features are applied to a different dataset, and they continue to perform well when the feature selection is adjusted to the new context.

Funding: This work was supported in part by the Project of Pluralismo through the National Agency for Research and Development (ANID), Chile, under Grant PLU230017. The work of Eduardo Puraivan was supported in part by the Escuela de Ingeniería Informática, Universidad de Valparaíso, Chile, under Grant 01.016/2020, and in part by the Beca de Doctorado Nacional, ANID, under Grant 21232242.

Peer Reviewed

Sustainable Development Goals:
11 - Sustainable Cities and Communities
11.b - By 2020, substantially increase the number of cities and human settlements adopting and implementing integrated policies and plans towards inclusion, resource efficiency, mitigation and adaptation to climate change, and resilience to disasters, and develop and implement, in line with the Sendai Framework for Disaster Risk Reduction 2015-2030, holistic disaster risk management at all levels
16 - Peace, Justice and Strong Institutions
16.5 - Substantially reduce corruption and bribery in all their forms
16.7 - Ensure responsive, inclusive, participatory and representative decision-making at all levels
16.10 - Ensure public access to information and protect fundamental freedoms, in accordance with national legislation and international agreements

Version: Postprint (published version)
Date: 2025
Type: Article
Citation: Puraivan, E.; Riquelme, F.; Venegas, R. A linguistic features-based approach for the functional analysis of disinformation in Spanish. «IEEE Access», 2025, vol. 13, p. 140205-140222.
ISSN: 2169-3536
Handle: https://hdl.handle.net/2117/444429
DOI: 10.1109/ACCESS.2025.3595750
Language: English (eng)
Publisher version: https://ieeexplore.ieee.org/document/11112592
License: Attribution 4.0 International (http://creativecommons.org/licenses/by/4.0/)
Access: Open Access
Extent: 18 p.
Format: application/pdf
Publisher: Institute of Electrical and Electronics Engineers (IEEE)