A novel Spanish dataset for financial education text simplification targeting visually impaired individuals

Pérez-Rojas, Nelson; Calderón Ramírez, Saúl; Solís, Martín; Romero-Sandoval, Mario Alberto; Arias-Monge, Monica; Saggion, Horacio

To access the full text documents, please follow this link: https://hdl.handle.net/10230/70913

Author

Pérez-Rojas, Nelson

Calderón Ramírez, Saúl

Solís, Martín

Romero-Sandoval, Mario Alberto

Arias-Monge, Monica

Saggion, Horacio

Publication date

2025-07-16T07:21:39Z

2025

Abstract

Automatic Text Simplification (ATS) is a crucial task in natural language processing, aimed at making texts more comprehensible, particularly for specific groups such as individuals with visual impairments. One of the primary challenges in developing models for ATS is the scarcity of data, especially in Spanish. This manuscript introduces a novel dataset tailored for Spanish speakers with visual impairments, consisting of 5,314 pairs of original and simplified sentences created using established simplification rules. Additionally, we evaluate the feasibility of augmenting this dataset using large language models such as Generative Pre-trained Transformer 3 (GPT-3), TUNER, and Multilingual T5 (mT5). We compare the simplifications generated by these models with our dataset to assess their effectiveness in data augmentation. The characteristics of our dataset and the findings from these comparisons are discussed in detail. The dataset is publicly available on Hugging Face at https://huggingface.co/datasets/saul1917/FEINA.

The work of Horacio Saggion was supported in part by the Maria de Maeztu Units of Excellence Program, funded by MCIN/AEI/10.13039/501100011033 under Grant CEX2021-001195-M; and in part by European Union’s Horizon Europe Research and Innovation Program through the iDEM Project under Grant 101132431.

Document Type

Article

Published version

Language

English

Subjects and keywords

Complexity theory; Measurement; Standards; Multilingual; Manuals; Guidelines; Benchmark testing; Annotations; Visualization; Tuners

Publisher

Institute of Electrical and Electronics Engineers (IEEE)

Related items

IEEE Access. 2025;13:87472-84

info:eu-repo/grantAgreement/EC/H2020/101132431

info:eu-repo/grantAgreement/ES/2PE/CEX2021-001195-M

Rights

http://creativecommons.org/licenses/by/4.0/

This item appears in the following Collection(s)

Recerca: articles, congressos, llibres [21034]
