NewsCom-TOX: A corpus of comments on news articles annotated for toxicity in Spanish

dc.contributor.author
Taulé Delor, Mariona
dc.contributor.author
Nofre, Montserrat
dc.contributor.author
Bargiela, Víctor
dc.contributor.author
Bonet, Xavier
dc.date.issued
2025-04-02T16:11:33Z
dc.date.issued
2024-01-17
dc.identifier
1574-020X
dc.identifier
https://hdl.handle.net/2445/220216
dc.identifier
741843
dc.description.abstract
In this article, we present the NewsCom-TOX corpus, a new corpus manually annotated for toxicity in Spanish. NewsCom-TOX consists of 4359 comments in Spanish posted in response to 21 news articles on social media related to immigration, collected in order to analyse and identify messages with racial and xenophobic content. The corpus is annotated at multiple levels with different binary linguistic categories (stance, target, stereotype, sarcasm, mockery, insult, improper language, aggressiveness and intolerance), taking into account not only the information conveyed in each comment, but also the whole discourse thread in which the comment occurs, as well as the information conveyed in the news article, including its images. These categories allow us to identify the presence of toxicity and its intensity, that is, the level of toxicity of each comment. All this information is available for research purposes upon request. Here we describe the NewsCom-TOX corpus, the annotation tagset used, the criteria applied and the annotation process carried out, including the inter-annotator agreement tests conducted. A quantitative analysis of the results obtained is also provided. NewsCom-TOX is a linguistic resource that will be valuable for both linguistic and computational research in Spanish in NLP tasks for the detection of toxic information.
dc.format
41 p.
dc.format
application/pdf
dc.language
eng
dc.publisher
Springer Verlag
dc.relation
Postprint version of the document published at: https://doi.org/10.1007/s10579-023-09711-x
dc.relation
Language Resources and Evaluation, 2023, num. 58, p. 1115-1155
dc.relation
https://doi.org/10.1007/s10579-023-09711-x
dc.rights
(c) Springer Verlag, 2023
dc.rights
info:eu-repo/semantics/openAccess
dc.source
Articles published in journals (Filologia Catalana i Lingüística General)
dc.subject
Television broadcasting of news
dc.subject
Fake news
dc.subject
Spanish language
dc.subject
Corpora (Linguistics)
dc.title
NewsCom-TOX: A corpus of comments on news articles annotated for toxicity in Spanish
dc.type
info:eu-repo/semantics/article
dc.type
info:eu-repo/semantics/acceptedVersion

