Stereohoax: a multilingual corpus of racial hoaxes and social media reactions annotated for stereotypes

dc.contributor.author
Schmeisser-Nieto, Wolfgang S.
dc.contributor.author
Cignarella, Alessandra Teresa
dc.contributor.author
Bourgeade, Tom
dc.contributor.author
Frenda, Simona
dc.contributor.author
Ariza Casabona, Alejandro
dc.contributor.author
Laurent, Mario
dc.contributor.author
Cicirelli, Paolo Giovanni
dc.contributor.author
Marra, Andrea
dc.contributor.author
Corbelli, Giuseppe
dc.contributor.author
Benamara, Farah
dc.contributor.author
Bosco, Cristina
dc.contributor.author
Moriceau, Véronique
dc.contributor.author
Paciello, Marinella
dc.contributor.author
Taulé Delor, Mariona
dc.contributor.author
D'Errico, Francesca
dc.date.issued
2025-03-06T18:09:23Z
dc.date.issued
2025-12-18T06:10:29Z
dc.date.issued
2024-12-19
dc.identifier
1574-020X
dc.identifier
https://hdl.handle.net/2445/219516
dc.identifier
756918
dc.description.abstract
Stereotypes have been studied extensively in the fields of social psychology and, especially with the recent advances in technology, in computational linguistics. Stereotypes have also gained even more attention nowadays because of a notable rise in their dissemination due to demographic changes and world events. This paper focuses on ethnic stereotypes related to immigration and presents the StereoHoax corpus, a multilingual dataset of 17,814 tweets in French, Italian, and Spanish. The corpus includes conversational threads reporting on and responding to racial hoaxes about immigrants, which we define as false claims of unlawful actions attributed to specific ethnic groups. This work describes the data collection process and the fine-grained annotation scheme we used, which is based mainly on the Stereotype Content Model adapted to the study applied to immigrants of Bosco et al. (2023). Quantitative and qualitative analyses show the distribution and correlation of annotated categories across languages, revealing, for instance, intercultural differences in the expression of stereotypes through forms of discredit. To validate our data, we performed four machine learning experiments using pre-trained BERT-like models in order to lay a foundation for automatic stereotype detection research. Leveraging the StereoHoax corpus, we gained crucial insights into the importance of context, especially in relation to the detection of implicit stereotypes. Overall, we believe that the StereoHoax corpus will prove to be a valuable resource for the automatic detection of stereotypes regarding immigrants and the study of the linguistic and psychological patterns associated with their dissemination.
dc.format
39 p.
dc.format
application/pdf
dc.language
eng
dc.publisher
Springer Verlag
dc.relation
Postprint version of the document published at: https://doi.org/10.1007/s10579-024-09791-3
dc.relation
Language Resources and Evaluation, 2024
dc.relation
https://doi.org/10.1007/s10579-024-09791-3
dc.rights
(c) Springer Verlag, 2024
dc.rights
info:eu-repo/semantics/openAccess
dc.source
Articles published in journals (Filologia Catalana i Lingüística General)
dc.subject
Migrants
dc.subject
Social psychology
dc.title
Stereohoax: a multilingual corpus of racial hoaxes and social media reactions annotated for stereotypes
dc.type
info:eu-repo/semantics/article
dc.type
info:eu-repo/semantics/acceptedVersion

