Stereohoax: a multilingual corpus of racial hoaxes and social media reactions annotated for stereotypes

Schmeisser-Nieto, Wolfgang S.; Cignarella, Alessandra Teresa; Bourgeade, Tom; Frenda, Simona; Ariza Casabona, Alejandro; Laurent, Mario; Cicirelli, Paolo Giovanni; Marra, Andrea; Corbelli, Giuseppe; Benamara, Farah; Bosco, Cristina; Moriceau, Véronique; Paciello, Marinella; Taulé Delor, Mariona; D'Errico, Francesca

dc.contributor.author

Schmeisser-Nieto, Wolfgang S.

dc.contributor.author

Cignarella, Alessandra Teresa

dc.contributor.author

Bourgeade, Tom

dc.contributor.author

Frenda, Simona

dc.contributor.author

Ariza Casabona, Alejandro

dc.contributor.author

Laurent, Mario

dc.contributor.author

Cicirelli, Paolo Giovanni

dc.contributor.author

Marra, Andrea

dc.contributor.author

Corbelli, Giuseppe

dc.contributor.author

Benamara, Farah

dc.contributor.author

Bosco, Cristina

dc.contributor.author

Moriceau, Véronique

dc.contributor.author

Paciello, Marinella

dc.contributor.author

Taulé Delor, Mariona

dc.contributor.author

D'Errico, Francesca

dc.date.issued

2025-03-06T18:09:23Z

dc.date.issued

2025-12-18T06:10:29Z

dc.date.issued

2024-12-19

dc.identifier

1574-020X

dc.identifier

https://hdl.handle.net/2445/219516

dc.identifier

756918

dc.description.abstract

Stereotypes have been studied extensively in the fields of social psychology and, especially with the recent advances in technology, in computational linguistics. Stereotypes have also gained even more attention nowadays because of a notable rise in their dissemination due to demographic changes and world events. This paper focuses on ethnic stereotypes related to immigration and presents the StereoHoax corpus, a multilingual dataset of 17,814 tweets in French, Italian, and Spanish. The corpus includes conversational threads reporting on and responding to racial hoaxes about immigrants, which we define as false claims of unlawful actions attributed to specific ethnic groups. This work describes the data collection process and the fine-grained annotation scheme we used, which is based mainly on the Stereotype Content Model adapted to the study applied to immigrants of Bosco et al. (2023). Quantitative and qualitative analyses show the distribution and correlation of annotated categories across languages, revealing, for instance, intercultural differences in the expression of stereotypes through forms of discredit. To validate our data, we performed four machine learning experiments using pre-trained BERT-like models in order to lay a foundation for automatic stereotype detection research. Leveraging the StereoHoax corpus, we gained crucial insights into the importance of context, especially in relation to the detection of implicit stereotypes. Overall, we believe that the StereoHoax corpus will prove to be a valuable resource for the automatic detection of stereotypes regarding immigrants and the study of the linguistic and psychological patterns associated with their dissemination.

dc.format

39 p.

dc.format

application/pdf

dc.language

eng

dc.publisher

Springer Verlag

dc.relation

Postprint version of the document published at: https://doi.org/10.1007/s10579-024-09791-3

dc.relation

Language Resources And Evaluation, 2024

dc.relation

https://doi.org/10.1007/s10579-024-09791-3

dc.rights

info:eu-repo/semantics/openAccess

dc.source

Articles published in journals (Filologia Catalana i Lingüística General)

dc.subject

Migrants

dc.subject

Social psychology

dc.title

Stereohoax: a multilingual corpus of racial hoaxes and social media reactions annotated for stereotypes

dc.type

info:eu-repo/semantics/article

dc.type

info:eu-repo/semantics/acceptedVersion

This item appears in the following collection(s)

Filologia Catalana i Lingüística General [949]

ISGlobal - Institut de Salut Global de Barcelona [60808]