SORS: LLMs for measurement in social science

Le Mens, Gaël

dc.contributor.author

Le Mens, Gaël

dc.date.accessioned

2026-02-11T01:27:54Z

dc.date.available

2026-02-11T01:27:54Z

dc.date.issued

2025-07-09

dc.identifier

Le Mens, G. SORS: LLMs for measurement in social science. A: Severo Ochoa Research Seminars at BSC. «10th Severo Ochoa Research Seminar Lectures at BSC, Barcelona, 2024-25». Barcelona: Barcelona Supercomputing Center, 2025, p. 171-172.

dc.identifier

https://hdl.handle.net/2117/454484

dc.identifier.uri

http://hdl.handle.net/2117/454484

dc.description.abstract

The seminar talk is based on some of my recent work that used LLMs to measure similarity in semantic spaces. I will report on using fine-tuned BERT and pre-trained instruction-tuned LLMs (such as GPT-4, Meta Llama 3, or Mixtral) to measure the typicality of text documents in concepts (tweets in political parties, books in literary genres) and to position text documents in policy and ideological spaces. I will also report on a systematic comparison of the performance of the most recent LLMs on these tasks and will outline a strategy for choosing among the available LLMs given the objectives and constraints of a specific research project. The talk is based on the following recent papers and some ongoing work:
1. Positioning Political Texts with Large Language Models by Asking and Averaging (with Aina Gallego). Uses recent LLMs (2023-2024) to position tweets, party manifestos, and political speeches in multiple languages in ideological spaces. Includes a comparison of the performance of various models, both proprietary and open.
2. Uncovering the Semantics of Concepts Using GPT-4 (with Balázs Kovács, Michael Hannan, & Guillem Pros). PNAS, 2023. Uses GPT-4 to measure the typicality of books in literary genres and the typicality of tweets in political parties, and compares it with other methods based on BERT, text embeddings, and word embeddings.
3. Using Machine Learning to Uncover the Semantics of Concepts: How Well Do Typicality Measures Extracted from a BERT Text Classifier Match Human Judgments of Genre Typicality? (with Balázs Kovács, Michael Hannan, & Guillem Pros). Sociological Science, March 2023. Fine-tunes BERT to measure the typicality of books in literary genres and compares it with more standard NLP approaches.

dc.format

2 p.

dc.format

application/pdf

dc.language

eng

dc.publisher

Barcelona Supercomputing Center

dc.rights

http://creativecommons.org/licenses/by-nc-nd/4.0/

dc.rights

Open Access

dc.subject

UPC subject areas::Computer science::Computer architecture

dc.subject

High performance computing

dc.subject

High performance computing (Computer science)

dc.title

SORS: LLMs for measurement in social science

dc.type

Conference report

This item appears in the following collection(s)

Congressos [11156]