Author

Le Mens, Gaël

Publication date

2025-07-09



Abstract

This seminar talk is based on some of my recent work using LLMs to measure similarity in semantic spaces. I will report on using fine-tuned BERT models and pre-trained, instruction-tuned LLMs (such as GPT-4, Meta Llama 3, or Mixtral) to measure the typicality of text documents in concepts (tweets in political parties, books in literary genres) and to position text documents in policy and ideological spaces. I will also report on a systematic comparison of how the most recent LLMs perform on these tasks, and will outline a strategy for choosing among the available LLMs given the objectives and constraints of a specific research project. The talk is based on the following recent papers and some ongoing work:

1. Positioning Political Texts with Large Language Models by Asking and Averaging (with Aina Gallego). Uses recent LLMs (2023-2024) to position tweets, party manifestos, and political speeches in multiple languages in ideological spaces. Includes a comparison of the performance of various models, both proprietary and open.
2. Uncovering the Semantics of Concepts Using GPT-4 (with Balázs Kovács, Michael Hannan, & Guillem Pros). PNAS, 2023. Uses GPT-4 to measure the typicality of books in literary genres and the typicality of tweets in political parties, and compares it with other methods based on BERT, text embeddings, and word embeddings.
3. Using Machine Learning to Uncover the Semantics of Concepts: How Well Do Typicality Measures Extracted from a BERT Text Classifier Match Human Judgments of Genre Typicality? (with Balázs Kovács, Michael Hannan, & Guillem Pros). Sociological Science, March 2023. Fine-tunes BERT to measure the typicality of books in literary genres and compares it with more standard NLP approaches.

Document type

Conference report

Language

English

Published by

Barcelona Supercomputing Center


Rights

http://creativecommons.org/licenses/by-nc-nd/4.0/

Open Access
