Publication date

2025-07-09



Abstract

The seminar talk is based on some of my recent work that used LLMs to measure similarity in semantic spaces. I will report on using fine-tuned BERT and pre-trained instruction-tuned LLMs (such as GPT-4, Meta Llama 3, or Mixtral) for measuring the typicality of text documents in concepts (tweets in political parties, books in literary genres) and for positioning text documents in policy and ideological spaces. I will also report on a systematic comparison of the performance of the most recent LLMs on these tasks and will outline a strategy for choosing among the available LLMs given the objectives and constraints of a specific research project. The talk is based on the following recent papers and some ongoing work:

1. Positioning Political Texts with Large Language Models by Asking and Averaging (with Aina Gallego). Uses recent LLMs (2023-2024) to position tweets, party manifestos, and political speeches in multiple languages in ideological spaces. Includes a comparison of the performance of various models, both proprietary and open.
2. Uncovering the Semantics of Concepts Using GPT-4 (with Balázs Kovács, Michael Hannan, & Guillem Pros). PNAS, 2023. Uses GPT-4 to measure the typicality of books in literary genres and the typicality of tweets in political parties, with a comparison to other methods based on BERT, text embeddings, and word embeddings.
3. Using Machine Learning to Uncover the Semantics of Concepts: How Well Do Typicality Measures Extracted from a BERT Text Classifier Match Human Judgments of Genre Typicality? (with Balázs Kovács, Michael Hannan, & Guillem Pros). Sociological Science, March 2023. Fine-tunes BERT to measure the typicality of books in literary genres, with comparisons to more standard NLP approaches.

Document Type

Conference report

Language

English

Publisher

Barcelona Supercomputing Center

Rights

http://creativecommons.org/licenses/by-nc-nd/4.0/

Open Access
