OAI identifier: oai:recercat.cat:2117/348065
Datestamp: 2026-03-02T05:32:57Z
Sets: com_2072_1033, col_2072_452951

Title: Clustering and topic modeling for biomedical text mining
Author: Rognon, Paul Joris Denis
Contributors: Universitat de Barcelona. Departament de Genètica, Microbiologia i Estadística; Reverter Comes, Ferran; Vegas Lozano, Esteban
Subjects: UPC subject areas::Mathematics and statistics::Mathematical statistics; Statistical Mathematics -- Applications; Text mining; Document clustering; Topic modeling; Word embeddings; Biomedical text mining; Mathematical statistics--Applications; AMS classification::62 Statistics::62P Applications
Abstract: In this work, we study the problem of characterizing an unlabelled corpus of biomedical documents in an unsupervised manner. After a review of the literature on the subject, we propose an integrative approach to the problem. The integration is twofold. On the one hand, we use multiview learning to integrate different text representations derived from a traditional bag-of-words model, Latent Dirichlet Allocation, and a recurrent neural autoencoder. On the other hand, we integrate topic modeling outputs, clustering outputs, and biomedical word embeddings to generate an intuitive and comprehensive characterization of the corpus. We also propose a semantic graph that provides a synthetic visualization of the relationships between topics, clusters, and any other biomedical concept, based on semantic similarity. An application to the CORD-19 dataset, a collection of articles on COVID-19, shows that our methodology produces a coherent, meaningful, and informative characterization of the corpus.
Date: 2021-06
Type: Master thesis
Identifiers: https://hdl.handle.net/2117/348065; FME-2098
Language: eng
Rights: Restricted access - author's decision
Format: application/pdf
Institutions: Universitat Politècnica de Catalunya; Universitat de Barcelona