Latent representation of H&E images retains biological information in a breast cancer cohort

Benmussa, Chloé; Sanfeliu, Esther; Martínez Romero, Anabel; González Farré, Blanca; Pascual, Tomás; Gavilá, Joaquín; Levy-Jurgenson, Alona; Shamir, Ariel; Brasó Maristany, Fara; Prat Aparicio, Aleix; Yakhini, Zohar

dc.contributor.author

Benmussa, Chloé

dc.contributor.author

Sanfeliu, Esther

dc.contributor.author

Martínez Romero, Anabel

dc.contributor.author

González Farré, Blanca

dc.contributor.author

Pascual, Tomás

dc.contributor.author

Gavilá, Joaquín

dc.contributor.author

Levy-Jurgenson, Alona

dc.contributor.author

Shamir, Ariel

dc.contributor.author

Brasó Maristany, Fara

dc.contributor.author

Prat Aparicio, Aleix

dc.contributor.author

Yakhini, Zohar

dc.date.issued

2026-01-23T17:40:18Z

dc.date.issued

2025-09-25

dc.identifier

1932-6203

dc.identifier

https://hdl.handle.net/2445/226083

dc.identifier

764305

dc.identifier

40997111

dc.description.abstract

Imaging technologies and staining-based pathology are important components of routine cancer care. In particular, H&E imaging is standard for almost all cancer patients. Traditionally, experienced, trained pathologists use H&E images to infer important biological properties of the samples. Recent work has demonstrated that machine learning and machine vision analysis of H&E images can further expand the scope of this inference. However, H&E images are high-resolution, which makes them difficult to analyze, and they may be noisy. In this work, we propose an autoencoder-based pipeline that greatly reduces the dimension of the data representation while maintaining valuable properties. In particular, we investigate how different latent space dimensions affect bulk label predictions from H&E images. We use autoencoders applied to image tiles as a tool in this investigation and also examine other information that may be inferred from image tiles. For example, we show classification results for tiles, such as Luminal A versus Luminal B, with an F1 score larger than 0.85. We also show that Ki67 levels can be inferred from H&E tiles, as shown before on other cohorts, and that this inference remains possible when working with lower-dimensional latent representations. The two main contributions of this paper are as follows. First, we demonstrate that image tiles can be informative, both at the global classification level and, more importantly, in supporting the assessment of heterogeneity. Second, we show that reasonably accurate inference can be performed with lower-dimensional latent representations of the H&E images.

dc.format

17 p.

dc.format

application/pdf

dc.language

eng

dc.publisher

Public Library of Science (PLoS)

dc.relation

Reproduction of the document published at: https://doi.org/10.1371/journal.pone.0329221

dc.relation

PLoS One, 2025, vol. 20, num. 9

dc.relation

https://doi.org/10.1371/journal.pone.0329221

dc.rights

cc-by (c) Benmussa, Chloé et al., 2025

dc.rights

http://creativecommons.org/licenses/by/4.0/

dc.rights

info:eu-repo/semantics/openAccess

dc.subject

Càncer de mama

dc.subject

Diagnòstic per la imatge

dc.subject

Processament d'imatges

dc.subject

Breast cancer

dc.subject

Diagnostic imaging

dc.subject

Image processing

dc.title

Latent representation of H&E images retains biological information in a breast cancer cohort

dc.type

info:eu-repo/semantics/article

dc.type

info:eu-repo/semantics/publishedVersion

This item appears in the following collection(s)

IDIBAPS: Institut d'investigacions Biomèdiques August Pi i Sunyer [3164]

ISGlobal - Institut de Salut Global de Barcelona [60808]

Medicina [2834]