The importance of data transformation in RNA-Seq preprocessing for bladder cancer subtyping

Other authors

Institut Català de la Salut

[Acedo-Terrades A, Perera-Bel J] Hospital del Mar Research Institute (HMRI), Barcelona, Spain. [Nonell L] Bioinformatics Unit, Vall d’Hebron Institute of Oncology (VHIO), Barcelona, Spain

Vall d'Hebron Barcelona Hospital Campus

Publication date

2025-04-01T07:42:31Z

2025-02-10



Keywords

Bladder cancer; Molecular subtypes; RNA sequencing

Abstract

Objective: RNA-Seq provides accurate quantification of gene expression levels and is widely used for molecular subtype classification in cancer, which is of particular importance for prognosis. However, the reliability and validity of these analyses can be significantly influenced by how the data are processed. In this study, we evaluate how RNA-Seq preprocessing methods influence molecular subtype classification in bladder cancer. By benchmarking various aligners, quantifiers, and normalization and transformation methods, we stress the importance of preprocessing choices for accurate and consistent subtype classification.

Results: Our findings highlight that log-transformation plays a crucial role in centroid-based classifiers such as consensusMIBC and TCGAclas, whereas distribution-free algorithms such as LundTax are robust to preprocessing variations. With consensusMIBC and TCGAclas, non-log-transformed data resulted in low classification rates and poor agreement with reference classifications. Additionally, LundTax consistently showed better separation among subtypes than consensusMIBC and TCGAclas, regardless of the preprocessing method. Nonetheless, the study is limited by the lack of a true reference for objectively assessing the accuracy of the assigned subtypes. Hence, future work will be necessary to determine the robustness and scalability of these results.
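The sensitivity of centroid-based classification to log-transformation can be illustrated with a toy sketch. Everything in it is an assumption made for illustration: the five-gene centroids, the sample values, and the helper names (`pearson`, `ranks`, `nearest_centroid`) are invented, and the code is not the consensusMIBC, TCGAclas, or LundTax implementation. The rank-based variant is only a stand-in for a rule that is insensitive to monotonic transformations.

```python
import math

# Hypothetical subtype centroids on a log2 scale (values invented for illustration).
CENTROIDS = {
    "A": [12, 9, 8, 2, 3],
    "B": [12, 3, 2, 6, 5],
}


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def ranks(values):
    """Ascending ranks starting at 1 (assumes no tied values)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result


def nearest_centroid(sample, centroids, use_ranks=False):
    """Return the label of the centroid most correlated with the sample."""
    if use_ranks:
        sample = ranks(sample)
    scores = {}
    for label, centroid in centroids.items():
        ref = ranks(centroid) if use_ranks else centroid
        scores[label] = pearson(sample, ref)
    return max(scores, key=scores.get)


# Toy sample on the raw (untransformed) scale; the first gene dominates.
raw_sample = [8191, 511, 255, 3, 7]
# log2(x + 1) compresses the dynamic range, as the centroids assume.
log_sample = [math.log2(v + 1) for v in raw_sample]

call_log = nearest_centroid(log_sample, CENTROIDS)
call_raw = nearest_centroid(raw_sample, CENTROIDS)
call_rank_raw = nearest_centroid(raw_sample, CENTROIDS, use_ranks=True)
call_rank_log = nearest_centroid(log_sample, CENTROIDS, use_ranks=True)
```

In this toy case the log-transformed sample is assigned to subtype A, while the same sample on the raw scale is assigned to B, because a single highly expressed gene dominates the correlation. The rank-based rule gives the same call on both scales, since log2(x + 1) preserves the ordering of the values.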


This work was supported by the following grants and agencies: projects PI19/00004 and PI22/00171, funded by Instituto de Salud Carlos III (ISCIII) and co-funded by the European Union; grant FI20/00095 from FIS-ISCIII; and grant 2021SGR00042 from the Generalitat de Catalunya.

Document Type

Article


Published version

Language

English

Publisher

BMC

Related items

BMC Research Notes;18

https://doi.org/10.1186/s13104-025-07138-x

Rights

Attribution-NonCommercial-NoDerivatives 4.0 International

http://creativecommons.org/licenses/by-nc-nd/4.0/
