Standardised and reproducible phenotyping using distributed analytics and tools in the Data Analysis and Real World Interrogation Network (DARWIN EU)

dc.contributor.author
Dernie, Francesco
dc.contributor.author
Leis Machín, Angela, 1974-
dc.contributor.author
Ramírez Anguita, Juan Manuel
dc.contributor.author
Mayer, Miguel Ángel, 1960-
dc.contributor.author
Prats Uribe, Albert
dc.date.accessioned
2026-01-28T13:29:38Z
dc.date.available
2026-01-28T13:29:38Z
dc.date.issued
2026-01-27T13:50:49Z
dc.date.issued
2024
dc.identifier
Dernie F, Corby G, Robinson A, Bezer J, Mercade-Besora N, Griffier R, Verdy G, Leis A, Ramirez-Anguita JM, Mayer MA, Brash JT, Seager S, Parry R, Jodicke A, Duarte-Salles T, Rijnbeek PR, Verhamme K, Pacurariu A, Morales D, Pinheiro L, Prieto-Alhambra D, Prats-Uribe A. Standardised and reproducible phenotyping using distributed analytics and tools in the Data Analysis and Real World Interrogation Network (DARWIN EU). Pharmacoepidemiol Drug Saf. 2024;33(11):e70042. DOI: 10.1002/pds.70042
dc.identifier
1053-8569
dc.identifier
https://hdl.handle.net/10230/72370
dc.identifier
http://dx.doi.org/10.1002/pds.70042
dc.identifier.uri
http://hdl.handle.net/10230/72370
dc.description.abstract
Purpose: The generation of representative disease phenotypes is important for ensuring the reliability of the findings of observational studies. The aim of this manuscript is to outline a reproducible framework for reliable and traceable phenotype generation based on real-world data for use in the Data Analysis and Real World Interrogation Network (DARWIN EU). We illustrate the use of this framework by generating phenotypes for two diseases: pancreatic cancer and systemic lupus erythematosus (SLE). Methods: The phenotyping process comprises 14 steps based on a standard operating procedure co-created by the DARWIN EU Coordination Centre in collaboration with the European Medicines Agency. A number of bespoke R packages were utilised to generate and review codelists for two phenotypes based on real-world data mapped to the OMOP Common Data Model. Results: Codelists were generated for both pancreatic cancer and SLE, and cohorts were generated in six OMOP-mapped databases. Diagnostic checks were performed, which showed these cohorts had broadly similar incidence and prevalence figures to previously published literature, despite significant inter-database variability. Co-occurrent symptoms, conditions, and medication use were in keeping with pre-specified clinical descriptions based on previous knowledge. Conclusions: Our detailed phenotyping process makes use of bespoke tools and allows for comprehensive codelist generation and review, as well as large-scale exploration of the characteristics of the resulting cohorts. Wider use of structured and reproducible phenotyping methods will be important in ensuring the reliability of observational studies for regulatory purposes.
dc.format
application/pdf
dc.language
eng
dc.publisher
Wiley
dc.relation
Pharmacoepidemiology and Drug Safety. 2024;33(11):e70042
dc.rights
© 2024 The Author(s). Pharmacoepidemiology and Drug Safety published by John Wiley & Sons Ltd. This is an open access article under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits use, distribution and reproduction in any medium, provided the original work is properly cited.
dc.rights
http://creativecommons.org/licenses/by/4.0/
dc.rights
info:eu-repo/semantics/openAccess
dc.subject
Pancreatic cancer
dc.subject
Phenotyping
dc.subject
Systemic lupus erythematosus
dc.title
Standardised and reproducible phenotyping using distributed analytics and tools in the Data Analysis and Real World Interrogation Network (DARWIN EU)
dc.type
info:eu-repo/semantics/article
dc.type
info:eu-repo/semantics/publishedVersion


Files in this item

There are no files associated with this item.
