dc.identifier
Turakhia, Y. SORS: Algorithms, software, and hardware accelerators for the next wave of genomic data. In: Severo Ochoa Research Seminars at BSC. «10th Severo Ochoa Research Seminar Lectures at BSC, Barcelona, 2024-25». Barcelona: Barcelona Supercomputing Center, 2025, p. 154-155.
dc.description.abstract
In this talk, I will discuss how emerging fields such as pangenomics,
pathogen surveillance, wastewater epidemiology, comparative
genomics, and metagenomics are resulting in new waves of genomic
data and applications. I will also discuss the various computational and
storage challenges this data presents and how, at the Turakhia lab, we are
using a combination of new algorithms, software, FPGA, GPU, and
high-performance computing (HPC) solutions to address them.
In pangenomics, we introduced PanMAN, a compact and unified data
representation that integrates phylogeny, mutational history, genomic
variation, and whole-genome alignments, making it the first of its
kind. PanMAN was used to construct the largest SARS-CoV-2 pangenome
currently available, comprising over 8 million sequences, which requires
only 366 MB of disk space. This was enabled in part by TWILIGHT,
our GPU-accelerated multiple sequence aligner that offers orders-of-magnitude
speedups and scales far beyond existing tools.
For pathogen surveillance, we developed the UShER toolkit, which
enabled real-time SARS-CoV-2 genomic surveillance and
epidemiological research at a global scale during the COVID-19
pandemic, and has contributed to the designation of over 4,000 lineages.
Building on UShER, we recently created WEPP, a novel HPC tool that
significantly enhances the resolution and timeliness of wastewater-based
epidemiology, and is enabling powerful new applications.
In comparative genomics, we developed ROADIES, an HPC software tool
that fully automates accurate species tree inference from raw genome
assemblies. ROADIES is transforming large-scale phylogenetic studies
and is currently being used to analyze assemblies from the Vertebrate
Genomes Project (VGP). Lastly, I will share my vision for
how hardware accelerators can drive the next wave of innovation in
bioinformatics, including some of our work based on high-level
synthesis.