Publication date

2022-09-29



Abstract

K-mers are used on a daily basis in bioinformatics. Although they have long been at the core of several popular genome assembly tools, they were, until recently, woefully underutilized. K-mer counting is simple in principle, but it becomes a real challenge with the huge volumes of data generated by high-throughput sequencing. However, a simple representation of the actual data with few degrees of freedom (i.e. the value of k and, for nucleotide sequences, the four-letter alphabet) provides an ideal opportunity to investigate novel combinations of methods and techniques from various fields. In that context, the real challenge is to map each biological question to a corresponding modelling approach. Examples include applying Gödel numbering to transform the search space for sequence similarity, using pruned trees and entropy to identify novel features in sequences, and using binning methods for metagenomic classification.

Document Type

Conference report

Language

English

Publisher

Barcelona Supercomputing Center

Rights

http://creativecommons.org/licenses/by-nc-nd/4.0/

Open Access

Attribution-NonCommercial-NoDerivatives 4.0 International
