K-mers are used daily in bioinformatics. Although they have long been at the core of several popular genome assembly tools, until recently they were woefully underutilized. Counting k-mers is conceptually simple, but it becomes a real challenge at the data volumes generated by high-throughput sequencing. However, a simple representation of the data with few degrees of freedom (the value of k and, for nucleotide sequences, the four-letter alphabet) provides an ideal opportunity to investigate novel combinations of methods and techniques drawn from various fields. In that context, the real challenge is to map each biological question to a corresponding modelling approach. Examples include applying Gödel numbering to transform the search space for sequence similarity, using pruned trees and entropy to identify novel features in sequences, and applying binning methods to metagenomic classification.
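As a rough illustration of how small the k-mer representation is, the following Python sketch counts overlapping k-mers in a nucleotide string and maps each k-mer to an integer. It is not taken from the report: the names `count_kmers` and `kmer_to_index` are hypothetical, skipping windows that contain symbols other than A, C, G, T is an assumption, and the base-4 index is a simple stand-in for an integer encoding, not the Gödel numbering mentioned above.

```python
from collections import Counter

ALPHABET = "ACGT"  # the four nucleotide letters


def count_kmers(sequence, k):
    """Count every overlapping k-mer that consists only of A, C, G, T."""
    if k <= 0:
        raise ValueError("k must be positive")
    seq = sequence.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        # Assumption: windows with ambiguous symbols (e.g. N) are skipped.
        if all(base in ALPHABET for base in kmer):
            counts[kmer] += 1
    return counts


def kmer_to_index(kmer):
    """Read a k-mer as a base-4 number, giving an integer in [0, 4**k)."""
    index = 0
    for base in kmer:
        index = index * 4 + ALPHABET.index(base)
    return index


counts = count_kmers("ACGTACGTNAC", 3)
```

This in-memory approach only suits small inputs. At high-throughput sequencing scale the dictionary itself becomes the bottleneck, which is the practical difficulty the abstract refers to.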
Conference report
English
UPC subject areas::Computer science::Computer architecture; High performance computing; High performance computing (Computer science)
Barcelona Supercomputing Center
http://creativecommons.org/licenses/by-nc-nd/4.0/
Open Access
Attribution-NonCommercial-NoDerivatives 4.0 International