Universitat Politècnica de Catalunya. Departament de Teoria del Senyal i Comunicacions
Universitat Politècnica de Catalunya. SPCOM - Processament del Senyal i Comunicacions
2025-10-30
Scientific document for advising on the programming of Hidden Markov Model processes with large-scale short-sequence datasets.
In the classical setting, a Hidden Markov Model (HMM) is typically trained on a single observation sequence that is long enough to be regarded as representative of the underlying stochastic process. In this context, the Expectation-Maximization (EM) algorithm is applied in its specialized form for HMMs, the Baum-Welch algorithm, which has been used extensively in applications such as speech recognition. The objective of this work is to present pseudocode formulations of both the training and decoding procedures for HMMs in a different scenario: the available data consist of multiple independent temporal sequences generated by the same model, each of relatively short duration, i.e., containing only a limited number of samples. Special emphasis is placed on the relevance of this formulation to longitudinal studies in population health, where datasets are naturally structured as collections of short per-individual trajectories with point data recorded at follow-up.
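The multi-sequence training scenario described above can be illustrated with a minimal sketch. It is not the report's pseudocode: it assumes a discrete-emission HMM, and the names `forward_backward`, `baum_welch_step`, `pi`, `A`, and `B` are hypothetical. Each short sequence gets its own scaled forward-backward pass, and the expected counts are summed over all sequences before the parameters are re-estimated.

```python
import math

def forward_backward(obs, pi, A, B):
    """Scaled forward-backward pass for one discrete observation sequence."""
    T, N = len(obs), len(pi)
    alpha = [[0.0] * N for _ in range(T)]
    beta = [[1.0] * N for _ in range(T)]
    scale = [0.0] * T
    for t in range(T):
        for j in range(N):
            prior = pi[j] if t == 0 else sum(alpha[t - 1][i] * A[i][j] for i in range(N))
            alpha[t][j] = prior * B[j][obs[t]]
        # Assumption: every observation has non-zero probability, so scale[t] > 0.
        scale[t] = sum(alpha[t])
        alpha[t] = [a / scale[t] for a in alpha[t]]
    for t in range(T - 2, -1, -1):
        for i in range(N):
            beta[t][i] = sum(A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j]
                             for j in range(N)) / scale[t + 1]
    return alpha, beta, scale, sum(math.log(c) for c in scale)

def _normalize(row, fallback):
    """Turn expected counts into probabilities; keep the old row if no counts."""
    total = sum(row)
    return [x / total for x in row] if total > 0 else list(fallback)

def baum_welch_step(sequences, pi, A, B):
    """One EM iteration, pooling expected counts over independent sequences."""
    N, M = len(pi), len(B[0])
    pi_cnt = [0.0] * N
    A_cnt = [[0.0] * N for _ in range(N)]
    B_cnt = [[0.0] * M for _ in range(N)]
    total_loglik = 0.0
    for obs in sequences:
        alpha, beta, scale, loglik = forward_backward(obs, pi, A, B)
        total_loglik += loglik
        T = len(obs)
        for t in range(T):
            for i in range(N):
                gamma = alpha[t][i] * beta[t][i]  # posterior state probability
                if t == 0:
                    pi_cnt[i] += gamma
                B_cnt[i][obs[t]] += gamma
                if t < T - 1:
                    for j in range(N):
                        # Posterior probability of the transition i -> j at time t.
                        A_cnt[i][j] += (alpha[t][i] * A[i][j] * B[j][obs[t + 1]]
                                        * beta[t + 1][j] / scale[t + 1])
    new_pi = _normalize(pi_cnt, pi)
    new_A = [_normalize(A_cnt[i], A[i]) for i in range(N)]
    new_B = [_normalize(B_cnt[i], B[i]) for i in range(N)]
    # total_loglik is the log-likelihood under the parameters passed in.
    return new_pi, new_A, new_B, total_loglik

# Illustrative toy data: six short sequences over a binary alphabet.
sequences = [[0, 1, 0], [1, 1, 0, 0], [0, 0, 1], [1, 0], [0, 1, 1, 1], [0, 0]]
pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.9, 0.1], [0.2, 0.8]]
logliks = []
for _ in range(10):
    pi, A, B, ll = baum_welch_step(sequences, pi, A, B)
    logliks.append(ll)
```

Because every sequence starts afresh from the initial distribution, the estimate of `pi` in this sketch averages the first-step posteriors across sequences, which is one place where the many-short-sequences setting differs from training on a single long sequence.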
Preprint
External research report
English
UPC subject areas::Computer science::Theoretical computer science::Algorithmics and complexity theory; UPC subject areas::Computer science::Computer applications::Bioinformatics; HMM; Baum-Welch; Longitudinal population tracking
http://creativecommons.org/licenses/by-nc-sa/4.0/
Open Access