dc.contributor
Universitat Politècnica de Catalunya. Universitat de Barcelona
dc.contributor
Universitat Rovira i Virgili
dc.contributor
Universitat de Barcelona
dc.contributor
Wang, Ling
dc.contributor
Escalera Guerrero, Sergio
dc.contributor.author
Poniatowski, Kacper Krzysztof
dc.date.accessioned
2026-04-18T01:26:28Z
dc.date.available
2026-04-18T01:26:28Z
dc.date.issued
2026-01-28
dc.identifier
https://hdl.handle.net/2117/460744
dc.identifier.uri
https://hdl.handle.net/2117/460744
dc.description.abstract
Multi-camera player tracking is a fundamental prerequisite for advanced sports analytics, yet it remains computationally challenging because of frequent inter-player occlusions, rapid motion, and the visual homogeneity of team uniforms. This thesis presents an end-to-end pipeline for detecting and tracking football players with a calibrated four-camera setup. The system integrates deep learning techniques with geometric computer vision: a fine-tuned object detector paired with ByteTrack provides per-camera (local) perception. To solve the Multi-Dimensional Assignment (MDA) problem across views, we introduce a Hierarchical Divide-and-Conquer fusion strategy. Unlike naive greedy clustering, this method uses recursive bipartite matching with a multi-cue cost function that incorporates position, velocity, shape, and colour histograms. In addition, a Temporal Hinting mechanism recovers player identities after extended occlusions or spatial discontinuities. Compared with a greedy geometric baseline, the hierarchical approach achieves a GS-HOTA of 0.844 versus 0.416, a 103% relative improvement. Evaluation on held-out test sequences with temporal horizons from 5 to 45 minutes shows stable detection performance: detection accuracy (DetA) remains at 93.6% and MOTA at 97.0% regardless of sequence length. The identity switch rate stays roughly constant at about 110 switches per minute, indicating that errors do not compound over time. These results provide a strong foundation for automated game state reconstruction and tactical analysis in professional sports.
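The abstract describes the cross-view fusion only at a high level. The following is a minimal sketch of one possible reading of "recursive bipartite matching with a multi-cue cost function", not the thesis's implementation: the camera list is split in half recursively, each half is fused, and the two resulting sets of player clusters are matched with the Hungarian (Kuhn-Munkres) algorithm. The cue weights `WEIGHTS`, the gating threshold `GATE`, the Bhattacharyya colour distance, the detection dictionary layout, and all function names are hypothetical assumptions. Positions and velocities are assumed to be already projected to pitch coordinates using the camera calibration.

```python
import math

# Hypothetical cue weights and gating threshold (not taken from the thesis).
WEIGHTS = {"pos": 1.0, "vel": 0.5, "shape": 0.5, "colour": 2.0}
GATE = 3.0

def hungarian(cost):
    """Square min-cost assignment (Kuhn-Munkres, O(n^3)); returns row -> column."""
    n = len(cost)
    u, v = [0.0] * (n + 1), [0.0] * (n + 1)  # row / column potentials
    p, way = [0] * (n + 1), [0] * (n + 1)    # p[j]: 1-based row matched to column j
    for i in range(1, n + 1):
        p[0], j0 = i, 0
        minv, used = [math.inf] * (n + 1), [False] * (n + 1)
        while True:  # grow alternating tree from row i until a free column is reached
            used[j0] = True
            i0, delta, j1 = p[j0], math.inf, 0
            for j in range(1, n + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(n + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        while j0:  # augment along the stored alternating path
            p[j0] = p[way[j0]]
            j0 = way[j0]
    return [j - 1 for _, j in sorted((p[j], j) for j in range(1, n + 1))]

def colour_distance(h1, h2):
    """1 - Bhattacharyya coefficient of two normalised colour histograms."""
    return 1.0 - sum(math.sqrt(a * b) for a, b in zip(h1, h2))

def summarise(cluster):
    """Average each cue over the per-view detections fused into one cluster."""
    def mean(key):
        return [sum(d[key][i] for d in cluster) / len(cluster) for i in range(len(cluster[0][key]))]
    return {key: mean(key) for key in ("pos", "vel", "shape", "hist")}

def cost(a, b):
    """Multi-cue cost: weighted sum of position, velocity, shape and colour distances."""
    return (WEIGHTS["pos"] * math.dist(a["pos"], b["pos"])
            + WEIGHTS["vel"] * math.dist(a["vel"], b["vel"])
            + WEIGHTS["shape"] * math.dist(a["shape"], b["shape"])
            + WEIGHTS["colour"] * colour_distance(a["hist"], b["hist"]))

def match_clusters(left, right):
    """Bipartite-match two cluster lists; merge gated pairs, keep unmatched clusters."""
    if not left or not right:
        return left + right
    n = max(len(left), len(right))
    sl, sr = [summarise(c) for c in left], [summarise(c) for c in right]
    # Pad to n x n; dummy cells cost GATE, so "no match" is always an option.
    C = [[cost(sl[i], sr[j]) if i < len(left) and j < len(right) else GATE
          for j in range(n)] for i in range(n)]
    fused, used_right = [], set()
    for i, j in enumerate(hungarian(C)):
        if i < len(left) and j < len(right) and C[i][j] < GATE:
            fused.append(left[i] + right[j])
            used_right.add(j)
        elif i < len(left):
            fused.append(left[i])
    return fused + [c for j, c in enumerate(right) if j not in used_right]

def fuse(views):
    """Divide and conquer over cameras: fuse each half recursively, then match."""
    if not views:
        return []
    if len(views) == 1:
        return [[d] for d in views[0]]  # every detection starts as its own cluster
    mid = len(views) // 2
    return match_clusters(fuse(views[:mid]), fuse(views[mid:]))

def det(x, y, hist):
    """Hypothetical detection already projected to pitch coordinates (metres)."""
    return {"pos": [x, y], "vel": [0.0, 0.0], "shape": [1.8], "hist": hist}

views = [
    [det(10.0, 20.0, [0.8, 0.2]), det(30.0, 5.0, [0.1, 0.9])],
    [det(30.3, 5.1, [0.15, 0.85]), det(10.2, 19.8, [0.75, 0.25])],
    [det(9.9, 20.1, [0.8, 0.2]), det(29.8, 4.9, [0.1, 0.9])],
    [det(10.1, 20.2, [0.85, 0.15])],  # second player occluded in camera 4
]
players = fuse(views)
print(len(players), sorted(len(c) for c in players))  # 2 [3, 4]
```

In this sketch each matching step combines clusters built from disjoint camera subsets, so a fused cluster never contains two detections from the same camera. With four cameras the recursion performs three bipartite matches: one for each pair of cameras and one at the top level.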
dc.format
application/pdf
dc.publisher
Universitat Politècnica de Catalunya
dc.subject
UPC subject areas::Computer science::Artificial intelligence::Machine learning
dc.subject
Computer vision
dc.subject
Machine learning
dc.subject
Deep learning
dc.title
Deep learning-based player tracking in sports videos