<?xml version="1.0" encoding="UTF-8"?><?xml-stylesheet type="text/xsl" href="static/style.xsl"?><OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/ http://www.openarchives.org/OAI/2.0/OAI-PMH.xsd"><responseDate>2026-04-17T07:53:51Z</responseDate><request verb="GetRecord" identifier="oai:www.recercat.cat:2117/346221" metadataPrefix="marc">https://recercat.cat/oai/request</request><GetRecord><record><header><identifier>oai:recercat.cat:2117/346221</identifier><datestamp>2025-07-23T04:34:02Z</datestamp><setSpec>com_2072_1033</setSpec><setSpec>col_2072_452951</setSpec></header><metadata><record xmlns="http://www.loc.gov/MARC21/slim" xmlns:dcterms="http://purl.org/dc/terms/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:doc="http://www.lyncode.com/xoai" xsi:schemaLocation="http://www.loc.gov/MARC21/slim http://www.loc.gov/standards/marcxml/schema/MARC21slim.xsd">
   <leader>00925njm 22002777a 4500</leader>
   <datafield ind2=" " ind1=" " tag="042">
      <subfield code="a">dc</subfield>
   </datafield>
   <datafield ind2=" " ind1=" " tag="720">
      <subfield code="a">Tura Vecino, Biel</subfield>
      <subfield code="e">author</subfield>
   </datafield>
   <datafield ind2=" " ind1=" " tag="260">
      <subfield code="c">2021-05</subfield>
   </datafield>
   <datafield ind2=" " ind1=" " tag="520">
      <subfield code="a">Institut de Robòtica i Informàtica Industrial</subfield>
   </datafield>
   <datafield ind2=" " ind1=" " tag="520">
      <subfield code="a">Recent research has shown that, in particular domains, unsupervised learning algorithms perform on par with, or even better than, fully supervised algorithms, avoiding the need for human-labelled data. Dividing a video into events has been an active research topic for unsupervised algorithms, which exploit relations within the video itself to perform temporal segmentation. In particular, self-supervised learning has proven very useful for learning video representations without any annotations. This thesis proposes a self-supervised method for learning event representations of unconstrained complex activity videos. These are image sequences with high temporal resolution and very small visual variance between events, yet with a clear semantic differentiation for humans. The assumption underlying the proposed model is that a video can be represented by a graph that encodes both semantic and temporal similarity between events. Our method follows two steps. First, meaningful initial features are extracted by a spatio-temporal backbone neural network trained on a self-supervised contrastive task. Then, starting from this initial embedding, low-dimensional graph-based event representation features are iteratively learned jointly with their underlying graph structure. The main contribution of this work is a function, parameterized by a graph neural network, that learns graph-based event feature representations by exploiting semantic and temporal relatedness through a fully end-to-end trainable self-supervised approach. Experiments were performed on the challenging Breakfast Action Dataset, and we show that the proposed approach leads to an effective low-dimensional feature representation of the input data, suitable for the downstream task of event segmentation. Moreover, we show that the presented method, followed by a downstream clustering task, achieves results on par with the state of the art on video segmentation of complex activity videos.</subfield>
   </datafield>
   <datafield tag="653" ind2=" " ind1=" ">
      <subfield code="a">Àrees temàtiques de la UPC::Informàtica::Intel·ligència artificial</subfield>
   </datafield>
   <datafield tag="653" ind2=" " ind1=" ">
      <subfield code="a">Àrees temàtiques de la UPC::Matemàtiques i estadística</subfield>
   </datafield>
   <datafield tag="653" ind2=" " ind1=" ">
      <subfield code="a">Artificial intelligence</subfield>
   </datafield>
   <datafield tag="653" ind2=" " ind1=" ">
      <subfield code="a">Representation learning</subfield>
   </datafield>
   <datafield tag="653" ind2=" " ind1=" ">
      <subfield code="a">Graph embedding</subfield>
   </datafield>
   <datafield tag="653" ind2=" " ind1=" ">
      <subfield code="a">Video segmentation</subfield>
   </datafield>
   <datafield tag="653" ind2=" " ind1=" ">
      <subfield code="a">Event representations</subfield>
   </datafield>
   <datafield tag="653" ind2=" " ind1=" ">
      <subfield code="a">Intel·ligència artificial</subfield>
   </datafield>
   <datafield tag="653" ind2=" " ind1=" ">
      <subfield code="a">Classificació AMS::68 Computer science::68T Artificial intelligence</subfield>
   </datafield>
   <datafield ind2="0" ind1="0" tag="245">
      <subfield code="a">Learning graph-based event representations for unconstrained video segmentation</subfield>
   </datafield>
</record></metadata></record></GetRecord></OAI-PMH>