dc.contributor.author
Carpes Martínez, Antonio Alberto
dc.date.accessioned
2025-11-07T20:19:32Z
dc.date.available
2025-11-07T20:19:32Z
dc.date.issued
2025-11-06T17:13:20Z
dc.identifier
http://hdl.handle.net/10230/71797
dc.identifier.uri
http://hdl.handle.net/10230/71797
dc.description.abstract
Master's thesis of the Erasmus Mundus Joint Master in Artificial Intelligence (EMAI)
dc.description.abstract
Supervisor: Alessandro De Luca
Co-Supervisor: Magí Dalmau Moreno
dc.description.abstract
This thesis presents a practical implementation of Instant Policy, an In-Context Imitation Learning (ICIL) model characterized by rapid learning of new tasks after processing a small number of demonstrations at inference time. The research evaluates how modifications to the demonstration context affect the model's ability to understand and generalize manipulation behaviors, using a Franka Emika Panda arm and an Intel RealSense D435 camera integrated with Instant Policy, a state-of-the-art one-shot learning model. The core research systematically modifies demonstration buffers to analyze the model's contextual reasoning capabilities across different pick-and-place scenarios. In addition, we deploy a modular pipeline that transforms RGB-D input into structured point clouds through YOLOv11-based segmentation, enabling object identification, demonstration extraction, and model deployment at test time. To address gripper annotation challenges, we introduce an automated dataset creation methodology combining LangSAM for text-prompt-based segmentation with XMem++ for video mask propagation. The control architecture employs Instant Policy as a Denoising Diffusion Implicit Model, generating action sequences through graph-based reasoning over point clouds and the demonstration context. Experimental results demonstrate successful adaptation of pick-and-place behaviors to different demonstration contexts, with generalization across object pose and background variations. Performance analysis reveals critical dependencies on segmentation quality, highlighting the robust perception requirements for real-world deployment. This work validates the viability of ICIL for robotic pick-and-place tasks, contributing insights into context understanding, automated dataset creation, and empirical validation of ICIL performance in unstructured manipulation scenarios.
dc.format
application/pdf
dc.rights
CC Attribution-NonCommercial-NoDerivatives 4.0 International license (CC BY-NC-ND 4.0)
dc.rights
https://creativecommons.org/licenses/by-nc-nd/4.0/
dc.rights
info:eu-repo/semantics/openAccess
dc.title
From perception to action: implementing in-context imitation learning on a Franka robot for pick-and-place tasks
dc.type
info:eu-repo/semantics/masterThesis