dc.contributor.author
Carpes Martínez, Antonio Alberto
dc.date.accessioned
2025-11-07T20:19:32Z
dc.date.available
2025-11-07T20:19:32Z
dc.date.issued
2025-11-06T17:13:20Z
dc.identifier
http://hdl.handle.net/10230/71797
dc.identifier.uri
http://hdl.handle.net/10230/71797
dc.description.abstract
Master's thesis of the Erasmus Mundus Joint Master in Artificial Intelligence (EMAI)
dc.description.abstract
Supervisor: Alessandro De Luca
Co-Supervisor: Magí Dalmau Moreno
dc.description.abstract
This thesis presents a practical implementation of Instant Policy, an In-Context Imitation Learning (ICIL) model characterized by rapid learning of new tasks after processing a small number of demonstrations at inference time. The research evaluates how modifications to the demonstration context affect the model's ability to understand and generalize manipulation behaviors, using a Franka Emika Panda arm and an Intel RealSense D435 camera integrated with Instant Policy, a state-of-the-art one-shot learning model. The core research systematically modifies demonstration buffers to analyze the model's contextual reasoning capabilities across different pick-and-place scenarios. In addition, we deploy a modular pipeline that transforms RGB-D input into structured point clouds through YOLOv11-based segmentation, enabling object identification, demonstration extraction, and model deployment at test time. To address gripper annotation challenges, we introduce an automated dataset creation methodology combining LangSAM for text-prompt-based segmentation with XMem++ for video mask propagation. The control architecture employs Instant Policy as a Denoising Diffusion Implicit Model, generating action sequences through graph-based reasoning over point clouds and the demonstration context. Experimental results demonstrate successful adaptation of pick-and-place behaviors to different demonstration contexts, with generalization across object pose and background variations. Performance analysis reveals critical dependencies on segmentation quality, highlighting the robust perception requirements for real-world deployment. This work validates the viability of ICIL for robotic pick-and-place tasks, contributing insights into context understanding, automated dataset creation, and empirical validation of ICIL performance in unstructured manipulation scenarios.
dc.format
application/pdf
dc.rights
CC Attribution-NonCommercial-NoDerivatives 4.0 International license (CC BY-NC-ND 4.0)
dc.rights
https://creativecommons.org/licenses/by-nc-nd/4.0/
dc.rights
info:eu-repo/semantics/openAccess
dc.title
From perception to action: implementing in-context imitation learning on a Franka robot for pick-and-place tasks
dc.type
info:eu-repo/semantics/masterThesis