Universitat Politècnica de Catalunya. Doctoral Program in Artificial Intelligence
Institut de Robòtica i Informàtica Industrial
Universitat Politècnica de Catalunya. ROBiri - Perception and Robotic Manipulation Group of the IRI
2025
Cloth folding is a complex task due to the inevitable self-occlusions of clothes, their complicated dynamics, and the disparate materials, geometries, and textures that garments can have. In this work, we learn folding actions conditioned on text commands. Translating high-level, abstract instructions into precise robotic actions requires sophisticated language understanding and manipulation capabilities. To this end, we leverage a pre-trained vision-language model and repurpose it to predict manipulation actions. Our model, BiFold, takes context into account and achieves state-of-the-art performance on an existing language-conditioned folding benchmark. To address the lack of annotated bimanual folding data, we introduce a novel dataset with automatically parsed actions and language-aligned instructions, enabling better learning of text-conditioned manipulation. BiFold attains the best performance on our dataset and demonstrates strong generalization to new instructions, garments, and environments.
This work was funded by project SGR 00514 (Departament de Recerca i Universitats de la Generalitat de Catalunya) and CSIC project 202350E080 (ClothIRI). O.B. acknowledges travel support from ELISE (GA no 951847).
Peer Reviewed
Postprint (author's final draft)
Conference lecture
English
UPC subject areas::Computer science::Robotics; Adaptation models; Visualization; Translation; Shape; Clothing; Benchmark testing; Robustness; Topology; Robots; Standards
Institute of Electrical and Electronics Engineers (IEEE)
https://ieeexplore.ieee.org/document/11127549
Open Access
E-prints [72986]