<?xml version="1.0" encoding="UTF-8"?><?xml-stylesheet type="text/xsl" href="static/style.xsl"?><OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/ http://www.openarchives.org/OAI/2.0/OAI-PMH.xsd"><responseDate>2026-04-13T04:16:25Z</responseDate><request verb="GetRecord" identifier="oai:www.recercat.cat:2117/442287" metadataPrefix="qdc">https://recercat.cat/oai/request</request><GetRecord><record><header><identifier>oai:recercat.cat:2117/442287</identifier><datestamp>2026-01-28T06:59:18Z</datestamp><setSpec>com_2072_1033</setSpec><setSpec>col_2072_452950</setSpec></header><metadata><qdc:qualifieddc xmlns:qdc="http://dspace.org/qualifieddc/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:dcterms="http://purl.org/dc/terms/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:doc="http://www.lyncode.com/xoai" xsi:schemaLocation="http://purl.org/dc/elements/1.1/ http://dublincore.org/schemas/xmls/qdc/2006/01/06/dc.xsd http://purl.org/dc/terms/ http://dublincore.org/schemas/xmls/qdc/2006/01/06/dcterms.xsd http://dspace.org/qualifieddc/ http://www.ukoln.ac.uk/metadata/dcmi/xmlschema/qualifieddc.xsd">
   <dc:title>BiFold: bimanual cloth folding with language guidance</dc:title>
   <dc:creator>Barbany Mayor, Oriol</dc:creator>
   <dc:creator>Colomé Figueras, Adrià</dc:creator>
   <dc:creator>Torras, Carme</dc:creator>
   <dc:subject>Àrees temàtiques de la UPC::Informàtica::Robòtica</dc:subject>
   <dc:subject>Adaptation models</dc:subject>
   <dc:subject>Visualization</dc:subject>
   <dc:subject>Translation</dc:subject>
   <dc:subject>Shape</dc:subject>
   <dc:subject>Clothing</dc:subject>
   <dc:subject>Benchmark testing</dc:subject>
   <dc:subject>Robustness</dc:subject>
   <dc:subject>Topology</dc:subject>
   <dc:subject>Robots</dc:subject>
   <dc:subject>Standards</dc:subject>
   <dcterms:abstract>Cloth folding is a complex task due to the inevitable self-occlusions of clothes, their complicated dynamics, and the disparate materials, geometries, and textures that garments can have. In this work, we learn folding actions conditioned on text commands. Translating high-level, abstract instructions into precise robotic actions requires sophisticated language understanding and manipulation capabilities. To this end, we leverage a pre-trained vision-language model and repurpose it to predict manipulation actions. Our model, BiFold, can take context into account and achieves state-of-the-art performance on an existing language-conditioned folding benchmark. To address the lack of annotated bimanual folding data, we introduce a novel dataset with automatically parsed actions and language-aligned instructions, enabling better learning of text-conditioned manipulation. BiFold attains the best performance on our dataset and demonstrates strong generalization to new instructions, garments, and environments.</dcterms:abstract>
   <dcterms:abstract>This work was funded by project SGR 00514 (Departament de Recerca i Universitats de la Generalitat de Catalunya) and CSIC project 202350E080 (ClothIRI). O.B. acknowledges travel support from ELISE (GA no 951847).</dcterms:abstract>
   <dcterms:abstract>Peer Reviewed</dcterms:abstract>
   <dcterms:abstract>Postprint (author's final draft)</dcterms:abstract>
   <dcterms:issued>2025</dcterms:issued>
   <dc:type>Conference lecture</dc:type>
   <dc:relation>https://ieeexplore.ieee.org/document/11127549</dc:relation>
   <dc:rights>Open Access</dc:rights>
   <dc:publisher>Institute of Electrical and Electronics Engineers (IEEE)</dc:publisher>
</qdc:qualifieddc></metadata></record></GetRecord></OAI-PMH>