<?xml version="1.0" encoding="UTF-8"?><?xml-stylesheet type="text/xsl" href="static/style.xsl"?><OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/ http://www.openarchives.org/OAI/2.0/OAI-PMH.xsd"><responseDate>2026-04-13T07:15:44Z</responseDate><request verb="GetRecord" identifier="oai:www.recercat.cat:2117/328312" metadataPrefix="qdc">https://recercat.cat/oai/request</request><GetRecord><record><header><identifier>oai:recercat.cat:2117/328312</identifier><datestamp>2026-01-30T07:29:26Z</datestamp><setSpec>com_2072_1033</setSpec><setSpec>col_2072_452950</setSpec></header><metadata><qdc:qualifieddc xmlns:qdc="http://dspace.org/qualifieddc/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:dcterms="http://purl.org/dc/terms/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:doc="http://www.lyncode.com/xoai" xsi:schemaLocation="http://purl.org/dc/elements/1.1/ http://dublincore.org/schemas/xmls/qdc/2006/01/06/dc.xsd http://purl.org/dc/terms/ http://dublincore.org/schemas/xmls/qdc/2006/01/06/dcterms.xsd http://dspace.org/qualifieddc/ http://www.ukoln.ac.uk/metadata/dcmi/xmlschema/qualifieddc.xsd">
   <dc:title>Managing failures in task-based parallel workflows in distributed computing environments</dc:title>
   <dc:creator>Ejarque, Jorge</dc:creator>
   <dc:creator>Bertran, Marta</dc:creator>
   <dc:creator>Álvarez Cid-Fuentes, Javier</dc:creator>
   <dc:creator>Conejero, Javier</dc:creator>
   <dc:creator>Badia Sala, Rosa Maria</dc:creator>
   <dc:subject>UPC subject areas::Computer science::Computer architecture</dc:subject>
   <dc:subject>Algorithms</dc:subject>
   <dc:subject>Workflow -- Software</dc:subject>
   <dc:subject>Failure management</dc:subject>
   <dc:subject>Scientific workflows</dc:subject>
   <dc:subject>Parallel programming</dc:subject>
   <dc:subject>Distributed computing</dc:subject>
   <dc:subject>Distributed operating systems (Computers)</dc:subject>
   <dc:subject>Parallel processing (Computers)</dc:subject>
   <dc:subject>Cicle de treball -- Programari</dc:subject>
   <dcterms:abstract>Current scientific workflows are large and complex. They typically run thousands of simulations whose results, combined with search and data analytics algorithms used to infer new knowledge, generate very large amounts of data. Such workflows comprise many tasks, some of which may fail. Most work on failure management in workflow managers and runtimes focuses on recovering from failures caused by resources (retrying or resubmitting the failed computation on other resources, etc.). However, some failures are caused by the application itself (corrupted data, algorithms that do not converge under certain conditions, etc.), and these fault tolerance mechanisms are not sufficient to achieve a successful workflow execution. In these cases, developers have to add code to their applications to prevent and manage possible failures. In this paper, we propose a simple interface and a set of transparent runtime mechanisms to simplify how scientists deal with application-based failures in task-based parallel workflows. We have validated our proposal with use cases from e-science and machine learning to show the benefits of the proposed interface and mechanisms in terms of programming productivity and performance.</dcterms:abstract>
   <dcterms:abstract>This work has been supported by the Spanish Government (contracts SEV2015-0493 and TIN2015-65316-P), by the Generalitat de Catalunya (contract 2014-SGR-1051), and by the European Commission’s Horizon 2020 Framework Programme through the BioExcel Center of Excellence (contracts 823830 and 675728). The research leading to these results has received funding from the collaboration between Fujitsu and BSC (Script Language Platform).</dcterms:abstract>
   <dcterms:abstract>Peer Reviewed</dcterms:abstract>
   <dcterms:abstract>Postprint (author's final draft)</dcterms:abstract>
   <dcterms:issued>2020</dcterms:issued>
   <dc:type>Part of book or chapter of book</dc:type>
   <dc:relation>https://link.springer.com/chapter/10.1007/978-3-030-57675-2_26</dc:relation>
   <dc:relation>info:eu-repo/grantAgreement/EC/H2020/823830/EU/BioExcel Centre of Excellence for Computational Biomolecular Research/BioExcel-2</dc:relation>
   <dc:relation>info:eu-repo/grantAgreement/EC/H2020/675728/EU/Centre of Excellence for Biomolecular Research/BioExcel</dc:relation>
   <dc:relation>https://doi.org/10.6084/m9.figshare.12556445</dc:relation>
   <dc:rights>Open Access</dc:rights>
   <dc:publisher>Springer, Cham</dc:publisher>
</qdc:qualifieddc></metadata></record></GetRecord></OAI-PMH>