<?xml version="1.0" encoding="UTF-8"?><?xml-stylesheet type="text/xsl" href="static/style.xsl"?><OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/ http://www.openarchives.org/OAI/2.0/OAI-PMH.xsd"><responseDate>2026-04-19T11:08:29Z</responseDate><request verb="GetRecord" identifier="oai:www.recercat.cat:2117/423489" metadataPrefix="mets">https://recercat.cat/oai/request</request><GetRecord><record><header><identifier>oai:recercat.cat:2117/423489</identifier><datestamp>2026-01-09T11:42:52Z</datestamp><setSpec>com_2072_1033</setSpec><setSpec>col_2072_452950</setSpec></header><metadata><mets xmlns="http://www.loc.gov/METS/" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:doc="http://www.lyncode.com/xoai" ID="&#xa;&#x9;&#x9;&#x9;&#x9;DSpace_ITEM_2117-423489" TYPE="DSpace ITEM" PROFILE="DSpace METS SIP Profile 1.0" xsi:schemaLocation="http://www.loc.gov/METS/ http://www.loc.gov/standards/mets/mets.xsd" OBJID="&#xa;&#x9;&#x9;&#x9;&#x9;hdl:2117/423489">
   <metsHdr CREATEDATE="2026-04-19T13:08:29Z">
      <agent ROLE="CUSTODIAN" TYPE="ORGANIZATION">
         <name>RECERCAT</name>
      </agent>
   </metsHdr>
   <dmdSec ID="DMD_2117_423489">
      <mdWrap MDTYPE="MODS">
         <xmlData xmlns:mods="http://www.loc.gov/mods/v3" xsi:schemaLocation="http://www.loc.gov/mods/v3 http://www.loc.gov/standards/mods/v3/mods-3-1.xsd">
            <mods:mods xsi:schemaLocation="http://www.loc.gov/mods/v3 http://www.loc.gov/standards/mods/v3/mods-3-1.xsd">
               <mods:name>
                  <mods:role>
                     <mods:roleTerm type="text">author</mods:roleTerm>
                  </mods:role>
                  <mods:namePart>Giner Miguelez, Joan</mods:namePart>
               </mods:name>
               <mods:name>
                  <mods:role>
                     <mods:roleTerm type="text">author</mods:roleTerm>
                  </mods:role>
                  <mods:namePart>Gómez, Abel</mods:namePart>
               </mods:name>
               <mods:name>
                  <mods:role>
                     <mods:roleTerm type="text">author</mods:roleTerm>
                  </mods:role>
                  <mods:namePart>Cabot, Jordi</mods:namePart>
               </mods:name>
               <mods:originInfo>
                  <mods:dateIssued encoding="iso8601">2025</mods:dateIssued>
               </mods:originInfo>
               <mods:identifier type="none"/>
               <mods:abstract>To ensure the fairness and trustworthiness of machine learning (ML) systems, recent legislative initiatives and relevant research in the ML community have pointed out the need to document the data used to train ML models. Besides, data-sharing practices in many scientific domains have evolved in recent years for reproducibility purposes. In this sense, academic institutions’ adoption of these practices has encouraged researchers to publish their data and technical documentation in peer-reviewed publications such as data papers. In this study, we analyze how this broader scientific data documentation meets the needs of the ML community and regulatory bodies for its use in ML technologies. We examine a sample of 4041 data papers of different domains, assessing their coverage and trends in the requested dimensions and comparing them to those from an ML-focused venue (NeurIPS D&amp;B), which publishes papers describing datasets. As a result, we propose a set of recommendation guidelines for data creators and scientific data publishers to increase their data’s preparedness for its transparent and fairer use in ML technologies.This research has been partially supported by the Spanish government (LOCOSS - PID2020-114615RB-I00), the AIDOaRt project, which has received funding from the ECSEL Joint Undertaking (JU) under grant agreement No 101007350. The JU receives support from the European Union’s Horizon 2020 research and innovation programme and Sweden, Austria, Czech Republic, Finland, France, Italy, and Spain. Jordi Cabot is supported by the Luxembourg National Research Fund (FNR) PEARL program, grant agreement 16544475.Peer ReviewedPostprint (published version)</mods:abstract>
               <mods:language>
                  <mods:languageTerm authority="rfc3066"/>
               </mods:language>
               <mods:accessCondition type="useAndReproduction">http://creativecommons.org/licenses/by-nc-nd/4.0/ Open Access Attribution-NonCommercial-NoDerivatives 4.0 International</mods:accessCondition>
               <mods:subject>
                  <mods:topic>Àrees temàtiques de la UPC::Informàtica::Intel·ligència artificial::Aprenentatge automàtic</mods:topic>
               </mods:subject>
               <mods:subject>
                  <mods:topic>ML technologies</mods:topic>
               </mods:subject>
               <mods:subject>
                  <mods:topic>Machine learning systems</mods:topic>
               </mods:subject>
               <mods:subject>
                  <mods:topic>Training data</mods:topic>
               </mods:subject>
               <mods:titleInfo>
                  <mods:title>On the Readiness of Scientific Data Papers for a Fair and Transparent Use in Machine Learning</mods:title>
               </mods:titleInfo>
               <mods:genre>Article</mods:genre>
            </mods:mods>
         </xmlData>
      </mdWrap>
   </dmdSec>
   <structMap LABEL="DSpace Object" TYPE="LOGICAL">
      <div TYPE="DSpace Object Contents" ADMID="DMD_2117_423489"/>
   </structMap>
</mets></metadata></record></GetRecord></OAI-PMH>