CODE-ACCORD: A corpus of building regulatory data for rule generation towards automatic compliance checking

dc.contributor
Universitat Ramon Llull. La Salle
dc.contributor
Lancaster University
dc.contributor
Birmingham City University
dc.contributor
Fraunhofer Institute for Building Physics IBP
dc.contributor
Jönköping University
dc.contributor
Institut Henri Fayol
dc.contributor
Université de Lorraine
dc.contributor.author
Hettiarachchi, Hansi
dc.contributor.author
Dridi, Amna
dc.contributor.author
Gaber, Mohamed
dc.contributor.author
Parsafard, Pouyan
dc.contributor.author
Bocaneala, Nicoleta
dc.contributor.author
Breitenfelder, Katja
dc.contributor.author
Costa, Gonçal
dc.contributor.author
Hedblom, Maria Magdalena
dc.contributor.author
Juganaru-Mathieu, Mihaela
dc.contributor.author
Mecharnia, Thamer
dc.contributor.author
Park, Sumee
dc.contributor.author
Tan, He
dc.contributor.author
Tawil, Abdel-Rahman
dc.contributor.author
Vakaj, Edlira
dc.date.accessioned
2025-10-04T05:13:34Z
dc.date.available
2025-10-04T05:13:34Z
dc.date.created
2024-07-01
dc.date.issued
2025-01-29
dc.identifier.issn
2052-4463
dc.identifier.uri
http://hdl.handle.net/20.500.14342/5562
dc.description.abstract
Automatic Compliance Checking (ACC) within the Architecture, Engineering, and Construction (AEC) sector necessitates automating the interpretation of building regulations to achieve its full potential. Converting textual rules into machine-readable formats is challenging due to the complexities of natural language and the scarcity of resources for advanced Machine Learning (ML). Addressing these challenges, we introduce CODE-ACCORD, a dataset of 862 sentences from the building regulations of England and Finland. Only the self-contained sentences, which express complete rules without needing additional context, were considered as they are essential for ACC. Each sentence was manually annotated with entities and relations by a team of 12 annotators to facilitate machine-readable rule generation, followed by careful curation to ensure accuracy. The final dataset comprises 4,297 entities and 4,329 relations across various categories, serving as a robust ground truth. CODE-ACCORD supports a range of ML and Natural Language Processing (NLP) tasks, including text classification, entity recognition, and relation extraction. It enables applying recent trends, such as deep neural networks and large language models, to ACC.
dc.format.extent
14 p.
dc.language.iso
eng
dc.publisher
Springer Nature
dc.relation.ispartof
Scientific Data, 12, 170 (2025)
dc.rights
© The author(s)
dc.rights
Attribution 4.0 International
dc.rights.uri
http://creativecommons.org/licenses/by/4.0/
dc.subject
CODE-ACCORD
dc.subject
Architecture
dc.subject
Construction
dc.title
CODE-ACCORD: A corpus of building regulatory data for rule generation towards automatic compliance checking
dc.type
info:eu-repo/semantics/article
dc.subject.udc
62
dc.subject.udc
620
dc.subject.udc
69
dc.subject.udc
72
dc.description.version
info:eu-repo/semantics/publishedVersion
dc.embargo.terms
none
dc.identifier.doi
https://doi.org/10.1038/s41597-024-04320-x
dc.rights.accessLevel
info:eu-repo/semantics/openAccess


Files in this item

There are no files associated with this item.

This item appears in the following collection(s)

La Salle [1048]