Instilling moral value alignment by means of multi-objective reinforcement learning

Rodriguez Soto, Manel; Serramià Amorós, Marc; López Sánchez, Maite; Rodríguez-Aguilar, Juan A. (Juan Antonio)

dc.contributor.author

Rodriguez Soto, Manel

dc.contributor.author

Serramià Amorós, Marc

dc.contributor.author

López Sánchez, Maite

dc.contributor.author

Rodríguez-Aguilar, Juan A. (Juan Antonio)

dc.date.issued

2023-02-01T09:10:35Z

dc.date.issued

2022-01-24

dc.identifier

1388-1957

dc.identifier

https://hdl.handle.net/2445/192920

dc.identifier

715848

dc.description.abstract

AI research faces the challenge of ensuring that autonomous agents learn to behave ethically, namely in alignment with moral values. Here, we propose a novel way of tackling the value alignment problem as a two-step process. The first step consists in formalising moral values and value-aligned behaviour based on philosophical foundations. Our formalisation is compatible with the framework of (Multi-Objective) Reinforcement Learning, to ease the handling of an agent's individual and ethical objectives. The second step consists in designing an environment wherein an agent learns to behave ethically while pursuing its individual objective. We leverage our theoretical results to introduce an algorithm that automates our two-step approach. In the cases where value-aligned behaviour is possible, our algorithm produces a learning environment for the agent wherein it will learn a value-aligned behaviour.

dc.format

17 p.

dc.format

application/pdf

dc.language

eng

dc.publisher

Springer

dc.relation

Reproduction of the document published at: https://doi.org/10.1007/s10676-022-09635-0

dc.relation

Ethics And Information Technology, 2022, vol. 24

dc.relation

https://doi.org/10.1007/s10676-022-09635-0

dc.rights

cc by (c) Manel Rodríguez Soto et al., 2022

dc.rights

http://creativecommons.org/licenses/by/3.0/es/

dc.rights

info:eu-repo/semantics/openAccess

dc.source

Articles published in journals (Matemàtiques i Informàtica)

dc.subject

Intel·ligència artificial

dc.subject

Aprenentatge per reforç (Intel·ligència artificial)

dc.subject

Ètica

dc.subject

Aspectes morals

dc.subject

Artificial intelligence

dc.subject

Reinforcement learning

dc.subject

Ethics

dc.subject

Moral aspects

dc.title

Instilling moral value alignment by means of multi-objective reinforcement learning

dc.type

info:eu-repo/semantics/article

dc.type

info:eu-repo/semantics/publishedVersion

This item appears in the following Collection(s)

ISGlobal - Institut de Salut Global de Barcelona [60808]

Matemàtiques i Informàtica [1007]