2026-04-14T02:42:45Z https://recercat.cat/oai/request

oai:recercat.cat:10230/69309 2025-12-13T21:23:35Z com_2072_6 col_2072_452952

Identifier: urn:hdl:10230/69309
Title: Hierarchies of reward machines
Authors: Furelos Blanco, Daniel; Law, Mark; Jonsson, Anders; Broda, Krysia; Russo, Alessandra
Subjects: Reward machines; Hierarchies
Abstract: Reward machines (RMs) are a recent formalism for representing the reward function of a reinforcement learning task through a finite-state machine whose edges encode subgoals of the task using high-level events. The structure of RMs enables the decomposition of a task into simpler and independently solvable subtasks that help tackle long-horizon and/or sparse reward tasks. We propose a formalism for further abstracting the subtask structure by endowing an RM with the ability to call other RMs, thus composing a hierarchy of RMs (HRM). We exploit HRMs by treating each call to an RM as an independently solvable subtask using the options framework, and describe a curriculum-based method to learn HRMs from traces observed by the agent. Our experiments reveal that exploiting a handcrafted HRM leads to faster convergence than with a flat HRM, and that learning an HRM is feasible in cases where its equivalent flat representation is not.
Funding: Anders Jonsson is partially funded by TAILOR, AGAUR SGR and Spanish grant PID2019-108141GB-I00.
Dates: 2025-01-27T13:54:20Z; 2025-01-27T13:54:20Z; 2023
Type: info:eu-repo/semantics/conferenceObject; info:eu-repo/semantics/publishedVersion
Relation: info:eu-repo/grantAgreement/ES/2PE/PID2019-108141GB-I00
Rights: Copyright 2023 by the author(s). info:eu-repo/semantics/openAccess
Publisher: PMLR
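
The idea in the abstract, an RM whose edges are labelled either with a high-level event or with a call to another RM, can be illustrated with a minimal sketch. This is not the authors' implementation: the class `RM`, the function `run`, the state names, and the events `wood`, `iron` and `workbench` are all hypothetical, and the sketch makes simplifying assumptions (edges are tried in order, irrelevant events are skipped, a failed call fails the whole run, and reward is reduced to "task accepted or not").

```python
from dataclasses import dataclass, field


@dataclass
class RM:
    """Hypothetical reward machine: a finite-state machine over high-level events."""
    name: str
    initial: str
    accepting: str
    # state -> list of (label, next_state); a label is an event name (str)
    # or another RM, in which case the edge is a call to that machine.
    edges: dict = field(default_factory=dict)


def run(rm, trace, pos=0):
    """Consume events from trace[pos:].

    Returns (accepted, position after the last consumed event).
    """
    state = rm.initial
    while state != rm.accepting and pos < len(trace):
        moved = False
        for label, nxt in rm.edges.get(state, []):
            if isinstance(label, RM):
                # Call edge: the called machine must reach its accepting state.
                ok, new_pos = run(label, trace, pos)
                if not ok:
                    return False, len(trace)  # assumption: a failed call fails the run
                state, pos, moved = nxt, new_pos, True
                break
            if label == trace[pos]:
                # Event edge: the observed event matches the subgoal.
                state, pos, moved = nxt, pos + 1, True
                break
        if not moved:
            pos += 1  # irrelevant event: stay in the same state
    return state == rm.accepting, pos


# Subtask: observe "wood", then "iron".
collect = RM("collect", "u0", "uA",
             {"u0": [("wood", "u1")], "u1": [("iron", "uA")]})
# Root task: call the subtask, then observe "workbench".
root = RM("root", "u0", "uA",
          {"u0": [(collect, "u1")], "u1": [("workbench", "uA")]})

accepted, consumed = run(root, ["grass", "wood", "iron", "workbench"])
reward = 1.0 if accepted else 0.0
```

Here `root` never mentions `wood` or `iron` directly; it only calls `collect`, which is the kind of reuse that a flat machine would have to spell out edge by edge.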