Approximate policy iteration using regularized Bellman residuals minimization

To access the full-text documents, please follow this link: http://hdl.handle.net/2117/84681

dc.contributor	Universitat Politècnica de Catalunya. Departament de Ciències de la Computació
dc.contributor	Universitat Politècnica de Catalunya. KEMLG - Grup d'Enginyeria del Coneixement i Aprenentatge Automàtic
dc.contributor.author	Esposito, Gennaro
dc.contributor.author	Martín Muñoz, Mario
dc.date	2016
dc.identifier.citation	Esposito, G., Martin, M. Approximate policy iteration using regularized Bellman residuals minimization. "Journal of Experimental & Theoretical Artificial Intelligence", 2016, vol. 28, no. 1-2, p. 3-12.
dc.identifier.citation	10.1080/0952813X.2015.1024494
dc.identifier.uri	http://hdl.handle.net/2117/84681
dc.language.iso	eng
dc.publisher	Taylor & Francis
dc.relation	http://www.tandfonline.com/doi/full/10.1080/0952813X.2015.1024494#.VS6nrJPcnv5
dc.rights	info:eu-repo/semantics/openAccess
dc.subject	Àrees temàtiques de la UPC::Informàtica::Intel·ligència artificial
dc.subject	Artificial intelligence
dc.subject	Reinforcement Learning
dc.subject	Support Vector Machine
dc.subject	Approximate Policy Iteration
dc.subject	Regularization
dc.subject	Regression
dc.subject	Intel·ligència artificial
dc.title	Approximate policy iteration using regularized Bellman residuals minimization
dc.type	info:eu-repo/semantics/submittedVersion
dc.type	info:eu-repo/semantics/article
dc.description.abstract	Reinforcement Learning (RL) provides a general methodology for solving complex decision problems under uncertainty, which are very challenging in many real-world applications. The RL problem is modeled as a Markov Decision Process (MDP), a framework studied in depth in the literature. We consider Policy Iteration (PI) algorithms for RL, which iteratively evaluate and improve control policies. For problems with continuous states or very large state spaces, generalization is mandatory: the ability of an RL algorithm to generalize is an important factor in predicting values for unexplored states. One candidate for value function approximation is Support Vector Regression (SVR), which is known to have good generalization properties. SVR has been used in batch frameworks in RL, but smart implementations of incremental exact SVR can extend its generalization ability to online RL, where the expected reward from states changes constantly with experience. Hence our online SVR is a novel method that allows fast and good estimation of the value function, achieving the RL objective very efficiently. Through simulation tests, the feasibility and usefulness of the proposed approach are demonstrated.
dc.description.abstract	Peer Reviewed

Related documents

Other documents by the same author

Intersegmental synchronization of spontaneous cord dorsum potentials as a clinical parameter to evaluate changes in neuronal connectivity produced by peripheral nerve and spinal cord damage

Martín Muñoz, Mario; Chávez, Diógenes; Béjar Alonso, Javier; Esposito, Gennaro; Rodríguez, Érika; Cortés García, Claudio Ulises; Rudomín, Pablo

A randomized algorithm for the exact solution of transductive support vector machines

Esposito, Gennaro; Martín Muñoz, Mario

Markovian analysis reveals dynamic changes in the sequential behavior of dorsal horn neuronal activity induced by nociceptive stimulation

Martín Muñoz, Mario; Béjar Alonso, Javier; Esposito, Gennaro; Chávez, Diógenes; Contreras-Hernández, Enrique; Glusman, Silvio; Cortés García, Claudio Ulises; Rudomín, Pablo

Supraspinal modulation of neuronal synchronization by nociceptive stimulation induces an enduring reorganization of dorsal horn neuronal connectivity

Contreras Hernández, Enrique; Chávez, Diógenes; Hernández, E.; Béjar Alonso, Javier; Martín Muñoz, Mario; Cortés García, Claudio Ulises

A machine learning methodology for the selection and classification of spontaneous spinal cord dorsum potentials allows disclosure of structured (non-random) changes in neuronal connectivity induced by nociceptive stimulation

Martín Muñoz, Mario; Contreras-Hernández, Enrique; Béjar Alonso, Javier; Esposito, Gennaro; Chávez, Diógenes; Glusman, Silvio; Cortés García, Claudio Ulises; Rudomín, Pablo
