<?xml version="1.0" encoding="UTF-8"?><?xml-stylesheet type="text/xsl" href="static/style.xsl"?><OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/ http://www.openarchives.org/OAI/2.0/OAI-PMH.xsd"><responseDate>2026-04-13T02:47:58Z</responseDate><request verb="GetRecord" identifier="oai:www.recercat.cat:2117/346628" metadataPrefix="qdc">https://recercat.cat/oai/request</request><GetRecord><record><header><identifier>oai:recercat.cat:2117/346628</identifier><datestamp>2026-01-19T02:19:27Z</datestamp><setSpec>com_2072_1033</setSpec><setSpec>col_2072_452949</setSpec></header><metadata><qdc:qualifieddc xmlns:qdc="http://dspace.org/qualifieddc/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:dcterms="http://purl.org/dc/terms/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:doc="http://www.lyncode.com/xoai" xsi:schemaLocation="http://purl.org/dc/elements/1.1/ http://dublincore.org/schemas/xmls/qdc/2006/01/06/dc.xsd http://purl.org/dc/terms/ http://dublincore.org/schemas/xmls/qdc/2006/01/06/dcterms.xsd http://dspace.org/qualifieddc/ http://www.ukoln.ac.uk/metadata/dcmi/xmlschema/qualifieddc.xsd">
   <dc:title>Adaptive optics control with reinforcement learning: first steps</dc:title>
   <dc:creator>Pou Mulet, Bartomeu</dc:creator>
   <dc:creator>Quiñones, Eduardo</dc:creator>
   <dc:creator>Martín Muñoz, Mario</dc:creator>
   <dc:subject>Àrees temàtiques de la UPC::Informàtica::Arquitectura de computadors</dc:subject>
   <dc:subject>High performance computing</dc:subject>
   <dc:subject>Reinforcement Learning</dc:subject>
   <dc:subject>Adaptive Optics</dc:subject>
   <dc:subject>Nonlinear Control</dc:subject>
   <dc:subject>Machine Learning</dc:subject>
   <dc:subject>Càlcul intensiu (Informàtica)</dc:subject>
<dcterms:abstract>When planar wavefronts from distant stars traverse the atmosphere, they become distorted by the atmosphere's inhomogeneous temperature distribution. Adaptive Optics (AO) is the field concerned with correcting those distortions, allowing high-quality observations of distant targets. An AO system is composed of three main components: a deformable mirror (DM) that corrects the deformation in the wavefront, a wavefront sensor (WFS) that characterises the current turbulence in the wavefront, and a real-time controller (RTC) that issues the commands that correct the wavefront via the deformation of the DM. The system usually operates in closed loop under stringent real-time requirements (on the order of 10^3 to 10^4 actions per second). At each iteration, the WFS observes the wavefront after it has been corrected by the DM, and the RTC issues commands to correct for the evolution of the turbulence and for previous uncorrected errors (Figure 1, left).
One of the primary sources of error for an AO control algorithm is the temporal error. Because of the delay between characterising the turbulence with the WFS and setting the desired commands on the DM, any successful control approach must take into account past commands and the probable evolution of the atmosphere over this gap in time. To do that, the most common approaches in AO are variants of Linear Quadratic Gaussian (LQG) control with Kalman filters, one of whose initial iterations was presented in [1]. Typically, a linear model of the system's evolution is built with a set of parameters fitted from observations or derived from theoretical assumptions, which limits the system's capability to correct the turbulence.
In this paper, we present a novel solution based on Reinforcement Learning (RL) that optimises a reward signal, requires no previously built model (unlike LQG), and is non-linear. RL has already been applied in the domain of AO; however, it has been limited to WFS-less systems (e.g. [2]) or, more recently, to controlling a very limited number of actuators [3]. This work's main practical objective is application to the 8.2 m Subaru telescope (located in Hawaii), which includes thousands of actuators.</dcterms:abstract>
   <dcterms:issued>2021-05</dcterms:issued>
   <dc:type>Conference report</dc:type>
   <dc:rights>Open Access</dc:rights>
   <dc:publisher>Barcelona Supercomputing Center</dc:publisher>
</qdc:qualifieddc></metadata></record></GetRecord></OAI-PMH>