<?xml version="1.0" encoding="UTF-8"?><?xml-stylesheet type="text/xsl" href="static/style.xsl"?><OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/ http://www.openarchives.org/OAI/2.0/OAI-PMH.xsd"><responseDate>2026-04-13T04:55:58Z</responseDate><request verb="GetRecord" identifier="oai:www.recercat.cat:2117/346628" metadataPrefix="didl">https://recercat.cat/oai/request</request><GetRecord><record><header><identifier>oai:recercat.cat:2117/346628</identifier><datestamp>2026-01-19T02:19:27Z</datestamp><setSpec>com_2072_1033</setSpec><setSpec>col_2072_452949</setSpec></header><metadata><d:DIDL xmlns:d="urn:mpeg:mpeg21:2002:02-DIDL-NS" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:doc="http://www.lyncode.com/xoai" xsi:schemaLocation="urn:mpeg:mpeg21:2002:02-DIDL-NS http://standards.iso.org/ittf/PubliclyAvailableStandards/MPEG-21_schema_files/did/didl.xsd">
   <d:Item id="hdl_2117_346628">
      <d:Descriptor>
         <d:Statement mimeType="application/xml; charset=utf-8">
            <dii:Identifier xmlns:dii="urn:mpeg:mpeg21:2002:01-DII-NS" xsi:schemaLocation="urn:mpeg:mpeg21:2002:01-DII-NS http://standards.iso.org/ittf/PubliclyAvailableStandards/MPEG-21_schema_files/dii/dii.xsd">urn:hdl:2117/346628</dii:Identifier>
         </d:Statement>
      </d:Descriptor>
      <d:Descriptor>
         <d:Statement mimeType="application/xml; charset=utf-8">
            <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" xmlns:dc="http://purl.org/dc/elements/1.1/" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd">
               <dc:title>Adaptive optics control with reinforcement learning: first steps</dc:title>
               <dc:creator>Pou Mulet, Bartomeu</dc:creator>
               <dc:creator>Quiñones, Eduardo</dc:creator>
               <dc:creator>Martín Muñoz, Mario</dc:creator>
               <dc:subject>Àrees temàtiques de la UPC::Informàtica::Arquitectura de computadors</dc:subject>
               <dc:subject>High performance computing</dc:subject>
               <dc:subject>Reinforcement Learning</dc:subject>
               <dc:subject>Adaptive Optics</dc:subject>
               <dc:subject>Nonlinear Control</dc:subject>
               <dc:subject>Machine Learning</dc:subject>
               <dc:subject>Càlcul intensiu (Informàtica)</dc:subject>
                <dc:description>When planar wavefronts from distant stars traverse the
atmosphere, they become distorted by the atmosphere’s inhomogeneous
temperature distribution. Adaptive Optics (AO) is the field in charge of
correcting those distortions, allowing high-quality observations of distant
targets. An AO system is composed of three main components: a deformable
mirror (DM) that corrects the deformation in the wavefront, a wavefront
sensor (WFS) that characterises the current turbulence in the wavefront,
and a real-time controller (RTC) that issues commands to correct the
wavefront via the deformation of the DM. Usually, the operations are
performed in closed loop with stringent real-time requirements (on the
order of 10³–10⁴ actions per second). At each iteration, the WFS observes
the wavefront after it has been corrected by the DM, and the RTC issues
commands to correct for the evolution of the turbulence and for previously
uncorrected errors (Figure 1, left).
One of the primary sources of error for an AO control algorithm is the
temporal error. Because of the delay between characterising the turbulence
with the WFS and setting the desired commands in the DM, any successful
control approach must take into account past commands and the probable
evolution of the atmosphere during this gap of time. To do that, the most
common approaches in AO are variants of Linear Quadratic Gaussian (LQG)
control with Kalman filters, one of whose initial iterations was presented
in [1]. Typically, a linear model of the system’s evolution is built with
parameters fitted from observations or from theoretical assumptions, which
limits the capability of the system to correct the turbulence.
In this paper, we present a novel solution based on Reinforcement Learning
(RL) that optimises a reward signal, requires no previously built model
(unlike LQG), and is non-linear. RL has already been applied in the domain
of AO; however, it has been limited to WFS-less systems (e.g. [2]) or, more
recently, to controlling a very limited number of actuators [3]. The main
practical objective of this work is to be applied in the 8.2 m Subaru
telescope (located in Hawaii), which includes thousands of actuators.</dc:description>
               <dc:date>2021-05</dc:date>
               <dc:type>Conference report</dc:type>
               <dc:rights>Open Access</dc:rights>
               <dc:publisher>Barcelona Supercomputing Center</dc:publisher>
            </oai_dc:dc>
         </d:Statement>
      </d:Descriptor>
   </d:Item>
</d:DIDL></metadata></record></GetRecord></OAI-PMH>