<?xml version="1.0" encoding="UTF-8"?><?xml-stylesheet type="text/xsl" href="static/style.xsl"?><OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/ http://www.openarchives.org/OAI/2.0/OAI-PMH.xsd"><responseDate>2026-04-13T02:47:58Z</responseDate><request verb="GetRecord" identifier="oai:www.recercat.cat:2117/346628" metadataPrefix="qdc">https://recercat.cat/oai/request</request><GetRecord><record><header><identifier>oai:recercat.cat:2117/346628</identifier><datestamp>2026-01-19T02:19:27Z</datestamp><setSpec>com_2072_1033</setSpec><setSpec>col_2072_452949</setSpec></header><metadata><qdc:qualifieddc xmlns:qdc="http://dspace.org/qualifieddc/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:dcterms="http://purl.org/dc/terms/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:doc="http://www.lyncode.com/xoai" xsi:schemaLocation="http://purl.org/dc/elements/1.1/ http://dublincore.org/schemas/xmls/qdc/2006/01/06/dc.xsd http://purl.org/dc/terms/ http://dublincore.org/schemas/xmls/qdc/2006/01/06/dcterms.xsd http://dspace.org/qualifieddc/ http://www.ukoln.ac.uk/metadata/dcmi/xmlschema/qualifieddc.xsd">
   <dc:title>Adaptive optics control with reinforcement learning: first steps</dc:title>
   <dc:creator>Pou Mulet, Bartomeu</dc:creator>
   <dc:creator>Quiñones, Eduardo</dc:creator>
   <dc:creator>Martín Muñoz, Mario</dc:creator>
   <dc:subject>Àrees temàtiques de la UPC::Informàtica::Arquitectura de computadors</dc:subject>
   <dc:subject>High performance computing</dc:subject>
   <dc:subject>Reinforcement Learning</dc:subject>
   <dc:subject>Adaptive Optics</dc:subject>
   <dc:subject>Nonlinear Control</dc:subject>
   <dc:subject>Machine Learning</dc:subject>
   <dc:subject>Càlcul intensiu (Informàtica)</dc:subject>
<dcterms:abstract>When planar wavefronts from distant stars traverse the atmosphere, they become distorted by the atmosphere's inhomogeneous temperature distribution. Adaptive Optics (AO) is the field concerned with correcting those distortions, allowing high-quality observations of distant targets. An AO system is composed of three main components: a deformable mirror (DM) that corrects the deformation in the wavefront, a wavefront sensor (WFS) that characterises the current turbulence in the wavefront, and a real-time controller (RTC) that issues the commands that correct the wavefront via the deformation of the DM. The system usually operates in closed loop under stringent real-time requirements (on the order of 10^3 to 10^4 actions per second). At each iteration, the WFS observes the wavefront after it has been corrected by the DM, and the RTC issues commands to correct for the evolution of the turbulence and for previous uncorrected errors (Figure 1, left).
One of the primary sources of error for an AO control algorithm is the temporal error. Because of the delay between characterising the turbulence with the WFS and setting the desired commands on the DM, any successful control approach must take into account past commands and the probable evolution of the atmosphere over this gap in time. To do that, the most common approaches in AO are variants of Linear Quadratic Gaussian (LQG) control with Kalman filters, one of whose initial iterations was presented in [1]. Typically, a linear model of the system's evolution is built with a set of parameters fitted from observations or derived from theoretical assumptions, which limits the system's capability to correct the turbulence.
In this paper, we present a novel solution based on Reinforcement Learning (RL) that optimises a reward signal, requires no previously built model (unlike LQG), and is non-linear. RL has already been applied in the domain of AO; however, it has been limited to WFS-less systems (e.g. [2]) or, more recently, to controlling a very limited number of actuators [3]. This work's main practical objective is application to the 8.2 m Subaru telescope (located in Hawaii), which includes thousands of actuators.</dcterms:abstract>
   <dcterms:issued>2021-05</dcterms:issued>
   <dc:type>Conference report</dc:type>
   <dc:rights>Open Access</dc:rights>
   <dc:publisher>Barcelona Supercomputing Center</dc:publisher>
</qdc:qualifieddc></metadata></record></GetRecord></OAI-PMH>