Deep deterministic policy gradient with symmetric data augmentation for lateral attitude tracking control of a fixed-wing aircraft

Yifei Li, Erik-Jan van Kampen

arXiv:2407.11077·cs.LG·Published 2024-07-13·Updated 2026-04-13

The symmetry of dynamical systems can be exploited for state-transition prediction and to facilitate control policy optimization. This paper leverages system symmetry to develop sample-efficient offline reinforcement learning (RL) approaches. Under the symmetry assumption for a Markov Decision Process (MDP), a symmetric data augmentation method is proposed. The augmented samples are integrated into the dataset of Deep Deterministic Policy Gradient (DDPG) to enhance its coverage rate of the state-action space. Furthermore, sample utilization efficiency is improved by introducing a second critic trained on the augmented samples, resulting in a dual-critic structure. The aircraft's model is verified to be symmetric, and flight control simulations demonstrate accelerated policy convergence when augmented samples are employed.

TopicsDynamical Systems & PDE Learning

Tagsdynamical-systems

arXiv categoriescs.LG, cs.AI

arXiv abstract pagePDF