Title

Safe deployment of reinforcement learning using deterministic optimization of trained neural networks

Abstract

Driven by the fast-improving performance of optimization software, optimization over trained neural networks (NNs) appears tractable, at least for moderate-size NNs. In this talk, we discuss recent advancements in optimization formulations and software for NN surrogate models, including OMLT, the Optimization and Machine Learning Toolkit (https://github.com/cog-imperial/OMLT). We then outline how optimization over a trained neural-network state-action value function (i.e., a critic) can explicitly incorporate constraints, and we describe two corresponding RL algorithms: the first uses constrained optimization of the critic to generate optimal actions for training an actor, while the second guarantees constraint satisfaction by directly implementing the actions obtained from optimizing the trained critic. The two algorithms are tested on a supply chain case study from OR-Gym and compared against the state-of-the-art algorithms TRPO, CPO, and RCPO.
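To illustrate the idea behind the second algorithm, the sketch below selects actions by maximizing a trained critic over only those actions that satisfy a constraint, so any returned action is feasible by construction. This is a deliberately simplified stand-in: the talk concerns deterministic (e.g., mixed-integer) optimization over the NN itself via tools like OMLT, whereas here the critic is a toy NumPy ReLU network with made-up weights, the action space is a hypothetical discretized grid, and `safe_action` and the inventory-style constraint are illustrative names, not part of the presented algorithms.

```python
import numpy as np

# Hypothetical toy critic Q(s, a): a one-hidden-layer ReLU network with
# fixed random weights standing in for a trained network's parameters.
rng = np.random.default_rng(0)
W1 = rng.normal(size=(3, 8))   # input: state (2 dims) + action (1 dim)
b1 = rng.normal(size=8)
W2 = rng.normal(size=(8, 1))
b2 = rng.normal(size=1)

def critic(state, action):
    """Evaluate the toy state-action value Q(s, a)."""
    x = np.concatenate([state, [action]])
    h = np.maximum(W1.T @ x + b1, 0.0)   # hidden ReLU layer
    return float(W2.T @ h + b2)

def safe_action(state, candidates, constraint):
    """Return the feasible candidate action that maximizes the critic.

    Infeasible actions are filtered out before maximization, so the
    chosen action satisfies the constraint by construction.
    """
    feasible = [a for a in candidates if constraint(state, a)]
    if not feasible:
        raise ValueError("no feasible action for this state")
    return max(feasible, key=lambda a: critic(state, a))

state = np.array([0.5, -0.2])
candidates = np.linspace(-1.0, 1.0, 21)    # hypothetical action grid
constraint = lambda s, a: s[0] + a <= 1.0  # e.g., an inventory cap
a_star = safe_action(state, candidates, constraint)
print(a_star)
```

In the talk's setting, the enumeration over a grid is replaced by solving a deterministic optimization problem over an exact formulation of the trained network, which scales to continuous action spaces and richer constraint sets.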

Bio

Dr Calvin Tsay is an Assistant Professor (UK Lecturer) in the Computational Optimisation Group at the Department of Computing, Imperial College London. His research focuses on computational methods for optimisation and control, with applications in machine learning and process systems engineering. Calvin received his PhD in Chemical Engineering from the University of Texas at Austin, where his work earned the 2022 W. David Smith, Jr. Graduate Publication Award from the CAST Division of the American Institute of Chemical Engineers (AIChE). He previously received his BS/BA from Rice University (Houston, TX).