Imperial College London

Dr Thulasi Mylvaganam

Faculty of Engineering, Department of Aeronautics

Senior Lecturer in Control Engineering
 
 
 

Contact

 

+44 (0)20 7594 5129
t.mylvaganam

 
 

Location

 

221, City and Guilds Building, South Kensington Campus


Summary

 

Publications

Citation

BibTeX format

@article{Sassano:2023:10.1109/TAC.2022.3199211,
author = {Sassano, M and Mylvaganam, T and Astolfi, A},
doi = {10.1109/TAC.2022.3199211},
journal = {IEEE Transactions on Automatic Control},
pages = {2683--2698},
title = {Model-based policy iterations for nonlinear systems via controlled Hamiltonian dynamics},
url = {http://dx.doi.org/10.1109/TAC.2022.3199211},
volume = {68},
year = {2023}
}

RIS format (EndNote, RefMan)

TY  - JOUR
AB - The infinite-horizon optimal control problem for nonlinear systems is studied. In the context of model-based, iterative learning strategies, we propose an alternative definition and construction of the temporal difference error arising in Policy Iteration strategies. In such architectures the error is computed via the evolution of the Hamiltonian function (or, possibly, of its integral) along the trajectories of the closed-loop system. Herein the temporal difference error is instead obtained via two subsequent steps: first, the dynamics of the underlying costate variable in the Hamiltonian system are steered by means of a (virtual) control input in such a way that the stable invariant manifold becomes externally attractive. Then, the distance-from-invariance of the manifold, induced by approximate solutions, yields a natural candidate measure for the policy evaluation step. The policy improvement phase is then performed by means of standard gradient descent methods that allow the weights of the underlying functional approximator to be correctly updated. The above architecture then yields an iterative (episodic) learning scheme based on a scalar, constant reward at each iteration, the value of which is insensitive to the length of the episode, in the original spirit of Reinforcement Learning strategies for discrete-time systems. Finally, the theory is validated by means of a numerical simulation involving an automatic flight control problem.
AU - Sassano,M
AU - Mylvaganam,T
AU - Astolfi,A
DO - 10.1109/TAC.2022.3199211
EP - 2698
PY - 2023///
SN - 0018-9286
SP - 2683
TI - Model-based policy iterations for nonlinear systems via controlled Hamiltonian dynamics
T2 - IEEE Transactions on Automatic Control
UR - http://dx.doi.org/10.1109/TAC.2022.3199211
UR - http://hdl.handle.net/10044/1/97507
VL - 68
ER -
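
The abstract above describes an episodic policy-iteration loop: a scalar residual measures how far the current value-function approximation is from rendering the relevant manifold invariant, and gradient descent on the approximator weights performs the improvement step. As a rough, non-authoritative illustration of that loop only (this is not the authors' construction: the controlled Hamiltonian dynamics and invariant-manifold machinery are replaced by a plain Hamilton-Jacobi-Bellman residual, and the system f, g, costs q, r, and basis phi below are all hypothetical choices), a minimal Python sketch:

import numpy as np

# Hypothetical scalar system dx/dt = f(x) + g(x)*u with running cost
# q(x) + r*u^2; the value function is approximated as V(x) = w . phi(x).
f = lambda x: -x + x**3        # drift (illustrative choice)
g = lambda x: 1.0              # input map
q = lambda x: x**2             # state cost
r = 1.0                        # control weight

def phi(x):                    # polynomial basis for V(x)
    return np.array([x**2, x**4])

def dphi(x):                   # gradient of the basis w.r.t. x
    return np.array([2.0 * x, 4.0 * x**3])

def policy(x, w):
    # Improvement step for this cost structure: u = -(1/(2r)) g(x) dV/dx.
    return -0.5 / r * g(x) * (w @ dphi(x))

def residual(x, w):
    # Scalar stand-in for the "distance from invariance": the HJB residual
    # dV/dx * (f + g*u) + q + r*u^2 evaluated along the closed loop.
    u = policy(x, w)
    return (w @ dphi(x)) * (f(x) + g(x) * u) + q(x) + r * u**2

w = np.zeros(2)                       # approximator weights
xs = np.linspace(-1.0, 1.0, 50)       # grid standing in for an episode
lr = 1e-3
for _ in range(2000):                 # episodic learning loop
    # Policy evaluation: one scalar score per state for the current policy.
    errs = np.array([residual(x, w) for x in xs])
    # Policy improvement: semi-gradient descent on the weights
    # (the policy u is held fixed while differentiating the residual).
    grad = np.zeros_like(w)
    for x, e in zip(xs, errs):
        u = policy(x, w)
        grad += 2.0 * e * dphi(x) * (f(x) + g(x) * u)
    w -= lr * grad / len(xs)

print("learned weights:", w)

With these (assumed) choices the loop drives the mean-squared residual down over the grid; the semi-gradient update, which freezes the policy while differentiating the residual, mirrors the separation between the policy evaluation and policy improvement phases described in the abstract.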