Title: A unifying view of optimism in episodic reinforcement learning
Abstract: We consider the problem of learning an optimal policy in a finite-horizon Markov Decision Process (MDP) with an unknown transition function. The principle of optimism in the face of uncertainty underpins many theoretically successful algorithms for this problem. In this talk I will present a general framework for designing, analyzing, and implementing such algorithms. The framework reveals a deep relationship between two classes of optimistic algorithms that were previously thought to be distinct. It is also broad enough to capture many existing algorithms and extends to factored linear MDPs. I will conclude the talk by discussing an adaptation that additionally guarantees local differential privacy.
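To make the principle of optimism concrete, below is a minimal sketch of one common bonus-based incarnation (in the style of UCB value iteration) for a tabular finite-horizon MDP. This is a generic illustration, not the framework from the talk; the function name, the Hoeffding-style bonus, and the assumption of known rewards in [0, 1] are all simplifying choices made here.

```python
import numpy as np

def optimistic_value_iteration(counts, rewards, S, A, H, delta=0.05):
    """Backward induction with a UCB-style exploration bonus (illustrative).

    counts[s, a, s'] : observed transition counts (hypothetical input)
    rewards[s, a]    : known mean rewards, assumed to lie in [0, 1]
    Returns optimistic Q-values of shape (H, S, A).
    """
    n = counts.sum(axis=2)                        # visits to each (s, a)
    p_hat = counts / np.maximum(n[..., None], 1)  # empirical transition model
    Q = np.zeros((H + 1, S, A))
    V = np.zeros((H + 1, S))
    for h in reversed(range(H)):
        # Hoeffding-style bonus: large for rarely visited (s, a) pairs,
        # which drives exploration toward uncertain parts of the MDP.
        bonus = H * np.sqrt(np.log(2 * S * A * H / delta) / np.maximum(n, 1))
        Q[h] = rewards + bonus + p_hat @ V[h + 1]
        Q[h] = np.minimum(Q[h], H)                # clip to the trivial upper bound
        V[h] = Q[h].max(axis=1)
    return Q[:H]
```

Acting greedily with respect to these optimistic Q-values yields a policy that is optimal for a "best plausible" model, which is the common thread among the algorithms the talk unifies.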