Abstract:
I will review the origins of reinforcement learning (RL) and the ‘action-replay’ approach to proving convergence of TD and Q-learning in finite state spaces. This proof construction gives some insight into how initial estimates of the value function affect the progress of learning.
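To make the finite-state setting concrete, the following is a minimal sketch of tabular Q-learning in Python; the tiny ring environment, the parameter values, and the optimistic initialisation are assumptions made for illustration, not the action-replay construction itself. The initial entries of the Q table are the ‘initial estimates of the value function’ referred to above: raising or lowering them changes which actions a greedy learner tries first.

    # Minimal tabular Q-learning sketch (illustrative assumptions throughout).
    n_states, n_actions = 5, 2
    alpha, gamma = 0.1, 0.9          # learning rate and discount factor
    optimistic_init = 1.0            # initial estimate of the value function

    # Every Q(s, a) starts at the initial estimate; optimistic values
    # drive a greedy learner to try as-yet-untried actions first.
    Q = [[optimistic_init] * n_actions for _ in range(n_states)]

    def step(s, a):
        # Hypothetical environment: action a advances the agent a+1 states
        # around a ring of n_states; arriving at state 0 yields reward 1.
        s_next = (s + a + 1) % n_states
        return s_next, (1.0 if s_next == 0 else 0.0)

    s = 0
    for _ in range(10_000):
        a = max(range(n_actions), key=lambda a: Q[s][a])   # greedy action
        s_next, r = step(s, a)
        # Q-learning update: move Q(s, a) toward the bootstrapped target.
        Q[s][a] += alpha * (r + gamma * max(Q[s_next]) - Q[s][a])
        s = s_next

    for s in range(n_states):
        print(s, [round(q, 2) for q in Q[s]])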
Next, I will survey some of the difficulties of scaling RL up to large, continuous state spaces. From the point of view of learning in robots, RL is a coherent theory (though not easy to apply). But RL does not seem to be a coherent or complete explanation of biological learning: how are the reinforcements specified? Is it reasonable to suppose that animals learn by optimising a stream of subjective reinforcement?
A satisfactory theory of biological learning should explain the role of learning in evolution. I will discuss some approaches to doing this, using information theory. There are several surprises; one is an apparently new interaction between indiscriminate social imitation and evolution, which suggests that certain types of imitation may be primitive and important modes of learning.