In Lecture 14 we move from supervised learning to reinforcement learning (RL), in which an agent must learn to interact with an environment in order to maximize its reward. We formalize reinforcement learning using the language of Markov Decision Processes (MDPs), policies, value functions, and Q-Value functions. We discuss different algorithms for reinforcement learning including Q-Learning, policy gradients, and Actor-Critic. We show how deep reinforcement learning has been used to play Atari games and to achieve super-human Go performance in AlphaGo.
Hide player controls
Hide resume playing