#playerofgames #deepmind #alphazero Special Guest: First author Martin Schmid () Games have been used throughout research as testbeds for AI algorithms, such as reinforcement learning agents. However, different types of games usually require different solution approaches, such as AlphaZero for Go or Chess, and Counterfactual Regret Minimization (CFR) for Poker. Player of Games bridges this gap between perfect and imperfect information games and delivers a single algorithm that uses tree search over public information states, and is trained via self-play. The resulting algorithm can play Go, Chess, Poker, Scotland Yard, and many more games, as well as non-game environments. OUTLINE: 0:00 - Introduction 2:50 - What games can Player of Games be trained on? 4:00 - Tree search algorithms (AlphaZero) 8:00 - What is different in imperfect information games? 15:40 - Counterfactual Value- and Policy-Networks 18:50 - The Player of Games search procedure 28:30 - How to train the network? 34
Hide player controls
Hide resume playing