The Policy Gradient algorithm is a Monte Carlo based reinforcement learning method that uses deep neural networks to approximate an agent’s policy. The policy is a probability distribution that gives us the probability of selecting each action in the agent’s discrete action space. This algorithm is suited for environments like the Open AI gyms’ lunar lander, and can even be scaled up to learn how to play games from the Open AI Gym’s Atari library. We’re going to code up our agent using the Tensorflow 2 fram
Hide player controls
Hide resume playing