A pole is attached by an un-actuated joint to a cart, which moves along a frictionless track. The pendulum is placed upright on the cart and the goal is to balance the pole by applying forces in the left and right direction on the cart. The model probably retrained, but with fewer neurons, the network was not trained at all. I consider the task of learning conditionally solved. Gymnasium (gym): Code:
Hide player controls
Hide resume playing