In this paper, we present an automated learning environment for developing control policies directly on the hardware of a modular legged robot. This environment facilitates the reinforcement learning process by computing the rewards using a vision-based tracking system and relocating the robot to the initial position using a resetting mechanism. We employ two state-of-the-art deep reinforcement learning (DRL) algorithms, Trust Region Policy Optimization (TRPO) and Deep Deterministic Policy Gradient (DDPG),