Training the Ant with the full simulation running on CUDA at 1 million steps per second on an NVIDIA RTX 2080, using the Tiny Differentiable Simulator and a C Augmented Random Search implementation. The policy is linear, with action dimension 8 and observation dimension 28, and 256 Ants run in parallel. Source code is here: See also Laikago trained using the same tech:
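For readers unfamiliar with Augmented Random Search, here is a minimal sketch of one basic variant (no observation normalization) applied to a linear policy with the 8 x 28 shape quoted above. It is not the implementation used in the video: it is plain Python rather than C or CUDA, it evaluates candidates one at a time instead of across 256 parallel Ants, and `toy_return` is a hypothetical stand-in for a simulator rollout. The hyperparameters (`n_dirs`, `top_b`, `step`, `noise`) are assumptions chosen only for illustration.

```python
import random
import statistics

OBS_DIM, ACT_DIM = 28, 8  # dimensions quoted in the description


def act(M, obs):
    """Linear policy: action = M @ obs, with M of shape ACT_DIM x OBS_DIM."""
    return [sum(w * o for w, o in zip(row, obs)) for row in M]


def perturbed(M, delta, scale):
    """Return M + scale * delta as a new matrix."""
    return [[w + scale * d for w, d in zip(rm, rd)] for rm, rd in zip(M, delta)]


def toy_return(M, target, observations):
    """Hypothetical stand-in for a rollout: negative squared error between
    the policy's actions and those of a fixed target linear policy."""
    total = 0.0
    for obs in observations:
        a, t = act(M, obs), act(target, obs)
        total -= sum((x - y) ** 2 for x, y in zip(a, t))
    return total


def ars_step(M, evaluate, rng, n_dirs=8, top_b=4, step=0.02, noise=0.03):
    """One ARS update: probe +/- random directions, keep the best top_b,
    and move along the reward-weighted sum of those directions."""
    results = []
    for _ in range(n_dirs):
        d = [[rng.gauss(0.0, 1.0) for _ in range(OBS_DIM)] for _ in range(ACT_DIM)]
        r_plus = evaluate(perturbed(M, d, noise))
        r_minus = evaluate(perturbed(M, d, -noise))
        results.append((r_plus, r_minus, d))
    # Rank directions by the better of their two returns.
    results.sort(key=lambda r: max(r[0], r[1]), reverse=True)
    best = results[:top_b]
    # Scale the step by the standard deviation of the returns that are used.
    sigma = statistics.pstdev([r for rp, rm, _ in best for r in (rp, rm)]) or 1.0
    coeff = step / (top_b * sigma)
    new_M = [row[:] for row in M]
    for rp, rm, d in best:
        for i in range(ACT_DIM):
            for j in range(OBS_DIM):
                new_M[i][j] += coeff * (rp - rm) * d[i][j]
    return new_M


def train(iterations=20, seed=0):
    """Run ARS on the toy objective; returns (initial_return, final_return)."""
    rng = random.Random(seed)
    target = [[rng.gauss(0.0, 1.0) for _ in range(OBS_DIM)] for _ in range(ACT_DIM)]
    observations = [[rng.gauss(0.0, 1.0) for _ in range(OBS_DIM)] for _ in range(16)]
    evaluate = lambda M: toy_return(M, target, observations)
    M = [[0.0] * OBS_DIM for _ in range(ACT_DIM)]  # start from the zero policy
    before = evaluate(M)
    for _ in range(iterations):
        M = ars_step(M, evaluate, rng)
    return before, evaluate(M)


if __name__ == "__main__":
    before, after = train()
    print("return before:", before, "return after:", after)
```

In a real setup, each call to `evaluate` would be a full simulated episode, which is why running many environments in parallel on the GPU matters: every ARS iteration needs two rollouts per sampled direction.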