This implementation is inspired by the OpenAI A2C baseline. It uses the same hyperparameters and model, since both were well tuned for Atari games.
Contributions
Contributions are very welcome. If you know how to make this code better, don't hesitate to send a pull request.
A PyTorch implementation of Advantage Actor Critic (A2C), Proximal Policy Optimization (PPO), and Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation (ACKTR).
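
For readers new to these methods, here is a minimal sketch of the quantity that gives A2C its name: the advantage, computed as a discounted return minus a value estimate. This is not code from this repository. The function names, the `gamma=0.99` default, and the plain-Python lists are assumptions chosen for illustration; a real implementation would operate on PyTorch tensors across many parallel environments.

```python
def discounted_returns(rewards, dones, bootstrap_value, gamma=0.99):
    """Compute discounted returns for one rollout, walking backwards in time.

    rewards[t] and dones[t] describe step t; bootstrap_value is the critic's
    estimate for the state that follows the last step. All names here are
    hypothetical and used only for this sketch.
    """
    returns = []
    running = bootstrap_value
    for reward, done in zip(reversed(rewards), reversed(dones)):
        # A finished episode cuts off the contribution of later steps.
        mask = 0.0 if done else 1.0
        running = reward + gamma * running * mask
        returns.append(running)
    returns.reverse()
    return returns


def advantages(returns, values):
    """Advantage at each step: observed return minus the critic's value estimate."""
    return [ret - val for ret, val in zip(returns, values)]


if __name__ == "__main__":
    rets = discounted_returns([1.0, 1.0, 1.0], [False, False, True], 10.0, gamma=0.5)
    print(rets)
    print(advantages(rets, [1.0, 1.0, 1.0]))
```

In an actor-critic update, the advantage typically weights the policy's log-probabilities, while the critic is regressed toward the returns.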