Mirror of https://github.com/wassname/pytorch-a2c-ppo-acktr.git, synced 2026-06-27 16:20:05 +08:00
5d29401658262246006e50d52a7e7dc6346d1604
pytorch-a2c-ppo
This is a PyTorch implementation of Advantage Actor Critic (A2C), a synchronous, deterministic variant of A3C (introduced in "Asynchronous Methods for Deep Reinforcement Learning"), and of Proximal Policy Optimization (PPO). See also the OpenAI posts on A2C/A3C and PPO for more information.
This implementation is inspired by the OpenAI baselines for A2C and PPO. It uses the same hyperparameters and model, since they were well tuned for Atari games.
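For readers new to A2C, here is a minimal sketch of the quantities the algorithm optimizes: bootstrapped n-step returns, the advantage (return minus value estimate), and the resulting policy and value loss terms. It is written in plain Python rather than PyTorch so it runs without dependencies, and it is not taken from this repository's code. The function names `n_step_returns` and `a2c_loss_terms` and the discount `gamma=0.99` are assumptions chosen for illustration.

```python
def n_step_returns(rewards, dones, bootstrap_value, gamma=0.99):
    """Discounted returns for one rollout, computed backwards in time.

    dones[t] is True if the episode ended after step t, in which case
    nothing from later steps is bootstrapped into the return at t.
    bootstrap_value is the critic's estimate for the state after the
    last step of the rollout.
    """
    returns = []
    running = bootstrap_value
    for reward, done in zip(reversed(rewards), reversed(dones)):
        running = reward + gamma * running * (0.0 if done else 1.0)
        returns.append(running)
    returns.reverse()
    return returns


def a2c_loss_terms(returns, values, log_probs):
    """Mean policy loss and value loss for one rollout.

    The advantage is treated as a constant in the policy term; in an
    autograd framework that would mean detaching it from the graph.
    """
    advantages = [ret - v for ret, v in zip(returns, values)]
    n = len(advantages)
    policy_loss = -sum(lp * adv for lp, adv in zip(log_probs, advantages)) / n
    value_loss = sum(adv * adv for adv in advantages) / n
    return policy_loss, value_loss
```

A typical A2C update would combine these two terms, usually together with an entropy bonus, into a single loss; the weighting between them is a hyperparameter and is not shown here.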
Contributions
Contributions are very welcome. If you know how to make this code better, don't hesitate to send a pull request.
Usage
A2C
python main.py --env-name "PongNoFrameskip-v4"
PPO
python main.py --env-name "PongNoFrameskip-v4" --algo ppo --use-gae --num-processes 8 --num-steps 256 --vis-interval 1 --log-interval 1
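The PPO command enables generalized advantage estimation with `--use-gae`. As a rough illustration of what that involves, the sketch below computes GAE advantages over one rollout and feeds them into PPO's clipped surrogate loss. It is plain Python, not this repository's PyTorch code; the function names and the values `gamma=0.99`, `lam=0.95`, and `clip_eps=0.2` are assumptions for illustration, not the defaults used by `main.py`.

```python
import math


def gae_advantages(rewards, values, dones, bootstrap_value,
                   gamma=0.99, lam=0.95):
    """Generalized advantage estimates for one rollout.

    values[t] is the critic's estimate at step t, bootstrap_value its
    estimate for the state after the last step, and dones[t] is True
    if the episode ended after step t.
    """
    advantages = [0.0] * len(rewards)
    gae = 0.0
    next_value = bootstrap_value
    for t in reversed(range(len(rewards))):
        not_done = 0.0 if dones[t] else 1.0
        # One-step TD error at step t.
        delta = rewards[t] + gamma * next_value * not_done - values[t]
        # Exponentially weighted sum of later TD errors.
        gae = delta + gamma * lam * not_done * gae
        advantages[t] = gae
        next_value = values[t]
    return advantages


def ppo_clipped_loss(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    """Negative clipped surrogate objective, averaged over samples."""
    total = 0.0
    for new_lp, old_lp, adv in zip(new_log_probs, old_log_probs, advantages):
        ratio = math.exp(new_lp - old_lp)  # pi_new(a|s) / pi_old(a|s)
        clipped = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        # Take the more pessimistic of the clipped and unclipped terms.
        total += min(ratio * adv, clipped * adv)
    return -total / len(advantages)
```

If each of the 8 processes contributes a full rollout of 256 steps per update, as the flags in the command above suggest, one update would be computed from 8 × 256 = 2048 transitions.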
Results
A2C
PPO
Coming soon.