Mirror of https://github.com/wassname/pytorch-a2c-ppo-acktr.git
pytorch-a2c-ppo-acktr
This is a PyTorch implementation of:
- Advantage Actor Critic (A2C), a synchronous, deterministic version of A3C
- Proximal Policy Optimization (PPO)
- Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation (ACKTR)
See also the OpenAI posts on A2C/ACKTR and PPO for more information.
This implementation is inspired by the OpenAI baselines for A2C, ACKTR and PPO. It uses the same hyperparameters and the same model, since they were well tuned for Atari games.
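
For readers new to PPO, here is a minimal sketch of the clipped surrogate objective that the algorithm maximizes. It is written in plain Python for readability and is not taken from this repository. The function name `ppo_clipped_objective` and the default `clip_param=0.2` are assumptions made for illustration.

```python
import math


def ppo_clipped_objective(new_log_probs, old_log_probs, advantages, clip_param=0.2):
    """Mean clipped surrogate objective over a batch (to be maximized).

    Hypothetical helper for illustration; clip_param=0.2 is an assumed value.
    """
    total = 0.0
    for new_lp, old_lp, adv in zip(new_log_probs, old_log_probs, advantages):
        # Probability ratio between the updated policy and the policy
        # that collected the data.
        ratio = math.exp(new_lp - old_lp)
        # Keep the ratio inside [1 - clip_param, 1 + clip_param].
        clipped_ratio = max(1.0 - clip_param, min(1.0 + clip_param, ratio))
        # Take the more pessimistic of the two surrogate terms.
        total += min(ratio * adv, clipped_ratio * adv)
    return total / len(advantages)
```

In a training loop one would typically minimize the negative of this value, together with a value loss and an entropy bonus.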
Contributions
Contributions are very welcome. If you know how to make this code better, don't hesitate to send a pull request. See also the TODO list below.
TODO
- Add MuJoCo and continuous actions
- Improve the performance of KFAC; see kfac.py for more information
- Run evaluation for all games and algorithms
Usage
A2C
python main.py --env-name "PongNoFrameskip-v4"
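
As a rough illustration of what A2C computes from each rollout, the sketch below builds bootstrapped n-step returns and the corresponding advantages. It is plain Python and not the code in main.py. The function names and the discount `gamma=0.99` are assumptions, and `dones[t]` is assumed to mean that the episode ended after step `t`.

```python
def n_step_returns(rewards, dones, bootstrap_value, gamma=0.99):
    """Discounted returns for one rollout, bootstrapped from a value estimate.

    Hypothetical helper; gamma=0.99 is an assumed discount factor.
    """
    returns = [0.0] * len(rewards)
    running = bootstrap_value  # value estimate of the state after the last step
    for t in reversed(range(len(rewards))):
        # Do not propagate value across an episode boundary.
        mask = 0.0 if dones[t] else 1.0
        running = rewards[t] + gamma * mask * running
        returns[t] = running
    return returns


def advantages_from_returns(returns, values):
    """Advantage = return minus the critic's value estimate at each step."""
    return [r - v for r, v in zip(returns, values)]
```

The actor is then updated with the log-probabilities weighted by these advantages, while the critic is regressed toward the returns.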
PPO
python main.py --env-name "PongNoFrameskip-v4" --algo ppo --use-gae --num-processes 8 --num-steps 256 --vis-interval 1 --log-interval 1
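
The `--use-gae` flag above refers to Generalized Advantage Estimation. A minimal sketch of the estimator follows, in plain Python rather than the repository's own code. The function name and the defaults `gamma=0.99` and `lam=0.95` are assumptions, and `dones[t]` is assumed to mark that the episode ended after step `t`.

```python
def gae_advantages(rewards, values, dones, bootstrap_value, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation over one rollout.

    Hypothetical helper; gamma and lam are assumed values.
    """
    advantages = [0.0] * len(rewards)
    gae = 0.0
    next_value = bootstrap_value  # value estimate of the state after the last step
    for t in reversed(range(len(rewards))):
        # Zero out bootstrapping and accumulation across episode boundaries.
        mask = 0.0 if dones[t] else 1.0
        # One-step temporal-difference error.
        delta = rewards[t] + gamma * mask * next_value - values[t]
        # Exponentially weighted sum of TD errors.
        gae = delta + gamma * lam * mask * gae
        advantages[t] = gae
        next_value = values[t]
    return advantages
```

With `lam=1` this reduces to the n-step return minus the value estimate, and with `lam=0` it reduces to the one-step TD error.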
ACKTR
python main.py --env-name "PongNoFrameskip-v4" --algo acktr --num-processes 32 --num-steps 20
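
To compare the PPO and ACKTR commands above, it can help to work out how many transitions each update sees. The sketch below assumes that `--num-processes` is the number of parallel environments and that `--num-steps` is the number of steps each environment takes per update. This README does not spell that out, so treat it as an assumption. The helper name is hypothetical.

```python
def transitions_per_update(num_processes, num_steps):
    """Transitions collected per update, assuming one step per process per tick."""
    return num_processes * num_steps


# Values taken from the PPO and ACKTR commands above.
ppo_batch = transitions_per_update(num_processes=8, num_steps=256)
acktr_batch = transitions_per_update(num_processes=32, num_steps=20)

print(ppo_batch)    # 2048
print(acktr_batch)  # 640
```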
Results
A2C
PPO
Coming soon.
ACKTR
Coming soon.