mirror of
https://github.com/wassname/pytorch-a2c-ppo-acktr.git
pytorch-a2c-ppo-acktr
This is a PyTorch implementation of:
- Advantage Actor Critic (A2C), a synchronous deterministic version of A3C
- Proximal Policy Optimization (PPO)
- Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation (ACKTR)
Also see the OpenAI posts: A2C/ACKTR and PPO for more information.
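As a rough illustration of what distinguishes PPO from A2C, the sketch below computes the clipped surrogate loss in plain Python. It is not taken from this repository: the function name `ppo_clipped_loss`, its arguments, and the default `clip_param=0.2` are assumptions made for the example.

```python
import math


def ppo_clipped_loss(new_log_probs, old_log_probs, advantages, clip_param=0.2):
    """Mean clipped surrogate loss (to be minimized) over a batch of samples.

    All names and the default clip_param are illustrative assumptions,
    not values read from this repository.
    """
    total = 0.0
    for new_lp, old_lp, adv in zip(new_log_probs, old_log_probs, advantages):
        # Probability ratio pi_new(a|s) / pi_old(a|s), computed from log-probs.
        ratio = math.exp(new_lp - old_lp)
        # Keep the ratio inside [1 - clip_param, 1 + clip_param].
        clipped = min(max(ratio, 1.0 - clip_param), 1.0 + clip_param)
        # Take the pessimistic (smaller) of the two surrogate objectives.
        total += min(ratio * adv, clipped * adv)
    # Negate because optimizers minimize, while the surrogate is maximized.
    return -total / len(advantages)
```

When the new and old policies agree, the ratio is 1 and the loss reduces to the negative mean advantage, which is the plain policy-gradient surrogate.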
This implementation is inspired by the OpenAI baselines for A2C, ACKTR and PPO. It uses the same hyperparameters and model, since they were well tuned for Atari games.
Contributions
Contributions are very welcome. If you know how to make this code better, don't hesitate to send a pull request. See also the TODO list below.
TODO
- Add MuJoCo and continuous actions
- Improve performance of KFAC; see kfac.py for more information
- Run evaluation for all games and algorithms
Usage
A2C
python main.py --env-name "PongNoFrameskip-v4"
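For orientation, A2C typically trains on bootstrapped n-step returns collected from short rollouts. The sketch below shows one way to compute them in plain Python. The function name, the `masks` convention (0.0 where an episode ended after that step, 1.0 otherwise) and the default `gamma` are assumptions for illustration, not code from main.py.

```python
def n_step_returns(rewards, masks, bootstrap_value, gamma=0.99):
    """Discounted returns for one rollout, computed backwards in time.

    rewards[t]      reward received at step t
    masks[t]        0.0 if the episode ended after step t, else 1.0
    bootstrap_value critic estimate for the state following the last step
    (all names and the default gamma are illustrative assumptions)
    """
    returns = [0.0] * len(rewards)
    running = bootstrap_value
    for t in reversed(range(len(rewards))):
        # A zero mask stops value from leaking across episode boundaries.
        running = rewards[t] + gamma * masks[t] * running
        returns[t] = running
    return returns


def advantages_from_returns(returns, values):
    """Advantage estimate: return minus the critic's value prediction."""
    return [ret - val for ret, val in zip(returns, values)]
```

The advantages weight the policy-gradient term, while the returns serve as regression targets for the value function.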
PPO
python main.py --env-name "PongNoFrameskip-v4" --algo ppo --use-gae --num-processes 8 --num-steps 256 --vis-interval 1 --log-interval 1
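The `--use-gae` flag in the command above presumably switches the advantage estimate to Generalized Advantage Estimation (GAE). A minimal plain-Python sketch of GAE follows. The function name, the parameter name `lam`, and the defaults are assumptions for the example and are not read from this repository.

```python
def gae_advantages(rewards, values, masks, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation over a single rollout.

    rewards[t]  reward at step t, for t in 0..T-1
    values[t]   critic estimates for t in 0..T (the last one is the bootstrap)
    masks[t]    0.0 if the episode ended after step t, else 1.0
    (names and default gamma/lam are illustrative assumptions)
    """
    num_steps = len(rewards)
    advantages = [0.0] * num_steps
    gae = 0.0
    for t in reversed(range(num_steps)):
        # One-step temporal-difference error at step t.
        delta = rewards[t] + gamma * values[t + 1] * masks[t] - values[t]
        # Exponentially weighted sum of future TD errors.
        gae = delta + gamma * lam * masks[t] * gae
        advantages[t] = gae
    return advantages
```

With `lam=0` this reduces to the one-step TD error, and with `lam=1` it matches the discounted return minus the value estimate, so `lam` trades bias against variance.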
ACKTR
python main.py --env-name "PongNoFrameskip-v4" --algo acktr --num-processes 32 --num-steps 20
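To compare the PPO and ACKTR commands above, it can help to work out how many transitions each rollout contains, assuming every update consumes `--num-steps` steps from each of `--num-processes` parallel environments. The helper `rollout_size` below is hypothetical and only parses the two flags shown in the commands. It does not reproduce the argument parser in main.py.

```python
import shlex


def rollout_size(command):
    """Return num_processes * num_steps parsed from a command-line string.

    Hypothetical helper for illustration; it requires both flags to be
    present because no default values are assumed here.
    """
    tokens = shlex.split(command)

    def flag_value(name):
        # The integer value is the token that follows the flag.
        return int(tokens[tokens.index(name) + 1])

    return flag_value("--num-processes") * flag_value("--num-steps")


ppo_cmd = ('python main.py --env-name "PongNoFrameskip-v4" --algo ppo --use-gae '
           '--num-processes 8 --num-steps 256 --vis-interval 1 --log-interval 1')
acktr_cmd = ('python main.py --env-name "PongNoFrameskip-v4" --algo acktr '
             '--num-processes 32 --num-steps 20')

print(rollout_size(ppo_cmd))    # 8 * 256
print(rollout_size(acktr_cmd))  # 32 * 20
```

Under that assumption, the PPO command gathers 2048 transitions per rollout and the ACKTR command gathers 640.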
Results
A2C
PPO
Coming soon.
ACKTR
Coming soon.