Ilya Kostrikov 5d29401658 Add PPO
2017-09-17 18:43:45 -04:00

pytorch-a2c-ppo

This is a PyTorch implementation of Advantage Actor Critic (A2C), a synchronous, deterministic variant of A3C (from "Asynchronous Methods for Deep Reinforcement Learning"), and of Proximal Policy Optimization (PPO). For more information, see the OpenAI posts on A2C/A3C and PPO.

This implementation is inspired by the OpenAI baselines for A2C and PPO. It uses the same hyperparameters and model, since they were well tuned for Atari games.
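To make the difference between the two algorithms concrete, here is a minimal per-sample sketch in plain Python. It is not the code in this repository: the function names are hypothetical, and `clip_param=0.2` is an illustrative value rather than a confirmed default. A2C weights the log-probability of the chosen action by its advantage, while PPO uses the probability ratio between the new and old policy and clips it so that a single update cannot move the policy too far.

```python
import math


def a2c_policy_loss(log_prob, advantage):
    """Policy-gradient loss for one sample: -log pi(a|s) * advantage."""
    return -log_prob * advantage


def ppo_clipped_loss(new_log_prob, old_log_prob, advantage, clip_param=0.2):
    """PPO clipped surrogate loss for one sample.

    clip_param=0.2 is an assumption made for illustration only.
    """
    # Probability ratio pi_new(a|s) / pi_old(a|s), computed from log-probabilities.
    ratio = math.exp(new_log_prob - old_log_prob)
    # Restrict the ratio to the interval [1 - clip_param, 1 + clip_param].
    clipped_ratio = max(1.0 - clip_param, min(ratio, 1.0 + clip_param))
    # Keep the smaller (more pessimistic) surrogate, then negate to get a loss.
    return -min(ratio * advantage, clipped_ratio * advantage)


if __name__ == "__main__":
    # The new policy makes the action 1.5x more likely than the old policy did.
    print(ppo_clipped_loss(math.log(1.5), 0.0, advantage=1.0))
    print(ppo_clipped_loss(math.log(1.5), 0.0, advantage=-1.0))
```

With a positive advantage the clipped term caps the gain from increasing the action's probability; with a negative advantage the unclipped term is kept, so the penalty is not reduced by clipping.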

Contributions

Contributions are very welcome. If you know how to make this code better, don't hesitate to send a pull request.

Usage

A2C

python main.py --env-name "PongNoFrameskip-v4"

PPO

python main.py --env-name "PongNoFrameskip-v4" --algo ppo --use-gae --num-processes 8 --num-steps 256 --vis-interval 1 --log-interval 1
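As a rough guide to what the flags above imply, the sketch below shows Generalized Advantage Estimation (the technique `--use-gae` refers to) for a single rollout, plus the amount of data gathered per update. It is written in plain Python and is not this repository's implementation: `compute_gae`, `gamma=0.99`, and `lam=0.95` are hypothetical names and values, episode terminations are ignored for brevity, and the batch-size arithmetic assumes that each of the `--num-processes` workers collects `--num-steps` transitions before every update.

```python
def compute_gae(rewards, values, last_value, gamma=0.99, lam=0.95):
    """Return GAE advantages for one rollout with no episode boundaries.

    rewards[t] and values[t] belong to step t; last_value is the value
    estimate of the state reached after the final step.
    gamma and lam are illustrative defaults, not confirmed repository values.
    """
    advantages = [0.0] * len(rewards)
    gae = 0.0
    next_value = last_value
    # Walk backwards so each step can reuse the accumulated future advantage.
    for t in reversed(range(len(rewards))):
        # One-step temporal-difference error.
        delta = rewards[t] + gamma * next_value - values[t]
        # Exponentially weighted sum of this and all later TD errors.
        gae = delta + gamma * lam * gae
        advantages[t] = gae
        next_value = values[t]
    return advantages


# Transitions gathered per update under the assumption stated above.
num_processes, num_steps = 8, 256
transitions_per_update = num_processes * num_steps

if __name__ == "__main__":
    print(compute_gae([1.0, 1.0], [0.0, 0.0], last_value=0.0, gamma=1.0, lam=1.0))
    print(transitions_per_update)
```

Setting `lam` to 0 reduces each advantage to the one-step TD error, and setting it to 1 recovers the discounted return minus the value baseline, so the parameter trades bias against variance.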

Results

A2C

BreakoutNoFrameskip-v4

SeaquestNoFrameskip-v4

QbertNoFrameskip-v4

BeamRiderNoFrameskip-v4

PPO

Coming soon.