# pytorch-a2c-ppo-acktr

## Update 09/27/2017: now supports both Atari and MuJoCo/Roboschool!

This is a PyTorch implementation of:

* Advantage Actor Critic (A2C), a synchronous deterministic version of [A3C](https://arxiv.org/pdf/1602.01783v1.pdf)
* Proximal Policy Optimization ([PPO](https://arxiv.org/pdf/1707.06347.pdf))
* Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation ([ACKTR](https://arxiv.org/abs/1708.05144))

Also see the OpenAI blog posts on [A2C/ACKTR](https://blog.openai.com/baselines-acktr-a2c/) and [PPO](https://blog.openai.com/openai-baselines-ppo/) for more information.

This implementation is inspired by the OpenAI baselines for [A2C](https://github.com/openai/baselines/tree/master/baselines/a2c), [ACKTR](https://github.com/openai/baselines/tree/master/baselines/acktr) and [PPO](https://github.com/openai/baselines/tree/master/baselines/ppo1). It uses the same hyperparameters and model, since they were well tuned for Atari games.
## Contributions

Contributions are very welcome. If you know how to make this code better, don't hesitate to send a pull request. See also the TODO list below.

I'm also looking for volunteers to run all experiments on Atari and MuJoCo (with multiple random seeds).
## Disclaimer

It's extremely difficult to reproduce results for reinforcement learning methods. See ["Deep Reinforcement Learning that Matters"](https://arxiv.org/abs/1709.06560) for more information. I tried to reproduce the OpenAI results as closely as possible. However, major differences in performance can be caused even by minor differences between the TensorFlow and PyTorch libraries.
### TODO

* Improve this README file. Rearrange images.
* Improve performance of KFAC; see kfac.py for more information.
* Run evaluation for all games and algorithms.
## Usage

### Atari

#### A2C

```
python main.py --env-name "PongNoFrameskip-v4"
```
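
For orientation: A2C steps `--num-processes` environments in lockstep, collects `--num-steps` transitions from each, and then makes one synchronous update. The pure-Python sketch below shows the standard per-update computation for a single environment: bootstrapped n-step returns, advantages as returns minus the critic's value estimates, and the combined actor-critic loss. The function names and the coefficients 0.5 (value loss) and 0.01 (entropy bonus) are illustrative assumptions, not values read from main.py.

```python
def n_step_returns(rewards, dones, last_value, gamma=0.99):
    """Discounted n-step returns for one environment's rollout.

    rewards[t] and dones[t] describe step t; last_value is the critic's
    estimate V(s_T) for the state after the final step, used to bootstrap.
    A done flag cuts the bootstrap so returns do not leak across episodes.
    """
    returns = [0.0] * len(rewards)
    running = last_value
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running * (1.0 - dones[t])
        returns[t] = running
    return returns


def a2c_loss(log_probs, values, returns, entropies,
             value_loss_coef=0.5, entropy_coef=0.01):
    """Scalar A2C loss; the default coefficients are illustrative assumptions."""
    n = len(returns)
    advantages = [r - v for r, v in zip(returns, values)]
    # Policy-gradient term: advantages act as constant weights on log-probs.
    policy_loss = -sum(a * lp for a, lp in zip(advantages, log_probs)) / n
    # Critic term: mean squared error between returns and value estimates.
    value_loss = sum(a * a for a in advantages) / n
    # Entropy bonus is subtracted to encourage exploration.
    entropy = sum(entropies) / n
    return policy_loss + value_loss_coef * value_loss - entropy_coef * entropy
```

For example, `n_step_returns([1.0, 0.0, 1.0], [0.0, 0.0, 1.0], 5.0, gamma=0.5)` gives `[1.25, 0.5, 1.0]`: the episode ends at the last step, so the bootstrap value 5.0 is ignored. In a PyTorch version the advantages would be detached in the policy term and the loss backpropagated through the network; plain floats are used here only to keep the sketch self-contained.
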
#### PPO

```
python main.py --env-name "PongNoFrameskip-v4" --algo ppo --use-gae --num-processes 8 --num-steps 256 --vis-interval 1 --log-interval 1
```
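
The `--use-gae` flag in this command switches the return targets to Generalized Advantage Estimation (GAE). The sketch below is the standard GAE recursion in plain Python; `gamma` and `tau` are named after the `--gamma` and `--tau` flags that appear in the MuJoCo PPO command, assuming `--tau` plays the role of GAE's λ. The function itself is a hypothetical illustration, not the repository's implementation.

```python
def gae_returns(rewards, values, dones, last_value, gamma=0.99, tau=0.95):
    """Generalized Advantage Estimation for one environment's rollout.

    values[t] is V(s_t); last_value is V(s_T) for the state after the last step.
    tau is GAE's lambda (assumed to correspond to the --tau flag).
    Returns (advantages, returns) with returns[t] = advantages[t] + values[t].
    """
    T = len(rewards)
    advantages = [0.0] * T
    gae = 0.0
    next_value = last_value
    for t in reversed(range(T)):
        not_done = 1.0 - dones[t]
        # One-step TD error; bootstrapping is cut at episode ends.
        delta = rewards[t] + gamma * next_value * not_done - values[t]
        # Exponentially weighted (gamma * tau) sum of future TD errors.
        gae = delta + gamma * tau * not_done * gae
        advantages[t] = gae
        next_value = values[t]
    returns = [a + v for a, v in zip(advantages, values)]
    return advantages, returns
```

With `tau=1.0` the returns reduce to plain bootstrapped n-step discounted returns; with `tau=0.0` each advantage is just the one-step TD error. Values in between trade bias for variance.
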
#### ACKTR

```
python main.py --env-name "PongNoFrameskip-v4" --algo acktr --num-processes 32 --num-steps 20
```
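
ACKTR replaces the plain gradient step with an approximate natural-gradient step whose Fisher matrix is factored layer by layer with Kronecker products (K-FAC). For a fully connected layer with pre-activations $s = W a$, input activations $a$ and back-propagated gradients $g = \partial L / \partial s$, the textbook K-FAC approximation is summarized below (using column-stacking $\operatorname{vec}$). This is the standard form from the K-FAC/ACKTR papers, not a description of what kfac.py does internally; ACKTR also adds damping and a trust-region rule for the step size, and convolutional layers use an analogous factorization, all omitted here.

$$
\nabla_W L = g\,a^\top, \qquad
F = \mathbb{E}\!\left[\operatorname{vec}(\nabla_W L)\,\operatorname{vec}(\nabla_W L)^\top\right] \approx A \otimes G,
\qquad A = \mathbb{E}[a a^\top],\; G = \mathbb{E}[g g^\top]
$$

$$
F^{-1}\operatorname{vec}(\nabla_W L) \approx \left(A^{-1} \otimes G^{-1}\right)\operatorname{vec}(\nabla_W L) = \operatorname{vec}\!\left(G^{-1}\,\nabla_W L\,A^{-1}\right)
$$

The last identity is what makes the method practical: only the two small factors $A$ and $G$ have to be inverted per layer, instead of the full Fisher block.
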
### MuJoCo

#### A2C

```
python main.py --env-name "Reacher-v1" --num-stack 1 --num-frames 1000000
```
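
`--num-stack` controls how many consecutive observations are concatenated into the policy's input. Stacking several Atari frames lets a feed-forward network infer motion; the MuJoCo commands use `--num-stack 1`, presumably because MuJoCo observations already include joint velocities. The plain-Python `FrameStack` class below is a hypothetical helper that only illustrates the idea; it is not the repository's wrapper, and its default of 4 is an assumption.

```python
from collections import deque


class FrameStack:
    """Keep the most recent `num_stack` observations and return them as one
    flat list, oldest first. Hypothetical helper for illustration only."""

    def __init__(self, num_stack=4):
        self.num_stack = num_stack
        self.frames = deque(maxlen=num_stack)

    def reset(self, obs):
        # No history at episode start, so repeat the first observation
        # (zero-filling the history is another common choice).
        self.frames.clear()
        for _ in range(self.num_stack):
            self.frames.append(list(obs))
        return self.stacked()

    def step(self, obs):
        # Appending to a full deque drops the oldest frame automatically.
        self.frames.append(list(obs))
        return self.stacked()

    def stacked(self):
        return [x for frame in self.frames for x in frame]
```

With `num_stack=1` the stacked input is simply the current observation, which matches what `--num-stack 1` asks for.
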
#### PPO

```
python main.py --env-name "Reacher-v1" --algo ppo --use-gae --vis-interval 1 --log-interval 1 --num-stack 1 --num-steps 2048 --num-processes 1 --lr 3e-4 --entropy-coef 0 --ppo-epoch 10 --batch-size 64 --gamma 0.99 --tau 0.95 --num-frames 1000000
```
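
In this command each update collects `--num-steps 2048` transitions from `--num-processes 1` environment and then makes `--ppo-epoch 10` passes over them in shuffled minibatches. The sketch below shows the clipped surrogate objective from the PPO paper and such a minibatch schedule in plain Python. Treating `--batch-size 64` as the minibatch size and using a clip range of 0.2 are assumptions made for illustration; check main.py for the actual flag semantics.

```python
import math
import random


def ppo_clip_loss(log_probs, old_log_probs, advantages, clip_param=0.2):
    """Negative clipped surrogate objective (to minimize), averaged over a batch.

    clip_param=0.2 is an assumed value, not read from main.py.
    """
    total = 0.0
    for lp, old_lp, adv in zip(log_probs, old_log_probs, advantages):
        ratio = math.exp(lp - old_lp)  # pi_new(a|s) / pi_old(a|s)
        clipped = min(max(ratio, 1.0 - clip_param), 1.0 + clip_param)
        # Pessimistic bound: keep the smaller of the unclipped and clipped terms.
        total += min(ratio * adv, clipped * adv)
    return -total / len(advantages)


def minibatch_schedule(num_samples, batch_size, ppo_epoch, seed=0):
    """Yield shuffled index minibatches for ppo_epoch passes over the rollout.

    Assumes batch_size is the minibatch size; the last minibatch of an epoch
    is smaller if batch_size does not divide num_samples.
    """
    rng = random.Random(seed)
    indices = list(range(num_samples))
    for _ in range(ppo_epoch):
        rng.shuffle(indices)
        for start in range(0, num_samples, batch_size):
            yield indices[start:start + batch_size]
```

Under that reading, one update performs 10 × 2048 / 64 = 320 gradient steps, and, if the number of updates is derived as frames divided by transitions per update, `--num-frames 1000000` gives 1000000 // 2048 = 488 updates.
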
#### ACKTR

ACKTR requires some modifications specific to MuJoCo. For now, I want to keep this code as unified as possible, so I'm looking for better ways to integrate those changes into the codebase.
## Results

### A2C

### PPO

Coming soon.

### ACKTR

Coming soon.