wassname/pytorch-a2c-ppo-acktr

mirror of https://github.com/wassname/pytorch-a2c-ppo-acktr.git synced 2026-06-27 16:20:05 +08:00

Latest commit: Ilya Kostrikov, 475de22519, "Create a rollout storage", 2017-09-20 19:29:04 -04:00

Name                 Last commit                          Date
imgs                 Add images                           2017-09-14 07:08:50 -04:00
arguments.py         Move arguments to a separate file    2017-09-18 23:22:11 -04:00
envs.py              Initial commit                       2017-09-07 19:45:57 -04:00
kfac.py              Fix a typo                           2017-09-19 09:31:49 -04:00
LICENSE              Create LICENSE                       2017-09-07 20:01:02 -04:00
main.py              Create a rollout storage             2017-09-20 19:29:04 -04:00
model.py             Add KFAC                             2017-09-17 23:33:59 -04:00
README.md            Add KFAC                             2017-09-17 23:33:59 -04:00
storage.py           Create a rollout storage             2017-09-20 19:29:04 -04:00
vizualize_atari.py   Add visualization                    2017-09-13 18:34:13 -04:00

README.md

pytorch-a2c-ppo-acktr

This is a PyTorch implementation of:

Advantage Actor Critic (A2C), a synchronous deterministic version of A3C
Proximal Policy Optimization (PPO)
Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation (ACKTR)

See also the OpenAI posts on A2C/ACKTR and PPO for more information.

This implementation is inspired by the OpenAI baselines for A2C, ACKTR and PPO. It uses the same hyperparameters and model as the baselines, since they were well tuned for Atari games.
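As a rough illustration of how the PPO objective listed above differs from a plain policy-gradient loss, here is a minimal sketch in plain Python. It is not taken from this repository: the function name `ppo_clipped_loss` and the default `clip_param=0.2` are assumptions chosen for the example, and real code would operate on PyTorch tensors rather than lists.

```python
import math
from typing import List


def ppo_clipped_loss(
    new_log_probs: List[float],
    old_log_probs: List[float],
    advantages: List[float],
    clip_param: float = 0.2,  # assumed value, for illustration only
) -> float:
    """Negative clipped surrogate objective, averaged over samples."""
    total = 0.0
    for new_lp, old_lp, adv in zip(new_log_probs, old_log_probs, advantages):
        # Probability ratio between the updated and the old policy.
        ratio = math.exp(new_lp - old_lp)
        # Keep the ratio inside [1 - clip_param, 1 + clip_param].
        clipped = max(1.0 - clip_param, min(1.0 + clip_param, ratio))
        # Take the more pessimistic of the two surrogate terms.
        total += min(ratio * adv, clipped * adv)
    # Return a loss to minimize, hence the sign flip.
    return -total / len(advantages)
```

When the new and old policies agree, the ratio is 1 and the loss reduces to the negative mean advantage, which is the usual policy-gradient surrogate.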

Contributions

Contributions are very welcome. If you know how to make this code better, don't hesitate to send a pull request. See also the TODO list below.

TODO

Add MuJoCo and continuous actions
Improve performance of KFAC, see kfac.py for more information
Run evaluation for all games and algorithms

Usage

A2C

python main.py --env-name "PongNoFrameskip-v4"
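For readers new to A2C, the following is a minimal sketch of the discounted returns and advantages that an actor-critic update is typically built on. It is written in plain Python for a single environment and is not this repository's code: `n_step_returns`, `advantages`, and the default `gamma=0.99` are hypothetical names and values chosen for the example.

```python
from typing import List


def n_step_returns(
    rewards: List[float],
    dones: List[bool],
    bootstrap_value: float,
    gamma: float = 0.99,  # assumed discount factor
) -> List[float]:
    """Discounted returns for one rollout, computed backwards in time.

    dones[t] is True when the episode ended at step t, in which case
    nothing after step t contributes to the return at step t.
    """
    returns = [0.0] * len(rewards)
    running = bootstrap_value  # critic's estimate for the state after the rollout
    for t in reversed(range(len(rewards))):
        if dones[t]:
            running = 0.0  # do not bootstrap across an episode boundary
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def advantages(returns: List[float], values: List[float]) -> List[float]:
    """Advantage estimate: return minus the critic's value prediction."""
    return [r - v for r, v in zip(returns, values)]
```

The policy term would then weight each action's log-probability by its advantage, while the value term regresses the critic towards the returns.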

PPO

python main.py --env-name "PongNoFrameskip-v4" --algo ppo --use-gae --num-processes 8 --num-steps 256 --vis-interval 1 --log-interval 1
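The `--use-gae` flag in the command above refers to Generalized Advantage Estimation. The sketch below shows the standard GAE recursion in plain Python for a single environment. It is an illustration under stated assumptions rather than this repository's implementation: the function name and the `gamma` and `lam` defaults are hypothetical. Assuming each process contributes every step, the flags above would give 8 × 256 = 2048 transitions per update.

```python
from typing import List


def gae_advantages(
    rewards: List[float],
    values: List[float],
    dones: List[bool],
    bootstrap_value: float,
    gamma: float = 0.99,  # assumed discount factor
    lam: float = 0.95,    # assumed GAE smoothing parameter
) -> List[float]:
    """Generalized Advantage Estimation over one rollout."""
    result = [0.0] * len(rewards)
    gae = 0.0
    next_value = bootstrap_value  # value estimate for the state after the rollout
    for t in reversed(range(len(rewards))):
        mask = 0.0 if dones[t] else 1.0  # zero out terms past an episode end
        # One-step temporal-difference error.
        delta = rewards[t] + gamma * next_value * mask - values[t]
        # Exponentially weighted sum of future TD errors.
        gae = delta + gamma * lam * mask * gae
        result[t] = gae
        next_value = values[t]
    return result
```

With `lam=1` this reduces to discounted returns minus values, and with `lam=0` to the one-step TD error, which is the bias-variance trade-off the parameter controls.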

ACKTR

python main.py --env-name "PongNoFrameskip-v4" --algo acktr --num-processes 32 --num-steps 20

Results

A2C

PPO

Coming soon.

ACKTR

Coming soon.

Description

PyTorch implementation of Advantage Actor Critic (A2C), Proximal Policy Optimization (PPO) and Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation (ACKTR).
