pytorch-soft-actor-critic/README.md at a436c8aa9cf52d14b7cdc82cebb3d8726af7a2f0

wassname/pytorch-soft-actor-critic


mirror of https://github.com/wassname/pytorch-soft-actor-critic.git synced 2026-06-27 18:06:10 +08:00



Pranjal Tandon a436c8aa9c Update README.md

2018-09-15 12:26:23 +05:30



Description

A PyTorch reimplementation of Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor (Haarnoja et al., 2018).
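For reference, SAC maximizes the expected return plus the entropy of the policy, weighted by a temperature α (the maximum-entropy objective from the paper). The second line shows why scaling the reward by a constant c is equivalent to setting α = 1/c:

```latex
J(\pi) = \sum_{t=0}^{T} \mathbb{E}_{(s_t, a_t) \sim \rho_\pi}\left[ r(s_t, a_t) + \alpha\, \mathcal{H}\big(\pi(\cdot \mid s_t)\big) \right]

c\, r(s_t, a_t) + \mathcal{H}\big(\pi(\cdot \mid s_t)\big) = c \left[ r(s_t, a_t) + \tfrac{1}{c}\, \mathcal{H}\big(\pi(\cdot \mid s_t)\big) \right]
```

Multiplying the objective by a positive constant does not change the optimal policy, so training with α = 1 on rewards scaled by c behaves like α = 1/c on the raw reward. The paper takes this route, folding α into the reward scale and noting that SAC is sensitive to it. This is why `--scale_R` is set per environment.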

Contributions are welcome. If you find a mistake (quite likely) or know how to make training more stable, don't hesitate to send a pull request.

Requirements

Run

For SAC:

python main.py --env-name Humanoid-v2 --scale_R 20
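To illustrate where `--scale_R` enters training, here is a minimal plain-Python sketch of the soft Q and value targets from the paper for a single transition. It assumes the reward is multiplied by the scale before the Bellman backup. The function name `soft_targets`, its arguments, and the `(1 - done)` masking are hypothetical and are not taken from this repository's `main.py`.

```python
def soft_targets(reward, done, next_value_target, q_new_action, log_prob_new_action,
                 scale_r=20.0, gamma=0.99):
    """Return (q_target, v_target) for one transition (hypothetical helper).

    q_target: scaled reward plus the discounted target-value estimate of the next state.
    v_target: Q-value of an action freshly sampled from the policy minus its
              log-probability (entropy bonus, with the temperature folded into scale_r).
    """
    # Bootstrap from the target value network only if the episode did not end.
    q_target = scale_r * reward + gamma * (1.0 - done) * next_value_target
    # Soft state value E_{a~pi}[Q(s, a) - log pi(a|s)], estimated with a single sample.
    v_target = q_new_action - log_prob_new_action
    return q_target, v_target


q_t, v_t = soft_targets(reward=1.0, done=0.0, next_value_target=2.0,
                        q_new_action=3.0, log_prob_new_action=-0.5)
print(q_t, v_t)
```

The paper also trains two Q-functions and uses the smaller of the two estimates as `q_new_action`. That detail is omitted here for brevity.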

For SAC (Hard Update):

python main.py --env-name Humanoid-v2 --scale_R 20 --tau 1 --value_update 1000
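The hard-update variant replaces the slowly tracking target network with periodic exact copies. Below is a minimal sketch that assumes `--tau` and `--value_update` correspond to the τ and target update interval rows of the hyperparameter table. Parameters are modeled as plain lists of floats, and `update_target` and `train` are hypothetical names.

```python
def update_target(target, source, tau):
    """Polyak update in place: target <- tau * source + (1 - tau) * target.

    With tau = 1 this becomes a hard copy of the source parameters.
    """
    for i, (t, s) in enumerate(zip(target, source)):
        target[i] = tau * s + (1.0 - tau) * t


def train(num_steps, tau, target_update_interval):
    """Toy loop: the online parameter grows by 1 each step (stand-in for a gradient step)."""
    source, target = [0.0], [0.0]
    for step in range(1, num_steps + 1):
        source[0] += 1.0  # hypothetical gradient step
        if step % target_update_interval == 0:
            update_target(target, source, tau)
    return source, target


# Soft update (SAC): the target lags smoothly and is updated every step.
print(train(2000, tau=0.005, target_update_interval=1))
# Hard update: the target is an exact copy, refreshed every 1000 steps.
print(train(2500, tau=1.0, target_update_interval=1000))
```

With τ = 1 the target only changes once per interval, so after 2500 steps it still holds the copy taken at step 2000. With τ = 0.005 it moves a small amount toward the online network on every step.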

For SAC (Deterministic, Hard Update):

python main.py --env-name Humanoid-v2 --scale_R 20 --deterministic True --tau 1 --value_update 1000
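The deterministic variant drops the sampled Gaussian policy. The following single-action-dimension sketch contrasts the two, assuming a tanh-squashed Gaussian for the stochastic case with the change-of-variables correction to the log-probability described in the paper. Whether `--deterministic True` in this repository outputs exactly `tanh(mean)` is an assumption. `mean` and `log_std` stand in for policy-network outputs, and `select_action` is a hypothetical name.

```python
import math
import random


def select_action(mean, log_std, deterministic, rng=random):
    """Return (action, log_prob) for one action dimension.

    Stochastic: sample u ~ N(mean, std), squash with tanh, and correct the density:
        log pi(a|s) = log N(u; mean, std) - log(1 - tanh(u)^2).
    Deterministic: return tanh(mean); there is no density, so log_prob is None.
    """
    if deterministic:
        return math.tanh(mean), None
    std = math.exp(log_std)
    u = rng.gauss(mean, std)
    a = math.tanh(u)
    log_normal = -0.5 * ((u - mean) / std) ** 2 - log_std - 0.5 * math.log(2 * math.pi)
    log_prob = log_normal - math.log(1.0 - a * a + 1e-6)  # small epsilon avoids log(0)
    return a, log_prob


print(select_action(0.3, -1.0, deterministic=True))
print(select_action(0.3, -1.0, deterministic=False, rng=random.Random(0)))
```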

Results

My results on the Humanoid-v2 environment using SAC, SAC (hard update), and SAC (deterministic, hard update). The plot shows the average reward at every 10,000-step interval.

Hyperparameters

Use the following hyperparameters for the different environments:

Parameter	Value
Shared	-
optimizer	Adam
learning rate	3×10⁻⁴
discount (γ)	0.99
replay buffer size	1×10⁶
number of hidden layers (all networks)	2
number of hidden units per layer	256
number of samples per minibatch	256
nonlinearity	ReLU
SAC	-
target smoothing coefficient (τ)	0.005
target update interval	1
gradient steps	1
SAC (Hard Update)	-
target smoothing coefficient (τ)	1
target update interval	1000
gradient steps (except humanoids)	4
gradient steps (humanoids)	1
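The hyperparameter table can be kept as a small configuration object. This is a minimal sketch using a Python dataclass; the class name, field names, and the helper `hard_update_config` are hypothetical and do not correspond to `main.py`'s arguments. Detecting "humanoids" by checking for `"Humanoid"` in the environment name is also an assumption.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class SACConfig:
    # Shared across all variants (values from the hyperparameter table).
    optimizer: str = "Adam"
    learning_rate: float = 3e-4
    gamma: float = 0.99
    replay_buffer_size: int = 1_000_000
    hidden_layers: int = 2
    hidden_units: int = 256
    batch_size: int = 256
    nonlinearity: str = "ReLU"
    # SAC (soft update) defaults.
    tau: float = 0.005
    target_update_interval: int = 1
    gradient_steps: int = 1


def hard_update_config(env_name):
    """SAC (Hard Update): tau = 1, target copied every 1000 steps,
    4 gradient steps per environment step except on humanoids (1)."""
    is_humanoid = "Humanoid" in env_name  # heuristic, assumed
    return replace(SACConfig(), tau=1.0, target_update_interval=1000,
                   gradient_steps=1 if is_humanoid else 4)


print(hard_update_config("Hopper-v2").gradient_steps)    # 4
print(hard_update_config("Humanoid-v2").gradient_steps)  # 1
```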

Environment	Reward Scale
HalfCheetah-v2	5
Hopper-v2	5
Walker2d-v2	5
Ant-v2	5
Humanoid-v2	20
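Because the reward scale is the main per-environment setting, the run commands can be generated from this table. Here is a minimal sketch that assembles the commands listed under Run. The helper `build_command` and the variant keys are hypothetical, and only flags that appear in this README are used.

```python
REWARD_SCALE = {
    "HalfCheetah-v2": 5,
    "Hopper-v2": 5,
    "Walker2d-v2": 5,
    "Ant-v2": 5,
    "Humanoid-v2": 20,
}

# Extra flags per variant, copied from the Run section.
VARIANT_FLAGS = {
    "sac": [],
    "sac_hard": ["--tau", "1", "--value_update", "1000"],
    "sac_deterministic_hard": ["--deterministic", "True", "--tau", "1", "--value_update", "1000"],
}


def build_command(env_name, variant="sac"):
    """Return the command line for env_name and variant as a single string."""
    args = ["python", "main.py", "--env-name", env_name,
            "--scale_R", str(REWARD_SCALE[env_name])]
    return " ".join(args + VARIANT_FLAGS[variant])


print(build_command("Humanoid-v2", "sac_deterministic_hard"))
# python main.py --env-name Humanoid-v2 --scale_R 20 --deterministic True --tau 1 --value_update 1000
```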
