diff --git a/README.md b/README.md
index 327f92f..18be560 100644
--- a/README.md
+++ b/README.md
@@ -6,14 +6,12 @@ Learning with a Stochastic Actor](https://arxiv.org/pdf/1801.01290.pdf).
 
 Added another branch for [Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement
 Learning with a Stochastic Actor](https://arxiv.org/pdf/1801.01290.pdf) -> [SAC_V](https://github.com/pranz24/pytorch-soft-actor-critic/tree/SAC_V)
-
 
 ### Requirements
 ------------
-- [mujoco-py](https://github.com/openai/mujoco-py)
-- [TensorboardX](https://github.com/lanpa/tensorboardX)
-- [PyTorch](http://pytorch.org/)
-
+- [mujoco-py](https://github.com/openai/mujoco-py)
+- [TensorboardX](https://github.com/lanpa/tensorboardX)
+- [PyTorch](http://pytorch.org/)
 
 ### Default Arguments and Usage
 -------------
@@ -31,26 +29,25 @@ usage: main.py [-h] [--env-name ENV_NAME] [--policy POLICY] [--eval EVAL]
 
 (Note: There is no need for setting Temperature(`--alpha`) if `--automatic_entropy_tuning` is True.)
 
-##### For SAC :
+##### For SAC
 
 ```
 python main.py --env-name Humanoid-v2 --alpha 0.05
 ```
 
-##### For SAC (Hard Update):
+##### For SAC (Hard Update)
 
 ```
 python main.py --env-name Humanoid-v2 --alpha 0.05 --tau 1 --target_update_interval 1000
 ```
 
-##### For SAC (Deterministic, Hard Update):
+##### For SAC (Deterministic, Hard Update)
 
 ```
 python main.py --env-name Humanoid-v2 --policy Deterministic --tau 1 --target_update_interval 1000
 ```
 
 ### Arguments
-
 ```
 PyTorch Soft Actor-Critic Args
 
@@ -86,9 +83,9 @@ optional arguments:
 
 | Environment **(`--env-name`)**| Temperature **(`--alpha`)**|
 | --------------- | ------------- |
-| HalfCheetah-v2 | 0.2 |
-| Hopper-v2 | 0.2 |
-| Walker2d-v2 | 0.2 |
-| Ant-v2 | 0.2 |
-| Humanoid-v2 | 0.05 |
+| HalfCheetah-v2 | 0.2 |
+| Hopper-v2 | 0.2 |
+| Walker2d-v2 | 0.2 |
+| Ant-v2 | 0.2 |
+| Humanoid-v2 | 0.05 |
 