diff --git a/README.md b/README.md
index e9cf98e..62fb0e2 100644
--- a/README.md
+++ b/README.md
@@ -34,13 +34,10 @@ python main.py --env-name Humanoid-v2 --scale_R 20 --tau 1 --value_update 1000
 python main.py --env-name Humanoid-v2 --scale_R 20 --deterministic True --tau 1 --value_update 1000
 ```
 
-### TODO
+### Results
 ------------
 
+My results on Humanoid-v2 environment using SAC and SAC(deterministic, hard update).
+This is a plot of average rewards at every 10000 steps interval
 
-- [x] Gaussian Policy
-- [x] Reparameterization
-- [x] Gaussian Mixture Model
-- [x] Use 2 Q-functions
-- [x] Evaluate the trained Policy
-- [ ] Deterministic Policy (hard target update)
+![sacs](https://user-images.githubusercontent.com/18737539/45400165-b2f42980-b668-11e8-8e2b-2b5cdc226204.jpeg)