[rllib] Feature/soft actor critic v2 (#5328)

* Add base for Soft Actor-Critic

* Pick changes from old SAC branch

* Update sac.py

* First implementation of sac model

* Remove unnecessary SAC imports

* Prune unnecessary noise and exploration code

* Implement SAC model and use that in SAC policy

* runs but doesn't learn

* clear state

* fix batch size

* Add missing alpha grads and vars

* -200 by 2k timesteps

* doc

* lazy squash

* one file

* ignore tfp

* revert done
Kristian Hartikainen
2019-08-01 23:37:36 -07:00
committed by Eric Liang
parent 3ae54a2b20
commit 13fb9fe3db
21 changed files with 827 additions and 26 deletions
+1 -1
@@ -7,7 +7,7 @@ RUN conda install -y numpy
 RUN apt-get install -y zlib1g-dev
 # The following is needed to support TensorFlow 1.14
 RUN conda remove -y --force wrapt
-RUN pip install gym[atari] opencv-python-headless tensorflow lz4 keras pytest-timeout smart_open
+RUN pip install gym[atari] opencv-python-headless tensorflow lz4 keras pytest-timeout smart_open tensorflow_probability
 RUN pip install -U h5py # Mutes FutureWarnings
 RUN pip install --upgrade bayesian-optimization
 RUN pip install --upgrade git+git://github.com/hyperopt/hyperopt.git