[RLlib] SAC MuJoCo instability issues (tf and torch versions). (#8063)

SAC (both the torch and tf versions) crashes due to numeric instabilities in the SquashedGaussian distribution (sampling and logp computation after extreme NN outputs).
This PR fixes these instabilities. Stable MuJoCo learning (HalfCheetah) has been confirmed on both the tf and torch versions. A distribution stability test (using extreme NN outputs) has been added for SquashedGaussian; it can be used for any other type of distribution as well.
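
To illustrate the kind of instability described above, here is a minimal NumPy sketch of a tanh-squashed Gaussian whose sampling and logp stay finite under extreme network outputs. It is not the code from this PR: the function name, the clipping bounds `LOG_STD_MIN`/`LOG_STD_MAX`, and the epsilon `EPS` are hypothetical choices made for illustration only.

```python
import numpy as np

# Assumed, illustrative constants (not taken from the PR).
LOG_STD_MIN, LOG_STD_MAX = -20.0, 2.0
EPS = 1e-6


def squashed_gaussian_sample_and_logp(mean, log_std, rng):
    """Sample a = tanh(u) with u ~ N(mean, exp(log_std)); return (a, log p(a))."""
    mean = np.asarray(mean, dtype=np.float64)
    # Clip log-std so that exp() can neither overflow nor collapse to zero.
    log_std = np.clip(np.asarray(log_std, dtype=np.float64),
                      LOG_STD_MIN, LOG_STD_MAX)
    std = np.exp(log_std)

    noise = rng.standard_normal(mean.shape)
    u = mean + std * noise
    a = np.tanh(u)

    # Gaussian log-density of u, expressed through the noise,
    # since (u - mean) / std == noise by construction.
    gauss_logp = -0.5 * noise ** 2 - log_std - 0.5 * np.log(2.0 * np.pi)
    # Change-of-variables term log(1 - tanh(u)^2); EPS keeps the log
    # finite when tanh(u) rounds to exactly +/-1.
    correction = np.log(1.0 - a ** 2 + EPS)
    logp = np.sum(gauss_logp - correction, axis=-1)
    return a, logp


# Stability check in the spirit of the test described above:
# feed extreme "NN outputs" and require finite results.
rng = np.random.default_rng(0)
extreme_mean = np.array([[1e10, -1e10], [0.0, 0.0]])
extreme_log_std = np.array([[1e10, -1e10], [0.0, 0.0]])
actions, logps = squashed_gaussian_sample_and_logp(
    extreme_mean, extreme_log_std, rng)
```

Without the clipping and the epsilon, the same inputs would produce an overflowing `exp` and a `log(0)`, which is one plausible route to the crashes described in this commit message.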
Sven Mika
2020-04-19 10:20:23 +02:00
committed by GitHub
parent bdb03a0544
commit 165a86f1ab
5 changed files with 96 additions and 31 deletions