ray/python/ray
Sven Mika 165a86f1ab [RLlib] SAC MuJoCo instability issues (tf and torch versions). (#8063)
SAC (both the torch and tf versions) crashes because of numeric instabilities in the SquashedGaussian distribution (sampling and logp after extreme NN outputs).
This PR fixes them. Stable MuJoCo learning (HalfCheetah) has been confirmed on both the tf and torch versions. A distribution stability test (using extreme NN outputs) has been added for SquashedGaussian; it can be used for any other type of distribution as well.
2020-04-19 10:20:23 +02:00
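The kind of instability the commit message describes can be illustrated with a small NumPy sketch. This is not the code from the PR: the function names, the `MIN_LOG_STD`/`MAX_LOG_STD` bounds, and the choice of a softplus-based identity for the tanh correction are all assumptions made for illustration. The idea is that a tanh-squashed Gaussian needs `log(1 - tanh(u)^2)` for its log-probability, and evaluating that expression naively yields `log(0) = -inf` once the network outputs become extreme.

```python
import numpy as np

# Hypothetical clipping bounds for the log standard deviation (illustrative only).
MIN_LOG_STD = -20.0
MAX_LOG_STD = 2.0


def log_one_minus_tanh_sq(u):
    """Stable log(1 - tanh(u)^2), using 2 * (log 2 - u - softplus(-2u))."""
    u = np.asarray(u, dtype=np.float64)
    # np.logaddexp(0, x) is a stable softplus: log(1 + exp(x)).
    return 2.0 * (np.log(2.0) - u - np.logaddexp(0.0, -2.0 * u))


def squashed_gaussian_sample_and_logp(mean, log_std, low=-1.0, high=1.0, rng=None):
    """Sample from a tanh-squashed Gaussian and return (action, logp).

    The last axis is treated as the action dimension and summed over in logp.
    """
    rng = np.random.default_rng() if rng is None else rng
    mean = np.asarray(mean, dtype=np.float64)
    # Clip log_std so exp() can neither overflow nor collapse to zero.
    log_std = np.clip(np.asarray(log_std, dtype=np.float64), MIN_LOG_STD, MAX_LOG_STD)
    std = np.exp(log_std)

    # Reparameterized sample in the unsquashed space.
    noise = rng.standard_normal(mean.shape)
    u = mean + std * noise

    # Gaussian log-density of u, written in terms of the noise so that
    # no division by a tiny std is needed.
    log_prob_u = -0.5 * noise ** 2 - log_std - 0.5 * np.log(2.0 * np.pi)

    # Change of variables for y = low + (tanh(u) + 1) / 2 * (high - low):
    # log p(y) = log p(u) - log(1 - tanh(u)^2) - log((high - low) / 2).
    log_det = log_one_minus_tanh_sq(u) + np.log((high - low) / 2.0)
    logp = np.sum(log_prob_u - log_det, axis=-1)

    action = low + (np.tanh(u) + 1.0) * 0.5 * (high - low)
    return action, logp
```

A stability check in the spirit of the one mentioned in the commit message would feed extreme values (for example means of `±1e6` and log standard deviations of `±100`) into the function and assert that both the sampled actions and the log-probabilities stay finite and that the actions remain within `[low, high]`.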