ray/python/ray
Jones Wong 319c1340cb [rllib] Develop MARWIL (#3635)
* add marvil policy graph
* fix typo
* add offline optimizer and enable running marwil
* fix loss function
* add maintaining the moving average of advantage norm
* use sync replay optimizer for unifying
* remove offline optimizer and use sync replay optimizer
* format by yapf
* add imitation learning objective
* fix according to eric's review
* format by yapf
* revise
* add test data
* marwil
2019-01-16 19:00:43 -08:00