ray/python
Jones Wong 319c1340cb [rllib] Develop MARWIL (#3635)
*  add marwil policy graph

*  fix typo

*  add offline optimizer and enable running marwil

*  fix loss function

*  add maintaining the moving average of advantage norm

*  use sync replay optimizer for unifying

*  remove offline optimizer and use sync replay optimizer

*  format by yapf

*  add imitation learning objective

*  fix according to eric's review

*  format by yapf

* revise

* add test data

* marwil
2019-01-16 19:00:43 -08:00