[rllib] Behavior Cloning (#1400)

* Behavior Cloning
* episode_reward_mean -> mean_loss
* removing vestigial code
* punctuation
* unnecessary
* Behavior Cloning
* Behavior Cloning
* Update __init__.py
2026-07-04 18:14:55 +08:00 · 2018-01-23 10:50:45 -08:00
parent ee36effd8e
commit 4b0ef5eb2c
11 changed files with 390 additions and 83 deletions
@@ -7,8 +7,8 @@ class Optimizer(object):
    """RLlib optimizers encapsulate distributed RL optimization strategies.

    For example, AsyncOptimizer is used for A3C, and LocalMultiGPUOptimizer is
-    used for PPO. These optimizers are all pluggable however, it is possible
-    to mix as match as needed.
+    used for PPO. These optimizers are all pluggable, and it is possible
+    to mix and match as needed.

    In order for an algorithm to use an RLlib optimizer, it must implement
    the Evaluator interface and pass a number of Evaluators to its Optimizer
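
Note on the docstring above: the relationship it describes (an algorithm implements an Evaluator interface and hands several Evaluators to an Optimizer) can be illustrated with a toy sketch. Everything below is hypothetical and is not RLlib's actual code: the class names `ToyEvaluator` and `ToySyncOptimizer`, the method names, and the scalar regression task are assumptions chosen only to show how a pluggable optimizer might drive evaluators through a small, fixed interface.

```python
import random


class ToyEvaluator:
    """Hypothetical stand-in for an Evaluator; not RLlib's actual class."""

    def __init__(self, seed):
        self.rng = random.Random(seed)
        self.weight = 0.0

    def sample(self, n=32):
        # Draw (x, y) pairs from the target function y = 3x.
        xs = [self.rng.uniform(-1.0, 1.0) for _ in range(n)]
        return [(x, 3.0 * x) for x in xs]

    def compute_gradients(self, batch):
        # Gradient of the mean squared error of y_hat = weight * x
        # with respect to weight.
        total = sum(2.0 * (self.weight * x - y) * x for x, y in batch)
        return total / len(batch)

    def apply_gradients(self, grad, lr=0.5):
        self.weight -= lr * grad

    def get_weights(self):
        return self.weight

    def set_weights(self, weight):
        self.weight = weight


class ToySyncOptimizer:
    """Hypothetical optimizer: averages remote gradients, then broadcasts."""

    def __init__(self, local_evaluator, remote_evaluators):
        self.local = local_evaluator
        self.remotes = remote_evaluators

    def step(self):
        # Each remote evaluator samples data and computes a gradient.
        grads = [ev.compute_gradients(ev.sample()) for ev in self.remotes]
        # The local evaluator applies the averaged gradient.
        self.local.apply_gradients(sum(grads) / len(grads))
        # Updated weights are pushed back to every remote evaluator.
        for ev in self.remotes:
            ev.set_weights(self.local.get_weights())


optimizer = ToySyncOptimizer(
    ToyEvaluator(0), [ToyEvaluator(i) for i in (1, 2, 3)]
)
for _ in range(50):
    optimizer.step()
final_weight = optimizer.local.get_weights()
```

Because the optimizer only touches the evaluator through these few methods, a different strategy (for example, an asynchronous one) could in principle be swapped in without changing the evaluator, which is the sense of "mix and match" in the docstring.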