Mirror of https://github.com/wassname/ray.git, synced 2026-06-27 19:16:19 +08:00
[rllib] Sync filters at end of iteration not start; hierarchical docs (#3769)
@@ -108,8 +108,8 @@ Vectorized

RLlib will auto-vectorize Gym envs for batch evaluation if the ``num_envs_per_worker`` config is set, or you can define a custom environment class that subclasses `VectorEnv <https://github.com/ray-project/ray/blob/master/python/ray/rllib/env/vector_env.py>`__ to implement ``vector_step()`` and ``vector_reset()``.

-Multi-Agent
------------
+Multi-Agent and Hierarchical
+----------------------------

.. note::

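To illustrate what batching through ``vector_reset()`` and ``vector_step()`` looks like, here is a self-contained sketch that does not import RLlib. `CountdownEnv` and `SimpleVectorEnv` are hypothetical names; the return shape of `vector_step()` (lists of observations, rewards, dones and infos) and the `reset_at()` helper are assumptions made for illustration, so check the linked `vector_env.py` for the real signatures.

```python
from typing import Dict, List, Tuple


class CountdownEnv:
    """Toy single-agent env (hypothetical): an episode lasts `horizon` steps."""

    def __init__(self, horizon: int) -> None:
        self.horizon = horizon
        self.t = 0

    def reset(self) -> int:
        self.t = 0
        return self.horizon  # observation = steps remaining

    def step(self, action: int) -> Tuple[int, float, bool, Dict]:
        self.t += 1
        remaining = self.horizon - self.t
        reward = 1.0 if action == 1 else 0.0
        return remaining, reward, remaining <= 0, {}


class SimpleVectorEnv:
    """Hypothetical batch wrapper exposing vector_reset()/vector_step()."""

    def __init__(self, envs: List[CountdownEnv]) -> None:
        self.envs = envs
        self.num_envs = len(envs)

    def vector_reset(self) -> List[int]:
        # Reset every sub-env and return one observation per sub-env.
        return [env.reset() for env in self.envs]

    def reset_at(self, index: int) -> int:
        # Reset a single sub-env, e.g. after its episode is done.
        return self.envs[index].reset()

    def vector_step(self, actions: List[int]):
        # Step all sub-envs with one action each and return batched results.
        obs, rewards, dones, infos = [], [], [], []
        for env, action in zip(self.envs, actions):
            o, r, d, i = env.step(action)
            obs.append(o)
            rewards.append(r)
            dones.append(d)
            infos.append(i)
        return obs, rewards, dones, infos
```

For example, `SimpleVectorEnv([CountdownEnv(2), CountdownEnv(3)])` returns `[2, 3]` from `vector_reset()`, so a policy can compute actions for both sub-envs in a single forward pass.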
@@ -162,7 +162,6 @@ If all the agents will be using the same algorithm class to train, then you can
"traffic_light" # Traffic lights are always controlled by this policy
if agent_id.startswith("traffic_light_")
else random.choice(["car1", "car2"]) # Randomly choose from car policies
},
},
})

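The traffic-light mapping lambda in the hunk above can be written as a plain function, which is easier to test. The docs state that RLlib batches policy evaluations across agents internally; `group_by_policy` is a hypothetical helper that only illustrates that idea (bucketing agents by policy id so each policy evaluates one batch) and is not RLlib's implementation.

```python
import random
from collections import defaultdict
from typing import Callable, Dict


def policy_mapping_fn(agent_id: str) -> str:
    """Plain-function version of the traffic-light mapping lambda."""
    if agent_id.startswith("traffic_light_"):
        return "traffic_light"  # traffic lights always use this policy
    return random.choice(["car1", "car2"])  # cars pick a car policy at random


def group_by_policy(obs: Dict[str, object],
                    mapping_fn: Callable[[str], str]) -> Dict[str, Dict[str, object]]:
    """Bucket per-agent observations by policy id (hypothetical helper)."""
    batches: Dict[str, Dict[str, object]] = defaultdict(dict)
    for agent_id, agent_obs in obs.items():
        batches[mapping_fn(agent_id)][agent_id] = agent_obs
    return dict(batches)
```

Because `policy_mapping_fn` is random for cars, a deterministic mapping such as `lambda a: "traffic_light" if a.startswith("traffic_light_") else "car1"` is easier to use when checking the grouping.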
@@ -203,6 +202,38 @@ Here is a simple `example training script <https://github.com/ray-project/ray/bl

To scale to hundreds of agents, MultiAgentEnv batches policy evaluations across multiple agents internally. It can also be auto-vectorized by setting ``num_envs_per_worker > 1``.

+Hierarchical Environments
+~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Hierarchical training can sometimes be implemented as a special case of multi-agent RL. For example, consider a three-level hierarchy of policies, where a top-level policy issues high-level actions that are executed at finer timescales by a mid-level and low-level policy. The following timeline shows one step of the top-level policy, which corresponds to two mid-level actions and five low-level actions:
+
+.. code-block:: text
+
+    top_level ---------------------------------------------------------------> top_level --->
+    mid_level_0 -------------------------------> mid_level_0 ----------------> mid_level_1 ->
+    low_level_0 -> low_level_0 -> low_level_0 -> low_level_1 -> low_level_1 -> low_level_2 ->
+
+This can be implemented as a multi-agent environment with three types of agents. Each higher-level action creates a new lower-level agent instance with a new id (e.g., ``low_level_0``, ``low_level_1``, ``low_level_2`` in the above example). These lower-level agents pop into existence at the start of higher-level steps, and terminate when their higher-level action ends. Their experiences are aggregated by policy, so from RLlib's perspective it's just optimizing three different types of policies. The configuration might look something like this:
+
+.. code-block:: python
+
+    "multiagent": {
+        "policy_graphs": {
+            "top_level": (some_policy_graph, ...),
+            "mid_level": (some_policy_graph, ...),
+            "low_level": (some_policy_graph, ...),
+        },
+        "policy_mapping_fn":
+            lambda agent_id:
+                "low_level" if agent_id.startswith("low_level_") else
+                "mid_level" if agent_id.startswith("mid_level_") else "top_level",
+        "policies_to_train": ["top_level"],
+    },
+
+
+In this setup, the appropriate rewards for training lower-level agents must be provided by the multi-agent env implementation. The environment class is also responsible for routing between the agents, e.g., conveying `goals <https://arxiv.org/pdf/1703.01161.pdf>`__ from higher-level agents to lower-level agents as part of the lower-level agent observation.
+
+

Grouping Agents
~~~~~~~~~~~~~~~

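The hierarchy described in the hunk above can be simulated without RLlib to check that the agent-id scheme and the prefix-based mapping function fit together. `timeline()`, `agents_by_policy()` and `low_level_obs_and_reward()` are hypothetical helpers; in particular, treating the goal as a target position and rewarding negative distance to it are assumptions chosen for illustration, since the docs only say the env must supply lower-level rewards and route goals into lower-level observations.

```python
import itertools
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def policy_mapping_fn(agent_id: str) -> str:
    # Same prefix rules as the "policy_mapping_fn" lambda in the config.
    if agent_id.startswith("low_level_"):
        return "low_level"
    if agent_id.startswith("mid_level_"):
        return "mid_level"
    return "top_level"


def timeline(schedule: List[List[int]]) -> List[Dict[str, str]]:
    """List, per low-level tick, the agent id acting at each level.

    schedule[i] describes top-level step i: one entry per action of the
    mid-level agent spawned for that step, giving how many low-level
    steps that mid-level action lasts.
    """
    mid_ids = itertools.count()
    low_ids = itertools.count()
    ticks = []
    for mid_actions in schedule:
        # Each top-level action spawns a fresh mid-level agent id.
        mid_id = "mid_level_{}".format(next(mid_ids))
        for low_steps in mid_actions:
            # Each mid-level action spawns a fresh low-level agent id.
            low_id = "low_level_{}".format(next(low_ids))
            for _ in range(low_steps):
                ticks.append({"top": "top_level", "mid": mid_id, "low": low_id})
    return ticks


def agents_by_policy(ticks: List[Dict[str, str]]) -> Dict[str, Set[str]]:
    """Group every agent id that appeared by the policy that controls it."""
    groups: Dict[str, Set[str]] = defaultdict(set)
    for tick in ticks:
        for agent_id in tick.values():
            groups[policy_mapping_fn(agent_id)].add(agent_id)
    return dict(groups)


def low_level_obs_and_reward(position: float,
                             goal: float) -> Tuple[Dict[str, float], float]:
    """Env-side routing (hypothetical): the higher-level action reaches the
    low-level agent as a goal in its observation, and the env computes
    the low-level reward."""
    obs = {"position": position, "goal": goal}
    reward = -abs(goal - position)  # closer to the goal is better
    return obs, reward
```

With `timeline([[3, 2], [1]])`, the first five ticks reproduce one top-level step from the diagram (two mid-level actions, five low-level actions), and the sixth tick starts `mid_level_1` and `low_level_2`. However many ids are spawned, they collapse onto three policies.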
File diff suppressed because one or more lines are too long.

(Image diff: 91 KiB before, 97 KiB after.)
@@ -37,7 +37,7 @@ Environments
* `RLlib Environments Overview <rllib-env.html>`__
* `OpenAI Gym <rllib-env.html#openai-gym>`__
* `Vectorized <rllib-env.html#vectorized>`__
-* `Multi-Agent <rllib-env.html#multi-agent>`__
+* `Multi-Agent and Hierarchical <rllib-env.html#multi-agent-and-hierarchical>`__
* `Interfacing with External Agents <rllib-env.html#interfacing-with-external-agents>`__
* `Batch Asynchronous <rllib-env.html#batch-asynchronous>`__

@@ -271,6 +271,8 @@ class Agent(Trainable):
ev.set_global_vars.remote(self.global_vars)
logger.debug("updated global vars: {}".format(self.global_vars))

+result = Trainable.train(self)

if (self.config.get("observation_filter", "NoFilter") != "NoFilter"
and hasattr(self, "local_evaluator")):
FilterManager.synchronize(
@@ -280,12 +282,12 @@ class Agent(Trainable):
logger.debug("synchronized filters: {}".format(
self.local_evaluator.filters))

-result = Trainable.train(self)
if self.config["callbacks"].get("on_train_result"):
self.config["callbacks"]["on_train_result"]({
"agent": self,
"result": result,
})

return result

@override(Trainable)
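The reordering in this commit (run the training iteration, then synchronize observation filters) can be sketched with toy stand-ins. `MeanFilter`, `Worker`, `synchronize` and `train` below are hypothetical and only mimic the shape of the `FilterManager.synchronize` call in the diff; they are not RLlib's API. Workers gather new filter statistics while sampling, and those are merged into the local filter and broadcast back after the iteration.

```python
from typing import Dict, List


class MeanFilter:
    """Toy observation filter that tracks a running mean."""

    def __init__(self) -> None:
        self.count = 0
        self.total = 0.0

    def __call__(self, x: float) -> float:
        # Update the statistics, then return the mean-centered observation.
        self.count += 1
        self.total += x
        return x - self.mean

    @property
    def mean(self) -> float:
        return self.total / self.count if self.count else 0.0


class Worker:
    """Stand-in for a remote evaluator with its own filter."""

    def __init__(self) -> None:
        self.filter = MeanFilter()  # full statistics used for normalization
        self.buffer = MeanFilter()  # statistics gathered since the last sync

    def sample(self, observations: List[float]) -> None:
        for x in observations:
            self.filter(x)
            self.buffer(x)


def synchronize(local: MeanFilter, workers: List[Worker]) -> None:
    # Merge each worker's new statistics into the local filter ...
    for w in workers:
        local.count += w.buffer.count
        local.total += w.buffer.total
        w.buffer = MeanFilter()
    # ... then broadcast the merged statistics back to every worker.
    for w in workers:
        w.filter.count, w.filter.total = local.count, local.total


def train(local: MeanFilter, workers: List[Worker],
          batches: List[List[float]]) -> Dict[str, int]:
    # 1. Run the iteration first (stands in for Trainable.train(self)).
    result = {"num_samples": 0}
    for worker, batch in zip(workers, batches):
        worker.sample(batch)
        result["num_samples"] += len(batch)
    # 2. Synchronize filters at the end of the iteration, not the start.
    synchronize(local, workers)
    return result
```

In this sketch, because synchronization happens last, the local filter already includes the iteration's samples when `train()` returns; with the old order, those samples would only be merged at the start of the next call.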