[rllib] Add a doc section on computing actions (#6326)

* options doc * add note * hint shr * doc update
2026-07-02 21:56:20 +08:00 · 2019-12-03 00:10:50 -08:00
parent 670cb6374e
commit bc5e259264
2 changed files with 74 additions and 0 deletions
@@ -18,6 +18,8 @@ Training APIs

   -  `Custom Training Workflows <rllib-training.html#custom-training-workflows>`__

+   -  `Computing Actions <rllib-training.html#computing-actions>`__
+
   -  `Accessing Policy State <rllib-training.html#accessing-policy-state>`__

   -  `Accessing Model State <rllib-training.html#accessing-model-state>`__
@@ -184,6 +184,78 @@ In the `basic training example <https://github.com/ray-project/ray/blob/master/r

 For even finer-grained control over training, you can use RLlib's lower-level `building blocks <rllib-concepts.html>`__ directly to implement `fully customized training workflows <https://github.com/ray-project/ray/blob/master/rllib/examples/rollout_worker_custom_workflow.py>`__.

+Computing Actions
+~~~~~~~~~~~~~~~~~
+
+The simplest way to programmatically compute actions from a trained agent is to use ``trainer.compute_action()``.
+This method preprocesses and filters the observation before passing it to the agent policy.
+For more advanced usage, you can access the ``workers`` and policies held by the trainer
+directly as ``compute_action()`` does:
+
+.. code-block:: python
+
+  class Trainer(Trainable):
+
+    @PublicAPI
+    def compute_action(self,
+                       observation,
+                       state=None,
+                       prev_action=None,
+                       prev_reward=None,
+                       info=None,
+                       policy_id=DEFAULT_POLICY_ID,
+                       full_fetch=False):
+        """Computes an action for the specified policy.
+
+        Note that you can also access the policy object through
+        self.get_policy(policy_id) and call compute_actions() on it directly.
+
+        Arguments:
+            observation (obj): observation from the environment.
+            state (list): RNN hidden state, if any. If state is not None,
+                          then all of compute_single_action(...) is returned
+                          (computed action, rnn state, logits dictionary).
+                          Otherwise compute_single_action(...)[0] is
+                          returned (computed action).
+            prev_action (obj): previous action value, if any
+            prev_reward (int): previous reward, if any
+            info (dict): info object, if any
+            policy_id (str): policy to query (only applies to multi-agent).
+            full_fetch (bool): whether to return extra action fetch results.
+                This is always set to true if RNN state is specified.
+
+        Returns:
+            Just the computed action if full_fetch=False, or the full output
+            of policy.compute_actions() otherwise.
+        """
+
+        if state is None:
+            state = []
+        preprocessed = self.workers.local_worker().preprocessors[
+            policy_id].transform(observation)
+        filtered_obs = self.workers.local_worker().filters[policy_id](
+            preprocessed, update=False)
+        if state:
+            return self.get_policy(policy_id).compute_single_action(
+                filtered_obs,
+                state,
+                prev_action,
+                prev_reward,
+                info,
+                clip_actions=self.config["clip_actions"])
+        res = self.get_policy(policy_id).compute_single_action(
+            filtered_obs,
+            state,
+            prev_action,
+            prev_reward,
+            info,
+            clip_actions=self.config["clip_actions"])
+        if full_fetch:
+            return res
+        else:
+            return res[0]  # backwards compatibility
+
+
 Accessing Policy State
 ~~~~~~~~~~~~~~~~~~~~~~
 It is common to need to access a trainer's internal state, e.g., to set or get internal weights. In RLlib trainer state is replicated across multiple *rollout workers* (Ray actors) in the cluster. However, you can easily get and update this state between calls to ``train()`` via ``trainer.workers.foreach_worker()`` or ``trainer.workers.foreach_worker_with_index()``. These functions take a lambda function that is applied with the worker as an arg. You can also return values from these functions and those will be returned as a list.