mirror of https://github.com/wassname/ray.git
synced 2026-07-05 07:39:02 +08:00
[rllib] Update RLlib docs and README (#1288)
Updates the rllib docs and README.
+46 -20
@@ -1,31 +1,57 @@
-RLLib: A Scalable Reinforcement Learning Library
+RLlib: A Scalable Reinforcement Learning Library
 ================================================
 
-Getting Started
----------------
+This README provides a brief technical overview of RLlib. See also the `user documentation <http://ray.readthedocs.io/en/latest/rllib.html>`__.
 
-You can run training with
+RLlib currently provides the following algorithms:
 
-::
+- `Proximal Policy Optimization <https://arxiv.org/abs/1707.06347>`__ which
+  is a proximal variant of `TRPO <https://arxiv.org/abs/1502.05477>`__.
 
-    python train.py --env CartPole-v0 --run PPO
-
-The available algorithms are:
-
-- ``PPO`` is a proximal variant of
-  `TRPO <https://arxiv.org/abs/1502.05477>`__.
-
-- ``ES`` is decribed in `this
+- Evolution Strategies which is decribed in `this
   paper <https://arxiv.org/abs/1703.03864>`__. Our implementation
   borrows code from
   `here <https://github.com/openai/evolution-strategies-starter>`__.
 
-- ``DQN`` is an implementation of `Deep Q
-  Networks <https://www.cs.toronto.edu/~vmnih/docs/dqn.pdf>`__ based on
-  `OpenAI baselines <https://github.com/openai/baselines>`__.
+- `The Asynchronous Advantage Actor-Critic <https://arxiv.org/abs/1602.01783>`__
+  based on `the OpenAI starter agent <https://github.com/openai/universe-starter-agent>`__.
 
-- ``A3C`` is an implementation of
-  `A3C <https://arxiv.org/abs/1602.01783>`__ based on `the OpenAI
-  starter agent <https://github.com/openai/universe-starter-agent>`__.
+- `Deep Q Network (DQN) <https://arxiv.org/abs/1312.5602>`__.
 
-User documentation can be `found here <http://ray.readthedocs.io/en/latest/rllib.html>`__.
+Proximal Policy Optimization scales to hundreds of cores and several GPUs, Evolution Strategies to clusters with thousands of cores and the Asynchronous Advantage Actor-Critic scales to dozens of cores on a single node.
+
+These algorithms can be run on any OpenAI Gym MDP, including custom ones written and registered by the user.
+
+For more detailed usage information, see the `user documentation <http://ray.readthedocs.io/en/latest/rllib.html>`__.
+
+Training API
+------------
+
+All RLlib algorithms implement a common training API (agent.py), which enables multiple algorithms to be easily evaluated:
+
+::
+
+    # Train a model on a single environment
+    python train.py --env CartPole-v0 --run PPO
+
+    # Integration with ray.tune for hyperparam evaluation
+    python train.py -f tuned_examples/cartpole-grid-search-example.yaml
+
+Evaluator and Optimizer abstractions
+------------------------------------
+
+RLlib's gradient-based algorithms are composed using two abstractions: Evaluators (evaluator.py) and Optimizers (optimizers/optimizer.py). Optimizers encapsulate a particular distributed optimization strategy for RL. Evaluators encapsulate the model graph, and once implemented, any Optimizer may be "plugged in" to any algorithm that implements the Evaluator interface.
+
+This pluggability enables optimization strategies to be re-used and improved across different algorithms and deep learning frameworks (RLlib's optimizers work with both TensorFlow and PyTorch, though currently only A3C has a PyTorch graph implementation).
+
+These are the currently available optimizers:
+
+- ``AsyncOptimizer`` is an asynchronous RL optimizer, i.e. like A3C. It asynchronously pulls and applies gradients from evaluators, sending updated weights back as needed.
+- ``LocalSyncOptimizer`` is a simple synchronous RL optimizer. It pulls samples from remote evaluators, concatenates them, and then updates a local model. The updated model weights are then broadcast to all remote evalutaors.
+- ``LocalMultiGPUOptimizer`` (currently available for PPO) This optimizer performs SGD over a number of local GPUs, and pins experience data in GPU memory to amortize the copy overhead for multiple SGD passes.
+- ``AllReduceOptimizer`` (planned) This optimizer would use the Allreduce primitive to scalably synchronize weights among a number of remote GPU workers.
+
+Common utilities
+----------------
+
+RLlib defines common action distributions, preprocessors, and neural network models, found in ``models/catalog.py``, which are shared by all algorithms. More information on these classes can be found in the `developer API docs <http://ray.readthedocs.io/en/latest/rllib.html#the-developer-api>`__.
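The README text in this hunk splits gradient-based algorithms into Evaluators, which own the model, and Optimizers, which own the distributed update strategy. That split lets any Optimizer drive any Evaluator. The sketch below illustrates the split without a deep learning framework. It was written for this page and is not taken from RLlib: `ToyEvaluator`, `ToyLocalSyncOptimizer` and every method name are hypothetical. The "model" is a single weight `w` fit to `y = 3x` by gradient descent rather than a policy network. `step()` follows the ``LocalSyncOptimizer`` description: gather samples from the remote evaluators, concatenate them, update the local model, then broadcast its weights.

```python
import random
from typing import List, Tuple

# Hypothetical experience format: (x, y) pairs standing in for rollouts.
Batch = List[Tuple[float, float]]


class ToyEvaluator:
    """Hypothetical Evaluator: owns a one-weight model y = w * x."""

    def __init__(self, seed: int, true_w: float = 3.0) -> None:
        self.w = 0.0
        self._rng = random.Random(seed)
        self._true_w = true_w

    def sample(self, n: int = 32) -> Batch:
        # Stand-in for rollouts: noise-free points from y = true_w * x.
        xs = [self._rng.uniform(-1.0, 1.0) for _ in range(n)]
        return [(x, self._true_w * x) for x in xs]

    def compute_gradients(self, batch: Batch) -> float:
        # d/dw of the mean of 0.5 * (w*x - y)^2 over the batch.
        return sum((self.w * x - y) * x for x, y in batch) / len(batch)

    def apply_gradients(self, grad: float, lr: float = 0.5) -> None:
        self.w -= lr * grad

    def get_weights(self) -> float:
        return self.w

    def set_weights(self, w: float) -> None:
        self.w = w


class ToyLocalSyncOptimizer:
    """Hypothetical synchronous optimizer modeled on the LocalSyncOptimizer text."""

    def __init__(self, local: ToyEvaluator, remotes: List[ToyEvaluator]) -> None:
        self.local = local
        self.remotes = remotes

    def step(self) -> None:
        # 1. Pull samples from every remote evaluator and concatenate them.
        batch: Batch = []
        for ev in self.remotes:
            batch.extend(ev.sample())
        # 2. Update the local model on the combined batch.
        self.local.apply_gradients(self.local.compute_gradients(batch))
        # 3. Broadcast the updated weights to all remote evaluators.
        w = self.local.get_weights()
        for ev in self.remotes:
            ev.set_weights(w)


if __name__ == "__main__":
    opt = ToyLocalSyncOptimizer(ToyEvaluator(0), [ToyEvaluator(i) for i in range(1, 4)])
    for _ in range(100):
        opt.step()
    print(round(opt.local.get_weights(), 3))
```

`ToyLocalSyncOptimizer` only calls `sample`, `compute_gradients`, `apply_gradients`, `get_weights` and `set_weights`. Any class that provides those five methods could therefore be swapped in; that interface boundary is the "plugged in" property the README describes. In this toy, sampling ignores the current weights, which real RL rollouts would not.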
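An ``AsyncOptimizer``-style loop works differently from a synchronous one. It applies each gradient as soon as any evaluator produces one and refreshes only that evaluator's weights. The following is a hypothetical sketch of that behaviour, not RLlib code. `ToyGradEvaluator` and `async_optimize` are illustrative names, and Python threads stand in for remote Ray workers. It uses a one-weight least-squares toy (fit `w` to `y = 3x`) so that it runs on its own.

```python
import random
from concurrent.futures import FIRST_COMPLETED, Future, ThreadPoolExecutor, wait
from typing import Dict, List


class ToyGradEvaluator:
    """Hypothetical evaluator for y = w * x that computes its own gradients."""

    def __init__(self, seed: int, true_w: float = 3.0) -> None:
        self.w = 0.0
        self._rng = random.Random(seed)
        self._true_w = true_w

    def compute_gradients(self, n: int = 32) -> float:
        # Sample locally, then return d/dw of mean 0.5 * (w*x - y)^2 using
        # whatever (possibly stale) weights this evaluator currently holds.
        xs = [self._rng.uniform(-1.0, 1.0) for _ in range(n)]
        return sum((self.w - self._true_w) * x * x for x in xs) / n

    def set_weights(self, w: float) -> None:
        self.w = w


def async_optimize(remotes: List[ToyGradEvaluator],
                   num_updates: int = 200, lr: float = 0.5) -> float:
    """Hypothetical A3C-style loop: apply each gradient as soon as it arrives."""
    w = 0.0  # weights of the central model
    with ThreadPoolExecutor(max_workers=len(remotes)) as pool:
        pending: Dict[Future, ToyGradEvaluator] = {
            pool.submit(ev.compute_gradients): ev for ev in remotes
        }
        for _ in range(num_updates):
            # Block until at least one evaluator has a gradient ready.
            done, _ = wait(pending, return_when=FIRST_COMPLETED)
            fut = next(iter(done))
            ev = pending.pop(fut)
            w -= lr * fut.result()  # apply the possibly stale gradient
            ev.set_weights(w)       # send fresh weights to that evaluator only
            pending[pool.submit(ev.compute_gradients)] = ev
        # Leaving the with-block waits for in-flight tasks; their results are dropped.
    return w


if __name__ == "__main__":
    print(round(async_optimize([ToyGradEvaluator(i) for i in range(4)]), 2))
```

The printed value approaches 3.0, but the exact digits can vary from run to run with thread scheduling. Each gradient may be computed from weights that are a few updates old. Asynchronous optimizers such as A3C accept that staleness so that they never wait on the slowest evaluator.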
@@ -35,7 +35,8 @@ class Agent(Trainable):
         env_creator (func): Function that creates a new training env.
         config (obj): Algorithm-specific configuration data.
         logdir (str): Directory in which training outputs should be placed.
-        registry (obj): Object registry.
+        registry (obj): Tune object registry, for registering user-defined
+            classes and objects by name.
     """
 
     _allow_unknown_configs = False
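This hunk edits the base class behind the README's "common training API". The idea is that every algorithm exposes the same `train()` call, so a single loop can drive and compare any of them. The sketch below shows only that pattern. `ToyAgent`, `_train`, the result key `episode_reward_mean` and the two toy subclasses are illustrative assumptions and should not be read as the actual signature of RLlib's `Agent`.

```python
from abc import ABC, abstractmethod
from typing import Any, Callable, Dict


class ToyAgent(ABC):
    """Hypothetical common base class (not RLlib's Agent).

    Subclasses implement _train(); callers only ever use train().
    """

    def __init__(self, env_creator: Callable[[], Any], config: Dict[str, Any]) -> None:
        self.env_creator = env_creator
        self.config = config
        self.iteration = 0

    def train(self) -> Dict[str, Any]:
        # Shared bookkeeping lives in the base class.
        self.iteration += 1
        result = self._train()
        result["training_iteration"] = self.iteration
        return result

    @abstractmethod
    def _train(self) -> Dict[str, Any]:
        """One algorithm-specific training step; returns a result dict."""


class ConstantAgent(ToyAgent):
    # Toy algorithm: reports a fixed reward taken from its config.
    def _train(self) -> Dict[str, Any]:
        return {"episode_reward_mean": self.config.get("reward", 0.0)}


class ImprovingAgent(ToyAgent):
    # Toy algorithm: reward grows by one per iteration.
    def _train(self) -> Dict[str, Any]:
        return {"episode_reward_mean": float(self.iteration)}


def evaluate(agents: Dict[str, ToyAgent], iters: int) -> Dict[str, float]:
    """Train each agent for `iters` (>= 1) steps; return its final reward."""
    if iters < 1:
        raise ValueError("iters must be >= 1")
    final = {}
    for name, agent in agents.items():
        for _ in range(iters):
            result = agent.train()  # the same call works for every algorithm
        final[name] = result["episode_reward_mean"]
    return final


def make_env() -> None:
    # Hypothetical env factory; the toy agents never call it.
    return None


if __name__ == "__main__":
    print(evaluate({"const": ConstantAgent(make_env, {"reward": 1.5}),
                    "improving": ImprovingAgent(make_env, {})}, iters=3))
    # prints {'const': 1.5, 'improving': 3.0}
```

Iteration bookkeeping sits in the base class and the algorithm-specific step sits in `_train()`. This is one way to let an outer driver, such as a hyperparameter search, treat every algorithm identically.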
@@ -118,7 +118,7 @@ class TrialRunner(object):
                     self._committed_resources.gpu,
                     self._avail_resources.gpu))
         for local_dir in sorted(set([t.local_dir for t in self._trials])):
-            messages.append("Tensorboard logdir: {}".format(local_dir))
+            messages.append("Result logdir: {}".format(local_dir))
             for t in self._trials:
                 if t.local_dir == local_dir:
                     messages.append(