[rllib] Update RLlib docs and README (#1288)

Updates the rllib docs and README.
Eric Liang
2017-12-06 18:17:51 -08:00
committed by GitHub
parent 2d543b6e19
commit 35f7398666
7 changed files with 151 additions and 94 deletions
+46 -20
@@ -1,31 +1,57 @@
RLLib: A Scalable Reinforcement Learning Library
RLlib: A Scalable Reinforcement Learning Library
================================================
Getting Started
---------------
This README provides a brief technical overview of RLlib. See also the `user documentation <http://ray.readthedocs.io/en/latest/rllib.html>`__.
You can run training with
RLlib currently provides the following algorithms:
::
- `Proximal Policy Optimization <https://arxiv.org/abs/1707.06347>`__, which
is a proximal variant of `TRPO <https://arxiv.org/abs/1502.05477>`__.
python train.py --env CartPole-v0 --run PPO
The available algorithms are:
- ``PPO`` is a proximal variant of
`TRPO <https://arxiv.org/abs/1502.05477>`__.
- ``ES`` is described in `this
- Evolution Strategies, which is described in `this
paper <https://arxiv.org/abs/1703.03864>`__. Our implementation
borrows code from
`here <https://github.com/openai/evolution-strategies-starter>`__.
- ``DQN`` is an implementation of `Deep Q
Networks <https://www.cs.toronto.edu/~vmnih/docs/dqn.pdf>`__ based on
`OpenAI baselines <https://github.com/openai/baselines>`__.
- `The Asynchronous Advantage Actor-Critic <https://arxiv.org/abs/1602.01783>`__
based on `the OpenAI starter agent <https://github.com/openai/universe-starter-agent>`__.
- ``A3C`` is an implementation of
`A3C <https://arxiv.org/abs/1602.01783>`__ based on `the OpenAI
starter agent <https://github.com/openai/universe-starter-agent>`__.
- `Deep Q Network (DQN) <https://arxiv.org/abs/1312.5602>`__.
User documentation can be `found here <http://ray.readthedocs.io/en/latest/rllib.html>`__.
Proximal Policy Optimization scales to hundreds of cores and several GPUs, Evolution Strategies to clusters with thousands of cores, and the Asynchronous Advantage Actor-Critic to dozens of cores on a single node.
These algorithms can be run on any OpenAI Gym MDP, including custom ones written and registered by the user.
For more detailed usage information, see the `user documentation <http://ray.readthedocs.io/en/latest/rllib.html>`__.
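The text above says the algorithms run on any OpenAI Gym MDP, including user-written ones. As a minimal sketch of what such an environment involves, the hypothetical `CorridorEnv` below follows the classic Gym interface: `reset()` returns an observation and `step(action)` returns `(obs, reward, done, info)`. It deliberately does not import or subclass `gym.Env`, so it runs standalone; the class name, reward values, and dynamics are assumptions for illustration, and registering it with RLlib is not shown.

```python
import random


class CorridorEnv:
    """Hypothetical 1-D corridor: start at 0, reach position `length` to finish.

    Mirrors the classic Gym reset/step API without depending on gym.
    """

    def __init__(self, length=5):
        self.length = length
        self.pos = 0

    def reset(self):
        # Every episode starts at the left end of the corridor.
        self.pos = 0
        return self.pos

    def step(self, action):
        # Action 0 moves left, action 1 moves right; the walls clamp the position.
        if action not in (0, 1):
            raise ValueError("action must be 0 or 1")
        delta = 1 if action == 1 else -1
        self.pos = min(max(self.pos + delta, 0), self.length)
        done = self.pos == self.length
        reward = 1.0 if done else -0.1  # small step penalty, bonus at the goal
        return self.pos, reward, done, {}


if __name__ == "__main__":
    # Roll out one episode with a uniformly random policy.
    env = CorridorEnv(length=3)
    obs, done, total = env.reset(), False, 0.0
    while not done:
        obs, reward, done, _ = env.step(random.choice([0, 1]))
        total += reward
    print("episode return:", round(total, 2))
```

Keeping the environment free of RL-library imports is what lets any algorithm that speaks the same reset/step protocol train on it.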
Training API
------------
All RLlib algorithms implement a common training API (agent.py), which enables multiple algorithms to be easily evaluated:
::
# Train a model on a single environment
python train.py --env CartPole-v0 --run PPO
# Integration with ray.tune for hyperparam evaluation
python train.py -f tuned_examples/cartpole-grid-search-example.yaml
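To illustrate why a common training API makes algorithms easy to compare, the sketch below assumes a hypothetical `Agent` base class whose only public method is `train()`, returning a result dict with the same keys for every algorithm. A driver such as the hypothetical `evaluate()` can then run any subclass without knowing its internals. This is not the code in `agent.py`; the two toy agents maximize a simple quadratic stand-in for expected return rather than running RL.

```python
import abc
import random


class Agent(abc.ABC):
    """Hypothetical common interface: every algorithm exposes train()."""

    def __init__(self, config=None):
        self.config = dict(config or {})
        self.iteration = 0

    def train(self):
        # One training iteration; the result keys are identical across
        # algorithms, so drivers can compare them uniformly.
        self.iteration += 1
        score = self._train_step()
        return {"training_iteration": self.iteration, "score": score}

    @abc.abstractmethod
    def _train_step(self):
        """Algorithm-specific update; returns the current objective value."""


def objective(x):
    # Toy stand-in for expected return, maximized at x = 3.
    return -(x - 3.0) ** 2


class GradientAgent(Agent):
    def __init__(self, config=None):
        super().__init__(config)
        self.x = 0.0

    def _train_step(self):
        lr = self.config.get("lr", 0.1)
        self.x += lr * (-2.0 * (self.x - 3.0))  # analytic gradient ascent step
        return objective(self.x)


class RandomSearchAgent(Agent):
    def __init__(self, config=None):
        super().__init__(config)
        self.x = 0.0
        self.rng = random.Random(self.config.get("seed", 0))

    def _train_step(self):
        candidate = self.x + self.rng.gauss(0.0, 0.5)
        if objective(candidate) > objective(self.x):
            self.x = candidate  # keep the perturbation only if it improves
        return objective(self.x)


def evaluate(agent_classes, iterations=20):
    # The driver relies only on train(), so any Agent subclass plugs in.
    results = {}
    for cls in agent_classes:
        agent = cls()
        result = None
        for _ in range(iterations):
            result = agent.train()
        results[cls.__name__] = result
    return results


if __name__ == "__main__":
    for name, result in evaluate([GradientAgent, RandomSearchAgent]).items():
        print(name, result)
```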
Evaluator and Optimizer abstractions
------------------------------------
RLlib's gradient-based algorithms are composed using two abstractions: Evaluators (evaluator.py) and Optimizers (optimizers/optimizer.py). Optimizers encapsulate a particular distributed optimization strategy for RL. Evaluators encapsulate the model graph, and once implemented, any Optimizer may be "plugged in" to any algorithm that implements the Evaluator interface.
This pluggability enables optimization strategies to be re-used and improved across different algorithms and deep learning frameworks (RLlib's optimizers work with both TensorFlow and PyTorch, though currently only A3C has a PyTorch graph implementation).
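The following sketch shows the pluggability idea under stated assumptions: a hypothetical `Evaluator` interface (the method names `sample`, `compute_gradients`, `apply_gradients`, `get_weights`, and `set_weights` are chosen for illustration and are not claimed to match `evaluator.py`), a toy `MeanEstimator` evaluator, and a hypothetical `LocalOptimizer` that touches the model only through that interface. Any other class implementing the interface could be handed to the same optimizer unchanged, regardless of which deep learning framework sits behind it.

```python
import abc


class Evaluator(abc.ABC):
    """Hypothetical evaluator interface wrapping a model and its experience source."""

    @abc.abstractmethod
    def sample(self):
        """Collect a batch of experience."""

    @abc.abstractmethod
    def compute_gradients(self, samples):
        """Return the gradient of the loss on `samples`."""

    @abc.abstractmethod
    def apply_gradients(self, grads):
        """Update the model with `grads`."""

    @abc.abstractmethod
    def get_weights(self):
        """Return the current model weights."""

    @abc.abstractmethod
    def set_weights(self, weights):
        """Overwrite the model weights."""


class MeanEstimator(Evaluator):
    """Toy model: a scalar w fitted by minimizing 0.5 * (w - y)**2 over the data."""

    def __init__(self, data, lr=0.5):
        self.data, self.lr, self.w = list(data), lr, 0.0

    def sample(self):
        return list(self.data)

    def compute_gradients(self, samples):
        return sum(self.w - y for y in samples) / len(samples)

    def apply_gradients(self, grads):
        self.w -= self.lr * grads

    def get_weights(self):
        return self.w

    def set_weights(self, weights):
        self.w = weights


class LocalOptimizer:
    """Simplest strategy: sample, compute gradients, apply them locally.

    It uses only the Evaluator interface, so it works with any implementation.
    """

    def __init__(self, evaluator):
        self.evaluator = evaluator

    def step(self):
        samples = self.evaluator.sample()
        self.evaluator.apply_gradients(self.evaluator.compute_gradients(samples))


if __name__ == "__main__":
    ev = MeanEstimator([1.0, 2.0, 3.0])
    opt = LocalOptimizer(ev)
    for _ in range(30):
        opt.step()
    print(round(ev.get_weights(), 3))  # approaches the data mean
```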
These are the currently available optimizers:
- ``AsyncOptimizer`` is an asynchronous RL optimizer in the style of A3C. It asynchronously pulls and applies gradients from evaluators, sending updated weights back as needed.
- ``LocalSyncOptimizer`` is a simple synchronous RL optimizer. It pulls samples from remote evaluators, concatenates them, and then updates a local model. The updated model weights are then broadcast to all remote evaluators.
- ``LocalMultiGPUOptimizer`` (currently available for PPO) performs SGD over multiple local GPUs and pins experience data in GPU memory to amortize the copy overhead across multiple SGD passes.
- ``AllReduceOptimizer`` (planned) would use the Allreduce primitive to scalably synchronize weights among a number of remote GPU workers.
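A rough sketch of how the synchronous and asynchronous strategies differ, assuming a hypothetical toy evaluator that fits a scalar to its own data shard. `LocalSyncOptimizer.step()` follows the description in the list (gather and concatenate samples, update the local model, broadcast weights); `AsyncOptimizer.step()` applies each evaluator's gradient as soon as it is processed and sends weights back only to that evaluator. Here the "remote" evaluators are ordinary in-process objects and the asynchronous arrival order is simulated sequentially, so these class bodies are illustrations of the strategies, not RLlib's source.

```python
class ToyEvaluator:
    """Hypothetical evaluator: fits scalar w to its shard via 0.5 * (w - y)**2."""

    def __init__(self, shard, lr=0.5):
        self.shard, self.lr, self.w = list(shard), lr, 0.0

    def sample(self):
        return list(self.shard)

    def compute_gradients(self, samples):
        # Gradient is taken at this evaluator's own (possibly stale) weights.
        return sum(self.w - y for y in samples) / len(samples)

    def apply_gradients(self, grads):
        self.w -= self.lr * grads

    def get_weights(self):
        return self.w

    def set_weights(self, weights):
        self.w = weights


class LocalSyncOptimizer:
    """Sync sketch: concatenate all samples, update locally, broadcast weights."""

    def __init__(self, local, remotes):
        self.local, self.remotes = local, remotes

    def step(self):
        batch = []
        for ev in self.remotes:
            batch.extend(ev.sample())  # concatenate every evaluator's batch
        self.local.apply_gradients(self.local.compute_gradients(batch))
        weights = self.local.get_weights()
        for ev in self.remotes:
            ev.set_weights(weights)  # broadcast the updated weights


class AsyncOptimizer:
    """Async sketch: apply each evaluator's gradient on arrival, reply to it only."""

    def __init__(self, local, remotes):
        self.local, self.remotes = local, remotes

    def step(self):
        for ev in self.remotes:  # sequential stand-in for out-of-order arrival
            grads = ev.compute_gradients(ev.sample())
            self.local.apply_gradients(grads)
            ev.set_weights(self.local.get_weights())
```

Because each asynchronous update uses only one evaluator's samples, and possibly stale weights, the async local model in this toy setting oscillates around the data mean instead of settling on it, whereas the sync version converges to the mean of the concatenated batch.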
Common utilities
----------------
RLlib defines common action distributions, preprocessors, and neural network models, found in ``models/catalog.py``, which are shared by all algorithms. More information on these classes can be found in the `developer API docs <http://ray.readthedocs.io/en/latest/rllib.html#the-developer-api>`__.
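To make "action distribution" concrete, the sketch below shows a categorical distribution over discrete actions parameterized by logits, with the `sample`, `logp`, and `entropy` operations that policy-gradient methods such as PPO and A3C typically need. It is a plain-Python illustration with a hypothetical `Categorical` class, not the implementation found in ``models/catalog.py``.

```python
import math
import random


class Categorical:
    """Hypothetical categorical action distribution parameterized by logits."""

    def __init__(self, logits):
        # Subtract the max logit before exponentiating for numerical stability.
        m = max(logits)
        exps = [math.exp(x - m) for x in logits]
        total = sum(exps)
        self.probs = [e / total for e in exps]

    def logp(self, action):
        # Log-probability of a chosen action, as used in policy-gradient losses.
        return math.log(self.probs[action])

    def entropy(self):
        # Shannon entropy in nats; often added to the loss as an exploration bonus.
        return -sum(p * math.log(p) for p in self.probs if p > 0)

    def sample(self, rng=random):
        # Inverse-CDF sampling over the action indices.
        u, cumulative = rng.random(), 0.0
        for action, p in enumerate(self.probs):
            cumulative += p
            if u < cumulative:
                return action
        return len(self.probs) - 1  # guard against floating-point round-off
```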
+2 -1
@@ -35,7 +35,8 @@ class Agent(Trainable):
env_creator (func): Function that creates a new training env.
config (obj): Algorithm-specific configuration data.
logdir (str): Directory in which training outputs should be placed.
registry (obj): Object registry.
registry (obj): Tune object registry, for registering user-defined
classes and objects by name.
"""
_allow_unknown_configs = False
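The updated docstring describes `registry` as Tune's object registry for registering user-defined classes and objects by name. The sketch below illustrates that idea with a mapping keyed by `(category, name)`; the `Registry` class, its methods, and the category string `"env_creator"` are hypothetical and do not reproduce Tune's actual API. Refusing to overwrite an existing name is a design choice made for this sketch only.

```python
class Registry:
    """Hypothetical name-based registry for user-defined classes and objects."""

    def __init__(self):
        self._entries = {}

    def register(self, category, name, value):
        # Reject silent overwrites so two components cannot claim one name.
        key = (category, name)
        if key in self._entries:
            raise KeyError("{} '{}' is already registered".format(category, name))
        self._entries[key] = value

    def contains(self, category, name):
        return (category, name) in self._entries

    def get(self, category, name):
        try:
            return self._entries[(category, name)]
        except KeyError:
            raise KeyError("no {} named '{}'".format(category, name)) from None


if __name__ == "__main__":
    registry = Registry()
    # A hypothetical env creator, in the spirit of the `env_creator (func)` argument.
    registry.register("env_creator", "my_env", lambda config: {"config": config})
    env = registry.get("env_creator", "my_env")({"size": 3})
    print(env)
```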
+1 -1
@@ -118,7 +118,7 @@ class TrialRunner(object):
self._committed_resources.gpu,
self._avail_resources.gpu))
for local_dir in sorted(set([t.local_dir for t in self._trials])):
messages.append("Tensorboard logdir: {}".format(local_dir))
messages.append("Result logdir: {}".format(local_dir))
for t in self._trials:
if t.local_dir == local_dir:
messages.append(