From e3685fca5e8127f72da4a3f737db6b7aaa8d9ef6 Mon Sep 17 00:00:00 2001
From: Eric Liang
Date: Sat, 17 Mar 2018 14:45:04 -0700
Subject: [PATCH] [rllib] remove redundant docs (#1728)

* wip
* more work
* fix apex
* docs
* apex doc
* pool comment
* clean up
* make wrap stack pluggable
* Mon Mar 12 21:45:50 PDT 2018
* clean up comment
* table
* Mon Mar 12 22:51:57 PDT 2018
* Mon Mar 12 22:53:05 PDT 2018
* Mon Mar 12 22:55:03 PDT 2018
* Mon Mar 12 22:56:18 PDT 2018
* Mon Mar 12 22:59:54 PDT 2018
* Update apex_optimizer.py
* Update index.rst
* Update README.rst
* Update README.rst
* comments
* Wed Mar 14 19:01:02 PDT 2018
* Fri Mar 16 15:44:27 PDT 2018
---
 doc/source/policy-optimizers.rst |  2 +-
 doc/source/rllib.rst             | 11 ++++-----
 python/ray/rllib/README.rst      | 42 ++------------------------------
 3 files changed, 8 insertions(+), 47 deletions(-)

diff --git a/doc/source/policy-optimizers.rst b/doc/source/policy-optimizers.rst
index c3ec1c7f8..3a3c60bf2 100644
--- a/doc/source/policy-optimizers.rst
+++ b/doc/source/policy-optimizers.rst
@@ -1,7 +1,7 @@
 Policy Optimizers
 =================
 
-RLlib supports using its distributed policy optimizer implementations from external algorithms.
+RLlib supports using its policy optimizer implementations from external algorithms.
 
 Example of constructing and using a policy optimizer `(link to full example) `__:
 
diff --git a/doc/source/rllib.rst b/doc/source/rllib.rst
index fc5ea7c0a..e76ee354b 100644
--- a/doc/source/rllib.rst
+++ b/doc/source/rllib.rst
@@ -1,15 +1,10 @@
 Ray RLlib: Scalable Reinforcement Learning
 ==========================================
 
-Ray RLlib is an RL execution toolkit built on the Ray distributed execution framework. RLlib implements a collection of distributed *policy optimizers* that make it easy to use a variety of training strategies with existing RL algorithms written in frameworks such as PyTorch, TensorFlow, and Theano. This enables complex architectures for RL training (e.g., Ape-X, IMPALA), to be implemented once and reused many times across different RL algorithms and libraries.
+Ray RLlib is an RL execution toolkit built on the Ray distributed execution framework. RLlib implements a collection of distributed *policy optimizers* that make it easy to use a variety of training strategies with existing RL algorithms written in frameworks such as PyTorch, TensorFlow, and Theano.
 
 You can find the code for RLlib `here on GitHub `__, and the paper `here `__.
 
-.. note::
-
-    To use RLlib's policy optimizers outside of RLlib, see the `RLlib policy optimizers documentation `__.
-
-
 RLlib's policy optimizers serve as the basis for RLlib's reference algorithms, which include:
 
 - `Proximal Policy Optimization (PPO) `__ which
@@ -29,6 +24,10 @@ RLlib's policy optimizers serve as the basis for RLlib's reference algorithms, w
 
 These algorithms can be run on any `OpenAI Gym MDP `__, including custom ones written and registered by the user.
 
+.. note::
+
+    To use RLlib's policy optimizers outside of RLlib, see the `policy optimizers documentation `__.
+
 Installation
 ------------
 
diff --git a/python/ray/rllib/README.rst b/python/ray/rllib/README.rst
index e9ac3e2ea..29b31e625 100644
--- a/python/ray/rllib/README.rst
+++ b/python/ray/rllib/README.rst
@@ -1,11 +1,9 @@
 Ray RLlib: Scalable Reinforcement Learning
 ==========================================
 
-This README provides a brief technical overview of RLlib. See also the `user documentation `__ and `paper `__.
+Ray RLlib is an RL execution toolkit built on the Ray distributed execution framework. See the `user documentation `__ and `paper `__.
 
-Ray RLlib is an RL execution toolkit built on the Ray distributed execution framework. RLlib implements a collection of distributed *policy optimizers* that make it easy to use a variety of training strategies with existing RL algorithms written in frameworks such as PyTorch, TensorFlow, and Theano. This enables complex architectures for RL training (e.g., Ape-X, IMPALA), to be implemented *once* and reused many times across different RL algorithms and libraries.
-
-RLlib's policy optimizers serve as the basis for RLlib's reference algorithms, which include:
+RLlib includes the following reference algorithms:
 
 - `Proximal Policy Optimization (PPO) `__ which
   is a proximal variant of `TRPO `__.
@@ -22,39 +20,3 @@ RLlib's policy optimizers serve as the basis for RLlib's reference algorithms, w
   `here `__.
 
 These algorithms can be run on any OpenAI Gym MDP, including custom ones written and registered by the user.
-
-RLlib's distributed policy optimizers can also be used by any existing algorithm or RL library that implements the policy evaluator interface (optimizers/policy_evaluator.py).
-
-
-Training API
-------------
-
-All RLlib algorithms implement a common training API (agent.py), which enables multiple algorithms to be easily evaluated:
-
-::
-
-    # Train a model on a single environment
-    python train.py --env CartPole-v0 --run PPO
-
-    # Integration with ray.tune for hyperparam evaluation
-    python train.py -f tuned_examples/cartpole-grid-search-example.yaml
-
-Policy Optimizer abstraction
-----------------------------
-
-RLlib's gradient-based algorithms are composed using two abstractions: policy evaluators (optimizers/policy_evaluator.py) and policy optimizers (optimizers/policy_optimizer.py). Policy optimizers serve as the "control plane" of algorithms and implement a particular distributed optimization strategy for RL. Evaluators implement the algorithm "data plane" and encapsulate the model graph. Once an evaluator for an algorithm is implemented, it is compatible with any policy optimizer.
-
-This pluggability enables complex architectures for distributed training to be defined _once_ and reused many times across different algorithms and RL libraries.
-
-These are the currently available optimizers:
-
-- ``AsyncOptimizer`` is an asynchronous RL optimizer, i.e. like A3C. It asynchronously pulls and applies gradients from evaluators, sending updated weights back as needed.
-- ``LocalSyncOptimizer`` is a simple synchronous RL optimizer. It pulls samples from remote evaluators, concatenates them, and then updates a local model. The updated model weights are then broadcast to all remote evalutaors.
-- ``LocalSyncReplayOptimizer`` adds experience replay to LocalSyncOptimizer (e.g., for DQNs).
-- ``LocalMultiGPUOptimizer`` This optimizer performs SGD over a number of local GPUs, and pins experience data in GPU memory to amortize the copy overhead for multiple SGD passes.
-- ``ApexOptimizer`` This implements the distributed experience replay algorithm for DQN and DDPG and is designed to run in a cluster setting.
-
-Common utilities
-----------------
-
-RLlib defines common action distributions, preprocessors, and neural network models, found in ``models/catalog.py``, which are shared by all algorithms. More information on these classes can be found in the `RLlib Developer Guide `__.