mirror of
https://github.com/wassname/ray.git
synced 2026-06-28 02:01:24 +08:00
[rllib] remove redundant docs (#1728)
* wip * more work * fix apex * docs * apex doc * pool comment * clean up * make wrap stack pluggable * Mon Mar 12 21:45:50 PDT 2018 * clean up comment * table * Mon Mar 12 22:51:57 PDT 2018 * Mon Mar 12 22:53:05 PDT 2018 * Mon Mar 12 22:55:03 PDT 2018 * Mon Mar 12 22:56:18 PDT 2018 * Mon Mar 12 22:59:54 PDT 2018 * Update apex_optimizer.py * Update index.rst * Update README.rst * Update README.rst * comments * Wed Mar 14 19:01:02 PDT 2018 * Fri Mar 16 15:44:27 PDT 2018
@@ -1,11 +1,9 @@
Ray RLlib: Scalable Reinforcement Learning
==========================================

This README provides a brief technical overview of RLlib. See also the `user documentation <http://ray.readthedocs.io/en/latest/rllib.html>`__ and `paper <https://arxiv.org/abs/1712.09381>`__.
Ray RLlib is an RL execution toolkit built on the Ray distributed execution framework. See the `user documentation <http://ray.readthedocs.io/en/latest/rllib.html>`__ and `paper <https://arxiv.org/abs/1712.09381>`__.

Ray RLlib is an RL execution toolkit built on the Ray distributed execution framework. RLlib implements a collection of distributed *policy optimizers* that make it easy to use a variety of training strategies with existing RL algorithms written in frameworks such as PyTorch, TensorFlow, and Theano. This enables complex architectures for RL training (e.g., Ape-X, IMPALA) to be implemented *once* and reused many times across different RL algorithms and libraries.

RLlib's policy optimizers serve as the basis for RLlib's reference algorithms, which include:
RLlib includes the following reference algorithms:

- `Proximal Policy Optimization (PPO) <https://arxiv.org/abs/1707.06347>`__ which
  is a proximal variant of `TRPO <https://arxiv.org/abs/1502.05477>`__.
@@ -22,39 +20,3 @@ RLlib's policy optimizers serve as the basis for RLlib's reference algorithms, w
  `here <https://github.com/openai/evolution-strategies-starter>`__.

These algorithms can be run on any OpenAI Gym MDP, including custom ones written and registered by the user.
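As a rough illustration of what such an environment has to provide, the sketch below implements a hypothetical toy MDP (``CorridorEnv``) that follows the classic Gym convention: ``reset()`` returns an initial observation and ``step(action)`` returns ``(observation, reward, done, info)``. It deliberately avoids importing ``gym`` so that it runs standalone; a real environment would subclass ``gym.Env`` and declare its ``observation_space`` and ``action_space``, and the RLlib registration step is not shown here.

.. code-block:: python

    import random


    class CorridorEnv:
        """Hypothetical toy MDP: move right along a corridor to reach the goal.

        Follows the classic Gym reset()/step() convention without depending on gym.
        """

        def __init__(self, length=5):
            self.length = length
            self.pos = 0

        def reset(self):
            # Start at the left end; the observation is simply the position.
            self.pos = 0
            return self.pos

        def step(self, action):
            # Action 1 moves right, anything else moves left; position stays in [0, length].
            if action == 1:
                self.pos = min(self.pos + 1, self.length)
            else:
                self.pos = max(self.pos - 1, 0)
            done = self.pos == self.length
            reward = 1.0 if done else -0.1  # small step penalty, bonus at the goal
            return self.pos, reward, done, {}


    def rollout(env, policy, max_steps=100):
        """Run one episode with the given policy and return its total reward."""
        obs = env.reset()
        total = 0.0
        for _ in range(max_steps):
            obs, reward, done, _ = env.step(policy(obs))
            total += reward
            if done:
                break
        return total


    if __name__ == "__main__":
        env = CorridorEnv(length=5)
        print(rollout(env, lambda obs: 1))  # always move right
        print(rollout(env, lambda obs: random.randint(0, 1)))  # random policy

Because the rollout loop only touches ``reset()`` and ``step()``, any object exposing that interface can be driven the same way.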

RLlib's distributed policy optimizers can also be used by any existing algorithm or RL library that implements the policy evaluator interface (``optimizers/policy_evaluator.py``).

Training API
------------

All RLlib algorithms implement a common training API (``agent.py``), which enables multiple algorithms to be easily evaluated:

::

    # Train a model on a single environment
    python train.py --env CartPole-v0 --run PPO

    # Integration with ray.tune for hyperparam evaluation
    python train.py -f tuned_examples/cartpole-grid-search-example.yaml
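A shared training API means that evaluation code depends only on the interface, not on the algorithm behind it. The sketch below illustrates that idea with a hypothetical ``Agent`` base class whose ``train()`` method runs one iteration and returns a result dict. The class names, the ``training_iteration`` and ``episode_reward_mean`` keys, and the two toy search procedures standing in for RL algorithms are all assumptions for illustration, not the contents of ``agent.py``.

.. code-block:: python

    import random
    from abc import ABC, abstractmethod


    class Agent(ABC):
        """Hypothetical common training interface shared by all algorithms."""

        def __init__(self, seed=0):
            self.iteration = 0
            self.rng = random.Random(seed)

        @abstractmethod
        def _train(self):
            """Run one algorithm-specific step and return a mean episode reward."""

        def train(self):
            # Shared bookkeeping lives in the base class; subclasses only implement _train().
            reward = self._train()
            self.iteration += 1
            return {"training_iteration": self.iteration, "episode_reward_mean": reward}


    def toy_reward(theta):
        # Stand-in for an environment return; maximized at theta == 3.0.
        return -(theta - 3.0) ** 2


    class RandomSearchAgent(Agent):
        def __init__(self, seed=0):
            super().__init__(seed)
            self.best = 0.0

        def _train(self):
            # Keep a random candidate only if it beats the best parameter so far.
            candidate = self.rng.uniform(-10.0, 10.0)
            if toy_reward(candidate) > toy_reward(self.best):
                self.best = candidate
            return toy_reward(self.best)


    class HillClimbAgent(Agent):
        def __init__(self, seed=0, step=0.5):
            super().__init__(seed)
            self.theta = 0.0
            self.step = step

        def _train(self):
            # Move to a neighboring parameter if it improves the reward.
            for candidate in (self.theta - self.step, self.theta + self.step):
                if toy_reward(candidate) > toy_reward(self.theta):
                    self.theta = candidate
            return toy_reward(self.theta)


    def evaluate(agent_classes, iterations=20):
        """Train each algorithm through the same API and keep its last result."""
        results = {}
        for cls in agent_classes:
            agent = cls()
            for _ in range(iterations):
                result = agent.train()
            results[cls.__name__] = result
        return results


    if __name__ == "__main__":
        for name, result in evaluate([RandomSearchAgent, HillClimbAgent]).items():
            print(name, result)

The evaluation loop never branches on the algorithm type, so adding a new algorithm only requires another subclass.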

Policy Optimizer abstraction
----------------------------

RLlib's gradient-based algorithms are composed using two abstractions: policy evaluators (``optimizers/policy_evaluator.py``) and policy optimizers (``optimizers/policy_optimizer.py``). Policy optimizers serve as the "control plane" of algorithms and implement a particular distributed optimization strategy for RL. Evaluators implement the algorithm's "data plane" and encapsulate the model graph. Once an evaluator for an algorithm is implemented, it is compatible with any policy optimizer.

This pluggability enables complex architectures for distributed training to be defined *once* and reused many times across different algorithms and RL libraries.
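The evaluator/optimizer split can be sketched in a single process. In the hypothetical code below, the evaluator methods ``sample``, ``compute_gradients``, ``apply_gradients``, ``get_weights``, and ``set_weights`` are assumed names for the data-plane operations described above, the "policy" is a toy linear regression, and remote evaluators are plain Python objects rather than Ray actors. The two optimizer classes are simplified, sequential stand-ins for a synchronous-samples strategy and an asynchronous-gradients strategy.

.. code-block:: python

    import random


    class LinearEvaluator:
        """Hypothetical evaluator: owns a model (y = w * x + b) and its data source."""

        def __init__(self, seed, lr=0.1, batch_size=32):
            self.rng = random.Random(seed)
            self.weights = [0.0, 0.0]  # [w, b]
            self.lr = lr
            self.batch_size = batch_size

        def sample(self):
            # Toy "experience": noise-free points from y = 2x + 1. In real RL the
            # samples would depend on the current weights; this data source does not.
            xs = [self.rng.uniform(-1.0, 1.0) for _ in range(self.batch_size)]
            return [(x, 2.0 * x + 1.0) for x in xs]

        def compute_gradients(self, samples):
            # Gradient of mean squared error with respect to [w, b] at the current weights.
            w, b = self.weights
            gw = gb = 0.0
            for x, y in samples:
                err = (w * x + b) - y
                gw += 2.0 * err * x
                gb += 2.0 * err
            n = len(samples)
            return [gw / n, gb / n]

        def apply_gradients(self, grads):
            self.weights = [p - self.lr * g for p, g in zip(self.weights, grads)]

        def get_weights(self):
            return list(self.weights)

        def set_weights(self, weights):
            self.weights = list(weights)


    class SyncSamplesOptimizer:
        """Pull samples from all remote evaluators, concatenate, update locally, broadcast."""

        def __init__(self, local, remotes):
            self.local, self.remotes = local, remotes

        def step(self):
            batch = []
            for ev in self.remotes:
                batch.extend(ev.sample())
            self.local.apply_gradients(self.local.compute_gradients(batch))
            for ev in self.remotes:
                ev.set_weights(self.local.get_weights())


    class AsyncGradientsOptimizer:
        """Apply each remote evaluator's gradient as it 'arrives' (simulated sequentially)."""

        def __init__(self, local, remotes):
            self.local, self.remotes = local, remotes

        def step(self):
            for ev in self.remotes:
                grads = ev.compute_gradients(ev.sample())  # may use slightly stale weights
                self.local.apply_gradients(grads)
                ev.set_weights(self.local.get_weights())  # send fresh weights back


    def train(optimizer_cls, steps=200):
        local = LinearEvaluator(seed=0)
        remotes = [LinearEvaluator(seed=i) for i in range(1, 4)]
        opt = optimizer_cls(local, remotes)
        for _ in range(steps):
            opt.step()
        return local.get_weights()


    if __name__ == "__main__":
        # The same evaluator class is reused unchanged with two different optimizers.
        print(train(SyncSamplesOptimizer))
        print(train(AsyncGradientsOptimizer))

Neither optimizer inspects the model: replacing ``LinearEvaluator`` with any class that has the same five methods leaves both optimizers unchanged.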

These are the currently available optimizers:

- ``AsyncOptimizer`` is an asynchronous RL optimizer in the style of A3C: it pulls gradients from evaluators as they become available, applies them, and sends updated weights back as needed.
- ``LocalSyncOptimizer`` is a simple synchronous RL optimizer. It pulls samples from remote evaluators, concatenates them, and then updates a local model. The updated model weights are then broadcast to all remote evaluators.
- ``LocalSyncReplayOptimizer`` adds experience replay to ``LocalSyncOptimizer`` (e.g., for DQN).
- ``LocalMultiGPUOptimizer`` performs SGD across several local GPUs, and pins experience data in GPU memory to amortize the copy overhead over multiple SGD passes.
- ``ApexOptimizer`` implements the distributed prioritized experience replay (Ape-X) algorithm for DQN and DDPG, and is designed to run in a cluster setting.
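The experience replay that ``LocalSyncReplayOptimizer`` adds can be pictured as a bounded FIFO buffer: newly pulled sample batches are appended, the oldest transitions are evicted once capacity is reached, and training batches are drawn at random. The sketch below is an illustrative assumption (uniform rather than prioritized sampling, hypothetical class and method names), not RLlib's replay implementation.

.. code-block:: python

    import random
    from collections import deque


    class ReplayBuffer:
        """Hypothetical uniform replay buffer with FIFO eviction."""

        def __init__(self, capacity, seed=0):
            self.storage = deque(maxlen=capacity)  # drops the oldest item when full
            self.rng = random.Random(seed)

        def add_batch(self, transitions):
            # Called with each batch of samples pulled from the evaluators.
            self.storage.extend(transitions)

        def sample(self, batch_size):
            # Uniform sampling with replacement from the stored transitions.
            return [self.rng.choice(self.storage) for _ in range(batch_size)]

        def __len__(self):
            return len(self.storage)


    if __name__ == "__main__":
        buf = ReplayBuffer(capacity=4)
        buf.add_batch([("s0", 0, 1.0), ("s1", 1, 0.0), ("s2", 0, 0.5)])
        buf.add_batch([("s3", 1, 1.0), ("s4", 0, 0.0)])  # evicts ("s0", 0, 1.0)
        print(len(buf), buf.sample(2))

A replay-based optimizer step would add each fresh batch with ``add_batch`` and compute gradients on ``sample(batch_size)`` instead of on the fresh batch alone.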

Common utilities
----------------

RLlib defines common action distributions, preprocessors, and neural network models in ``models/catalog.py``; these are shared by all algorithms. More information on these classes can be found in the `RLlib Developer Guide <http://ray.readthedocs.io/en/latest/rllib-dev.html>`__.
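To give a sense of what a shared action distribution offers the algorithms that use it, here is a minimal categorical distribution over discrete actions, parameterized by logits such as a model's output layer would produce. The class name and its ``sample``/``logp``/``entropy`` methods are illustrative assumptions written in plain Python, not the classes defined in ``models/catalog.py``.

.. code-block:: python

    import math
    import random


    class Categorical:
        """Hypothetical categorical action distribution parameterized by logits."""

        def __init__(self, logits):
            # Numerically stable softmax: subtract the max logit before exponentiating.
            m = max(logits)
            exps = [math.exp(x - m) for x in logits]
            total = sum(exps)
            self.probs = [e / total for e in exps]

        def sample(self, rng=random):
            # Draw an action index with probability proportional to exp(logit).
            return rng.choices(range(len(self.probs)), weights=self.probs, k=1)[0]

        def logp(self, action):
            # Log-probability of an action, as used in policy-gradient losses.
            return math.log(self.probs[action])

        def entropy(self):
            # Shannon entropy in nats; often added as an exploration bonus.
            return -sum(p * math.log(p) for p in self.probs if p > 0)


    if __name__ == "__main__":
        dist = Categorical([0.0, 0.0, math.log(2.0)])
        print(dist.probs)
        print(dist.sample(random.Random(0)), dist.logp(2), dist.entropy())

Keeping these operations behind one small interface lets different algorithms reuse the same distribution code regardless of which model produced the logits.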