[rllib] Split docs into user and development guide (#1377)

* docs

* Update README.rst

* Sat Dec 30 15:23:49 PST 2017

* comments

* Sun Dec 31 23:33:30 PST 2017

* Sun Dec 31 23:33:38 PST 2017

* Sun Dec 31 23:37:46 PST 2017

* Sun Dec 31 23:39:28 PST 2017

* Sun Dec 31 23:43:05 PST 2017

* Sun Dec 31 23:51:55 PST 2017

* Sun Dec 31 23:52:51 PST 2017
Eric Liang
2018-01-01 11:10:44 -08:00
committed by GitHub
parent b6c42f96be
commit 6e6674a824
9 changed files with 181 additions and 116 deletions
+11 -12
@@ -1,23 +1,22 @@
Ray RLlib: A Composable and Scalable Reinforcement Learning Library
===================================================================
Ray RLlib: A Scalable Reinforcement Learning Library
====================================================
This README provides a brief technical overview of RLlib. See also the `user documentation <http://ray.readthedocs.io/en/latest/rllib.html>`__ and `NIPS symposium paper <https://drive.google.com/open?id=1lDMOFLMUQXn8qGtuahOBUwjmFb2iASxu>`__.
This README provides a brief technical overview of RLlib. See also the `user documentation <http://ray.readthedocs.io/en/latest/rllib.html>`__ and `NIPS symposium paper <https://arxiv.org/abs/1712.09381>`__.
RLlib currently provides the following algorithms:
- `Proximal Policy Optimization <https://arxiv.org/abs/1707.06347>`__ which
- `Proximal Policy Optimization (PPO) <https://arxiv.org/abs/1707.06347>`__ which
is a proximal variant of `TRPO <https://arxiv.org/abs/1502.05477>`__.
- Evolution Strategies which is decribed in `this
- `The Asynchronous Advantage Actor-Critic (A3C) <https://arxiv.org/abs/1602.01783>`__.
- `Deep Q Networks (DQN) <https://arxiv.org/abs/1312.5602>`__.
- Evolution Strategies, as described in `this
paper <https://arxiv.org/abs/1703.03864>`__. Our implementation
borrows code from
is adapted from
`here <https://github.com/openai/evolution-strategies-starter>`__.
- `The Asynchronous Advantage Actor-Critic <https://arxiv.org/abs/1602.01783>`__
based on `the OpenAI starter agent <https://github.com/openai/universe-starter-agent>`__.
- `Deep Q Network (DQN) <https://arxiv.org/abs/1312.5602>`__.
These algorithms can be run on any OpenAI Gym MDP, including custom ones written and registered by the user.
@@ -51,4 +50,4 @@ These are the currently available optimizers:
Common utilities
----------------
RLlib defines common action distributions, preprocessors, and neural network models, found in ``models/catalog.py``, which are shared by all algorithms. More information on these classes can be found in the `developer API docs <http://ray.readthedocs.io/en/latest/rllib.html#the-developer-api>`__.
RLlib defines common action distributions, preprocessors, and neural network models, found in ``models/catalog.py``, which are shared by all algorithms. More information on these classes can be found in the `RLlib Developer Guide <http://ray.readthedocs.io/en/latest/rllib-dev.html>`__.
+1 -1
@@ -21,7 +21,7 @@ class ActionDistribution(object):
raise NotImplementedError
def kl(self, other):
"""The KL-divergene between two action distributions."""
"""The KL-divergence between two action distributions."""
raise NotImplementedError
def entropy(self):
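The base class only declares `kl` and `entropy`. The sketch below illustrates what a concrete subclass computes for a discrete action space. It is a minimal, dependency-free example. The class name `CategoricalSketch` and its list-of-probabilities constructor are hypothetical and are not RLlib's implementation. It assumes that `other` assigns nonzero probability wherever `self` does.

```python
import math


class CategoricalSketch(object):
    """Hypothetical categorical action distribution (not RLlib's class).

    Holds explicit action probabilities rather than model output tensors.
    """

    def __init__(self, probs):
        total = sum(probs)
        if total <= 0:
            raise ValueError("probabilities must sum to a positive value")
        # Normalize so the probabilities sum to 1.
        self.probs = [p / total for p in probs]

    def kl(self, other):
        """KL(self || other) = sum_i p_i * log(p_i / q_i), in nats.

        Assumes other.probs[i] > 0 wherever self.probs[i] > 0.
        """
        return sum(p * math.log(p / q)
                   for p, q in zip(self.probs, other.probs) if p > 0)

    def entropy(self):
        """H(p) = -sum_i p_i * log(p_i), in nats."""
        return -sum(p * math.log(p) for p in self.probs if p > 0)


uniform = CategoricalSketch([0.25, 0.25, 0.25, 0.25])
skewed = CategoricalSketch([0.7, 0.1, 0.1, 0.1])
print(uniform.entropy())    # log(4), approximately 1.386
print(uniform.kl(uniform))  # 0.0
print(skewed.kl(uniform))   # positive, since the distributions differ
```

The KL-divergence is asymmetric, so `skewed.kl(uniform)` and `uniform.kl(skewed)` generally differ. Policy-gradient methods such as PPO typically track the divergence between the old and new policy's action distributions.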
+11 -1
@@ -36,7 +36,17 @@ MODEL_CONFIGS = [
class ModelCatalog(object):
"""Registry of default models and action distributions for envs."""
"""Registry of models, preprocessors, and action distributions for envs.
Examples:
>>> prep = ModelCatalog.get_preprocessor(env)
>>> observation = prep.transform(raw_observation)
>>> dist_cls, dist_dim = ModelCatalog.get_action_dist(env.action_space)
>>> model = ModelCatalog.get_model(registry, inputs, dist_dim)
>>> dist = dist_cls(model.outputs)
>>> action = dist.sample()
"""
ATARI_OBS_SHAPE = (210, 160, 3)
ATARI_RAM_OBS_SHAPE = (128,)
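The docstring example chains several steps. First it picks an action distribution class and its input dimension from the action space. It then builds a model with that many outputs and samples an action from the distribution. The toy, dependency-free sketch below illustrates that flow.

The assumptions are as follows:

- `Discrete` and `Box` are stand-ins loosely modeled on Gym spaces.
- `get_action_dist` is a hypothetical analogue of `ModelCatalog.get_action_dist`, not the code in `models/catalog.py`.
- For discrete actions, the mapping uses a categorical distribution with `n` logits.
- For continuous actions, it uses a diagonal Gaussian with a mean and a log-std per dimension.

```python
import math
import random


class Discrete(object):
    """Stand-in for a discrete action space with n actions (assumption)."""
    def __init__(self, n):
        self.n = n


class Box(object):
    """Stand-in for a 1-D continuous action space (assumption)."""
    def __init__(self, dim):
        self.shape = (dim,)


class CategoricalDist(object):
    """Categorical distribution parameterized by unnormalized logits."""
    def __init__(self, logits):
        self.logits = logits

    def sample(self):
        # Softmax weights (shifted by the max for numerical stability),
        # then draw one action index in proportion to them.
        m = max(self.logits)
        weights = [math.exp(l - m) for l in self.logits]
        return random.choices(range(len(weights)), weights=weights)[0]


class DiagGaussianDist(object):
    """Diagonal Gaussian: first half of inputs are means, second half log-stds."""
    def __init__(self, inputs):
        half = len(inputs) // 2
        self.mean, self.log_std = inputs[:half], inputs[half:]

    def sample(self):
        return [random.gauss(mu, math.exp(ls))
                for mu, ls in zip(self.mean, self.log_std)]


def get_action_dist(action_space):
    """Hypothetical: return (distribution class, required model output size)."""
    if isinstance(action_space, Discrete):
        return CategoricalDist, action_space.n
    if isinstance(action_space, Box):
        return DiagGaussianDist, 2 * action_space.shape[0]
    raise NotImplementedError("unsupported action space")


dist_cls, dist_dim = get_action_dist(Discrete(4))
model_outputs = [0.0] * dist_dim  # stand-in for a model with dist_dim outputs
action = dist_cls(model_outputs).sample()
print(dist_dim, action)  # 4, then a random index in [0, 4)
```

The key design point is that the distribution class decides how many outputs the model must produce. The catalog therefore returns both values together, so that model construction and action sampling stay consistent.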
+3 -3
@@ -1,7 +1,7 @@
Ray.tune: Efficient distributed hyperparameter search
=====================================================
Ray.tune: Hyperparameter Optimization Framework
===============================================
Ray.tune is a hyperparameter tuning tool for long-running tasks such as RL and deep learning training.
Ray.tune is a hyperparameter tuning framework for long-running tasks such as RL and deep learning training.
User documentation can be `found here <http://ray.readthedocs.io/en/latest/tune.html>`__.