[rllib] Split docs into user and development guide (#1377)

* docs
* Update README.rst
* Sat Dec 30 15:23:49 PST 2017
* comments
* Sun Dec 31 23:33:30 PST 2017
* Sun Dec 31 23:33:38 PST 2017
* Sun Dec 31 23:37:46 PST 2017
* Sun Dec 31 23:39:28 PST 2017
* Sun Dec 31 23:43:05 PST 2017
* Sun Dec 31 23:51:55 PST 2017
* Sun Dec 31 23:52:51 PST 2017
2026-07-05 04:48:19 +08:00 · 2018-01-01 11:10:44 -08:00
parent b6c42f96be
commit 6e6674a824
9 changed files with 181 additions and 116 deletions
@@ -1,23 +1,22 @@
-Ray RLlib: A Composable and Scalable Reinforcement Learning Library
-===================================================================
+Ray RLlib: A Scalable Reinforcement Learning Library
+====================================================

-This README provides a brief technical overview of RLlib. See also the `user documentation <http://ray.readthedocs.io/en/latest/rllib.html>`__ and `NIPS symposium paper <https://drive.google.com/open?id=1lDMOFLMUQXn8qGtuahOBUwjmFb2iASxu>`__.
+This README provides a brief technical overview of RLlib. See also the `user documentation <http://ray.readthedocs.io/en/latest/rllib.html>`__ and `NIPS symposium paper <https://arxiv.org/abs/1712.09381>`__.

 RLlib currently provides the following algorithms:

-  `Proximal Policy Optimization <https://arxiv.org/abs/1707.06347>`__ which
+-  `Proximal Policy Optimization (PPO) <https://arxiv.org/abs/1707.06347>`__ which
   is a proximal variant of `TRPO <https://arxiv.org/abs/1502.05477>`__.

-  Evolution Strategies which is decribed in `this
+-  `The Asynchronous Advantage Actor-Critic (A3C) <https://arxiv.org/abs/1602.01783>`__.
+
+- `Deep Q Networks (DQN) <https://arxiv.org/abs/1312.5602>`__.
+
+-  Evolution Strategies, as described in `this
   paper <https://arxiv.org/abs/1703.03864>`__. Our implementation
-   borrows code from
+   is adapted from
   `here <https://github.com/openai/evolution-strategies-starter>`__.

-  `The Asynchronous Advantage Actor-Critic <https://arxiv.org/abs/1602.01783>`__
-   based on `the OpenAI starter agent <https://github.com/openai/universe-starter-agent>`__.
-
- `Deep Q Network (DQN) <https://arxiv.org/abs/1312.5602>`__.
-
 These algorithms can be run on any OpenAI Gym MDP, including custom ones written and registered by the user.


@@ -51,4 +50,4 @@ These are the currently available optimizers:
 Common utilities
 ----------------

-RLlib defines common action distributions, preprocessors, and neural network models, found in ``models/catalog.py``, which are shared by all algorithms. More information on these classes can be found in the `developer API docs <http://ray.readthedocs.io/en/latest/rllib.html#the-developer-api>`__.
+RLlib defines common action distributions, preprocessors, and neural network models, found in ``models/catalog.py``, which are shared by all algorithms. More information on these classes can be found in the `RLlib Developer Guide <http://ray.readthedocs.io/en/latest/rllib-dev.html>`__.
@@ -21,7 +21,7 @@ class ActionDistribution(object):
        raise NotImplementedError

    def kl(self, other):
-        """The KL-divergene between two action distributions."""
+        """The KL-divergence between two action distributions."""
        raise NotImplementedError

    def entropy(self):
@@ -36,7 +36,17 @@ MODEL_CONFIGS = [


 class ModelCatalog(object):
-    """Registry of default models and action distributions for envs."""
+    """Registry of models, preprocessors, and action distributions for envs.
+
+    Examples:
+        >>> prep = ModelCatalog.get_preprocessor(env)
+        >>> observation = prep.transform(raw_observation)
+
+        >>> dist_cls, dist_dim = ModelCatalog.get_action_dist(env.action_space)
+        >>> model = ModelCatalog.get_model(registry, inputs, dist_dim)
+        >>> dist = dist_cls(model.outputs)
+        >>> action = dist.sample()
+    """

    ATARI_OBS_SHAPE = (210, 160, 3)
    ATARI_RAM_OBS_SHAPE = (128,)
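The ``ActionDistribution`` interface touched above declares ``kl`` and ``entropy``, and the new ``ModelCatalog`` docstring calls ``dist.sample()`` on the distribution built from the model outputs. As a rough illustration of what a concrete subclass has to compute, here is a minimal pure-Python sketch of a categorical distribution over logits. ``SimpleCategorical`` is a hypothetical name, not an RLlib class, and it works on plain Python lists rather than whatever tensor types the real classes use.

```python
import math
import random


class ActionDistribution(object):
    """Sketch of the interface from the diff (only the methods shown there)."""

    def __init__(self, inputs):
        self.inputs = inputs

    def kl(self, other):
        """The KL-divergence between two action distributions."""
        raise NotImplementedError

    def entropy(self):
        raise NotImplementedError

    def sample(self):
        raise NotImplementedError


class SimpleCategorical(ActionDistribution):
    """Hypothetical categorical distribution parameterized by logits."""

    def __init__(self, logits):
        super().__init__(logits)
        # Numerically stable softmax: subtract the max logit before exp().
        m = max(logits)
        exps = [math.exp(l - m) for l in logits]
        total = sum(exps)
        self.probs = [e / total for e in exps]

    def kl(self, other):
        # KL(p || q) = sum_i p_i * (log p_i - log q_i); zero-probability
        # terms contribute nothing and are skipped to avoid log(0).
        return sum(p * (math.log(p) - math.log(q))
                   for p, q in zip(self.probs, other.probs) if p > 0)

    def entropy(self):
        # H(p) = -sum_i p_i * log p_i
        return -sum(p * math.log(p) for p in self.probs if p > 0)

    def sample(self, rng=random):
        # Inverse-CDF sampling: walk the cumulative probabilities until
        # they exceed a uniform draw in [0, 1).
        r = rng.random()
        cumulative = 0.0
        for i, p in enumerate(self.probs):
            cumulative += p
            if r < cumulative:
                return i
        # Guard against floating-point round-off in the cumulative sum.
        return len(self.probs) - 1


uniform = SimpleCategorical([0.0, 0.0])
peaked = SimpleCategorical([2.0, 0.0])
print(uniform.entropy())   # entropy of a fair coin, log(2)
print(peaked.kl(uniform))  # positive, since the distributions differ
print(peaked.sample(random.Random(0)))  # an action index, 0 or 1
```

Keeping ``kl`` and ``entropy`` on the distribution object, as the interface does, lets an algorithm such as PPO compute its KL penalty and entropy bonus without knowing which concrete distribution the catalog selected for the action space.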
@@ -1,7 +1,7 @@
-Ray.tune: Efficient distributed hyperparameter search
-=====================================================
+Ray.tune: Hyperparameter Optimization Framework
+===============================================

-Ray.tune is a hyperparameter tuning tool for long-running tasks such as RL and deep learning training.
+Ray.tune is a hyperparameter tuning framework for long-running tasks such as RL and deep learning training.

 User documentation can be `found here <http://ray.readthedocs.io/en/latest/tune.html>`__.