Mirror of https://github.com/wassname/ray.git, synced 2026-07-05 04:48:19 +08:00
[rllib] Split docs into user and development guide (#1377)
* docs
* Update README.rst
* Sat Dec 30 15:23:49 PST 2017
* comments
* Sun Dec 31 23:33:30 PST 2017
* Sun Dec 31 23:33:38 PST 2017
* Sun Dec 31 23:37:46 PST 2017
* Sun Dec 31 23:39:28 PST 2017
* Sun Dec 31 23:43:05 PST 2017
* Sun Dec 31 23:51:55 PST 2017
* Sun Dec 31 23:52:51 PST 2017
+11
-12
@@ -1,23 +1,22 @@
-Ray RLlib: A Composable and Scalable Reinforcement Learning Library
-===================================================================
+Ray RLlib: A Scalable Reinforcement Learning Library
+====================================================

-This README provides a brief technical overview of RLlib. See also the `user documentation <http://ray.readthedocs.io/en/latest/rllib.html>`__ and `NIPS symposium paper <https://drive.google.com/open?id=1lDMOFLMUQXn8qGtuahOBUwjmFb2iASxu>`__.
+This README provides a brief technical overview of RLlib. See also the `user documentation <http://ray.readthedocs.io/en/latest/rllib.html>`__ and `NIPS symposium paper <https://arxiv.org/abs/1712.09381>`__.

 RLlib currently provides the following algorithms:

-- `Proximal Policy Optimization <https://arxiv.org/abs/1707.06347>`__ which
+- `Proximal Policy Optimization (PPO) <https://arxiv.org/abs/1707.06347>`__ which
   is a proximal variant of `TRPO <https://arxiv.org/abs/1502.05477>`__.

-- Evolution Strategies which is decribed in `this
+- `The Asynchronous Advantage Actor-Critic (A3C) <https://arxiv.org/abs/1602.01783>`__.
+
+- `Deep Q Networks (DQN) <https://arxiv.org/abs/1312.5602>`__.
+
+- Evolution Strategies, as described in `this
   paper <https://arxiv.org/abs/1703.03864>`__. Our implementation
-  borrows code from
+  is adapted from
   `here <https://github.com/openai/evolution-strategies-starter>`__.

-- `The Asynchronous Advantage Actor-Critic <https://arxiv.org/abs/1602.01783>`__
-  based on `the OpenAI starter agent <https://github.com/openai/universe-starter-agent>`__.
-
-- `Deep Q Network (DQN) <https://arxiv.org/abs/1312.5602>`__.
-
 These algorithms can be run on any OpenAI Gym MDP, including custom ones written and registered by the user.
@@ -51,4 +50,4 @@ These are the currently available optimizers:
 Common utilities
 ----------------

-RLlib defines common action distributions, preprocessors, and neural network models, found in ``models/catalog.py``, which are shared by all algorithms. More information on these classes can be found in the `developer API docs <http://ray.readthedocs.io/en/latest/rllib.html#the-developer-api>`__.
+RLlib defines common action distributions, preprocessors, and neural network models, found in ``models/catalog.py``, which are shared by all algorithms. More information on these classes can be found in the `RLlib Developer Guide <http://ray.readthedocs.io/en/latest/rllib-dev.html>`__.
@@ -21,7 +21,7 @@ class ActionDistribution(object):
         raise NotImplementedError

     def kl(self, other):
-        """The KL-divergene between two action distributions."""
+        """The KL-divergence between two action distributions."""
         raise NotImplementedError

     def entropy(self):
@@ -36,7 +36,17 @@ MODEL_CONFIGS = [


 class ModelCatalog(object):
-    """Registry of default models and action distributions for envs."""
+    """Registry of models, preprocessors, and action distributions for envs.
+
+    Examples:
+        >>> prep = ModelCatalog.get_preprocessor(env)
+        >>> observation = prep.transform(raw_observation)
+
+        >>> dist_cls, dist_dim = ModelCatalog.get_action_dist(env.action_space)
+        >>> model = ModelCatalog.get_model(registry, inputs, dist_dim)
+        >>> dist = dist_cls(model.outputs)
+        >>> action = dist.sample()
+    """

     ATARI_OBS_SHAPE = (210, 160, 3)
     ATARI_RAM_OBS_SHAPE = (128,)
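The `ActionDistribution` base class in this commit only declares `kl()` and `entropy()`, and the new `ModelCatalog` docstring shows the intended flow: model outputs are passed to a distribution class, which is then sampled. As an illustration only, here is a minimal sketch of what a concrete categorical distribution could compute. It assumes plain Python lists of logits rather than the tensors RLlib actually operates on, and `Categorical` and `linear_model` are hypothetical names, not RLlib classes or functions.

```python
import math
import random


class Categorical(object):
    """Hypothetical categorical action distribution (not an RLlib class).

    Mirrors the sample()/kl()/entropy() methods of ActionDistribution,
    using plain Python lists of logits instead of framework tensors.
    """

    def __init__(self, logits):
        # Softmax over the logits; subtracting the max avoids overflow in exp().
        shift = max(logits)
        exps = [math.exp(x - shift) for x in logits]
        total = sum(exps)
        self.probs = [e / total for e in exps]

    def sample(self, rng=random):
        # Draw one action index i with probability self.probs[i].
        return rng.choices(range(len(self.probs)), weights=self.probs)[0]

    def kl(self, other):
        """KL(self || other) = sum_i p_i * (log p_i - log q_i)."""
        return sum(p * (math.log(p) - math.log(q))
                   for p, q in zip(self.probs, other.probs) if p > 0)

    def entropy(self):
        """H(p) = -sum_i p_i * log p_i."""
        return -sum(p * math.log(p) for p in self.probs if p > 0)


def linear_model(observation, weights):
    # Hypothetical stand-in for a catalog model: one logit per action.
    return [sum(w * x for w, x in zip(row, observation)) for row in weights]


# Usage, following the shape of the ModelCatalog docstring example.
observation = [0.5, -1.0]
weights = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
dist = Categorical(linear_model(observation, weights))
action = dist.sample(random.Random(0))
```

The `p > 0` guards skip zero-probability terms, whose contribution to both sums is zero by the usual 0 * log 0 = 0 convention; a uniform distribution over n actions should have entropy log n, and the KL divergence of a distribution with itself is 0.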
@@ -1,7 +1,7 @@
-Ray.tune: Efficient distributed hyperparameter search
-=====================================================
+Ray.tune: Hyperparameter Optimization Framework
+===============================================

-Ray.tune is a hyperparameter tuning tool for long-running tasks such as RL and deep learning training.
+Ray.tune is a hyperparameter tuning framework for long-running tasks such as RL and deep learning training.

 User documentation can be `found here <http://ray.readthedocs.io/en/latest/tune.html>`__.