ray/python/ray/rllib at 7c38f964b7269b7bfdf49fd079ebbd3416de13ab - ray

mirror of https://github.com/wassname/ray.git synced 2026-06-28 11:21:15 +08:00


RLLib: Ray's modular and scalable reinforcement learning library
================================================================

Getting Started
---------------

You can run training with

::

    python train.py --env CartPole-v0 --alg PPO

The available algorithms are:

-  ``PPO`` is a proximal variant of
   `TRPO <https://arxiv.org/abs/1502.05477>`__.

-  ``ES`` is described in `this
   paper <https://arxiv.org/abs/1703.03864>`__. Our implementation
   borrows code from
   `here <https://github.com/openai/evolution-strategies-starter>`__.

-  ``DQN`` is an implementation of `Deep Q
   Networks <https://www.cs.toronto.edu/~vmnih/docs/dqn.pdf>`__ based on
   `OpenAI baselines <https://github.com/openai/baselines>`__.

-  ``A3C`` is an implementation of
   `A3C <https://arxiv.org/abs/1602.01783>`__ based on `the OpenAI
   starter agent <https://github.com/openai/universe-starter-agent>`__.
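
As a minimal sketch, the command above can also be assembled from Python,
for example to launch one run per algorithm. The helper name
``build_train_command`` is hypothetical and not part of RLLib; only the
``--env`` and ``--alg`` flags and the four algorithm names come from this
README.

.. code:: python

    # Hypothetical helper: builds the argument list for train.py.
    # Only the --env/--alg flags and the algorithm names are taken from
    # this README; everything else is an illustration.
    ALGORITHMS = ["PPO", "ES", "DQN", "A3C"]


    def build_train_command(env, alg, script="train.py"):
        """Return the argv list for one training run."""
        if alg not in ALGORITHMS:
            raise ValueError("unknown algorithm: {}".format(alg))
        return ["python", script, "--env", env, "--alg", alg]


    if __name__ == "__main__":
        for alg in ALGORITHMS:
            # Print the command instead of running it; pass the list to
            # subprocess.check_call() to actually start training.
            print(" ".join(build_train_command("CartPole-v0", alg)))

Whether a given algorithm trains well on a given environment is not
something this sketch checks.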

Storing logs
------------

You can store the algorithm configuration (including hyperparameters) and
training results on a filesystem with the ``--upload-dir`` flag. Two protocols
are supported at the moment:

- ``--upload-dir file:///tmp/ray/`` stores the logs on the local filesystem
  in a subdirectory of ``/tmp/ray`` named after the algorithm, the
  environment, and the current date. This is the default.

- ``--upload-dir s3://bucketname/`` stores the logs in S3. Note that if you
  store the logs in S3, TensorFlow files are not currently stored, because
  TensorFlow doesn't support uploading files directly to S3 at the moment.
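
The two URI forms above can be told apart by their scheme. The following is
a rough sketch of that dispatch, not the code ``train.py`` actually uses;
the function name ``describe_upload_dir`` and its return shape are
assumptions made for illustration.

.. code:: python

    from urllib.parse import urlparse


    def describe_upload_dir(upload_dir):
        """Split an --upload-dir value into (backend, bucket, path).

        Hypothetical helper, not part of RLLib.
        """
        parsed = urlparse(upload_dir)
        if parsed.scheme == "file":
            # file:///tmp/ray/ has an empty host part and an absolute path.
            return ("local", None, parsed.path)
        if parsed.scheme == "s3":
            # s3://bucketname/prefix puts the bucket in the host part.
            return ("s3", parsed.netloc, parsed.path.lstrip("/"))
        raise ValueError("unsupported protocol: {!r}".format(parsed.scheme))


    if __name__ == "__main__":
        print(describe_upload_dir("file:///tmp/ray/"))
        print(describe_upload_dir("s3://bucketname/"))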

Querying logs with Athena
-------------------------

If you stored the logs in S3 or uploaded them there from the local file system,
they can be queried with Athena. First create tables containing the
experimental results with

.. code:: sql

    CREATE EXTERNAL TABLE IF NOT EXISTS experiments (
      experiment_id STRING,
      env_name STRING,
      alg STRING,
      -- result.json
      training_iteration INT,
      episode_reward_mean FLOAT,
      episode_len_mean FLOAT
    ) ROW FORMAT serde 'org.apache.hive.hcatalog.data.JsonSerDe'
    LOCATION 's3://bucketname/'

You can then inspect the results, for example with

.. code:: sql

    SELECT c.experiment_id, c.env_name, c.alg, a.episode_reward_mean, a.episode_len_mean
    FROM experiments a
    LEFT OUTER JOIN experiments b
        ON a.experiment_id = b.experiment_id AND a.training_iteration < b.training_iteration
    INNER JOIN experiments c
        ON a.experiment_id = c.experiment_id
    WHERE b.experiment_id IS NULL AND a.training_iteration IS NOT NULL AND c.alg IS NOT NULL;

This query selects the last iteration of each experiment (see `this
Stack Overflow
post <https://stackoverflow.com/questions/7745609/sql-select-only-rows-with-max-value-on-a-column>`__).
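
To see how the self-join picks the last iteration, the same query can be
tried locally against an in-memory SQLite database. This is only a sketch:
the rows below are made up, SQLite stands in for Athena, and the assumed
layout (one configuration row carrying ``env_name`` and ``alg``, plus one
row per training iteration, for each experiment) is inferred from the query
rather than documented. An ``ORDER BY`` is added so the output order is
deterministic.

.. code:: python

    import sqlite3

    QUERY = """
    SELECT c.experiment_id, c.env_name, c.alg,
           a.episode_reward_mean, a.episode_len_mean
    FROM experiments a
    LEFT OUTER JOIN experiments b
        ON a.experiment_id = b.experiment_id
        AND a.training_iteration < b.training_iteration
    INNER JOIN experiments c
        ON a.experiment_id = c.experiment_id
    WHERE b.experiment_id IS NULL
        AND a.training_iteration IS NOT NULL
        AND c.alg IS NOT NULL
    ORDER BY c.experiment_id;
    """

    # Made-up rows: (experiment_id, env_name, alg, training_iteration,
    # episode_reward_mean, episode_len_mean).
    ROWS = [
        ("exp1", "CartPole-v0", "PPO", None, None, None),  # config row
        ("exp1", None, None, 1, 20.0, 20.0),
        ("exp1", None, None, 2, 35.5, 35.5),               # last iteration
        ("exp2", "CartPole-v0", "DQN", None, None, None),  # config row
        ("exp2", None, None, 1, 12.0, 12.0),               # last iteration
    ]


    def last_iterations(rows):
        """Load rows into SQLite and return the last iteration per experiment."""
        conn = sqlite3.connect(":memory:")
        conn.execute(
            "CREATE TABLE experiments ("
            "experiment_id TEXT, env_name TEXT, alg TEXT, "
            "training_iteration INTEGER, "
            "episode_reward_mean REAL, episode_len_mean REAL)"
        )
        conn.executemany("INSERT INTO experiments VALUES (?, ?, ?, ?, ?, ?)", rows)
        result = conn.execute(QUERY).fetchall()
        conn.close()
        return result


    if __name__ == "__main__":
        for row in last_iterations(ROWS):
            print(row)

A row ``a`` survives only if no row ``b`` of the same experiment has a
larger ``training_iteration``, which is what the ``b.experiment_id IS NULL``
condition expresses.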