diff --git a/doc/source/example-hyperopt.rst b/doc/source/example-hyperopt.rst deleted file mode 100644 index ff32bffa0..000000000 --- a/doc/source/example-hyperopt.rst +++ /dev/null @@ -1,245 +0,0 @@ -Hyperparameter Optimization -=========================== - -This document provides a walkthrough of the hyperparameter optimization example. - -.. note:: - - To learn about Ray's built-in hyperparameter optimization framework, see `Ray Tune `__. - -To run the application, first install some dependencies. - -.. code-block:: bash - - pip install tensorflow - -You can view the `code for this example`_. - -.. _`code for this example`: https://github.com/ray-project/ray/tree/master/examples/hyperopt - -The simple script that processes results as they become available and launches -new experiments can be run as follows. - -.. code-block:: bash - - python ray/examples/hyperopt/hyperopt_simple.py --trials=5 --steps=10 - -The variant that divides training into multiple segments and aggressively -terminates poorly performing models can be run as follows. - -.. code-block:: bash - - python ray/examples/hyperopt/hyperopt_adaptive.py --num-starting-segments=5 \ - --num-segments=10 \ - --steps-per-segment=20 - -Machine learning algorithms often have a number of *hyperparameters* whose -values must be chosen by the practitioner. For example, an optimization -algorithm may have a step size, a decay rate, and a regularization coefficient. -In a deep network, the network parameterization itself (e.g., the number of -layers and the number of units per layer) can be considered a hyperparameter. - -Choosing these parameters can be challenging, and so a common practice is to -search over the space of hyperparameters. One approach that works surprisingly -well is to randomly sample different options. - -Problem Setup -------------- - -Suppose that we want to train a convolutional network, but we aren't sure how to -choose the following hyperparameters: - -- the learning rate -- the batch size -- the dropout probability -- the standard deviation of the distribution from which to initialize the - network weights - -Suppose that we've defined a remote function ``train_cnn_and_compute_accuracy``, -which takes values for these hyperparameters as its input (along with the -dataset), trains a convolutional network using those hyperparameters, and -returns the accuracy of the trained model on a validation set. - -.. code-block:: python - - import numpy as np - import ray - - @ray.remote - def train_cnn_and_compute_accuracy(hyperparameters, - train_images, - train_labels, - validation_images, - validation_labels): - # Construct a deep network, train it, and return the accuracy on the - # validation data. - return np.random.uniform(0, 1) - -Basic random search -------------------- - -Something that works surprisingly well is to try random values for the -hyperparameters. For example, we can write a function that randomly generates -hyperparameter configurations. - -.. code-block:: python - - def generate_hyperparameters(): - # Randomly choose values for the hyperparameters. - return {"learning_rate": 10 ** np.random.uniform(-5, 5), - "batch_size": np.random.randint(1, 100), - "dropout": np.random.uniform(0, 1), - "stddev": 10 ** np.random.uniform(-5, 5)} - -In addition, let's assume that we've started Ray and loaded some data. - -.. code-block:: python - - import ray - - ray.init() - - from tensorflow.examples.tutorials.mnist import input_data - mnist = input_data.read_data_sets("MNIST_data", one_hot=True) - train_images = ray.put(mnist.train.images) - train_labels = ray.put(mnist.train.labels) - validation_images = ray.put(mnist.validation.images) - validation_labels = ray.put(mnist.validation.labels) - - -Then basic random hyperparameter search looks something like this. We launch a -bunch of experiments, and we get the results. - -.. code-block:: python - - # Generate a bunch of hyperparameter configurations. - hyperparameter_configurations = [generate_hyperparameters() for _ in range(20)] - - # Launch some experiments. - results = [] - for hyperparameters in hyperparameter_configurations: - results.append(train_cnn_and_compute_accuracy.remote(hyperparameters, - train_images, - train_labels, - validation_images, - validation_labels)) - - # Get the results. - accuracies = ray.get(results) - -Then we can inspect the contents of `accuracies` and see which set of -hyperparameters worked the best. Note that in the above example, the for loop -will run instantaneously and the program will block in the call to ``ray.get``, -which will wait until all of the experiments have finished. - -Processing results as they become available -------------------------------------------- - -One problem with the above approach is that you have to wait for all of the -experiments to finish before you can process the results. Instead, you may want -to process the results as they become available, perhaps in order to adaptively -choose new experiments to run, or perhaps simply so you know how well the -experiments are doing. To process the results as they become available, we can -use the ``ray.wait`` primitive. - -The most simple usage is the following. This example is implemented in more -detail in driver.py_. - -.. code-block:: python - - # Launch some experiments. - remaining_ids = [] - for hyperparameters in hyperparameter_configurations: - remaining_ids.append(train_cnn_and_compute_accuracy.remote(hyperparameters, - train_images, - train_labels, - validation_images, - validation_labels)) - - # Whenever a new experiment finishes, print the value and start a new - # experiment. - for i in range(100): - ready_ids, remaining_ids = ray.wait(remaining_ids, num_returns=1) - accuracy = ray.get(ready_ids[0]) - print("Accuracy is {}".format(accuracy)) - # Start a new experiment. - new_hyperparameters = generate_hyperparameters() - remaining_ids.append(train_cnn_and_compute_accuracy.remote(new_hyperparameters, - train_images, - train_labels, - validation_images, - validation_labels)) - -.. _driver.py: https://github.com/ray-project/ray/blob/master/examples/hyperopt/driver.py - -More sophisticated hyperparameter search ----------------------------------------- - -Hyperparameter search algorithms can get much more sophisticated. So far, we've -been treating the function ``train_cnn_and_compute_accuracy`` as a black box, -that we can choose its inputs and inspect its outputs, but once we decide to run -it, we have to run it until it finishes. - -However, there is often more structure to be exploited. For example, if the -training procedure is going poorly, we can end the session early and invest more -resources in the more promising hyperparameter experiments. And if we've saved -the state of the training procedure, we can always restart it again later. - -This is one of the ideas of the Hyperband_ algorithm. Start with a huge number -of hyperparameter configurations, aggressively stop the bad ones, and invest -more resources in the promising experiments. - -To implement this, we can first adapt our training method to optionally take a -model and to return the updated model. - -.. code-block:: python - - @ray.remote - def train_cnn_and_compute_accuracy(hyperparameters, model=None): - # Construct a deep network, train it, and return the accuracy on the - # validation data as well as the latest version of the model. If the model - # argument is not None, this will continue training an existing model. - validation_accuracy = np.random.uniform(0, 1) - new_model = model - return validation_accuracy, new_model - -Here's a different variant that uses the same principles. Divide each training -session into a series of shorter training sessions. Whenever a short session -finishes, if it still looks promising, then continue running it. If it isn't -doing well, then terminate it and start a new experiment. - -.. code-block:: python - - import numpy as np - - def is_promising(model): - # Return true if the model is doing well and false otherwise. In practice, - # this function will want more information than just the model. - return np.random.choice([True, False]) - - # Start 10 experiments. - remaining_ids = [] - for _ in range(10): - experiment_id = train_cnn_and_compute_accuracy.remote(hyperparameters, model=None) - remaining_ids.append(experiment_id) - - accuracies = [] - for i in range(100): - # Whenever a segment of an experiment finishes, decide if it looks promising - # or not. - ready_ids, remaining_ids = ray.wait(remaining_ids, num_returns=1) - experiment_id = ready_ids[0] - current_accuracy, current_model = ray.get(experiment_id) - accuracies.append(current_accuracy) - - if is_promising(experiment_id): - # Continue running the experiment. - experiment_id = train_cnn_and_compute_accuracy.remote(hyperparameters, - model=current_model) - else: - # Start a new experiment. - experiment_id = train_cnn_and_compute_accuracy.remote(hyperparameters) - - remaining_ids.append(experiment_id) - -.. _Hyperband: https://arxiv.org/abs/1603.06560 diff --git a/doc/source/hyperband.rst b/doc/source/hyperband.rst new file mode 100644 index 000000000..1b3c7bf4d --- /dev/null +++ b/doc/source/hyperband.rst @@ -0,0 +1,99 @@ +HyperBand and Early Stopping +============================ + +Ray Tune includes distributed implementations of early stopping algorithms such as `Median Stopping Rule `__, `HyperBand `__, and an `asynchronous version of HyperBand `__. These algorithms are very resource efficient and can outperform Bayesian Optimization methods in `many cases `__. + +Asynchronous HyperBand +---------------------- + +The `asynchronous version of HyperBand `__ scheduler can be plugged in on top of an existing grid or random search. This can be done by setting the ``scheduler`` parameter of ``run_experiments``, e.g. + +.. code-block:: python + + run_experiments({...}, scheduler=AsyncHyperBandScheduler()) + +Compared to the original version of HyperBand, this implementation provides better parallelism and avoids straggler issues during eliminations. An example of this can be found in `async_hyperband_example.py `__. **We recommend using this over the standard HyperBand scheduler.** + +.. autoclass:: ray.tune.async_hyperband.AsyncHyperBandScheduler + +HyperBand +--------- + +.. note:: Note that the HyperBand scheduler requires your trainable to support checkpointing, which is described in `Ray Tune documentation `__. Checkpointing enables the scheduler to multiplex many concurrent trials onto a limited size cluster. + +Ray Tune also implements the `standard version of HyperBand `__. You can use it as such: + +.. code-block:: python + + run_experiments({...}, scheduler=HyperBandScheduler()) + +An example of this can be found in `hyperband_example.py `__. The progress of one such HyperBand run is shown below. + + +:: + + == Status == + Using HyperBand: num_stopped=0 total_brackets=5 + Round #0: + Bracket(n=5, r=100, completed=80%): {'PAUSED': 4, 'PENDING': 1} + Bracket(n=8, r=33, completed=23%): {'PAUSED': 4, 'PENDING': 4} + Bracket(n=15, r=11, completed=4%): {'RUNNING': 2, 'PAUSED': 2, 'PENDING': 11} + Bracket(n=34, r=3, completed=0%): {'RUNNING': 2, 'PENDING': 32} + Bracket(n=81, r=1, completed=0%): {'PENDING': 38} + Resources used: 4/4 CPUs, 0/0 GPUs + Result logdir: ~/ray_results/hyperband_test + PAUSED trials: + - my_class_0_height=99,width=43: PAUSED [pid=11664], 0 s, 100 ts, 97.1 rew + - my_class_11_height=85,width=81: PAUSED [pid=11771], 0 s, 33 ts, 32.8 rew + - my_class_12_height=0,width=52: PAUSED [pid=11785], 0 s, 33 ts, 0 rew + - my_class_19_height=44,width=88: PAUSED [pid=11811], 0 s, 11 ts, 5.47 rew + - my_class_27_height=96,width=84: PAUSED [pid=11840], 0 s, 11 ts, 12.5 rew + ... 5 more not shown + PENDING trials: + - my_class_10_height=12,width=25: PENDING + - my_class_13_height=90,width=45: PENDING + - my_class_14_height=69,width=45: PENDING + - my_class_15_height=41,width=11: PENDING + - my_class_16_height=57,width=69: PENDING + ... 81 more not shown + RUNNING trials: + - my_class_23_height=75,width=51: RUNNING [pid=11843], 0 s, 1 ts, 1.47 rew + - my_class_26_height=16,width=48: RUNNING + - my_class_31_height=40,width=10: RUNNING + - my_class_53_height=28,width=96: RUNNING + +.. autoclass:: ray.tune.hyperband.HyperBandScheduler + + +HyperBand Implementation Details +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Implementation details may deviate slightly from theory but are focused on increasing usability. Note: ``R``, ``s_max``, and ``eta`` are parameters of HyperBand given by the paper. See `this post `_ for context. + +1. Both ``s_max`` (representing the ``number of brackets - 1``) and ``eta``, representing the downsampling rate, are fixed. In many practical settings, ``R``, which represents some resource unit and often the number of training iterations, can be set reasonably large, like ``R >= 200``. For simplicity, assume ``eta = 3``. Varying ``R`` between ``R = 200`` and ``R = 1000`` creates a huge range of the number of trials needed to fill up all brackets. + +.. image:: images/hyperband_bracket.png + +On the other hand, holding ``R`` constant at ``R = 300`` and varying ``eta`` also leads to HyperBand configurations that are not very intuitive: + +.. image:: images/hyperband_eta.png + +The implementation takes the same configuration as the example given in the paper and exposes ``max_t``, which is not a parameter in the paper. + +2. The example in the `post `_ to calculate ``n_0`` is actually a little different than the algorithm given in the paper. In this implementation, we implement ``n_0`` according to the paper (which is `n` in the below example): + +.. image:: images/hyperband_allocation.png + + +3. There are also implementation specific details like how trials are placed into brackets which are not covered in the paper. This implementation places trials within brackets according to smaller bracket first - meaning that with low number of trials, there will be less early stopping. + +Median Stopping Rule +-------------------- + +The Median Stopping Rule implements the simple strategy of stopping a trial if its performance falls below the median of other trials at similar points in time. You can set the ``scheduler`` parameter as such: + +.. code-block:: python + + run_experiments({...}, scheduler=MedianStoppingRule()) + +.. autoclass:: ray.tune.median_stopping_rule.MedianStoppingRule diff --git a/doc/source/images/hyperband_allocation.png b/doc/source/images/hyperband_allocation.png new file mode 100644 index 000000000..7df0de766 Binary files /dev/null and b/doc/source/images/hyperband_allocation.png differ diff --git a/doc/source/images/hyperband_bracket.png b/doc/source/images/hyperband_bracket.png new file mode 100644 index 000000000..0aac889b1 Binary files /dev/null and b/doc/source/images/hyperband_bracket.png differ diff --git a/doc/source/images/hyperband_eta.png b/doc/source/images/hyperband_eta.png new file mode 100644 index 000000000..1353cc9c2 Binary files /dev/null and b/doc/source/images/hyperband_eta.png differ diff --git a/doc/source/index.rst b/doc/source/index.rst index b705a997e..ec5f30037 100644 --- a/doc/source/index.rst +++ b/doc/source/index.rst @@ -9,6 +9,31 @@ Ray *Ray is a flexible, high-performance distributed execution framework.* + +Ray is easy to install: ``pip install ray`` + +Example Use +----------- + ++------------------------------------------------+----------------------------------------------------+ +| **Basic Python** | **Distributed with Ray** | ++------------------------------------------------+----------------------------------------------------+ +|.. code-block:: python |.. code-block:: python | +| | | +| # Execute f serially. | # Execute f in parallel. | +| | | +| | @ray.remote | +| def f(): | def f(): | +| time.sleep(1) | time.sleep(1) | +| return 1 | return 1 | +| | | +| | | +| | ray.init() | +| results = [f() for i in range(4)] | results = ray.get([f.remote() for i in range(4)]) | ++------------------------------------------------+----------------------------------------------------+ + + + View the `codebase on GitHub`_. .. _`codebase on GitHub`: https://github.com/ray-project/ray @@ -21,27 +46,6 @@ Ray comes with libraries that accelerate deep learning and reinforcement learnin .. _`Ray Tune`: tune.html .. _`Ray RLlib`: rllib.html -Example Program ---------------- - -+------------------------------------------------+----------------------------------------------------+ -| **Basic Python** | **Distributed with Ray** | -+------------------------------------------------+----------------------------------------------------+ -|.. code:: python |.. code-block:: python | -| | | -| import time | import time | -| | import ray | -| | | -| | ray.init() | -| | | -| | @ray.remote | -| def f(): | def f(): | -| time.sleep(1) | time.sleep(1) | -| return 1 | return 1 | -| | | -| # Execute f serially. | # Execute f in parallel. | -| results = [f() for i in range(4)] | results = ray.get([f.remote() for i in range(4)]) | -+------------------------------------------------+----------------------------------------------------+ .. toctree:: :maxdepth: 1 @@ -60,16 +64,27 @@ Example Program api.rst actors.rst using-ray-with-gpus.rst + webui.rst + +.. toctree:: + :maxdepth: 1 + :caption: Ray Tune + tune.rst + hyperband.rst + pbt.rst + +.. toctree:: + :maxdepth: 1 + :caption: Ray RLlib + rllib.rst rllib-dev.rst - webui.rst .. toctree:: :maxdepth: 1 :caption: Examples - example-hyperopt.rst example-rl-pong.rst example-policy-gradient.rst example-parameter-server.rst diff --git a/doc/source/pbt.rst b/doc/source/pbt.rst new file mode 100644 index 000000000..217495645 --- /dev/null +++ b/doc/source/pbt.rst @@ -0,0 +1,33 @@ +Population Based Training +========================= + +Ray Tune includes a distributed implementation of `Population Based Training (PBT) `__. + + +PBT Scheduler +------------- + +Ray Tune's PBT scheduler can be plugged in on top of an existing grid or random search experiment. This can be enabled by setting the ``scheduler`` parameter of ``run_experiments``, e.g. + +.. code-block:: python + + run_experiments( + {...}, + scheduler=PopulationBasedTraining( + time_attr='time_total_s', + reward_attr='mean_accuracy', + perturbation_interval=600.0, + hyperparameter_mutations={ + "lr": [1e-3, 5e-4, 1e-4, 5e-5, 1e-5], + "alpha": lambda: random.uniform(0.0, 1.0), + ... + })) + +When the PBT scheduler is enabled, each trial variant is treated as a member of the population. Periodically, top-performing trials are checkpointed (this requires your Trainable to support `checkpointing `__). Low-performing trials clone the checkpoints of top performers and perturb the configurations in the hope of discovering an even better variation. + +You can run this `toy PBT example `__ to get an idea of how how PBT operates. When training in PBT mode, a single trial may see many different hyperparameters over its lifetime, which is recorded in its ``result.json`` file. The following figure generated by the example shows PBT discovering new hyperparams over the course of a single experiment: + +.. image:: pbt.png + +.. autoclass:: ray.tune.pbt.PopulationBasedTraining + diff --git a/doc/source/tune-api.svg b/doc/source/tune-api.svg new file mode 100644 index 000000000..9b0c0a3e6 --- /dev/null +++ b/doc/source/tune-api.svg @@ -0,0 +1,4 @@ + + + + diff --git a/doc/source/tune.rst b/doc/source/tune.rst index 0d50bd5d6..537c0aaf6 100644 --- a/doc/source/tune.rst +++ b/doc/source/tune.rst @@ -5,7 +5,7 @@ This document describes Ray Tune, a hyperparameter tuning framework for long-run It has the following features: -- Scalable implementations of search algorithms such as `Population Based Training (PBT) <#population-based-training>`__, `Median Stopping Rule `__, and `HyperBand `__. +- Scalable implementations of search algorithms such as `Population Based Training (PBT) `__, `Median Stopping Rule `__, and `HyperBand `__. - Integration with visualization tools such as `TensorBoard `__, `rllab's VisKit `__, and a `parallel coordinates visualization `__. @@ -19,6 +19,8 @@ You can find the code for Ray Tune `here on GitHub `__. Incremental results will be synced to local disk on the head node of the cluster and optionally uploaded to the specified ``upload_dir`` (e.g. S3 path). +Trial Schedulers +---------------- + +By default, Ray Tune schedules trials in serial order with the ``FIFOScheduler`` class. However, you can also specify a custom scheduling algorithm that can early stop trials, perturb parameters, or incorporate suggestions from an external service. Currently implemented trial schedulers include `Population Based Training (PBT) `__, `Median Stopping Rule `__, and `HyperBand `__. + Visualizing Results ------------------- @@ -135,72 +142,10 @@ By default, each random variable and grid search point is sampled once. To take For more information on variant generation, see `variant_generator.py `__. -Early Stopping --------------- - -To reduce costs, long-running trials can often be early stopped if their initial performance is not promising. Ray Tune allows early stopping algorithms to be plugged in on top of existing grid or random searches. This can be enabled by setting the ``scheduler`` parameter of ``run_experiments``, e.g. - -.. code-block:: python - - run_experiments({...}, scheduler=HyperBandScheduler()) - -An example of this can be found in `hyperband_example.py `__. The progress of one such HyperBand run is shown below. - -:: - - == Status == - Using HyperBand: num_stopped=0 total_brackets=5 - Round #0: - Bracket(n=5, r=100, completed=80%): {'PAUSED': 4, 'PENDING': 1} - Bracket(n=8, r=33, completed=23%): {'PAUSED': 4, 'PENDING': 4} - Bracket(n=15, r=11, completed=4%): {'RUNNING': 2, 'PAUSED': 2, 'PENDING': 11} - Bracket(n=34, r=3, completed=0%): {'RUNNING': 2, 'PENDING': 32} - Bracket(n=81, r=1, completed=0%): {'PENDING': 38} - Resources used: 4/4 CPUs, 0/0 GPUs - Result logdir: ~/ray_results/hyperband_test - PAUSED trials: - - my_class_0_height=99,width=43: PAUSED [pid=11664], 0 s, 100 ts, 97.1 rew - - my_class_11_height=85,width=81: PAUSED [pid=11771], 0 s, 33 ts, 32.8 rew - - my_class_12_height=0,width=52: PAUSED [pid=11785], 0 s, 33 ts, 0 rew - - my_class_19_height=44,width=88: PAUSED [pid=11811], 0 s, 11 ts, 5.47 rew - - my_class_27_height=96,width=84: PAUSED [pid=11840], 0 s, 11 ts, 12.5 rew - ... 5 more not shown - PENDING trials: - - my_class_10_height=12,width=25: PENDING - - my_class_13_height=90,width=45: PENDING - - my_class_14_height=69,width=45: PENDING - - my_class_15_height=41,width=11: PENDING - - my_class_16_height=57,width=69: PENDING - ... 81 more not shown - RUNNING trials: - - my_class_23_height=75,width=51: RUNNING [pid=11843], 0 s, 1 ts, 1.47 rew - - my_class_26_height=16,width=48: RUNNING - - my_class_31_height=40,width=10: RUNNING - - my_class_53_height=28,width=96: RUNNING - -Ray Tune also implements an `asynchronous version of HyperBand `__, providing better parallelism and avoids straggler issues during eliminations. An example of this can be found in `async_hyperband_example.py `__. We recommend using this over the vanilla HyperBand scheduler. - -.. note:: Some trial schedulers such as HyperBand and PBT require your Trainable to support checkpointing, which is described in the next section. Checkpointing enables the scheduler to multiplex many concurrent trials onto a limited size cluster. - -Currently we support the following early stopping algorithms, or you can write your own that implements the `TrialScheduler `__ interface. - -.. autoclass:: ray.tune.median_stopping_rule.MedianStoppingRule -.. autoclass:: ray.tune.hyperband.HyperBandScheduler -.. autoclass:: ray.tune.async_hyperband.AsyncHyperBandScheduler - -Population Based Training -------------------------- - -Ray Tune includes a distributed implementation of `Population Based Training (PBT) `__. PBT also requires your Trainable to support checkpointing. You can run this `toy PBT example `__ to get an idea of how how PBT operates. When training in PBT mode, the set of trial variations is treated as the population, so a single trial may see many different hyperparameters over its lifetime, which is recorded in the ``result.json`` file. The following figure generated by the example shows PBT discovering new hyperparams over the course of a single experiment: - -.. image:: pbt.png - -.. autoclass:: ray.tune.pbt.PopulationBasedTraining - Trial Checkpointing ------------------- -To enable checkpoint / resume, you must subclass ``Trainable`` and implement its ``_train``, ``_save``, and ``_restore`` abstract methods `(example) `__: Implementing this interface is required to support resource multiplexing in schedulers such as HyperBand and PBT. +To enable checkpointing, you must implement a Trainable class (Trainable functions are not checkpointable, since they never return control back to their caller). The easiest way to do this is to subclass the pre-defined ``Trainable`` class and implement its ``_train``, ``_save``, and ``_restore`` abstract methods `(example) `__: Implementing this interface is required to support resource multiplexing in schedulers such as HyperBand and PBT. For TensorFlow model training, this would look something like this `(full tensorflow example) `__: