diff --git a/doc/source/actors.rst b/doc/source/actors.rst index 989f5fb75..994497a93 100644 --- a/doc/source/actors.rst +++ b/doc/source/actors.rst @@ -1,3 +1,5 @@ +.. _actor-guide: + Using Actors ============ diff --git a/doc/source/tune-schedulers.rst b/doc/source/tune-schedulers.rst index 84916f0fb..fd91a632a 100644 --- a/doc/source/tune-schedulers.rst +++ b/doc/source/tune-schedulers.rst @@ -47,6 +47,8 @@ You can run this `toy PBT example ` in less than 10 lines of code. - * Supports any machine learning framework, including PyTorch, XGBoost, MXNet, and Keras. See :ref:`examples here `. + * Supports any machine learning framework, :ref:`including PyTorch, XGBoost, MXNet, and Keras`. * Natively `integrates with optimization libraries `_ such as `HyperOpt `_, `Bayesian Optimization `_, and `Facebook Ax `_. * Choose among `scalable algorithms `_ such as `Population Based Training (PBT)`_, `Vizier's Median Stopping Rule`_, `HyperBand/ASHA`_. * Visualize results with `TensorBoard `__. @@ -19,14 +19,7 @@ Tune is a Python library for experiment execution and hyperparameter tuning at a .. _`Vizier's Median Stopping Rule`: tune-schedulers.html#median-stopping-rule .. _`HyperBand/ASHA`: tune-schedulers.html#asynchronous-hyperband -.. important:: Join our `community slack `_ to discuss Ray! - -For more information, check out: - - * :ref:`Tune in 60 Seconds `: A quick overview of Tune and its key concepts. - * :ref:`Tune Guides and Examples `: Examples, Tutorials, and Guides for how to use Tune. - * `Code `__: GitHub repository for Tune. - +**Want to get started?** Head over to the :ref:`60 second Tune tutorial `. Quick Start ----------- @@ -57,14 +50,16 @@ If using TF2 and TensorBoard, Tune will also automatically generate TensorBoard :scale: 20% :align: center -Take a look at the :ref:`Distributed Experiments ` documentation for: - 1. Setting up distributed experiments on your local cluster - 2. Using AWS and GCP - 3. 
Spot instance usage/pre-emptible instances, and more. +.. tip:: Join the `Ray community slack `_ to discuss Ray Tune (and other Ray libraries)! -Talks and Blogs ---------------- +Guides/Materials +---------------- + +Here are some reference materials for Tune: + + * :ref:`Tune Tutorials, Guides, and Examples ` + * `Code `__: GitHub repository for Tune Below are some blog posts and talks about Tune: diff --git a/doc/source/tune/_tutorials/overview.rst b/doc/source/tune/_tutorials/overview.rst index a05eeaf54..f0fe3f099 100644 --- a/doc/source/tune/_tutorials/overview.rst +++ b/doc/source/tune/_tutorials/overview.rst @@ -16,9 +16,9 @@ Take a look at any of the below tutorials to get started with Tune.
.. customgalleryitem:: - :tooltip: A gentle 60 second tour of core Tune concepts. + :tooltip: Tune concepts in 60 seconds. :figure: /images/tune-workflow.png - :description: :doc:`A gentle 60 second tour of Tune ` + :description: :doc:`Tune concepts in 60 seconds ` .. customgalleryitem:: :tooltip: A simple Tune walkthrough. @@ -124,6 +124,7 @@ Tune Examples If any example is broken, or if you'd like to add an example to this page, feel free to raise an issue on our Github repository. +.. _tune-general-examples: General Examples ~~~~~~~~~~~~~~~~ diff --git a/doc/source/tune/_tutorials/tune-60-seconds.rst b/doc/source/tune/_tutorials/tune-60-seconds.rst index 28493bab1..2ade588de 100644 --- a/doc/source/tune/_tutorials/tune-60-seconds.rst +++ b/doc/source/tune/_tutorials/tune-60-seconds.rst @@ -9,156 +9,167 @@ Let's quickly walk through the key concepts you need to know to use Tune. In thi :local: :depth: 1 -Tune takes a user-defined Python function or class and evaluates it on a set of hyperparameter configurations. Each hyperparameter configuration evaluation is called a *trial*, and Tune runs multiple trials in parallel, leveraging Search Algorithms and Trial Schedulers to optimize your hyperparameters. - .. image:: /images/tune-workflow.png Trainables ---------- -To allow Tune to optimize your model, Tune will need to control your training process. This is done via the Trainable API. Each *trial* corresponds to one instance of a Trainable; Tune will create multiple instances of the Trainable. +Tune will optimize your training process using the :ref:`Trainable API `. To start, let's try to maximize this objective function: -The Trainable API is where you specify how to set up your model and track intermediate training progress. There are two types of Trainables - a **function-based API** is for fast prototyping, and **class-based** API that unlocks many Tune features such as checkpointing, pausing. +.. 
code-block:: python + + def objective(x, a, b): + return a * (x ** 0.5) + b + +Here's an example of specifying the objective function using :ref:`the function-based Trainable API `: + +.. code-block:: python + + def trainable(config): + # config (dict): A dict of hyperparameters. + + for x in range(20): + score = objective(x, config["a"], config["b"]) + + tune.track.log(score=score) # This sends the score to Tune. + +Now, there are two Trainable APIs - one being the :ref:`function-based API ` that we demonstrated above. + +The other is a :ref:`class-based API ` that enables :ref:`checkpointing and pausing `. Here's an example of specifying the objective function using the :ref:`class-based API `: .. code-block:: python from ray import tune class Trainable(tune.Trainable): - """Tries to iteratively find the password.""" - def _setup(self, config): - self.iter = 0 - self.password = 1024 + # config (dict): A dict of hyperparameters + self.x = 0 + self.a = config["a"] + self.b = config["b"] - def _train(self): - """Execute one step of 'training'. This function will be called iteratively""" - self.iter += 1 - return { - "accuracy": abs(self.iter - self.password), - "training_iteration": self.iter # Tune will automatically provide this. - } - - def _stop(self): - # perform any cleanup necessary. - pass - -Function API example: - -.. code-block:: python - - def trainable(config): - """ - Args: - config (dict): Parameters provided from the search algorithm - or variant generation. - """ - - while True: - # ... - tune.track.log(**kwargs) + def _train(self): # This is called iteratively. + score = objective(self.x, self.a, self.b) + self.x += 1 + return {"score": score} .. tip:: Do not use ``tune.track.log`` within a ``Trainable`` class. -See the documentation: :ref:`trainable-docs`. +See the documentation: :ref:`trainable-docs` and :ref:`examples `. tune.run -------- -Use ``tune.run`` execute hyperparameter tuning using the core Ray APIs. 
This function manages your distributed experiment and provides many features such as logging, checkpointing, and early stopping. +Use ``tune.run`` to execute hyperparameter tuning using the core Ray APIs. This function manages your experiment and provides many features such as :ref:`logging `, :ref:`checkpointing `, and :ref:`early stopping `. .. code-block:: python # Pass in a Trainable class or function to tune.run. tune.run(trainable) - # Run 10 trials (each trial is one instance of a Trainable). Tune runs in - # parallel and automatically determines concurrency. - tune.run(trainable, num_samples=10) - - # Run 1 trial, stop when trial has reached 10 iterations OR a mean accuracy of 0.98. - tune.run(my_trainable, stop={"training_iteration": 10, "mean_accuracy": 0.98}) - - # Run 1 trial, search over hyperparameters, stop after 10 iterations. - hyperparameters = {"lr": tune.uniform(0, 1), "momentum": tune.uniform(0, 1)} - tune.run(my_trainable, config=hyperparameters, stop={"training_iteration": 10}) - -This function will report status on the command line until all Trials stop: +This function will report status on the command line until all trials stop (each trial is one instance of a :ref:`Trainable `): .. code-block:: bash == Status == Memory usage on this node: 11.4/16.0 GiB Using FIFO scheduling algorithm. 
- Resources requested: 4/12 CPUs, 0/0 GPUs, 0.0/3.17 GiB heap, 0.0/1.07 GiB objects + Resources requested: 1/12 CPUs, 0/0 GPUs, 0.0/3.17 GiB heap, 0.0/1.07 GiB objects Result logdir: /Users/foo/ray_results/myexp - Number of trials: 4 (4 RUNNING) + Number of trials: 1 (1 RUNNING) +----------------------+----------+---------------------+-----------+--------+--------+----------------+-------+ - | Trial name | status | loc | param1 | param2 | acc | total time (s) | iter | + | Trial name | status | loc | a | b | score | total time (s) | iter | |----------------------+----------+---------------------+-----------+--------+--------+----------------+-------| | MyTrainable_a826033a | RUNNING | 10.234.98.164:31115 | 0.303706 | 0.0761 | 0.1289 | 7.54952 | 15 | - | MyTrainable_a8263fc6 | RUNNING | 10.234.98.164:31117 | 0.929276 | 0.158 | 0.4865 | 7.0501 | 14 | - | MyTrainable_a8267914 | RUNNING | 10.234.98.164:31111 | 0.068426 | 0.0319 | 0.9585 | 7.0477 | 14 | - | MyTrainable_a826b7bc | RUNNING | 10.234.98.164:31112 | 0.729127 | 0.0748 | 0.1797 | 7.05715 | 14 | +----------------------+----------+---------------------+-----------+--------+--------+----------------+-------+ -See the documentation: :ref:`tune-run-ref`. + +You can also easily run 10 trials. Tune automatically :ref:`determines how many trials will run in parallel `. + +.. code-block:: python + + tune.run(trainable, num_samples=10) + +Finally, you can randomly sample or grid search hyperparameters via Tune's :ref:`search space API `: + +.. code-block:: python + + space = {"x": tune.uniform(0, 1)} + tune.run(my_trainable, config=space, num_samples=10) + +See more documentation: :ref:`tune-run-ref`. Search Algorithms ----------------- -To optimize the hyperparameters of your training process, you will want to explore a “search space”. - -Search Algorithms are Tune modules that help explore a provided search space. It will use previous results from evaluating different hyperparameters to suggest better hyperparameters. 
Tune has SearchAlgorithms that integrate with many popular **optimization** libraries, such as `Nevergrad `_ and `Hyperopt `_. +To optimize the hyperparameters of your training process, you will want to use a :ref:`Search Algorithm ` which will help suggest better hyperparameters. .. code-block:: python - # https://github.com/hyperopt/hyperopt/ - # pip install hyperopt + # Be sure to first run `pip install hyperopt` + from hyperopt import hp from ray.tune.suggest.hyperopt import HyperOptSearch # Create a HyperOpt search space - space = {"momentum": hp.uniform("momentum", 0, 20), "lr": hp.uniform("lr", 0, 1)} - # Pass the search space into Tune's HyperOpt wrapper and maximize accuracy - hyperopt = HyperOptSearch(space, metric="accuracy", mode="max") + space = { + "a": hp.uniform("a", 0, 1), + "b": hp.uniform("b", 0, 20) - # Execute 20 trials using HyperOpt, stop after 20 iterations - max_iters = {"training_iteration": 20} - tune.run(trainable, search_alg=hyperopt, num_samples=20, stop=max_iters) + # Note: Arbitrary HyperOpt search spaces should be supported! + # "foo": hp.lognormal("foo", 0, 1) + } + + # Specify the search space and maximize score + hyperopt = HyperOptSearch(space, metric="score", mode="max") + + # Execute 20 trials using HyperOpt and stop after 20 iterations + tune.run( + trainable, + search_alg=hyperopt, + num_samples=20, + stop={"training_iteration": 20} + ) + +Tune has SearchAlgorithms that integrate with many popular **optimization** libraries, such as :ref:`Nevergrad ` and :ref:`Hyperopt `. See the documentation: :ref:`searchalg-ref`. Trial Schedulers ---------------- -In addition, you can make your training process more efficient by stopping, pausing, or changing the hyperparameters of running trials. +In addition, you can make your training process more efficient by using a :ref:`Trial Scheduler `. -Trial Schedulers are Tune modules that adjust and change distributed training runs during execution. 
These modules can stop/pause/tweak the hyperparameters of running trials, making your hyperparameter tuning process much faster. Population-based training and HyperBand are examples of popular optimization algorithms implemented as Trial Schedulers. +Trial Schedulers can stop/pause/tweak the hyperparameters of running trials, making your hyperparameter tuning process much faster. .. code-block:: python from ray.tune.schedulers import HyperBandScheduler - # Create HyperBand scheduler and maximize accuracy - hyperband = HyperBandScheduler(metric="accuracy", mode="max") + # Create HyperBand scheduler and maximize score + hyperband = HyperBandScheduler(metric="score", mode="max") # Execute 20 trials using HyperBand using a search space - configs = {"lr": tune.uniform(0, 1), "momentum": tune.uniform(0, 1)} - tune.run(MyTrainableClass, num_samples=20, config=configs, scheduler=hyperband) + configs = {"a": tune.uniform(0, 1), "b": tune.uniform(0, 1)} -Unlike **Search Algorithms**, Trial Schedulers do not select which hyperparameter configurations to evaluate. However, you can use them together. + tune.run( + MyTrainableClass, + config=configs, + num_samples=20, + scheduler=hyperband + ) + +:ref:`Population-based Training ` and :ref:`HyperBand ` are examples of popular optimization algorithms implemented as Trial Schedulers. + +Unlike **Search Algorithms**, :ref:`Trial Schedulers ` do not select which hyperparameter configurations to evaluate. However, you can use them together. See the documentation: :ref:`schedulers-ref`. - Analysis -------- -After running a hyperparameter tuning job, you will want to analyze your results to determine what specific parameters are important and which hyperparameter values are the best. - -``tune.run`` returns an :ref:`Analysis ` object which has methods you can use for analyzing your results. This object can also retrieve all training runs as dataframes, allowing you to do ad-hoc data analysis over your results. 
+``tune.run`` returns an :ref:`Analysis ` object which has methods you can use for analyzing your training. .. code-block:: python @@ -167,13 +178,16 @@ After running a hyperparameter tuning job, you will want to analyze your results # Get the best hyperparameters best_hyperparameters = analysis.get_best_config() - # Get a dataframe for the max accuracy seen for each trial - df = analysis.dataframe(metric="mean_accuracy", mode="max") +This object can also retrieve all training runs as dataframes, allowing you to do ad-hoc data analysis over your results. + +.. code-block:: python + + # Get a dataframe for the max score seen for each trial + df = analysis.dataframe(metric="score", mode="max") What's Next? ~~~~~~~~~~~~ - Now that you have a working understanding of Tune, check out: * :ref:`Tune Guides and Examples `: Examples and templates for using Tune with your preferred machine learning library. diff --git a/doc/source/tune/_tutorials/tune-tutorial.rst b/doc/source/tune/_tutorials/tune-tutorial.rst index 01e3b695d..c1b84d206 100644 --- a/doc/source/tune/_tutorials/tune-tutorial.rst +++ b/doc/source/tune/_tutorials/tune-tutorial.rst @@ -5,7 +5,7 @@ A Basic Tune Tutorial .. image:: /images/tune-api.svg -This tutorial will walk you through the following process to setup a Tune experiment. Specifically, we'll leverage ASHA and Bayesian Optimization (via HyperOpt) via the following steps: +This tutorial will walk you through the following process to set up a Tune experiment using PyTorch. Specifically, we'll leverage ASHA and Bayesian Optimization (via HyperOpt) via the following steps: 1. Integrating Tune into your workflow 2. 
Specifying a TrialScheduler diff --git a/doc/source/tune/_tutorials/tune-usage.rst b/doc/source/tune/_tutorials/tune-usage.rst index bbb43d0c0..3366c1a63 100644 --- a/doc/source/tune/_tutorials/tune-usage.rst +++ b/doc/source/tune/_tutorials/tune-usage.rst @@ -9,6 +9,8 @@ This document provides an overview of the core concepts as well as some of the c .. contents:: :local: +.. _tune-parallelism: + Parallelism / GPUs ------------------ @@ -60,6 +62,8 @@ To attach to a Ray cluster, simply run ``ray.init`` before ``tune.run``: ray.init(address=) tune.run(trainable, num_samples=100, resources_per_trial={"cpu": 2, "gpu": 1}) +.. _tune-default-search-space: + Search Space (Grid/Random) -------------------------- @@ -219,6 +223,8 @@ You often will want to compute a large object (e.g., training data, model weight tune.run(f) +.. _tune-stopping: + Stopping Trials --------------- @@ -271,6 +277,8 @@ Finally, you can implement the ``Stopper`` abstract class for stopping entire ex Note that in the above example the currently running trials will not stop immediately but will do so once their current iterations are complete. See the :ref:`tune-stop-ref` documentation. +.. _tune-logging: + Logging/Tensorboard ------------------- diff --git a/doc/source/tune/api_docs/trainable.rst b/doc/source/tune/api_docs/trainable.rst index 0a23165c1..d41a8756d 100644 --- a/doc/source/tune/api_docs/trainable.rst +++ b/doc/source/tune/api_docs/trainable.rst @@ -7,30 +7,47 @@ Training can be done with either a **Class API** (``tune.Trainable``) or **funct You can use the **function-based API** for fast prototyping. On the other hand, the ``tune.Trainable`` interface supports checkpoint/restore functionality and provides more control for advanced algorithms. +For the sake of example, let's maximize this objective function: + +.. code-block:: python + + def objective(x, a, b): + return a * (x ** 0.5) + b + +.. _tune-function-api: + Function-based API ------------------ .. 
code-block:: python def trainable(config): - """ - Args: - config (dict): Parameters provided from the search algorithm - or variant generation. - """ + # config (dict): A dict of hyperparameters. - while True: - # ... - tune.track.log(**kwargs) + for x in range(20): + score = objective(x, config["a"], config["b"]) + + tune.track.log(score=score) # This sends the score to Tune. + + analysis = tune.run( + trainable, + config={ + "a": 2, + "b": 4 + }) + + print("best config: ", analysis.get_best_config(metric="score", mode="max")) .. tip:: Do not use ``tune.track.log`` within a ``Trainable`` class. Tune will run this function on a separate thread in a Ray actor process. Note that this API is not checkpointable, since the thread will never return control back to its caller. -.. note:: If you have a lambda function that you want to train, you will need to first register the function: ``tune.register_trainable("lambda_id", lambda x: ...)``. You can then use ``lambda_id`` in place of ``my_trainable``. +.. note:: If you want to pass in a Python lambda, you will need to first register the function: ``tune.register_trainable("lambda_id", lambda x: ...)``. You can then use ``lambda_id`` in place of ``my_trainable``. -Trainable API -------------- +.. _tune-class-api: + +Trainable Class API +------------------- .. caution:: Do not use ``tune.track.log`` within a ``Trainable`` class. @@ -40,44 +57,40 @@ The Trainable **class API** will require users to subclass ``ray.tune.Trainable` from ray import tune - class Guesser(tune.Trainable): - """Randomly picks a number from [1, 10000) to find the password.""" - + class Trainable(tune.Trainable): def _setup(self, config): - self.guess = config["guess"] - self.iter = 0 - self.password = 1024 - - def _train(self): - """Execute one step of 'training'. 
This function will be called iteratively""" - self.iter += 1 - self.guess += 1 - return { - "accuracy": abs(self.guess - self.password), - "training_iteration": self.iter # Tune will automatically provide this. - } + # config (dict): A dict of hyperparameters + self.x = 0 + self.a = config["a"] + self.b = config["b"] + def _train(self): # This is called iteratively. + score = objective(self.x, self.a, self.b) + self.x += 1 + return {"score": score} analysis = tune.run( - Guesser, - stop={"training_iteration": 10}, - num_samples=10, + Trainable, + stop={"training_iteration": 20}, config={ - "guess": tune.randint(1, 10000) + "a": 2, + "b": 4 }) - print('best config: ', analysis.get_best_config(metric="diff", mode="min")) + print('best config: ', analysis.get_best_config(metric="score", mode="max")) -As a subclass of ``tune.Trainable``, Tune will create a ``Guesser`` object on a separate process (using the Ray Actor API). +As a subclass of ``tune.Trainable``, Tune will create a ``Trainable`` object on a separate process (using the :ref:`Ray Actor API `). 1. ``_setup`` function is invoked once training starts. - 2. ``_train`` is invoked **multiple times**. Each time, the Guesser object executes one logical iteration of training in the tuning process, which may include one or more iterations of actual training. + 2. ``_train`` is invoked **multiple times**. Each time, the Trainable object executes one logical iteration of training in the tuning process, which may include one or more iterations of actual training. 3. ``_stop`` is invoked when training is finished. .. tip:: As a rule of thumb, the execution time of ``_train`` should be large enough to avoid overheads (i.e. more than a few seconds), but short enough to report progress periodically (i.e. at most a few minutes). In this example, we only implemented the ``_setup`` and ``_train`` methods for simplification. Next, we'll implement ``_save`` and ``_restore`` for checkpoint and fault tolerance. +.. 
_tune-trainable-save-restore: + Save and Restore ~~~~~~~~~~~~~~~~ diff --git a/python/ray/tune/tune.py b/python/ray/tune/tune.py index a0dd86923..c82565e4d 100644 --- a/python/ray/tune/tune.py +++ b/python/ray/tune/tune.py @@ -219,18 +219,19 @@ def run(run_or_experiment, TuneError: Any trials failed and `raise_on_failed_trial` is True. Examples: - >>> tune.run(mytrainable, scheduler=PopulationBasedTraining()) - >>> tune.run(mytrainable, num_samples=5, reuse_actors=True) + .. code-block:: python - >>> tune.run( - >>> "PG", - >>> num_samples=5, - >>> config={ - >>> "env": "CartPole-v0", - >>> "lr": tune.sample_from(lambda _: np.random.rand()) - >>> } - >>> ) + # Run 10 trials (each trial is one instance of a Trainable). Tune runs + # in parallel and automatically determines concurrency. + tune.run(trainable, num_samples=10) + + # Run 1 trial, stop when trial has reached 10 iterations + tune.run(my_trainable, stop={"training_iteration": 10}) + + # Run 1 trial, search over hyperparameters, stop after 10 iterations. + space = {"lr": tune.uniform(0, 1), "momentum": tune.uniform(0, 1)} + tune.run(my_trainable, config=space, stop={"training_iteration": 10}) """ trial_executor = trial_executor or RayTrialExecutor( queue_trials=queue_trials,