diff --git a/doc/source/images/tune-xgboost-depth.svg b/doc/source/images/tune-xgboost-depth.svg
new file mode 100644
index 000000000..5e96a4a8f
--- /dev/null
+++ b/doc/source/images/tune-xgboost-depth.svg
@@ -0,0 +1,242 @@
+
+
diff --git a/doc/source/images/tune-xgboost-ensemble.svg b/doc/source/images/tune-xgboost-ensemble.svg
new file mode 100644
index 000000000..53a029c77
--- /dev/null
+++ b/doc/source/images/tune-xgboost-ensemble.svg
@@ -0,0 +1,680 @@
+
+
diff --git a/doc/source/images/tune-xgboost-weight.svg b/doc/source/images/tune-xgboost-weight.svg
new file mode 100644
index 000000000..be342aba7
--- /dev/null
+++ b/doc/source/images/tune-xgboost-weight.svg
@@ -0,0 +1,241 @@
+
+
diff --git a/doc/source/images/xgboost_logo.png b/doc/source/images/xgboost_logo.png
new file mode 100644
index 000000000..70014b9fe
Binary files /dev/null and b/doc/source/images/xgboost_logo.png differ
diff --git a/doc/source/tune/_tutorials/overview.rst b/doc/source/tune/_tutorials/overview.rst
index a66ce8020..22a9a0a6c 100644
--- a/doc/source/tune/_tutorials/overview.rst
+++ b/doc/source/tune/_tutorials/overview.rst
@@ -25,6 +25,11 @@ Take a look at any of the below tutorials to get started with Tune.
:figure: /images/tune.png
:description: :doc:`A walkthrough to setup your first Tune experiment `
+.. customgalleryitem::
+ :tooltip: Tuning XGBoost parameters.
+ :figure: /images/xgboost_logo.png
+ :description: :doc:`A guide to tuning XGBoost parameters with Tune `
+
.. raw:: html
@@ -34,6 +39,7 @@ Take a look at any of the below tutorials to get started with Tune.
tune-60-seconds.rst
tune-tutorial.rst
+ tune-xgboost.rst
User Guides
@@ -161,6 +167,7 @@ PyTorch Examples
XGBoost Example
~~~~~~~~~~~~~~~
+- :ref:`XGBoost tutorial `: A guide to tuning XGBoost parameters with Tune.
- `xgboost_example `__: Trains a basic XGBoost model with Tune with the function-based API and an XGBoost callback.
diff --git a/doc/source/tune/_tutorials/tune-xgboost.rst b/doc/source/tune/_tutorials/tune-xgboost.rst
new file mode 100644
index 000000000..6444eadf1
--- /dev/null
+++ b/doc/source/tune/_tutorials/tune-xgboost.rst
@@ -0,0 +1,518 @@
+.. _tune-xgboost:
+
+Tuning XGBoost parameters
+=========================
+
+XGBoost is currently one of the most popular machine learning algorithms. It performs
+very well on a large selection of tasks, and was the key to success in many Kaggle
+competitions.
+
+.. image:: /images/xgboost_logo.png
+ :width: 200px
+ :alt: XGBoost
+ :align: center
+ :target: https://xgboost.readthedocs.io/en/latest/
+
+
+This tutorial will give you a quick introduction to XGBoost, show you how
+to train an XGBoost model, and then guide you on how to optimize XGBoost
+parameters using Tune to get the best performance. We tackle the following topics:
+
+.. contents:: Table of contents
+ :depth: 2
+
+.. note::
+
+   To run this tutorial, you will need to install the following:
+
+   .. code-block:: bash
+
+       $ pip install "ray[tune]" scikit-learn xgboost
+
+What is XGBoost
+---------------
+
+XGBoost is an acronym for e\ **X**\ treme **G**\ radient **Boost**\ ing. Internally,
+XGBoost uses `decision trees `_. Instead
+of training just one large decision tree, XGBoost and other related algorithms train
+many small decision trees. The intuition behind this is that even though single
+decision trees can be inaccurate and suffer from high variance,
+combining the output of a large number of these weak learners can actually lead to
+a strong learner, resulting in better predictions and less variance.
+
+.. figure:: /images/tune-xgboost-ensemble.svg
+ :alt: Single vs. ensemble learning
+
+ A single decision tree (left) might be able to get to an accuracy of 70%
+ for a binary classification task. By combining the output of several small
+ decision trees, an ensemble learner (right) might end up with a higher accuracy
+ of 90%.
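
To get a feeling for why combining weak learners can help, here is a back-of-the-envelope sketch. It assumes, unrealistically, that the trees make *independent* errors and that the ensemble takes a simple majority vote. XGBoost does neither: it sums the outputs of trees that are trained one after another, so its trees are far from independent. The function name ``majority_vote_accuracy`` is hypothetical.

```python
from math import comb


def majority_vote_accuracy(p: float, n: int) -> float:
    """Probability that a majority vote of n classifiers is correct.

    Assumes every classifier is correct with probability p, independently
    of the others. n must be odd so that there are no ties.
    """
    if n % 2 == 0:
        raise ValueError("n must be odd to avoid ties")
    majority = n // 2 + 1  # smallest number of correct votes that wins
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(majority, n + 1))


for n in (1, 5, 25):
    # A single 70%-accurate tree vs. ensembles of 5 and 25 such trees
    print(n, round(majority_vote_accuracy(0.7, n), 3))
```

Under these assumptions, the ensemble's accuracy grows with the number of trees as long as each tree is better than chance (p > 0.5), and shrinks if each tree is worse than chance.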
+
+Boosting algorithms start with a single small decision tree and evaluate how well
+it predicts the given examples. When building the next tree, those samples that have
+been misclassified before have a higher chance of being used to generate the tree.
+This is useful because it avoids overfitting to samples that can be easily classified
+and instead tries to come up with models that are able to classify hard examples, too.
+Please see `here for a more thorough introduction to bagging and boosting algorithms
+`_.
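
The reweighting idea described above is easiest to see in AdaBoost, an older boosting algorithm. The sketch below performs one AdaBoost-style weight update: misclassified samples get a larger weight, so the next tree focuses on them. It only illustrates the general boosting idea; XGBoost does not reweight samples this way, but fits each new tree to the gradients of the loss. The function name ``adaboost_reweight`` is hypothetical.

```python
import math


def adaboost_reweight(weights, correct):
    """One AdaBoost weight update for binary classification.

    weights: current sample weights (summing to 1)
    correct: booleans, True where the latest weak learner was right
    """
    # Weighted error rate of the latest weak learner (must lie in (0, 1))
    error = sum(w for w, ok in zip(weights, correct) if not ok)
    # Vote weight of the learner: larger when its error is small
    alpha = 0.5 * math.log((1 - error) / error)
    # Shrink weights of correct samples, grow weights of misclassified ones
    updated = [w * math.exp(-alpha if ok else alpha) for w, ok in zip(weights, correct)]
    total = sum(updated)
    return [w / total for w in updated]  # renormalize to sum to 1


# Four equally weighted samples; the last one was misclassified
new_weights = adaboost_reweight([0.25] * 4, [True, True, True, False])
print([round(w, 3) for w in new_weights])
```

After the update, the single misclassified sample carries half of the total weight, so the next weak learner is pushed to get it right.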
+
+There are many boosting algorithms. At their core, they are all very similar. XGBoost
+uses second-order derivatives of the loss function to find splits that maximize the
+*gain* (the reduction in *loss* achieved by a split). In practice, there is rarely a
+drawback to using XGBoost over other boosting algorithms, and it often shows the best performance.
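
The split gain can be written down explicitly. The sketch below follows the gain formula from XGBoost's "Introduction to Boosted Trees" documentation, using the logistic loss as an example: for a sample with label y and current predicted probability p, the first derivative (gradient) of the loss with respect to the raw score is g = p - y, and the second derivative (Hessian) is h = p(1 - p). ``reg_lambda`` and ``gamma`` play the role of XGBoost's L2 regularization (``lambda``) and minimum split loss (``gamma``) parameters; the function names are hypothetical.

```python
def logistic_grad_hess(labels, probs):
    """Gradients and Hessians of the logistic loss w.r.t. the raw score."""
    grads = [p - y for y, p in zip(labels, probs)]
    hessians = [p * (1 - p) for p in probs]
    return grads, hessians


def split_gain(grads, hessians, left_idx, right_idx, reg_lambda=1.0, gamma=0.0):
    """Loss reduction achieved by splitting a node into left and right children."""
    def score(idx):
        g = sum(grads[i] for i in idx)
        h = sum(hessians[i] for i in idx)
        return g * g / (h + reg_lambda)

    parent = left_idx + right_idx  # the unsplit node contains both children
    return 0.5 * (score(left_idx) + score(right_idx) - score(parent)) - gamma


labels = [0, 0, 1, 1]
probs = [0.5, 0.5, 0.5, 0.5]  # e.g. the prediction before any tree is added
g, h = logistic_grad_hess(labels, probs)
print(split_gain(g, h, [0, 1], [2, 3]))  # split that separates the classes
print(split_gain(g, h, [0, 2], [1, 3]))  # split that mixes the classes
```

A split that separates the two classes has a positive gain, while a split that mixes them gains nothing. A positive ``gamma`` makes splits with small gains unattractive.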
+
+Training a simple XGBoost classifier
+------------------------------------
+
+Let's first see how a simple XGBoost classifier can be trained. We'll use the
+``breast_cancer`` dataset included in the ``sklearn`` dataset collection. This is
+a binary classification dataset. Given 30 different input features, our task is to
+learn to distinguish malignant from benign breast tumors.
+
+Here is the full code to train a simple XGBoost model:
+
+.. code-block:: python
+
+    import numpy as np
+    import sklearn.datasets
+    import sklearn.metrics
+    from sklearn.model_selection import train_test_split
+    import xgboost as xgb
+
+
+    def train_breast_cancer(config):
+        # Load dataset
+        data, labels = sklearn.datasets.load_breast_cancer(return_X_y=True)
+        # Split into train and test set
+        train_x, test_x, train_y, test_y = train_test_split(
+            data, labels, test_size=0.25)
+        # Build input matrices for XGBoost
+        train_set = xgb.DMatrix(train_x, label=train_y)
+        test_set = xgb.DMatrix(test_x, label=test_y)
+        # Train the classifier
+        bst = xgb.train(config, train_set, evals=[(test_set, "eval")], verbose_eval=False)
+        # Predict labels for the test set
+        preds = bst.predict(test_set)
+        pred_labels = np.rint(preds)
+        # Return prediction accuracy
+        accuracy = sklearn.metrics.accuracy_score(test_y, pred_labels)
+        return accuracy
+
+
+    if __name__ == "__main__":
+        accuracy = train_breast_cancer({
+            "objective": "binary:logistic"
+        })
+        print("Accuracy: {:.2f}".format(accuracy))
+
+As you can see, the code is quite simple. First, the dataset is loaded and split
+into a ``test`` and ``train`` set. The XGBoost model is trained with ``xgb.train()``
+and the predictions for the test set are obtained with ``bst.predict()``. Lastly, we
+return the accuracy of our predictions. Even in this simple example, most runs result
+in a good accuracy of over ``0.90``.
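
With the ``binary:logistic`` objective, ``bst.predict()`` returns probabilities between 0 and 1, and ``np.rint`` rounds them to the nearest integer, i.e. to class 0 or 1. The following dependency-free sketch mimics these two steps with Python's built-in ``round`` (which, like ``np.rint``, rounds exact ties such as 0.5 to the nearest even number) and computes the accuracy by hand. The helper name and the probabilities and labels are made-up toy values.

```python
def accuracy_from_probabilities(probs, labels):
    """Round predicted probabilities to 0/1 labels and compute the accuracy."""
    pred_labels = [round(p) for p in probs]  # 0.5 rounds to 0 (ties go to even)
    hits = sum(int(pred == label) for pred, label in zip(pred_labels, labels))
    return hits / len(labels)


# Toy example: four predicted probabilities of the positive class
print(accuracy_from_probabilities([0.1, 0.8, 0.5, 0.6], [0, 1, 0, 0]))
```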
+
+Maybe you have noticed the ``config`` parameter we pass to the XGBoost algorithm. This
+is a ``dict`` in which you can specify parameters for the XGBoost algorithm. In this
+simple example, the only parameter we passed is the ``objective`` parameter. The value
+``binary:logistic`` tells XGBoost that we aim to train a logistic regression model for
+a binary classification task. You can find an overview of all valid objectives
+`here in the XGBoost documentation `_.
+
+XGBoost Hyperparameters
+-----------------------
+Even with the default settings, XGBoost was able to get to a good accuracy on the
+breast cancer dataset. However, as in many machine learning algorithms, there are
+many knobs to tune which might lead to even better performance. Let's explore some of
+them below.
+
+Maximum tree depth
+..................
+Remember that XGBoost internally uses many decision tree models to come up with
+predictions. When training a decision tree, we need to tell the algorithm how
+large the tree may get. The parameter for this is the maximum tree *depth* (``max_depth``).
+
+.. figure:: /images/tune-xgboost-depth.svg
+ :alt: Decision tree depth
+ :align: center
+
+ In this image, the left tree has a depth of 2, and the right tree a depth of 3.
+ Note that with each level, :math:`2^{(d-1)}` splits are added, where *d* is the depth
+ of the tree.
+
+Tree depth is a property that concerns the model complexity. If you only allow shallow
+trees, the models are likely not very precise - they underfit the data. If you allow
+very deep trees, the single models are likely to overfit to the data. In practice,
+a number between ``2`` and ``6`` is often a good starting point for this parameter.
+
+XGBoost's default value is ``6``.
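
The caption's arithmetic extends to a whole tree: counting the root split as level 1, level d adds at most 2^(d-1) splits, so a tree of depth d has at most 2^d - 1 splits and 2^d leaves. The short sketch below (hypothetical helper names) tabulates this and shows how quickly the capacity of a tree grows with its depth.

```python
def splits_at_level(level: int) -> int:
    """Splits added at a given level of a full binary tree (level 1 = root split)."""
    return 2 ** (level - 1)


def max_splits(depth: int) -> int:
    """Upper bound on the number of splits in a tree of the given depth."""
    return sum(splits_at_level(level) for level in range(1, depth + 1))


def max_leaves(depth: int) -> int:
    """Each split turns one leaf into two, so a tree has one more leaf than splits."""
    return max_splits(depth) + 1


for depth in (2, 3, 6):
    print(f"depth {depth}: at most {max_splits(depth)} splits, {max_leaves(depth)} leaves")
```

The number of leaves doubles with every additional level; a depth-6 tree can already have up to 64 leaves.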
+
+Minimum child weight
+....................
+When a decision tree creates new leaves, it splits up the remaining data at one node
+into two groups. If there are only a few samples in one of these groups, it often
+doesn't make sense to split it further. One reason for this is that splits based on
+very few samples are likely to fit noise rather than real patterns, i.e. to overfit.
+
+.. figure:: /images/tune-xgboost-weight.svg
+ :alt: Minimum child weight
+ :align: center
+
+ In this example, we start with 100 examples. At the first node, they are split
+ into 4 and 96 samples, respectively. In the next step, our model might find
+ that it doesn't make sense to split the 4 examples further. It thus only continues
+ to add leaves on the right side.
+
+The parameter used by the model to decide if it makes sense to split a node is called
+the *minimum child weight*. In the case of linear regression, this is simply the minimum
+number of samples required in each child. For other objectives, it is the minimum sum of
+instance weights (computed from the second derivative of the loss) required in each child,
+hence the name.
+
+The larger the value, the more constrained the trees are and the less deep they will be.
+This parameter thus also affects the model complexity. Values can range between 0
+and infinity and are dependent on the sample size. For our ca. 500 examples in the
+breast cancer dataset, values between ``0`` and ``10`` should be sensible.
+
+XGBoost's default value is ``1``.
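
To make the definition concrete, here is a sketch that computes a child's weight as the sum of the Hessians (second derivatives of the loss) of its samples, which is how XGBoost's documentation defines ``min_child_weight``. For the squared error loss used in linear regression, every Hessian is 1, so the child weight is just the number of samples; for the logistic loss, each sample contributes p(1 - p). The helper names and toy numbers are hypothetical.

```python
def hessian_sum_squared_error(n_samples: int) -> float:
    """Squared error: each sample has Hessian 1, so the weight is the count."""
    return float(n_samples)


def hessian_sum_logistic(probs) -> float:
    """Logistic loss: each sample contributes p * (1 - p)."""
    return sum(p * (1 - p) for p in probs)


def split_allowed(left_weight: float, right_weight: float, min_child_weight: float) -> bool:
    """A split is only kept if both children reach the minimum child weight."""
    return left_weight >= min_child_weight and right_weight >= min_child_weight


# Like the figure: splitting the 4-sample node into 2 + 2 is rejected ...
print(split_allowed(hessian_sum_squared_error(2), hessian_sum_squared_error(2), 3))
# ... while splitting the 96-sample node into 50 + 46 is fine
print(split_allowed(hessian_sum_squared_error(50), hessian_sum_squared_error(46), 3))
# With the logistic loss, 4 samples predicted at p = 0.5 weigh only 4 * 0.25 = 1
print(hessian_sum_logistic([0.5] * 4))
```

This is also why sensible values depend on the objective: under the logistic loss, confidently predicted samples (p close to 0 or 1) contribute very little weight.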
+
+Subsample size
+..............
+Each decision tree we add can be trained on a random subsample of the training dataset.
+We can decide which fraction of the samples each decision tree is trained on.
+
+Setting this value to ``0.7`` would mean that we randomly sample ``70%`` of the
+training dataset before each training iteration.
+
+XGBoost's default value is ``1``.
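
A minimal sketch of what ``subsample=0.7`` means, assuming uniform sampling without replacement (a simplification; XGBoost's own sampling code may differ in the details): before each boosting round, a fresh random 70% of the training rows is drawn. The helper ``subsample_rows`` is hypothetical.

```python
import random


def subsample_rows(n_rows: int, fraction: float, rng: random.Random) -> list:
    """Pick a random subset of row indices for one boosting round."""
    k = max(1, round(fraction * n_rows))  # number of rows to keep
    return sorted(rng.sample(range(n_rows), k))


rng = random.Random(42)  # fixed seed for reproducibility
for boosting_round in range(3):
    rows = subsample_rows(100, 0.7, rng)
    # Each round sees a different 70% of the 100 training rows
    print(boosting_round, len(rows), rows[:5])
```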
+
+Learning rate / Eta
+...................
+Remember that XGBoost sequentially trains many decision trees, and that later trees
+are more likely to be trained on data that has been misclassified by prior trees. In effect,
+this means that earlier trees make decisions for easy samples (i.e. those samples that
+can easily be classified) and later trees make decisions for harder samples. It is then
+sensible to assume that the later trees are less accurate than earlier trees.
+
+To address this fact, XGBoost uses a parameter called *Eta*, which is sometimes called
+the *learning rate*. Don't confuse this with learning rates from gradient descent!
+The original `paper on stochastic gradient boosting `_
+introduces this parameter like so:
+
+.. math::
+ F_m(x) = F_{m-1}(x) + \eta \cdot \gamma_{lm} \textbf{1}(x \in R_{lm})
+
+This is just a complicated way to say that when we train a new decision tree,
+represented by :math:`\gamma_{lm} \textbf{1}(x \in R_{lm})`, we want to dampen
+its effect on the previous prediction :math:`F_{m-1}(x)` with a factor
+:math:`\eta`.
+
+Typical values for this parameter are between ``0.01`` and ``0.3``.
+
+XGBoost's default value is ``0.3``.
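
The effect of the shrinkage factor can be seen in a deliberately idealized toy model (an assumption for illustration, not how real trees behave): suppose every new tree predicts the remaining residual of a single target value perfectly. With the update rule above, each round then closes only a fraction :math:`\eta` of the remaining gap, so after m rounds the prediction is y(1 - (1 - eta)^m). The function name is hypothetical.

```python
def boosted_prediction(target: float, eta: float, rounds: int) -> float:
    """Toy boosting: each 'tree' predicts the current residual exactly."""
    prediction = 0.0  # F_0(x): start from zero
    for _ in range(rounds):
        tree_output = target - prediction  # perfect fit of the residual
        prediction += eta * tree_output    # F_m = F_{m-1} + eta * tree
    return prediction


for eta in (1.0, 0.3, 0.1):
    print(eta, [round(boosted_prediction(1.0, eta, m), 3) for m in (1, 2, 5)])
```

With ``eta=1`` the first tree does all the work, which also means the model fully trusts that single tree; smaller values spread the prediction across many trees.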
+
+Number of boost rounds
+......................
+Lastly, we can decide on how many boosting rounds we perform, which means how
+many decision trees we ultimately train. When we do heavy subsampling or use a small
+learning rate, it might make sense to increase the number of boosting rounds.
+
+XGBoost's default value is ``10``.
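
In a deliberately idealized toy model (an assumption for illustration only) where every new tree fits the remaining error perfectly and is scaled by eta, the remaining error shrinks by a factor of 1 - eta per round. The hypothetical helper below counts how many rounds that takes to get within 1% of the target, which shows why a smaller learning rate usually calls for more boosting rounds.

```python
def rounds_needed(eta: float, tolerance: float = 0.01) -> int:
    """Rounds until the remaining relative error drops below `tolerance` (toy model)."""
    remaining = 1.0  # relative error before the first tree
    rounds = 0
    while remaining > tolerance:
        remaining *= 1 - eta  # each round removes a fraction eta of the error
        rounds += 1
    return rounds


for eta in (0.3, 0.1, 0.01):
    print(eta, rounds_needed(eta))
```

In this toy model, even ``eta=0.3`` needs a few more than 10 rounds to get within 1%, and ``eta=0.01`` needs several hundred.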
+
+Putting it together
+...................
+Let's see what this looks like in code! We just need to adjust our ``config`` dict:
+
+.. code-block:: python
+
+    if __name__ == "__main__":
+        config = {
+            "objective": "binary:logistic",
+            "max_depth": 2,
+            "min_child_weight": 0,
+            "subsample": 0.8,
+            "eta": 0.2
+        }
+        accuracy = train_breast_cancer(config)
+        print("Accuracy: {:.2f}".format(accuracy))
+
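+The rest stays the same. Please note that we do not adjust the number of boosting rounds
+here: it is set with the ``num_boost_round`` argument of ``xgb.train()``, not in the
+``config`` dict. The result should also show a high accuracy of over 90%.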
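
If you later want to tune the number of boosting rounds as well, keep in mind that it is passed to ``xgb.train()`` via its ``num_boost_round`` argument rather than as a booster parameter. One possible pattern, sketched below with a hypothetical helper ``split_config``, is to keep it in the same ``config`` dict and split it off before calling ``xgb.train()``.

```python
def split_config(config: dict, default_rounds: int = 10):
    """Separate the number of boosting rounds from the booster parameters.

    Returns (params, num_boost_round). The input dict is not modified.
    """
    params = dict(config)  # shallow copy so the caller's dict stays intact
    num_boost_round = params.pop("num_boost_round", default_rounds)
    return params, num_boost_round


params, num_boost_round = split_config(
    {"objective": "binary:logistic", "eta": 0.2, "num_boost_round": 50})
print(params, num_boost_round)
# Inside the training function, this could then be used as:
# bst = xgb.train(params, train_set, num_boost_round=num_boost_round, ...)
```

The fallback of 10 mirrors the default of ``xgb.train()``.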
+
+Tuning the configuration parameters
+-----------------------------------
+XGBoost's default parameters already lead to a good accuracy, and even our guesses in the
+last section should result in accuracies well above 90%. However, our guesses were
+just that: guesses. Often we do not know what combination of parameters would actually
+lead to the best results on a machine learning task.
+
+Unfortunately, there are infinitely many combinations of hyperparameters we could try
+out. Should we combine ``max_depth=3`` with ``subsample=0.8`` or with ``subsample=0.9``?
+What about the other parameters?
+
+This is where hyperparameter tuning comes into play. By using tuning libraries such as
+Ray Tune we can try out combinations of hyperparameters. Using sophisticated search
+strategies, these parameters can be selected so that they are likely to lead to good
+results (avoiding an expensive *exhaustive search*). Also, trials that do not perform
+well can be stopped early to avoid wasting computing resources. Lastly, Ray Tune
+also takes care of training these runs in parallel, greatly increasing search speed.
+
+Let's start with a basic example on how to use Tune for this. We just need to make
+a few changes to our code:
+
+.. code-block:: python
+    :emphasize-lines: 7,26,32,33,34,35,37,38,39,40,41
+
+    import numpy as np
+    import sklearn.datasets
+    import sklearn.metrics
+    from sklearn.model_selection import train_test_split
+    import xgboost as xgb
+
+    from ray import tune
+
+
+    def train_breast_cancer(config):
+        # Load dataset
+        data, labels = sklearn.datasets.load_breast_cancer(return_X_y=True)
+        # Split into train and test set
+        train_x, test_x, train_y, test_y = train_test_split(
+            data, labels, test_size=0.25)
+        # Build input matrices for XGBoost
+        train_set = xgb.DMatrix(train_x, label=train_y)
+        test_set = xgb.DMatrix(test_x, label=test_y)
+        # Train the classifier
+        bst = xgb.train(config, train_set, evals=[(test_set, "eval")], verbose_eval=False)
+        # Predict labels for the test set
+        preds = bst.predict(test_set)
+        pred_labels = np.rint(preds)
+        # Return prediction accuracy
+        accuracy = sklearn.metrics.accuracy_score(test_y, pred_labels)
+        tune.report(mean_accuracy=accuracy, done=True)
+
+
+    if __name__ == "__main__":
+        config = {
+            "objective": "binary:logistic",
+            "max_depth": tune.randint(1, 9),
+            "min_child_weight": tune.choice([1, 2, 3]),
+            "subsample": tune.uniform(0.5, 1.0),
+            "eta": tune.loguniform(1e-4, 1e-1)
+        }
+        tune.run(
+            train_breast_cancer,
+            resources_per_trial={"cpu": 1},
+            config=config,
+            num_samples=10)
+
+As you can see, the changes in the actual training function are minimal. Instead of
+returning the accuracy value, we report it back to Tune using ``tune.report()``.
+Our ``config`` dictionary only changed slightly. Instead of passing hard-coded
+parameters, we tell Tune to choose values from a range of valid options. There are
+a number of options we have here, all of which are explained in
+:ref:`the Tune docs `.
+
+For a brief explanation, this is what they do:
+
+* ``tune.randint(min, max)`` chooses a random integer value between *min* and *max*.
+ Note that *max* is exclusive, so it will not be sampled.
+* ``tune.choice([a, b, c])`` chooses one of the items of the list at random. Each item
+ has the same chance to be sampled.
+* ``tune.uniform(min, max)`` samples a floating point number between *min* and *max*.
+ Note that *max* is exclusive here, too.
+* ``tune.loguniform(min, max, base=10)`` samples a floating point number between *min* and *max*,
+ but applies a logarithmic transformation to these boundaries first. Thus, this makes
+ it easy to sample values from different orders of magnitude.
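
To build intuition for these search spaces, here is a rough stand-in written with Python's ``random`` module. It only imitates the behavior described above and is not how Tune implements its samplers; the function names are hypothetical. Note how the log-uniform sampler spreads samples evenly across orders of magnitude, while a plain uniform sampler over the same range would put about 90% of the samples above 0.01.

```python
import math
import random

rng = random.Random(0)  # fixed seed for reproducible samples


def sample_randint(low: int, high: int) -> int:
    """Integer in [low, high): the upper bound is never returned."""
    return rng.randrange(low, high)


def sample_choice(options: list):
    """Each option is equally likely."""
    return rng.choice(options)


def sample_uniform(low: float, high: float) -> float:
    """Float between low and high, every value equally likely."""
    return rng.uniform(low, high)


def sample_loguniform(low: float, high: float, base: float = 10) -> float:
    """Sample uniformly in log space, then transform back."""
    exponent = rng.uniform(math.log(low, base), math.log(high, base))
    return base ** exponent


config = {
    "max_depth": sample_randint(1, 9),
    "min_child_weight": sample_choice([1, 2, 3]),
    "subsample": sample_uniform(0.5, 1.0),
    "eta": sample_loguniform(1e-4, 1e-1),
}
print(config)
```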
+
+
+
+The ``num_samples=10`` option we pass to ``tune.run()`` means that we sample 10 different
+hyperparameter configurations from this search space.
+
+The output of our training run could look like this:
+
+.. code-block::
+ :emphasize-lines: 6
+
+ +---------------------------------+------------+-------+-------------+-------------+--------------------+-------------+----------+--------+------------------+
+ | Trial name | status | loc | eta | max_depth | min_child_weight | subsample | acc | iter | total time (s) |
+ |---------------------------------+------------+-------+-------------+-------------+--------------------+-------------+----------+--------+------------------|
+ | train_breast_cancer_c817a_00000 | TERMINATED | | 0.00334038 | 8 | 1 | 0.640256 | 0.93007 | 1 | 0.050081 |
+ | train_breast_cancer_c817a_00001 | TERMINATED | | 0.00285335 | 4 | 3 | 0.951621 | 0.93007 | 1 | 0.0453899 |
+ | train_breast_cancer_c817a_00002 | TERMINATED | | 0.0597631 | 5 | 2 | 0.96479 | 0.986014 | 1 | 0.0503612 |
+ | train_breast_cancer_c817a_00003 | TERMINATED | | 0.000650095 | 6 | 2 | 0.923812 | 0.951049 | 1 | 0.0588872 |
+ | train_breast_cancer_c817a_00004 | TERMINATED | | 0.00753275 | 1 | 1 | 0.973499 | 0.881119 | 1 | 0.0347321 |
+ | train_breast_cancer_c817a_00005 | TERMINATED | | 0.000411214 | 5 | 1 | 0.672503 | 0.958042 | 1 | 0.0477931 |
+ | train_breast_cancer_c817a_00006 | TERMINATED | | 0.0940201 | 5 | 2 | 0.711124 | 0.972028 | 1 | 0.069901 |
+ | train_breast_cancer_c817a_00007 | TERMINATED | | 0.0372492 | 1 | 1 | 0.76303 | 0.895105 | 1 | 0.0496318 |
+ | train_breast_cancer_c817a_00008 | TERMINATED | | 0.000140322 | 1 | 2 | 0.885415 | 0.909091 | 1 | 0.045424 |
+ | train_breast_cancer_c817a_00009 | TERMINATED | | 0.000341654 | 5 | 3 | 0.720523 | 0.937063 | 1 | 0.0657773 |
+ +---------------------------------+------------+-------+-------------+-------------+--------------------+-------------+----------+--------+------------------+
+
+The best configuration we found used ``eta=0.0597631``, ``max_depth=5``,
+``min_child_weight=2``, ``subsample=0.96479`` and reached an accuracy of
+``0.986014``.
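
Reading the best row off a table by eye is easy to get wrong. The sketch below selects it programmatically, using the values copied from the table of this run; in practice you would take them from the results Tune returns rather than typing them in. The ``Trial`` tuple is a hypothetical container for illustration.

```python
from collections import namedtuple

Trial = namedtuple(
    "Trial", ["suffix", "eta", "max_depth", "min_child_weight", "subsample", "accuracy"])

# Rows copied from the table of the run above (trial name suffix, hyperparameters, acc)
trials = [
    Trial("00000", 0.00334038, 8, 1, 0.640256, 0.93007),
    Trial("00001", 0.00285335, 4, 3, 0.951621, 0.93007),
    Trial("00002", 0.0597631, 5, 2, 0.96479, 0.986014),
    Trial("00003", 0.000650095, 6, 2, 0.923812, 0.951049),
    Trial("00004", 0.00753275, 1, 1, 0.973499, 0.881119),
    Trial("00005", 0.000411214, 5, 1, 0.672503, 0.958042),
    Trial("00006", 0.0940201, 5, 2, 0.711124, 0.972028),
    Trial("00007", 0.0372492, 1, 1, 0.76303, 0.895105),
    Trial("00008", 0.000140322, 1, 2, 0.885415, 0.909091),
    Trial("00009", 0.000341654, 5, 3, 0.720523, 0.937063),
]

best = max(trials, key=lambda trial: trial.accuracy)  # highest accuracy wins
print(best)
```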
+
+Early stopping
+--------------
+Currently, Tune samples 10 different hyperparameter configurations and trains a full
+XGBoost model on each of them. In our small example, training is very fast. However,
+if training takes longer, a significant amount of computing resources is spent on trials
+that will eventually show a bad performance, e.g. a low accuracy. It would be good
+if we could identify these trials early and stop them, so we don't waste any resources.
+
+This is where Tune's *Schedulers* shine. A Tune ``TrialScheduler`` is responsible
+for starting and stopping trials. Tune implements a number of different schedulers, each
+described :ref:`in the Tune documentation `.
+For our example, we will use the ``AsyncHyperBandScheduler``, which is also available under the alias ``ASHAScheduler``.
+
+The basic idea of this scheduler: We sample a number of hyperparameter configurations.
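+Each of these configurations is trained for a specific number of iterations.
+After these iterations, only the best performing configurations are allowed to continue
+training; the ``reduction_factor`` parameter in the code below controls which fraction
+survives. The configurations are compared according to some metric, usually an evaluation
+loss. This cycle is repeated with increasingly many iterations until the remaining
+configurations have been trained for the maximum number of iterations.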
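
The cycle described above is known as *successive halving*. Below is a simplified, synchronous sketch of it; ASHA is an asynchronous variant that makes these decisions as soon as individual trials reach a milestone, so Tune's actual scheduler differs in the details. The function name, trial names, and loss values are hypothetical.

```python
def successive_halving(losses, max_t, grace_period=1, reduction_factor=2):
    """Simplified synchronous successive halving.

    losses: dict mapping trial name -> list of losses, one per iteration
    Returns a dict mapping trial name -> iteration at which it was stopped.
    """
    alive = list(losses)
    stopped_at = {}
    milestone = grace_period
    while milestone < max_t and len(alive) > 1:
        # Rank surviving trials by their loss at this milestone (lower is better)
        ranked = sorted(alive, key=lambda trial: losses[trial][milestone - 1])
        keep = max(1, len(ranked) // reduction_factor)
        for trial in ranked[keep:]:
            stopped_at[trial] = milestone  # stop the worse-performing trials
        alive = ranked[:keep]
        milestone *= reduction_factor  # survivors train longer before the next cut
    for trial in alive:
        stopped_at[trial] = max_t  # survivors train until the end
    return stopped_at


# Four hypothetical trials with constant losses over 10 iterations
losses = {name: [loss] * 10 for name, loss in [("a", 0.1), ("b", 0.2), ("c", 0.3), ("d", 0.4)]}
print(successive_halving(losses, max_t=10))
```

Most of the compute ends up going to the configuration that looked best early on, while the weakest ones are stopped after a single iteration.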
+
+The ``ASHAScheduler`` needs to know three things:
+
+1. Which metric should be used to identify badly performing trials?
+2. Should this metric be maximized or minimized?
+3. How many iterations does each trial train for?
+
+There are more parameters, which are explained in the
+:ref:`documentation `.
+
+Lastly, we have to report the loss metric to Tune. We do this with a ``Callback`` that
+XGBoost accepts and calls after each training iteration. We also tell XGBoost which
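+loss metrics to calculate in the ``eval_metric`` parameter. These are the metrics
+available in ``env.evaluation_result_list`` below. XGBoost prefixes each metric name with
+the name of the evaluation set, so with ``evals=[(test_set, "eval")]`` the log loss is
+reported as ``eval-logloss``.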
+
+.. code-block:: python
+    :emphasize-lines: 4,11,12,13,26,42,44,45,46,47,48,49,55
+
+    import numpy as np
+    import sklearn.datasets
+    import sklearn.metrics
+    from ray.tune.schedulers import ASHAScheduler
+    from sklearn.model_selection import train_test_split
+    import xgboost as xgb
+
+    from ray import tune
+
+
+    def XGBCallback(env):
+        # After every training iteration, report loss to Tune
+        tune.report(**dict(env.evaluation_result_list))
+
+
+    def train_breast_cancer(config):
+        # Load dataset
+        data, labels = sklearn.datasets.load_breast_cancer(return_X_y=True)
+        # Split into train and test set
+        train_x, test_x, train_y, test_y = train_test_split(
+            data, labels, test_size=0.25)
+        # Build input matrices for XGBoost
+        train_set = xgb.DMatrix(train_x, label=train_y)
+        test_set = xgb.DMatrix(test_x, label=test_y)
+        # Train the classifier
+        bst = xgb.train(config, train_set, evals=[(test_set, "eval")], verbose_eval=False, callbacks=[XGBCallback])
+        # Predict labels for the test set
+        preds = bst.predict(test_set)
+        pred_labels = np.rint(preds)
+        # Return prediction accuracy
+        accuracy = sklearn.metrics.accuracy_score(test_y, pred_labels)
+        tune.report(mean_accuracy=accuracy, done=True)
+
+
+    if __name__ == "__main__":
+        config = {
+            "objective": "binary:logistic",
+            "max_depth": tune.randint(1, 9),
+            "min_child_weight": tune.choice([1, 2, 3]),
+            "subsample": tune.uniform(0.5, 1.0),
+            "eta": tune.loguniform(1e-4, 1e-1),
+            "eval_metric": ["auc", "ams@0", "logloss"]
+        }
+        scheduler = ASHAScheduler(
+            metric="eval-logloss",  # The `eval` prefix is defined in xgb.train
+            mode="min",  # Retain configurations with a low logloss
+            max_t=11,  # 10 training iterations + 1 final evaluation
+            grace_period=1,  # Number of minimum iterations for each trial
+            reduction_factor=2)  # How aggressively to stop trials
+        tune.run(
+            train_breast_cancer,
+            resources_per_trial={"cpu": 1},
+            config=config,
+            num_samples=10,
+            scheduler=scheduler)
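
To see what the callback in this code actually reports, here is a sketch that calls the same logic with a stand-in for XGBoost's callback environment. The ``env`` object (built with ``types.SimpleNamespace``) and its metric values are hypothetical placeholders, and ``tune.report`` is replaced by a plain function that records its arguments, so the snippet runs without Ray or XGBoost. It shows why the scheduler's ``metric`` must be spelled ``eval-logloss``: each entry pairs a metric name, prefixed with the evaluation set name, with its value.

```python
from types import SimpleNamespace

reported = []  # collects everything that would be sent to Tune


def report(**metrics):
    """Stand-in for tune.report that just records the metrics."""
    reported.append(metrics)


def xgb_callback(env):
    # Same logic as XGBCallback above, but using the stand-in report()
    report(**dict(env.evaluation_result_list))


# Hypothetical environment after one iteration; the values are placeholders
env = SimpleNamespace(evaluation_result_list=[
    ("eval-auc", 0.9),
    ("eval-ams@0", 0.5),
    ("eval-logloss", 0.4),
])
xgb_callback(env)
print(reported[0])
```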
+
+The output of our run could look like this:
+
+.. code-block::
+ :emphasize-lines: 13
+
+ +---------------------------------+------------+-------+-------------+-------------+--------------------+-------------+----------+--------+------------------+
+ | Trial name | status | loc | eta | max_depth | min_child_weight | subsample | acc | iter | total time (s) |
+ |---------------------------------+------------+-------+-------------+-------------+--------------------+-------------+----------+--------+------------------|
+ | train_breast_cancer_806ea_00000 | TERMINATED | | 0.0371055 | 2 | 1 | 0.611729 | 0.951049 | 11 | 0.339279 |
+ | train_breast_cancer_806ea_00001 | TERMINATED | | 0.0324613 | 3 | 2 | 0.643815 | | 4 | 0.230338 |
+ | train_breast_cancer_806ea_00002 | TERMINATED | | 0.0100875 | 4 | 3 | 0.985147 | | 2 | 0.0661929 |
+ | train_breast_cancer_806ea_00003 | TERMINATED | | 0.00124263 | 1 | 3 | 0.890299 | | 1 | 0.0201721 |
+ | train_breast_cancer_806ea_00004 | TERMINATED | | 0.000230373 | 5 | 3 | 0.627611 | | 1 | 0.0265107 |
+ | train_breast_cancer_806ea_00005 | TERMINATED | | 0.000186942 | 5 | 2 | 0.831801 | | 1 | 0.026082 |
+ | train_breast_cancer_806ea_00006 | TERMINATED | | 0.00871051 | 2 | 3 | 0.721523 | 0.958042 | 11 | 0.299392 |
+ | train_breast_cancer_806ea_00007 | TERMINATED | | 0.00440949 | 2 | 3 | 0.606252 | | 1 | 0.0210171 |
+ | train_breast_cancer_806ea_00008 | TERMINATED | | 0.00948289 | 5 | 2 | 0.892979 | | 2 | 0.140424 |
+ | train_breast_cancer_806ea_00009 | TERMINATED | | 0.0514017 | 2 | 1 | 0.859864 | 0.972028 | 11 | 0.365437 |
+ +---------------------------------+------------+-------+-------------+-------------+--------------------+-------------+----------+--------+------------------+
+
+As you can see, four trials have been stopped after just one iteration, two after two iterations,
+one after four iterations, and the three most promising configurations have been run for
+ten iterations. The 11 is due to the fact that we finally report the accuracy after
+training the full model, which is internally interpreted as another iteration.
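
These counts can be double-checked from the ``iter`` column of the table with a few lines of Python; the values below are copied from the table of this run.

```python
from collections import Counter

# `iter` column of the ASHA run, trials 00000 to 00009
iterations = [11, 4, 2, 1, 1, 1, 11, 1, 2, 11]
stops = Counter(iterations)
for num_iterations, num_trials in sorted(stops.items()):
    print(f"{num_trials} trial(s) ran for {num_iterations} iteration(s)")
```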
+
+Using fractional GPUs
+---------------------
+You can often accelerate your training by using GPUs in addition to CPUs. However,
+you usually don't have as many GPUs as you have trials to run. For instance, if you
+run 10 Tune trials in parallel, you usually don't have access to 10 separate GPUs.
+
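+Tune supports *fractional GPUs*. This means that several trials can share one GPU:
+each trial requests a fraction of a GPU, and Ray makes sure that the fractions
+scheduled on a single GPU add up to at most 1. Note that Ray does not limit the GPU
+memory each trial actually uses; it is up to you to make sure the trials fit into
+memory together. For 10 tasks, this could look like this: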
+
+.. code-block:: python
+    :emphasize-lines: 8,12
+
+    config = {
+        "objective": "binary:logistic",
+        "max_depth": tune.randint(1, 9),
+        "min_child_weight": tune.choice([1, 2, 3]),
+        "subsample": tune.uniform(0.5, 1.0),
+        "eta": tune.loguniform(1e-4, 1e-1),
+        "eval_metric": ["auc", "ams@0", "logloss"],
+        "tree_method": "gpu_hist"
+    }
+    tune.run(
+        train_breast_cancer,
+        resources_per_trial={"cpu": 1, "gpu": 0.1},
+        config=config,
+        num_samples=10,
+        scheduler=scheduler)
+
+Each trial thus requests 10% of a GPU, so up to 10 trials can share a single GPU. You also
+have to tell XGBoost to use the ``gpu_hist`` tree method, so it knows it should use the GPU.
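
How many trials can run at the same time is then bounded by both CPUs and GPUs. The sketch below estimates this bound under simplifying assumptions: every trial requests the same resources, a fractional GPU request must fit on a single GPU, and nothing else runs on the cluster. It is a back-of-the-envelope helper with a hypothetical name, not Ray's scheduler.

```python
import math
from fractions import Fraction


def max_concurrent_trials(num_cpus, num_gpus, cpus_per_trial, gpus_per_trial):
    """Upper bound on trials that can run in parallel (simplified model)."""
    if not 0 <= gpus_per_trial <= 1:
        raise ValueError("this sketch only handles GPU requests between 0 and 1")
    # Fraction(str(x)) avoids floating point surprises such as 1 / 0.1
    by_cpu = math.floor(Fraction(str(num_cpus)) / Fraction(str(cpus_per_trial)))
    if gpus_per_trial == 0:
        return by_cpu
    # A fractional request must fit on one GPU, so count trials per GPU
    per_gpu = math.floor(1 / Fraction(str(gpus_per_trial)))
    return min(by_cpu, per_gpu * num_gpus)


print(max_concurrent_trials(num_cpus=16, num_gpus=1, cpus_per_trial=1, gpus_per_trial=0.1))
print(max_concurrent_trials(num_cpus=8, num_gpus=1, cpus_per_trial=1, gpus_per_trial=0.1))
```

On a machine with 16 CPUs and one GPU, the GPU is the bottleneck; with only 8 CPUs, the CPUs are.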
+
+Conclusion
+----------
+You should now have a basic understanding of how to train XGBoost models and how
+to tune the hyperparameters to yield the best results. In our simple example,
+tuning the parameters didn't make a large difference to the accuracy.
+But in larger applications, intelligent hyperparameter tuning can make the
+difference between a model that doesn't seem to learn at all, and a model
+that outperforms all the other ones.
+
+Further References
+------------------
+
+* `XGBoost Hyperparameter Tuning - A Visual Guide `_
+* `Notes on XGBoost Parameter Tuning `_
+* `Doing XGBoost Hyperparameter Tuning the smart way `_
diff --git a/python/ray/tune/examples/xgboost_example.py b/python/ray/tune/examples/xgboost_example.py
index 1e7303fe9..10ad88e6a 100644
--- a/python/ray/tune/examples/xgboost_example.py
+++ b/python/ray/tune/examples/xgboost_example.py
@@ -1,49 +1,61 @@
-import xgboost as xgb
 import numpy as np
 import sklearn.datasets
 import sklearn.metrics
+from ray.tune.schedulers import ASHAScheduler
 from sklearn.model_selection import train_test_split
+import xgboost as xgb
 from ray import tune
 def XGBCallback(env):
+    # After every training iteration, report loss to Tune
     tune.report(**dict(env.evaluation_result_list))
 def train_breast_cancer(config):
-    data, target = sklearn.datasets.load_breast_cancer(return_X_y=True)
+    # Load dataset
+    data, labels = sklearn.datasets.load_breast_cancer(return_X_y=True)
+    # Split into train and test set
     train_x, test_x, train_y, test_y = train_test_split(
-        data, target, test_size=0.25)
+        data, labels, test_size=0.25)
+    # Build input matrices for XGBoost
     train_set = xgb.DMatrix(train_x, label=train_y)
     test_set = xgb.DMatrix(test_x, label=test_y)
+    # Train the classifier
     bst = xgb.train(
-        config, train_set, evals=[(test_set, "eval")], callbacks=[XGBCallback])
+        config,
+        train_set,
+        evals=[(test_set, "eval")],
+        verbose_eval=False,
+        callbacks=[XGBCallback])
+    # Predict labels for the test set
     preds = bst.predict(test_set)
     pred_labels = np.rint(preds)
-    tune.report(
-        mean_accuracy=sklearn.metrics.accuracy_score(test_y, pred_labels),
-        done=True)
+    # Return prediction accuracy
+    accuracy = sklearn.metrics.accuracy_score(test_y, pred_labels)
+    tune.report(mean_accuracy=accuracy, done=True)
 if __name__ == "__main__":
-    num_threads = 2
     config = {
-        "verbosity": 0,
-        "num_threads": num_threads,
         "objective": "binary:logistic",
-        "booster": "gbtree",
-        "eval_metric": ["auc", "ams@0", "logloss"],
         "max_depth": tune.randint(1, 9),
+        "min_child_weight": tune.choice([1, 2, 3]),
+        "subsample": tune.uniform(0.5, 1.0),
         "eta": tune.loguniform(1e-4, 1e-1),
-        "gamma": tune.loguniform(1e-8, 1.0),
-        "grow_policy": tune.choice(["depthwise", "lossguide"])
+        "eval_metric": ["auc", "ams@0", "logloss"]
     }
-
-    from ray.tune.schedulers import ASHAScheduler
+    # The ASHAScheduler stops bad performing configurations early
+    scheduler = ASHAScheduler(
+        metric="eval-logloss",  # The `eval` prefix is defined in xgb.train
+        mode="min",  # Retain configurations with a low logloss
+        max_t=11,  # 10 training iterations + 1 final evaluation
+        grace_period=1,  # Number of minimum iterations for each trial
+        reduction_factor=2)  # How aggressively to stop trials
     tune.run(
-        train_breast_cancer,
-        resources_per_trial={"cpu": num_threads},
+        train_breast_cancer,  # your training function
+        resources_per_trial={"cpu": 1},  # You can add "gpu": 0.1 here
         config=config,
-        num_samples=2,
-        scheduler=ASHAScheduler(metric="eval-logloss", mode="min"))
+        num_samples=10,  # number of parameter configurations to try
+        scheduler=scheduler)