[tune/xgboost] Update release test docs (#13880)

* Update release test docs
* Update
2026-06-27 16:31:25 +08:00 · 2021-02-04 13:10:56 +01:00
parent 6c77aeb98a
commit 1e113d2e6e
3 changed files with 68 additions and 3 deletions
@@ -60,6 +60,20 @@ This checklist is meant to be used in conjunction with the RELEASE_PROCESS.rst d
 - [ ] K8s Test
 	- [ ] K8s cluster launcher test
 	- [ ] K8s operator test
+- [ ] Data processing tests
+    - [ ] streaming_shuffle
+- [x] Tune tests
+    - [x] ignore for now
+- [ ] XGBoost Tests
+    - [ ] distributed_api_test
+    - [ ] train_small
+    - [ ] train_moderate
+    - [ ] train_gpu
+    - [ ] tune_small
+    - [ ] tune_4x32
+    - [ ] tune_32x4
+    - [ ] ft_small_non_elastic (flaky!)
+    - [ ] ft_small_elastic (flaky!)

 ## Final Steps
 - [ ] Wheels uploaded to Test PyPI
@@ -144,11 +144,11 @@ is generally the easiest way to run release tests.

   Run the ``ci/asan_tests`` with the commit. This will enable ASAN build and run the whole Python tests to detect memory leaks.

-6. **K8s operator tests**
+7. **K8s operator tests**

   Run ``python/ray/tests/test_k8s_*`` to make sure the K8s cluster launcher and operator work. Make sure the docker image is the released version.

-6. **Data processing tests**
+8. **Data processing tests**

   .. code-block:: bash

@@ -162,7 +162,26 @@ is generally the easiest way to run release tests.

   **IMPORTANT** Check whether the workload scripts have terminated. If so, record the results (both read/write bandwidth and the shuffle result) in ``release_logs/data_processing_tests/[test_name]``.
   Neither shuffling runtime nor read/write bandwidth should regress by more than 15% compared to the previous release.
-  
+
+9. **Ray Tune release tests**
+
+   General Ray Tune functionality is implicitly tested via RLlib and XGBoost release tests.
+   We are in the process of introducing scalability envelopes for Ray Tune.
+   This is an ongoing effort and will only be introduced in the next release.
+   For now, **you can ignore the tune_tests directory**.
+
+10. **XGBoost release tests**
+
+    .. code-block:: bash
+
+       xgboost_tests/README.rst
+
+    Follow the instructions in that README to kick off the tests and check the status of the workloads.
+    The XGBoost release tests use assertions or fail with exceptions, so
+    they should report automatically whether they passed or failed.
+    Only for the fault tolerance tests might you need
+    to check the logs. See the README for more information.
+

 Identify and Resolve Release Blockers
 -------------------------------------
@@ -0,0 +1,32 @@
+XGBoost on Ray tests
+====================
+
+This directory contains various XGBoost on Ray release tests.
+
+You should run these tests with the `releaser <https://github.com/ray-project/releaser>`_ tool.
+
+Overview
+--------
+There are four kinds of tests:
+
+1. ``distributed_api_test`` - checks general API functionality and should finish very quickly (< 1 minute).
+2. ``train_*`` - checks single-trial training on different setups.
+3. ``tune_*`` - checks multi-trial training via Ray Tune.
+4. ``ft_*`` - checks fault tolerance. **These tests are currently flaky.**
+
+Generally the releaser tool will run all tests in parallel, but if you run
+them sequentially, be sure to follow the order above. If ``train_*`` fails,
+``tune_*`` will fail, too.
+
+Flaky fault tolerance tests
+---------------------------
+The fault tolerance tests are currently flaky. In some runs, more nodes die
+than expected, causing the test to fail. In other cases, the re-scheduled
+actors become available too soon after crashing, causing the assertions to
+fail. Consider re-running the test a couple of times, or contact the
+test owner with the test output if you have further questions.
+
+Acceptance criteria
+-------------------
+These tests are considered passing when no error appears at the end of
+the output log.
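
The acceptance criterion above could be checked mechanically with something like the following minimal sketch. It is not part of the releaser tool: the function name ``tail_has_error``, the 50-line window, and the failure markers are all assumptions chosen for illustration, since the README does not say what an error at the end of the log looks like.

```python
import re

# Assumed failure markers (hypothetical): a Python traceback header, or any
# token ending in "Error" or "Exception", e.g. "AssertionError".
ERROR_PATTERN = re.compile(
    r"Traceback \(most recent call last\)|\w*(?:Error|Exception)\b"
)


def tail_has_error(log_text: str, tail_lines: int = 50) -> bool:
    """Return True if any of the last `tail_lines` lines looks like an error."""
    if tail_lines <= 0:
        raise ValueError("tail_lines must be positive")
    # Only the end of the log matters for the acceptance criterion.
    tail = log_text.splitlines()[-tail_lines:]
    return any(ERROR_PATTERN.search(line) for line in tail)
```

A heuristic like this can only flag candidates for a closer look. For the flaky ``ft_*`` tests in particular, reading the log by hand is still advisable.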