[tune/xgboost] Update release test docs (#13880)

* Update release test docs

* Update
Kai Fricke
2021-02-04 13:10:56 +01:00
committed by GitHub
parent 6c77aeb98a
commit 1e113d2e6e
3 changed files with 68 additions and 3 deletions
@@ -60,6 +60,20 @@ This checklist is meant to be used in conjunction with the RELEASE_PROCESS.rst d
- [ ] K8s Test
    - [ ] K8s cluster launcher test
    - [ ] K8s operator test
- [ ] Data processing tests
    - [ ] streaming_shuffle
- [x] Tune tests
    - [x] ignore for now
- [ ] XGBoost Tests
    - [ ] distributed_api_test
    - [ ] train_small
    - [ ] train_moderate
    - [ ] train_gpu
    - [ ] tune_small
    - [ ] tune_4x32
    - [ ] tune_32x4
    - [ ] ft_small_non_elastic (flaky!)
    - [ ] ft_small_elastic (flaky!)

## Final Steps
- [ ] Wheels uploaded to Test PyPI
@@ -144,11 +144,11 @@ is generally the easiest way to run release tests.
   Run ``ci/asan_tests`` with the commit. This enables an ASAN build and runs the whole Python test suite to detect memory leaks.

7. **K8s operator tests**

   Run ``python/ray/tests/test_k8s_*`` to make sure the K8s cluster launcher and operator work. Make sure the Docker image is the released version.

8. **Data processing tests**

   .. code-block:: bash
@@ -162,7 +162,26 @@ is generally the easiest way to run release tests.
   **IMPORTANT** Check whether the workload scripts have terminated. If so, record the results (both the read/write bandwidth and the shuffle result) in ``release_logs/data_processing_tests/[test_name]``.
   The shuffle runtime should not increase, and the read/write bandwidth should not decrease, by more than 15% compared to the previous release.

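To apply the 15% threshold the same way every time, a small helper can compare the previous and current numbers. This is a minimal sketch, not part of the release tooling: ``check_regression`` is a hypothetical name, and it assumes you pass plain numbers (for example, shuffle runtime in seconds or bandwidth in MB/s). Runtime regresses when it goes up; bandwidth regresses when it goes down.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the release tooling): compares a metric
# from the previous release with the current one and flags a regression of
# more than 15%.
#
# Usage: check_regression <kind> <previous> <current>
#   kind "runtime":   lower is better (e.g. shuffle runtime in seconds)
#   kind "bandwidth": higher is better (e.g. read/write bandwidth in MB/s)
# Prints "ok" or "regression"; returns 0, 1, or 2 (invalid input).
check_regression() {
  awk -v kind="$1" -v prev="$2" -v curr="$3" 'BEGIN {
    # A non-positive baseline would make the relative change meaningless.
    if (prev <= 0) {
      print "previous value must be positive" > "/dev/stderr"
      exit 2
    }
    # Relative change in the "worse" direction for this kind of metric.
    if (kind == "runtime") {
      worse = (curr - prev) / prev
    } else if (kind == "bandwidth") {
      worse = (prev - curr) / prev
    } else {
      print "unknown metric kind: " kind > "/dev/stderr"
      exit 2
    }
    if (worse > 0.15) {
      print "regression"
      exit 1
    }
    print "ok"
  }'
}

# Example: shuffle runtime went from 100s to 110s (+10%), prints "ok".
check_regression runtime 100 110
```

Keeping the direction explicit per metric avoids the easy mistake of treating a faster runtime as a regression.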
9. **Ray Tune release tests**

   General Ray Tune functionality is implicitly tested via the RLlib and XGBoost release tests.
   We are in the process of introducing scalability envelopes for Ray Tune.
   This is an ongoing effort and will only be introduced in the next release.
   For now, **you can ignore the tune_tests directory**.

10. **XGBoost release tests**

    .. code-block:: bash

        xgboost_tests/README.rst

    Follow the instructions to kick off the tests and check the status of the workloads.
    The XGBoost release tests use assertions or fail with exceptions, so
    they should automatically tell you whether they failed.
    Checking the logs is usually only necessary for the fault tolerance tests.
    See the README for more information.

Identify and Resolve Release Blockers
-------------------------------------
@@ -0,0 +1,32 @@
XGBoost on Ray tests
====================

This directory contains various XGBoost on Ray release tests.
You should run these tests with the `releaser <https://github.com/ray-project/releaser>`_ tool.

Overview
--------

There are four kinds of tests:

1. ``distributed_api_test`` - checks general API functionality and should finish very quickly (< 1 minute).
2. ``train_*`` - checks single-trial training on different setups.
3. ``tune_*`` - checks multi-trial training via Ray Tune.
4. ``ft_*`` - checks fault tolerance. **These tests are currently flaky.**

The releaser tool generally runs all tests in parallel. If you run them
sequentially, run them in the order above: if ``train_*`` fails,
``tune_*`` will fail, too.

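If you do run the tests sequentially, a small wrapper can enforce that order and stop at the first failure. This is a minimal sketch under assumptions: ``run_in_order`` is a hypothetical helper, and ``my_launcher`` stands in for whatever command runs a single test by name and exits non-zero on failure; neither is an interface of the releaser tool. The test names are the ones listed in the release checklist.

```bash
#!/usr/bin/env bash
# Sketch: run the XGBoost release tests one after another, in the order of
# the overview, and stop at the first failure (a failing train_* test means
# the tune_* tests would fail too).
#
# $1 is a hypothetical launcher command that runs a single test by name and
# exits non-zero on failure; substitute however you actually start a test.
run_in_order() {
  local run_test=$1
  shift
  local test_name
  for test_name in "$@"; do
    echo "Running ${test_name}"
    if ! "$run_test" "$test_name"; then
      echo "FAILED: ${test_name}; not starting the remaining tests" >&2
      return 1
    fi
  done
  echo "All tests passed"
}

# Test names from the release checklist, in overview order:
# API check, single-trial training, Ray Tune, fault tolerance.
XGBOOST_TESTS="distributed_api_test
train_small train_moderate train_gpu
tune_small tune_4x32 tune_32x4
ft_small_non_elastic ft_small_elastic"

# Usage (word splitting of $XGBOOST_TESTS is intentional):
#   run_in_order my_launcher $XGBOOST_TESTS
```

Stopping early saves the cluster time that the later ``tune_*`` and ``ft_*`` runs would spend failing for the same root cause.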
Flaky fault tolerance tests
---------------------------

The fault tolerance tests are currently flaky. In some runs, more nodes die
than expected, causing the test to fail. In other runs, the rescheduled
actors become available too soon after crashing, causing the assertions to
fail. Consider re-running the test a couple of times, or contact the
test owner with the test output if you have further questions.

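Re-running a flaky test a couple of times can be scripted with a generic retry loop. The sketch below is illustrative only: ``retry`` and ``my_launcher`` are hypothetical names, and the attempt count is up to you. Failed attempts are reported on stderr, so you still have the output to share with the test owner if the test keeps failing.

```bash
#!/usr/bin/env bash
# Sketch: re-run a flaky command a few times before treating it as a real
# failure. Nothing here is part of the releaser tool.
#
# Usage: retry <max_attempts> <command> [args...]
retry() {
  local max_attempts=$1
  shift
  local attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    if "$@"; then
      echo "Passed on attempt ${attempt}"
      return 0
    fi
    echo "Attempt ${attempt}/${max_attempts} failed" >&2
    attempt=$((attempt + 1))
  done
  echo "Still failing after ${max_attempts} attempts; contact the test owner" >&2
  return 1
}

# Example (my_launcher is a hypothetical placeholder):
#   retry 3 my_launcher ft_small_elastic
```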
Acceptance criteria
-------------------

These tests are considered passing when they throw no error at the end of
the output log.
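If you want to automate this check, a heuristic like the one below scans the end of a log for a Python traceback or an exception line. It is a sketch under assumptions: ``log_tail_is_clean`` is a hypothetical helper, and the 50-line window and the match patterns are guesses rather than the criteria the tests themselves apply. When in doubt, read the log yourself.

```bash
#!/usr/bin/env bash
# Sketch of the acceptance check: look at the end of a test's output log and
# fail if a Python traceback or an "...Error:" / "...Exception:" line shows up.
# The patterns and the default 50-line window are assumptions.
#
# Usage: log_tail_is_clean <log_file> [num_lines]
log_tail_is_clean() {
  local log_file=$1
  local num_lines=${2:-50}
  # A missing log must not be mistaken for a clean one.
  if [ ! -r "$log_file" ]; then
    echo "Cannot read ${log_file}" >&2
    return 2
  fi
  if tail -n "$num_lines" "$log_file" \
      | grep -Eq 'Traceback \(most recent call last\)|[A-Za-z]*(Error|Exception):'; then
    echo "Error found near the end of ${log_file}"
    return 1
  fi
  echo "No error at the end of ${log_file}"
}
```

For example, ``log_tail_is_clean output.log 100`` would check the last 100 lines of a (hypothetical) ``output.log``.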