mirror of
https://github.com/wassname/ray.git
synced 2026-06-27 16:31:25 +08:00
[tune/xgboost] Update release test docs (#13880)
* Update release test docs
* Update
@@ -60,6 +60,20 @@ This checklist is meant to be used in conjunction with the RELEASE_PROCESS.rst d
- [ ] K8s Test
    - [ ] K8s cluster launcher test
    - [ ] K8s operator test
- [ ] Data processing tests
    - [ ] streaming_shuffle
- [x] Tune tests
    - [x] ignore for now
- [ ] XGBoost Tests
    - [ ] distributed_api_test
    - [ ] train_small
    - [ ] train_moderate
    - [ ] train_gpu
    - [ ] tune_small
    - [ ] tune_4x32
    - [ ] tune_32x4
    - [ ] ft_small_non_elastic (flaky!)
    - [ ] ft_small_elastic (flaky!)

## Final Steps
- [ ] Wheels uploaded to Test PyPI
@@ -144,11 +144,11 @@ is generally the easiest way to run release tests.

   Run ``ci/asan_tests`` with the commit. This enables an ASAN build and runs the whole Python test suite to detect memory leaks.

-6. **K8s operator tests**
+7. **K8s operator tests**

   Run ``python/ray/tests/test_k8s_*`` to make sure the K8s cluster launcher and operator work. Make sure the Docker image is the released version.

-6. **Data processing tests**
+8. **Data processing tests**

   .. code-block:: bash
@@ -162,7 +162,26 @@ is generally the easiest way to run release tests.

   **IMPORTANT** Check whether the workload scripts have terminated. If so, record the results (both the read/write bandwidth and the shuffle result) in ``release_logs/data_processing_tests/[test_name]``.
   Neither the shuffle runtime nor the read/write bandwidth should regress by more than 15% compared to the previous release.
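Because a regression means a *higher* runtime but a *lower* bandwidth, the direction of the 15% comparison differs per metric. The following is a minimal sketch of the check, assuming you have already extracted the previous and current numbers; ``check_regression`` and its ``lower_is_better``/``higher_is_better`` argument are hypothetical, not part of the Ray repository.

```bash
# Sketch: compare one metric of the current release against the previous
# release and flag a regression of more than 15%.
# ASSUMPTION: check_regression is a hypothetical helper, not part of Ray.
#   $1 = value from the previous release (must be non-zero)
#   $2 = value from the current release
#   $3 = lower_is_better  (e.g. shuffle runtime)
#        higher_is_better (e.g. read/write bandwidth)
check_regression() {
  prev=$1
  curr=$2
  direction=$3
  case "$direction" in
    lower_is_better | higher_is_better) ;;
    *)
      echo "unknown direction: $direction" >&2
      return 2
      ;;
  esac
  awk -v p="$prev" -v c="$curr" -v d="$direction" 'BEGIN {
    # Relative change of the current value vs. the previous one.
    change = (c - p) / p
    # Express the change as "how much worse", whatever the direction.
    worse = (d == "lower_is_better") ? change : -change
    if (worse > 0.15) {
      printf "REGRESSION: %.1f%% worse than the previous release\n", worse * 100
      exit 1
    }
    printf "OK: %.1f%% change\n", change * 100
  }'
}
```

For example, ``check_regression 100 112 lower_is_better`` (runtime up 12%) passes, while ``check_regression 500 400 higher_is_better`` (bandwidth down 20%) reports a regression and returns a non-zero status.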

9. **Ray Tune release tests**

   General Ray Tune functionality is implicitly tested via the RLlib and XGBoost release tests.
   We are in the process of introducing scalability envelopes for Ray Tune.
   This effort is ongoing and will only take effect in the next release.
   For now, **you can ignore the tune_tests directory**.

10. **XGBoost release tests**

    .. code-block:: bash

        xgboost_tests/README.rst

    Follow the instructions in that README to kick off the tests and check the status of the workloads.
    The XGBoost release tests use assertions or fail with exceptions, so they
    should tell you automatically whether they failed.
    You should only need to check the logs for the fault tolerance tests.
    See the README for more information.
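Because a failed assertion or an uncaught exception makes a Python script exit with a non-zero status, the exit status alone tells you whether a test passed. A minimal sketch, assuming you launch a test command by hand rather than through the releaser tool; ``run_and_report`` and the log file name are hypothetical.

```bash
# Sketch: run one release-test command, keep its full output in a log
# file, and report PASSED/FAILED from the command's exit status.
# A failed assertion or uncaught exception in a Python script makes it
# exit with a non-zero status, so no log parsing is needed here.
# ASSUMPTION: run_and_report is a hypothetical helper.
#   $1    = log file to write
#   $2... = the command that runs the test
run_and_report() {
  log_file=$1
  shift
  if "$@" >"$log_file" 2>&1; then
    echo "PASSED (log: $log_file)"
  else
    status=$?
    echo "FAILED with exit status $status (log: $log_file)"
    # The traceback, if any, is usually at the end of the output.
    tail -n 20 "$log_file"
    return "$status"
  fi
}
```

For example, ``run_and_report train_small.log python <test script>`` keeps the full output in ``train_small.log``; the script path is left as a placeholder because it depends on the test.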

Identify and Resolve Release Blockers
-------------------------------------
@@ -0,0 +1,32 @@
XGBoost on Ray tests
====================

This directory contains various XGBoost on Ray release tests.

You should run these tests with the `releaser <https://github.com/ray-project/releaser>`_ tool.

Overview
--------
There are four kinds of tests:

1. ``distributed_api_test`` - checks general API functionality and should finish very quickly (< 1 minute).
2. ``train_*`` - checks single-trial training on different setups.
3. ``tune_*`` - checks multi-trial training via Ray Tune.
4. ``ft_*`` - checks fault tolerance. **These tests are currently flaky.**

The releaser tool generally runs all tests in parallel. If you run them
sequentially, run them in the order above: if ``train_*`` fails,
``tune_*`` will fail, too.
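If you do run the tests sequentially, a loop that follows the order above and stops at the first failure saves time, since a failed ``train_*`` test makes the ``tune_*`` tests fail anyway. A minimal sketch; ``run_release_test`` is a hypothetical hook you would define yourself (for example around the releaser tool), and the test names are taken from the release checklist.

```bash
# Sketch: run the XGBoost release tests one at a time in the README order
# (distributed_api_test, train_*, tune_*, ft_*), stopping at the first failure.
# ASSUMPTION: run_release_test is a hypothetical hook you must define;
# it should return a non-zero status when the given test fails.

# Test names as listed in the release checklist, in dependency order.
XGB_TEST_ORDER="distributed_api_test
train_small train_moderate train_gpu
tune_small tune_4x32 tune_32x4
ft_small_non_elastic ft_small_elastic"

run_in_order() {
  # Unquoted expansion splits the list on whitespace and newlines.
  for name in $XGB_TEST_ORDER; do
    echo "Running $name"
    if ! run_release_test "$name"; then
      # A failed train_* test would make the tune_* tests fail as well,
      # so there is no point in continuing.
      echo "FAILED: $name; skipping the remaining tests" >&2
      return 1
    fi
  done
  echo "All XGBoost release tests passed"
}
```

Stopping at the first failure is a design choice; drop the ``return 1`` if you prefer to collect the status of every test in one pass.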

Flaky fault tolerance tests
---------------------------
The fault tolerance tests are currently flaky. In some runs, more nodes die
than expected, causing the test to fail. In other runs, the rescheduled
actors become available too soon after crashing, causing the assertions to
fail. If a test fails, consider re-running it a couple of times, or contact
the test owner with the test output if you have further questions.
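Re-running a flaky test can be scripted with a small retry loop. A minimal sketch, assuming the command that runs one ``ft_*`` test is passed as arguments; ``retry_flaky`` is a hypothetical helper.

```bash
# Sketch: re-run a flaky test command up to N times, stopping at the first
# success. Intended for the ft_* tests, whose failures may come from
# timing rather than from a real regression.
# ASSUMPTION: retry_flaky is a hypothetical helper.
#   $1    = maximum number of attempts
#   $2... = the command that runs the test
retry_flaky() {
  max_attempts=$1
  shift
  attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    if "$@"; then
      echo "Succeeded on attempt $attempt"
      return 0
    fi
    echo "Attempt $attempt of $max_attempts failed" >&2
    attempt=$((attempt + 1))
  done
  # Every attempt failed: time to contact the test owner.
  return 1
}
```

For example, ``retry_flaky 3 <command running ft_small_elastic>``. Keep the output of the failed attempts, since that is what the test owner will need.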

Acceptance criteria
-------------------
These tests are considered passing when the output log shows no error at
the end.
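A rough way to apply this criterion to a saved log is to look for a Python traceback or an error name in the last lines of the output. A minimal sketch; ``log_ends_cleanly`` is hypothetical, and matching ``Traceback`` or ``Error`` is a heuristic that can produce false positives (for example, a logged but handled error), so read the log yourself when in doubt.

```bash
# Sketch: heuristic check for "no error at the end of the output log".
# ASSUMPTION: log_ends_cleanly is hypothetical, and treating any
# "Traceback" or "Error" in the last lines as a failure is a heuristic.
#   $1 = log file
#   $2 = number of trailing lines to inspect (default: 50)
log_ends_cleanly() {
  log_file=$1
  lines=${2:-50}
  # grep reads the whole tail (no -q), so tail never hits a broken pipe.
  if tail -n "$lines" "$log_file" | grep -E 'Traceback|Error' >/dev/null; then
    echo "FAIL: error found near the end of $log_file"
    return 1
  fi
  echo "PASS: no error near the end of $log_file"
}
```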