diff --git a/doc/examples/overview.rst b/doc/examples/overview.rst
index cae3504a9..8df8bf78b 100644
--- a/doc/examples/overview.rst
+++ b/doc/examples/overview.rst
@@ -14,6 +14,7 @@ Get started with Ray, Tune, and RLlib with these notebooks that you can run onli
plot_newsreader.rst
plot_streaming.rst
plot_example-lm.rst
+ testing-tips.rst
.. customgalleryitem::
@@ -51,3 +52,7 @@ Get started with Ray, Tune, and RLlib with these notebooks that you can run onli
.. customgalleryitem::
:tooltip: Distributed Fault-Tolerant BERT training for FAIRSeq using Ray.
:description: :doc:`/auto_examples/plot_example-lm`
+
+.. customgalleryitem::
+ :tooltip: Tips for testing Ray applications
+ :description: :doc:`/auto_examples/testing-tips`
diff --git a/doc/examples/testing-tips.rst b/doc/examples/testing-tips.rst
new file mode 100644
index 000000000..ca185a5a5
--- /dev/null
+++ b/doc/examples/testing-tips.rst
@@ -0,0 +1,140 @@
+Tips for testing Ray programs
+=============================
+
+Ray programs can be a little tricky to test due to the nature of parallel programs. We've put together a list of tips and tricks for common testing practices for Ray programs.
+
+.. contents::
+    :local:
+
+Tip 1: Fixing the resource quantity with ``ray.init(num_cpus=...)``
+-------------------------------------------------------------------
+
+By default, ``ray.init()`` detects the number of CPUs and GPUs on your local machine/cluster.
+
+However, your testing environment may have significantly fewer resources. For example, the TravisCI build environment only has `2 cores `_.
+
+If tests are written to depend on ``ray.init()``, they may be implicitly written in a way that relies on a larger multi-core machine.
+
+This may easily result in tests exhibiting unexpected, flaky, or faulty behavior that is hard to reproduce.
+
+To avoid this, override the detected resources by setting them explicitly in ``ray.init``, for example ``ray.init(num_cpus=2)``.
+
+
+Tip 2: Use ``ray.init(local_mode=True)`` if possible
+----------------------------------------------------
+
+A test suite for a Ray program may take longer to run than other test suites. One common culprit for long test durations is the overhead of inter-process communication.
+
+Ray provides a local mode for running Ray programs in a single process via ``ray.init(local_mode=True)``. This can be especially useful for testing since it allows you to reduce/remove inter-process communication.
+
+However, there are some caveats. You should not use local mode if:
+
+1. Your application depends on setting environment variables per process.
+2. Your application has recursive actor calls.
+3. Your remote actor/task sets any sort of process-level global variables.
+
+
+Tip 3: Share the Ray cluster across tests if possible
+-----------------------------------------------------
+
+It is safest to start a new Ray cluster for each test.
+
+.. code-block:: python
+
+    import unittest
+    import ray
+
+    class RayTest(unittest.TestCase):
+        def setUp(self):
+            ray.init(num_cpus=4, num_gpus=0)
+
+        def tearDown(self):
+            ray.shutdown()
+
+However, starting and stopping a Ray cluster can incur a non-trivial amount of latency. For example, on a typical MacBook Pro laptop, starting and stopping can take more than 4 seconds:
+
+.. code-block:: bash
+
+    python -c 'import ray; ray.init(); ray.shutdown()'  3.93s user 1.23s system 116% cpu 4.420 total
+
+Across 20 tests, this adds up to roughly 90 seconds of overhead.
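+
+The estimate above can be checked with a few lines of arithmetic. This is only a sketch: the constant comes from the single timing shown earlier, and the function names are made up for illustration.
+

```python
# Wall-clock time of one ray.init() + ray.shutdown() cycle, taken from
# the timing output above (seconds). Your machine will differ.
STARTUP_SECONDS = 4.42
NUM_TESTS = 20


def per_test_overhead(num_tests, startup_seconds):
    """Total overhead when every test starts and stops its own cluster."""
    return num_tests * startup_seconds


def shared_overhead(startup_seconds):
    """Total overhead when one cluster is started once and reused."""
    return startup_seconds


per_test = per_test_overhead(NUM_TESTS, STARTUP_SECONDS)
shared = shared_overhead(STARTUP_SECONDS)
print(f"per-test clusters: {per_test:.1f}s, shared cluster: {shared:.1f}s")
```

+
+With these numbers, starting a cluster per test costs about 88 seconds in total, while a single shared cluster costs about 4 seconds regardless of how many tests run.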
+
+Reusing a Ray cluster across tests can provide significant speedups to your test suite. This reduces the overhead to a constant, amortized quantity:
+
+.. code-block:: python
+
+    class RayClassTest(unittest.TestCase):
+        @classmethod
+        def setUpClass(cls):
+            # Start the cluster once and share it across all tests in this class.
+            ray.init(num_cpus=4, num_gpus=0)
+
+        @classmethod
+        def tearDownClass(cls):
+            ray.shutdown()
+
+Depending on your application, there are certain cases where it may be unsafe to reuse a Ray cluster across tests. For example:
+
+1. If your application depends on setting environment variables per process.
+2. If your remote actor/task sets any sort of process-level global variables.
+
+
+Tip 4: Create a mini-cluster with ``ray.cluster_utils.Cluster``
+---------------------------------------------------------------
+
+If writing an application for a cluster setting, you may want to mock a multi-node Ray cluster. This can be done with the ``ray.cluster_utils.Cluster`` utility.
+
+.. code-block:: python
+
+    from ray.cluster_utils import Cluster
+
+    # Starts a head node for the cluster.
+    cluster = Cluster(
+        initialize_head=True,
+        head_node_args={
+            "num_cpus": 10,
+        })
+
+After starting a cluster, you can execute a typical Ray script in the same process:
+
+.. code-block:: python
+
+    import ray
+
+    # Connect to the mock cluster started above.
+    ray.init(address=cluster.address)
+
+    @ray.remote
+    def f(x):
+        return x
+
+    for _ in range(1):
+        ray.get([f.remote(1) for _ in range(1000)])
+
+    for _ in range(10):
+        ray.get([f.remote(1) for _ in range(100)])
+
+    for _ in range(100):
+        ray.get([f.remote(1) for _ in range(10)])
+
+    for _ in range(1000):
+        ray.get([f.remote(1) for _ in range(1)])
+
+
+You can also add multiple nodes, each with different resource quantities:
+
+.. code-block:: python
+
+    mock_node = cluster.add_node(num_cpus=10)
+
+    assert ray.cluster_resources()["CPU"] == 20
+
+You can also remove nodes, which is useful when testing failure-handling logic:
+
+.. code-block:: python
+
+    cluster.remove_node(mock_node)
+
+    assert ray.cluster_resources()["CPU"] == 10
+
+See the `Cluster Util for more details `_.
+
+
+Tip 5: Be careful when running tests in parallel
+------------------------------------------------
+
+Since Ray starts a variety of services, it is easy to trigger timeouts if too many services are started at once. Therefore, when using tools such as `pytest xdist `_ that run multiple tests in parallel, keep in mind that this may introduce flakiness into the test environment.
\ No newline at end of file
diff --git a/python/ray/cluster_utils.py b/python/ray/cluster_utils.py
index a8cdf9497..a3eadd809 100644
--- a/python/ray/cluster_utils.py
+++ b/python/ray/cluster_utils.py
@@ -13,7 +13,7 @@ class Cluster:
connect=False,
head_node_args=None,
shutdown_at_exit=True):
- """Initializes the cluster.
+ """Initializes all services of a Ray cluster.
Args:
initialize_head (bool): Automatically start a Ray cluster