[docs] Make walkthrough and starting Ray materials clear (#7099)

* make starting ray a separate page
* concept
* Apply suggestions from code review
  Co-Authored-By: Edward Oakes <ed.nmi.oakes@gmail.com>
* more fics
* Apply suggestions from code review
  Co-Authored-By: Edward Oakes <ed.nmi.oakes@gmail.com>

Co-authored-by: Edward Oakes <ed.nmi.oakes@gmail.com>
2026-06-30 21:11:24 +08:00 · 2020-02-11 23:17:30 -08:00
parent 305eaaabe9
commit fc9352c588
10 changed files with 215 additions and 29 deletions
@@ -1,3 +1,5 @@
+.. _ref-automatic-cluster:
+
 Automatic Cluster Setup
 =======================

@@ -36,6 +38,8 @@ Test that it works by running the following commands from your local machine:

 .. tip:: For the AWS node configuration, you can set ``"ImageId: latest_dlami"`` to automatically use the newest `Deep Learning AMI <https://aws.amazon.com/machine-learning/amis/>`_ for your region. For example, ``head_node: {InstanceType: c5.xlarge, ImageId: latest_dlami}``.

+.. note:: You may see a message like ``bash: cannot set terminal process group (-1): Inappropriate ioctl for device`` followed by ``bash: no job control in this shell``. This message is harmless. If the cluster launcher fails, the cause is most likely something else.
+
 GCP
 ~~~

@@ -1,3 +1,5 @@
+.. _configuring-ray:
+
 Configuring Ray
 ===============

@@ -5,7 +7,7 @@ This page discusses the various way to configure Ray, both from the Python API
 and from the command line. Take a look at the ``ray.init`` `documentation
 <package-ref.html#ray.init>`__ for a complete overview of the configurations.

-.. important:: For the multi-node setting, you must first run `ray start` on the command line before ``ray.init`` in Python. On a single machine, you can run ``ray.init()`` without `ray start`.
+.. important:: For the multi-node setting, you must first run ``ray start`` on the command line to start the Ray cluster services on the machine, and then call ``ray.init`` in Python to connect to those services. On a single machine, you can call ``ray.init()`` without running ``ray start``; in that case, ``ray.init()`` both starts the Ray cluster services and connects to them.


 Cluster Resources
@@ -235,6 +235,7 @@ Getting Involved
   :maxdepth: -1
   :caption: Ray Core

+   walkthrough.rst
   using-ray.rst
   configure.rst
   cluster-index.rst
@@ -13,6 +13,9 @@ You can install the latest stable version of Ray as follows.

  pip install -U ray  # also recommended: ray[debug]

+
+.. _install-nightlies:
+
 Latest Snapshots (Nightlies)
 ----------------------------

@@ -40,6 +43,22 @@ master branch). To install these wheels, run the following command:
 .. _`MacOS Python 3.6`: https://s3-us-west-2.amazonaws.com/ray-wheels/latest/ray-0.9.0.dev0-cp36-cp36m-macosx_10_13_intel.whl
 .. _`MacOS Python 3.5`: https://s3-us-west-2.amazonaws.com/ray-wheels/latest/ray-0.9.0.dev0-cp35-cp35m-macosx_10_13_intel.whl

+Installing from a specific commit
+---------------------------------
+
+You can install the Ray wheels of any particular commit on ``master`` with the following template. You need to specify the commit hash, Ray version, Operating System, and Python version:
+
+.. code-block:: bash
+
+    pip install https://ray-wheels.s3-us-west-2.amazonaws.com/master/{COMMIT_HASH}/ray-{RAY_VERSION}-{PYTHON_VERSION}-{PYTHON_VERSION}m-{OS_VERSION}_intel.whl
+
+For example, here is the Ray 0.9.0.dev0 wheel for Python 3.5 on MacOS at commit ``a0ba4499ac645c9d3e82e68f3a281e48ad57f873``:
+
+.. code-block:: bash
+
+    pip install https://ray-wheels.s3-us-west-2.amazonaws.com/master/a0ba4499ac645c9d3e82e68f3a281e48ad57f873/ray-0.9.0.dev0-cp35-cp35m-macosx_10_13_intel.whl
+
+
 Installing Ray with Anaconda
 ----------------------------

@@ -3,8 +3,58 @@ Memory Management

 This page describes how memory management works in Ray, and how you can set memory quotas to ensure memory-intensive applications run predictably and reliably.

-Overview
--------
+Summary
+-------
+
+You can set memory quotas to ensure your application runs predictably on any Ray cluster configuration. If you're not sure which values to use, start with a conservative default configuration like the following and see whether any limits are hit.
+
+For Ray initialization on a single node, consider setting the following fields:
+
+.. code-block:: python
+
+  ray.init(
+      memory=2000 * 1024 * 1024,
+      object_store_memory=200 * 1024 * 1024,
+      driver_object_store_memory=100 * 1024 * 1024)
+
+For Ray usage on a cluster, consider setting the following fields both on the command line and in your Python script:
+
+.. tip:: 200 * 1024 * 1024 bytes is 200 MiB. Use double parentheses to evaluate math in Bash: ``$((200 * 1024 * 1024))``.
+
+.. code-block:: bash
+
+  # On the head node
+  ray start --head --redis-port=6379 \
+      --object-store-memory=$((200 * 1024 * 1024)) \
+      --memory=$((200 * 1024 * 1024)) \
+      --num-cpus=1
+
+  # On the worker node
+  ray start --object-store-memory=$((200 * 1024 * 1024)) \
+      --memory=$((200 * 1024 * 1024)) \
+      --num-cpus=1 \
+      --address=$RAY_HEAD_ADDRESS:6379
+
+.. code-block:: python
+
+  # In your Python script connecting to Ray:
+  ray.init(
+      address="auto",  # or "<hostname>:<port>" if not using the default port
+      driver_object_store_memory=100 * 1024 * 1024
+  )
+
+
+For any custom remote method or actor, you can set requirements as follows:
+
+.. code-block:: python
+
+  @ray.remote(
+      memory=2000 * 1024 * 1024,
+  )
+
+
+Concept Overview
+----------------

 There are several ways that Ray applications use memory:

@@ -79,14 +129,3 @@ Object store shared memory
 --------------------------

 Object store memory is also used to map objects returned by ``ray.get`` calls in shared memory. While an object is mapped in this way (i.e., there is a Python reference to the object), it is pinned and cannot be evicted from the object store. However, ray does not provide quota management for this kind of shared memory usage.
-
-Summary
-------
-
-You can set memory quotas to ensure your application runs predictably on any Ray cluster configuration. If you're not sure, you can start with a conservative default configuration like the following and see if any limits are hit:
-
-.. code-block:: python
-
-  @ray.remote(
-      memory=2000 * 1024 * 1024,
-      object_store_memory=200 * 1024 * 1024)
@@ -0,0 +1,107 @@
+Starting Ray
+============
+
+This page covers how to start Ray on a single machine or on a cluster of machines.
+
+.. contents:: :local:
+
+Installation
+------------
+
+Install Ray with ``pip install -U ray``. For the latest wheels (a snapshot of the ``master`` branch), you can use the instructions at :ref:`install-nightlies`.
+
+Starting Ray on a single machine
+--------------------------------
+
+You can start Ray by calling ``ray.init()`` in your Python script. This starts the local services that Ray uses to schedule remote tasks and actors, and then connects to them. Note that you must initialize Ray before calling any tasks or actors (i.e., ``function.remote()`` will not work until ``ray.init()`` is called).
+
+.. code-block:: python
+
+  import ray
+  ray.init()
+
+To stop or restart Ray, use ``ray.shutdown()``.
+
+.. code-block:: python
+
+  import ray
+  ray.init()
+  ... # ray program
+  ray.shutdown()
+
+
+To check if Ray is initialized, you can call ``ray.is_initialized()``:
+
+.. code-block:: python
+
+  import ray
+  ray.init()
+  assert ray.is_initialized()
+
+  ray.shutdown()
+  assert not ray.is_initialized()
+
+See the `Configuration <configure.html>`__ documentation for the various ways to configure Ray.
+
+Using Ray on a cluster
+----------------------
+
+There are two steps needed to use Ray in a distributed setting:
+
+    1. You must first start the Ray cluster.
+    2. You need to add the ``address`` parameter to ``ray.init`` (like ``ray.init(address=...)``). This causes Ray to connect to the existing cluster instead of starting a new one on the local node.
+
+If you have a Ray cluster specification (:ref:`ref-automatic-cluster`), you can launch a multi-node cluster with Ray initialized on each node with ``ray up``. **From your local machine/laptop**:
+
+.. code-block:: bash
+
+    ray up cluster.yaml
+
+You can monitor the Ray cluster status with ``ray monitor cluster.yaml`` and ssh into the head node with ``ray attach cluster.yaml``.
+
+Your Python script **only** needs to execute on one machine in the cluster (usually the head node). To connect your program to the Ray cluster, add the following to your Python script:
+
+.. code-block:: python
+
+    ray.init(address="auto")
+
+.. note:: Without ``ray.init(address=...)``, your Ray program will only be parallelized across a single machine!
+
+Manual cluster setup
+~~~~~~~~~~~~~~~~~~~~
+
+You can also use the manual cluster setup (:ref:`ref-cluster-setup`) by running initialization commands on each node.
+
+**On the head node**:
+
+.. code-block:: bash
+
+    # If the --redis-port argument is omitted, Ray will choose a port at random.
+    $ ray start --head --redis-port=6379
+
+The command will print out the address of the Redis server that was started (and some other address information).
+
+**Then on all of the other nodes**, run the following. Make sure to replace ``<address>`` with the value printed by the command on the head node (it should look something like ``123.45.67.89:6379``).
+
+.. code-block:: bash
+
+    $ ray start --address=<address>
+
+
+Turning off parallelism
+-----------------------
+
+.. caution:: This feature is maintained solely to help with debugging, so it's possible you may encounter some issues. If you do, please `file an issue <https://github.com/ray-project/ray/issues>`_.
+
+By default, Ray will parallelize its workload. However, if you need to debug your Ray program, it may be easier to run everything in a single process. You can force all Ray functions to run in a single process by enabling ``local_mode`` as follows:
+
+.. code-block:: python
+
+    ray.init(local_mode=True)
+
+Note that some behavior such as setting global process variables may not work as expected.
+
+What's next?
+------------
+
+Check out our `Deployment section <cluster-index.html>`_ for more information on deploying Ray in different settings, including Kubernetes, YARN, and SLURM.
@@ -105,7 +105,7 @@ Launching a cloud cluster

    If you have already have a list of nodes, go to the `Local Cluster Setup`_ section.

-Ray currently supports AWS and GCP. Below, we will launch nodes on AWS that will default to using the Deep Learning AMI. See the `cluster setup documentation <autoscaling.html>`_. Save the below cluster configuration (``tune-default.yaml``):
+Ray currently supports AWS and GCP. Follow the instructions below to launch nodes on AWS (using the Deep Learning AMI). See the `cluster setup documentation <autoscaling.html>`_ for details. Save the cluster configuration below as ``tune-default.yaml``:

 .. literalinclude:: ../../python/ray/tune/examples/tune-default.yaml
   :language: yaml
@@ -119,6 +119,8 @@ Ray currently supports AWS and GCP. Below, we will launch nodes on AWS that will

 ``ray submit --start`` starts a cluster as specified by the given cluster configuration YAML file, uploads ``tune_script.py`` to the cluster, and runs ``python tune_script.py [args]``.

+.. note:: You may see a message like ``bash: cannot set terminal process group (-1): Inappropriate ioctl for device`` followed by ``bash: no job control in this shell``. This message is harmless. If the cluster launcher fails, the cause is most likely something else.
+
 .. code-block:: bash

    ray submit tune-default.yaml tune_script.py --start --args="--ray-address=localhost:6379"
@@ -1,3 +1,5 @@
+.. _ref-cluster-setup:
+
 Manual Cluster Setup
 ====================

@@ -3,14 +3,14 @@ Using Ray

 If you’re brand new to Ray, we recommend starting with our `tutorials <https://github.com/ray-project/tutorial>`_.

-Below, you'll find information ranging from beginner material (like our `walkthrough <walkthrough.html>`_) to `advanced usage <advanced.html>`_. There are also detailed instructions on how to work with Ray concepts such as Actors and managing GPUs.
+Below, you'll find information ranging from how to `start Ray <starting-ray.html>`_ to `advanced usage <advanced.html>`_. There are also detailed instructions on how to work with Ray concepts such as Actors and managing GPUs.

 Finally, we've also included some content on using core Ray APIs with `Tensorflow <using-ray-with-tensorflow.html>`_ and `PyTorch <using-ray-with-pytorch.html>`_.

 .. toctree::
   :maxdepth: -1

-   walkthrough.rst
+   starting-ray.rst
   actors.rst
   using-ray-with-gpus.rst
   serialization.rst
@@ -1,14 +1,24 @@
-Walkthrough
-===========
+Ray Core Walkthrough
+====================

 This walkthrough will overview the core concepts of Ray:

-1. Using remote functions (tasks) [``ray.remote``]
-2. Fetching results (object IDs) [``ray.put``, ``ray.get``, ``ray.wait``]
-3. Using remote classes (actors) [``ray.remote``]
+1. Starting Ray
+2. Using remote functions (tasks) [``ray.remote``]
+3. Fetching results (object IDs) [``ray.put``, ``ray.get``, ``ray.wait``]
+4. Using remote classes (actors) [``ray.remote``]

-With Ray, your code will work on a single machine and can be easily scaled to a
-large cluster. To run this walkthrough, install Ray with ``pip install -U ray``.
+With Ray, your code will work on a single machine and can be easily scaled to a large cluster.
+
+Installation
+------------
+
+To run this walkthrough, install Ray with ``pip install -U ray``. For the latest wheels (a snapshot of ``master``), follow the instructions at :ref:`install-nightlies`.
+
+Starting Ray
+------------
+
+You can start Ray on a single machine by adding this to your Python script:

 .. code-block:: python

@@ -18,11 +28,11 @@ large cluster. To run this walkthrough, install Ray with ``pip install -U ray``.
  # ray.init(address=<cluster-address>) instead.
  ray.init()

-See the `Configuration <configure.html>`__ documentation for the various ways to
-configure Ray. To start a multi-node Ray cluster, see the `cluster setup page
-<using-ray-on-a-cluster.html>`__. You can stop ray by calling
-``ray.shutdown()``. To check if Ray is initialized, you can call
-``ray.is_initialized()``.
+  ...
+
+Ray will then be able to utilize all cores of your machine. Find out how to configure the number of cores Ray will use at :ref:`configuring-ray`.
+
+To start a multi-node Ray cluster, see the `cluster setup page <using-ray-on-a-cluster.html>`__.

 Remote functions (Tasks)
 ------------------------