diff --git a/doc/source/autoscaler-status.png b/doc/source/autoscaler-status.png
new file mode 100644
index 000000000..5ee87decd
Binary files /dev/null and b/doc/source/autoscaler-status.png differ
diff --git a/doc/source/autoscaling.rst b/doc/source/autoscaling.rst
index 570aabe32..a882abb75 100644
--- a/doc/source/autoscaling.rst
+++ b/doc/source/autoscaling.rst
@@ -1,7 +1,7 @@
-Cluster setup and auto-scaling (Experimental)
-=============================================
+Cloud Setup and Auto-Scaling
+============================
 
-The Ray ``create_or_update`` command starts an AWS Ray cluster from your personal computer. Once the cluster is up, you can then SSH into it to run Ray programs.
+The ``ray create_or_update`` command starts an AWS Ray cluster from your personal computer. Once the cluster is up, you can then SSH into it to run Ray programs.
 
 Quick start
 -----------
@@ -9,11 +9,10 @@ Quick start
 First, ensure you have configured your AWS credentials in ``~/.aws/credentials``, as described in `the boto docs `__.
 
-Then you're ready to go. The provided `ray/python/ray/autoscaler/aws/example.yaml `__ cluster config file will create a small cluster with a m4.large
-head node (on-demand), and two m4.large `spot workers `__.
+Then you're ready to go. The provided `ray/python/ray/autoscaler/aws/example.yaml `__ cluster config file will create a small cluster with a m4.large head node (on-demand), and two m4.large `spot workers `__, configured to autoscale up to four m4.large workers.
 
 Try it out by running these commands from your personal computer. Once the cluster is started, you can then
-SSH into the head node to run Ray programs with ``ray.init(redis_address=":6379")``.
+SSH into the head node to run Ray programs with ``ray.init(redis_address=":6379")``.
 
 .. code-block:: bash
 
@@ -21,22 +20,70 @@ SSH into the head node to run Ray programs with ``ray.init(redis_address="",
+    }
+
+    setup_commands:
+        - test -e || git clone https://github.com//.git
+        - cd && git fetch && git checkout `cat /tmp/current_branch_sha`
+
+This tells ``ray create_or_update`` to sync the current git branch SHA from your personal computer to a temporary file on the cluster. Then, the setup commands read that file to figure out which SHA they should checkout on the nodes. The final workflow to update the cluster then becomes just this:
+
+1. Make local changes to a git branch
+2. Commit the changes with ``git commit`` and ``git push``
+3. Update files on your Ray cluster with ``ray create_or_update``
+
+Common cluster configurations
+-----------------------------
+
+The ``example.yaml`` configuration is enough to get started with Ray, but for more compute intensive workloads you will want to change the instance types to e.g. use GPU or larger compute instance by editing the yaml file. Here are a few common configurations:
 
 **GPU single node**: use Ray on a single large GPU instance.
 
@@ -84,10 +131,9 @@ with GPU worker nodes instead.
     worker_nodes:
         InstanceMarketOptions:
             MarketType: spot
-        InstanceType: p2.8xlarge
+        InstanceType: p2.xlarge
 
 Additional Cloud providers
 --------------------------
 
-To use Ray autoscaling on other Cloud providers or cluster management systems, you can implement the ``NodeProvider`` interface
-(~100 LOC) and register it in `node_provider.py `__.
+To use Ray autoscaling on other Cloud providers or cluster management systems, you can implement the ``NodeProvider`` interface (~100 LOC) and register it in `node_provider.py `__. Contributions are welcome!
diff --git a/doc/source/using-ray-on-a-cluster.rst b/doc/source/using-ray-on-a-cluster.rst
index 4e57e3bda..7135f0c2f 100644
--- a/doc/source/using-ray-on-a-cluster.rst
+++ b/doc/source/using-ray-on-a-cluster.rst
@@ -1,6 +1,10 @@
 Using Ray on a Cluster
 ======================
 
+.. note::
+
+    Starting with Ray 0.4.0 if you're using AWS you can use the automated `setup commands `__.
+
 The instructions in this document work well for small clusters. For larger clusters, follow the instructions for `managing a cluster with parallel ssh`_.
 
diff --git a/doc/source/using-ray-on-a-large-cluster.rst b/doc/source/using-ray-on-a-large-cluster.rst
index 0d31c4c9b..56e2e02b1 100644
--- a/doc/source/using-ray-on-a-large-cluster.rst
+++ b/doc/source/using-ray-on-a-large-cluster.rst
@@ -1,6 +1,10 @@
 Using Ray on a Large Cluster
 ============================
 
+.. note::
+
+    Starting with Ray 0.4.0 if you're using AWS you can use the automated `setup commands `__.
+
 Deploying Ray on a cluster requires a bit of manual work. The instructions here illustrate how to use parallel ssh commands to simplify the process of running commands and scripts on many machines simultaneously.
 
diff --git a/python/ray/autoscaler/aws/example.yaml b/python/ray/autoscaler/aws/example.yaml
index 933d3db8b..82375664d 100644
--- a/python/ray/autoscaler/aws/example.yaml
+++ b/python/ray/autoscaler/aws/example.yaml
@@ -3,7 +3,7 @@ cluster_name: default
 
 # The minimum number of workers nodes to launch in addition to the head
 # node. This number should be >= 0.
-min_workers: 0
+min_workers: 2
 
 # The maximum number of workers nodes to launch in addition to the head
 # node. This takes precedence over min_workers.
@@ -70,8 +70,7 @@ setup_commands:
     # Note: if you're developing Ray, you probably want to create an AMI that
     # has your Ray repo pre-cloned. Then, you can replace the pip installs
     # below with a git checkout (and possibly a recompile).
-    # TODO(ekl) update this to a wheel from master
-    - pip install -U https://s3-us-west-2.amazonaws.com/ray-wheels/f5ea44338eca392df3a868035df3901829cc2ca1/ray-0.3.0-cp36-cp36m-manylinux1_x86_64.whl
+    - pip install -U https://s3-us-west-2.amazonaws.com/ray-wheels/b6c42f96beab3ee00fe4b246e5e9d0479ad379ca/ray-0.3.0-cp36-cp36m-manylinux1_x86_64.whl
 
 # Custom commands that will be run on the head node after common setup.
 head_setup_commands: