Docker Updates (#308)

* new path for python build
* add flag
* build tar using git archive
* no exit from start_ray.sh
* update Docker instructions
* update build docker script
* add git revision
* fix typo
* bug fixes and clarifications
* mend
* add objectmanager ports to docker instructions
* rewording
* Small updates to documentation.
2026-07-02 19:05:42 +08:00 · 2017-02-28 18:57:51 -08:00
parent b91d9cba45
commit ad4b03bf7f
7 changed files with 368 additions and 65 deletions
@@ -1,8 +1,32 @@
 #!/bin/bash

-docker build -t ray-project/base-deps docker/base-deps
+while [[ $# -gt 0 ]]
+do
+key="$1"
+case $key in
+    --no-cache)
+    NO_CACHE="--no-cache"
+    ;;
+    --skip-examples)
+    SKIP_EXAMPLES=YES
+    ;;
+    *)
+    echo "Usage: build-docker.sh [ --no-cache ] [ --skip-examples ]"
+    exit 1
+esac
+shift
+done

-tar --exclude './docker' -c . > ./docker/deploy/ray.tar
+# Build base dependencies, allow caching
+docker build $NO_CACHE -t ray-project/base-deps docker/base-deps
+
+# Build the current Ray source
+git rev-parse HEAD > ./docker/deploy/git-rev
+git archive -o ./docker/deploy/ray.tar $(git rev-parse HEAD)
 docker build --no-cache -t ray-project/deploy docker/deploy
-rm ./docker/deploy/ray.tar
-docker build -t ray-project/examples docker/examples
+rm ./docker/deploy/ray.tar ./docker/deploy/git-rev
+
+
+if [ ! $SKIP_EXAMPLES ]; then
+    docker build $NO_CACHE -t ray-project/examples docker/examples
+fi
@@ -43,10 +43,15 @@ extensions = [
 # Add any paths that contain templates here, relative to this directory.
 templates_path = ['_templates']

-# The suffix(es) of source filenames.
-# You can specify multiple suffix as a list of string:
-# source_suffix = ['.rst', '.md']
-source_suffix = '.rst'
+# The suffix of source filenames.
+from recommonmark.parser import CommonMarkParser
+
+# The suffix of source filenames.
+source_suffix = ['.rst', '.md']
+
+source_parsers = {
+   '.md': CommonMarkParser,
+}

 # The encoding of source files.
 #source_encoding = 'utf-8-sig'
@@ -64,9 +69,9 @@ author = u'The Ray Team'
 # built documents.
 #
 # The short X.Y version.
-version = '0.01'
+from ray import __version__ as version
 # The full version, including alpha/beta/rc tags.
-release = '0.01'
+release = version

 # The language for content autogenerated by Sphinx. Refer to documentation
 # for a list of supported languages.
@@ -297,13 +302,3 @@ texinfo_documents = [
 # sudo pip install recommonmark

 # see also http://searchvoidstar.tumblr.com/post/125486358368/making-pdfs-from-markdown-on-readthedocsorg-using
-
-# The suffix of source filenames.
-from recommonmark.parser import CommonMarkParser
-
-# The suffix of source filenames.
-source_suffix = ['.rst', '.md']
-
-source_parsers = {
-   '.md': CommonMarkParser,
-}
@@ -6,7 +6,7 @@ Ray
 learning and reinforcement learning applications.*

 .. toctree::
-   :maxdepth: 0
+   :maxdepth: 1
   :caption: Installation

   install-on-ubuntu.md
@@ -15,7 +15,7 @@ learning and reinforcement learning applications.*
   installation-troubleshooting.md

 .. toctree::
-   :maxdepth: 0
+   :maxdepth: 1
   :caption: Examples

   example-hyperopt.md
@@ -24,7 +24,7 @@ learning and reinforcement learning applications.*
   using-ray-with-tensorflow.md

 .. toctree::
-   :maxdepth: 0
+   :maxdepth: 1
   :caption: Getting Started

   api.rst
@@ -43,3 +43,4 @@ learning and reinforcement learning applications.*

   using-ray-on-a-cluster.md
   using-ray-on-a-large-cluster.md
+   using-ray-and-docker-on-a-cluster.md
@@ -1,10 +1,54 @@
 # Installation on Docker

-You can install Ray on any platform that runs Docker. We do not presently publish Docker images for Ray, but you can build them yourself using the Ray distribution. Using Docker can provide a reliable way to get up and running quickly.
+You can install Ray on any platform that runs Docker. We do not presently publish Docker images for Ray, but you can build them yourself using the Ray distribution.
+
+Using Docker can streamline the build process and provide a reliable way to get up and running quickly.

 ## Install Docker

-The Docker Platform release is available for Mac, Windows, and Linux platforms. Please download the appropriate version from the [Docker website](https://www.docker.com/products/overview#/install_the_platform).
+### Mac, Linux, Windows platforms
+
+The Docker Platform release is available for Mac, Windows, and Linux platforms. Please download the appropriate version from the [Docker website](https://www.docker.com/products/overview#/install_the_platform) and follow the corresponding installation instructions.
+Linux users may find these [alternate instructions](https://www.digitalocean.com/community/tutorials/how-to-install-and-use-docker-on-ubuntu-16-04) helpful.
+
+### Docker installation on EC2 with Ubuntu
+
+The instructions below show in detail how to prepare an Amazon EC2 instance running Ubuntu 16.04 for use with Docker.
+
+Initialize the package repository and apply system updates:
+
+```
+sudo apt-get update
+sudo apt-get -y dist-upgrade
+```
+
+Install Docker and start the service:
+```
+sudo apt-get install -y docker.io
+sudo service docker start
+```
+
+Add the `ubuntu` user to the `docker` group to allow running Docker commands without sudo:
+```
+sudo usermod -a -G docker ubuntu
+```
+
+Initiate a new login to gain group permissions (alternatively, log out and log back in again):
+
+```
+exec sudo su -l ubuntu
+```
+
+Confirm that docker is running:
+
+```
+docker images
+```
+This should produce an empty table similar to the following:
+```
+REPOSITORY          TAG                 IMAGE ID            CREATED             SIZE
+```
+

 ## Clone the Ray repository

@@ -23,21 +67,50 @@ cd ray

 This script creates several Docker images:

- * The `ray-project/ray:deploy` image is a self-contained copy of code and binaries suitable for end users.
- * The `ray-project/ray:examples` adds additional libraries for running examples.
- * Ray developers who want to edit locally on the host filesystem should use the `ray-project/ray:devel` image, which allows local changes to be reflected immediately within the container.
+ * The `ray-project/deploy` image is a self-contained copy of code and binaries suitable for end users.
+ * The `ray-project/examples` image adds additional libraries for running examples.
+ * The `ray-project/base-deps` image builds from Ubuntu Xenial and includes Anaconda and other basic dependencies; it can serve as a starting point for developers.
+
+Review images by listing them:
+```
+$ docker images
+```
+
+Output should look something like the following:
+```
+REPOSITORY                          TAG                 IMAGE ID            CREATED             SIZE
+ray-project/examples                latest              7584bde65894        4 days ago          3.257 GB
+ray-project/deploy                  latest              970966166c71        4 days ago          2.899 GB
+ray-project/base-deps               latest              f45d66963151        4 days ago          2.649 GB
+ubuntu                              xenial              f49eec89601e        3 weeks ago         129.5 MB
+```
+

 ## Launch Ray in Docker

 Start out by launching the deployment container.

 ```
-docker run --shm-size=1024m -t -i ray-project/ray:deploy
+docker run --shm-size=<shm-size> -t -i ray-project/deploy
 ```

+Replace `<shm-size>` with a limit appropriate for your system, for example `512M` or `2G`.
+The `-t` and `-i` options here are required to support interactive use of the container.
+
+**Note:** Ray requires a **large** amount of shared memory because each object
+store keeps all of its objects in shared memory, so the amount of shared memory
+will limit the size of the object store.
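
One way to pick a value for `<shm-size>` is to derive it from the host's physical memory. The Python sketch below is illustrative only: the helper names and the 50% default fraction are assumptions made for this example, not Ray recommendations, and the `os.sysconf` keys it reads are Linux-specific.

```python
import os


def format_shm_size(num_bytes):
    """Format a byte count as a whole number of megabytes, e.g. '2048M'."""
    return "{}M".format(num_bytes // (1024 * 1024))


def suggest_shm_size(total_bytes, fraction=0.5):
    """Return a --shm-size value covering `fraction` of total memory.

    The default fraction is an arbitrary illustration, not a Ray guideline.
    """
    return format_shm_size(int(total_bytes * fraction))


def host_memory_bytes():
    """Total physical memory of this host (Linux-only sysconf keys)."""
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")


if __name__ == "__main__":
    print("--shm-size=" + suggest_shm_size(host_memory_bytes()))
```

The printed flag can be pasted into the `docker run` command in place of `--shm-size=<shm-size>`.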
+
+You should now see a prompt that looks something like:
+
+```
+root@ebc78f68d100:/ray#
+```
+
+
 ## Test if the installation succeeded

-To test if the installation was successful, try running some tests.
+To test if the installation was successful, try running some tests. Within the container shell, enter the following commands:

 ```
 python test/runtest.py # This tests basic functionality.
@@ -52,55 +125,27 @@ Ray includes a Docker image that includes dependencies necessary for running som

 Launch the examples container.
 ```
-docker run --shm-size=1024m -t -i ray-project/ray:examples
+docker run --shm-size=1024m -t -i ray-project/examples
 ```

 ### Hyperparameter optimization


 ```
-cd ~/ray/examples/hyperopt/
+cd /ray/examples/hyperopt/
 python driver.py
 ```

-See the [Hyperparameter optimization documentation](../examples/hyperopt/README.md).
-
 ### Batch L-BFGS

 ```
-cd ~/ray/examples/lbfgs/
+cd /ray/examples/lbfgs/
 python driver.py
 ```

-See the [Batch L-BFGS documentation](../examples/lbfgs/README.md).
-
 ### Learning to play Pong

 ```
-cd ~/ray/examples/rl_pong/
+cd /ray/examples/rl_pong/
 python driver.py
 ```
-
-See the [Learning to play Pong documentation](../examples/rl_pong/README.md).
-
-
-## Developing with Docker (Experimental)
-
-These steps apply only to Ray developers who prefer to use editing tools on the host machine while building and running Ray within Docker. If you have previously been building locally we suggest that you start with a clean checkout before building with Ray's developer Docker container.
-
-You may see errors while running `setup.sh` on Mac OS X. If you have this problem please try re-running the script. Increasing the memory of Docker's VM (say to 8GB from the default 2GB) seems to help.
-
-
-Launch the developer container.
-
-```
-docker run -v $(pwd):/home/ray-user/ray --shm-size=1024m -t -i ray-project/ray:devel
-```
-
-Build Ray inside of the container.
-
-```
-cd ray
-./setup.sh
-./build.sh
-```
@@ -0,0 +1,236 @@
+# Using Ray and Docker on a Cluster (EXPERIMENTAL)
+
+Packaging and deploying an application using Docker can provide certain advantages. It can make managing dependencies easier, help ensure that each cluster node receives a uniform configuration, and facilitate swapping hardware resources between applications.
+
+
+## Create your Docker image
+
+First build a Ray Docker image by following the instructions for [Installation on Docker](install-on-docker.md).
+This will allow you to create the `ray-project/deploy` image that serves as a basis for using Ray on a cluster with Docker.
+
+Docker images encapsulate the system state that will be used to run nodes in the cluster.
+We recommend building on top of the Ray-provided Docker images to add your application code and dependencies.
+
+You can do this in one of two ways: by building from a customized Dockerfile or by saving an image after entering commands manually into a running container.
+We describe both approaches below.
+
+### Creating a customized Dockerfile
+
+We recommend that you read the official Docker documentation for [Building your own image](https://docs.docker.com/engine/getstarted/step_four/) ahead of starting this section.
+Your customized Dockerfile is a script of commands needed to set up your application,
+possibly packaged in a folder with related resources.
+
+A simple template Dockerfile for a Ray application looks like this:
+
+```
+# Application Dockerfile template
+FROM ray-project/deploy
+RUN git clone <my-project-url>
+RUN <my-project-installation-script>
+```
+
+This file instructs Docker to load the image tagged `ray-project/deploy`, check out the git
+repository at `<my-project-url>`, and then run the script `<my-project-installation-script>`.
+
+Build the image by running something like:
+```
+docker build -t <app-tag> .
+```
+Replace `<app-tag>` with a tag of your choice.
+
+
+### Creating a Docker image manually
+
+Launch the `ray-project/deploy` image interactively:
+
+```
+docker run -t -i ray-project/deploy
+```
+
+Next, run whatever commands are needed to install your application.
+When you are finished, type `exit` to stop the container.
+
+Run
+```
+docker ps -a
+```
+to identify the id of the container you just exited.
+
+Next, commit the container to a new image:
+```
+docker commit <container-id> <app-tag>
+```
+
+Replace `<container-id>` with the ID of the container you just configured and replace `<app-tag>` with a name for the new image.
+
+## Publishing your Docker image to a repository
+
+When using Amazon EC2 it can be practical to publish images using the Repositories feature of Elastic Container Service.
+Follow the steps below and see [documentation for creating a repository](http://docs.aws.amazon.com/AmazonECR/latest/userguide/repository-create.html) for additional context.
+
+First ensure that the AWS command-line interface is installed.
+
+```
+sudo apt-get install -y awscli
+```
+
+Next create a repository in Amazon's Elastic Container Registry.
+This results in a shared resource for storing Docker images that will be accessible from all nodes.
+
+
+```
+aws ecr create-repository --repository-name <repository-name> --region=<region>
+```
+
+Replace `<repository-name>` with a string describing the application.
+Replace `<region>` with the AWS region string, e.g., `us-west-2`.
+This should produce output like the following:
+
+```
+{
+    "repository": {
+        "repositoryUri": "123456789012.dkr.ecr.us-west-2.amazonaws.com/my-app",
+        "createdAt": 1487227244.0,
+        "repositoryArn": "arn:aws:ecr:us-west-2:123456789012:repository/my-app",
+        "registryId": "123456789012",
+        "repositoryName": "my-app"
+    }
+}
+```
+
+Take note of the `repositoryUri` string, in this example `123456789012.dkr.ecr.us-west-2.amazonaws.com/my-app`.
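
If you script this step, the URI can be read from the command's JSON output rather than copied by hand. The following Python sketch assumes the output has the shape shown in the example above; `repository_uri` is a hypothetical helper name, not part of the AWS CLI.

```python
import json


def repository_uri(create_repository_output):
    """Extract repositoryUri from the JSON printed by `aws ecr create-repository`."""
    return json.loads(create_repository_output)["repository"]["repositoryUri"]


# Sample output copied from the example above.
SAMPLE_OUTPUT = """
{
    "repository": {
        "repositoryUri": "123456789012.dkr.ecr.us-west-2.amazonaws.com/my-app",
        "createdAt": 1487227244.0,
        "repositoryArn": "arn:aws:ecr:us-west-2:123456789012:repository/my-app",
        "registryId": "123456789012",
        "repositoryName": "my-app"
    }
}
"""

if __name__ == "__main__":
    print(repository_uri(SAMPLE_OUTPUT))
```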
+
+
+Tag the Docker image with the repository URI.
+
+```
+docker tag <app-tag> <repository-uri>
+```
+
+Replace `<app-tag>` with the image name used previously and replace `<repository-uri>` with the URI returned by the command used to create the repository.
+
+Log into the repository:
+
+```
+eval $(aws ecr get-login --region <region>)
+```
+
+Replace `<region>` with your selected AWS region.
+
+Push the image to the repository:
+```
+docker push <repository-uri>
+```
+Replace `<repository-uri>` with the URI of your repository. Now other hosts will be able to access your application Docker image.
+
+
+## Starting a cluster
+
+We assume a cluster configuration like that described in instructions for [using Ray on a large cluster](using-ray-on-a-large-cluster.md).
+In particular, we assume that there is a head node that has ssh access to all of the worker nodes, and that there is a file `workers.txt` listing the IP addresses of all worker nodes.
+
+### Install the Docker image on all nodes
+
+Create a script called `setup-docker.sh` on the head node.
+```
+# setup-docker.sh
+sudo apt-get install -y docker.io
+sudo service docker start
+sudo usermod -a -G docker ubuntu
+exec sudo su -l ubuntu
+eval $(aws ecr get-login --region <region>)
+docker pull <repository-uri>
+```
+
+Replace `<repository-uri>` with the URI of the repository created in the previous section.
+Replace `<region>` with the AWS region in which you created that repository.
+This script will install Docker, authenticate the session with the container registry, and download the container image from that registry.
+
+Run `setup-docker.sh` on the head node (if you used the head node to build the Docker image then you can skip this step):
+```
+bash setup-docker.sh
+```
+
+Run `setup-docker.sh` on the worker nodes:
+```
+parallel-ssh -h workers.txt -P -t 0 -I < setup-docker.sh
+```
+
+### Launch Ray cluster using Docker
+
+To start Ray on the head node run the following command:
+
+```
+eval $(aws ecr get-login --region <region>)
+docker run \
+    -d --shm-size=<shm-size> --net=host \
+    <repository-uri> \
+    /ray/scripts/start_ray.sh --head \
+        --object-manager-port=8076 \
+        --redis-port=6379 \
+        --num-workers=<num-workers>
+```
+
+Replace `<repository-uri>` with the URI of the repository.
+Replace `<region>` with the region of the repository.
+Replace `<num-workers>` with the number of workers, typically a number close to the number of cores in the system.
+Replace `<shm-size>` with the amount of shared memory to make available within the Docker container, e.g., `8G`.
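
The placeholder substitution for the head node can also be scripted. This Python sketch only assembles the argument list for the command shown above; the function name `head_node_command` and the sample values are assumptions for illustration, while the flags and default ports are those used in the command itself.

```python
import shlex


def head_node_command(repository_uri, shm_size, num_workers,
                      redis_port=6379, object_manager_port=8076):
    """Build the argument list for the head-node `docker run` command."""
    return [
        "docker", "run", "-d",
        "--shm-size={}".format(shm_size),
        "--net=host",
        repository_uri,
        "/ray/scripts/start_ray.sh", "--head",
        "--object-manager-port={}".format(object_manager_port),
        "--redis-port={}".format(redis_port),
        "--num-workers={}".format(num_workers),
    ]


if __name__ == "__main__":
    # Placeholder values for illustration only.
    argv = head_node_command(
        "123456789012.dkr.ecr.us-west-2.amazonaws.com/my-app", "8G", 4)
    print(" ".join(shlex.quote(arg) for arg in argv))
```

Building a list rather than a single string keeps each argument separate, so the result can be passed to `subprocess` without shell quoting concerns.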
+
+
+To start Ray on the worker nodes create a script `start-worker-docker.sh` with content like the following:
+```
+eval $(aws ecr get-login --region <region>)
+docker run -d --shm-size=<shm-size> --net=host \
+    <repository-uri> \
+    /ray/scripts/start_ray.sh \
+        --object-manager-port=8076 \
+        --redis-address=<redis-address> \
+        --num-workers=<num-workers>
+
+```
+
+Replace `<redis-address>` with the string `<head-node-private-ip>:6379` where `<head-node-private-ip>` is the private network IP address of the head node.
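
As a sketch of how the worker command differs from the head command, the Python below builds `<redis-address>` from the head node's private IP and assembles the worker-side argument list. The helper names and the sample IP address are hypothetical; the flags mirror the script above.

```python
def redis_address(head_node_private_ip, redis_port=6379):
    """Combine the head node's private IP and the Redis port into one string."""
    return "{}:{}".format(head_node_private_ip, redis_port)


def worker_node_command(repository_uri, shm_size, num_workers,
                        head_node_private_ip, object_manager_port=8076):
    """Build the argument list for the worker-node `docker run` command."""
    return [
        "docker", "run", "-d",
        "--shm-size={}".format(shm_size),
        "--net=host",
        repository_uri,
        "/ray/scripts/start_ray.sh",
        "--object-manager-port={}".format(object_manager_port),
        "--redis-address={}".format(redis_address(head_node_private_ip)),
        "--num-workers={}".format(num_workers),
    ]


if __name__ == "__main__":
    # 172.31.0.10 is a made-up private IP used only for illustration.
    print(" ".join(worker_node_command(
        "123456789012.dkr.ecr.us-west-2.amazonaws.com/my-app",
        "8G", 4, "172.31.0.10")))
```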
+
+Execute the script on the worker nodes:
+```
+parallel-ssh -h workers.txt -P -t 0 -I < start-worker-docker.sh
+```
+
+
+## Running jobs on a cluster
+
+On the head node, identify the id of the container that you launched as the Ray head.
+
+```
+docker ps
+```
+
+The container ID appears in the first column of the output.
+
+Now launch an interactive shell within the container:
+
+```
+docker exec -t -i <container-id> bash
+```
+
+Replace `<container-id>` with the container id found in the previous step.
+
+Next, launch your application program.
+The Python program should contain an initialization command that takes the Redis address as a parameter:
+
+```
+ray.init(redis_address="<redis-address>")
+```
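
A driver script might read the head node address from its environment instead of hard-coding it. The sketch below is a minimal illustration: the `RAY_HEAD_IP` variable and the helper name are assumptions, and the `ray.init(redis_address=...)` call is the one shown above, so it applies to the Ray version this document describes.

```python
import os


def resolve_redis_address(environ, redis_port=6379):
    """Build the Redis address from RAY_HEAD_IP (a hypothetical variable)."""
    head_ip = environ.get("RAY_HEAD_IP")
    if not head_ip:
        raise ValueError("set RAY_HEAD_IP to the head node's private IP address")
    return "{}:{}".format(head_ip, redis_port)


def main():
    # Imported here so the helper above can be used without Ray installed.
    import ray

    ray.init(redis_address=resolve_redis_address(os.environ))


if __name__ == "__main__":
    if "RAY_HEAD_IP" in os.environ:
        main()
    else:
        print("RAY_HEAD_IP is not set; nothing to do.")
```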
+
+
+## Shutting down a cluster
+
+Kill all running Docker containers on the worker nodes:
+```
+parallel-ssh -h workers.txt -P 'docker kill $(docker ps -q)'
+```
+
+Kill all running Docker containers on the head node:
+```
+docker kill $(docker ps -q)
+```
@@ -10,5 +10,5 @@ RUN echo 'export PATH=/opt/conda/bin:$PATH' > /etc/profile.d/conda.sh \
    && /bin/bash /tmp/anaconda.sh -b -p /opt/conda \
    && rm /tmp/anaconda.sh
 ENV PATH "/opt/conda/bin:$PATH"
-RUN conda install libgcc
-RUN pip install --upgrade pip
+RUN conda install -y libgcc
+RUN pip install --upgrade pip cloudpickle
@@ -3,6 +3,8 @@

 FROM ray-project/base-deps
 ADD ray.tar /ray
-WORKDIR /ray/lib/python
+ADD git-rev /ray/git-rev
+WORKDIR /ray/python
 RUN python setup.py install
 WORKDIR /ray
+RUN echo "tail -f /dev/null" >> scripts/start_ray.sh