Docker Updates (#308)

* new path for python build

* add flag

* build tar using git archive

* no exit from start_ray.sh

* update Docker instructions

* update build docker script

* add git revision

* fix typo

* bug fixes and clarifications

* mend

* add objectmanager ports to docker instructions

* rewording

* Small updates to documentation.
This commit is contained in:
Johann Schleier-Smith
2017-02-28 18:57:51 -08:00
committed by Robert Nishihara
parent b91d9cba45
commit ad4b03bf7f
7 changed files with 368 additions and 65 deletions
+28 -4
@@ -1,8 +1,32 @@
#!/bin/bash
docker build -t ray-project/base-deps docker/base-deps
while [[ $# -gt 0 ]]
do
key="$1"
case $key in
--no-cache)
NO_CACHE="--no-cache"
;;
--skip-examples)
SKIP_EXAMPLES=YES
;;
*)
echo "Usage: build-docker.sh [ --no-cache ] [ --skip-examples ]"
exit 1
esac
shift
done
tar --exclude './docker' -c . > ./docker/deploy/ray.tar
# Build base dependencies, allow caching
docker build $NO_CACHE -t ray-project/base-deps docker/base-deps
# Build the current Ray source
git rev-parse HEAD > ./docker/deploy/git-rev
git archive -o ./docker/deploy/ray.tar $(git rev-parse HEAD)
docker build --no-cache -t ray-project/deploy docker/deploy
rm ./docker/deploy/ray.tar
docker build -t ray-project/examples docker/examples
rm ./docker/deploy/ray.tar ./docker/deploy/git-rev
if [ ! $SKIP_EXAMPLES ]; then
docker build $NO_CACHE -t ray-project/examples docker/examples
fi
+11 -16
@@ -43,10 +43,15 @@ extensions = [
# Add any paths that contain templates here, relative to this directory.
templates_path = ['_templates']
# The suffix(es) of source filenames.
# You can specify multiple suffix as a list of string:
# source_suffix = ['.rst', '.md']
source_suffix = '.rst'
# The suffix of source filenames.
from recommonmark.parser import CommonMarkParser
# The suffix of source filenames.
source_suffix = ['.rst', '.md']
source_parsers = {
'.md': CommonMarkParser,
}
# The encoding of source files.
#source_encoding = 'utf-8-sig'
@@ -64,9 +69,9 @@ author = u'The Ray Team'
# built documents.
#
# The short X.Y version.
version = '0.01'
from ray import __version__ as version
# The full version, including alpha/beta/rc tags.
release = '0.01'
release = version
# The language for content autogenerated by Sphinx. Refer to documentation
# for a list of supported languages.
@@ -297,13 +302,3 @@ texinfo_documents = [
# sudo pip install recommonmark
# see also http://searchvoidstar.tumblr.com/post/125486358368/making-pdfs-from-markdown-on-readthedocsorg-using
# The suffix of source filenames.
from recommonmark.parser import CommonMarkParser
# The suffix of source filenames.
source_suffix = ['.rst', '.md']
source_parsers = {
'.md': CommonMarkParser,
}
+4 -3
@@ -6,7 +6,7 @@ Ray
learning and reinforcement learning applications.*
.. toctree::
:maxdepth: 0
:maxdepth: 1
:caption: Installation
install-on-ubuntu.md
@@ -15,7 +15,7 @@ learning and reinforcement learning applications.*
installation-troubleshooting.md
.. toctree::
:maxdepth: 0
:maxdepth: 1
:caption: Examples
example-hyperopt.md
@@ -24,7 +24,7 @@ learning and reinforcement learning applications.*
using-ray-with-tensorflow.md
.. toctree::
:maxdepth: 0
:maxdepth: 1
:caption: Getting Started
api.rst
@@ -43,3 +43,4 @@ learning and reinforcement learning applications.*
using-ray-on-a-cluster.md
using-ray-on-a-large-cluster.md
using-ray-and-docker-on-a-cluster.md
+84 -39
@@ -1,10 +1,54 @@
# Installation on Docker
You can install Ray on any platform that runs Docker. We do not presently publish Docker images for Ray, but you can build them yourself using the Ray distribution. Using Docker can provide a reliable way to get up and running quickly.
You can install Ray on any platform that runs Docker. We do not presently publish Docker images for Ray, but you can build them yourself using the Ray distribution.
Using Docker can streamline the build process and provide a reliable way to get up and running quickly.
## Install Docker
The Docker Platform release is available for Mac, Windows, and Linux platforms. Please download the appropriate version from the [Docker website](https://www.docker.com/products/overview#/install_the_platform).
### Mac, Linux, Windows platforms
The Docker Platform release is available for Mac, Windows, and Linux platforms. Please download the appropriate version from the [Docker website](https://www.docker.com/products/overview#/install_the_platform) and follow the corresponding installation instructions.
Linux users may find these [alternate instructions](https://www.digitalocean.com/community/tutorials/how-to-install-and-use-docker-on-ubuntu-16-04) helpful.
### Docker installation on EC2 with Ubuntu
The instructions below show in detail how to prepare an Amazon EC2 instance running Ubuntu 16.04 for use with Docker.
Initialize the package repository and apply system updates:
```
sudo apt-get update
sudo apt-get -y dist-upgrade
```
Install Docker and start the service:
```
sudo apt-get install -y docker.io
sudo service docker start
```
Add the `ubuntu` user to the `docker` group to allow running Docker commands without sudo:
```
sudo usermod -a -G docker ubuntu
```
Initiate a new login to gain group permissions (alternatively, log out and log back in again):
```
exec sudo su -l ubuntu
```
Confirm that Docker is running:
```
docker images
```
This should produce an empty table similar to the following:
```
REPOSITORY TAG IMAGE ID CREATED SIZE
```
## Clone the Ray repository
@@ -23,21 +67,50 @@ cd ray
This script creates several Docker images:
* The `ray-project/ray:deploy` image is a self-contained copy of code and binaries suitable for end users.
* The `ray-project/ray:examples` adds additional libraries for running examples.
* Ray developers who want to edit locally on the host filesystem should use the `ray-project/ray:devel` image, which allows local changes to be reflected immediately within the container.
* The `ray-project/deploy` image is a self-contained copy of code and binaries suitable for end users.
* The `ray-project/examples` image adds additional libraries for running examples.
* The `ray-project/base-deps` image builds from Ubuntu Xenial and includes Anaconda and other basic dependencies. It can serve as a starting point for developers.
Review images by listing them:
```
$ docker images
```
Output should look something like the following:
```
REPOSITORY TAG IMAGE ID CREATED SIZE
ray-project/examples latest 7584bde65894 4 days ago 3.257 GB
ray-project/deploy latest 970966166c71 4 days ago 2.899 GB
ray-project/base-deps latest f45d66963151 4 days ago 2.649 GB
ubuntu xenial f49eec89601e 3 weeks ago 129.5 MB
```
## Launch Ray in Docker
Start out by launching the deployment container.
```
docker run --shm-size=1024m -t -i ray-project/ray:deploy
docker run --shm-size=<shm-size> -t -i ray-project/deploy
```
Replace `<shm-size>` with a limit appropriate for your system, for example `512M` or `2G`.
The `-t` and `-i` options here are required to support interactive use of the container.
**Note:** Ray requires a **large** amount of shared memory because each object
store keeps all of its objects in shared memory, so the amount of shared memory
will limit the size of the object store.
You should now see a prompt that looks something like:
```
root@ebc78f68d100:/ray#
```
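Because the object store lives in shared memory, it can be useful to confirm that the `--shm-size` limit took effect. The following is a minimal sketch, assuming that shared memory is mounted at `/dev/shm` inside the container (the usual Docker default) and that the helper names `shared_memory_bytes` and `format_gib` are purely illustrative and not part of Ray:

```python
import os


def shared_memory_bytes(path="/dev/shm"):
    """Return the total size in bytes of the filesystem mounted at `path`.

    Assumption: shared memory is mounted at /dev/shm inside the container.
    """
    stats = os.statvfs(path)
    # f_frsize is the fragment size, f_blocks the number of fragments.
    return stats.f_frsize * stats.f_blocks


def format_gib(num_bytes):
    """Format a byte count as GiB with two decimal places."""
    return "{:.2f} GiB".format(num_bytes / float(2 ** 30))
```

Calling `format_gib(shared_memory_bytes())` from a Python prompt in the container should report a value close to the `<shm-size>` passed to `docker run`.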
## Test if the installation succeeded
To test if the installation was successful, try running some tests.
To test if the installation was successful, try running some tests. Within the container shell enter the following commands:
```
python test/runtest.py # This tests basic functionality.
@@ -52,55 +125,27 @@ Ray includes a Docker image that includes dependencies necessary for running som
Launch the examples container.
```
docker run --shm-size=1024m -t -i ray-project/ray:examples
docker run --shm-size=1024m -t -i ray-project/examples
```
### Hyperparameter optimization
```
cd ~/ray/examples/hyperopt/
cd /ray/examples/hyperopt/
python driver.py
```
See the [Hyperparameter optimization documentation](../examples/hyperopt/README.md).
### Batch L-BFGS
```
cd ~/ray/examples/lbfgs/
cd /ray/examples/lbfgs/
python driver.py
```
See the [Batch L-BFGS documentation](../examples/lbfgs/README.md).
### Learning to play Pong
```
cd ~/ray/examples/rl_pong/
cd /ray/examples/rl_pong/
python driver.py
```
See the [Learning to play Pong documentation](../examples/rl_pong/README.md).
## Developing with Docker (Experimental)
These steps apply only to Ray developers who prefer to use editing tools on the host machine while building and running Ray within Docker. If you have previously been building locally we suggest that you start with a clean checkout before building with Ray's developer Docker container.
You may see errors while running `setup.sh` on Mac OS X. If you have this problem please try re-running the script. Increasing the memory of Docker's VM (say to 8GB from the default 2GB) seems to help.
Launch the developer container.
```
docker run -v $(pwd):/home/ray-user/ray --shm-size=1024m -t -i ray-project/ray:devel
```
Build Ray inside of the container.
```
cd ray
./setup.sh
./build.sh
```
@@ -0,0 +1,236 @@
# Using Ray and Docker on a Cluster (EXPERIMENTAL)
Packaging and deploying an application using Docker can provide certain advantages. It can make managing dependencies easier, help ensure that each cluster node receives a uniform configuration, and facilitate swapping hardware resources between applications.
## Create your Docker image
First build a Ray Docker image by following the instructions for [Installation on Docker](install-on-docker.md).
This will allow you to create the `ray-project/deploy` image that serves as a basis for using Ray on a cluster with Docker.
Docker images encapsulate the system state that will be used to run nodes in the cluster.
We recommend building on top of the Ray-provided Docker images to add your application code and dependencies.
You can do this in one of two ways: by building from a customized Dockerfile or by saving an image after entering commands manually into a running container.
We describe both approaches below.
### Creating a customized Dockerfile
We recommend that you read the official Docker documentation for [Building your own image](https://docs.docker.com/engine/getstarted/step_four/) ahead of starting this section.
Your customized Dockerfile is a script of commands needed to set up your application,
possibly packaged in a folder with related resources.
A simple template Dockerfile for a Ray application looks like this:
```
# Application Dockerfile template
FROM ray-project/deploy
RUN git clone <my-project-url>
RUN <my-project-installation-script>
```
This file instructs Docker to load the image tagged `ray-project/deploy`, check out the git
repository at `<my-project-url>`, and then run the script `<my-project-installation-script>`.
Build the image by running something like:
```
docker build -t <app-tag> .
```
Replace `<app-tag>` with a tag of your choice.
### Creating a Docker image manually
Launch the `ray-project/deploy` image interactively:
```
docker run -t -i ray-project/deploy
```
Next, run whatever commands are needed to install your application.
When you are finished type `exit` to stop the container.
Run
```
docker ps -a
```
to identify the id of the container you just exited.
Next, commit the container:
```
docker commit <container-id> <app-tag>
```
Replace `<container-id>` with the id of the container you just configured and replace `<app-tag>` with a name for the resulting image.
## Publishing your Docker image to a repository
When using Amazon EC2 it can be practical to publish images using the Repositories feature of Elastic Container Service.
Follow the steps below and see [documentation for creating a repository](http://docs.aws.amazon.com/AmazonECR/latest/userguide/repository-create.html) for additional context.
First ensure that the AWS command-line interface is installed.
```
sudo apt-get install -y awscli
```
Next create a repository in Amazon's Elastic Container Registry.
This results in a shared resource for storing Docker images that will be accessible from all nodes.
```
aws ecr create-repository --repository-name <repository-name> --region=<region>
```
Replace `<repository-name>` with a string describing the application.
Replace `<region>` with the AWS region string, e.g., `us-west-2`.
This should produce output like the following:
```
{
"repository": {
"repositoryUri": "123456789012.dkr.ecr.us-west-2.amazonaws.com/my-app",
"createdAt": 1487227244.0,
"repositoryArn": "arn:aws:ecr:us-west-2:123456789012:repository/my-app",
"registryId": "123456789012",
"repositoryName": "my-app"
}
}
```
Take note of the `repositoryUri` string, in this example `123456789012.dkr.ecr.us-west-2.amazonaws.com/my-app`.
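If you script this step, the URI can be read from the JSON output instead of being copied by hand. This is a minimal sketch, assuming the output has the shape shown above; the function name `repository_uri` is illustrative:

```python
import json


def repository_uri(create_repository_output):
    """Extract `repositoryUri` from the JSON printed by
    `aws ecr create-repository`.

    Assumption: the output has the shape shown in the example above.
    """
    parsed = json.loads(create_repository_output)
    return parsed["repository"]["repositoryUri"]
```

The returned string is the value to substitute for `<repository-uri>` in the commands that follow.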
Tag the Docker image with the repository URI.
```
docker tag <app-tag> <repository-uri>
```
Replace `<app-tag>` with the image name used previously and replace `<repository-uri>` with the URI returned by the command used to create the repository.
Log into the repository:
```
eval $(aws ecr get-login --region <region>)
```
Replace `<region>` with your selected AWS region.
Push the image to the repository:
```
docker push <repository-uri>
```
Replace `<repository-uri>` with the URI of your repository. Now other hosts will be able to access your application Docker image.
## Starting a cluster
We assume a cluster configuration like that described in instructions for [using Ray on a large cluster](using-ray-on-a-large-cluster.md).
In particular, we assume that there is a head node that has ssh access to all of the worker nodes, and that there is a file `workers.txt` listing the IP addresses of all worker nodes.
### Install the Docker image on all nodes
Create a script called `setup-docker.sh` on the head node.
```
# setup-docker.sh
sudo apt-get install -y docker.io
sudo service docker start
sudo usermod -a -G docker ubuntu
exec sudo su -l ubuntu
eval $(aws ecr get-login --region <region>)
docker pull <repository-uri>
```
Replace `<repository-uri>` with the URI of the repository created in the previous section.
Replace `<region>` with the AWS region in which you created that repository.
This script will install Docker, authenticate the session with the container registry, and download the container image from that registry.
Run `setup-docker.sh` on the head node (if you used the head node to build the Docker image then you can skip this step):
```
bash setup-docker.sh
```
Run `setup-docker.sh` on the worker nodes:
```
parallel-ssh -h workers.txt -P -t 0 -I < setup-docker.sh
```
### Launch Ray cluster using Docker
To start Ray on the head node run the following command:
```
eval $(aws ecr get-login --region <region>)
docker run \
-d --shm-size=<shm-size> --net=host \
<repository-uri> \
/ray/scripts/start_ray.sh --head \
--object-manager-port=8076 \
--redis-port=6379 \
--num-workers=<num-workers>
```
Replace `<repository-uri>` with the URI of the repository.
Replace `<region>` with the region of the repository.
Replace `<num-workers>` with the number of workers, typically a number similar to the number of cores in the system.
Replace `<shm-size>` with the amount of shared memory to make available within the Docker container, e.g., `8G`.
To start Ray on the worker nodes create a script `start-worker-docker.sh` with content like the following:
```
eval $(aws ecr get-login --region <region>)
docker run -d --shm-size=<shm-size> --net=host \
<repository-uri> \
/ray/scripts/start_ray.sh \
--object-manager-port=8076 \
--redis-address=<redis-address> \
--num-workers=<num-workers>
```
Replace `<redis-address>` with the string `<head-node-private-ip>:6379` where `<head-node-private-ip>` is the private network IP address of the head node.
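The head and worker commands differ only in a few flags, which the following sketch makes explicit. It only assembles the argument list and does not run Docker; the function name `ray_docker_command` and its parameters are illustrative assumptions, while the flags themselves are the ones shown above:

```python
def ray_docker_command(repository_uri, shm_size, num_workers,
                       redis_address=None,
                       redis_port=6379, object_manager_port=8076):
    """Build the `docker run` argument list for a Ray node.

    With `redis_address=None` the command starts the head node;
    otherwise it starts a worker that connects to `redis_address`.
    """
    command = [
        "docker", "run", "-d",
        "--shm-size={}".format(shm_size),
        "--net=host",
        repository_uri,
        "/ray/scripts/start_ray.sh",
    ]
    if redis_address is None:
        # Head node: start Redis on a fixed port.
        command.append("--head")
        command.append("--object-manager-port={}".format(object_manager_port))
        command.append("--redis-port={}".format(redis_port))
    else:
        # Worker node: connect to the Redis server on the head node.
        command.append("--object-manager-port={}".format(object_manager_port))
        command.append("--redis-address={}".format(redis_address))
    command.append("--num-workers={}".format(num_workers))
    return command
```

Joining the list with spaces, for example `" ".join(ray_docker_command("<repository-uri>", "8G", 4))`, reproduces the head node command on one line.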
Execute the script on the worker nodes:
```
parallel-ssh -h workers.txt -P -t 0 -I < start-worker-docker.sh
```
## Running jobs on a cluster
On the head node, identify the id of the container that you launched as the Ray head.
```
docker ps
```
The container id appears in the first column of the output.
Now launch an interactive shell within the container:
```
docker exec -t -i <container-id> bash
```
Replace `<container-id>` with the container id found in the previous step.
Next, launch your application program.
The Python program should contain an initialization command that takes the Redis address as a parameter:
```
ray.init(redis_address="<redis-address>")
```
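A malformed address is an easy mistake to make here, so a driver script may want to check the string before connecting. This is a minimal sketch, assuming the `<head-node-private-ip>:6379` format described earlier; `parse_redis_address` is an illustrative helper and not part of the Ray API:

```python
def parse_redis_address(redis_address):
    """Split "<ip>:<port>" into (ip, port) and raise ValueError
    if the string does not have that form."""
    host, separator, port = redis_address.rpartition(":")
    if not separator or not host or not port.isdigit():
        raise ValueError(
            "Expected '<ip>:<port>', got {!r}".format(redis_address))
    return host, int(port)


# Usage in a driver program (requires Ray, so it is left commented out):
# import ray
# address = "<redis-address>"
# parse_redis_address(address)  # Raises ValueError on a malformed address.
# ray.init(redis_address=address)
```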
## Shutting down a cluster
Kill all running Docker containers on the worker nodes:
```
parallel-ssh -h workers.txt -P 'docker kill $(docker ps -q)'
```
Kill all running Docker containers on the head node:
```
docker kill $(docker ps -q)
```
+2 -2
@@ -10,5 +10,5 @@ RUN echo 'export PATH=/opt/conda/bin:$PATH' > /etc/profile.d/conda.sh \
&& /bin/bash /tmp/anaconda.sh -b -p /opt/conda \
&& rm /tmp/anaconda.sh
ENV PATH "/opt/conda/bin:$PATH"
RUN conda install libgcc
RUN pip install --upgrade pip
RUN conda install -y libgcc
RUN pip install --upgrade pip cloudpickle
+3 -1
@@ -3,6 +3,8 @@
FROM ray-project/base-deps
ADD ray.tar /ray
WORKDIR /ray/lib/python
ADD git-rev /ray/git-rev
WORKDIR /ray/python
RUN python setup.py install
WORKDIR /ray
RUN echo "tail -f /dev/null" >> scripts/start_ray.sh