Eric Liang b6c42f96be Auto-scale ray clusters based on GCS load metrics (#1348)
This adds (experimental) auto-scaling support for Ray clusters based on GCS load metrics. The auto-scaling algorithm is as follows:

1. Based on the current (instantaneous) load information, we compute the approximate number of "used workers". This is based on the bottleneck resource: for example, if 8/8 GPUs are used in an 8-node cluster but all the CPUs are idle, the number of used nodes is still counted as 8. This number can also be fractional.
2. We scale that number by 1 / target_utilization_fraction and round up to determine the target cluster size (subject to the max_workers constraint). The autoscaler control loop takes care of launching new nodes until the target cluster size is met.
3. When a node is idle for more than idle_timeout_minutes, we remove it from the cluster, provided that doing so would not drop the cluster size below min_workers.

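
The sizing rule described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the autoscaler's actual code: the function names ``used_nodes`` and ``target_cluster_size`` and the shape of the ``usage`` dictionary are hypothetical, and clamping the target to ``min_workers`` is an assumption (the description only states the ``max_workers`` cap and the ``min_workers`` floor for idle-node removal).

```python
import math


def used_nodes(usage, num_nodes):
    """Approximate number of busy nodes, driven by the bottleneck resource.

    `usage` (hypothetical format) maps a resource name to a
    (used, total) pair summed over the whole cluster.
    """
    fractions = [used / total for used, total in usage.values() if total > 0]
    # The most heavily used resource decides how "full" the cluster is.
    return max(fractions, default=0.0) * num_nodes


def target_cluster_size(used, target_utilization_fraction, min_workers, max_workers):
    """Scale the used-node count by 1 / target_utilization_fraction and round up."""
    ideal = math.ceil(used / target_utilization_fraction)
    # Assumption: the result is clamped to [min_workers, max_workers].
    return max(min_workers, min(max_workers, ideal))


# 8/8 GPUs busy on an 8-node cluster with idle CPUs still counts as 8 used nodes.
busy = used_nodes({"CPU": (0, 64), "GPU": (8, 8)}, num_nodes=8)
print(busy, target_cluster_size(busy, 0.5, min_workers=0, max_workers=20))
```

With a target utilization of 0.5, a fully busy 8-node cluster asks for twice as many nodes, while a lower ``max_workers`` would cap that request.
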
Note that we'll need to update the wheel in the example yaml file after this PR is merged.

Ray
===

.. image:: https://travis-ci.org/ray-project/ray.svg?branch=master
    :target: https://travis-ci.org/ray-project/ray

.. image:: https://readthedocs.org/projects/ray/badge/?version=latest
    :target: http://ray.readthedocs.io/en/latest/?badge=latest

|

Ray is a flexible, high-performance distributed execution framework.

Ray comes with libraries that accelerate deep learning and reinforcement learning development:

- `Ray.tune`_: Efficient Distributed Hyperparameter Search
- `Ray RLlib`_: A Composable and Scalable Reinforcement Learning Library

.. _`Ray.tune`: http://ray.readthedocs.io/en/latest/tune.html
.. _`Ray RLlib`: http://ray.readthedocs.io/en/latest/rllib.html


Installation
------------

- Ray can be installed on Linux and Mac with ``pip install ray``.
- To build Ray from source, see the instructions for `Ubuntu`_ and `Mac`_.

.. _`Ubuntu`: http://ray.readthedocs.io/en/latest/install-on-ubuntu.html
.. _`Mac`: http://ray.readthedocs.io/en/latest/install-on-macosx.html


Example Program
---------------

+------------------------------------------------+----------------------------------------------+
| **Basic Python**                               | **Distributed with Ray**                     |
+------------------------------------------------+----------------------------------------------+
|.. code:: python                                |.. code-block:: python                        |
|                                                |                                              |
|  import time                                   |  import time                                 |
|                                                |  import ray                                  |
|                                                |                                              |
|                                                |  ray.init()                                  |
|                                                |                                              |
|                                                |  @ray.remote                                 |
|  def f():                                      |  def f():                                    |
|      time.sleep(1)                             |      time.sleep(1)                           |
|      return 1                                  |      return 1                                |
|                                                |                                              |
|  # Execute f serially.                         |  # Execute f in parallel.                    |
|  results = [f() for i in range(4)]             |  object_ids = [f.remote() for i in range(4)] |
|                                                |  results = ray.get(object_ids)               |
+------------------------------------------------+----------------------------------------------+


More Information
----------------

- `Documentation`_
- `Tutorial`_
- `Blog`_
- `Ray HotOS paper`_

.. _`Documentation`: http://ray.readthedocs.io/en/latest/index.html
.. _`Tutorial`: https://github.com/ray-project/tutorial
.. _`Blog`: https://ray-project.github.io/
.. _`Ray HotOS paper`: https://arxiv.org/abs/1703.03924

Getting Involved
----------------

- Ask questions on our mailing list `ray-dev@googlegroups.com`_.
- Please report bugs by submitting a `GitHub issue`_.
- Submit contributions using `pull requests`_.

.. _`ray-dev@googlegroups.com`: https://groups.google.com/forum/#!forum/ray-dev
.. _`GitHub issue`: https://github.com/ray-project/ray/issues
.. _`pull requests`: https://github.com/ray-project/ray/pulls