wassname/ray: An open source framework that provides a simple, universal API for building distributed applications. Ray is packaged with RLlib, a scalable reinforcement learning library, and Tune, a scalable hyperparameter tuning library. - ray

mirror of https://github.com/wassname/ray.git synced 2026-06-27 20:22:39 +08:00

T

Stephanie Wang d49b4bef0a [xray] Basic task reconstruction mechanism (#2526 )

## What do these changes do?

This implements basic task reconstruction in raylet. There are two parts to this PR:
1. Reconstruction suppression through the `TaskReconstructionLog`. This prevents two raylets from reconstructing the same task if they decide simultaneously (via the logic in #2497) that reconstruction is necessary.
2. Task resubmission once a raylet becomes responsible for reconstructing a task.

Reconstruction is quite slow in this PR, especially for long chains of dependent tasks. This is mainly due to the lease table mechanism, where nodes may wait too long before trying to reconstruct a task. There are two ways to improve this:
1. Expire entries in the lease table using Redis `PEXPIRE`. This is a WIP and I may include it in this PR.
2. Introduce a "fast path" for reconstructing dependencies of a re-executed task. Normally, we wait for an initial timeout before checking whether a task requires reconstruction. However, if a task requires reconstruction, then it's likely that its dependencies also require reconstruction. In this case, we could skip the initial timeout before checking the GCS to see whether reconstruction is necessary (e.g., if the object has been evicted).

Since handling failures of other raylets is probably not yet complete in master, this only turns back on Python tests for reconstructing evicted objects.

2018-08-09 07:24:37 -07:00

.github

Add docs for contributors. (#1191 )

2017-11-10 00:40:19 -08:00

.travis

[rllib] format with yapf (#2427 )

2018-07-19 15:30:36 -07:00

cmake/Modules

Upgrade arrow to include the plasma TensorFlow op (#2412 )

2018-07-18 12:33:02 -07:00

doc

[xray] Adds a driver table. (#2289 )

2018-08-08 23:41:40 -07:00

docker

[tune] Fix Categorical Space + Add Keras Example (#2401 )

2018-07-17 23:52:52 +02:00

examples

[rllib] _init renamed to _build_layers in example

2018-07-12 19:21:58 +02:00

java

[xray] Basic task reconstruction mechanism (#2526 )

2018-08-09 07:24:37 -07:00

python

[xray] Adds a driver table. (#2289 )

2018-08-08 23:41:40 -07:00

site

Add parameter server blog post. (#2398 )

2018-07-16 21:51:39 -07:00

src

[xray] Basic task reconstruction mechanism (#2526 )

2018-08-09 07:24:37 -07:00

test

[xray] Basic task reconstruction mechanism (#2526 )

2018-08-09 07:24:37 -07:00

thirdparty/scripts

Update arrow to include plasma memory footprint reduction (#2545 )

2018-08-02 14:37:37 -07:00

.clang-format

Implement object table notification subscriptions and switch to using Redis modules for object table. (#134 )

2016-12-18 18:19:02 -08:00

.gitignore

[xray] Adds a driver table. (#2289 )

2018-08-08 23:41:40 -07:00

.style.yapf

YAPF, take 3 (#2098 )

2018-05-19 16:07:28 -07:00

.travis.yml

[xray] Adds a driver table. (#2289 )

2018-08-08 23:41:40 -07:00

build-docker.sh

adding -x flag for better debugging during builds (#1079 )

2017-10-04 13:56:14 -07:00

build.sh

unify build dir for Python and Java (#2171 )

2018-06-01 16:28:27 -07:00

CMakeLists.txt

Upgrade arrow to include the plasma TensorFlow op (#2412 )

2018-07-18 12:33:02 -07:00

CONTRIBUTING.rst

Replace special single quote with regular single quote. (#1693 )

2018-03-10 20:36:01 -08:00

LICENSE

[rllib] Basic IMPALA implementation (using deepmind's reference vtrace.py) (#2504 )

2018-08-01 20:53:53 -07:00

pylintrc

adding pylint (#233 )

2016-07-08 12:39:11 -07:00

README.rst

Update Travis CI badge from travis-ci.org to travis-ci.com. (#2155 )

2018-05-29 16:44:02 -07:00

scripts

Improve yapf speed and document its usage (#2160 )

2018-06-05 20:22:11 -07:00

setup_thirdparty.sh

Use absolute path to get to thirdparty dir (#2442 )

2018-07-20 15:12:25 -07:00

README.rst

Ray
===

.. image:: https://travis-ci.com/ray-project/ray.svg?branch=master
    :target: https://travis-ci.com/ray-project/ray

.. image:: https://readthedocs.org/projects/ray/badge/?version=latest
    :target: http://ray.readthedocs.io/en/latest/?badge=latest

|

Ray is a flexible, high-performance distributed execution framework.


Ray is easy to install: ``pip install ray``

Example Use
-----------

+------------------------------------------------+----------------------------------------------------+
| **Basic Python**                               | **Distributed with Ray**                           |
+------------------------------------------------+----------------------------------------------------+
|.. code-block:: python                          |.. code-block:: python                              |
|                                                |                                                    |
|  # Execute f serially.                         |  # Execute f in parallel.                          |
|                                                |                                                    |
|                                                |  @ray.remote                                       |
|  def f():                                      |  def f():                                          |
|      time.sleep(1)                             |      time.sleep(1)                                 |
|      return 1                                  |      return 1                                      |
|                                                |                                                    |
|                                                |                                                    |
|                                                |  ray.init()                                        |
|  results = [f() for i in range(4)]             |  results = ray.get([f.remote() for i in range(4)]) |
+------------------------------------------------+----------------------------------------------------+


Ray comes with libraries that accelerate deep learning and reinforcement learning development:

- `Ray Tune`_: Hyperparameter Optimization Framework
- `Ray RLlib`_: Scalable Reinforcement Learning

.. _`Ray Tune`: http://ray.readthedocs.io/en/latest/tune.html
.. _`Ray RLlib`: http://ray.readthedocs.io/en/latest/rllib.html

Installation
------------

Ray can be installed on Linux and Mac with ``pip install ray``.

To build Ray from source or to install the nightly versions, see the `installation documentation`_.

.. _`installation documentation`: http://ray.readthedocs.io/en/latest/installation.html

More Information
----------------

- `Documentation`_
- `Tutorial`_
- `Blog`_
- `Ray paper`_
- `Ray HotOS paper`_

.. _`Documentation`: http://ray.readthedocs.io/en/latest/index.html
.. _`Tutorial`: https://github.com/ray-project/tutorial
.. _`Blog`: https://ray-project.github.io/
.. _`Ray paper`: https://arxiv.org/abs/1712.05889
.. _`Ray HotOS paper`: https://arxiv.org/abs/1703.03924

Getting Involved
----------------

- Ask questions on our mailing list `ray-dev@googlegroups.com`_.
- Please report bugs by submitting a `GitHub issue`_.
- Submit contributions using `pull requests`_.

.. _`ray-dev@googlegroups.com`: https://groups.google.com/forum/#!forum/ray-dev
.. _`GitHub issue`: https://github.com/ray-project/ray/issues
.. _`pull requests`: https://github.com/ray-project/ray/pulls

Description

An open source framework that provides a simple, universal API for building distributed applications. Ray is packaged with RLlib, a scalable reinforcement learning library, and Tune, a scalable hyperparameter tuning library.

Readme Multiple Licenses 111 MiB

Languages

Python 56.6%

C++ 28.8%

Java 8.5%

TypeScript 1.7%

Starlark 1.4%

Other 2.8%