wassname/ray: An open source framework that provides a simple, universal API for building distributed applications. Ray is packaged with RLlib, a scalable reinforcement learning library, and Tune, a scalable hyperparameter tuning library. - ray

mirror of https://github.com/wassname/ray.git synced 2026-07-02 02:42:52 +08:00

Stephanie Wang f76ce836b2 Distributed ref counting for serialized ObjectIDs (#6945 )

* Skeleton plus a unit test for simple borrower case

* First unit test passes - forward an ID and task returns with 1 submitted task pending on the inner ID

* Invariant for contained_in

* Unit test passes for testing task return without creating a borrower

* Wrap ref count functionality in test case

* Fix bad delete

* Unit test and fix for borrowers creating more borrowers

* Unit test and fix for simple borrowing, but owner sends call after borrower's ref count goes to 0

* Refactor:
- keep a sentinel ref count for task argument IDs
- keep contained_in_borrowed in addition to contained_in_owned

* Unit test for nested IDs passes

* Refactor so that an object ID can only be contained in 1 borrowed ID at a time

* Add check

* Fix

* Unit test (passes) to test nesting object IDs but no borrowers created

* Unit test for nested objects from different owners passes, refactor to unset contained_in when popping refs

* Unit tests for borrowers receiving an ObjectID from multiple sources,
skip adding ownership info if we already have it to handle duplicate
refs

* Unit test for returning object ID passes

* More unit tests for returning object IDs pass

* Add serialized ID tests

* fix serialization issue

* remove swap

* It builds!

* debugging and some fixes:
- register handler for WaitForRefRemoved
- don't create a python reference for arg IDs
- pass in client factory into ReferenceCounter
- fix bad decrement in PopBorrowerRefs

* Fix accounting for serialized IDs:
- don't decrement for IDs on dependency resolution, wait until task finished
- add object IDs that were inlined when building the arguments to the task spec, pin these on the task executor until task finishes

* mu_ -> mutex_

* lint

* fix build

* clear outer_object_id

* add direct call type check

* Fix test for direct call IDs and return IDs for actor calls

* Fix CoreWorkerClient.Addr()

* Remove unneeded lock

* Remove unnecessary ObjectID refs

* Fix worker holding serialized refs test

* Fix hex IDs

* fix

* fix tests

* fix tests

* refactor and cleanups

* lint

* Put inlined Ids in task args and some cleanup

* Add back gc.collect() line for test case

* Refactor and fixes:
- store inlined IDs in RayObject
- allow storing objects with inlined IDs in memory store
- pin objects that were promoted to plasma

* oops

* make sure worker ID is set in address, pass in rpc::Address to CoreWorkerClient

* todos

* cleanups and test builds

* Fix tests

* Add feature flag

* cleanups

* address comments and some cleanups

* cleanup

* fix recursive test

* Comments for tests

* Turn off ref counting by default

* Skip tests

* Fix some bugs for test_array.py, java build

* Don't include nested objects in the ref count when the feature flag is off

* C++ feature flag does not work...

* Remove

* Turn on python tests and add a warning when plasma objects are evicted before being pinned

* Fix build and remove irrelevant test

* Fix for java

* Revert "Fix build and remove irrelevant test"

This reverts commit 056cca9b263ed05b0f9ab2250907338edcbca2d5.

* Fix ray.internal.free

* Fixes and skip some flaky tests

* fix java build

* fix windows build

* Add IDs contained in owned objects

* Update src/ray/protobuf/core_worker.proto

Co-Authored-By: Edward Oakes <ed.nmi.oakes@gmail.com>

* Update src/ray/core_worker/reference_count.cc

Co-Authored-By: Edward Oakes <ed.nmi.oakes@gmail.com>

* Update src/ray/protobuf/core_worker.proto

Co-Authored-By: Edward Oakes <ed.nmi.oakes@gmail.com>

* Update src/ray/protobuf/core_worker.proto

Co-Authored-By: Edward Oakes <ed.nmi.oakes@gmail.com>

* Update src/ray/core_worker/reference_count.h

Co-Authored-By: Edward Oakes <ed.nmi.oakes@gmail.com>

* Update src/ray/core_worker/reference_count.h

Co-Authored-By: Edward Oakes <ed.nmi.oakes@gmail.com>

* Update src/ray/core_worker/reference_count.cc

Co-Authored-By: Edward Oakes <ed.nmi.oakes@gmail.com>

* Apply suggestions from code review

Co-Authored-By: Edward Oakes <ed.nmi.oakes@gmail.com>

* update

* Try to fix ::test_direct_call_serialized_id_eviction

Co-authored-by: Edward Oakes <ed.nmi.oakes@gmail.com>

2020-02-18 18:21:34 -08:00

.github

Fix build errors and add more targets to Windows builds (#6811 )

2020-02-11 16:49:33 -08:00

bazel

Revert "Use Boost.Process instead of pid_t (#6510 )" (#6909 )

2020-01-26 10:26:44 -06:00

Revert "Removing Pyarrow dependency (#7146 )" (#7209 )

2020-02-18 14:12:06 -08:00

deploy/ray-operator

Fix build errors and add more targets to Windows builds (#6811 )

2020-02-11 16:49:33 -08:00

doc

Revert "Removing Pyarrow dependency (#7146 )" (#7209 )

2020-02-18 14:12:06 -08:00

docker

[tune] Remove keras dependency (#6827 )

2020-01-18 23:24:42 -08:00

java

[xlang] Cross language serialize ActorHandle (#7134 )

2020-02-17 20:44:56 +08:00

python

Distributed ref counting for serialized ObjectIDs (#6945 )

2020-02-18 18:21:34 -08:00

rllib

Revert "Removing Pyarrow dependency (#7146 )" (#7209 )

2020-02-18 14:12:06 -08:00

src

Distributed ref counting for serialized ObjectIDs (#6945 )

2020-02-18 18:21:34 -08:00

streaming

Distributed ref counting for serialized ObjectIDs (#6945 )

2020-02-18 18:21:34 -08:00

thirdparty

Make Cython rules more consistent for Bazel (#6840 )

2020-02-10 10:45:54 -08:00

.bazelrc

Fix build errors and add more targets to Windows builds (#6811 )

2020-02-11 16:49:33 -08:00

.clang-format

Remove legacy Ray code. (#3121 )

2018-10-26 13:36:58 -07:00

.editorconfig

Use standard EditorConfig file for editor settings (#6861 )

2020-01-20 08:03:06 -08:00

.gitignore

Fix hang if actor object id is returned from a task that exits (#6885 )

2020-02-11 20:28:13 -08:00

.style.yapf

YAPF, take 3 (#2098 )

2018-05-19 16:07:28 -07:00

.travis.yml

Add ray.util package and move libraries from experimental (#7100 )

2020-02-18 13:43:19 -08:00

build-docker.sh

Find bazel even if it isn't in the PATH. (#4729 )

2019-05-01 21:29:48 -07:00

BUILD.bazel

Add ray.util package and move libraries from experimental (#7100 )

2020-02-18 13:43:19 -08:00

build.sh

Revert "Removing Pyarrow dependency (#7146 )" (#7209 )

2020-02-18 14:12:06 -08:00

CONTRIBUTING.rst

Add linting pre-push hook (#5154 )

2019-07-09 21:49:12 -07:00

LICENSE

[rllib] add augmented random search (#2714 )

2018-08-24 22:20:02 -07:00

pylintrc

adding pylint (#233 )

2016-07-08 12:39:11 -07:00

README.rst

Add ray.util package and move libraries from experimental (#7100 )

2020-02-18 13:43:19 -08:00

scripts

Lint script link broken, also lint filter was broken for generated py files (#4133 )

2019-02-22 17:33:08 -08:00

setup_hooks.sh

Make sure pre-push is executable. (#7079 )

2020-02-07 11:38:14 -08:00

WORKSPACE

Use GRCP and Bazel 1.0 (#6002 )

2019-11-08 15:58:28 -08:00

README.rst

.. image:: https://github.com/ray-project/ray/raw/master/doc/source/images/ray_header_logo.png

.. image:: https://travis-ci.com/ray-project/ray.svg?branch=master
    :target: https://travis-ci.com/ray-project/ray

.. image:: https://readthedocs.org/projects/ray/badge/?version=latest
    :target: http://ray.readthedocs.io/en/latest/?badge=latest

|


**Ray is a fast and simple framework for building and running distributed applications.**

Ray is packaged with the following libraries for accelerating machine learning workloads:

- `Tune`_: Scalable Hyperparameter Tuning
- `RLlib`_: Scalable Reinforcement Learning
- `RaySGD <https://ray.readthedocs.io/en/latest/raysgd/raysgd.html>`__: Distributed Training Wrappers

Install Ray with: ``pip install ray``. For nightly wheels, see the
`Installation page <https://ray.readthedocs.io/en/latest/installation.html>`__.

**NOTE:** `We are deprecating Python 2 support soon.`_

.. _`We are deprecating Python 2 support soon.`: https://github.com/ray-project/ray/issues/6580

Quick Start
-----------

Execute Python functions in parallel.

.. code-block:: python

    import ray
    ray.init()

    @ray.remote
    def f(x):
        return x * x

    futures = [f.remote(i) for i in range(4)]
    print(ray.get(futures))
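
The snippet above follows a submit-then-gather pattern: each ``f.remote(i)`` call hands back a future, and ``ray.get`` blocks until the results are available. As a rough analogy only, the same shape can be sketched on one machine with the standard library's ``concurrent.futures``. This is not how Ray schedules tasks, and ``square`` is a hypothetical stand-in for ``f``.

.. code-block:: python

    from concurrent.futures import ThreadPoolExecutor


    def square(x):
        # Hypothetical stand-in for the remote function ``f`` above.
        return x * x


    with ThreadPoolExecutor(max_workers=4) as pool:
        # submit() returns a Future right away, much like f.remote(i).
        futures = [pool.submit(square, i) for i in range(4)]
        # result() blocks until each value is ready, much like ray.get().
        results = [fut.result() for fut in futures]

    print(results)

The results come back in submission order because the futures are gathered in the order in which they were created.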

To use Ray's actor model:

.. code-block:: python


    import ray
    ray.init()

    @ray.remote
    class Counter(object):
        def __init__(self):
            self.n = 0

        def increment(self):
            self.n += 1

        def read(self):
            return self.n

    counters = [Counter.remote() for _ in range(4)]
    for c in counters:
        c.increment.remote()
    futures = [c.read.remote() for c in counters]
    print(ray.get(futures))
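
To illustrate the idea behind an actor (state owned by a single worker that handles one method call at a time), here is a minimal standard-library sketch. It assumes a thread with a FIFO mailbox is an acceptable stand-in for a remote worker process; ``CounterActor``, ``call`` and ``stop`` are hypothetical names and are not part of Ray's API.

.. code-block:: python

    import queue
    import threading
    from concurrent.futures import Future


    class CounterActor:
        """Toy actor: one thread owns the state and handles one message at a time."""

        def __init__(self):
            self.n = 0
            self._mailbox = queue.Queue()
            self._thread = threading.Thread(target=self._run, daemon=True)
            self._thread.start()

        def _run(self):
            while True:
                method, reply = self._mailbox.get()
                if method is None:  # Shutdown sentinel.
                    break
                reply.set_result(method())

        def _increment(self):
            self.n += 1

        def _read(self):
            return self.n

        def call(self, name):
            # Enqueue a method call and return a Future for its result.
            reply = Future()
            self._mailbox.put((getattr(self, "_" + name), reply))
            return reply

        def stop(self):
            self._mailbox.put((None, None))
            self._thread.join()


    counters = [CounterActor() for _ in range(4)]
    for c in counters:
        c.call("increment")
    reads = [c.call("read").result() for c in counters]
    for c in counters:
        c.stop()
    print(reads)

Because each mailbox is first-in, first-out, every ``read`` is processed after the ``increment`` that was queued before it on the same actor.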


Ray programs can run on a single machine, and can also seamlessly scale to large clusters. To execute the above Ray script in the cloud, just download `this configuration file <https://github.com/ray-project/ray/blob/master/python/ray/autoscaler/aws/example-full.yaml>`__, and run:

``ray submit [CLUSTER.YAML] example.py --start``

Read more about `launching clusters <https://ray.readthedocs.io/en/latest/autoscaling.html>`_.

Tune Quick Start
----------------

.. image:: https://github.com/ray-project/ray/raw/master/doc/source/images/tune-wide.png

`Tune`_ is a library for hyperparameter tuning at any scale.

- Launch a multi-node distributed hyperparameter sweep in less than 10 lines of code.
- Use any deep learning framework, including PyTorch, TensorFlow, and Keras.
- Visualize results with `TensorBoard <https://www.tensorflow.org/get_started/summaries_and_tensorboard>`__.
- Choose among scalable state-of-the-art algorithms such as `Population Based Training (PBT)`_, `Vizier's Median Stopping Rule`_, and `HyperBand/ASHA`_.
- Integrate with optimization libraries such as `Facebook Ax <http://ax.dev>`_, `HyperOpt <https://github.com/hyperopt/hyperopt>`_, and `Bayesian Optimization <https://github.com/fmfn/BayesianOptimization>`_, and scale them transparently.

To run the example below, install the following:

.. code-block:: bash

    $ pip install ray[tune] torch torchvision filelock


This example runs a parallel grid search over the learning rate to train a convolutional neural network using PyTorch.

.. code-block:: python


    import torch.optim as optim
    from ray import tune
    from ray.tune.examples.mnist_pytorch import (
        get_data_loaders, ConvNet, train, test)


    def train_mnist(config):
        train_loader, test_loader = get_data_loaders()
        model = ConvNet()
        optimizer = optim.SGD(model.parameters(), lr=config["lr"])
        for i in range(10):
            train(model, optimizer, train_loader)
            acc = test(model, test_loader)
            tune.track.log(mean_accuracy=acc)


    analysis = tune.run(
        train_mnist, config={"lr": tune.grid_search([0.001, 0.01, 0.1])})

    print("Best config: ", analysis.get_best_config(metric="mean_accuracy"))

    # Get a dataframe for analyzing trial results.
    df = analysis.dataframe()
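
A grid search evaluates every combination of the listed hyperparameter values and keeps the best one. The sketch below shows that idea with the standard library only. It runs the trials one after another rather than in parallel, and ``expand_grid`` and ``toy_objective`` are hypothetical helpers for illustration, not Tune functions.

.. code-block:: python

    import itertools


    def expand_grid(space):
        # Cartesian product over every list of candidate values.
        keys = sorted(space)
        for values in itertools.product(*(space[k] for k in keys)):
            yield dict(zip(keys, values))


    def toy_objective(config):
        # Hypothetical stand-in for training: the score peaks at lr == 0.01.
        return -abs(config["lr"] - 0.01)


    space = {"lr": [0.001, 0.01, 0.1]}
    trials = [(toy_objective(cfg), cfg) for cfg in expand_grid(space)]
    best_score, best_config = max(trials, key=lambda t: t[0])
    print("Best config:", best_config)

With more than one key in ``space``, the number of trials is the product of the list lengths, which is why grid searches benefit from running trials in parallel.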

If TensorBoard is installed, you can visualize all trial results with:

.. code-block:: bash

    tensorboard --logdir ~/ray_results

.. _`Tune`: https://ray.readthedocs.io/en/latest/tune.html
.. _`Population Based Training (PBT)`: https://ray.readthedocs.io/en/latest/tune-schedulers.html#population-based-training-pbt
.. _`Vizier's Median Stopping Rule`: https://ray.readthedocs.io/en/latest/tune-schedulers.html#median-stopping-rule
.. _`HyperBand/ASHA`: https://ray.readthedocs.io/en/latest/tune-schedulers.html#asynchronous-hyperband

RLlib Quick Start
-----------------

.. image:: https://github.com/ray-project/ray/raw/master/doc/source/images/rllib-wide.jpg

`RLlib`_ is an open-source library for reinforcement learning built on top of Ray that offers both high scalability and a unified API for a variety of applications.

.. code-block:: bash

  pip install tensorflow  # or tensorflow-gpu
  pip install ray[rllib]  # also recommended: ray[debug]

This example defines a custom ``gym`` environment and trains a PPO agent on it using four workers:

.. code-block:: python

    import gym
    from gym.spaces import Discrete, Box
    from ray import tune

    class SimpleCorridor(gym.Env):
        """Corridor in which an agent must walk right to reach the exit."""

        def __init__(self, config):
            self.end_pos = config["corridor_length"]
            self.cur_pos = 0
            self.action_space = Discrete(2)  # 0 = move left, 1 = move right
            self.observation_space = Box(0.0, self.end_pos, shape=(1, ))

        def reset(self):
            self.cur_pos = 0
            return [self.cur_pos]

        def step(self, action):
            if action == 0 and self.cur_pos > 0:
                self.cur_pos -= 1
            elif action == 1:
                self.cur_pos += 1
            done = self.cur_pos >= self.end_pos
            # Reward of 1 only on reaching the end of the corridor.
            return [self.cur_pos], 1 if done else 0, done, {}

    tune.run(
        "PPO",
        config={
            "env": SimpleCorridor,
            "num_workers": 4,
            "env_config": {"corridor_length": 5}})
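
To get a feel for the environment's dynamics before training, the same step logic can be exercised without ``gym`` or RLlib. The sketch below is a plain-Python copy of the corridor with a hand-written rollout loop. ``PlainCorridor`` and ``rollout`` are hypothetical names for illustration, and the policies here are fixed functions rather than anything learned by PPO.

.. code-block:: python

    import random


    class PlainCorridor:
        """Same step logic as SimpleCorridor, without the gym dependency."""

        def __init__(self, corridor_length):
            self.end_pos = corridor_length
            self.cur_pos = 0

        def reset(self):
            self.cur_pos = 0
            return [self.cur_pos]

        def step(self, action):
            if action == 0 and self.cur_pos > 0:
                self.cur_pos -= 1
            elif action == 1:
                self.cur_pos += 1
            done = self.cur_pos >= self.end_pos
            return [self.cur_pos], 1 if done else 0, done, {}


    def rollout(env, policy, max_steps=1000):
        # Run one episode and return (total_reward, steps_taken).
        obs = env.reset()
        total = 0
        for t in range(1, max_steps + 1):
            obs, reward, done, _ = env.step(policy(obs))
            total += reward
            if done:
                return total, t
        return total, max_steps


    # Always moving right reaches the end in exactly corridor_length steps.
    always_right = rollout(PlainCorridor(5), lambda obs: 1)
    # A random policy usually takes longer; seeded for repeatability.
    rng = random.Random(0)
    random_walk = rollout(PlainCorridor(5), lambda obs: rng.randint(0, 1))
    print(always_right, random_walk)

The reward is the same for any episode that reaches the end, so the step count is the number that separates a good policy from a poor one in this sketch.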

.. _`RLlib`: https://ray.readthedocs.io/en/latest/rllib.html


More Information
----------------

- `Documentation`_, in particular `Building Ray and Contributing to Ray`_
- `Tutorial`_
- `Blog`_
- `Ray paper`_
- `Ray HotOS paper`_
- `RLlib paper`_
- `Tune paper`_

.. _`Documentation`: http://ray.readthedocs.io/en/latest/index.html
.. _`Building Ray and Contributing to Ray`: https://ray.readthedocs.io/en/latest/development.html
.. _`Tutorial`: https://github.com/ray-project/tutorial
.. _`Blog`: https://ray-project.github.io/
.. _`Ray paper`: https://arxiv.org/abs/1712.05889
.. _`Ray HotOS paper`: https://arxiv.org/abs/1703.03924
.. _`RLlib paper`: https://arxiv.org/abs/1712.09381
.. _`Tune paper`: https://arxiv.org/abs/1807.05118

Getting Involved
----------------

- `ray-dev@googlegroups.com`_: For discussions about development or any general
  questions.
- `StackOverflow`_: For questions about how to use Ray.
- `GitHub Issues`_: For reporting bugs and requesting features.
- `Pull Requests`_: For submitting code contributions.
- `Meetup Group`_: Join our meetup group.
- `Community Slack`_: Join our Slack workspace.
- `Twitter`_: Follow updates on Twitter.

.. _`ray-dev@googlegroups.com`: https://groups.google.com/forum/#!forum/ray-dev
.. _`GitHub Issues`: https://github.com/ray-project/ray/issues
.. _`StackOverflow`: https://stackoverflow.com/questions/tagged/ray
.. _`Pull Requests`: https://github.com/ray-project/ray/pulls
.. _`Meetup Group`: https://www.meetup.com/Bay-Area-Ray-Meetup/
.. _`Community Slack`: https://forms.gle/9TSdDYUgxYs8SA9e8
.. _`Twitter`: https://twitter.com/raydistributed

Languages

Python 56.6%

C++ 28.8%

Java 8.5%

TypeScript 1.7%

Starlark 1.4%

Other 2.8%