mirror of
https://github.com/wassname/ray.git
synced 2026-07-02 02:42:52 +08:00
f76ce836b2cbfef53b71647684572a6fde228a8e
* Skeleton plus a unit test for simple borrower case * First unit test passes - forward an ID and task returns with 1 submitted task pending on the inner ID * Invariant for contained_in * Unit test passes for testing task return without creating a borrower * Wrap ref count functionality in test case * Fix bad delete * Unit test and fix for borrowers creating more borrowers * Unit test and fix for simple borrowing, but owner sends call after borrower's ref count goes to 0 * Refactor: - keep a sentinel ref count for task argument IDs - keep contained_in_borrowed in addition to contained_in_owned * Unit test for nested IDs passes * Refactor so that an object ID can only be contained in 1 borrowed ID at a time * Add check * Fix * Unit test (passes) to test nesting object IDs but no borrowers created * Unit test for nested objects from different owners passes, refactor to unset contained_in when popping refs * Unit tests for borrowers receiving an ObjectID from multiple sources, skip adding ownership info if we already have it to handle duplicate refs * Unit test for returning object ID passes * More unit tests for returning object IDs pass * Add serialized ID tests * fix serialization issue * remove swap * It builds! * debugging and some fixes: - register handler for WaitForRefRemoved - don't create a python reference for arg IDs - pass in client factory into ReferenceCounter - fix bad decrement in PopBorrowerRefs * Fix accounting for serialized IDs: - don't decrement for IDs on dependency resolution, wait until task finished - add object IDs that were inlined when building the arguments to the task spec, pin these on the task executor until task finishes * mu_ -> mutex_ * lint * fix build * clear outer_object_id * add direct call type check * Fix test for direct call IDs and return IDs for actor calls * Fix CoreWorkerClient.Addr() * Remove unneeded lock * Remove unnecessary ObjectID refs * Fix worker holding serialized refs test * Fix hex IDs * fix * fix tests * fix tests * refactor and cleanups * lint * Put inlined Ids in task args and some cleanup * Add back gc.collect() line for test case * Refactor and fixes: - store inlined IDs in RayObject - allow storing objects with inlined IDs in memory store - pin objects that were promoted to plasma * oops * make sure worker ID is set in address, pass in rpc::Address to CoreWorkerClient * todos * cleanups and test builds * Fix tests * Add feature flag * cleanups * address comments and some cleanups * cleanup * fix recursive test * Comments for tests * Turn off ref counting by default * Skip tests * Fix some bugs for test_array.py, java build * Don't include nested objects in the ref count when the feature flag is off * C++ feature flag does not work... * Remove * Turn on python tests and add a warning when plasma objects are evicted before being pinned * Fix build and remove irrelevant test * Fix for java * Revert "Fix build and remove irrelevant test" This reverts commit 056cca9b263ed05b0f9ab2250907338edcbca2d5. * Fix ray.internal.free * Fixes and skip some flaky tests * fix java build * fix windows build * Add IDs contained in owned objects * Update src/ray/protobuf/core_worker.proto Co-Authored-By: Edward Oakes <ed.nmi.oakes@gmail.com> * Update src/ray/core_worker/reference_count.cc Co-Authored-By: Edward Oakes <ed.nmi.oakes@gmail.com> * Update src/ray/protobuf/core_worker.proto Co-Authored-By: Edward Oakes <ed.nmi.oakes@gmail.com> * Update src/ray/protobuf/core_worker.proto Co-Authored-By: Edward Oakes <ed.nmi.oakes@gmail.com> * Update src/ray/core_worker/reference_count.h Co-Authored-By: Edward Oakes <ed.nmi.oakes@gmail.com> * Update src/ray/core_worker/reference_count.h Co-Authored-By: Edward Oakes <ed.nmi.oakes@gmail.com> * Update src/ray/core_worker/reference_count.cc Co-Authored-By: Edward Oakes <ed.nmi.oakes@gmail.com> * Apply suggestions from code review Co-Authored-By: Edward Oakes <ed.nmi.oakes@gmail.com> * update * Try to fix ::test_direct_call_serialized_id_eviction Co-authored-by: Edward Oakes <ed.nmi.oakes@gmail.com>
.. image:: https://github.com/ray-project/ray/raw/master/doc/source/images/ray_header_logo.png
.. image:: https://travis-ci.com/ray-project/ray.svg?branch=master
:target: https://travis-ci.com/ray-project/ray
.. image:: https://readthedocs.org/projects/ray/badge/?version=latest
:target: http://ray.readthedocs.io/en/latest/?badge=latest
|
**Ray is a fast and simple framework for building and running distributed applications.**
Ray is packaged with the following libraries for accelerating machine learning workloads:
- `Tune`_: Scalable Hyperparameter Tuning
- `RLlib`_: Scalable Reinforcement Learning
- `RaySGD <https://ray.readthedocs.io/en/latest/raysgd/raysgd.html>`__: Distributed Training Wrappers
Install Ray with: ``pip install ray``. For nightly wheels, see the
`Installation page <https://ray.readthedocs.io/en/latest/installation.html>`__.
**NOTE:** `We are deprecating Python 2 support soon.`_
.. _`We are deprecating Python 2 support soon.`: https://github.com/ray-project/ray/issues/6580
Quick Start
-----------
Execute Python functions in parallel.
.. code-block:: python
import ray
ray.init()
@ray.remote
def f(x):
return x * x
futures = [f.remote(i) for i in range(4)]
print(ray.get(futures))
To use Ray's actor model:
.. code-block:: python
import ray
ray.init()
@ray.remote
class Counter(object):
def __init__(self):
self.n = 0
def increment(self):
self.n += 1
def read(self):
return self.n
counters = [Counter.remote() for i in range(4)]
[c.increment.remote() for c in counters]
futures = [c.read.remote() for c in counters]
print(ray.get(futures))
Ray programs can run on a single machine, and can also seamlessly scale to large clusters. To execute the above Ray script in the cloud, just download `this configuration file <https://github.com/ray-project/ray/blob/master/python/ray/autoscaler/aws/example-full.yaml>`__, and run:
``ray submit [CLUSTER.YAML] example.py --start``
Read more about `launching clusters <https://ray.readthedocs.io/en/latest/autoscaling.html>`_.
Tune Quick Start
----------------
.. image:: https://github.com/ray-project/ray/raw/master/doc/source/images/tune-wide.png
`Tune`_ is a library for hyperparameter tuning at any scale.
- Launch a multi-node distributed hyperparameter sweep in less than 10 lines of code.
- Supports any deep learning framework, including PyTorch, TensorFlow, and Keras.
- Visualize results with `TensorBoard <https://www.tensorflow.org/get_started/summaries_and_tensorboard>`__.
- Choose among scalable SOTA algorithms such as `Population Based Training (PBT)`_, `Vizier's Median Stopping Rule`_, `HyperBand/ASHA`_.
- Tune integrates with many optimization libraries such as `Facebook Ax <http://ax.dev>`_, `HyperOpt <https://github.com/hyperopt/hyperopt>`_, and `Bayesian Optimization <https://github.com/fmfn/BayesianOptimization>`_ and enables you to scale them transparently.
To run this example, you will need to install the following:
.. code-block:: bash
$ pip install ray[tune] torch torchvision filelock
This example runs a parallel grid search to train a Convolutional Neural Network using PyTorch.
.. code-block:: python
import torch.optim as optim
from ray import tune
from ray.tune.examples.mnist_pytorch import (
get_data_loaders, ConvNet, train, test)
def train_mnist(config):
train_loader, test_loader = get_data_loaders()
model = ConvNet()
optimizer = optim.SGD(model.parameters(), lr=config["lr"])
for i in range(10):
train(model, optimizer, train_loader)
acc = test(model, test_loader)
tune.track.log(mean_accuracy=acc)
analysis = tune.run(
train_mnist, config={"lr": tune.grid_search([0.001, 0.01, 0.1])})
print("Best config: ", analysis.get_best_config(metric="mean_accuracy"))
# Get a dataframe for analyzing trial results.
df = analysis.dataframe()
If TensorBoard is installed, automatically visualize all trial results:
.. code-block:: bash
tensorboard --logdir ~/ray_results
.. _`Tune`: https://ray.readthedocs.io/en/latest/tune.html
.. _`Population Based Training (PBT)`: https://ray.readthedocs.io/en/latest/tune-schedulers.html#population-based-training-pbt
.. _`Vizier's Median Stopping Rule`: https://ray.readthedocs.io/en/latest/tune-schedulers.html#median-stopping-rule
.. _`HyperBand/ASHA`: https://ray.readthedocs.io/en/latest/tune-schedulers.html#asynchronous-hyperband
RLlib Quick Start
-----------------
.. image:: https://github.com/ray-project/ray/raw/master/doc/source/images/rllib-wide.jpg
`RLlib`_ is an open-source library for reinforcement learning built on top of Ray that offers both high scalability and a unified API for a variety of applications.
.. code-block:: bash
pip install tensorflow # or tensorflow-gpu
pip install ray[rllib] # also recommended: ray[debug]
.. code-block:: python
import gym
from gym.spaces import Discrete, Box
from ray import tune
class SimpleCorridor(gym.Env):
def __init__(self, config):
self.end_pos = config["corridor_length"]
self.cur_pos = 0
self.action_space = Discrete(2)
self.observation_space = Box(0.0, self.end_pos, shape=(1, ))
def reset(self):
self.cur_pos = 0
return [self.cur_pos]
def step(self, action):
if action == 0 and self.cur_pos > 0:
self.cur_pos -= 1
elif action == 1:
self.cur_pos += 1
done = self.cur_pos >= self.end_pos
return [self.cur_pos], 1 if done else 0, done, {}
tune.run(
"PPO",
config={
"env": SimpleCorridor,
"num_workers": 4,
"env_config": {"corridor_length": 5}})
.. _`RLlib`: https://ray.readthedocs.io/en/latest/rllib.html
More Information
----------------
- `Documentation`_, in particular `Building Ray and Contributing to Ray`_
- `Tutorial`_
- `Blog`_
- `Ray paper`_
- `Ray HotOS paper`_
- `RLlib paper`_
- `Tune paper`_
.. _`Documentation`: http://ray.readthedocs.io/en/latest/index.html
.. _`Building Ray and Contributing to Ray`: https://ray.readthedocs.io/en/latest/development.html
.. _`Tutorial`: https://github.com/ray-project/tutorial
.. _`Blog`: https://ray-project.github.io/
.. _`Ray paper`: https://arxiv.org/abs/1712.05889
.. _`Ray HotOS paper`: https://arxiv.org/abs/1703.03924
.. _`RLlib paper`: https://arxiv.org/abs/1712.09381
.. _`Tune paper`: https://arxiv.org/abs/1807.05118
Getting Involved
----------------
- `ray-dev@googlegroups.com`_: For discussions about development or any general
questions.
- `StackOverflow`_: For questions about how to use Ray.
- `GitHub Issues`_: For reporting bugs and feature requests.
- `Pull Requests`_: For submitting code contributions.
- `Meetup Group`_: Join our meetup group.
- `Community Slack`_: Join our Slack workspace.
- `Twitter`_: Follow updates on Twitter.
.. _`ray-dev@googlegroups.com`: https://groups.google.com/forum/#!forum/ray-dev
.. _`GitHub Issues`: https://github.com/ray-project/ray/issues
.. _`StackOverflow`: https://stackoverflow.com/questions/tagged/ray
.. _`Pull Requests`: https://github.com/ray-project/ray/pulls
.. _`Meetup Group`: https://www.meetup.com/Bay-Area-Ray-Meetup/
.. _`Community Slack`: https://forms.gle/9TSdDYUgxYs8SA9e8
.. _`Twitter`: https://twitter.com/raydistributed
Description
An open source framework that provides a simple, universal API for building distributed applications. Ray is packaged with RLlib, a scalable reinforcement learning library, and Tune, a scalable hyperparameter tuning library.
Languages
Python
56.6%
C++
28.8%
Java
8.5%
TypeScript
1.7%
Starlark
1.4%
Other
2.8%