From 849d2aaf47c726044bf7935b976b76f7e2db5df3 Mon Sep 17 00:00:00 2001 From: Robert Nishihara Date: Sat, 20 May 2017 23:10:18 -0700 Subject: [PATCH] Fix image paths in blog post, add section on ray.wait. (#580) * Fix image paths in blog post. * Use future instead of object ID. * Add description of ray.wait. * Revert to keep some of the object ID terminology. --- site/README.md | 5 ++ .../_posts/2017-05-17-announcing-ray.markdown | 56 ++++++++++++++++--- 2 files changed, 52 insertions(+), 9 deletions(-) diff --git a/site/README.md b/site/README.md index ff55def0f..2e3871216 100644 --- a/site/README.md +++ b/site/README.md @@ -15,6 +15,11 @@ To view the site, run: bundle exec jekyll serve ``` +Note that images included under `site/assets/` should be referred to with +``. They will not render properly +when serving the site locally, but this is required for getting the paths to +work out on GitHub. + ## Deployment To deploy the site, run (inside the main ray directory): diff --git a/site/_posts/2017-05-17-announcing-ray.markdown b/site/_posts/2017-05-17-announcing-ray.markdown index 5ba97078b..a20a86145 100644 --- a/site/_posts/2017-05-17-announcing-ray.markdown +++ b/site/_posts/2017-05-17-announcing-ray.markdown @@ -46,8 +46,9 @@ for _ in range(8): {% endhighlight %} **With Ray**, when you call a **remote function**, the call immediately returns -an object ID. A task is then created, scheduled, and executed somewhere in the -cluster. This example would take 1 second to execute. +a future (we will refer to these as object IDs). A task is then created, +scheduled, and executed somewhere in the cluster. This example would take 1 +second to execute. {% highlight python %} @ray.remote @@ -65,11 +66,11 @@ results = ray.get(results) Note that the only changes are that we add the ``@ray.remote`` decorator to the function definition, we call the function with ``f.remote()``, and we call -``ray.get`` on the list of object IDs in order to block until the corresponding -tasks have finished executing. +``ray.get`` on the list of object IDs (remember that object IDs are futures) in +order to block until the corresponding tasks have finished executing.
- +
A graph depicting the tasks and objects in this example. The circles represent tasks, and the boxes represent objects. There are no arrows between @@ -112,7 +113,7 @@ actual values will be unpacked before the function is executed, so when the `aggregate_data` function is executed, `x` and `y` will be numpy arrays.
- +
A graph depicting the tasks and objects in this example. The circles represent tasks, and the boxes represent objects. Arrows point from tasks to the @@ -152,11 +153,11 @@ Each call to `simulator.step.remote` generates a task that is scheduled on the actor. These tasks mutate the state of the simulator object, and they are executed one at a time. -Like remote functions, actor methods return object IDs that can be passed into -other tasks and whose values can be retrieved with `ray.get`. +Like remote functions, actor methods return object IDs (that is, futures) that +can be passed into other tasks and whose values can be retrieved with `ray.get`.
- +
A graph depicting the tasks and objects in this example. The circles represent tasks, and the boxes represent objects. The first task is the actor's @@ -164,6 +165,43 @@ constructor. The thick arrows are used to show that the methods invoked on this actor share the underlying state of the actor.

+# Waiting for a Subset of Tasks to Finish + +Sometimes when running tasks with variable durations, we don't want to wait for +all of the tasks to finish. Instead, we may wish to wait for half of the tasks +to finish or perhaps to use whichever tasks have completed after one second. + +{% highlight python %} +@ray.remote +def f(): + time.sleep(np.random.uniform(0, 5)) + +# Launch 10 tasks with variable durations. +results = [f.remote() for _ in range(10)] + +# Wait until either five tasks have completed or two seconds have passed and +# return a list of the object IDs whose tasks have finished. +ready_ids, remaining_ids = ray.wait(results, num_returns=5, timeout=2000) +{% endhighlight %} + +In this example `ready_ids` is a list of object IDs whose corresponding tasks +have finished executing, and `remaining_ids` is a list of the remaining object +IDs. + +This primitive makes it easy to implement other behaviors, for example we may +wish to process some tasks in the order that they complete. + +{% highlight python %} +# Launch 10 tasks with variable durations. +remaining_ids = [f.remote() for _ in range(10)] + +# Process the tasks in the order that they complete. +results = [] +while len(remaining_ids) > 0: + ready_ids, remaining_ids = ray.wait(remaining_ids, num_returns=1) + results.append(ray.get(ready_ids[0])) +{% endhighlight %} + # Efficient Shared Memory and Serialization with Apache Arrow Serializing and deserializing data is often a bottleneck in distributed