Fix some paths /tmp/raylogs -> /tmp/ray. (#3189)
Committed by: Philipp Moritz
Parent: 5822aa2388
Commit: e495ab5e7c
@@ -136,7 +136,8 @@ The default idle timeout is 5 minutes. This is to prevent excessive node churn w
|
||||
Monitoring cluster status
|
||||
-------------------------
|
||||
|
||||
-You can monitor cluster usage and auto-scaling status by tailing the autoscaling logs in ``/tmp/raylogs/monitor-*``.
+You can monitor cluster usage and auto-scaling status by tailing the autoscaling
+logs in ``/tmp/ray/session_*/logs/monitor*``.
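As a rough illustration of the path pattern ``/tmp/ray/session_*/logs/monitor*``, the following Python sketch collects the last lines of every matching monitor log. The helper name ``tail_monitor_logs`` and its ``num_lines`` parameter are made up for this example; only the glob pattern comes from the text, and the sketch does not follow the files the way ``tail -f`` does.

```python
import glob
from collections import deque


def tail_monitor_logs(pattern="/tmp/ray/session_*/logs/monitor*", num_lines=100):
    """Return {path: last num_lines lines} for every file matching pattern."""
    tails = {}
    for path in sorted(glob.glob(pattern)):
        with open(path, errors="replace") as f:
            # A deque with maxlen keeps only the final num_lines lines.
            tails[path] = list(deque(f, maxlen=num_lines))
    return tails


if __name__ == "__main__":
    for path, lines in tail_monitor_logs().items():
        print("==> {} <==".format(path))
        print("".join(lines), end="")
```

If no Ray session has run on the machine, the glob matches nothing and the script prints nothing.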
|
||||
|
||||
The Ray autoscaler also reports per-node status in the form of instance tags. In your cloud provider console, you can click on a Node, go to the "Tags" pane, and add the ``ray-node-status`` tag as a column. This lets you see per-node statuses at a glance:
|
||||
|
||||
|
||||
@@ -61,10 +61,10 @@ of the following reasons.
|
||||
- **Stressful workloads:** Workloads that create a very large number of tasks in a short
|
||||
amount of time can sometimes interfere with the heartbeat mechanism that we
|
||||
use to check that processes are still alive. On the head node in the cluster,
|
||||
-you can check the files ``/tmp/raylogs/monitor-******.out`` and
-``/tmp/raylogs/monitor-******.err``. They will indicate which processes Ray
-has marked as dead (due to a lack of heartbeats). However, it is currently
-possible for a process to get marked as dead without actually having died.
+you can check the files ``/tmp/ray/session_*/logs/monitor*``. They will
+indicate which processes Ray has marked as dead (due to a lack of heartbeats).
+However, it is currently possible for a process to get marked as dead without
+actually having died.
|
||||
|
||||
- **Starting many actors:** Workloads that start a large number of actors all at
|
||||
once may exhibit problems when the processes (or libraries that they use)
|
||||
|
||||
+134
-134
@@ -1,11 +1,11 @@
|
||||
Profiling for Ray Users
|
||||
=======================
|
||||
|
||||
This document is intended for users of Ray who want to know how to evaluate
|
||||
the performance of their code while running on Ray. Profiling the
|
||||
performance of your code can be very helpful to determine performance
|
||||
bottlenecks or to find out where your code may not be parallelized properly.
|
||||
If you are interested in pinpointing why your Ray application may not be
|
||||
This document is intended for users of Ray who want to know how to evaluate
|
||||
the performance of their code while running on Ray. Profiling the
|
||||
performance of your code can be very helpful to determine performance
|
||||
bottlenecks or to find out where your code may not be parallelized properly.
|
||||
If you are interested in pinpointing why your Ray application may not be
|
||||
achieving the expected speedup, read on!
|
||||
|
||||
|
||||
@@ -28,26 +28,26 @@ let's define our remote function to just sleep for 0.5 seconds:
|
||||
def func():
|
||||
time.sleep(0.5)
|
||||
|
||||
In our example setup, we wish to call our remote function ``func()`` five
|
||||
times, and store the result of each call into a list. To compare the
|
||||
performance of different ways of looping our calls to our remote function,
|
||||
In our example setup, we wish to call our remote function ``func()`` five
|
||||
times, and store the result of each call into a list. To compare the
|
||||
performance of different ways of looping our calls to our remote function,
|
||||
we can define each loop version as a separate function on the driver script.
|
||||
|
||||
For the first version **ex1**, each iteration of the loop calls the remote
|
||||
function, then calls ``ray.get`` in an attempt to store the current result
|
||||
For the first version **ex1**, each iteration of the loop calls the remote
|
||||
function, then calls ``ray.get`` in an attempt to store the current result
|
||||
into the list, as follows:
|
||||
|
||||
.. code-block:: python
|
||||
|
||||
# This loop is suboptimal in Ray, and should only be used for the sake of this example
|
||||
def ex1():
|
||||
def ex1():
|
||||
list1 = []
|
||||
for i in range(5):
|
||||
list1.append(ray.get(func.remote()))
|
||||
|
||||
For the second version **ex2**, each iteration of the loop calls the remote
|
||||
function, and stores it into the list **without** calling ``ray.get`` each time.
|
||||
``ray.get`` is used after the loop has finished, in preparation for processing
|
||||
For the second version **ex2**, each iteration of the loop calls the remote
|
||||
function, and stores it into the list **without** calling ``ray.get`` each time.
|
||||
``ray.get`` is used after the loop has finished, in preparation for processing
|
||||
``func()``'s results:
|
||||
|
||||
.. code-block:: python
|
||||
@@ -59,8 +59,8 @@ function, and stores it into the list **without** calling ``ray.get`` each time.
|
||||
list2.append(func.remote())
|
||||
ray.get(list2)
|
||||
|
||||
Finally, for an example that's not so parallelizable, let's create a
|
||||
third version **ex3** where the driver has to call a local
|
||||
Finally, for an example that's not so parallelizable, let's create a
|
||||
third version **ex3** where the driver has to call a local
|
||||
function in between each call to the remote function ``func()``:
|
||||
|
||||
.. code-block:: python
|
||||
@@ -81,14 +81,14 @@ Timing Performance Using Python's Timestamps
|
||||
--------------------------------------------
|
||||
|
||||
One way to sanity-check the performance of the three loops is simply to
|
||||
time how long it takes to complete each loop version. We can do this using
|
||||
time how long it takes to complete each loop version. We can do this using
|
||||
Python's built-in ``time`` `module`_.
|
||||
|
||||
.. _`module`: https://docs.python.org/3/library/time.html
|
||||
|
||||
The ``time`` module contains a useful ``time()`` function that returns the
|
||||
current timestamp in unix time whenever it's called. We can create a generic
|
||||
function wrapper to call ``time()`` right before and right after each loop
|
||||
The ``time`` module contains a useful ``time()`` function that returns the
|
||||
current timestamp in unix time whenever it's called. We can create a generic
|
||||
function wrapper to call ``time()`` right before and right after each loop
|
||||
function to print out how long each loop takes overall:
|
||||
|
||||
.. code-block:: python
|
||||
@@ -106,8 +106,8 @@ function to print out how long each loop takes overall:
|
||||
return result
|
||||
return timed_wrapper
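The hunk above shows only the tail of the wrapper. A self-contained sketch of such a timing decorator could look like the following. It is an illustration rather than the exact code from the documentation: the use of ``functools.wraps`` is an assumption, and ``slow_add`` is a hypothetical stand-in for one of the loop functions.

```python
import functools
import time


def time_this(f):
    """Print how long each call to f takes, then return f's result."""
    @functools.wraps(f)
    def timed_wrapper(*args, **kwargs):
        start = time.time()
        result = f(*args, **kwargs)
        end = time.time()
        print("func:%r args:[%r, %r] took: %2.4f seconds"
              % (f.__name__, args, kwargs, end - start))
        return result
    return timed_wrapper


@time_this
def slow_add(a, b):
    # Hypothetical stand-in for a loop function: sleep briefly, then add.
    time.sleep(0.05)
    return a + b


if __name__ == "__main__":
    slow_add(1, 2)
```

Because the wrapper returns ``result`` unchanged, decorating a function only adds the printed timing line and does not alter what callers receive.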
|
||||
|
||||
To always print out how long the loop takes to run each time the loop
|
||||
function ``ex1()`` is called, we can invoke our ``time_this`` wrapper with
|
||||
To always print out how long the loop takes to run each time the loop
|
||||
function ``ex1()`` is called, we can invoke our ``time_this`` wrapper with
|
||||
a function decorator. This can similarly be done to functions ``ex2()``
|
||||
and ``ex3()``:
|
||||
|
||||
@@ -136,9 +136,9 @@ Then, running the three timed loops should yield output similar to this:
|
||||
| func:'ex2' args:[(), {}] took: 1.0032 seconds |
|
||||
| func:'ex3' args:[(), {}] took: 2.0039 seconds |
|
||||
|
||||
Let's interpret these results.
|
||||
Let's interpret these results.
|
||||
|
||||
Here, ``ex1()`` took substantially more time than ``ex2()``, where
|
||||
Here, ``ex1()`` took substantially more time than ``ex2()``, where
|
||||
their only difference is that ``ex1()`` calls ``ray.get`` on the remote
|
||||
function before adding it to the list, while ``ex2()`` waits to fetch the
|
||||
entire list with ``ray.get`` at once.
|
||||
@@ -160,28 +160,28 @@ entire list with ``ray.get`` at once.
|
||||
list2.append(func.remote())
|
||||
ray.get(list2)
|
||||
|
||||
Notice how ``ex1()`` took 2.5 seconds, exactly five times 0.5 seconds, or
|
||||
the time it would take to wait for our remote function five times in a row.
|
||||
Notice how ``ex1()`` took 2.5 seconds, exactly five times 0.5 seconds, or
|
||||
the time it would take to wait for our remote function five times in a row.
|
||||
|
||||
By calling ``ray.get`` after each call to the remote function, ``ex1()``
|
||||
removes all ability to parallelize work, by forcing the driver to wait for
|
||||
each ``func()``'s result in succession. We are not taking advantage of Ray
|
||||
parallelization here!
|
||||
By calling ``ray.get`` after each call to the remote function, ``ex1()``
|
||||
removes all ability to parallelize work, by forcing the driver to wait for
|
||||
each ``func()``'s result in succession. We are not taking advantage of Ray
|
||||
parallelization here!
|
||||
|
||||
Meanwhile, ``ex2()`` takes about 1 second, much faster than it would normally
|
||||
take to call ``func()`` five times iteratively. Ray is running each call to
|
||||
``func()`` in parallel, saving us time.
|
||||
Meanwhile, ``ex2()`` takes about 1 second, much faster than it would normally
|
||||
take to call ``func()`` five times iteratively. Ray is running each call to
|
||||
``func()`` in parallel, saving us time.
|
||||
|
||||
``ex1()`` is actually a common user mistake in Ray. ``ray.get`` is not
|
||||
necessary to do before adding the result of ``func()`` to the list. Instead,
|
||||
the driver should send out all parallelizable calls to the remote function
|
||||
``ex1()`` is actually a common user mistake in Ray. ``ray.get`` is not
|
||||
necessary to do before adding the result of ``func()`` to the list. Instead,
|
||||
the driver should send out all parallelizable calls to the remote function
|
||||
to Ray before waiting to receive their results with ``ray.get``. ``ex1()``'s
|
||||
suboptimal behavior can be noticed just using this simple timing test.
|
||||
|
||||
Realistically, however, many applications are not as highly parallelizable
|
||||
as ``ex2()``, and the application includes sections where the code must run in
|
||||
Realistically, however, many applications are not as highly parallelizable
|
||||
as ``ex2()``, and the application includes sections where the code must run in
|
||||
serial. ``ex3()`` is such an example, where the local function ``other_func()``
|
||||
must run first before each call to ``func()`` can be submitted to Ray.
|
||||
must run first before each call to ``func()`` can be submitted to Ray.
|
||||
|
||||
.. code-block:: python
|
||||
|
||||
@@ -196,23 +196,23 @@ must run first before each call to ``func()`` can be submitted to Ray.
|
||||
list3.append(func.remote())
|
||||
ray.get(list3)
|
||||
|
||||
What results is that while ``ex3()`` still gained 0.5 seconds of speedup
|
||||
What results is that while ``ex3()`` still gained 0.5 seconds of speedup
|
||||
compared to the completely serialized ``ex1()`` version, this speedup is
|
||||
still nowhere near the ideal speedup of ``ex2()``.
|
||||
still nowhere near the ideal speedup of ``ex2()``.
|
||||
|
||||
The dramatic speedup of ``ex2()`` is possible because ``ex2()`` is
|
||||
theoretically completely parallelizable: if we were given 5 CPUs, all 5 calls
|
||||
to ``func()`` can be run in parallel. What is happening with ``ex3()``,
|
||||
however, is that each parallelized call to ``func()`` is staggered by a wait
|
||||
The dramatic speedup of ``ex2()`` is possible because ``ex2()`` is
|
||||
theoretically completely parallelizable: if we were given 5 CPUs, all 5 calls
|
||||
to ``func()`` can be run in parallel. What is happening with ``ex3()``,
|
||||
however, is that each parallelized call to ``func()`` is staggered by a wait
|
||||
of 0.3 seconds for the local ``other_func()`` to finish.
|
||||
|
||||
``ex3()`` is thus a manifestation of `Amdahls Law`_: the fastest theoretically
|
||||
possible execution time from parallelizing an application is limited to be
|
||||
no better than the time it takes to run all serial parts in serial.
|
||||
``ex3()`` is thus a manifestation of `Amdahls Law`_: the fastest theoretically
|
||||
possible execution time from parallelizing an application is limited to be
|
||||
no better than the time it takes to run all serial parts in serial.
|
||||
|
||||
.. _`Amdahls Law`: https://en.wikipedia.org/wiki/Amdahl%27s_law
|
||||
|
||||
Due to Amdahl's Law, ``ex3()`` must take at least 1.5
|
||||
Due to Amdahl's Law, ``ex3()`` must take at least 1.5
|
||||
seconds -- the time it takes for 5 serial calls to ``other_func()`` to finish!
|
||||
After an additional 0.5 seconds to execute func and get the result, the
|
||||
computation is done.
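The timings quoted above can be reproduced with a small back-of-the-envelope model that does not use Ray at all. The sketch below assumes 4 CPUs (as in the example machine), tasks that start in submission order, and zero scheduling overhead. The function name ``makespan`` is made up for this illustration.

```python
import heapq


def makespan(submit_times, task_seconds, num_cpus):
    """Finish time of the last task when tasks start in submission order."""
    free_at = [0.0] * num_cpus  # time at which each CPU next becomes free
    heapq.heapify(free_at)
    finish = 0.0
    for submitted in submit_times:
        # A task starts once it is submitted and the earliest CPU is free.
        start = max(submitted, heapq.heappop(free_at))
        end = start + task_seconds
        heapq.heappush(free_at, end)
        finish = max(finish, end)
    return finish


# ex1: the driver waits for each 0.5 s call before submitting the next one.
ex1_time = 5 * 0.5
# ex2: all five calls are submitted at (roughly) time zero.
ex2_time = makespan([0.0] * 5, 0.5, num_cpus=4)
# ex3: each submission happens after another 0.3 s call to other_func().
ex3_time = makespan([0.3 * (i + 1) for i in range(5)], 0.5, num_cpus=4)

print(ex1_time, ex2_time, round(ex3_time, 3))
```

Under these assumptions the model gives about 2.5, 1.0 and 2.0 seconds, which lines up with the measured 1.0032 and 2.0039 seconds for ``ex2()`` and ``ex3()`` and with the 2.5 seconds discussed for ``ex1()``.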
|
||||
@@ -224,7 +224,7 @@ Profiling Using An External Profiler (Line Profiler)
|
||||
One way to profile the performance of our code using Ray is to use a third-party
|
||||
profiler such as `Line_profiler`_. Line_profiler is a useful line-by-line
|
||||
profiler for pure Python applications that formats its output side-by-side with
|
||||
the profiled code itself.
|
||||
the profiled code itself.
|
||||
|
||||
Alternatively, another third-party profiler (not covered in this documentation)
|
||||
that you could use is `Pyflame`_, which can generate profiling graphs.
|
||||
@@ -238,11 +238,11 @@ First install ``line_profiler`` with pip:
|
||||
|
||||
pip install line_profiler
|
||||
|
||||
``line_profiler`` requires each section of driver code that you want to profile as
|
||||
its own independent function. Conveniently, we have already done so by defining
|
||||
``line_profiler`` requires each section of driver code that you want to profile as
|
||||
its own independent function. Conveniently, we have already done so by defining
|
||||
each loop version as its own function. To tell ``line_profiler`` which functions
|
||||
to profile, just add the ``@profile`` decorator to ``ex1()``, ``ex2()`` and
|
||||
``ex3()``. Note that you do not need to import ``line_profiler`` into your Ray
|
||||
to profile, just add the ``@profile`` decorator to ``ex1()``, ``ex2()`` and
|
||||
``ex3()``. Note that you do not need to import ``line_profiler`` into your Ray
|
||||
application:
|
||||
|
||||
.. code-block:: python
|
||||
@@ -262,16 +262,16 @@ application:
|
||||
if __name__ == "__main__":
|
||||
main()
|
||||
|
||||
Then, when we want to execute our Python script from the command line, instead
|
||||
of ``python your_script_here.py``, we use the following shell command to run the
|
||||
Then, when we want to execute our Python script from the command line, instead
|
||||
of ``python your_script_here.py``, we use the following shell command to run the
|
||||
script with ``line_profiler`` enabled:
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
kernprof -l your_script_here.py
|
||||
kernprof -l your_script_here.py
|
||||
|
||||
This command runs your script and prints only your script's output as usual.
|
||||
``Line_profiler`` instead outputs its profiling results to a corresponding
|
||||
This command runs your script and prints only your script's output as usual.
|
||||
``Line_profiler`` instead outputs its profiling results to a corresponding
|
||||
binary file called ``your_script_here.py.lprof``.
|
||||
|
||||
To read ``line_profiler``'s results to terminal, use this shell command:
|
||||
@@ -300,10 +300,10 @@ Note that execution time is given in units of 1e-06 seconds:
|
||||
33 5 2508805.0 501761.0 100.0 list1.append(ray.get(func.remote()))
|
||||
|
||||
|
||||
Notice that each hit to ``list1.append(ray.get(func.remote()))`` at line 33
|
||||
takes the full 0.5 seconds waiting for ``func()`` to finish. Meanwhile, in
|
||||
``ex2()`` below, each call of ``func.remote()`` at line 40 only takes 0.127 ms,
|
||||
and the majority of the time (about 1 second) is spent on waiting for ``ray.get()``
|
||||
Notice that each hit to ``list1.append(ray.get(func.remote()))`` at line 33
|
||||
takes the full 0.5 seconds waiting for ``func()`` to finish. Meanwhile, in
|
||||
``ex2()`` below, each call of ``func.remote()`` at line 40 only takes 0.127 ms,
|
||||
and the majority of the time (about 1 second) is spent on waiting for ``ray.get()``
|
||||
at the end:
|
||||
|
||||
|
||||
@@ -323,11 +323,11 @@ at the end:
|
||||
41 1 1002919.0 1002919.0 99.9 ray.get(list2)
|
||||
|
||||
|
||||
And finally, ``line_profiler``'s output for ``ex3()``. Each call to
|
||||
``func.remote()`` at line 50 still takes orders of magnitude less than 0.5 seconds,
|
||||
showing that Ray is successfully parallelizing the remote calls. However, each
|
||||
call to the local function ``other_func()`` takes the full 0.3 seconds,
|
||||
totalling up to the guaranteed minimum application execution time of 1.5
|
||||
And finally, ``line_profiler``'s output for ``ex3()``. Each call to
|
||||
``func.remote()`` at line 50 still takes orders of magnitude less than 0.5 seconds,
|
||||
showing that Ray is successfully parallelizing the remote calls. However, each
|
||||
call to the local function ``other_func()`` takes the full 0.3 seconds,
|
||||
totalling up to the guaranteed minimum application execution time of 1.5
|
||||
seconds:
|
||||
|
||||
.. code-block:: bash
|
||||
@@ -351,20 +351,20 @@ seconds:
|
||||
Profiling Using Python's CProfile
|
||||
---------------------------------
|
||||
|
||||
A second way to profile the performance of your Ray application is to
|
||||
use Python's native cProfile `profiling module`_. Rather than tracking
|
||||
A second way to profile the performance of your Ray application is to
|
||||
use Python's native cProfile `profiling module`_. Rather than tracking
|
||||
line-by-line of your application code, cProfile can give the total runtime
|
||||
of each loop function, as well as list the number of calls made and
|
||||
execution time of all function calls made within the profiled code.
|
||||
execution time of all function calls made within the profiled code.
|
||||
|
||||
.. _`profiling module`: https://docs.python.org/3/library/profile.html#module-cProfile
|
||||
|
||||
Unlike ``line_profiler`` above, this detailed list of profiled function calls
|
||||
**includes** internal function calls and function calls made within Ray!
|
||||
Unlike ``line_profiler`` above, this detailed list of profiled function calls
|
||||
**includes** internal function calls and function calls made within Ray!
|
||||
|
||||
However, similar to ``line_profiler``, cProfile can be enabled with minimal
|
||||
changes to your application code (given that each section of the code you want
|
||||
to profile is defined as its own function). To use cProfile, add an import
|
||||
However, similar to ``line_profiler``, cProfile can be enabled with minimal
|
||||
changes to your application code (given that each section of the code you want
|
||||
to profile is defined as its own function). To use cProfile, add an import
|
||||
statement, then replace calls to the loop functions as follows:
|
||||
|
||||
.. code-block:: python
|
||||
@@ -385,17 +385,17 @@ statement, then replace calls to the loop functions as follows:
|
||||
if __name__ == "__main__":
|
||||
main()
|
||||
|
||||
Now, when executing your Python script, a cProfile list of profiled function
|
||||
Now, when executing your Python script, a cProfile list of profiled function
|
||||
calls will be outputted to terminal for each call made to ``cProfile.run()``.
|
||||
The very top of cProfile's output gives the total execution time for
|
||||
The very top of cProfile's output gives the total execution time for
|
||||
``'ex1()'``:
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
601 function calls (595 primitive calls) in 2.509 seconds
|
||||
|
||||
Following is a snippet of profiled function calls for ``'ex1()'``. Most of
|
||||
these calls are quick and take around 0.000 seconds, so the functions of
|
||||
Following is a snippet of profiled function calls for ``'ex1()'``. Most of
|
||||
these calls are quick and take around 0.000 seconds, so the functions of
|
||||
interest are the ones with non-zero execution times:
|
||||
|
||||
.. code-block:: bash
|
||||
@@ -405,7 +405,7 @@ interest are the ones with non-zero execution times:
|
||||
1 0.000 0.000 2.509 2.509 your_script_here.py:31(ex1)
|
||||
5 0.000 0.000 0.001 0.000 remote_function.py:103(remote)
|
||||
5 0.000 0.000 0.001 0.000 remote_function.py:107(_submit)
|
||||
...
|
||||
...
|
||||
10 0.000 0.000 0.000 0.000 worker.py:2459(__init__)
|
||||
5 0.000 0.000 2.508 0.502 worker.py:2535(get)
|
||||
5 0.000 0.000 0.000 0.000 worker.py:2695(get_global_worker)
|
||||
@@ -414,25 +414,25 @@ interest are the ones with non-zero execution times:
|
||||
5 0.000 0.000 0.000 0.000 worker.py:514(submit_task)
|
||||
...
|
||||
|
||||
The 5 separate calls to Ray's ``get``, taking the full 0.502 seconds each call,
|
||||
can be noticed at ``worker.py:2535(get)``. Meanwhile, the act of calling the
|
||||
remote function itself at ``remote_function.py:103(remote)`` only takes 0.001
|
||||
seconds over 5 calls, and thus is not the source of the slow performance of
|
||||
The 5 separate calls to Ray's ``get``, taking the full 0.502 seconds each call,
|
||||
can be noticed at ``worker.py:2535(get)``. Meanwhile, the act of calling the
|
||||
remote function itself at ``remote_function.py:103(remote)`` only takes 0.001
|
||||
seconds over 5 calls, and thus is not the source of the slow performance of
|
||||
``ex1()``.
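When the full listing is long, it can help to sort it so the entries with non-zero times come first. The following sketch uses the standard ``cProfile.Profile`` and ``pstats.Stats`` classes on a plain Python workload. The functions ``slow_step``, ``workload`` and ``profile_top`` are hypothetical names for this example, and no Ray calls are involved.

```python
import cProfile
import io
import pstats
import time


def slow_step():
    time.sleep(0.05)


def workload():
    for _ in range(3):
        slow_step()


def profile_top(func, limit=5):
    """Profile func() and return the top entries by cumulative time as text."""
    profiler = cProfile.Profile()
    profiler.enable()
    func()
    profiler.disable()
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    # "cumulative" sorts by the cumtime column, so slow callers rise to the top.
    stats.sort_stats("cumulative").print_stats(limit)
    return stream.getvalue()


report = profile_top(workload)
print(report)
```

Sorting by cumulative time plays the same role as scanning the ``cumtime`` column by eye in the snippet above: the slow call sites appear in the first few rows.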
|
||||
|
||||
|
||||
Profiling Ray Actors with cProfile
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
Considering that the detailed output of cProfile can be quite different depending
|
||||
on what Ray functionalities we use, let us see what cProfile's output might look
|
||||
like if our example involved Actors (for an introduction to Ray actors, see our
|
||||
`Actor documentation here`_).
|
||||
Considering that the detailed output of cProfile can be quite different depending
|
||||
on what Ray functionalities we use, let us see what cProfile's output might look
|
||||
like if our example involved Actors (for an introduction to Ray actors, see our
|
||||
`Actor documentation here`_).
|
||||
|
||||
.. _`Actor documentation here`: http://ray.readthedocs.io/en/latest/actors.html
|
||||
|
||||
Now, instead of looping over five calls to a remote function like in ``ex1``,
|
||||
let's create a new example and loop over five calls to a remote function
|
||||
let's create a new example and loop over five calls to a remote function
|
||||
**inside an actor**. Our actor's remote function again just sleeps for 0.5
|
||||
seconds:
|
||||
|
||||
@@ -440,7 +440,7 @@ seconds:
|
||||
|
||||
# Our actor
|
||||
@ray.remote
|
||||
class Sleeper(object):
|
||||
class Sleeper(object):
|
||||
def __init__(self):
|
||||
self.sleepValue = 0.5
|
||||
|
||||
@@ -448,7 +448,7 @@ seconds:
|
||||
def actor_func(self):
|
||||
time.sleep(self.sleepValue)
|
||||
|
||||
Recalling the suboptimality of ``ex1``, let's first see what happens if we
|
||||
Recalling the suboptimality of ``ex1``, let's first see what happens if we
|
||||
attempt to perform all five ``actor_func()`` calls within a single actor:
|
||||
|
||||
.. code-block:: python
|
||||
@@ -470,7 +470,7 @@ We enable cProfile on this example as follows:
|
||||
|
||||
def main():
|
||||
ray.init()
|
||||
cProfile.run('ex4()')
|
||||
cProfile.run('ex4()')
|
||||
|
||||
if __name__ == "__main__":
|
||||
main()
|
||||
@@ -497,22 +497,22 @@ Running our new Actor example, cProfile's abbreviated output is as follows:
|
||||
8 0.000 0.000 0.001 0.000 worker.py:514(submit_task)
|
||||
...
|
||||
|
||||
It turns out that the entire example still took 2.5 seconds to execute, or the
|
||||
time for five calls to ``actor_func()`` to run in serial. We remember in ``ex1``
|
||||
that this behavior was because we did not wait until after submitting all five
|
||||
It turns out that the entire example still took 2.5 seconds to execute, or the
|
||||
time for five calls to ``actor_func()`` to run in serial. We remember in ``ex1``
|
||||
that this behavior was because we did not wait until after submitting all five
|
||||
remote function tasks to call ``ray.get()``, but we can verify on cProfile's
|
||||
output line ``worker.py:2535(get)`` that ``ray.get()`` was only called once at
|
||||
the end, for 2.509 seconds. What happened?
|
||||
output line ``worker.py:2535(get)`` that ``ray.get()`` was only called once at
|
||||
the end, for 2.509 seconds. What happened?
|
||||
|
||||
It turns out Ray cannot parallelize this example, because we have only
|
||||
initialized a single ``Sleeper`` actor. Because each actor is a single,
|
||||
stateful worker, our entire code is submitted and run on a single worker the
|
||||
It turns out Ray cannot parallelize this example, because we have only
|
||||
initialized a single ``Sleeper`` actor. Because each actor is a single,
|
||||
stateful worker, our entire code is submitted and run on a single worker the
|
||||
whole time.
|
||||
|
||||
To better parallelize the actors in ``ex4``, we can take advantage of the fact
|
||||
that each call to ``actor_func()`` is independent, and instead
|
||||
create five ``Sleeper`` actors. That way, we are creating five workers
|
||||
that can run in parallel, instead of creating a single worker that
|
||||
that can run in parallel, instead of creating a single worker that
|
||||
can only handle one call to ``actor_func()`` at a time.
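The effect of one worker versus five can be imitated without Ray. The sketch below is only an analogy: it swaps actors for single-threaded ``ThreadPoolExecutor`` instances from the standard library and shortens the sleep to 0.05 seconds. ``run_on_workers`` is a made-up helper, and this ``actor_func`` is a plain function rather than an actor method.

```python
import time
from concurrent.futures import ThreadPoolExecutor

SLEEP_SECONDS = 0.05  # shortened stand-in for the 0.5 s sleep in actor_func()


def actor_func():
    time.sleep(SLEEP_SECONDS)


def run_on_workers(num_workers, num_calls=5):
    """Spread num_calls across num_workers single-threaded executors."""
    pools = [ThreadPoolExecutor(max_workers=1) for _ in range(num_workers)]
    start = time.perf_counter()
    futures = [pools[i % num_workers].submit(actor_func)
               for i in range(num_calls)]
    for future in futures:
        future.result()  # block until this call has finished
    elapsed = time.perf_counter() - start
    for pool in pools:
        pool.shutdown()
    return elapsed


one_worker = run_on_workers(1)    # calls queue up behind each other
five_workers = run_on_workers(5)  # each call gets its own worker
print("1 worker: %.2f s, 5 workers: %.2f s" % (one_worker, five_workers))
```

With a single executor the five calls run back to back, much like the single ``Sleeper`` actor. With five executors they overlap, which is the behavior the five-actor version aims for.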
|
||||
|
||||
.. code-block:: python
|
||||
@@ -530,7 +530,7 @@ can only handle one call to ``actor_func()`` at a time.
|
||||
|
||||
Our example in total now takes only 1.5 seconds to run:
|
||||
|
||||
.. code-block:: bash
|
||||
.. code-block:: bash
|
||||
|
||||
1378 function calls (1363 primitive calls) in 1.567 seconds
|
||||
|
||||
@@ -553,27 +553,27 @@ Our example in total now takes only 1.5 seconds to run:
|
||||
|
||||
Visualizing Tasks in the Ray Timeline
|
||||
-------------------------------------
|
||||
Profiling the performance of your Ray application doesn't need to be
|
||||
an eye-straining endeavor of interpreting numbers among hundreds of
|
||||
lines of text. Ray comes with its own visual web UI to visualize the
|
||||
Profiling the performance of your Ray application doesn't need to be
|
||||
an eye-straining endeavor of interpreting numbers among hundreds of
|
||||
lines of text. Ray comes with its own visual web UI to visualize the
|
||||
parallelization (or lack thereof) of user tasks submitted to Ray!
|
||||
|
||||
This method does have its own limitations, however. The Ray Timeline
|
||||
This method does have its own limitations, however. The Ray Timeline
|
||||
can only show timing info about Ray tasks, and not timing for normal
|
||||
Python functions. This can be an issue especially for debugging slow
|
||||
Python code that is running on the driver, and not running as a task on
|
||||
one of the workers. The other profiling techniques above are options that
|
||||
Python code that is running on the driver, and not running as a task on
|
||||
one of the workers. The other profiling techniques above are options that
|
||||
do cover profiling normal Python functions.
|
||||
|
||||
Currently, whenever initializing Ray, a URL is generated and printed
|
||||
in the terminal. This URL can be used to view Ray's web UI as a Jupyter
|
||||
in the terminal. This URL can be used to view Ray's web UI as a Jupyter
|
||||
notebook:
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
~$: python your_script_here.py
|
||||
|
||||
-Process STDOUT and STDERR is being redirected to /tmp/raylogs/.
+Process STDOUT and STDERR is being redirected to /tmp/ray/session_2018-11-01_14-31-43_27211/logs.
|
||||
Waiting for redis server at 127.0.0.1:61150 to respond...
|
||||
Waiting for redis server at 127.0.0.1:21607 to respond...
|
||||
Starting local scheduler with the following resources: {'CPU': 4, 'GPU': 0}.
|
||||
@@ -582,13 +582,13 @@ notebook:
|
||||
View the web UI at http://localhost:8897/notebooks/ray_ui84907.ipynb?token=025e8ab295270a57fac209204b37349fdf34e037671a13ff
|
||||
======================================================================
|
||||
|
||||
Ray's web UI attempts to run on localhost at port 8888, and if it fails
|
||||
it tries successive ports until it finds an open port. In the above
|
||||
Ray's web UI attempts to run on localhost at port 8888, and if it fails
|
||||
it tries successive ports until it finds an open port. In the above
|
||||
example, it has opened on port 8897.
|
||||
|
||||
Because this web UI is only available as long as your Ray application
|
||||
is currently running, you may need to add a user prompt to prevent
|
||||
your Ray application from exiting once it has finished executing,
|
||||
Because this web UI is only available as long as your Ray application
|
||||
is currently running, you may need to add a user prompt to prevent
|
||||
your Ray application from exiting once it has finished executing,
|
||||
such as below. You can then browse the web UI for as long as you like:
|
||||
|
||||
.. code-block:: python
|
||||
@@ -606,44 +606,44 @@ such as below. You can then browse the web UI for as long as you like:
|
||||
main()
|
||||
|
||||
Now, when executing your python script, you can access the Ray timeline
|
||||
by copying the web UI URL into your web browser on the Ray machine. To
|
||||
load the web UI in the jupyter notebook, select **Kernel -> Restart and
|
||||
by copying the web UI URL into your web browser on the Ray machine. To
|
||||
load the web UI in the jupyter notebook, select **Kernel -> Restart and
|
||||
Run All** in the jupyter menu.
|
||||
|
||||
The Ray timeline can be viewed in the fourth cell of the UI notebook by
|
||||
using the task filter options, then clicking on the **View task timeline**
|
||||
The Ray timeline can be viewed in the fourth cell of the UI notebook by
|
||||
using the task filter options, then clicking on the **View task timeline**
|
||||
button.
|
||||
|
||||
For example, here are the results of executing ``ex1()``, ``ex2()``, and
|
||||
``ex3()`` visualized in the Ray timeline. Each red block is a call to one
|
||||
of our user-defined remote functions, namely ``func()``, which sleeps for
|
||||
For example, here are the results of executing ``ex1()``, ``ex2()``, and
|
||||
``ex3()`` visualized in the Ray timeline. Each red block is a call to one
|
||||
of our user-defined remote functions, namely ``func()``, which sleeps for
|
||||
0.5 seconds:
|
||||
|
||||
.. image:: user-profiling-timeline.gif
|
||||
|
||||
(highlighted color boxes for ``ex1()``, ``ex2()``, and ``ex3()`` added for
|
||||
(highlighted color boxes for ``ex1()``, ``ex2()``, and ``ex3()`` added for
|
||||
the sake of this example)
|
||||
|
||||
Note how ``ex1()`` executes all five calls to ``func()`` in serial,
|
||||
Note how ``ex1()`` executes all five calls to ``func()`` in serial,
|
||||
while ``ex2()`` and ``ex3()`` are able to parallelize their remote
|
||||
function calls.
|
||||
function calls.
|
||||
|
||||
Because we have 4 CPUs available on our machine, we are only able to
|
||||
execute up to 4 remote functions in parallel. So, the fifth call to the
|
||||
remote function in ``ex2()`` must wait until the first batch of ``func()``
|
||||
Because we have 4 CPUs available on our machine, we are only able to
|
||||
execute up to 4 remote functions in parallel. So, the fifth call to the
|
||||
remote function in ``ex2()`` must wait until the first batch of ``func()``
|
||||
calls is finished.
|
||||
|
||||
In ``ex3()``, because of the serial dependency on ``other_func()``, we
|
||||
In ``ex3()``, because of the serial dependency on ``other_func()``, we
|
||||
aren't even able to use all 4 of our cores to parallelize calls to ``func()``.
|
||||
The time gaps between the ``func()`` blocks are a result of staggering the
|
||||
calls to ``func()`` in between waiting 0.3 seconds for ``other_func()``.
|
||||
calls to ``func()`` in between waiting 0.3 seconds for ``other_func()``.
|
||||
|
||||
Also, notice that due to the aforementioned limitation of the Ray timeline,
|
||||
``other_func()``, as a driver function and not a Ray task, is never
|
||||
Also, notice that due to the aforementioned limitation of the Ray timeline,
|
||||
``other_func()``, as a driver function and not a Ray task, is never
|
||||
visualized on the Ray timeline.
|
||||
|
||||
**For more on Ray's Web UI,** such as how to access the UI on a remote
|
||||
node over ssh, or for troubleshooting installation, please see our
|
||||
node over ssh, or for troubleshooting installation, please see our
|
||||
`Web UI documentation section`_.
|
||||
|
||||
.. _`Web UI documentation section`: http://ray.readthedocs.io/en/latest/webui.html
|
||||
|
||||
@@ -189,7 +189,7 @@ def get_or_create_head_node(config, config_file, no_restart, restart_only, yes,
|
||||
logger.info("Head node up-to-date, IP address is: {}".format(
|
||||
provider.external_ip(head_node)))
|
||||
|
||||
-monitor_str = "tail -n 100 -f /tmp/raylogs/monitor-*"
+monitor_str = "tail -n 100 -f /tmp/ray/session_*/logs/monitor*"
|
||||
for s in init_commands:
|
||||
if ("ray start" in s and "docker exec" in s
|
||||
and "--autoscaling-config" in s):
|
||||
|
||||
@@ -402,7 +402,7 @@ def stop():
|
||||
from notebook.notebookapp import list_running_servers
|
||||
pids = [
|
||||
str(server["pid"]) for server in list_running_servers()
|
||||
-if "/tmp/raylogs" in server["notebook_dir"]
+if "/tmp/ray" in server["notebook_dir"]
|
||||
]
|
||||
subprocess.call(
|
||||
["kill {} 2> /dev/null".format(" ".join(pids))], shell=True)
|
||||
|
||||