diff --git a/doc/source/index.rst b/doc/source/index.rst
index f308cbea4..d29f34f4e 100644
--- a/doc/source/index.rst
+++ b/doc/source/index.rst
@@ -65,7 +65,6 @@ Ray comes with libraries that accelerate deep learning and reinforcement learnin
    api.rst
    actors.rst
    using-ray-with-gpus.rst
-   webui.rst
    signals.rst
    async_api.rst
 
diff --git a/doc/source/user-profiling-timeline.gif b/doc/source/user-profiling-timeline.gif
deleted file mode 100644
index 455f1a9af..000000000
Binary files a/doc/source/user-profiling-timeline.gif and /dev/null differ
diff --git a/doc/source/user-profiling.rst b/doc/source/user-profiling.rst
index cdbabff39..4bf152e52 100644
--- a/doc/source/user-profiling.rst
+++ b/doc/source/user-profiling.rst
@@ -9,6 +9,23 @@ If you are interested in pinpointing why your Ray application may not be
 achieving the expected speedup, read on!
 
 
+Visualizing Tasks in the Ray Timeline
+-------------------------------------
+
+The most important tool is the timeline visualization tool. To visualize tasks
+in the Ray timeline, you can dump the timeline as a JSON file using the
+following command.
+
+.. code-block:: python
+
+  ray.global_state.chrome_tracing_dump(filename="/tmp/timeline.json")
+
+Then open `chrome://tracing`_ in the Chrome web browser, and load
+``timeline.json``.
+
+.. _`chrome://tracing`: chrome://tracing
+
+
 A Basic Example to Profile
 --------------------------
 
@@ -549,101 +566,3 @@ Our example in total now takes only 1.5 seconds to run:
           1    0.000    0.000    1.564    1.564 worker.py:424(get_object)
          20    0.001    0.000    0.001    0.000 worker.py:514(submit_task)
   ...
-
-
-Visualizing Tasks in the Ray Timeline
--------------------------------------
-Profiling the performance of your Ray application doesn't need to be
-an eye-straining endeavor of interpreting numbers among hundreds of
-lines of text. Ray comes with its own visual web UI to visualize the
-parallelization (or lack thereof) of user tasks submitted to Ray!
-
-This method does have its own limitations, however. The Ray Timeline
-can only show timing info about Ray tasks, and not timing for normal
-Python functions. This can be an issue especially for debugging slow
-Python code that is running on the driver, and not running as a task on
-one of the workers. The other profiling techniques above are options that
-do cover profiling normal Python functions.
-
-Currently, whenever initializing Ray, a URL is generated and printed
-in the terminal. This URL can be used to view Ray's web UI as a Jupyter
-notebook:
-
-.. code-block:: bash
-
-  ~$: python your_script_here.py
-
-  Process STDOUT and STDERR is being redirected to /tmp/ray/session_2018-11-01_14-31-43_27211/logs.
-  Waiting for redis server at 127.0.0.1:61150 to respond...
-  Waiting for redis server at 127.0.0.1:21607 to respond...
-  Starting local scheduler with the following resources: {'CPU': 4, 'GPU': 0}.
-
-  ======================================================================
-  View the web UI at http://localhost:8897/notebooks/ray_ui84907.ipynb?token=025e8ab295270a57fac209204b37349fdf34e037671a13ff
-  ======================================================================
-
-Ray's web UI attempts to run on localhost at port 8888, and if it fails
-it tries successive ports until it finds an open port. In this above
-example, it has opened on port 8897.
-
-Because this web UI is only available as long as your Ray application
-is currently running, you may need to add a user prompt to prevent
-your Ray application from exiting once it has finished executing,
-such as below. You can then browse the web UI for as long as you like:
-
-.. code-block:: python
-
-  def main():
-      ray.init()
-      ex1()
-      ex2()
-      ex3()
-
-      # Require user input confirmation before exiting
-      hang = input('Examples finished executing. Press enter to exit:')
-
-  if __name__ == "__main__":
-      main()
-
-Now, when executing your python script, you can access the Ray timeline
-by copying the web UI URL into your web browser on the Ray machine. To
-load the web UI in the jupyter notebook, select **Kernel -> Restart and
-Run All** in the jupyter menu.
-
-The Ray timeline can be viewed in the fourth cell of the UI notebook by
-using the task filter options, then clicking on the **View task timeline**
-button.
-
-For example, here are the results of executing ``ex1()``, ``ex2()``, and
-``ex3()`` visualized in the Ray timeline. Each red block is a call to one
-of our user-defined remote functions, namely ``func()``, which sleeps for
-0.5 seconds:
-
-.. image:: user-profiling-timeline.gif
-
-(highlighted color boxes for ``ex1()``, ``ex2()``, and ``ex3()`` added for
-the sake of this example)
-
-Note how ``ex1()`` executes all five calls to ``func()`` in serial,
-while ``ex2()`` and ``ex3()`` are able to parallelize their remote
-function calls.
-
-Because we have 4 CPUs available on our machine, we can only able to
-execute up to 4 remote functions in parallel. So, the fifth call to the
-remote function in ``ex2()`` must wait until the first batch of ``func()``
-calls is finished.
-
-In ``ex3()``, because of the serial dependency on ``other_func()``, we
-aren't even able to use all 4 of our cores to parallelize calls to ``func()``.
-The time gaps between the ``func()`` blocks are a result of staggering the
-calls to ``func()`` in between waiting 0.3 seconds for ``other_func()``.
-
-Also, notice that due to the aforementioned limitation of the Ray timeline,
-``other_func()``, as a driver function and not a Ray task, is never
-visualized on the Ray timeline.
-
-**For more on Ray's Web UI,** such as how to access the UI on a remote
-node over ssh, or for troubleshooting installation, please see our
-`Web UI documentation section`_.
-
-.. _`Web UI documentation section`: http://ray.readthedocs.io/en/latest/webui.html
diff --git a/doc/source/webui.rst b/doc/source/webui.rst
deleted file mode 100644
index 42b0635fe..000000000
--- a/doc/source/webui.rst
+++ /dev/null
@@ -1,147 +0,0 @@
-Web UI
-======
-
-The Ray web UI includes tools for debugging Ray jobs. The following
-image shows an example of using the task timeline for performance debugging:
-
-.. image:: timeline.png
-
-Dependencies
-------------
-
-To use the UI, you will need to install the following.
-
-.. code-block:: bash
-
-  pip install jupyter ipywidgets bokeh
-
-If you see an error like
-
-.. code-block:: bash
-
-  Widget Javascript not detected. It may not be installed properly.
-
-Then you may need to run the following.
-
-.. code-block:: bash
-
-  jupyter nbextension enable --py --sys-prefix widgetsnbextension
-
-**Note:** If you are building Ray from source, then you will also need a
-``python2`` executable.
-
-Running the Web UI
-------------------
-
-Currently, the web UI is launched automatically when ``ray.init`` is called. The
-command will print a URL of the form:
-
-.. code-block:: text
-
-  =============================================================================
-  View the web UI at http://localhost:8889/notebooks/ray_ui92131.ipynb?token=89354a314e5a81bf56e023ad18bda3a3d272ee216f342938
-  =============================================================================
-
-If you are running Ray on your local machine, then you can head directly to that
-URL with your browser to see the Jupyter notebook. Otherwise, if you are using
-Ray remotely, such as on EC2, you will need to ensure that port is open on that
-machine. Typically, when you ssh into the machine, you can also port forward
-with the ``-L`` option as such:
-
-.. code-block:: bash
-
-  ssh -L :localhost: @
-
-So for the above URL, you would use the port 8889. The Jupyter notebook attempts
-to run on port 8888, but if that fails it tries successive ports until it finds
-an open port.
-
-You can also open the port on the machine as well, which is not recommended for
-security as the machine would be open to the Internet. In this case, you would
-need to replace localhost by the public IP the remote machine is using.
-
-Once you have navigated to the URL, start the UI by clicking on the following.
-
-.. code-block:: text
-
-  Kernel -> Restart and Run all
-
-Features
---------
-
-The UI supports a search for additional details on Task IDs and Object IDs, a
-task timeline, a distribution of task completion times, and time series for CPU
-utilization and cluster usage.
-
-Task and Object IDs
-~~~~~~~~~~~~~~~~~~~
-
-These widgets show additional details about an object or task given the ID. If
-you have the object in Python, the ID can be found by simply calling ``.hex`` on
-an Object ID as below:
-
-.. code-block:: python
-
-  # This will return a hex string of the ID.
-  objectid = ray.put(1)
-  literal_id = objectid.hex()
-
-and pasting in the returned string with no quotes. Otherwise, they can be found
-in the task timeline in the output area below the timeline when you select a
-task.
-
-For Task IDs, they can be found by searching for an object ID the task created,
-or via the task timeline in the output area.
-
-The additional details for tasks here can also be found in the task timeline;
-the search just provides an easier method to find a specific task when you have
-millions.
-
-Task Timeline
-~~~~~~~~~~~~~
-
-There are three components to this widget: the controls for the widget at the
-top, the timeline itself, and the details area at the bottom. In the controls,
-you first select whether you want to select a subset of tasks via the time they
-were completed or by the number of tasks. You can control the percentages either
-via a double sided slider, or by setting specific values in the text boxes. If
-you choose to select by the number of tasks, then entering a negative number N
-in the text field denotes the last N tasks run, while a positive value N denotes
-the first N tasks run. If there are ten tasks and you enter -1 into the field,
-then the slider will show 90% to 100%, where 1 would show 0% to 10%. Finally,
-you can choose if you want edges for task submission (if a task invokes another
-task) or object dependencies (if the result from a task is passed to another
-task) to be added, and if you want the different phases of a task broken up into
-separate tasks in the timeline.
-
-For the timeline, each node has its own dropdown with a timeline, and each row
-in the dropdown is a worker. Moving and zooming are handled by selecting the
-appropiate icons on the floating taskbar. The first is selection, the second
-panning, the third zooming, and the fourth timing. To shown edges, you can
-enable Flow Events in View Options.
-
-If you have selection enabled in the floating taskbar and select a task, then
-the details area at the bottom will fill up with information such as task ID,
-function ID, and the duration in seconds of each phase of the task.
-
-Time Distributions and Time Series
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-The completion time distribution, CPU utilization, and cluster usage all have
-the same task selection controls as the task timeline.
-
-The task completion time distribution tracks the histogram of completion tasks
-for all tasks selected.
-
-CPU utilization gives you a count of how many CPU cores are being used at a
-given time. As typically each core has a worker assigned to it, this is
-equivalent to utilization of the workers running in Ray.
-
-Cluster Usage gives you a heat-map with time on the x-axis, node IP addresses on
-the y-axis, and coloring based on how many tasks were running on that node at
-that given time.
-
-Troubleshooting
----------------
-
-The Ray timeline visualization may not work in Firefox or Safari.
diff --git a/python/ray/node.py b/python/ray/node.py
index 9131ee352..5f3d89be6 100644
--- a/python/ray/node.py
+++ b/python/ray/node.py
@@ -84,7 +84,6 @@ class Node(object):
             self._plasma_store_socket_name = None
             self._raylet_socket_name = None
             self._webui_url = None
-            self._dashboard_url = None
         else:
             self._plasma_store_socket_name = (
                 ray_params.plasma_store_socket_name)
@@ -306,7 +305,7 @@ class Node(object):
     def start_dashboard(self):
         """Start the dashboard."""
         stdout_file, stderr_file = self.new_log_files("dashboard", True)
-        self._dashboard_url, process_info = ray.services.start_dashboard(
+        self._webui_url, process_info = ray.services.start_dashboard(
             self.redis_address,
             self._temp_dir,
             stdout_file=stdout_file,
@@ -317,13 +316,15 @@ class Node(object):
         self.all_processes[ray_constants.PROCESS_TYPE_DASHBOARD] = [
             process_info
         ]
+        redis_client = self.create_redis_client()
+        redis_client.hmset("webui", {"url": self._webui_url})
 
     def start_ui(self):
         """Start the web UI."""
         stdout_file, stderr_file = self.new_log_files("webui")
         notebook_name = self._make_inc_temp(
             suffix=".ipynb", prefix="ray_ui", directory_name=self._temp_dir)
-        self._webui_url, process_info = ray.services.start_ui(
+        _, process_info = ray.services.start_ui(
             self._redis_address,
             notebook_name,
             stdout_file=stdout_file,
diff --git a/python/ray/worker.py b/python/ray/worker.py
index bb7406555..c8012de33 100644
--- a/python/ray/worker.py
+++ b/python/ray/worker.py
@@ -1870,9 +1870,6 @@ def connect(info,
                      if hasattr(main, "__file__") else "INTERACTIVE MODE")
         }
         worker.redis_client.hmset(b"Drivers:" + worker.worker_id, driver_info)
-        if (not worker.redis_client.exists("webui")
-                and info["webui_url"] is not None):
-            worker.redis_client.hmset("webui", {"url": info["webui_url"]})
     elif mode == WORKER_MODE:
         # Register the worker with Redis.
         worker_dict = {