51 Commits

Author SHA1 Message Date
Edward Oakes b796de4104 [metrics] Check that all tag_keys are set when recording (#13420) 2021-01-20 13:09:44 -06:00
fyrestone 05ad4c7499 [Dashboard] Optimize dashboard datacenter (#11391)
* Optimize dashboard datacenter

* Fix tests

* Fix tests

* Fix

* Fix CI

* python/build-wheel-macos.sh

Co-authored-by: 刘宝 <po.lb@antfin.com>
Co-authored-by: Max Fitton <maxfitton@anyscale.com>
2020-10-27 23:49:31 -07:00
Max Fitton caf3b04b27 [Dashboard] Turn on new dashboard by default pt 2 (#11510) 2020-10-23 15:52:14 -05:00
Max Fitton cdca5af53b Revert "[Dashboard] Turn on New Dashboard by Default (#11321)" (#11502)
This reverts commit f500292d41.
2020-10-20 10:53:10 -05:00
Max Fitton f500292d41 [Dashboard] Turn on New Dashboard by Default (#11321) 2020-10-19 12:31:11 -05:00
Sumanth Ratna 9da7bdcc8e Use master for links to docs in source (#10866) 2020-09-19 00:30:45 -07:00
Max Fitton 3e8164ff8a [Dashboard] Logical View Actor Class Grouping Details (#10453)
* wip

* wip

* wip

* wip

* Need to track the timestamp actors are created for the dashboard. This adds that functionality back in and deletes unused code

* Add the materialui lab packages to get access to the Alert component and fix up some vulnerabilities with npm audit.

* Finish supporting information on a per-actor-class basis in the logical view, add bug fixes around timestamps and infeasible task names, and add a new warning popup that shows if there are infeasible actors around.

* lint and add seconds annotation to actor lifetime values

* real lint

* remove typo

* Somehow missed something last lint

* Add new comments for actor states

* Add underscores to some private functions

* Add tooltips to the actor states on the logical view

* change test metrics to be aligned with new changes.

* lint

* Remove some unnecessary log lines and catch error that happens when we try to decode data from an unexpected source

* Re-add a function I had removed. It is used in the Java codebase.

Co-authored-by: Max Fitton <max@semprehealth.com>
2020-09-09 10:34:54 -07:00
Eric Liang 519354a39a [api] Initial API deprecations for Ray 1.0 (#10325) 2020-08-28 15:03:50 -07:00
SangBin Cho f35339b5ff [Dashboard] Change default ip address for the dashboard to ipv4 (#10287)
* Done.

* Add todo.

* Addressed code review.

* Fix issue.

* Fix test failure.

* Fix a test.
2020-08-27 14:43:10 -07:00
Max Fitton 9c5e5a9757 [Dashboard] Fix and Recommit Reverted Group by Actor Class PR (#10186)
* Revert "Revert "[Dashboard] Group by Actor Class (#10147)" (#10180)"

This reverts commit e4d2ca620a.

* Fix metrics test to agree with the new logical view API

* lint2

Co-authored-by: Max Fitton <max@semprehealth.com>
2020-08-18 20:55:58 -07:00
SangBin Cho ec2f1a225e [Stats] Metrics Export User Interface Part 1 (#9913)
* Metrics export port expose done.

* Support exposing metrics port + metrics agent service discovery through ray.nodes()

* Formatting.

* Added a doc.

* Linting.

* Change the location of metrics agent port.

* Addressed code review.

* Addressed code review.
2020-08-06 16:16:29 -07:00
Robert Nishihara db0d6e8efa Make wait_for_condition raise exception when timing out. (#9710) 2020-07-26 22:56:32 -07:00
ZhuSenlin a269ae9bc4 [GCS] Fix actor task hang when its owner exits before local dependencies resolved (#8045) 2020-07-27 10:56:52 +08:00
Hao Chen d49dadf891 Change Python's ObjectID to ObjectRef (#9353) 2020-07-10 17:49:04 +08:00
Zhuohan Li 8a76f4cbb5 [Core] put small objects in memory store (#8972)
* remove the put in memory store

* put small objects directly in memory store

* cast data type

* fix another place that uses Put to spill to plasma store

* fix multiple tests related to memory limits

* partially fix test_metrics

* remove not functioning codes

* fix core_worker_test

* refactor put to plasma codes

* add a flag for the new feature

* add flag to more places

* do a warmup round for the plasma store

* lint

* lint again

* fix warmup store

* Update _raylet.pyx

Co-authored-by: Eric Liang <ekhliang@gmail.com>
2020-07-09 15:39:40 -07:00
Max Fitton ad09aa985c Make Dashboard Port Configurable (#8999) 2020-06-19 16:26:22 -05:00
Edward Oakes 8a99fd205e [dashboard] Pipe resource assignments to dashboard (#8998) 2020-06-18 11:14:59 -05:00
mehrdadn 101c215125 Get more tests running on Windows (#6537)
* Get rid of system() calls

* Work around '/usr/share/mini' showing up on GitHub Actions (probably due to psutil truncation)

https://github.com/ray-project/ray/runs/722480047?check_suite_focus=true

* Don't check for socket max path length on Windows

* Don't check for socket existence on Windows

* Fix race condition in Windows fate-sharing

* Work around missing .exe extension for Redis tests

* Add more tests to GitHub Actions

Co-authored-by: Mehrdad <noreply@github.com>
2020-06-12 21:32:10 -07:00
SangBin Cho e372c06257 Hotfix dashboard broken tests. (#8757) 2020-06-04 09:44:00 -07:00
SangBin Cho aa1cbe8abc [Dashboard] Ray memory dashboard backend (#8461) 2020-05-21 12:22:28 -07:00
mehrdadn 4bdef78e2e Various CI fixes and cleanup (#8289) 2020-05-05 10:47:49 -07:00
fangfengbin 97430b2d0f GCS adapts to node table pub sub (#8209) 2020-05-05 18:34:41 +08:00
ZhuSenlin 4a81793ba5 GCS-Based actor management implementation (#6763)
* add gcs actor manager

* fix test_metrics.py

* fix TestTaskInfo

* fix comment

* fix comment

* fix comment

* fix comment

* fix comment

* fix comment

* fix compile error

* fix merge error

Co-authored-by: senlin.zsl <senlin.zsl@antfin.com>
2020-04-13 09:48:48 -07:00
Edward Oakes d87563937e Revert "[Dashboard] Metrics Export Service. (#7728)" (#7789) 2020-03-28 19:27:34 -07:00
SangBin Cho 7a0befb0a7 [Dashboard] Metrics Export Service. (#7728) 2020-03-26 14:03:00 -07:00
ZhuSenlin 7d08b418fc fix test_worker_stats (#7655)
* fix test_worker_stats

* fix lint error

* fix lint error

Co-authored-by: senlin.zsl <senlin.zsl@antfin.com>
2020-03-20 14:53:40 +08:00
fangfengbin fca9dc73e1 Fix test_raylet_pending_tasks test case failed (#7636) 2020-03-19 11:09:38 +08:00
Stephanie Wang fdb528514b [core] Ref counting for actor handles (#7434)
* tmp

* Move Exit handler into CoreWorker, exit once owner's ref count goes to 0

* fix build

* Remove __ray_terminate__ and add test case for distributed ref counting

* lint

* Remove unused

* Fixes for detached actor, duplicate actor handles

* Remove unused

* Remove creation return ID

* Remove ObjectIDs from python, set references in CoreWorker

* Fix crash

* Fix memory crash

* Fix tests

* fix

* fixes

* fix tests

* fix java build

* fix build

* fix

* check status

* check status
2020-03-10 17:45:07 -07:00
Simon Mo 0ddc389830 Fix documentation building with psutil issue (#7077) 2020-02-11 10:00:29 -08:00
SangBin Cho 1e690673d8 Render tasks that are not schedulable on the dashboard. (#7034) 2020-02-10 14:23:06 -08:00
fyrestone 0648bd28ef [xlang] Cross language Python support (#6709) 2020-02-08 13:01:28 +08:00
SangBin Cho ca5a9c6739 Exclude test profiling info endpoint (#7030)
* Skip test_profiling_info_endpoint when pytest running locally

* Fixed formatting.

* Fixed the reason for skipping the test based on pr comments
2020-02-03 16:49:03 -08:00
Simon Mo 396d7fafc8 UI improvement for asyncio (#6905) 2020-01-27 12:45:51 -08:00
Yunzhi Zhang aa5427ca78 [Dashboard] Kill actor (#6906) 2020-01-24 17:21:44 -08:00
Mitchell Stern 33423627ca [Dashboard] Add profiling button to logical view (#6901) 2020-01-24 11:52:14 -08:00
Yunzhi Zhang 0834bda8c1 [Dashboard] Display actor task execution info (#6705)
Co-authored-by: Philipp Moritz <pcmoritz@gmail.com>
2020-01-22 22:33:55 -08:00
Yunzhi Zhang 3acf3c7675 [Dashboard] Add actor task counter (#6820) 2020-01-17 15:43:56 -08:00
Mitchell Stern 9f96091aef [Dashboard] Add logical view displaying actor tree (#6810)
* [Dashboard] Add logical view displaying actor tree

* Fix key error in test_raylet_info_endpoint
2020-01-17 10:25:27 -08:00
Sven 60d4d5e1aa Remove future imports (#6724)
* Remove all __future__ imports from RLlib.

* Remove (object) again from tf_run_builder.py::TFRunBuilder.

* Fix 2xLINT warnings.

* Fix broken appo_policy import (must be appo_tf_policy)

* Remove future imports from all other ray files (not just RLlib).

* Remove future imports from all other ray files (not just RLlib).

* Remove future import blocks that contain `unicode_literals` as well.
Revert appo_tf_policy.py to appo_policy.py (belongs to another PR).

* Add two empty lines before Schedule class.

* Put back __future__ imports into determine_tests_to_run.py. Fails otherwise on a py2/print related error.
2020-01-09 00:15:48 -08:00
Yunzhi Zhang 816b84808d [Dashboard] Display memory usage of nodes and core workers (#6671) 2020-01-03 20:12:42 -08:00
Robert Nishihara 39a3459886 Remove (object) from class declarations. (#6658) 2020-01-02 17:42:13 -08:00
Yunzhi Zhang 8a0a30b5f0 [Dashboard] display actor status and infeasible tasks (#6652)
* expose actor status and protobuf message of infeasible tasks

* move infeasible tasks into actor tree

* add pytest for displaying infeasible tasks info

* fix base64 decoding

* fix race condition after #6629 merged
2020-01-02 14:27:59 -08:00
Zhijun Fu 91a98d2295 [rpc] refactor GRPC client (#6637)
* refactor RPC client

* remove unused code

* format

* fix

* resolve comments

* format

* update

* fix

* fix python pb build failure

* lint
2019-12-31 22:28:25 -08:00
Yunzhi Zhang 65acb54553 [Dashboard] Logical view backend for dashboard (#6590) 2019-12-30 13:08:08 -08:00
Robert Nishihara 8724e5ffd5 Start WebUI by default. (#6493) 2019-12-27 13:49:07 -08:00
Yunzhi Zhang bac6f3b61e [Dashboard] Collecting worker stats in node manager and implement webui display in the backend (#6574) 2019-12-22 17:50:23 -08:00
Eric Liang 53641f1f74 Move more unit tests to bazel (#6250)
* move more unit tests to bazel

* move to avoid conflict

* fix lint

* fix deps

* separate

* fix failing tests

* show tests

* ignore mismatch

* try combining bazel runs

* build lint

* remove tests from install

* fix test utils

* better config

* split up

* exclusive

* fix verbosity

* fix tests class

* cleanup

* remove flaky

* fix metrics test

* Update .travis.yml

* no retry flaky

* split up actor

* split basic test

* split up trial runner test

* split stress

* fix basic test

* fix tests

* switch to pytest runner for main

* make microbench not fail

* move load code to py3

* test is no longer package

* bazel to end
2019-11-24 11:43:34 -08:00
Philipp Moritz f24d96ec4f Revert "Try to enable dashboard (again) (#6069)" (#6159)
This reverts commit 4044af8520.
2019-11-13 12:32:12 -08:00
Eric Liang 4044af8520 Try to enable dashboard (again) (#6069)
* Revert "Revert "Enable the Ray dashboard by default (#5976)" (#6068)"

This reverts commit 1a3e97cf23.

* fix tests that assume the dashboard isn't a job

* travis
2019-11-08 10:48:48 -08:00
Eric Liang eef4ad3bba Report census view data as part of raylet node stats (#6060) 2019-11-01 14:26:09 -07:00