63 Commits

Author SHA1 Message Date
Simon Mo dac8b3d58a [CI] Enable Dashboard tests for master (#13425) 2021-01-15 09:43:34 -08:00
Simon Mo 321bbe1ffb [Dashboard] Fix GPU resource rendering issue (#13388) 2021-01-14 12:23:21 -08:00
fyrestone 4853aa96cb [Dashboard] Fix missing actor pid (#13229) 2021-01-13 16:45:12 +08:00
fyrestone a6d135a072 [Dashboard] Add GET /log_proxy API (#13165) 2021-01-08 11:45:07 +08:00
SangBin Cho 32dc5676b4 [Metrics] Record per node and raylet cpu / mem usage (#12982)
* Record per node and raylet cpu / mem usage

* Add comments.

* Addressed code review.
2021-01-05 21:57:21 -08:00
Edward Oakes ef6d859e9b [dashboard] Fix RAY_RAYLET_PID KeyError on Windows (#12948) 2020-12-31 10:54:40 -06:00
fyrestone 6a54897577 Job module without submission (#13081)
Co-authored-by: 刘宝 <po.lb@antfin.com>
2020-12-31 11:12:17 +08:00
Max Fitton 25f7bdc0d8 [Bugfix][Dashboard] Fix undefined logCount, errorCount UI crash (#13113) 2020-12-30 14:19:56 -06:00
Alex Wu 8df94e33e0 [Autoscaler] New output log format (#12772) 2020-12-23 12:02:55 -08:00
fyrestone 62a5832007 [Dashboard] Add GET /logical/actors API (#12913) 2020-12-23 11:14:23 +08:00
Eric Liang 03a5b90ed6 Revert "Revert "Increase the number of unique bits for actors to avoi… (#12990) 2020-12-21 15:16:42 -08:00
Eric Liang 64c97d25d3 Enable by default new scheduler (#12735) 2020-12-19 13:22:24 -08:00
Eric Liang 5d987f5988 Revert "Increase the number of unique bits for actors to avoid handle collisions (#12894)" (#12988)
This reverts commit 3e492a79ec.
2020-12-18 23:51:44 -08:00
Eric Liang 3e492a79ec Increase the number of unique bits for actors to avoid handle collisions (#12894) 2020-12-18 15:59:03 -08:00
Edward Oakes 261b2f9053 Check for raylet PID as ppid in dashboard agent fate-sharing (#12867) 2020-12-15 12:13:11 -06:00
Max Fitton d0813c1c58 [Dashboard] Add dashboard multi-node churn test (#11768) 2020-12-14 17:03:33 -06:00
Max Fitton ac24d1db30 [Dashboard][Bugfix] Fix GPU List Bug (#12666)
* Fix bug where None was passed as the empty value for ActorInfo.gpu_stats instead of an empty list

* lint

* dashboard/modules/logical_view

* fix test

* trigger build
2020-12-12 23:34:24 -08:00
Stephanie Wang a776209aec Revert "Fix dashboard agent check ppid is raylet pid (#12256)" (#12729)
This reverts commit 3ce9286977.
2020-12-09 17:20:38 -05:00
fyrestone 3ce9286977 Fix dashboard agent check ppid is raylet pid (#12256)
* Dashboard agent check ppid is raylet pid

* Improve implementation

* Refine code

* Make the RAY_NODE_PID environment required for dashboard agent

Co-authored-by: 刘宝 <po.lb@antfin.com>
2020-12-09 09:12:34 -05:00
Sumanth Ratna b7404e7955 [dashboard] Resolve npm vulnerabilities (#12620)
* npm audit fix

* npm dedupe
2020-12-08 10:26:49 -08:00
SangBin Cho 162f361dab [Logging] Fix log monitor issue (#12588)
* Try fixing issues.

* Verficiation.
2020-12-07 22:01:18 -08:00
Max Fitton cc2f43c826 [Dashboard][Bugfix] Fix bug in display of worker logs and errors in Dashboard (#12660)
* Fix bug with worker logs/errors not displaying in the dashboard

* Add error endpoint test.

* lint
2020-12-07 21:41:13 -08:00
Max Fitton 34b9c7449b [Dashboard] Fix object store memory display. (#12664) 2020-12-07 21:40:49 -08:00
Max Fitton a5c846c83b [Dashboard][Bugfix] Filter dead nodes from Machine View (fixes duplicate node issue) (#12579) 2020-12-02 14:08:14 -08:00
SangBin Cho 8223a33bff [Logging] Log rotation on all components (#12101)
* In Progress.

* Done.

* Fix the issue.

* Add wait for condition because logs are not written right away now.

* debug string.

* lint.

* Fix flaky test.

* Fix issues.

* Fix test.

* lint.
2020-11-30 19:03:55 -08:00
Max Fitton 2708b3abbc [Dashboard][Bug] Fix duplicate node total rows in dashboard (#12410)
* Fix duplicate node total rows in dashboard by changing the react key of the NodeTotalRow component from the node IP to the node ID (node IP can be duplicated in the case of docker).

* simplify a piece of test code and fix a flaky time out

* lint
2020-11-30 18:43:09 -08:00
SangBin Cho 753cda2f28 [Dashboard] Delete old dashboard (#12144)
* Delete old dashboard from repo.

* Delete old dashboard from repo. 2
2020-11-25 11:31:02 -08:00
Max Fitton 2e95552f0c [Dashboard] Defensive change to make sure we do not iterate over "None" in the case that workers is not present in node physical stats for a given node (#12358) 2020-11-25 11:06:45 -08:00
SangBin Cho 5fb410cfbf [Dashboard] New dashboard view data doesn't exist. (#12129)
* Fix.

* Fix the issue.
2020-11-19 11:04:59 -08:00
SangBin Cho 7d67af6c2a [Metrics] Add stats to measure process startup time + scheduling stats. (#12100)
* Add new stats.

* Fix issues.
2020-11-19 11:04:26 -08:00
fyrestone 0c6bb745cd Fix dashboard agent use incorrect ip (#12038) 2020-11-16 14:02:20 -06:00
Max Fitton f545418c3f [Dashboard] Fix dashboard regression caused by logCount and errCount being removed from worker payload (#11954) 2020-11-11 14:55:54 -08:00
Eric Liang 9b8218aabd [docs] Move all /latest links to /master (#11897)
* use master link

* remae

* revert non-ray

* more

* mre
2020-11-10 10:53:28 -08:00
Max Fitton 368b14a0da Stop dashboard from erroring when an actor does not have a corresponding core worker (#11870) 2020-11-09 11:36:34 -06:00
Max Fitton d352feadf0 [Dashboard] Memory Page Loading Wheel (#11651)
* Switch memory view loading message over to a loading wheel to make UX less confusing.

* lint

Co-authored-by: Max Fitton <max@semprehealth.com>
2020-11-03 09:37:30 -08:00
Max Fitton 3202ff74c2 [Dashboard] Don't show GPU columns if no GPU in cluster (#11704) 2020-11-02 18:07:27 -06:00
Max Fitton b4df42b027 [Dashboard] Make Infeasible Actor UX Less Scary (#11654)
* Update infeasible actor UI so that it only shows infeasible for an ActorClassGroup if at least one actor in the class is infeasible

* lint
2020-10-29 23:12:43 -07:00
Max Fitton d6628cdbfb [Dashboard] Fix null gpu utilization (#11650)
* update dashboard to work if GPU utilization field is missing from GPU payload

* lint

* lint
2020-10-29 23:11:50 -07:00
fyrestone 05ad4c7499 [Dashboard] Optimize dashboard datacenter (#11391)
* Optimize dashboard datacenter

* Fix tests

* Fix tests

* Fix

* Fix CI

* python/build-wheel-macos.sh

Co-authored-by: 刘宝 <po.lb@antfin.com>
Co-authored-by: Max Fitton <maxfitton@anyscale.com>
2020-10-27 23:49:31 -07:00
Max Fitton caf3b04b27 [Dashboard] Turn on new dashboard by default pt 2 (#11510) 2020-10-23 15:52:14 -05:00
Max Fitton cdca5af53b Revert "[Dashboard] Turn on New Dashboard by Default (#11321)" (#11502)
This reverts commit f500292d41.
2020-10-20 10:53:10 -05:00
Max Fitton 0a9cc9cce5 Revert "remove .fake build files (#11478)" (#11488)
This reverts commit 3ed3dea004.
2020-10-19 18:48:32 -07:00
Max Fitton 3ed3dea004 remove .fake build files (#11478)
Co-authored-by: Max Fitton <max@semprehealth.com>
2020-10-19 15:36:47 -07:00
Max Fitton f500292d41 [Dashboard] Turn on New Dashboard by Default (#11321) 2020-10-19 12:31:11 -05:00
Edward Oakes 798bd6a359 [dashboard] Add /api/cluster_status endpoint (#11456) 2020-10-19 11:00:47 -05:00
Max Fitton cd9dcfca0d [Dashboard] CPU/GPU usage details in actor pane (#11269) 2020-10-13 20:23:23 -05:00
fyrestone defd41aad7 [Dashboard] http route handler cache (#10921)
* Add aiohttp_cache to dashboard

* Add comments; Refine code

* Keep NODE_STATS_UPDATE_INTERVAL_SECONDS 1 second; Change AIOHTTP_CACHE_TTL_SECONDS to 2 seconds

* Update merge

Co-authored-by: 刘宝 <po.lb@antfin.com>
2020-10-09 22:27:05 -07:00
Max Fitton ff6d412ad9 [Dashboard] Add API support for the logical view and machine view in new backend (#11012)
* Add API support for the logical view and machine view, which lean on datacenter in common.

* Update dashboard/datacenter.py

Co-authored-by: fyrestone <fyrestone@outlook.com>

* Update dashboard/modules/logical_view/logical_view_head.py

Co-authored-by: fyrestone <fyrestone@outlook.com>

* Address PR comments

* lint

* Add dashboard tests to CI build

* Fix integration issues

* lint

Co-authored-by: Max Fitton <max@semprehealth.com>
Co-authored-by: fyrestone <fyrestone@outlook.com>
2020-10-02 17:58:44 -07:00
Max Fitton 5a42ed1848 [Dashboard] Add support for new backend to existing front-end (#11013)
* Trying to commit on top of old code again

* address comment

Co-authored-by: Max Fitton <max@semprehealth.com>
2020-10-02 12:46:47 -07:00
Max Fitton 6ed8459f25 [Dashboard] Add tune API to support tune tab in new backend (#11009)
* Add tune API to support tune tab in new backend

* Make requested changes

Co-authored-by: Max Fitton <max@semprehealth.com>
2020-10-02 11:48:48 -07:00