73 Commits

Author SHA1 Message Date
Alex Wu 02938f3a21 [hotfix] Disable dashboard agent windows (#14062) 2021-02-11 17:54:55 -08:00
Dominic Ming 4b60c388ef [Dashboard] fix new dashboard entrance and some table problem (#13790) 2021-01-30 10:42:16 +08:00
Dominic Ming 752da83bb7 [Dashboard] Add the new dashboard code and prompt users to try it (#11667) 2021-01-29 15:22:26 +08:00
Tao Wang 56ee6ef55f [GCS]only update states related fields when publish actor table data (#13448) 2021-01-28 11:12:57 +08:00
Clark Zinzow 2d34e95c93 Don't gather check_parent_task on Windows, since it's undefined. (#13700) 2021-01-27 09:19:58 -08:00
Amog Kamsetty d96a9fa192 Revert "Revert "[dashboard] Fix RAY_RAYLET_PID KeyError on Windows (#12948)" (#13572)" (#13685)
This reverts commit c4a710369b.
2021-01-25 10:35:25 -08:00
Ameer Haj Ali b7dd7ddb52 deprecate useless fields in the cluster yaml. (#13637)
* prepare for head node

* move command runner interface outside _private

* remove space

* Eric

* flake

* min_workers in multi node type

* fixing edge cases

* eric not idle

* fix target_workers to consider min_workers of node types

* idle timeout

* minor

* minor fix

* test

* lint

* eric v2

* eric 3

* min_workers constraint before bin packing

* Update resource_demand_scheduler.py

* Revert "Update resource_demand_scheduler.py"

This reverts commit 818a63a2c86d8437b3ef21c5035d701c1d1127b5.

* reducing diff

* make get_nodes_to_launch return a dict

* merge

* weird merge fix

* auto fill instance types for AWS

* Alex/Eric

* Update doc/source/cluster/autoscaling.rst

* merge autofill and input from user

* logger.exception

* make the yaml use the default autofill

* docs Eric

* remove test_autoscaler_yaml from windows tests

* lets try changing the test a bit

* return test

* lets see

* edward

* Limit max launch concurrency

* commenting frac TODO

* move to resource demand scheduler

* use STATUS UP TO DATE

* Eric

* make logger of gc freed refs debug instead of info

* add cluster name to docker mount prefix directory

* grrR

* fix tests

* moving docker directory to sdk

* move the import to prevent circular dependency

* small fix

* ian

* fix max launch concurrency bug to assume failing nodes as pending and consider only load_metric's connected nodes as running

* small fix

* deflake test_joblib

* lint

* placement groups bypass

* remove space

* Eric

* first commit

* lint

* example

* documentation

* hmm

* file path fix

* fix test

* some format issue in docs

* modified docs

* joblib strikes again on windows

* add ability to not start autoscaler/monitor

* a

* remove worker_default

* Remove default pod type from operator

* Remove worker_default_node_type from rewrite_legacy_yaml_to_availble_node_types

* deprecate useless fields

Co-authored-by: Ameer Haj Ali <ameerhajali@ameers-mbp.lan>
Co-authored-by: Alex Wu <alex@anyscale.io>
Co-authored-by: Alex Wu <itswu.alex@gmail.com>
Co-authored-by: Eric Liang <ekhliang@gmail.com>
Co-authored-by: Ameer Haj Ali <ameerhajali@Ameers-MacBook-Pro.local>
Co-authored-by: root <root@ip-172-31-56-188.us-west-2.compute.internal>
Co-authored-by: Dmitri Gekhtman <dmitri.m.gekhtman@gmail.com>
2021-01-23 12:06:51 -08:00
Amog Kamsetty c4a710369b Revert "[dashboard] Fix RAY_RAYLET_PID KeyError on Windows (#12948)" (#13572)
This reverts commit ef6d859e9b.
2021-01-22 14:10:24 -06:00
Tao Wang aa5d7a5e6c [Dashboard]Don't set node actors when node_id of actor is Nil (#13573)
* Don't set node actors when node_id of actor is Nil

* add test per comment
2021-01-21 20:18:34 -08:00
Xianyang Liu 4ecd29ea2b [dashboard] Fixes dashboard issues when environments have set http_proxy (#12598)
* fixes ray start with http_proxy

* format

* fixes

* fixes

* increase timeout

* address comments
2021-01-21 20:10:01 -08:00
Simon Mo dac8b3d58a [CI] Enable Dashboard tests for master (#13425) 2021-01-15 09:43:34 -08:00
Simon Mo 321bbe1ffb [Dashboard] Fix GPU resource rendering issue (#13388) 2021-01-14 12:23:21 -08:00
fyrestone 4853aa96cb [Dashboard] Fix missing actor pid (#13229) 2021-01-13 16:45:12 +08:00
fyrestone a6d135a072 [Dashboard] Add GET /log_proxy API (#13165) 2021-01-08 11:45:07 +08:00
SangBin Cho 32dc5676b4 [Metrics] Record per node and raylet cpu / mem usage (#12982)
* Record per node and raylet cpu / mem usage

* Add comments.

* Addressed code review.
2021-01-05 21:57:21 -08:00
Edward Oakes ef6d859e9b [dashboard] Fix RAY_RAYLET_PID KeyError on Windows (#12948) 2020-12-31 10:54:40 -06:00
fyrestone 6a54897577 Job module without submission (#13081)
Co-authored-by: 刘宝 <po.lb@antfin.com>
2020-12-31 11:12:17 +08:00
Max Fitton 25f7bdc0d8 [Bugfix][Dashboard] Fix undefined logCount, errorCount UI crash (#13113) 2020-12-30 14:19:56 -06:00
Alex Wu 8df94e33e0 [Autoscaler] New output log format (#12772) 2020-12-23 12:02:55 -08:00
fyrestone 62a5832007 [Dashboard] Add GET /logical/actors API (#12913) 2020-12-23 11:14:23 +08:00
Eric Liang 03a5b90ed6 Revert "Revert "Increase the number of unique bits for actors to avoi… (#12990) 2020-12-21 15:16:42 -08:00
Eric Liang 64c97d25d3 Enable by default new scheduler (#12735) 2020-12-19 13:22:24 -08:00
Eric Liang 5d987f5988 Revert "Increase the number of unique bits for actors to avoid handle collisions (#12894)" (#12988)
This reverts commit 3e492a79ec.
2020-12-18 23:51:44 -08:00
Eric Liang 3e492a79ec Increase the number of unique bits for actors to avoid handle collisions (#12894) 2020-12-18 15:59:03 -08:00
Edward Oakes 261b2f9053 Check for raylet PID as ppid in dashboard agent fate-sharing (#12867) 2020-12-15 12:13:11 -06:00
Max Fitton d0813c1c58 [Dashboard] Add dashboard multi-node churn test (#11768) 2020-12-14 17:03:33 -06:00
Max Fitton ac24d1db30 [Dashboard][Bugfix] Fix GPU List Bug (#12666)
* Fix bug where None was passed as the empty value for ActorInfo.gpu_stats instead of an empty list

* lint

* dashboard/modules/logical_view

* fix test

* trigger build
2020-12-12 23:34:24 -08:00
Stephanie Wang a776209aec Revert "Fix dashboard agent check ppid is raylet pid (#12256)" (#12729)
This reverts commit 3ce9286977.
2020-12-09 17:20:38 -05:00
fyrestone 3ce9286977 Fix dashboard agent check ppid is raylet pid (#12256)
* Dashboard agent check ppid is raylet pid

* Improve implementation

* Refine code

* Make the RAY_NODE_PID environment required for dashboard agent

Co-authored-by: 刘宝 <po.lb@antfin.com>
2020-12-09 09:12:34 -05:00
Sumanth Ratna b7404e7955 [dashboard] Resolve npm vulnerabilities (#12620)
* npm audit fix

* npm dedupe
2020-12-08 10:26:49 -08:00
SangBin Cho 162f361dab [Logging] Fix log monitor issue (#12588)
* Try fixing issues.

* Verification.
2020-12-07 22:01:18 -08:00
Max Fitton cc2f43c826 [Dashboard][Bugfix] Fix bug in display of worker logs and errors in Dashboard (#12660)
* Fix bug with worker logs/errors not displaying in the dashboard

* Add error endpoint test.

* lint
2020-12-07 21:41:13 -08:00
Max Fitton 34b9c7449b [Dashboard] Fix object store memory display. (#12664) 2020-12-07 21:40:49 -08:00
Max Fitton a5c846c83b [Dashboard][Bugfix] Filter dead nodes from Machine View (fixes duplicate node issue) (#12579) 2020-12-02 14:08:14 -08:00
SangBin Cho 8223a33bff [Logging] Log rotation on all components (#12101)
* In Progress.

* Done.

* Fix the issue.

* Add wait for condition because logs are not written right away now.

* debug string.

* lint.

* Fix flaky test.

* Fix issues.

* Fix test.

* lint.
2020-11-30 19:03:55 -08:00
Max Fitton 2708b3abbc [Dashboard][Bug] Fix duplicate node total rows in dashboard (#12410)
* Fix duplicate node total rows in dashboard by changing the react key of the NodeTotalRow component from the node IP to the node ID (node IP can be duplicated in the case of docker).

* simplify a piece of test code and fix a flaky time out

* lint
2020-11-30 18:43:09 -08:00
SangBin Cho 753cda2f28 [Dashboard] Delete old dashboard (#12144)
* Delete old dashboard from repo.

* Delete old dashboard from repo. 2
2020-11-25 11:31:02 -08:00
Max Fitton 2e95552f0c [Dashboard] Defensive change to make sure we do not iterate over "None" in the case that workers is not present in node physical stats for a given node (#12358) 2020-11-25 11:06:45 -08:00
SangBin Cho 5fb410cfbf [Dashboard] New dashboard view data doesn't exist. (#12129)
* Fix.

* Fix the issue.
2020-11-19 11:04:59 -08:00
SangBin Cho 7d67af6c2a [Metrics] Add stats to measure process startup time + scheduling stats. (#12100)
* Add new stats.

* Fix issues.
2020-11-19 11:04:26 -08:00
fyrestone 0c6bb745cd Fix dashboard agent use incorrect ip (#12038) 2020-11-16 14:02:20 -06:00
Max Fitton f545418c3f [Dashboard] Fix dashboard regression caused by logCount and errCount being removed from worker payload (#11954) 2020-11-11 14:55:54 -08:00
Eric Liang 9b8218aabd [docs] Move all /latest links to /master (#11897)
* use master link

* remae

* revert non-ray

* more

* mre
2020-11-10 10:53:28 -08:00
Max Fitton 368b14a0da Stop dashboard from erroring when an actor does not have a corresponding core worker (#11870) 2020-11-09 11:36:34 -06:00
Max Fitton d352feadf0 [Dashboard] Memory Page Loading Wheel (#11651)
* Switch memory view loading message over to a loading wheel to make UX less confusing.

* lint

Co-authored-by: Max Fitton <max@semprehealth.com>
2020-11-03 09:37:30 -08:00
Max Fitton 3202ff74c2 [Dashboard] Don't show GPU columns if no GPU in cluster (#11704) 2020-11-02 18:07:27 -06:00
Max Fitton b4df42b027 [Dashboard] Make Infeasible Actor UX Less Scary (#11654)
* Update infeasible actor UI so that it only shows infeasible for an ActorClassGroup if at least one actor in the class is infeasible

* lint
2020-10-29 23:12:43 -07:00
Max Fitton d6628cdbfb [Dashboard] Fix null gpu utilization (#11650)
* update dashboard to work if GPU utilization field is missing from GPU payload

* lint

* lint
2020-10-29 23:11:50 -07:00
fyrestone 05ad4c7499 [Dashboard] Optimize dashboard datacenter (#11391)
* Optimize dashboard datacenter

* Fix tests

* Fix tests

* Fix

* Fix CI

* python/build-wheel-macos.sh

Co-authored-by: 刘宝 <po.lb@antfin.com>
Co-authored-by: Max Fitton <maxfitton@anyscale.com>
2020-10-27 23:49:31 -07:00
Max Fitton caf3b04b27 [Dashboard] Turn on new dashboard by default pt 2 (#11510) 2020-10-23 15:52:14 -05:00