7243 Commits

Author SHA1 Message Date
Kai Yang e0b81796c5 Revert "Revert "[Java] fix test hang occasionally when running FailureTest (#13934)" (#13992)" (#14008) 2021-02-09 12:43:26 -08:00
Simon Mo f51c26bae6 Revert "[Core]Fix ray.kill doesn't cancel pending actor bug (#13254)" (#14013)
This reverts commit 2092b097ea.
2021-02-09 11:36:38 -08:00
Alex Wu 1dcdfe9101 [autoscaler/dashboard] Publish resource usage in units of bytes (#14002) 2021-02-09 10:27:26 -08:00
Crissman Loomis 43083b9653 [docs] optuna variable typo (#14006)
* fix variable name typo

* align
2021-02-09 09:51:29 -08:00
Kai Fricke 3c8b164882 [tune] pass trainable function name when using tune.with_parameters (#14009) 2021-02-09 08:51:14 -08:00
Sven Mika d7301a51f4 [RLlib]: Trajectory View API: Keep env infos (e.g. for postprocessing callbacks), no matter what. (#13555) 2021-02-09 17:05:26 +01:00
fangfengbin 2092b097ea [Core]Fix ray.kill doesn't cancel pending actor bug (#13254) 2021-02-09 10:59:14 +08:00
Simon Mo 914696ac3f Skip placement tests on Windows (#14000) 2021-02-08 18:27:11 -08:00
Dmitri Gekhtman 081f3e5f07 [autoscaler][kubernetes] Ray client setup, example config simplification, example scripts. (#13920) 2021-02-08 20:00:34 -06:00
Ameer Haj Ali 1643bc5c4f Fix autoscaler wrong parameter names (#13966)
* prepare for head node

* move command runner interface outside _private

* remove space

* Eric

* flake

* min_workers in multi node type

* fixing edge cases

* eric not idle

* fix target_workers to consider min_workers of node types

* idle timeout

* minor

* minor fix

* test

* lint

* eric v2

* eric 3

* min_workers constraint before bin packing

* Update resource_demand_scheduler.py

* Revert "Update resource_demand_scheduler.py"

This reverts commit 818a63a2c86d8437b3ef21c5035d701c1d1127b5.

* reducing diff

* make get_nodes_to_launch return a dict

* merge

* weird merge fix

* auto fill instance types for AWS

* Alex/Eric

* Update doc/source/cluster/autoscaling.rst

* merge autofill and input from user

* logger.exception

* make the yaml use the default autofill

* docs Eric

* remove test_autoscaler_yaml from windows tests

* let's try changing the test a bit

* return test

* let's see

* edward

* Limit max launch concurrency

* commenting frac TODO

* move to resource demand scheduler

* use STATUS UP TO DATE

* Eric

* make logger of gc freed refs debug instead of info

* add cluster name to docker mount prefix directory

* grrR

* fix tests

* moving docker directory to sdk

* move the import to prevent circular dependency

* small fix

* ian

* fix max launch concurrency bug to assume failing nodes as pending and consider only load_metric's connected nodes as running

* small fix

* improve code readability

* lint

Co-authored-by: Ameer Haj Ali <ameerhajali@ameers-mbp.lan>
Co-authored-by: Alex Wu <alex@anyscale.io>
Co-authored-by: Alex Wu <itswu.alex@gmail.com>
Co-authored-by: Eric Liang <ekhliang@gmail.com>
Co-authored-by: Ameer Haj Ali <ameerhajali@Ameers-MacBook-Pro.local>
2021-02-08 13:19:33 -08:00
SongGuyang 09242e6d31 random a job id in c++ worker (#13982) 2021-02-08 12:57:25 -08:00
Simon Mo ec94214957 Revert "[Java] fix test hang occasionally when running FailureTest (#13934)" (#13992)
This reverts commit bcf9457abb.
2021-02-08 11:30:30 -08:00
SangBin Cho 0e07b5fa89 [Doc] Update actor resource information (#13909)
* in progress.

* Revert "in progress."

This reverts commit 21a91a47522797210bdc5db9477bd0b02ed9d926.

* done.

* done.
2021-02-08 10:23:57 -08:00
Sven Mika eb0038612f [RLlib] Extend on_learn_on_batch callback to allow for custom metrics to be added. (#13584) 2021-02-08 15:02:19 +01:00
Chace Ashcraft ebeee1d59a [RLlib] Pytorch MAML fix for more than two workers with discrete actions (#13835) 2021-02-08 12:06:02 +01:00
Sven Mika d001af3e59 [RLlib] Allow rllib rollout to run distributed via evaluation workers. (#13718) 2021-02-08 12:05:16 +01:00
Kai Yang bcf9457abb [Java] fix test hang occasionally when running FailureTest (#13934) 2021-02-08 18:21:50 +08:00
Xianyang Liu 918ad84f08 [core] Java worker should respect the user provided node_ip_address (#13732) 2021-02-08 11:59:06 +08:00
Richard Liaw 7231b6b91c [core/client] enable more tests (#13961) 2021-02-07 19:37:52 -08:00
Richard Liaw 3a230fa1a4 [ray_client] close ray connection upon client deactivation (#13919) 2021-02-07 13:11:38 -08:00
Kai Yang 4b4941435d [Java] fix actor restart failure when multi-worker is turned on (#13793) 2021-02-07 21:12:54 +08:00
Devin Petersohn 1412f3c546 [docs] page for using Modin with Ray (#13937)
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2021-02-06 00:28:04 -08:00
Clark Zinzow f070b3c9a9 [dask-on-ray] Fix Dask-on-Ray test: Python 3 dictionary .values() is a view, and is not indexable (#13945) 2021-02-05 21:21:41 -08:00
Simon Mo ea4154df80 [Hotfix] Master compilation error on MacOS. (#13946) 2021-02-05 16:07:45 -08:00
Travis Addair cbd3598970 [tune] Fixed wait_for_gpu to handle str representations of ordinal IDs (#13936)
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2021-02-05 15:41:24 -08:00
Hao Chen e1a5e5bad4 Fix test_actor_restart (#13901) 2021-02-05 14:08:43 -08:00
Simon Mo 4a3dd6858d Buildkite determine-to-run support (#13866) 2021-02-05 12:58:07 -08:00
Amog Kamsetty f44f368eae [Tune] Add try-except to FailureInjectorCallback (#13939) 2021-02-05 11:02:42 -08:00
Eric Liang f782ed59a0 Ray client version check strict eq (#13926) 2021-02-05 00:06:10 -08:00
fyrestone eee624cf5f Revert "Fix passing env on windows (#13253)" (#13828) 2021-02-05 13:03:16 +08:00
fangfengbin 8a5999c12a [GCS]Fix bug that gcs client does not set last_resource_usage_ (#13856) 2021-02-05 11:51:25 +08:00
DK.Pino fb89f9c2c8 [Placement Group] Support named placement group (#13755) 2021-02-05 11:04:51 +08:00
Dmitri Gekhtman 40bad86c7a [hotfix][test][windows] Exclude k8s operator mock test from build. (#13924) 2021-02-04 18:35:10 -08:00
Kathryn Zhou 982c606b86 Add more user-friendly error message upon async def remote task (#13915) 2021-02-04 18:33:33 -08:00
architkulkarni e89bbcbd44 [Serve] Revert "Revert "[Serve] Fix ServeHandle serialization"" and disable failing Windows test (#13771) 2021-02-04 14:50:01 -08:00
Edward Oakes 7af0c999f3 [serve] Built-in support for imported backends (#13867) 2021-02-04 15:09:12 -06:00
Dmitri Gekhtman db59736b1a [autoscaler][kubernetes] Add ability to not copy cluster config to head node when calling create_or_update_head_node. (#13720)
* Add option to skip bootstrapping head node autoscaling config

* don't close remote config before copying

* Type

* Type hints etc.

* test

* Test CR to config conversion

* comment
2021-02-04 10:30:03 -08:00
Kai Fricke 1e113d2e6e [tune/xgboost] Update release test docs (#13880)
* Update release test docs

* Update
2021-02-04 13:10:56 +01:00
Richard Liaw 6c77aeb98a [docs] ray slack remove banners (#13898)
Signed-off-by: Richard Liaw <rliaw@berkeley.edu>
2021-02-04 01:14:34 -08:00
Richard Liaw 0fc81e2393 [tune] fix gpu check (#13825)
Co-authored-by: Amog Kamsetty <amogkamsetty@yahoo.com>
2021-02-04 01:13:58 -08:00
Eric Liang e79a380a7e Check in shuffle code as experimental (#13899) 2021-02-04 00:24:16 -08:00
Clark Zinzow 243f678ffd Fall back to random port instead of default port for non-primary Redis shards; attempt to cluster Redis shard ports close to each other. (#13847) 2021-02-03 22:00:15 -08:00
Alex Wu a13208f113 Scalability envelope readme typo (#13874) 2021-02-03 21:43:45 -08:00
Tao Wang 44aa9c173f Rename timeout to period with heartbeat interval (#13872) 2021-02-04 10:37:28 +08:00
Tao Wang e0d9c8f0a8 Always replace DEL with UNLINK (#13832) 2021-02-04 10:30:00 +08:00
Dmitri Gekhtman 1187d1dd3e [autoscaler][kubernetes][operator] Rudimentary error handling, make "MODIFIED" -> update event work. (#13756) 2021-02-03 20:07:11 -06:00
Eric Liang e8fce9f1f3 Check Ray client protocol version (#13886)
* wip

* wip

* fix tests
2021-02-03 16:44:09 -08:00
Clark Zinzow 407302f93a [Core] Ownership-based Object Directory - Changed infinite short-poll location subscription to long-poll. (#13841) 2021-02-03 14:16:42 -08:00
SangBin Cho cb9fa90203 [Object Spilling] Add consumed bytes to detect thrashing. (#13853) 2021-02-03 14:16:26 -08:00
Barak Michener 77ee2c569f [ray_client] convert things registered for ray into ray_client (#13639) 2021-02-03 13:30:05 -08:00