Commit Graph

3262 Commits

Author SHA1 Message Date
Ameer Haj Ali 8d599bb3f5 [autoscaler] Move fill out resources to bootstrap config to cache the resources and avoid expensive boto3 calls (#12028) 2020-11-16 13:28:57 -08:00
fyrestone 0c6bb745cd Fix dashboard agent use incorrect ip (#12038) 2020-11-16 14:02:20 -06:00
SangBin Cho f56d7c1a76 [Logging] Remove per worker job log file / support worker log rotation (#11927)
* In progress.

* MVP done.

* In Progress.

* Remove unnecessary code.

* Fix some issues.

* Fix test failures.

* Addressed code review + fix object spilling test failure.
2020-11-16 11:29:43 -08:00
Kai Fricke 8609e2dd90 [tune] refactor verbosity levels (#11767)
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2020-11-16 10:32:53 -08:00
Keqiu Hu a50128079d [tune/placement group] dist. training placement group support (#11934)
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2020-11-16 01:11:39 -08:00
fangfengbin 8fb926565c [Placement Group] Placement Group supports GCS failover (Part 1) (#11933) 2020-11-16 14:42:56 +08:00
dHannasch d35de2272d [Core] Allow redis.ResponseError instead of redis.AuthenticationError (#12024)
* redis.ResponseError

* there really is no way to make this look good, is there
2020-11-15 15:04:56 -08:00
Simon Mo ac9610b19d [Autoscaler] Precisely match docker HOME (#12020)
* [Autoscaler] Precisely match docker HOME

The current grep will match any env variable keyed by HOME. This will
include some unwanted variables like PYTHONHOME, PROJECT_HOME, etc.
Depending on the order of the environment variable, the subsequent
docker setup command might fail.

* fstring
2020-11-15 11:49:50 -08:00
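The failure mode described in #12020 can be illustrated with a minimal Python sketch. It assumes `env`-style `KEY=VALUE` output, and the sample variables are hypothetical. Anchoring on `HOME=` is one way to match the key exactly and is not necessarily the exact change made in the commit.

```python
from typing import List

# Hypothetical `env` output from inside a container; names and values are illustrative only.
ENV_OUTPUT = """PYTHONHOME=/opt/python
HOME=/root
PROJECT_HOME=/workspace
PATH=/usr/bin"""


def loose_match(env_output: str) -> List[str]:
    # Mirrors an unanchored `grep HOME`: any line containing "HOME" matches.
    return [line for line in env_output.splitlines() if "HOME" in line]


def precise_match(env_output: str) -> List[str]:
    # Accept only a line whose key is exactly HOME.
    return [line for line in env_output.splitlines() if line.startswith("HOME=")]


print(loose_match(ENV_OUTPUT))    # ['PYTHONHOME=/opt/python', 'HOME=/root', 'PROJECT_HOME=/workspace']
print(precise_match(ENV_OUTPUT))  # ['HOME=/root']
```

With the loose match, taking the first hit yields `/opt/python` rather than `/root` in this sample. This is the order dependence the commit message describes.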
Richard Liaw 8b3f79f307 [tune] refactor and add examples (#11931) 2020-11-14 20:43:28 -08:00
dHannasch 5891759a3e Clarify get_node_ip_address docstring (#11881) 2020-11-14 15:20:58 -08:00
dHannasch 9fbeefd604 Distinguish a bad --redis-password from any other Redis error (#11893) 2020-11-13 17:39:44 -06:00
Simon Mo 277558895d [Serve] Introduce Long Polling (#11905) 2020-11-13 13:17:20 -08:00
Eric Liang 00ef1179c0 [object spilling] Autocreate dir if not exists (#11999) 2020-11-13 12:13:06 -08:00
Ian Rodney f936ea35fe [hotfix] Fix ResourceDemandScheduler (#11996)
* [hotfix] Fix ResourceDemandScheduler

* fix test_autoscaler
2020-11-13 00:42:16 -08:00
Ian Rodney 3b56a1a522 [docker] auto-populate shared memory size (#11953) 2020-11-12 17:22:42 -08:00
Barak Michener 272edcca94 [ray_client]: Implement function calls (#11922) 2020-11-12 16:49:34 -08:00
Eric Liang a6a8e777f3 [autoscaler] Interpret autoscaling_speed as 1/x-1 of previous target util fraction (#11961)
* tweak

* update
2020-11-12 16:23:50 -08:00
Ian Rodney 9254de0b02 [autoscaler] Fix custom node resources on head (#11896) 2020-11-12 10:30:04 -08:00
Gekho457 ad639f12d8 [autoscaler/k8s] Preliminary k8s operator (#11929) 2020-11-12 11:58:02 -06:00
Kai Fricke 02c02369ca [tune] Fix hpo randint limits (#11946)
Co-authored-by: Sumanth Ratna <sumanthratna@gmail.com>
2020-11-12 08:45:49 -08:00
Kristian Hartikainen 07f401d99d [tune] Fix unflatten dict (#11948) 2020-11-12 08:43:15 -08:00
Lee moon soo 9920933e31 [docker] Support non-root container (#11407) 2020-11-12 08:41:50 -08:00
SangBin Cho f80d812799 [Object Spilling] Introduce SpillWorker & RestoreWorker Pool to avoid IO worker deadlock. (#11885) 2020-11-11 18:20:14 -08:00
Edward Oakes 73a1cb702b Split _get_node_provider_cls off from _get_node_provider (#11949) 2020-11-11 16:10:46 -06:00
Ameer Haj Ali 85197deece [autoscaler] Remove legacy autoscaler (#11802) 2020-11-11 13:36:48 -08:00
dHannasch 396ae0b7c2 Add docstring for find_redis_address (#11884) 2020-11-11 12:24:36 -06:00
Siyuan (Ryans) Zhuang b8dda0e3d0 [Serialization] Fix buffer alignment issues (#11888)
* fix buffer alignment issues

* remove unused fields

* aligned memory allocation

* windows compat

* license. fix compiler warnings

* fix compilation error

* reinterpret_cast
2020-11-10 23:44:16 -08:00
Alex Wu 8afd2acdc1 [Autoscaler] simulator placement groups (#11777) 2020-11-10 18:10:36 -08:00
Eric Liang 46f3652102 Remove repeat push timeout from object manager (#11874) 2020-11-10 16:26:53 -08:00
Keqiu Hu 0c1bdaef59 [tune] TensorFlow Distributed Trainable (#11876)
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2020-11-10 14:59:08 -08:00
Richard Liaw 50dbf1a307 [core] Support configurable number of "check for redis" attempts (#11902)
Signed-off-by: Richard Liaw <rliaw@berkeley.edu>
2020-11-10 14:57:57 -08:00
Ian Rodney 1d158dda32 [serve] Rename to use replicas, not workers (#11822) 2020-11-10 11:36:15 -08:00
Eric Liang 9b8218aabd [docs] Move all /latest links to /master (#11897)
* use master link

* rename

* revert non-ray

* more

* more
2020-11-10 10:53:28 -08:00
Nikita Vemuri aba9288615 [Autoscaler] Introduce callback system (#11674)
Co-authored-by: Nikita Vemuri <nikitavemuri@Nikitas-MacBook-Pro.local>
Co-authored-by: Xiayue Charles Lin <xcl@anyscale.com>
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2020-11-09 20:03:15 -08:00
Eric Liang ee2da0cf45 [Core] PushManager for reliable broadcast (#11869) 2020-11-09 18:01:47 -08:00
Benjamin Black 1999266bba Updated pettingzoo env to accommodate API changes and fixes (#11873)
* Updated pettingzoo env to accommodate API changes and fixes

* fixed test failure

* fixed linting issue

* fixed test failure
2020-11-09 16:09:49 -08:00
Eric Liang a9cf0141a0 [autoscaler] Fix semantics of request_resources (#11820) 2020-11-09 14:57:40 -08:00
Edward Oakes 1c132f2ff8 [serve] Improve DEBUG logging for understanding perf (#11838) 2020-11-09 14:10:42 -06:00
architkulkarni adcaabcd64 [Serve] Reconfigure backend class at runtime (#11709) 2020-11-09 14:04:51 -06:00
Kai Fricke 287aba6dc3 [tune] schedulers: Add test for context finalization (#11889) 2020-11-09 11:37:05 -08:00
Richard Liaw a09e49ee94 [core] Add retry for reading session name (#11844) 2020-11-09 11:22:50 -08:00
Kai Fricke 88be1ea20b [tune] Handle infinite and NaN values (#11835) 2020-11-09 11:18:31 -08:00
Eric Liang 0932320eb3 Move test_joblib back to new_scheduler_broken category (#11872) 2020-11-07 20:08:41 -08:00
Stephanie Wang 61e41257e7 [Object spilling] Queue failed object creation requests until objects have been spilled (#11796)
* Queue creation requests

* Cleanup disconnected clients

* Remove unused

* todo

* FIFO order for create requests, remove warmup for IO workers

* test and lint

* disable test

* lint

* Skip on windows
2020-11-06 18:22:19 -05:00
Amog Kamsetty 900a48c19c [Tune] Better warnings/exceptions for fail_fast='raise' (#11842) 2020-11-06 15:01:55 -08:00
Aaron Miller 045fed5cd2 [examples] comment out rsync_ settings for K8S (#11862) 2020-11-06 14:35:21 -08:00
Simon Mo 871cde989a Re-Revert: [Serialization] Update CloudPickle to 1.6.0 (#9694) (#11837) 2020-11-06 12:24:36 -08:00
Kishan Sagathiya c5e6c90e1e [Core] Add name of actor in the result of ray.actors() (#11828)
Added name field to `actor_info`

Fixes #11112
2020-11-06 10:45:44 -08:00
Philipp Moritz 28e7439cf0 [doc] Add documentation for Ray debugger (#11815)
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2020-11-05 16:25:27 -08:00
Barak Michener 27c810a97e Basic protos for ray client (#11762) 2020-11-05 16:23:54 -08:00