Commit Graph

3301 Commits

Author SHA1 Message Date
Kai Fricke 608d0378c4 [tune] Add test for infinite trials (#12156) 2020-11-21 12:54:01 -08:00
Eric Liang 839517743d Support ray.* in remote functions for Ray client (#12177) 2020-11-20 13:28:46 -08:00
Richard Liaw 48042be8bb [tune] Avoid dependency on Kubernetes (#12188)
* fix-kubernetes

Signed-off-by: Richard Liaw <rliaw@berkeley.edu>

* kub

Signed-off-by: Richard Liaw <rliaw@berkeley.edu>
2020-11-20 13:01:20 -08:00
Simon Mo d200f620ee Deflake test_router (#12175) 2020-11-19 18:37:46 -08:00
dHannasch 4b2c5daf45 State which IP addresses are failing to match. (#11957)
* State which IP addresses are failing to match.

* Use f-string.

* action item?

* I could swear this passed with length 80 before

* wait, this is how it wants f-strings

* reword

* action item

* f

Co-authored-by: SangBin Cho <rkooo567@gmail.com>

* f

Co-authored-by: SangBin Cho <rkooo567@gmail.com>

* f

Co-authored-by: SangBin Cho <rkooo567@gmail.com>

Co-authored-by: SangBin Cho <rkooo567@gmail.com>
2020-11-19 17:25:25 -08:00
Eric Liang e72abcd0aa Enable even more new scheduler tests (#12096) 2020-11-19 16:47:18 -08:00
Kai Fricke f1ace386db [tune] detect docker and kubernetes syncers (#12108)
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2020-11-19 12:17:17 -08:00
SangBin Cho 7d67af6c2a [Metrics] Add stats to measure process startup time + scheduling stats. (#12100)
* Add new stats.

* Fix issues.
2020-11-19 11:04:26 -08:00
Kai Fricke 6999075c75 [tune] Add seed parameter to BOHB (#12160) 2020-11-19 10:27:16 -08:00
Philipp Moritz ff82af1588 Clean up requirements.txt (#12136) 2020-11-19 09:27:09 -08:00
Xianyang Liu 9481ecd180 [data] MLDataset based on ParallelIterator (#11849) 2020-11-19 00:33:37 -08:00
Barak Michener 2fe1321c3f [ray_client] __getattr__ for the API Import interface (#12089)
* move all things that import real-ray into the server folder

* change the import line and have a __getattr__-able API stub

* formatting

* remove unused (duplicated) util file

* Remove module methods (but leave comment on why)
2020-11-18 22:42:02 -08:00
Ian Rodney a74f1885db Revert "[CLI] Fix ray commands when RAY_ADDRESS used (#11989)" (#12135)
* Revert "[CLI] Fix ray commands when RAY_ADDRESS used (#11989)"

This reverts commit d23d326560.

* only check environment for CLI commands

* use new fns

* fixing docs

* rename and return "auto"

* Update python/ray/_private/services.py

Co-authored-by: Eric Liang <ekhliang@gmail.com>

* Update services.py

* Update services.py

Co-authored-by: Eric Liang <ekhliang@gmail.com>
2020-11-18 22:41:10 -08:00
dHannasch 5bc4976550 More informative error message if ray start fails to connect to Redis (#11880)
* Chain original redis.ConnectionError. More importantly, print out the address so people don't have to dig out --logging-level debug to get the number wait_for_redis_to_start() already knows.

Check the Redis password.

* f
2020-11-18 19:28:10 -08:00
Richard Liaw 0d388c4d31 [autoscaler] remove unnecessary print output (#12131) 2020-11-18 18:33:48 -08:00
Richard Liaw 2bb6db5e64 [tune] temporary revert of verbosity changes (#12132) 2020-11-18 18:27:41 -08:00
Ameer Haj Ali 4717fcd9c0 [autoscaler] give max_workers precedence over min_workers in resource demand scheduler (#12106) 2020-11-18 16:24:48 -08:00
Ameer Haj Ali d826452e0b [autoscaler] fix max_workers bug in resource_demand_scheduler by counting the head node (#12123) 2020-11-18 15:24:38 -08:00
Ian Rodney e086ddc18f [core] Add Recursive task cancelation (#11923) 2020-11-18 15:18:40 -08:00
Alex Wu e9c9ba9c9f [New Scheduler] Don't start tasks if the owner is dead (#12050) 2020-11-18 11:34:19 -08:00
Ameer Haj Ali eef624750c [ray client] ray wait() implementation (#12072) 2020-11-18 11:33:57 -08:00
Kai Fricke 2b60c5774b [tune] cache checkpoint serialization (#12064) 2020-11-18 09:03:53 -08:00
Ian Rodney d23d326560 [CLI] Fix ray commands when RAY_ADDRESS used (#11989)
* [CLI] Fix ray commands when RAY_ADDRESS used

* Eric's suggestion
2020-11-17 23:44:59 -08:00
Philipp Moritz b96516e9d3 [core] Remove google dependency (#12085) 2020-11-17 19:01:00 -08:00
fangfengbin f400333841 [Placement Group]Placement Group supports gcs failover(Part2) (#12003)
* add testcase

* fix ut

* fix review comment

* fix review comment

* fix review comments

* fix ut bug

* add part code

* add part code

* add part code

* add testcase

* add part code

* fix ut bug

* fix ut timeout bug

* fix ut bug

Co-authored-by: 灵洵 <fengbin.ffb@antgroup.com>
2020-11-18 10:59:26 +08:00
Simon Mo c476037c97 [Core] Async API should raise on all RayError (#12043)
Before this PR we were raising only RayTaskError, which meant errors
like RayActorError (Actor Died) were not propagated and thrown at
`await object_ref`. This PR fixes that.
2020-11-17 17:20:30 -08:00
Stephanie Wang f6bdd5ab17 [New Scheduler] Spillback from the queue of tasks assigned to the local node (#12084) 2020-11-17 16:13:59 -08:00
Richard Liaw ca44222e03 [minor] log info instead of error upon ray.init rerun (#12025) 2020-11-17 12:59:24 -08:00
fangfengbin 7f050c706b [PlacementGroup]Skip flaky testcase (#12065)
Co-authored-by: 灵洵 <fengbin.ffb@antgroup.com>
2020-11-17 12:21:34 -08:00
Simon Mo d7c95a4a90 [Serve] Rewrite Router to be Embeddable (#12019) 2020-11-17 08:28:18 -08:00
Maksim Smolin 23926f3e6e [CLI] Docker Support (#11761)
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2020-11-17 00:04:39 -08:00
chaokunyang bea0031491 fix linux wheel build (#9896) 2020-11-17 15:49:42 +08:00
Amog Kamsetty f10cef93c7 [sgd] support operator.device (#12056) 2020-11-16 21:44:27 -08:00
Eric Liang 380df89069 Lazily initialize the global state accessor in Python workers (#12054)
* wip

* fix

* fix
2020-11-16 21:35:12 -08:00
Max Fitton 90574b66cc pin aiohttp to the 3.x.x version (#12051) 2020-11-16 21:54:16 -05:00
Richard Liaw 51d277f2e4 [tests] fix mock for test_cli (#12055)
Signed-off-by: Richard Liaw <rliaw@berkeley.edu>
2020-11-16 18:44:15 -08:00
Stephanie Wang c49554fb7a Abstract plasma store creation request queue (#12039) 2020-11-16 17:09:15 -08:00
Kai Fricke 9f5986ee58 [tune] logger migration to ExperimentLogger classes (#11984) 2020-11-16 15:08:37 -08:00
Alan Guo 3dc68533a9 make some private rsync, and exec_cluster arguments public (#11958)
* make some private rsync, and exec_cluster arguments public

* fix format issue

* undo make all_nodes public
2020-11-16 14:31:41 -08:00
Ameer Haj Ali 8d599bb3f5 [autoscaler] Move fill out resources to bootstrap config to cache the resources and avoid expensive boto3 calls (#12028) 2020-11-16 13:28:57 -08:00
fyrestone 0c6bb745cd Fix dashboard agent use incorrect ip (#12038) 2020-11-16 14:02:20 -06:00
SangBin Cho f56d7c1a76 [Logging] Remove per worker job log file / support worker log rotation (#11927)
* In progress.

* MVP done.

* In Progress.

* Remove unnecessary code.

* Fix some issues.

* Fix test failures.

* Addressed code review + fix object spilling test failure.
2020-11-16 11:29:43 -08:00
Kai Fricke 8609e2dd90 [tune] refactor verbosity levels (#11767)
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2020-11-16 10:32:53 -08:00
Keqiu Hu a50128079d [tune/placement group] dist. training placement group support (#11934)
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2020-11-16 01:11:39 -08:00
fangfengbin 8fb926565c [Placement Group]Placement Group supports gcs failover (Part1) (#11933) 2020-11-16 14:42:56 +08:00
dHannasch d35de2272d [Core] Allow redis.ResponseError instead of redis.AuthenticationError (#12024)
* redis.ResponseError

* there really is no way to make this look good, is there
2020-11-15 15:04:56 -08:00
Simon Mo ac9610b19d [Autoscaler] Precisely match docker HOME (#12020)
* [Autoscaler] Precisely match docker HOME

The current grep will match any env variable keyed by HOME. This will
include some unwanted variables like PYTHONHOME, PROJECT_HOME, etc.
Depending on the order of the environment variable, the subsequent
docker setup command might fail.

* fstring
2020-11-15 11:49:50 -08:00
Richard Liaw 8b3f79f307 [tune] refactor and add examples (#11931) 2020-11-14 20:43:28 -08:00
dHannasch 5891759a3e Clarify get_node_ip_address docstring (#11881) 2020-11-14 15:20:58 -08:00
dHannasch 9fbeefd604 Distinguish a bad --redis-password from any other Redis error (#11893) 2020-11-13 17:39:44 -06:00