Commit Graph

3184 Commits

Author SHA1 Message Date
Ameer Haj Ali 7aade469d0 [autoscaler] fix the autoscaling bug for continuously launching failed nodes (#11714) 2020-10-30 14:12:06 -07:00
Gekho457 8816d34541 Kubernetes rsync verbosity fixed (#11716) 2020-10-30 14:03:42 -07:00
Alan Guo 3c109b45aa Disable validation of cluster config on the cluster to allow for cluster configs with new properties. (#11693) 2020-10-30 14:02:00 -07:00
Eric Liang f9f372c327 [autoscaler] Clean up monitoring loop code (#11677) 2020-10-30 13:48:43 -07:00
SangBin Cho 6e2a1eac36 [Placement Group] Placement group automatic cleanup. (#11546)
* In progress. Done with all placement group manager code.

* It is working with job.

* Finished detached actor implementation.

* Fix minor issue.

* In progress.

* Addressed code review.

* Addressed code review.

* Addressed code review.

* Fix a build error.
2020-10-30 10:55:43 -07:00
architkulkarni 4175569d96 [Core] Add option to override environment variables for tasks and actors (#11619) 2020-10-29 14:22:44 -05:00
Simon Mo e82ff08b0c Fix asyncio plasma integration in cluster mode (#11665) 2020-10-29 11:53:10 -07:00
Simon Mo 46afec5660 Mute asyncio warning for Serve (#11682) 2020-10-28 17:05:42 -07:00
Kai Fricke ba63ded311 [tune] better error when metric or mode unset in search algorithms (#11646) 2020-10-28 13:17:59 -07:00
Richard Liaw 58891551d3 [tune] make tests faster + fix flaky test (#10264) 2020-10-28 13:14:54 -07:00
Gekho457 9e63f7ccc3 [autoscaler/k8s] ray up 409 error fix (#11660) 2020-10-28 14:19:57 -05:00
Tao Wang 1d5694ddea [GCS]Use direct getting instead of pub-sub to update load metrics in monitor.py (#11339) 2020-10-28 11:23:18 -07:00
Eric Liang c933477915 [new scheduler] Pass test_basic and add CI builds with flag on (#11635) 2020-10-28 11:02:43 -07:00
Richard Liaw 70ea1fbe30 [sgd] pin ptl to 1.0.3 (#11664)
Signed-off-by: Richard Liaw <rliaw@berkeley.edu>
2020-10-28 00:29:01 -07:00
fyrestone 05ad4c7499 [Dashboard] Optimize dashboard datacenter (#11391)
* Optimize dashboard datacenter

* Fix tests

* Fix tests

* Fix

* Fix CI

* python/build-wheel-macos.sh

Co-authored-by: 刘宝 <po.lb@antfin.com>
Co-authored-by: Max Fitton <maxfitton@anyscale.com>
2020-10-27 23:49:31 -07:00
yncxcw c3e246818a [Core] Fix doc string for ray.init() (#11657) 2020-10-27 18:27:22 -07:00
Ameer Haj Ali 1c40950877 [autoscaler] Add the cluster_name to docker file mounts directory prefix to make it more unique (#11600) 2020-10-27 15:33:11 -07:00
Scott Graham c4ae94d60b [autoscaler] Azure deployment fixes (#11613)
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2020-10-27 15:27:18 -07:00
Richard Liaw 293483ed0b [k8s][minor] fix error handling (#11653)
Signed-off-by: Richard Liaw <rliaw@berkeley.edu>
2020-10-27 15:24:07 -07:00
Ian Rodney 3ce852d345 [docker] Synchronize Torch for Tune & RLlib (#11637) 2020-10-27 18:37:25 +01:00
Sven Mika d9f1874e34 [RLlib] Minor fixes (torch GPU bugs + some cleanup). (#11609) 2020-10-27 10:00:24 +01:00
Jack Parker-Holder e7aafd7d24 [tune] PB2 (#11466)
Co-authored-by: Sumanth Ratna <sumanthratna@gmail.com>
Co-authored-by: Amog Kamsetty <amogkamsetty@yahoo.com>
Co-authored-by: Amog Kamsetty <amogkam@users.noreply.github.com>
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2020-10-27 01:03:21 -07:00
Edward Oakes 349c3ec86b Remove errant "self" argument to NodeProvider static method 2020-10-26 22:22:41 -07:00
Simon Mo fe4a78b7c7 [Hotfix] Pin Pydantic Version (#11622) 2020-10-26 16:52:19 -07:00
Kai Fricke 1a1ff28d18 [tune] allow tune search spaces to be passed to search algorithms (#11503) 2020-10-26 12:33:13 -07:00
Richard Liaw 4ad8af9b0d [tune] More PTL example cleanup (#11585) 2020-10-26 12:26:14 -07:00
Sumanth Ratna 11f1bbf03c [tune] use isinstance instead of type for TBXLogger (#11595) 2020-10-25 16:12:44 -07:00
Richard Liaw 1b357533b1 [tune] Try to enable PTL, SKlearn tests (#11542) 2020-10-24 01:08:46 -07:00
Siyuan (Ryans) Zhuang 5ad5cb61ca Remove outdated numpy serializer (#11587) 2020-10-23 22:58:05 -07:00
Raoul Khouri c3c72db69b [tune] fixed validation for search metrics (#11583)
* fixed validation for search metrics

* formatting

* made error report better

* if only one metric is missing extract it from list

* any can take a generator
2020-10-23 17:04:21 -07:00
Clark Zinzow 0979589c7c [dask-on-ray] Convert tuple of object refs to list before ray.get() call. (#11582) 2020-10-23 16:39:22 -07:00
Ian Rodney d3405e74da [autoscaler] SDK fixes (#11517)
* [autoscaler] SDK Fixes

* add docs

* remove all_nodes
2020-10-23 14:09:47 -07:00
Ian Rodney aef96d17bf [yaml] HotFix for correct example full (#11584) 2020-10-23 15:55:07 -05:00
Max Fitton caf3b04b27 [Dashboard] Turn on new dashboard by default pt 2 (#11510) 2020-10-23 15:52:14 -05:00
Kai Fricke 8ee4f7eca3 [tune] fix pbt ptl example (#11573)
* [tune] fix pbt ptl example

* wider smoke test
2020-10-23 12:42:13 -07:00
architkulkarni 1ce0c4965b [Serve] Update front page of serve doc (#11421) 2020-10-23 12:01:04 -07:00
DK.Pino 9f804ade5f [Placement Group]Add get all placement group api (#11460)
* add get all interface for placement group

* add get all interface for placement group

* make it work

* fix lint

* fix lint

* fix comment

* add cpp test

* fix python lint
2020-10-23 11:46:48 -07:00
Richard Liaw e7aa6441b7 [tune] a tiny ptl example (#11497) 2020-10-22 18:50:34 -07:00
Gekho457 2d1f52c21c [autoscaler] Removed .cleanup() from NodeProvider and commands.py (#11543) 2020-10-22 14:46:49 -07:00
Frank Gu 73fa94731f [tune] Add HDFS as Cloud Sync Client (#11524) 2020-10-22 14:12:51 -07:00
Eric Liang 083737c63c Deprecate rsync to all nodes (#11563) 2020-10-22 13:45:42 -07:00
Allen cf2ee94e0c [Autoscaler] Allow users to set the names for security groups created by ray (#11405) 2020-10-22 12:28:59 -07:00
Simon Mo 7111a424af [Serve] Add regression test for #11437 (#11539) 2020-10-22 10:45:18 -07:00
Alex Wu d1182b827a [Autoscaler] Do not count unmanaged nodes in load metrics (#11458)
* fixed

* lint

* fixed other test case

* .

Co-authored-by: Alex Wu <alex@anyscale.com>
2020-10-21 22:14:21 -07:00
Max Fitton 44fb60b4dd [hotfix] Pin node version (fix linux wheel build) (#11532)
Co-authored-by: Max Fitton <max@semprehealth.com>
2020-10-21 19:10:09 -07:00
Gekho457 155687e0c3 [autoscaler/AWS] Updated AWS Node Provider threading logic (#11422) 2020-10-21 18:42:38 -07:00
Eric Liang 920e4b2ef8 Try to raise ulimit for file descriptors to max allowed; warn if ulimit is still too low (#11515) 2020-10-21 14:29:43 -07:00
Eric Liang e8c77e2847 Remove memory quota enforcement from actors (#11480)
* wip

* fix

* deprecate
2020-10-21 14:29:03 -07:00
Alan Guo 8c82369cad [autoscaler] Add rsync_exclude and rsync_filter options to cluster config (#11512) 2020-10-21 14:28:33 -07:00
Edward Oakes 5d7f271e7d Add --worker-port-list option to ray start (#11481) 2020-10-21 14:46:45 -05:00