3213 Commits

Author SHA1 Message Date
Barak Michener 27c810a97e Basic protos for ray client (#11762) 2020-11-05 16:23:54 -08:00
Eric Liang f86c4f992c Fix RAY_ENABLE_NEW_SCHEDULER=1 pytest test_advanced_2.py::test_zero_cpus_actor (#11817) 2020-11-05 16:02:04 -08:00
architkulkarni 347e871409 [Serve] Add dependency management (#11743) 2020-11-05 16:39:37 -06:00
Kai Yang ffc267f94b [Test] Ignore setproctitle for local mode (#11819) 2020-11-05 11:07:34 -08:00
Kai Fricke 603accf1c2 [tune] logger refactor part 3: Add ExperimentLogger class (#11749) 2020-11-05 08:55:38 -08:00
Richard Liaw f6717b8b03 [autoscaler] Support empty node list for kill node (#11810) 2020-11-04 22:40:07 -08:00
Richard Liaw efa07d5403 Revert "Revert "[tune] PB2 (#11466)" (#11795)" (#11812) 2020-11-04 20:47:12 -08:00
Eric Liang 69145d6215 [hotfix] Bazel candidates not found due to raising too early 2020-11-04 16:08:51 -08:00
Ian Rodney 22bbbc3171 [wheel] Fix Manylinux2014 Build (#11811) 2020-11-04 14:50:38 -08:00
Amog Kamsetty 92718de40c [SGD] Better support for custom DDP (#11771) 2020-11-04 13:58:51 -08:00
Ameer Haj Ali ebdf8ba3fa [autoscaler] Support legacy cluster configs with the new resource demand scheduler (#11751) 2020-11-04 12:05:48 -08:00
Kai Yang 31598338b3 [Core] Fix ray start failure to due to bug of redis address detection (#11735)
* Fix ray start failure to due redis address detection bug

* Address comment
2020-11-04 12:04:44 -08:00
Alex Wu 53aac55739 [autoscaler] Autoscaler simulator (#11690) 2020-11-04 12:04:11 -08:00
Akash Patel b7531fb4f5 [redis-py] change redis-py deprecated hmset usage to hset (#11776) 2020-11-03 22:23:02 -08:00
Amog Kamsetty 7248d5f4ae Revert "[tune] PB2 (#11466)" (#11795)
This reverts commit e7aafd7d24.
2020-11-03 21:05:00 -08:00
Kai Fricke 007634fd1b [tune] logger refactor part 2: Add SyncerCallback (#11748)
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2020-11-03 21:04:40 -08:00
Barak Michener 05c4e3fb2a [build] Build wheels with manylinux2014 (#11621)
* necessary changes

* Split bazel install

* manylinux2014

* change references to manylinux2014

* Fix lint

* port alex's docker build changes

* fix config issue

* remove extra manylinux2010 requirement script

* revert SHA overwrite

* wip

* incompatible_linklibs

* fix nits
2020-11-03 19:36:32 -08:00
Ian Rodney 9527220a86 [serve] Fix Controller Crashes on Win (#11792) 2020-11-03 16:54:16 -08:00
Ian Rodney c3074f559c [serve] Split out metadata for checkpointing (#11533) 2020-11-03 12:41:24 -08:00
Philipp Moritz 39ce0eadbe Ray PDB support (#11739) 2020-11-03 09:49:23 -08:00
Ameer Haj Ali 08e0e8311a [autoscaler] Fixing AWS instance types autofill (#11758) 2020-11-03 09:34:14 -08:00
Kai Fricke f7b19c41e3 [tune] logger refactor part 1: move classes and utilities to own files (#11746)
* [tune] logger refactor part 1: move classes and utilities to own files

* Fix circular dependency

* Remove uneeded pretty print copy

* Apply suggestions from code review
2020-11-03 07:48:09 -08:00
Maksim Smolin 0a6d24a727 [cli] Remove the deprecated old_style logging calls (#10776)
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2020-11-02 23:40:18 -08:00
Stephanie Wang 0ba777af99 [Object spilling] Add policy to automatically spill objects on OutOfMemory (#11673) 2020-11-02 12:42:02 -08:00
Ameer Haj Ali 8d74a04a42 [autoscaler] Flag flip for resource_demand_scheduler should take into account queue (#11615) 2020-11-02 12:41:22 -08:00
Ian Rodney 171e02c684 [serve] re-enable serve-controller-crash test (#11579) 2020-11-02 11:22:09 -08:00
Eric Liang 48dee789b3 Add random actor placement; fix cancellation callback; update test skips (#11684) 2020-10-30 18:36:35 -07:00
DK.Pino b10871a1f5 [Core]Fix get workrer table bug (#11516)
* fix get_worker_table bug

* fix lint

* fix comment

* remove actor table

* fix comment

* fix get alive worker

* remove unused python import
2020-10-30 14:48:29 -07:00
SangBin Cho 71c5089854 [Object Spilling] Initial Iteration of S3 adapter. (#11379)
* Finished the first iteration.

* Removed unnecessary code.

* Smartopen impl.

* Make sure tests passed.

* Addressed code review.

* Addressed code review.

* Fix issues.

* Fix issues.
2020-10-30 14:47:07 -07:00
Ameer Haj Ali 7aade469d0 [autoscaler] fix the autoscaling bug for continuously launching failed nodes (#11714) 2020-10-30 14:12:06 -07:00
Gekho457 8816d34541 Kubernetes rsync verbosity fixed (#11716) 2020-10-30 14:03:42 -07:00
Alan Guo 3c109b45aa Disable validation of cluster config on the cluster to allow for cluster configs with new properties. (#11693) 2020-10-30 14:02:00 -07:00
Eric Liang f9f372c327 [autoscaler] Clean up monitoring loop code (#11677) 2020-10-30 13:48:43 -07:00
SangBin Cho 6e2a1eac36 [Placement Group] Placement group automatic cleanup. (#11546)
* In progress. Done with all placement group manager code.

* It is working with job.

* Finished detached actor implementation.

* Fix minor issue.

* In progress.

* Addressed code review.

* Addressed code review.

* Addressed code reivew.

* Fix a build error.
2020-10-30 10:55:43 -07:00
architkulkarni 4175569d96 [Core] Add option to override environment variables for tasks and actors (#11619) 2020-10-29 14:22:44 -05:00
Simon Mo e82ff08b0c Fix asyncio plasma integration in cluster mode (#11665) 2020-10-29 11:53:10 -07:00
Simon Mo 46afec5660 Mute asyncio warning for Serve (#11682) 2020-10-28 17:05:42 -07:00
Kai Fricke ba63ded311 [tune] better error when metric or mode unset in search algorithms (#11646) 2020-10-28 13:17:59 -07:00
Richard Liaw 58891551d3 [tune] make tests faster + fix flaky test (#10264) 2020-10-28 13:14:54 -07:00
Gekho457 9e63f7ccc3 [autoscaler/k8s] ray up 409 error fix (#11660) 2020-10-28 14:19:57 -05:00
Tao Wang 1d5694ddea [GCS]Use direct getting instead of pub-sub to update load metrics in monitor.py (#11339) 2020-10-28 11:23:18 -07:00
Eric Liang c933477915 [new scheduler] Pass test_basic and add CI builds with flag on (#11635) 2020-10-28 11:02:43 -07:00
Richard Liaw 70ea1fbe30 [sgd] pin ptl to 1.0.3 (#11664)
Signed-off-by: Richard Liaw <rliaw@berkeley.edu>
2020-10-28 00:29:01 -07:00
fyrestone 05ad4c7499 [Dashboard] Optimize dashboard datacenter (#11391)
* Optimize dashboard datacenter

* Fix tests

* Fix tests

* Fix

* Fix CI

* python/build-wheel-macos.sh

Co-authored-by: 刘宝 <po.lb@antfin.com>
Co-authored-by: Max Fitton <maxfitton@anyscale.com>
2020-10-27 23:49:31 -07:00
yncxcw c3e246818a [Core] Fix doc string for ray.init() (#11657) 2020-10-27 18:27:22 -07:00
Ameer Haj Ali 1c40950877 [autoscaler] Add the cluster_name to docker file mounts directory prefix to make it more unique (#11600) 2020-10-27 15:33:11 -07:00
Scott Graham c4ae94d60b [autoscaler] Azure deployment fixes (#11613)
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2020-10-27 15:27:18 -07:00
Richard Liaw 293483ed0b [k8s][minor] fix error handling (#11653)
Signed-off-by: Richard Liaw <rliaw@berkeley.edu>
2020-10-27 15:24:07 -07:00
Ian Rodney 3ce852d345 [docker] Synchronize Torch for Tune & RLlib (#11637) 2020-10-27 18:37:25 +01:00
Sven Mika d9f1874e34 [RLlib] Minor fixes (torch GPU bugs + some cleanup). (#11609) 2020-10-27 10:00:24 +01:00