Commit Graph

3481 Commits

Author SHA1 Message Date
Ian Rodney d6e243ad46 [serve] Refactor to full control loop design (#12537) 2020-12-20 13:03:57 -06:00
Richard Liaw 038a50af52 [tune] skopt fix-extra-import (#12970)
Signed-off-by: Richard Liaw <rliaw@berkeley.edu>
2020-12-20 01:01:09 -08:00
Amog Kamsetty 4c63917439 [Queue] Add options and shutdown to Queue (#12932)
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2020-12-20 00:42:21 -08:00
Amog Kamsetty 51139ed37c [SGD] Fix process group timeout units (#12477) 2020-12-19 21:46:33 -08:00
Dmitri Gekhtman 4832b39066 Suggest mounting into home. Note non-root user. (#12987) 2020-12-19 16:09:24 -08:00
Eric Liang 64c97d25d3 Enable by default new scheduler (#12735) 2020-12-19 13:22:24 -08:00
Amog Kamsetty 5d3c9c8861 [Tune] Mlflow Integration (#12840)
Co-authored-by: Kai Fricke <krfricke@users.noreply.github.com>
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2020-12-19 00:40:02 -08:00
Eric Liang 5d987f5988 Revert "Increase the number of unique bits for actors to avoid handle collisions (#12894)" (#12988)
This reverts commit 3e492a79ec.
2020-12-18 23:51:44 -08:00
SangBin Cho 9d939e6674 [Object Spilling] Implement level triggered logic to make streaming shuffle work + additional cleanup (#12773) 2020-12-18 19:31:14 -08:00
Alex Wu 404161a3ff [Autoscaler/Core] Remove autoscaler spam (#12952) 2020-12-18 18:22:45 -08:00
Kai Yang ac5ea2c13d [Java] Fix output parsing in RunManager (#12968)
* Fix output parsing in RunManager

* change log level

Co-authored-by: 灵洵 <fengbin.ffb@antgroup.com>
2020-12-18 18:22:12 -08:00
Eric Liang 6ece291f35 Clean up block/unblock handling of resources in new scheduler (#12963) 2020-12-18 16:00:54 -08:00
Eric Liang 3e492a79ec Increase the number of unique bits for actors to avoid handle collisions (#12894) 2020-12-18 15:59:03 -08:00
Edward Oakes 3521e74f3a [serve] Support for imported backends (#12923) 2020-12-18 15:49:24 -06:00
Eric Liang 92812f2e8a Implement resource deadlock detection for new scheduler (#12961) 2020-12-18 12:17:54 -08:00
Barak Michener 5cfa1934e4 [ray_client]: Implement object retain/release and Data Streaming API (#12818) 2020-12-18 11:47:38 -08:00
Kai Fricke 55ae567f7a [tune] Fix and enable SigOpt tests (#12877)
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2020-12-18 01:33:12 -08:00
Gekho457 bff50cfc37 [k8s] Read gpu resources properly (#12942)
* Read gpu resources properly

* Comments and docstrings

* Comment formatting
2020-12-18 01:32:12 -08:00
Kai Fricke 426f8a8d15 [tune] Fix tutorial training on GPU (#12914) 2020-12-18 01:31:40 -08:00
DK.Pino 6404f1e609 [Placement Group][New scheduler] New scheduler pg implementation (#12910) 2020-12-18 11:56:45 +08:00
Farzan Taj 53378170e0 [tune] Change pickle to ray.cloudpickle -- support large models (#12958)
* Change pickle to ray.cloudpickle

* Change pickle import to ray.cloudpickle
2020-12-17 19:17:08 -08:00
Kai Fricke 3d72000826 [tune] Add points_to_evaluate to BasicVariantGenerator (#12916)
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2020-12-17 19:16:03 -08:00
Edward Oakes c7a59b239f Remove unused endpoints_to_remove (#12946) 2020-12-17 15:04:11 -06:00
Gekho457 82f9c7014e [K8s] Retry getting home directory in command runner. (#12925) 2020-12-17 09:41:48 -08:00
Yi Cheng 40032541dc [core] Introduce fetch_local to ray.wait (#12526) 2020-12-16 23:44:28 -08:00
SangBin Cho 057687e534 [New Scheduler] Fix test_failure.py by supporting infeasible tasks (#12738)
* Fix the first issue.

* ip

* In Progress.

* In progress.

* done.

* Remove unnecessary logs.

* Addressed code review + fix some test failures.

* Try fixing issues.

* Fix issues.

* Fix test issues.

* Fix issues.

* done.
2020-12-16 21:27:50 -08:00
Philipp Moritz ad036fd564 Fix continue for debugger (#12862) 2020-12-16 16:09:13 -08:00
Amog Kamsetty dd522a71a1 [SGD] Disable Elastic Training by default when using with Tune (#12927) 2020-12-16 15:37:44 -08:00
Alex Wu 8b783ecafa Fix pull manager retry (#12907) 2020-12-16 14:18:43 -08:00
Ameer Haj Ali c677b9e201 [autoscaler] Fix flaky autoscaler test (#12918) 2020-12-16 14:18:27 -08:00
Edward Oakes fdb4c6eb1c Better message for too little /dev/shm memory (#12896) 2020-12-16 10:30:20 -06:00
fangfengbin 91878d18b5 [PlacementGroup]Fix placement group wait api disorder bug (#12827)
* [PlacementGroup]Fix placment group wait api disorder bug

* fix review comment

* fix review comment

* fix review comment

* fix review comments

* increase num_heartbeats_timeout

Co-authored-by: 灵洵 <fengbin.ffb@antgroup.com>
2020-12-16 18:45:53 +08:00
Richard Liaw a7caa14d3d [k8s] avoid bad error messages (#12871) 2020-12-15 15:00:02 -08:00
Edward Oakes f4b5a8b2f7 [serve] Re-enable test_failure.py (#12891) 2020-12-15 16:02:04 -06:00
Richard Liaw 87cf1a97e5 [core] recover startup logs (#12876) 2020-12-15 13:49:45 -08:00
Edward Oakes 6795d7c75c [serve] Fix flaky test_api.py::test_backend_user_config (#12892) 2020-12-15 15:35:30 -06:00
Kai Fricke ea1228074d [tune] enable points_to_eval for all search algorithms (#12790)
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2020-12-15 11:51:53 -08:00
Simon Mo fdd85e3af4 [Serve] Add benchmark for async handles (#12858) 2020-12-15 11:21:51 -08:00
Alex Wu 0031723ace [New scheduler] Object spilling (#12857) 2020-12-15 11:05:38 -08:00
architkulkarni ba12fb1451 Fix for RLIMIT patch (#12882)
Implement new soft limit introduced by https://github.com/ray-project/ray/pull/12853.
2020-12-15 10:38:46 -08:00
Max Fitton e077bc4206 [Release] Bump master to 1.2.0 for 1.1.0 release (#12856) 2020-12-15 09:40:26 -08:00
Simon Mo b291dd4486 [Metrics] Call GetMeasureDoubleByName to prevent override (#12860) 2020-12-15 09:39:39 -08:00
Gekho457 5a142d5bd6 Use nightly images in all kubernetes examples. (#12868) 2020-12-14 20:49:41 -08:00
Simon Mo b56db5a22f [Serve] Wait for actor name to be cleaned up (#12215) 2020-12-14 15:09:43 -08:00
architkulkarni 231518e86f [Serve] Support basic Starlette response types (#12811) 2020-12-14 17:03:56 -06:00
Eric Liang 1eb4ac12b1 Clip RLIMIT_NOFILE increase to avoid redis failing to start on Big Sur 2020-12-14 14:05:19 -08:00
SangBin Cho 69b0bc2132 [Logging] Use file handle temporalily (#12839) 2020-12-14 11:42:44 -08:00
Gekho457 11ce1dc743 Ray cluster CRD and example CR + multi-ray-cluster operator (#12098) 2020-12-14 10:26:01 -06:00
Tao Wang 35f7d84dbe Revert heartbeat interval to keep ci stable (#12836)
* Revert heartbeat interval to keep ci stable

* fix missing one
2020-12-14 16:58:40 +08:00
Eric Squires 22c1968d62 Runing -> Running (#12826) 2020-12-13 22:23:48 -08:00