6727 Commits

Author SHA1 Message Date
Alex Wu 404161a3ff [Autoscaler/Core] Remove autoscaler spam (#12952) 2020-12-18 18:22:45 -08:00
Kai Yang ac5ea2c13d [Java] Fix output parsing in RunManager (#12968)
* Fix output parsing in RunManager

* change log level

Co-authored-by: 灵洵 <fengbin.ffb@antgroup.com>
2020-12-18 18:22:12 -08:00
Eric Liang 6ece291f35 Clean up block/unblock handling of resources in new scheduler (#12963) 2020-12-18 16:00:54 -08:00
Eric Liang 3e492a79ec Increase the number of unique bits for actors to avoid handle collisions (#12894) 2020-12-18 15:59:03 -08:00
Edward Oakes 3521e74f3a [serve] Support for imported backends (#12923) 2020-12-18 15:49:24 -06:00
Eric Liang 92812f2e8a Implement resource deadlock detection for new scheduler (#12961) 2020-12-18 12:17:54 -08:00
Barak Michener 5cfa1934e4 [ray_client]: Implement object retain/release and Data Streaming API (#12818) 2020-12-18 11:47:38 -08:00
Kai Fricke 55ae567f7a [tune] Fix and enable SigOpt tests (#12877)
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2020-12-18 01:33:12 -08:00
Gekho457 bff50cfc37 [k8s] Read gpu resources properly (#12942)
* Read gpu resources properly

* Comments and docstrings

* Comment formatting
2020-12-18 01:32:12 -08:00
Kai Fricke 426f8a8d15 [tune] Fix tutorial training on GPU (#12914) 2020-12-18 01:31:40 -08:00
fangfengbin a442cd17e0 [GCS]Optimize gcs client reconnection (#12878)
* [GCS]Optimize gcs client reconnection

* fix review comment

* fix review comment

* add part code

Co-authored-by: 灵洵 <fengbin.ffb@antgroup.com>
2020-12-17 21:57:37 -08:00
dHannasch cfefd7c70e Test PingPort (#12954)
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2020-12-17 21:15:42 -08:00
DK.Pino 6404f1e609 [Placement Group][New scheduler] New scheduler pg implementation (#12910) 2020-12-18 11:56:45 +08:00
Tao Wang 17152c84a7 [Tiny]Print raylet info after register (#12566) 2020-12-18 11:22:13 +08:00
Farzan Taj 53378170e0 [tune] Change pickle to ray.cloudpickle -- support large models (#12958)
* Change pickle to ray.cloudpickle

* Change pickle import to ray.cloudpickle
2020-12-17 19:17:08 -08:00
Kai Fricke 3d72000826 [tune] Add points_to_evaluate to BasicVariantGenerator (#12916)
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2020-12-17 19:16:03 -08:00
Sven Mika 124c8318a8 [RLlib] Fix broken test_distributions.py (test_categorical) (#12915) 2020-12-17 17:44:26 -06:00
dHannasch d747071dd9 Test shard_context on already-created boost::asio::io_service. (#12917) 2020-12-17 14:26:30 -08:00
Edward Oakes c7a59b239f Remove unused endpoints_to_remove (#12946) 2020-12-17 15:04:11 -06:00
Gekho457 82f9c7014e [K8s] Retry getting home directory in command runner. (#12925) 2020-12-17 09:41:48 -08:00
Allen e6cb4f4bd7 [Core] Add log of address and port (#12908)
Co-authored-by: Allen Yin <allenyin@anyscale.io>
2020-12-17 00:25:29 -08:00
Yi Cheng 40032541dc [core] Introduce fetch_local to ray.wait (#12526) 2020-12-16 23:44:28 -08:00
Tao Wang 12231ec2a6 Optimize heartbeat manager initialization (#12911) 2020-12-17 14:24:23 +08:00
SangBin Cho 057687e534 [New Scheduler] Fix test_failure.py by supporting infeasible tasks (#12738)
* Fix the first issue.

* ip

* In Progress.

* In progress.

* done.

* Remove unnecessary logs.

* Addressed code review + fix some test failures.

* Try fixing issues.

* Fix issues.

* Fix test issues.

* Fix issues.

* done.
2020-12-16 21:27:50 -08:00
Philipp Moritz ad036fd564 Fix continue for debugger (#12862) 2020-12-16 16:09:13 -08:00
Amog Kamsetty dd522a71a1 [SGD] Disable Elastic Training by default when using with Tune (#12927) 2020-12-16 15:37:44 -08:00
Alex Wu 8b783ecafa Fix pull manager retry (#12907) 2020-12-16 14:18:43 -08:00
Ameer Haj Ali c677b9e201 [autoscaler] Fix flaky autoscaler test (#12918) 2020-12-16 14:18:27 -08:00
Edward Oakes aedcf0c9d9 Disable test_distributions (#12919) 2020-12-16 14:17:49 -08:00
Edward Oakes fdb4c6eb1c Better message for too little /dev/shm memory (#12896) 2020-12-16 10:30:20 -06:00
fangfengbin 91878d18b5 [PlacementGroup]Fix placement group wait api disorder bug (#12827)
* [PlacementGroup]Fix placment group wait api disorder bug

* fix review comment

* fix review comment

* fix review comment

* fix review comments

* increase num_heartbeats_timeout

Co-authored-by: 灵洵 <fengbin.ffb@antgroup.com>
2020-12-16 18:45:53 +08:00
Eric Liang 7ff314a5df [New scheduler] Also unsubscribe get dependencies on unblock 2020-12-15 20:29:44 -08:00
Richard Liaw a7caa14d3d [k8s] avoid bad error messages (#12871) 2020-12-15 15:00:02 -08:00
Edward Oakes f4b5a8b2f7 [serve] Re-enable test_failure.py (#12891) 2020-12-15 16:02:04 -06:00
Richard Liaw 87cf1a97e5 [core] recover startup logs (#12876) 2020-12-15 13:49:45 -08:00
Edward Oakes 6795d7c75c [serve] Fix flaky test_api.py::test_backend_user_config (#12892) 2020-12-15 15:35:30 -06:00
Kai Fricke ea1228074d [tune] enable points_to_eval for all search algorithms (#12790)
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2020-12-15 11:51:53 -08:00
Simon Mo fdd85e3af4 [Serve] Add benchmark for async handles (#12858) 2020-12-15 11:21:51 -08:00
Alex Wu 0031723ace [New scheduler] Object spilling (#12857) 2020-12-15 11:05:38 -08:00
Edward Oakes cde711aaf1 Revert "[RLLib] Execution-Folder Type Annotations (#12760)" (#12886)
This reverts commit becca1424d.
2020-12-15 11:03:02 -08:00
architkulkarni ba12fb1451 Fix for RLIMIT patch (#12882)
Implement new soft limit introduced by https://github.com/ray-project/ray/pull/12853.
2020-12-15 10:38:46 -08:00
SangBin Cho de7848231c [Doc] Fix placement group doc (#12875) 2020-12-15 10:36:51 -08:00
Edward Oakes 261b2f9053 Check for raylet PID as ppid in dashboard agent fate-sharing (#12867) 2020-12-15 12:13:11 -06:00
Max Fitton e077bc4206 [Release] Bump master to 1.2.0 for 1.1.0 release (#12856) 2020-12-15 09:40:26 -08:00
Simon Mo b291dd4486 [Metrics] Call GetMeasureDoubleByName to prevent override (#12860) 2020-12-15 09:39:39 -08:00
Gekho457 5a142d5bd6 Use nightly images in all kubernetes examples. (#12868) 2020-12-14 20:49:41 -08:00
fangfengbin 43b9259d40 [GCS]GCS resource manager support scheduling resource (#12780)
* add part code

* add part code

* fix review comments

* rebase master

* add part code

* add part code

* fix review comments

* add part code

* fix code style

* fix ut bug

* fix ut bug

* fix review comments

* fix review comment

Co-authored-by: 灵洵 <fengbin.ffb@antgroup.com>
2020-12-15 10:27:55 +08:00
Gekho457 8cebe5cbe9 [docs][autoscaler][k8s][minor] quotes #12866 2020-12-14 18:24:13 -08:00
Gekho457 44f5be04ca [autoscaler][k8s][doc][minor] Fix typo in k8s doc. (#12865) 2020-12-14 17:30:43 -08:00
Simon Mo b56db5a22f [Serve] Wait for actor name to be cleaned up (#12215) 2020-12-14 15:09:43 -08:00