3382 Commits

Author SHA1 Message Date
Kai Yang 21fcee28f9 [Java] Simplify Ray.init() by invoking ray start internally (#10762) 2020-12-04 14:33:45 +08:00
Eric Liang 8cebe1e79c [autoscaler] Fix worker capping fifo test in new scheduler (#12512) 2020-12-03 17:21:35 -08:00
Richard Liaw 1ce5e0e99f [tune] Fix file descriptor leak by syncer (#12590) 2020-12-03 13:39:04 -08:00
Eric Liang 36e46ed923 Revert "[autoscaler/k8s] Use ray node's HOME in Kubernetes command runner. (#12417)" (#12607)
This reverts commit f669830de6.
2020-12-03 12:57:59 -08:00
Simon Mo 1f7a4806ff [Serve] Fix Flask Request self reference (#12560)
* [Serve] Fix Flask Request self reference

* Working flag

* Fix
2020-12-03 10:45:04 -06:00
Gekho457 f669830de6 [autoscaler/k8s] Use ray node's HOME in Kubernetes command runner. (#12417) 2020-12-03 10:43:16 -06:00
fangfengbin ff34563539 [PlacementGroup]Fix bug that kill workers mistakenly when gcs restarts (#12568) 2020-12-03 17:50:48 +08:00
Richard Liaw 7c58a85fed [tune] fix Tensorboard file descriptor leak (#12425) 2020-12-03 00:06:54 -08:00
Eric Liang 62fbe63f34 Disable flaky test test_delete_objects_multi_node (#12584)
* update

* fix

* update
2020-12-02 19:19:12 -08:00
Edward Oakes 8058c1eb54 [serve] Add option to not start HTTP servers (#11627) 2020-12-02 16:49:34 -06:00
Kaushik B 7422abddb4 [tune] trim kwargs in shim instantiation functions (#12544) 2020-12-02 12:07:00 -08:00
Richard Liaw da42bf29d0 [tune] horovod release test (#12495) 2020-12-02 12:04:54 -08:00
Stephanie Wang 443339ab19 [core] Move out-of-memory handling into the plasma store and support async object creation (#12186)
* Refactor to extract creation request queue

* timer on oom

* move timer out

* Move evict_if_full and on_store_full into plasma store

* Remove client-side code

* revert

* Distinguish between transient and permanent OOM delays

* update

* Move out create request queue, unit test

* unit test

* Fix max retries

* test

* Do not pin restored objects

* First pass to add polling requests, unit test passes

* worker plasma client retries plasma requests

* cleanup

* Clean up after disconnected clients, check memory leaks

* Support immediate requests in request queue

* Option to try creating immediately

* lint

* Fix build, address comments

* doc

* fixes

* debug travis

* debug

* debug

* debug

* debug

* Revert "debug"

This reverts commit 6bf2f6ee5640e71630c4aecdb7ebf54911ea32db.

Revert "debug"

This reverts commit 73017099c9b06cdaae1217bf0e0f4d23ed68a9e5.

Revert "debug"

This reverts commit 5a155529e28cee9461a598b0cdf7b6a3cc194c93.

Revert "debug"

This reverts commit b50c2101afd45d4cf663daae857bfe1b40387703.

Revert "debug travis"

This reverts commit 012b8721dedf9bca46294ae75eee2815b160368b.

* Skip if new scheduler enabled

* error message

* merge
2020-12-02 13:25:54 -05:00
Richard Liaw a21523c709 [tune/core] serialization debugging utility (#12142)
Co-authored-by: SangBin Cho <rkooo567@gmail.com>
Co-authored-by: Kai Fricke <kai@anyscale.com>
2020-12-02 00:52:17 -08:00
Kai Fricke 63b85df828 [xgb] update docs (#12549) 2020-12-01 23:17:23 -08:00
Simon Mo e428134137 [Hotfix] Pin llvmlite for windows build (#12559) 2020-12-01 19:43:08 -08:00
Siyuan (Ryans) Zhuang 615f974313 Add context for "test_buffer_alignment" (#12519) 2020-12-01 19:27:14 -08:00
Sven Mika 19c8033df2 [RLlib] Fix most remaining RLlib algos for running with trajectory view API. (#12366)
* WIP.

* WIP.

* WIP.

* WIP.

* WIP.

* WIP.

* WIP.

* WIP.

* WIP.

* WIP.

* WIP.

* WIP.

* WIP.

* WIP.

* WIP.

* WIP.

* WIP.

* WIP.

* WIP.

* WIP.

* WIP.

* WIP.

* WIP.

* WIP.

* WIP.

* WIP.

* LINT and fixes.
MB-MPO and MAML not working yet.

* wip

* update

* update

* rmeove

* remove dep

* higher

* Update requirements_rllib.txt

* Update requirements_rllib.txt

* relpos

* no mbmpo

Co-authored-by: Eric Liang <ekhliang@gmail.com>
2020-12-01 17:41:10 -08:00
Richard Liaw 4dc16730a7 [tune] with-params fix (#12522) 2020-12-01 16:47:03 -08:00
Simon Mo 7022278ce9 Deflake Serve tests (#12542) 2020-12-01 13:42:21 -08:00
Barak Michener 6412dfaf38 [ray_client] actors v0 (#12388) 2020-12-01 13:12:08 -08:00
SangBin Cho 0e892908f7 [Object Spilling] Delete spilled objects when references are gone out of scope. (#12341) 2020-12-01 13:10:39 -08:00
Simon Mo ef1b0c13c3 Async Future Throws RayError as well (#12419) 2020-12-01 13:07:43 -08:00
Richard Liaw bdf8ad3b5a fix (#12528)
Signed-off-by: Richard Liaw <rliaw@berkeley.edu>
2020-12-01 09:58:12 -08:00
Simon Mo f596113fc7 [Core] Actor Retries Out of Order Tasks on Restart (#12338) 2020-12-01 09:35:54 -08:00
SangBin Cho f6f3cc9af1 [Core]Remove checkpoint table (#12235)
* Delete an actor entry from node manager.

* Remove checkpoint table

* remote checkpoint interface

* remove checkpoint interface

* fix ExitActorTest

Co-authored-by: chaokunyang <shawn.ck.yang@gmail.com>
2020-12-01 08:58:36 -08:00
Sven Mika 9021f15b2a [RLlib] Fix setup-dev.py error when creating a softlink for new_dashboard. (#12442) 2020-12-01 11:46:59 +01:00
Edward Oakes e72147de38 Fix Serve typo (#12524) 2020-11-30 23:15:42 -08:00
Eric Liang fd8ae0697b [autoscaler] Fix test heartbeats single test (#12513)
* update

* update

* update
2020-11-30 21:24:45 -08:00
Amog Kamsetty f9a99f20dd Revert "Re-Revert "[Core] zero-copy serializer for pytorch (#12344)" (#12478)" (#12515)
This reverts commit 3f22448834.
2020-11-30 19:05:55 -08:00
SangBin Cho 8223a33bff [Logging] Log rotation on all components (#12101)
* In Progress.

* Done.

* Fix the issue.

* Add wait for condition because logs are not written right away now.

* debug string.

* lint.

* Fix flaky test.

* Fix issues.

* Fix test.

* lint.
2020-11-30 19:03:55 -08:00
Ian Rodney e422ace053 [serve] Create CurrentState & GoalState (#12369) 2020-11-30 17:34:30 -08:00
Eric Liang 234df9091e [autoscaler] Try to improve the request_resources() documentation (#12465) 2020-11-30 16:03:30 -08:00
Richard Liaw 9ce7ad17fd [tune] remove some bottlenecks in trialrunner (#12476) 2020-11-30 14:54:25 -08:00
Siyuan (Ryans) Zhuang 3f22448834 Re-Revert "[Core] zero-copy serializer for pytorch (#12344)" (#12478)
* [Core] zero-copy serializer for pytorch (#12344)

* zero-copy serializer for pytorch

* address possible bottleneck

* add tests & device support

(cherry picked from commit 0a505ca83d)

* add environmental variables

* update doc
2020-11-30 11:43:03 -08:00
Sven Mika bb03e2499b [RLlib] PyBullet Env native support via env str-specifier (if installed). (#12209) 2020-11-30 12:41:24 +01:00
Tao Wang b85c6abc3e Rename fields/variables from client id to node id (#12457) 2020-11-30 14:33:36 +08:00
SangBin Cho 3964defbe1 [Logging] Fix tensorflow logging issue. (#12225)
* in progress.

* ip

* In Progress

* done.

* fix lint.

* Addressed code review

* Addressed code review.
2020-11-29 22:16:52 -08:00
SangBin Cho 91d54ef621 [Core] Remove actor arg from executor to allow users to specify actor… (#12239)
* [Core] Remove actor arg from executor to allow users to specify actor arg in their Actor.remote.

* Addressed code review.
2020-11-29 22:15:48 -08:00
chaokunyang 17a6b9bbe7 Fix not cp jars (#12456) 2020-11-30 13:53:09 +08:00
Philipp Moritz cf73ccddae Allow more fields for object metadata (#12484) 2020-11-29 21:50:18 -08:00
Alex Wu f1cc33a6a6 Actor resource backlog hotfix (#12471)
* prepare implemented

* works?

* deflek

* git

* deflek round 2

* .

* improve the test

Co-authored-by: Alex <alex@anyscale.com>
Co-authored-by: Eric Liang <ekhliang@gmail.com>
2020-11-29 20:55:50 -08:00
Amog Kamsetty 8a406e1f9a [SGD] Add PTL Docs (#12440)
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2020-11-28 10:09:38 -08:00
Kai Fricke 1d0ade1b93 Revert "[Core] zero-copy serializer for pytorch (#12344)" (#12469)
This reverts commit 0a505ca8
2020-11-28 10:06:02 -08:00
Eric Liang 569eee5e71 Enable more new scheduler tests (#12421) 2020-11-27 16:10:38 -08:00
Richard Liaw 7c009d22cf [docs] Add xgboost_ray to docs (#12184)
Co-authored-by: Amog Kamsetty <amogkamsetty@yahoo.com>
2020-11-27 11:36:56 -08:00
Siyuan (Ryans) Zhuang 0a505ca83d [Core] zero-copy serializer for pytorch (#12344)
* zero-copy serializer for pytorch

* address possible bottleneck

* add tests & device support
2020-11-26 16:09:54 -08:00
Amog Kamsetty e0573df337 [CI] Fix windows build (#12415)
* attempt to fix windows

* fix syntax

* try again

* try again

* try again

* Revert "[ray_client] Support calling functions from other functions and correct the tests (#12141)"

This reverts commit 4066056a0d.

* Revert

* Revert "Revert "[ray_client] Support calling functions from other functions and correct the tests (#12141)""

This reverts commit bb27b87b6c8d780ad796f4d4aeaa20113c8eca79.

* please work

* works

* fix
2020-11-26 10:52:11 -08:00
Sven Mika c1d7826bb7 [RLlib] Move pettingzoo from requirements.txt into requirements_rllib.txt (#12400) 2020-11-26 19:30:35 +01:00
Ameer Haj Ali 9ccf5f6ccc [ray client] add metadata and secure options to Worker. (#12409) 2020-11-25 17:48:13 -08:00