6534 Commits

Author SHA1 Message Date
Simon Mo f596113fc7 [Core] Actor Retries Out of Order Tasks on Restart (#12338) 2020-12-01 09:35:54 -08:00
SangBin Cho f6f3cc9af1 [Core]Remove checkpoint table (#12235)
* Delete an actor entry from node manager.

* Remove checkpoint table

* remote checkpoint interface

* remove checkpoint interface

* fix ExitActorTest

Co-authored-by: chaokunyang <shawn.ck.yang@gmail.com>
2020-12-01 08:58:36 -08:00
Sven Mika 9021f15b2a [RLlib] Fix setup-dev.py error when creating a softlink for new_dashboard. (#12442) 2020-12-01 11:46:59 +01:00
Sven Mika 3ad9365e1d [RLlib] Attention Net prep PR #2: Smaller cleanups. (#12449) 2020-12-01 08:21:45 +01:00
Edward Oakes e72147de38 Fix Serve typo (#12524) 2020-11-30 23:15:42 -08:00
Eric Liang fd8ae0697b [autoscaler] Fix test heartbeats single test (#12513)
* update

* update

* update
2020-11-30 21:24:45 -08:00
Amog Kamsetty 16ca748454 [CI] Use legacy resolver for some pip imports (#12517) 2020-11-30 21:18:21 -08:00
Amog Kamsetty f9a99f20dd Revert "Re-Revert "[Core] zero-copy serializer for pytorch (#12344)" (#12478)" (#12515)
This reverts commit 3f22448834.
2020-11-30 19:05:55 -08:00
SangBin Cho 8223a33bff [Logging] Log rotation on all components (#12101)
* In Progress.

* Done.

* Fix the issue.

* Add wait for condition because logs are not written right away now.

* debug string.

* lint.

* Fix flaky test.

* Fix issues.

* Fix test.

* lint.
2020-11-30 19:03:55 -08:00
Max Fitton 2708b3abbc [Dashboard][Bug] Fix duplicate node total rows in dashboard (#12410)
* Fix duplicate node total rows in dashboard by changing the react key of the NodeTotalRow component from the node IP to the node ID (node IP can be duplicated in the case of docker).

* simplify a piece of test code and fix a flaky time out

* lint
2020-11-30 18:43:09 -08:00
Ian Rodney e422ace053 [serve] Create CurrentState & GoalState (#12369) 2020-11-30 17:34:30 -08:00
Eric Liang 234df9091e [autoscaler] Try to improve the request_resources() documentation (#12465) 2020-11-30 16:03:30 -08:00
Richard Liaw 9ce7ad17fd [tune] remove some bottlenecks in trialrunner (#12476) 2020-11-30 14:54:25 -08:00
Ian Rodney f5fe3794c8 [Docker] Uninstall Typing (#12500) 2020-11-30 14:12:57 -08:00
Siyuan (Ryans) Zhuang 3f22448834 Re-Revert "[Core] zero-copy serializer for pytorch (#12344)" (#12478)
* [Core] zero-copy serializer for pytorch (#12344)

* zero-copy serializer for pytorch

* address possible bottleneck

* add tests & device support

(cherry picked from commit 0a505ca83d)

* add environmental variables

* update doc
2020-11-30 11:43:03 -08:00
Sven Mika bb03e2499b [RLlib] PyBullet Env native support via env str-specifier (if installed). (#12209) 2020-11-30 12:41:24 +01:00
Tao Wang b85c6abc3e Rename fields/variables from client id to node id (#12457) 2020-11-30 14:33:36 +08:00
SangBin Cho 3964defbe1 [Logging] Fix tensorflow logging issue. (#12225)
* in progress.

* ip

* In Progress

* done.

* fix lint.

* Addressed code review

* Addressed code review.
2020-11-29 22:16:52 -08:00
SangBin Cho 91d54ef621 [Core] Remove actor arg from executor to allow users to specify actor… (#12239)
* [Core] Remove actor arg from executor to allow users to specify actor arg in their Actor.remote.

* Addressed code review.
2020-11-29 22:15:48 -08:00
chaokunyang 17a6b9bbe7 Fix not cp jars (#12456) 2020-11-30 13:53:09 +08:00
Philipp Moritz cf73ccddae Allow more fields for object metadata (#12484) 2020-11-29 21:50:18 -08:00
Alex Wu f1cc33a6a6 Actor resource backlog hotfix (#12471)
* prepare implemented

* works?

* deflek

* git

* deflek round 2

* .

* improve the test

Co-authored-by: Alex <alex@anyscale.com>
Co-authored-by: Eric Liang <ekhliang@gmail.com>
2020-11-29 20:55:50 -08:00
Sven Mika fb318addcb [RLlib] Curiosity exploration module: tf/tf2.x/tf-eager support. (#11945) 2020-11-29 12:31:24 +01:00
Micah Yong a537b852e6 [docs][core] Documentation improvement in master/walkthrough.html (#12473) 2020-11-28 20:36:01 -08:00
Amog Kamsetty 8a406e1f9a [SGD] Add PTL Docs (#12440)
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2020-11-28 10:09:38 -08:00
Pierre TASSEL 60a545ab57 [RLLib] Fix HyperOptSearch tuple to list conversion (#12462)
Co-authored-by: Sumanth Ratna <sumanthratna@gmail.com>
2020-11-28 10:07:54 -08:00
Kai Fricke 1d0ade1b93 Revert "[Core] zero-copy serializer for pytorch (#12344)" (#12469)
This reverts commit 0a505ca8
2020-11-28 10:06:02 -08:00
Eric Liang 9ad0f173d6 Prestart workers to avoid slow start when multi-tenancy is enabled (#12430) 2020-11-27 21:47:46 -08:00
Sven Mika 0df55a139c [RLlib] Attention Net prep PR #1: Smaller cleanups. (#12447)
* WIP.

* Fix.

* Fix.

* Fix.
2020-11-27 16:25:47 -08:00
Eric Liang 569eee5e71 Enable more new scheduler tests (#12421) 2020-11-27 16:10:38 -08:00
Richard Liaw affb0b776c Fix github issue template (#12464)
Signed-off-by: Richard Liaw <rliaw@berkeley.edu>
2020-11-27 14:13:29 -08:00
Richard Liaw 7c009d22cf [docs] Add xgboost_ray to docs (#12184)
Co-authored-by: Amog Kamsetty <amogkamsetty@yahoo.com>
2020-11-27 11:36:56 -08:00
Siyuan (Ryans) Zhuang 0a505ca83d [Core] zero-copy serializer for pytorch (#12344)
* zero-copy serializer for pytorch

* address possible bottleneck

* add tests & device support
2020-11-26 16:09:54 -08:00
Amog Kamsetty e0573df337 [CI] Fix windows build (#12415)
* attempt to fix windows

* fix syntax

* try again

* try again

* try again

* Revert "[ray_client] Support calling functions from other functions and correct the tests (#12141)"

This reverts commit 4066056a0d.

* Revert

* Revert "Revert "[ray_client] Support calling functions from other functions and correct the tests (#12141)""

This reverts commit bb27b87b6c8d780ad796f4d4aeaa20113c8eca79.

* please work

* works

* fix
2020-11-26 10:52:11 -08:00
Sven Mika c1d7826bb7 [RLlib] Move pettingzoo from requirements.txt into requirements_rllib.txt (#12400) 2020-11-26 19:30:35 +01:00
Sven Mika 6475297bd3 [RLlib] Torch LR schedule not working. Fix and added test case. (#12396) 2020-11-26 13:14:11 +01:00
fangfengbin d5215745e4 [PlacementGroup] Introduce GcsResourceManager and avoid copying resources when scheduling placement groups (#12253) 2020-11-26 11:21:58 +08:00
Edward Oakes 90d7863eb3 Document that ray install-nightly command doesn't work for 1.0.1.post1 and older (#12429) 2020-11-25 19:55:12 -06:00
Ameer Haj Ali 9ccf5f6ccc [ray client] add metadata and secure options to Worker. (#12409) 2020-11-25 17:48:13 -08:00
Richard Liaw 751e13a41e [docs] redirect to discourse (#12427) 2020-11-25 17:10:10 -08:00
Richard Liaw 323941c745 [tune] fix pbt flakey test (#12418) 2020-11-25 16:58:37 -08:00
Eric Liang f6a5b733d5 Remove flaky object manager test that's no longer needed 2020-11-25 12:45:47 -08:00
Ian Rodney 679492a235 [serve] Use Long Polling in Backend Worker (#12093) 2020-11-25 12:11:38 -08:00
Ian Rodney ca6c2b2442 [docker] Use cuDNN7, not 8 (#12375) 2020-11-25 12:06:53 -08:00
SangBin Cho 753cda2f28 [Dashboard] Delete old dashboard (#12144)
* Delete old dashboard from repo.

* Delete old dashboard from repo. 2
2020-11-25 11:31:02 -08:00
Sven Mika b7dbbfbf41 [RLlib] Issue 11591: SAC loss does not use PR-weights in critic loss term. (#12394)
* WIP.

* Fix and LINT.
2020-11-25 11:28:46 -08:00
Sven Mika 592c161032 [RLlib] Issue 12118: LSTM prev-a/r should be separately configurable. Fix missing prev-a one-hot encoding. (#12397)
* WIP.

* Fix and LINT.
2020-11-25 11:27:46 -08:00
Sven Mika 841d93d366 [RLlib] Issue 12233 shared tf layers example not really shared (only works for tf1.x, not tf2.x). (#12399) 2020-11-25 11:27:19 -08:00
Sven Mika 95175a822f [RLlib] Issue 11974: Traj view API next-action (shift=+1) not working. (#12407)
* WIP.

* Fix and LINT.
2020-11-25 11:26:29 -08:00
Max Fitton 2e95552f0c [Dashboard] Defensive change to make sure we do not iterate over "None" in the case that workers is not present in node physical stats for a given node (#12358) 2020-11-25 11:06:45 -08:00