6812 Commits

Author SHA1 Message Date
Sven Mika fb318addcb [RLlib] Curiosity exploration module: tf/tf2.x/tf-eager support. (#11945) 2020-11-29 12:31:24 +01:00
Micah Yong a537b852e6 [docs][core] Documentation improvement in master/walkthrough.html (#12473) 2020-11-28 20:36:01 -08:00
Amog Kamsetty 8a406e1f9a [SGD] Add PTL Docs (#12440)
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2020-11-28 10:09:38 -08:00
Pierre TASSEL 60a545ab57 [RLLib] Fix HyperOptSearch tuple to list conversion (#12462)
Co-authored-by: Sumanth Ratna <sumanthratna@gmail.com>
2020-11-28 10:07:54 -08:00
Kai Fricke 1d0ade1b93 Revert "[Core] zero-copy serializer for pytorch (#12344)" (#12469)
This reverts commit 0a505ca8
2020-11-28 10:06:02 -08:00
Eric Liang 9ad0f173d6 Prestart workers to avoid slow start when multi-tenancy is enabled (#12430) 2020-11-27 21:47:46 -08:00
Sven Mika 0df55a139c [RLlib] Attention Net prep PR #1: Smaller cleanups. (#12447)
* WIP.

* Fix.

* Fix.

* Fix.
2020-11-27 16:25:47 -08:00
Eric Liang 569eee5e71 Enable more new scheduler tests (#12421) 2020-11-27 16:10:38 -08:00
Richard Liaw affb0b776c Fix github issue template (#12464)
Signed-off-by: Richard Liaw <rliaw@berkeley.edu>
2020-11-27 14:13:29 -08:00
Richard Liaw 7c009d22cf [docs] Add xgboost_ray to docs (#12184)
Co-authored-by: Amog Kamsetty <amogkamsetty@yahoo.com>
2020-11-27 11:36:56 -08:00
Siyuan (Ryans) Zhuang 0a505ca83d [Core] zero-copy serializer for pytorch (#12344)
* zero-copy serializer for pytorch

* address possible bottleneck

* add tests & device support
2020-11-26 16:09:54 -08:00
Amog Kamsetty e0573df337 [CI] Fix windows build (#12415)
* attempt to fix windows

* fix syntax

* try again

* try again

* try again

* Revert "[ray_client] Support calling functions from other functions and correct the tests (#12141)"

This reverts commit 4066056a0d.

* Revert

* Revert "Revert "[ray_client] Support calling functions from other functions and correct the tests (#12141)""

This reverts commit bb27b87b6c8d780ad796f4d4aeaa20113c8eca79.

* please work

* works

* fix
2020-11-26 10:52:11 -08:00
Sven Mika c1d7826bb7 [RLlib] Move pettingzoo from requirements.txt into requirements_rllib.txt (#12400) 2020-11-26 19:30:35 +01:00
Sven Mika 6475297bd3 [RLlib] Torch LR schedule not working. Fix and added test case. (#12396) 2020-11-26 13:14:11 +01:00
fangfengbin d5215745e4 [PlacementGroup] Introduce GcsResourceManager and avoid copying resources when scheduling placement groups (#12253) 2020-11-26 11:21:58 +08:00
Edward Oakes 90d7863eb3 Document that ray install-nightly command doesn't work for 1.0.1.post1 and older (#12429) 2020-11-25 19:55:12 -06:00
Ameer Haj Ali 9ccf5f6ccc [ray client] add metadata and secure options to Worker. (#12409) 2020-11-25 17:48:13 -08:00
Richard Liaw 751e13a41e [docs] redirect to discourse (#12427) 2020-11-25 17:10:10 -08:00
Richard Liaw 323941c745 [tune] fix pbt flakey test (#12418) 2020-11-25 16:58:37 -08:00
Eric Liang f6a5b733d5 Remove flaky object manager test that's no longer needed 2020-11-25 12:45:47 -08:00
Ian Rodney 679492a235 [serve] Use Long Polling in Backend Worker (#12093) 2020-11-25 12:11:38 -08:00
Ian Rodney ca6c2b2442 [docker] Use cuDNN7, not 8 (#12375) 2020-11-25 12:06:53 -08:00
SangBin Cho 753cda2f28 [Dashboard] Delete old dashboard (#12144)
* Delete old dashboard from repo.

* Delete old dashboard from repo. 2
2020-11-25 11:31:02 -08:00
Sven Mika b7dbbfbf41 [RLlib] Issue 11591: SAC loss does not use PR-weights in critic loss term. (#12394)
* WIP.

* Fix and LINT.
2020-11-25 11:28:46 -08:00
Sven Mika 592c161032 [RLlib] Issue 12118: LSTM prev-a/r should be separately configurable. Fix missing prev-a one-hot encoding. (#12397)
* WIP.

* Fix and LINT.
2020-11-25 11:27:46 -08:00
Sven Mika 841d93d366 [RLlib] Issue 12233 shared tf layers example not really shared (only works for tf1.x, not tf2.x). (#12399) 2020-11-25 11:27:19 -08:00
Sven Mika 95175a822f [RLlib] Issue 11974: Traj view API next-action (shift=+1) not working. (#12407)
* WIP.

* Fix and LINT.
2020-11-25 11:26:29 -08:00
Max Fitton 2e95552f0c [Dashboard] Defensive change to make sure we do not iterate over "None" in the case that workers is not present in node physical stats for a given node (#12358) 2020-11-25 11:06:45 -08:00
ZhuSenlin dc55f6ba3a skip gcs fault tolerance test for the time being when new scheduler is enabled (#12393)
Co-authored-by: senlin.zsl <senlin.zsl@antfin.com>
2020-11-25 10:40:47 -08:00
Gekho457 2b293832e7 [Docker][minor] Update path in base-deps Dockerfile (#12391)
* Update path in base-deps Dockerfile

* development Dockerfile
2020-11-25 10:30:10 -08:00
SangBin Cho 2e4e285ef0 [Object Spilling] Fusion small objects (#12087) 2020-11-25 10:13:32 -08:00
karstenddwx 09d5413f70 [RLlib] rollout batch, handle rewards that are None (unknown) in a multi-agent env (#11858) (#11911) 2020-11-25 13:39:22 +01:00
danuo c009c178f6 [RLlib] Closes #11924: Add support for custom/ray environments in rollouts.py for agents without workers (#11926)
* Closes #11924
Formerly, rollout.py would only load environments from gym (with
gym.make()), if an agent without workers is employed (such as ES or
ARS). This results in an error if a custom environment is used. This
PR adds the possibility to load environments from the ray registry,
while maintaining the support for gym environments.

* Update rllib/rollout.py

Co-authored-by: Sven Mika <sven@anyscale.io>
2020-11-25 08:43:17 +01:00
Tomasz Wrona 82852f0ed2 [RLlib] Add ResetOnExceptionWrapper with tests for unstable 3rd party envs (#12353) 2020-11-25 08:41:58 +01:00
Ian Rodney c5845c3a4e [docker] Docker stop on each node (#12357) 2020-11-24 23:15:53 -08:00
Hao Chen 20eb217c55 Fix a zsh compatibility issue in java/BUILD.bazel (#12199) 2020-11-25 14:49:49 +08:00
Barak Michener 4066056a0d [ray_client] Support calling functions from other functions and correct the tests (#12141)
* Add test mode and fix f calling g

* formatting

* remove unused functions

* fix tests -- which will be better in actor PR
2020-11-24 22:19:20 -08:00
Tao Wang 4dd0aa7822 [GCS]make thread number of gcs rpc server configurable (#12257) 2020-11-25 11:40:29 +08:00
Tao Wang 5d47d02f81 [GCS]add callback for RegisterSelf api, make it done first (#12252) 2020-11-25 11:36:44 +08:00
Tao Wang e025b9e788 [TEST]Move all WaitReady together (#12254) 2020-11-25 11:21:24 +08:00
Tao Wang 2af10c1b78 [GCS]Add new message ReportResourceUsage (#11848) 2020-11-25 11:18:26 +08:00
Tao Wang e1075c0a82 [GCS]Fill resource fields when re-report heartbeat after gcs restarted (#12097) 2020-11-25 11:07:02 +08:00
fangfengbin 1d909321c9 [PlacementGroup]Fix node manager release unused bundles bug (#12346) 2020-11-25 11:02:43 +08:00
fangfengbin 5934b20b96 [PlacementGroup]Fix destroy bundle resources bug (#12336)
* [PlacementGroup]Fix destroy bundle resources bug

* revert AddBundleLocations code change

* add comment

* fix review comments

Co-authored-by: 灵洵 <fengbin.ffb@antgroup.com>
2020-11-25 09:45:26 +08:00
Eric Liang 9f322db71d Add many_ppo long running test (#12364)
* add new test

* update

* update
2020-11-24 16:00:33 -08:00
Edward Oakes dae137b919 Don't allow 'optional' files in setup.py (#12359) 2020-11-24 17:41:58 -06:00
Sven Mika 4afaa46028 [RLlib] Increase the scope of RLlib's regression tests. (#12200) 2020-11-24 22:18:31 +01:00
Eric Liang 5895554555 [autoscaler] Raise node "start" deadline to 900s, make configurable (#12316) 2020-11-24 12:16:01 -08:00
Edward Oakes 4ada3e4c99 [serve] Incremental change towards async control loop for replica startup (#12281) 2020-11-24 13:06:08 -06:00
roireshef 888357d251 added address resolution fix for running in docker containers (#11944)
* added address resolution fix for running in docker containers

* added address resolution fix for running in docker containers (java)

* Update RayNativeRuntime.java

Co-authored-by: Eric Liang <ekhliang@gmail.com>
2020-11-24 10:34:56 -08:00