Commit Graph

4601 Commits

Author SHA1 Message Date
Sven Mika c7cb2f5416 [RLlib] IMPALA PyTorch GPU fixes (#8397) 2020-05-11 22:03:27 +02:00
Edward Oakes fdf0e5ceb1 Update README to say that python 2 is deprecated (#8404) 2020-05-11 14:49:49 -05:00
Jason McGhee 24ced808cd Fix config key in docs for using PyTorch (#8300)
Docs improperly suggest using "torch" when the actual flag is called "use_pytorch"
2020-05-11 12:41:21 -07:00
Stephanie Wang f97f466cec Fix test (#8391) 2020-05-11 10:15:53 -07:00
mehrdadn 66b3edccb9 Prefer built-in system compilers over Clang download (#8355)
Co-authored-by: Mehrdad <noreply@github.com>
2020-05-11 11:53:35 -05:00
fangfengbin 515afa6809 Fix AsyncGetAll miss override bug (#8402)
Co-authored-by: 灵洵 <fengbin.ffb@antfin.com>
2020-05-11 11:08:16 -05:00
fangfengbin 8d0c1b5e06 GCS adapts to actor table pub sub (#8347) 2020-05-11 13:53:53 +08:00
Simon Mo 501b936114 [Serve] Improve error message when result is not a list (#8378) 2020-05-10 17:18:06 -07:00
Stephanie Wang 3a25f5f5b4 Clean up actor state from the GCS (#8261)
* parametrize test

* Regression test and logging

* Test no restart after actor deletion

* Unit tests

* Refactor to subscribe to and lookup from worker failure table

* Refactor ActorManager to remove dependencies

* Revert "Regression test and logging"

This reverts commit 835e1a9091b51ca8efb00392d4cc4a665145de24.

* Revert "parametrize test"

This reverts commit f31272082831ba1a494816dd5511d87b24eca4c9.

* Revert "Test no restart after actor deletion"

This reverts commit 114a83de14329aa6ab787c80cd5757cf074a9072.

* doc

* merge

* Revert "Refactor to subscribe to and lookup from worker failure table"

This reverts commit 6aa13a05178d0b9aa1db9dee5c978c911b74fa3a.

* Revert "Revert "Test no restart after actor deletion""

This reverts commit 1bd92d09172aa8ab42632551cf9c56463f9598fe.

* Revert "Revert "parametrize test""

This reverts commit 639ba4d3b02167fb2b05e9878f9aa600bcec95b3.

* Revert "Revert "Regression test and logging""

This reverts commit f18b5f0db699a23cbccde32789e3639425e99ca4.

* Clean up actors that have gone out of scope

* Use actor ID instead of shared_ptr

* Clean up actors owned by dead workers

* Use actor ID instead of shared_ptr

* TODO and lint

* Fix unit tests

* Add unit tests for supervision and docs

* xx

* Fix tests

* Fix tests

* fix build
2020-05-09 18:43:49 -07:00
Thomas Lecat 4421f3a000 [tune] Close loggers after updating trial (#8307) (#8366) 2020-05-09 13:26:59 -07:00
Edward Oakes 2677b71003 Implement named actors using the GCS service (#8328) 2020-05-09 08:58:10 -05:00
Hao Chen 93138e617a Fix a bad usage of std::move (#8364) 2020-05-09 14:24:24 +08:00
Eric Liang 1126fe4d23 [tune] Add UUID back to trial names (#8377) 2020-05-08 20:20:36 -07:00
fangfengbin 7fec602f2e GCS adapts to node resource table pub sub (#8305) 2020-05-09 10:31:35 +08:00
A Kharitonov 304e31b7e5 Fixed: contrib/MADDPG MADDPGTFPolicy missing self.config assignment (#8343) 2020-05-08 12:05:06 -07:00
Sven Mika 754290daad [RLlib] Add light-weight Trainer.compute_action() tests for all Algos. (#8356) 2020-05-08 16:31:31 +02:00
Sven Mika d946f58fd0 LINT fixes. (#8370) 2020-05-08 16:24:20 +02:00
gehring 7f14fb577d [RLlib] Added TransformerXL and "stabilized for RL" variant, GTrXL (#6470) 2020-05-08 14:10:23 +02:00
Eric Liang 2c599dbf05 [rllib] Port QMIX, MADDPG to new execution API (#8344) 2020-05-07 23:41:10 -07:00
Eric Liang 9f04a65922 [rllib] Add PPO+DQN two trainer multiagent workflow example (#8334) 2020-05-07 23:40:29 -07:00
Sven Mika d7eaacb5fe [RLlib] Issue 8319 DDPG (MA or num_envs_per_worker > 1) broken. (#8324) 2020-05-08 08:26:32 +02:00
Sven Mika 5f278c6411 [RLlib] Examples folder restructuring (models) part 1 (#8353) 2020-05-08 08:20:18 +02:00
Eric Liang 413db0902d Trigger global GC when resources may be occupied by deleted actors 2020-05-07 14:57:21 -07:00
Edward Oakes f2f118df9e [serve] Clear serve cluster state between tests. (#8357) 2020-05-07 16:45:20 -05:00
Eric Liang 30db920787 [rllib] Fix centralized critic example to use right policy (#8341)
* update

* update
2020-05-07 10:47:55 -07:00
Philipp Moritz 325aec81bd Hide aliased autoscaler commands (#8348) 2020-05-07 10:17:59 -07:00
Sven Mika 2b0817cbd3 [RLlib] Retry pip installs (after waiting n seconds) in install-dependencies.sh (#8354) 2020-05-07 17:39:35 +02:00
fangfengbin dd3c050168 GCS adapts to batch heartbeat table pub sub (#8346) 2020-05-07 20:33:36 +08:00
fangfengbin 620ea94873 Fix node manager miss object info bug (#8337) 2020-05-07 20:16:42 +08:00
Eric Liang bc8b606ad7 [rllib] All test suites show up as RLLIB_TESTING=1 only. 2020-05-06 23:11:13 -07:00
Simon Mo c5a5a5de89 [Serve] Refactor Metric System: Counter + Measure Support (#8114) 2020-05-06 17:44:02 -07:00
Eric Liang 1f312debbe Document all ray commands. (#8340) 2020-05-06 16:49:37 -07:00
SangBin Cho e631827a9f [Core] Show_webui segfault fix. (#8323) 2020-05-06 11:45:07 -05:00
Alex Wu 04813c2ef5 [Parallel Iterator] Foreach concur (#8140) 2020-05-06 10:00:01 -05:00
Thomas Desrosiers ec9357b486 [autoscaler] Fix filesystem permission race conditions (#8327) 2020-05-05 17:22:03 -07:00
Eric Liang b14cc16616 [rllib] Enable functional execution workflow API by default (#8221) 2020-05-05 12:36:42 -07:00
mehrdadn 4bdef78e2e Various CI fixes and cleanup (#8289) 2020-05-05 10:47:49 -07:00
fangfengbin 97430b2d0f GCS adapts to node table pub sub (#8209) 2020-05-05 18:34:41 +08:00
Eric Liang ee0eb44a32 Rename async_queue_depth -> num_async (#8207)
* rename

* lint
2020-05-05 01:38:10 -07:00
Eric Liang f48da50e1c [rllib] observation function api for multi-agent (#8236) 2020-05-04 22:13:49 -07:00
Simon Mo 1480bf4295 [Serve] Improve batch size inconsistency error (#8315) 2020-05-04 20:32:12 -07:00
Simon Mo ca929671b6 [Serve] Simplify Validation (#8316) 2020-05-04 20:31:23 -07:00
Rüdiger Busche e93ec3134a Use kubectl delete pod in example (#8295)
Co-authored-by: rbusche <rbusche@inserve.de>
2020-05-04 21:39:30 -05:00
Rüdiger Busche 5dd9dbf74f Add ipython as dependency for autoscaler container (#8297)
Co-authored-by: rbusche <rbusche@inserve.de>
2020-05-04 21:22:38 -05:00
fangfengbin 14d03a0869 GCS adapts to task lease table pub sub (#8299) 2020-05-05 10:16:56 +08:00
ijrsvt cc7bd6650a [core] Enabling Remote Task Cancelation (#8225) 2020-05-04 15:24:22 -07:00
Sven Mika 6c2b9a4cfa [RLlib] Remove tf.py_function from all Schedule classes (not differentiable and causes other bugs in MA setups). (#8304)
2020-05-04 23:53:38 +02:00
Sven Mika a00144f746 [RLlib] Fix issue 8135 (DDPG inf actions when using [-inf,inf] action space). (#8302) 2020-05-04 22:27:30 +02:00
Stephanie Wang 8625e09067 Actor manager refactor and unit tests (#8224)
* parametrize test

* Regression test and logging

* Test no restart after actor deletion

* Unit tests

* Refactor to subscribe to and lookup from worker failure table

* Refactor ActorManager to remove dependencies

* Revert "Regression test and logging"

This reverts commit 835e1a9091b51ca8efb00392d4cc4a665145de24.

* Revert "parametrize test"

This reverts commit f31272082831ba1a494816dd5511d87b24eca4c9.

* Revert "Test no restart after actor deletion"

This reverts commit 114a83de14329aa6ab787c80cd5757cf074a9072.

* doc

* merge

* Revert "Refactor to subscribe to and lookup from worker failure table"

This reverts commit 6aa13a05178d0b9aa1db9dee5c978c911b74fa3a.

* Use actor ID instead of shared_ptr

* TODO and lint

* Update src/ray/gcs/gcs_server/gcs_actor_scheduler.h

Co-authored-by: Edward Oakes <ed.nmi.oakes@gmail.com>

* Fix build

* doc

* Build

Co-authored-by: Edward Oakes <ed.nmi.oakes@gmail.com>
2020-05-04 10:16:52 -07:00
Sven Mika b95e28faea [RLlib] APEX_DDPG (PyTorch) test case and docs. (#8288)
2020-05-04 09:36:27 +02:00