Commit Graph

6812 Commits

Author SHA1 Message Date
Eric Liang f48da50e1c [rllib] observation function api for multi-agent (#8236) 2020-05-04 22:13:49 -07:00
Simon Mo 1480bf4295 [Serve] Improve batch size inconsistency error (#8315) 2020-05-04 20:32:12 -07:00
Simon Mo ca929671b6 [Serve] Simplify Validation (#8316) 2020-05-04 20:31:23 -07:00
Rüdiger Busche e93ec3134a Use kubectl delete pod in example (#8295)
Co-authored-by: rbusche <rbusche@inserve.de>
2020-05-04 21:39:30 -05:00
Rüdiger Busche 5dd9dbf74f Add ipython as dependency for autoscaler container (#8297)
Co-authored-by: rbusche <rbusche@inserve.de>
2020-05-04 21:22:38 -05:00
fangfengbin 14d03a0869 GCS adapts to task lease table pub sub (#8299) 2020-05-05 10:16:56 +08:00
ijrsvt cc7bd6650a [core] Enabling Remote Task Cancelation (#8225) 2020-05-04 15:24:22 -07:00
Sven Mika 6c2b9a4cfa [RLlib] Remove tf.py_function from all Schedule classes (not differentiable and causes other bugs in MA setups). (#8304)
2020-05-04 23:53:38 +02:00
Sven Mika a00144f746 [RLlib] Fix issue 8135 (DDPG inf actions when using [-inf,inf] action space). (#8302) 2020-05-04 22:27:30 +02:00
Stephanie Wang 8625e09067 Actor manager refactor and unit tests (#8224)
* parametrize test

* Regression test and logging

* Test no restart after actor deletion

* Unit tests

* Refactor to subscribe to and lookup from worker failure table

* Refactor ActorManager to remove dependencies

* Revert "Regression test and logging"

This reverts commit 835e1a9091b51ca8efb00392d4cc4a665145de24.

* Revert "parametrize test"

This reverts commit f31272082831ba1a494816dd5511d87b24eca4c9.

* Revert "Test no restart after actor deletion"

This reverts commit 114a83de14329aa6ab787c80cd5757cf074a9072.

* doc

* merge

* Revert "Refactor to subscribe to and lookup from worker failure table"

This reverts commit 6aa13a05178d0b9aa1db9dee5c978c911b74fa3a.

* Use actor ID instead of shared_ptr

* TODO and lint

* Update src/ray/gcs/gcs_server/gcs_actor_scheduler.h

Co-authored-by: Edward Oakes <ed.nmi.oakes@gmail.com>

* Fix build

* doc

* Build

Co-authored-by: Edward Oakes <ed.nmi.oakes@gmail.com>
2020-05-04 10:16:52 -07:00
Sven Mika b95e28faea [RLlib] APEX_DDPG (PyTorch) test case and docs. (#8288)
2020-05-04 09:36:27 +02:00
fangfengbin 5f351a05fe GCS adapts to task table pub sub (#8210) 2020-05-04 10:23:55 +08:00
Ion e24276e3c1 New scheduler int capacities (#8192)
* working version

* working version

* done

* done

* done

* addressing most of Philipp comments

* addressing most of Philipp comments
2020-05-03 18:47:30 -07:00
Sven Mika 2c71559b3a [RLlib] Fixes duplicate learning tests in travis.yaml. (#8293)
2020-05-03 20:27:29 +02:00
fangfengbin b7bbc3bc83 [GCS] GCS adapts to object table pub sub (#8180) 2020-05-03 21:44:33 +08:00
Sven Mika 166bb5d690 [RLlib] IMPALA PyTorch (#8287)
This PR adds an IMPALA PyTorch implementation.

- adds compilation tests for LSTM and w/o LSTM.
- adds learning test for CartPole.
2020-05-03 13:44:25 +02:00
Eric Liang 1228369a87 Remove "This tab is experimental" (#8281) 2020-05-02 22:41:28 -07:00
Simon Mo ec6631ae58 Pin redis-py version (#8290) 2020-05-02 22:09:02 -07:00
Stephanie Wang b4ef500675 [core] Disable GCS actor management (#8271) 2020-05-02 22:02:56 -07:00
SangBin Cho 0f54d5ab65 Async actor microbenchmark Script (#8275) 2020-05-02 21:51:00 -07:00
Richard Liaw 40dfb337bf [tune] Hotfix Ax breakage when fixing backwards-compat (#8285) 2020-05-02 20:42:50 -07:00
Xianyang Liu eda526c154 [SGD] Support multiple input model (#8246) 2020-05-02 16:49:09 -07:00
Maksim Smolin c2acb7ffe2 [SGD] Add imagenet example CI (#8150) 2020-05-02 16:48:35 -07:00
Edward Oakes 518ef4c0b3 [serve] Increase timeout waiting for HTTP server (#8286) 2020-05-02 16:55:13 -05:00
mehrdadn ff68fb8c7c Try to fix tests running all the time (#8280)
Co-authored-by: Mehrdad <noreply@github.com>
2020-05-02 15:37:52 -05:00
Edward Oakes 22cab930cd Retry actor failures in serve failure test (#8282) 2020-05-02 10:19:44 -05:00
Edward Oakes 8d3236f1d0 Lower test_utils.wait_for_condition default timeout to 30s (#8283) 2020-05-02 10:19:00 -05:00
Sven Mika 76e1a4df9e Fix TD3 torch via GaussianNoise torch bug. (#8276) 2020-05-02 08:12:21 +02:00
Edward Oakes 019030cb4d Add long-running serve failure test (#8277) 2020-05-01 21:07:14 -05:00
Edward Oakes d4e64709ba Shorten test_joblib (#8273) 2020-05-01 17:11:32 -05:00
mehrdadn bf074073e7 Deploy Windows wheels to Amazon S3 (#8237)
* Deploy to Amazon S3

* Install specifically requested Python version

Co-authored-by: Mehrdad <noreply@github.com>
2020-05-01 14:08:57 -07:00
Sven Mika 42991d723f [RLlib] rllib/examples folder restructuring (#8250)
Cleans up the rllib/examples folder by moving all example Envs into rllib/examples/env (so they can be used by other scripts and tests as well).
2020-05-01 22:59:34 +02:00
Edward Oakes 13f718846d [serve] Always use internal KV store (#8270) 2020-05-01 14:18:18 -05:00
Richard Liaw 07daff8794 [tune] Avoid breakage - soft deprecation warning for search algs (#8258) 2020-05-01 10:36:43 -07:00
Edward Oakes 3aec683f61 Avoid fate sharing with owner for detached actors (#8267) 2020-05-01 11:58:47 -05:00
Edward Oakes 63bc7dc522 service -> endpoint in router (#8269) 2020-05-01 11:55:34 -05:00
Edward Oakes 421b3c9d8b Fix serve long running test (#8268) 2020-05-01 11:54:27 -05:00
Eric Liang 2a0ad0b8ce [rllib] [hotfix] Remove assert that trips on pytorch multiagent (#8241) 2020-05-01 06:32:54 +02:00
Edward Oakes 6373c70661 [serve] Refactor BackendConfig (#8202) 2020-04-30 22:31:07 -05:00
Edward Oakes 95d187e556 [serve] Add delete_endpoint call (#8256) 2020-04-30 20:59:07 -05:00
Edward Oakes 484f68765c Fix resource_ids_ data race (#8253) 2020-04-30 18:55:54 -05:00
Edward Oakes 43be73e4cf [serve] Add delete_backend call (#8252) 2020-04-30 13:10:39 -05:00
Sven Mika c593fb09b7 [RLlib] Remove all f-strings to keep py3.5 compatibility. 2020-04-30 11:10:16 -07:00
Sven Mika eea75ac623 [RLlib] Beta distribution. (#8229) 2020-04-30 11:09:33 -07:00
Sven Mika b23b6addfc [RLlib] Stabilize Pendulum-v0 regression test cases. (#8232)
Stabilize Pendulum regression test cases.
2020-04-30 15:48:11 +02:00
Richard Liaw 05df80afad Extend timeout for test_tune_server (#8233) 2020-04-30 08:39:46 -05:00
Eric Liang baadbdf8d4 [rllib] Execute PPO using training workflow (#8206)
* wip

* add kl

* kl

* works now

* doc update

* reorg

* add ddppo

* add stats

* fix fetch

* comment

* fix learner stat regression

* test fixes

* fix test
2020-04-30 01:18:09 -07:00
Richard Liaw 35eac2671e [sgd] Resource limit lift for GPU test (#8238) 2020-04-30 00:24:48 -07:00
mehrdadn 254b1ec370 Set up testing and wheels for Windows on GitHub Actions (#8131)
* Move some Java tests into ci.sh

* Move C++ worker tests into ci.sh

* Define run()

* Prepare to move Python tests into ci.sh

* Fix issues in install-dependencies.sh

* Reload environment for GitHub Actions

* Move wheels to ci.sh and fix related issues

* Don't bypass failures in install-ray.sh anymore

* Make CI a little quieter

* Move linting into ci.sh

* Add vitals test right after build

* Fix os.uname() unavailability on Windows

Co-authored-by: Mehrdad <noreply@github.com>
2020-04-29 21:19:02 -07:00
Eric Liang ae54e0dc0a [rllib] Copy plasma memory before adding data to replay buffer 2020-04-29 14:17:54 -07:00