6812 Commits

Author SHA1 Message Date
Philipp Moritz 295099b863 Fix release docs (#4225) 2019-03-02 22:01:43 -08:00
Philipp Moritz fbdd5da9c1 Fix application stress tests (#4228)
Fixes https://github.com/ray-project/ray/issues/4227
2019-03-02 21:57:27 -08:00
Richard Liaw a27cb225b6 Modularize Tune tests from multi-node tests (#4204) 2019-03-02 19:21:08 -08:00
Philipp Moritz 180414710e Make sure Bazel generated files get overwritten (#4205) 2019-03-02 13:38:37 -08:00
Robert Nishihara 4b89eebfc7 Move test folders under rllib/tune from test -> tests. (#4214) 2019-03-02 13:37:16 -08:00
Yuhong Guo 6f46edca51 Skip dead nodes to avoid connection timeout. (#4154) 2019-03-02 13:11:19 -08:00
Eric Liang 9950f63e8c Send task error instead of raw exception for signal (#4150) 2019-03-01 23:59:29 -08:00
Robert Nishihara c4aa90314d Add script for shutting down tests. (#4203) 2019-03-01 19:56:30 -08:00
Robert Nishihara f21e6a2cff Update documentation regarding UI and timeline. (#4189) 2019-03-01 19:54:33 -08:00
bjg2 962b17f567 [wingman -> rllib] IMPALA MultiDiscrete changes (#3967) 2019-03-01 19:47:06 -08:00
Antoine Galataud 8288deb92d Add multi agent support in rollout.py (#4114) 2019-03-01 19:45:39 -08:00
Hao Chen 48f6cd3e5d Release GIL in prepare_actor_checkpoint (#4208) 2019-03-01 19:43:28 -08:00
Hao Chen 6f1a29ad3f Consolidate CI Python tests and fix bug about multiple ray.init (#4195) 2019-03-01 14:38:28 -08:00
bjg2 9c48cc27aa [wingman -> rllib] Removed remote evaluators assert (#4165) 2019-03-01 13:27:27 -08:00
Eric Liang b5799b5286 [rllib] Set PPO observation filter to NoFilter by default (#4191) 2019-03-01 13:19:33 -08:00
adoda 11a28834fa [tune] Reduce the times for flushing json object to file (#4198)
## What do these changes do?

When JsonLogger writes a single result, it calls `flush` many times, which can be slow when writing to a remote distributed filesystem.

## Related issue number
#4197 
2019-03-01 02:15:48 -08:00
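The flush overhead described in the commit above can be illustrated with a minimal sketch. It is not the code from #4198: `CountingStream`, `write_result_naive`, and `write_result_batched` are hypothetical names, and the assumption is that the cost comes from flushing after every small chunk of serialized JSON rather than once per result.

```python
import io
import json


class CountingStream(io.StringIO):
    """In-memory text stream that counts flush() calls (illustration only)."""

    def __init__(self):
        super().__init__()
        self.flush_count = 0

    def flush(self):
        self.flush_count += 1
        super().flush()


def write_result_naive(stream, result):
    # Assumed problem pattern: flush after every chunk the encoder emits.
    for chunk in json.JSONEncoder().iterencode(result):
        stream.write(chunk)
        stream.flush()
    stream.write("\n")
    stream.flush()


def write_result_batched(stream, result):
    # Serialize the whole result first, then write and flush exactly once.
    stream.write(json.dumps(result) + "\n")
    stream.flush()
```

Both functions produce the same line of JSON; only the number of `flush` calls differs, which is the part that matters when each flush is a round trip to a remote filesystem.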
Hao Chen 14ff402d70 Make ray stop command also kill Java workers (#4179) 2019-03-01 11:05:19 +08:00
Robert Nishihara d9bcaa20b5 Turn UI off by default. (#4188) 2019-02-28 17:29:52 -08:00
Richard Liaw c695402dc3 [tune] Introduce ability to turn off default logging. (#4104) 2019-02-28 17:02:41 -08:00
Eric Liang b809ef0107 [rllib] Silent tests (#4151) 2019-02-28 16:32:22 -08:00
Ion 88e14feb53 Reset signal counters when a task finishes (#4173) 2019-02-28 15:15:03 -08:00
Philipp Moritz 4dc683d39e Use latest arrow wheels (#4182) 2019-02-28 12:17:32 -08:00
Ion 44afcf4fa8 [doc] Document experimental signal API. (#4019)
* [doc] Document signal API.

* minor

* resolve conflicts
2019-02-28 11:05:28 -08:00
Robert Nishihara 9c5fdbb63c Release gil when doing ray.wait. (#4190) 2019-02-28 00:32:07 -08:00
Hao Chen 484708d44d Fix JNI throwing exception (#4178) 2019-02-28 15:11:25 +08:00
Robert Nishihara 387c98cf01 Make sure dashboard is packaged with wheels. (#4175) 2019-02-27 18:36:49 -08:00
Ion 7395c86a50 A few fixes in receive() signal. (#4142) 2019-02-27 18:00:59 -08:00
Philipp Moritz 9ca9691cdc Fix mnist sgd jenkins tests on master (#4168) 2019-02-27 16:02:18 -08:00
Robert Nishihara 75504b9586 Add script for running infinitely long stress tests. (#4163)
Running `./ci/long_running_tests/start_workloads.sh` starts several workloads, each on its own EC2 instance.
- The workloads run forever.
- The workloads all simulate multiple nodes but use a single machine.
- You can get the tail of each workload by running `./ci/long_running_tests/check_workloads.sh`.
- You have to manually shut down the instances.

As discussed with @ericl @richardliaw, the idea here is to optimize for the debuggability of the tests. If one of them fails, you can ssh to the relevant instance and see all of the logs.
2019-02-27 14:33:06 -08:00
Yuhong Guo 41b81af11b Downgrade six to 1.0.0 (#4180) 2019-02-27 13:05:25 -08:00
Yuhong Guo 0a11b27971 Fix the case of applying the decorator directly to a raw class and add a test case (#4177) 2019-02-28 00:09:42 +08:00
Wang Qing db5c3b22b7 Fix the issue about starting cross-lang cluster (#4176) 2019-02-27 20:11:58 +08:00
Richard Liaw 5bfcfa8ec8 [autoscaler] Fix Submit (#4174) 2019-02-27 00:02:50 -08:00
Robert Nishihara 641f703879 Update installation instructions to include bazel and remove outdated… (#4171) 2019-02-26 23:07:43 -08:00
Hao Chen d583edb07c Skip test_multithreading in Python 2 (#4107) 2019-02-27 14:06:12 +08:00
Adi Zimmerman 5cf388f29d [tune] Support RESTful API for the Web Server (#4080)
Change the client/server API to a RESTful design. This includes resource modeling, model URIs, and correct HTTP methods.
2019-02-26 21:56:02 -08:00
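The three ingredients named in the commit above (resource modeling, resource URIs, and HTTP methods) can be sketched as a small routing table. This is only an illustration under stated assumptions: the `/trials` resource, the handler names, and the `dispatch` helper are hypothetical and are not taken from the Tune web server.

```python
import re

# Hypothetical in-memory resource collection, keyed by trial id.
TRIALS = {"1": {"id": "1", "status": "RUNNING"}}


def list_trials():
    return 200, list(TRIALS.values())


def get_trial(trial_id):
    if trial_id not in TRIALS:
        return 404, None
    return 200, TRIALS[trial_id]


def stop_trial(trial_id):
    if trial_id not in TRIALS:
        return 404, None
    return 200, TRIALS.pop(trial_id)


# Each route pairs an HTTP method with a resource URI pattern.
ROUTES = [
    ("GET", re.compile(r"^/trials$"), list_trials),
    ("GET", re.compile(r"^/trials/(?P<trial_id>[^/]+)$"), get_trial),
    ("DELETE", re.compile(r"^/trials/(?P<trial_id>[^/]+)$"), stop_trial),
]


def dispatch(method, path):
    """Return (status, body) for a request, using the routing table."""
    path_matched = False
    for route_method, pattern, handler in ROUTES:
        match = pattern.match(path)
        if match is None:
            continue
        path_matched = True
        if route_method == method:
            return handler(**match.groupdict())
    # Known URI with an unsupported method is 405; unknown URI is 404.
    return (405 if path_matched else 404), None
```

The point of the design is that the URI identifies the resource while the method identifies the action, so `GET /trials/1` and `DELETE /trials/1` share one URI instead of needing separate command-style endpoints.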
Kristian Hartikainen 33663bef94 Ignore bazel and pyarrow files (#4172) 2019-02-26 17:27:36 -08:00
justinwyang 19b8793b6a Updated test script paths in documentation (#4170) 2019-02-26 16:14:55 -08:00
Richard Liaw f7450dbdd7 [tests] Stress tests for Jenkins (#3789)
Stress testing for Jenkins.

TODO:
 - [x] Enable a common keypair for autoscaling 
 - [x] Add automatic timeouts?
 - [x] Switch out key pair one last time before merge
2019-02-26 14:24:37 -08:00
John Liagouris 89ce4c56aa Initial Skeleton for Streaming API (#4126) 2019-02-26 12:15:08 -08:00
Hao Chen 62055cc01c Clean up duplicated code of Cython ID types (#4162) 2019-02-26 16:19:12 +08:00
Eric Liang 60dbc771a2 Revert "[autoscaler] Fix redirects, fix submit (#4085)" (#4158)
This reverts commit acf4d53b55.
2019-02-25 17:00:59 -08:00
Eric Liang 3896b726dd Dynamically adjust redis memory usage (#4152)
* f

* Update services.py
2019-02-25 16:21:37 -08:00
Hao Chen 49dc85e54b Fix wrong ID type in prepare_checkpoint (#4124)
* Fix wrong ID type in prepare_checkpoint

* fix

* fix eq
2019-02-25 11:53:09 -08:00
Kristian Hartikainen 524e69a82d [autoscaler] Change the get behavior of node providers' _get_node (#4132)
* Change the get behavior of GCPNodeProvider._get_node

* Add lock around the GCPNodeProvider._get_node call

* rename nodes

* lint

* Update GCPNodeProvider._get_node to match aws implementation

* assert

* log

* log highest heartbeats

* rename

* bringup to connected

* prune heartbeat times

* fix bringup
2019-02-24 18:43:35 -08:00
Eric Liang d9da183c7d [rllib] Custom supervised loss API (#4083) 2019-02-24 15:36:13 -08:00
Robert Nishihara 7b04ed059e Move TensorFlowVariables to ray.experimental.tf_utils. (#4145) 2019-02-24 14:26:46 -08:00
Philipp Moritz 615d5516d1 Compile valgrind tests with Bazel (#4144) 2019-02-24 00:00:49 -08:00
Eric Liang 05d96ce81b [rllib] Raise an error if multi-agent envs terminate without a last observation for agents (#4139)
* fix it

* lint

* Update rllib-training.rst
2019-02-23 21:23:40 -08:00
Robert Nishihara 688a0d17e6 Kill dashboard and reporter in ray stop. (#4116) 2019-02-23 12:08:39 -08:00