42 Commits

Author SHA1 Message Date
Edward Oakes 786f12edfd [serve] Serve client refactor (#10409) 2020-09-04 12:02:23 -05:00
Eric Liang 519354a39a [api] Initial API deprecations for Ray 1.0 (#10325) 2020-08-28 15:03:50 -07:00
SangBin Cho c00742f103 [Release] Fix release tests (#9733) 2020-07-28 10:44:06 -07:00
Simon Mo a078a21437 [Serve] Allow multiple HTTP servers. (#9523) 2020-07-24 12:41:20 -07:00
Ian Rodney b8fc259796 [serve] Rename to Controller (#9566) 2020-07-20 12:50:29 -07:00
Hao Chen d49dadf891 Change Python's ObjectID to ObjectRef (#9353) 2020-07-10 17:49:04 +08:00
Ian Rodney 9172f8c3a6 [core] Store Internal Config in GCS (#8921) 2020-07-08 11:22:08 -05:00
Max Fitton ad09aa985c Make Dashboard Port Configurable (#8999) 2020-06-19 16:26:22 -05:00
Ian Rodney 5208f8db12 [testing] Adding useful error messages for long_running_tests (#8960) 2020-06-17 18:32:35 -07:00
Stephanie Wang 19d44d4fa9 Use no_restart=False for ray.kill in Serve failure test (#8952) 2020-06-15 15:34:56 -07:00
SangBin Cho 890cb8cb70 Node failure test fix (#8882) 2020-06-10 16:21:27 -05:00
SangBin Cho 2b3fa35fbe [Serve] Serve long running test fix (#8864) 2020-06-09 12:17:18 -05:00
Edward Oakes f8d20d25bd [serve] Fix long running failure test (#8863) 2020-06-09 12:13:20 -05:00
krfricke e5b6566d28 Remove blocking flag from serve.init() (#8654) 2020-05-29 13:25:35 -07:00
Lingxuan Zuo e594524ed3 [GCS] global state query node info table from GCS. (#8498) 2020-05-28 16:39:13 +08:00
Edward Oakes 22cab930cd Retry actor failures in serve failure test (#8282) 2020-05-02 10:19:44 -05:00
Edward Oakes 019030cb4d Add long-running serve failure test (#8277) 2020-05-01 21:07:14 -05:00
Edward Oakes 13f718846d [serve] Always use internal KV store (#8270) 2020-05-01 14:18:18 -05:00
Edward Oakes 421b3c9d8b Fix serve long running test (#8268) 2020-05-01 11:54:27 -05:00
Simon Mo 1b1fe0cc5b Fix Serve long running test (#8223) 2020-04-29 09:32:39 -07:00
Eric Liang dd70720578 [rllib] Rename sample_batch_size => rollout_fragment_length (#7503)
* bulk rename

* deprecation warn

* update doc

* update fig

* line length

* rename

* make pytest compatible

* fix test

* fix sys

* rename

* wip

* fix more

* lint

* update svg

* comments

* lint

* fix use of batch steps
2020-03-14 12:05:04 -07:00
Stephanie Wang 7c174d0ffe Make the ref counting test more stressful (#7473) 2020-03-05 20:51:24 -08:00
Simon Mo 29b08ddc09 Improve release process from 0.8.2 (#7303) 2020-02-24 21:18:53 -08:00
Stephanie Wang 2c1f4fd82c [core] Add long running regression test for distributed ref counting and fix memory leak (#7302)
* Add long running test for serialized IDs and fix mem leak

* comment
2020-02-24 17:58:42 -08:00
Eric Liang 5df801605e Add ray.util package and move libraries from experimental (#7100) 2020-02-18 13:43:19 -08:00
Simon Mo bec92a8946 [Hotfix] Fix flake8 lint failing (#7118) 2020-02-10 19:57:21 -08:00
Simon Mo f6c09ff614 Add serve stress test (#7076) 2020-02-10 09:37:39 -08:00
Edward Oakes b750bd7fc9 Use 2xlarge instances in long running tests (#6802) 2020-01-15 19:47:59 -06:00
Sven 60d4d5e1aa Remove future imports (#6724)
* Remove all __future__ imports from RLlib.

* Remove (object) again from tf_run_builder.py::TFRunBuilder.

* Fix 2xLINT warnings.

* Fix broken appo_policy import (must be appo_tf_policy)

* Remove future imports from all other ray files (not just RLlib).

* Remove future import blocks that contain `unicode_literals` as well.
Revert appo_tf_policy.py to appo_policy.py (belongs to another PR).

* Add two empty lines before Schedule class.

* Put back __future__ imports into determine_tests_to_run.py. Fails otherwise on a py2/print related error.
2020-01-09 00:15:48 -08:00
Edward Oakes 032e8553c7 use numpy in long-running tests (#6448) 2019-12-11 17:53:30 -08:00
Philipp Moritz a454c815f1 Fix long running stress tests (#6374) 2019-12-05 18:29:41 -08:00
Eric Liang 53641f1f74 Move more unit tests to bazel (#6250)
* move more unit tests to bazel

* move to avoid conflict

* fix lint

* fix deps

* separate

* fix failing tests

* show tests

* ignore mismatch

* try combining bazel runs

* build lint

* remove tests from install

* fix test utils

* better config

* split up

* exclusive

* fix verbosity

* fix tests class

* cleanup

* remove flaky

* fix metrics test

* Update .travis.yml

* no retry flaky

* split up actor

* split basic test

* split up trial runner test

* split stress

* fix basic test

* fix tests

* switch to pytest runner for main

* make microbench not fail

* move load code to py3

* test is no longer package

* bazel to end
2019-11-24 11:43:34 -08:00
Eric Liang a101812b9f Replace --redis-address with --address in test, docs, tune, rllib (#5602)
* wip

* add tests and tune

* add ci

* test fix

* lint

* fix tests

* wip

* sugar dep
2019-09-01 16:53:02 -07:00
Philipp Moritz ccee77aafd fix node_failures.py (#5167) 2019-07-11 11:40:13 -07:00
Hersh Godse 89722ff003 [tune] Directional metrics for components (#4120) (#4915) 2019-06-02 22:13:40 -07:00
bjg2 77005d1814 [rllib] Make batch timeout for remote workers tunable (#4435) 2019-03-29 13:19:42 -07:00
William Ma 11580fb7dc Changes where actor resources are assigned (#4323) 2019-03-24 15:49:36 -07:00
William Ma f423909aec Temporary fix for many_actor_task.py (#4315) 2019-03-09 00:07:45 -08:00
Robert Nishihara fd2d8c2c06 Remove Jenkins backend tests and add new long running stress test. (#4288) 2019-03-08 15:29:39 -08:00
Robert Nishihara f151aa8723 Update long running stress tests and add actor death test. (#4275) 2019-03-06 14:26:45 -08:00
Eric Liang 6e3384a719 [rllib] Add three new long-running stress tests {APEX, IMPALA, PBT} (#4215) 2019-03-04 14:05:42 -08:00
Robert Nishihara 75504b9586 Add script for running infinitely long stress tests. (#4163)
Running `./ci/long_running_tests/start_workloads.sh` starts several workloads, each on its own EC2 instance.
- The workloads run forever.
- The workloads all simulate multiple nodes but use a single machine.
- You can get the tail of each workload by running `./ci/long_running_tests/check_workloads.sh`.
- You have to manually shut down the instances.

As discussed with @ericl @richardliaw, the idea here is to optimize for the debuggability of the tests. If one of them fails, you can ssh to the relevant instance and see all of the logs.
2019-02-27 14:33:06 -08:00
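
The workflow described in the earliest commit (#4163) could be driven from Python roughly as follows. This is a minimal sketch, not part of the repository: the helper name `run_workload_script` and its `dry_run` parameter are hypothetical, and only the two script paths come from the commit message. It assumes the scripts are invoked from the repository root and take no arguments.

```python
import subprocess

# Script paths quoted in the commit message for #4163.
SCRIPTS = {
    "start": "./ci/long_running_tests/start_workloads.sh",
    "check": "./ci/long_running_tests/check_workloads.sh",
}


def run_workload_script(action, dry_run=True):
    """Hypothetical helper: build (and optionally run) one of the scripts.

    With dry_run=True the command is only returned, never executed, so
    nothing is launched on EC2 by accident.
    """
    if action not in SCRIPTS:
        raise ValueError(f"unknown action: {action!r}")
    command = [SCRIPTS[action]]
    if dry_run:
        return command
    # Assumption: the caller's working directory is the repository root.
    subprocess.run(command, check=True)
    return command


if __name__ == "__main__":
    print(run_workload_script("start"))
    print(run_workload_script("check"))
```

Neither script shuts anything down. As the commit message notes, the instances have to be terminated manually.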