Commit Graph

2389 Commits

Author SHA1 Message Date
Edward Oakes 30ed20405a [autoscaler] Support creating services in k8s backend (#8659) 2020-05-29 15:19:21 -05:00
Simon Mo 6b04664645 [Serve] Add Tutorial for Batch Inference (#8490) 2020-05-29 09:55:47 -07:00
fangfengbin 35eeec5647 Add C++ global state for actor table (#8501)
* add global state actors

* fix code style

* fix GcsActorManagerTest bug

* rebase master

* add jni code

* add get checkpoint id code

* add debug code

* add debug code

* change log level

* fix compile bug

* return null in jni

* fix crash bug

* change import seq

Co-authored-by: 灵洵 <fengbin.ffb@antfin.com>
Co-authored-by: Hao Chen <chenh1024@gmail.com>
2020-05-29 21:10:42 +08:00
Sven Mika d483ed28ba [RLlib] Fix broken tune tests in master due to framework=auto errors. (#8672) 2020-05-29 11:55:47 +02:00
Edward Oakes c64b694560 Update RaySGD test to use ray.kill instead of __ray_kill__ (#8662) 2020-05-28 22:38:05 -05:00
Patrick Ames 76450c8d47 [autoscaler] Honor separate head and worker node subnet IDs (#8374) 2020-05-28 18:16:46 -07:00
Ian Rodney 99cc2e28b3 [Core] Fix test_cancel (#8644) 2020-05-28 09:40:37 -07:00
Hao Chen 08fee00bc8 Increase rayelt client connect timeout to fix test_debug_tools (#8605) 2020-05-28 20:57:30 +08:00
Lingxuan Zuo e594524ed3 [GCS] global state query node info table from GCS. (#8498) 2020-05-28 16:39:13 +08:00
Ujval Misra e958d261b6 Fix ray.available_resources bug (#8537) 2020-05-27 17:55:08 -07:00
Simon Mo 38399c9885 [Hotfix] [Serve] Disable Deployment Tutorial Test (#8641) 2020-05-27 10:40:40 -07:00
Bill Chambers fadd47e44e [docs] Ray Serve Documentation Overhaul (#8524) 2020-05-27 11:03:28 -05:00
fyrestone b0bb0584fb Fix fix_test_actor_method_metadata_cache (#8633) 2020-05-27 15:49:40 +08:00
mehrdadn 79a4eac48c Make more tests run on Windows (#8553) 2020-05-26 18:43:34 -05:00
Edward Oakes 137519e19d [serve] Remove start_server flag (#8620) 2020-05-26 14:34:18 -05:00
Amog Kamsetty ae2e1f0883 [Parallel Iterators] Batching + Pipelining optimizations (#7931)
* batching + get_shard pipelining

* duplicate fix

* formatting

* adding performance benchmark

* minor changes

* turn batching off by default
2020-05-26 00:37:57 -07:00
fyrestone f39760a4d3 Use uuid4() for actor creation function id hash (#8589) 2020-05-26 15:20:03 +08:00
fangfengbin 765d470c40 Add gcs object manager (#8298) 2020-05-25 17:21:35 +08:00
Edward Oakes 860eb6f13a Update named actor API (#8559) 2020-05-24 20:08:03 -05:00
Tao Wang 92c2e41dfd [GCS]profile info getting implementation based gcs service (#8536) 2020-05-24 22:23:01 +08:00
Luca Cappelletti 822de1b7f7 [Tune] Introduced preliminary random search to BayesOpt (#8541) 2020-05-23 12:20:43 -07:00
Kai Yang 2e5e789294 Allow enabling logging in core worker with empty log_dir (#8529) 2020-05-22 18:02:37 +08:00
Siyuan (Ryans) Zhuang 83a819572b Update the pickle5 revision to match the upstream candidate (#8493) 2020-05-21 18:21:37 -07:00
SangBin Cho aa1cbe8abc [Dashboard] Ray memory dashboard backend (#8461) 2020-05-21 12:22:28 -07:00
Eric Liang 9a83908c46 [rllib] Deprecate policy optimizers (#8345) 2020-05-21 10:16:18 -07:00
Hao Chen d27e6da1b2 Fix a lint issue (#8530) 2020-05-21 16:12:44 +08:00
Sven Mika 3a234ed9e3 [RLlib] Error: "Unknown trainable [some rllib algo name]" (#8525) 2020-05-21 08:59:32 +02:00
fangfengbin e261b4778e Adjust the state initialization sequence and put it after core worker google logging initialization (#8511) 2020-05-21 11:30:28 +08:00
Simon Mo ed2f434593 [Serve] Start Replicas in Parallel (#8433) 2020-05-20 19:46:03 -07:00
Edward Oakes a76434ccde Add ability to specify worker and driver ports (#8071) 2020-05-20 15:31:13 -05:00
mehrdadn ebf060d484 Make more tests run on Windows (#8446)
* Remove worker Wait() call due to SIGCHLD being ignored

* Port _pid_alive to Windows

* Show PID as well as TID in glog

* Update TensorFlow version for Python 3.8 on Windows

* Handle missing Pillow on Windows

* Work around dm-tree PermissionError on Windows

* Fix some lint errors on Windows with Python 3.8

* Simplify torch requirements

* Quiet git clean

* Handle finalizer issues

* Exit with the signal number

* Get rid of wget

* Fix some Windows compatibility issues with tests

Co-authored-by: Mehrdad <noreply@github.com>
2020-05-20 12:25:04 -07:00
Eric Liang aa7a58e92f [rllib] Support training intensity for dqn / apex (#8396) 2020-05-20 11:22:30 -07:00
Luca Cappelletti c9898eff24 [Tune] Added method to integrate previous analysis in BO (#8486) 2020-05-19 23:26:43 -07:00
Edward Oakes 85cb721f19 [serve] Fix worker replica leak (#8506) 2020-05-19 20:51:50 -05:00
Ian Rodney 1163ddbe45 Remove timeouts in test_cancel (#8272) 2020-05-19 12:35:16 -05:00
internetcoffeephone a73c488c74 Change tf_utils.py get_weights to evaluate all tensors at once rather than calling tensor.eval per-tensor. (#8491) 2020-05-18 22:06:03 -07:00
Luca Cappelletti 5b330de182 [Tune] Introduced patience to early stopping (#8484) 2020-05-18 13:12:16 -07:00
Luca Cappelletti d1ef70da16 [Tune] Added default values for utility kwargs (#8488) 2020-05-18 13:10:43 -07:00
Robert Nishihara 14aeb30473 [Serve] Require traffic weights to sum more closely to 1. (#8476) 2020-05-18 11:46:34 -07:00
Max Fitton 0fadc11437 [dashboard] Only show workers from the correct cluster (#8434) 2020-05-18 13:30:41 -05:00
Max Fitton 13231ba63b Rename redis-port to port and add default (#8406) 2020-05-18 13:25:34 -05:00
Robert Nishihara 2cff471d2c Don't print Redis connection warning in ray.init(). (#8475) 2020-05-18 11:19:13 -07:00
fangfengbin 9347a5d10c Add global state accessor of jobs (#8401) 2020-05-18 20:32:05 +08:00
Richard Liaw 87cbf2aedd [docs][tune] Make search algorithm, scheduler docs better! (#8179) 2020-05-17 12:19:44 -07:00
Luca Cappelletti 2ff26f13d2 [tune] Added EarlyStopping and relative test suite (#8459) 2020-05-17 12:18:59 -07:00
Joseph Lucas 42c9fa19d1 [autoscaler] Ray Up url-arg (#8279) 2020-05-17 12:18:00 -07:00
Edward Oakes 16f48078d9 Remove use of ObjectID transport flag (#7699) 2020-05-17 11:29:49 -05:00
Edward Oakes fb23bd6fc0 [serve] Optionally namespace serve clusters (#8447) 2020-05-17 00:14:42 -05:00
Richard Liaw 67c01455fe [tune] tune.track -> tune.report (#8388) 2020-05-16 12:55:08 -07:00
Stephanie Wang bd169749e0 Option to retry failed actor tasks (#8330)
* Python

* Consolidate state in the direct actor transport, set the caller starts at

* todo

* Remove unused

* Update and unit tests

* Doc

* Remove unused

* doc

* Remove debug

* Update src/ray/core_worker/transport/direct_actor_transport.h

Co-authored-by: Eric Liang <ekhliang@gmail.com>

* Update src/ray/core_worker/transport/direct_actor_transport.cc

Co-authored-by: Eric Liang <ekhliang@gmail.com>

* lint and fix build

* Update

* Fix build

* Fix tests

* Unit test for max_task_retries=0

* Fix java?

* Fix bad test

* Cross language fix

* fix java

Co-authored-by: Eric Liang <ekhliang@gmail.com>
2020-05-15 20:15:15 -07:00