6915 Commits

Author SHA1 Message Date
fyrestone 4853aa96cb [Dashboard] Fix missing actor pid (#13229) 2021-01-13 16:45:12 +08:00
Barak Michener 0b22341bc9 [ray_client]: Wait for ready and retry on ray.connect() (#13376)
* [ray_client]: wait until connection ready

Change-Id: Ie443be60c33ab7d6da406b3dcaa57fbb7ba57dd6

* lint

Change-Id: I30f8e870bbd5f8859a9f11ae244e210f077cedd0

* docs and retry minimum

Change-Id: I43f5378322029267ddd69f518ce8206876e2129d
2021-01-13 00:19:15 -08:00
Sven Mika d49c3fae0b [RLlib] Trajectory View API: Atari framestacking. (#13315) 2021-01-13 08:53:34 +01:00
Eric Liang 912d0cbbf9 Enable Ray client server by default (#13350)
* update

* fix

* fix test

* update
2021-01-12 21:31:01 -08:00
Simon Mo 8e0a2f669b [Doc] Remove trailing whitespaces (#13390) 2021-01-12 20:35:38 -08:00
Tao Wang f587b9a50c Remove unimplemented GetAll method in actor info accessor (#13362) 2021-01-13 09:55:27 +08:00
SangBin Cho 0428537d0b [Object Spilling] Long running object spilling test (#13331)
* done.

* formatting.
2021-01-12 16:53:13 -08:00
Amog Kamsetty 4d83003992 trigger doc build for serve updates (#13373) 2021-01-12 13:08:55 -08:00
Ian Rodney 2e70743077 [Serve] Backend state unit tests (#13319) 2021-01-12 14:54:04 -06:00
Maltimore 3a3e4aed86 [RLlib] Add __len__() method to SampleBatch (#13371) 2021-01-12 20:15:23 +01:00
architkulkarni e560933f9c [Serve] Add dependency management support for driver not running in a conda env (#13269) 2021-01-12 09:57:15 -08:00
Kai Fricke 518427627b [tune] buffer trainable results (#13236)
* Working prototype

* Pass buffer length, fix tests

* Don't buffer per default

* Dispatch and process save in one go, added tests

* Fix tests

* Pass adaptive seconds to train_buffered, stop result processing after STOP decision

* Fix tests, add release test

* Update tests

* Added detailed logs for slow operations

* Update python/ray/tune/trial_runner.py

Co-authored-by: Richard Liaw <rliaw@berkeley.edu>

* Apply suggestions from code review

* Revert tests and go back to old tuning loop

* nit

Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2021-01-12 18:52:47 +01:00
Amog Kamsetty 9eebd090cf [Dependabot] [CI] Re-configure Dependabot and disable duplicate builds (#13359) 2021-01-12 09:28:58 -08:00
Kai Fricke 25f10a947a Revert "[RLlib] Make TFModelV2 behave more like TorchModelV2: Obsolete register_variables. Unify variable dicts. (#13339)" (#13361)
This reverts commit e2b2abb88b.
2021-01-12 12:33:57 +01:00
Dmitri Gekhtman 7166949194 [Kubernetes][Docs] GPU usage (#13325)
* gpu-note

* gpu-note

* More info

* lint?

* Update doc/source/cluster/kubernetes.rst

Co-authored-by: Richard Liaw <rliaw@berkeley.edu>

* Update doc/source/cluster/kubernetes.rst

Co-authored-by: Richard Liaw <rliaw@berkeley.edu>

* Update doc/source/cluster/kubernetes.rst

Co-authored-by: Richard Liaw <rliaw@berkeley.edu>

* Update doc/source/cluster/kubernetes.rst

Co-authored-by: Richard Liaw <rliaw@berkeley.edu>

* GKE->Kubernetes

Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2021-01-11 21:36:31 -08:00
Edwin Goh a5ddc27bab Fix typo in Tune Docs (Checkpointing) (#13348)
See issue #13299
2021-01-11 20:27:18 -08:00
Eric Liang 470fda190a Forgot overwrite parameter in Ray client internal kv 2021-01-11 17:50:06 -08:00
Amog Kamsetty 0452a3a435 [Tune] Rename MLFlow to MLflow (#13301) 2021-01-11 17:36:55 -08:00
Eric Liang de5bc24c60 Implement internal kv in ray client (#13344)
* kv internal

* fix
2021-01-11 14:54:52 -08:00
Eric Liang fbb9795374 [client] Report number of currently active clients on connect (#13326)
* wip

* update

* update

* reset worker

* fix conn

* fix

* disable pycodestyle
2021-01-11 14:53:12 -08:00
Sven Mika e2b2abb88b [RLlib] Make TFModelV2 behave more like TorchModelV2: Obsolete register_variables. Unify variable dicts. (#13339) 2021-01-11 22:42:30 +01:00
architkulkarni c43fa12e73 [Serve] Support Starlette streaming response (#13328) 2021-01-11 13:27:44 -08:00
ZhuSenlin c39658f368 fix removal of task dependencies (#13333)
Co-authored-by: senlin.zsl <senlin.zsl@antfin.com>
2021-01-11 09:55:48 -08:00
Edward Oakes 62e1ad3973 [serve] Cleanup backend state, move checkpointing and async goal logic inside (#13298) 2021-01-11 11:45:43 -06:00
Sven Mika 5d50d37f45 [RLlib] Issue 13330: No TF installed causes crash in ModelCatalog.get_action_shape() (#13332) 2021-01-11 13:19:46 +01:00
Edward Oakes 93006c2ba5 Use wait_for_condition to reduce flakiness in test_queue.py::test_custom_resources (#13210) 2021-01-10 19:32:59 -06:00
Barak Michener 6f0083ed10 add the method annotation and a comment explaining what's happening (#13306)
Change-Id: I848cc2f0beaed95340d9de7cca19a50c78d9da9a
2021-01-10 15:54:10 -08:00
Akash Patel 94a873fc4d remove empty extras streaming deps (#12933) 2021-01-10 12:09:27 -08:00
Kai Fricke d4b0a9fadf [tune] convert search spaces: parse spec before flattening (#12785)
* Parse spec before flattening

* flatten after parse

* Test for ValueError if grid search is passed to search algorithms
2021-01-09 18:21:49 +01:00
Sven Mika 9dd9f72111 [RLlib] Add more detailed Documentation on Model building API (#13261) 2021-01-09 12:38:29 +01:00
Michael Luo 67229bf350 [RLlib] SlateQ Documentation (#13266) 2021-01-09 11:21:51 +01:00
Edward Oakes d434ba6518 [serve] Clean up EndpointState interface, move checkpointing inside of EndpointState (#13215) 2021-01-08 22:36:19 -06:00
Philipp Moritz c5ae30d1d4 Do not give an error if both RAY_ADDRESS and address is specified on initialization (#13305)
* Finalize handling of RAY_ADDRESS

* lint
2021-01-08 18:31:32 -08:00
Barak Michener eb6f403b97 [ray_client]: first draft of documentation (#13216) 2021-01-08 15:38:36 -08:00
Ian Rodney f916549602 [Cancellation] Make Test Cancel Easier to Debug (#13243)
* first commit

* lint-fix
2021-01-08 14:52:43 -08:00
Alex Wu 6ca4fb1054 [Pull manager] Only pull once per retry period (#13245)
* .

* docs

* cleanup

* .

* .

* .

* .

Co-authored-by: Alex <alex@anyscale.com>
2021-01-08 14:51:11 -08:00
Edward Oakes 66daed99f5 Remove top-level ray.connect() and ray.disconnect() APIs (#13273) 2021-01-08 15:26:20 -06:00
dependabot[bot] 300a22d8f7 [tune](deps): Bump gluoncv from 0.9.0 to 0.9.1 in /python/requirements (#13287) 2021-01-08 11:42:58 -08:00
dependabot[bot] 3569b78237 [tune](deps): Bump mlflow from 1.13.0 to 1.13.1 in /python/requirements (#13286) 2021-01-08 11:42:18 -08:00
Sven Mika 6f342a2221 [RLlib] Preparatory PR for: Documentation on Model Building. (#13260) 2021-01-08 10:56:09 +01:00
Philipp Moritz a247c71e2e [ray_client] Add metadata to gRPC requests (#13167) 2021-01-07 23:58:15 -08:00
Hao Chen 77cd0d5a21 Fix a crash problem caused by GetActorHandle in ActorManager (#13164) 2021-01-08 12:11:08 +08:00
fyrestone a6d135a072 [Dashboard] Add GET /log_proxy API (#13165) 2021-01-08 11:45:07 +08:00
Tao Wang ab2229dcb7 [GCS] Remove old lightweight resource usage report code path (#13192) 2021-01-08 10:30:00 +08:00
Ian Rodney 4aef3d6836 [docker] Pull if image is not present (#13136) 2021-01-07 17:17:00 -08:00
Amog Kamsetty 0f5d36ce5e [Dependabot] Add Dependabot (#13278)
Co-authored-by: Ian <ian.rodney@gmail.com>
2021-01-07 16:33:02 -08:00
Amog Kamsetty 43f70faa25 [Tune] Pin Tune Dependencies (#13027)
Co-authored-by: Ian <ian.rodney@gmail.com>
2021-01-07 14:03:06 -08:00
Amog Kamsetty f68922d043 [Tune] Improve error message for Session Detection (#13255)
* Improve error message

* log once
2021-01-07 22:40:44 +01:00
Sven Mika a5b39ef8e2 [RLlib] Fix missing "info_batch" arg (None) in compute_actions calls. (#13237) 2021-01-07 21:25:02 +01:00
Simon Mo c32ad2fef5 [Release] Use ray-ml image for logn running test (#13267) 2021-01-07 10:31:46 -08:00