Commit Graph

6879 Commits

Author SHA1 Message Date
Edward Oakes 66daed99f5 Remove top-level ray.connect() and ray.disconnect() APIs (#13273) 2021-01-08 15:26:20 -06:00
dependabot[bot] 300a22d8f7 [tune](deps): Bump gluoncv from 0.9.0 to 0.9.1 in /python/requirements (#13287) 2021-01-08 11:42:58 -08:00
dependabot[bot] 3569b78237 [tune](deps): Bump mlflow from 1.13.0 to 1.13.1 in /python/requirements (#13286) 2021-01-08 11:42:18 -08:00
Sven Mika 6f342a2221 [RLlib] Preparatory PR for: Documentation on Model Building. (#13260) 2021-01-08 10:56:09 +01:00
Philipp Moritz a247c71e2e [ray_client] Add metadata to gRPC requests (#13167) 2021-01-07 23:58:15 -08:00
Hao Chen 77cd0d5a21 Fix a crash problem caused by GetActorHandle in ActorManager (#13164) 2021-01-08 12:11:08 +08:00
fyrestone a6d135a072 [Dashboard] Add GET /log_proxy API (#13165) 2021-01-08 11:45:07 +08:00
Tao Wang ab2229dcb7 [GCS] Remove old lightweight resource usage report code path (#13192) 2021-01-08 10:30:00 +08:00
Ian Rodney 4aef3d6836 [docker] Pull if image is not present (#13136) 2021-01-07 17:17:00 -08:00
Amog Kamsetty 0f5d36ce5e [Dependabot] Add Dependabot (#13278)
Co-authored-by: Ian <ian.rodney@gmail.com>
2021-01-07 16:33:02 -08:00
Amog Kamsetty 43f70faa25 [Tune] Pin Tune Dependencies (#13027)
Co-authored-by: Ian <ian.rodney@gmail.com>
2021-01-07 14:03:06 -08:00
Amog Kamsetty f68922d043 [Tune] Improve error message for Session Detection (#13255)
* Improve error message

* log once
2021-01-07 22:40:44 +01:00
Sven Mika a5b39ef8e2 [RLlib] Fix missing "info_batch" arg (None) in compute_actions calls. (#13237) 2021-01-07 21:25:02 +01:00
Simon Mo c32ad2fef5 [Release] Use ray-ml image for long running test (#13267) 2021-01-07 10:31:46 -08:00
Max Fitton 5094734205 Update autoscaler-cluster yaml files for release tests (#13114) 2021-01-07 11:44:57 -06:00
Simon Mo 01dcb993c7 [Serve] Rescale Serve's Long Running Test to Cluster Mode (#13247)
Now that `HeadOnly` is the new default HTTP location, we can
re-enable the long running tests to use local multi-clusters.
(Also fixed the controller's API calls to match the current API; we
should have caught these, and I will open issues for this.)
2021-01-07 08:57:24 -08:00
Sven Mika bcaff63909 [RLlib] SquashedGaussians should throw error when entropy or kl are called. (#13126) 2021-01-07 15:07:35 +01:00
Tao Wang 82c54c67ee Publish job/worker info with Hex format instead of Binary (#13235) 2021-01-07 20:31:58 +08:00
fangfengbin 3669c02821 [GCS]Add gcs actor schedule strategy (#13156) 2021-01-07 15:44:33 +08:00
Philipp Moritz 9872fc1801 Start ray client server with 'ray start' (#13217) 2021-01-06 21:04:14 -08:00
fangfengbin 9ae5bba7cf [GCS]Fix gcs table storage GetAll and GetByJobId api bug (#13195) 2021-01-07 10:37:00 +08:00
Siyuan (Ryans) Zhuang dde49b8d48 [Serialization] Fix cloudpickle (#13242) 2021-01-06 17:21:17 -08:00
Siyuan (Ryans) Zhuang 02ae6c5a9a [Core] Fix incorrect comment (#13228) 2021-01-06 11:37:29 -08:00
Max Fitton 0d61ea9b06 [Release] Add 1.1.0 release test logs (#13054)
* Add microbenchmark to release logs

* check in many_tasks stress test result

* Add results of placement group stress test for 1.1.0

* Add result for test_dead_actors test and correct the name of test_many_tasks.txt

* Add rllib regression test result

* Add pytorch test results for rllib

* remove extraneous log entries
2021-01-06 11:03:16 -08:00
Lingxuan Zuo 01d4638b49 [Log] fix spdlog init race (#12973)
* fix spdlog init race

* use global logger

* refine logger name and constructor
2021-01-06 11:02:54 -08:00
dHannasch 695833082d [Redis] Note that each Redis Connect retry takes two minutes (#12183)
* Slightly alter error message so it's the same in both cases.

* Each retry takes about two minutes.
2021-01-06 11:00:58 -08:00
Kai Fricke 97211a6170 [Tune] Fix tune serve integration example (#13233) 2021-01-06 17:02:04 +01:00
SangBin Cho 32dc5676b4 [Metrics] Record per node and raylet cpu / mem usage (#12982)
* Record per node and raylet cpu / mem usage

* Add comments.

* Addressed code review.
2021-01-05 21:57:21 -08:00
fangfengbin 779b3876f6 [GCS]Fix TestActorSubscribeAll bug (#13193) 2021-01-06 13:52:39 +08:00
fangfengbin dd14e5a3b3 [BugFix][GCS]Fix gcs_actor_manager_test multithreading bug (#13158) 2021-01-06 10:47:06 +08:00
Ian Rodney 92963800f6 [tests] Fix Autoscaler Test failure on Windows (#13211)
* skip create_or_update tests

* Update python/ray/tests/test_autoscaler.py

Co-authored-by: Ameer Haj Ali <ameerh@berkeley.edu>
2021-01-05 16:48:32 -08:00
Simon Mo 39813ff6b0 [Serve] HTTPOptions for deployment modes (#13142) 2021-01-05 16:41:52 -08:00
Amog Kamsetty bd19ed31e7 [Tune] Fix PBT Transformers Example (#13174) 2021-01-05 16:31:11 -08:00
Hao Zhang 7e52351ae5 [Collective] Some necessary abstraction of collective calls before introducing stream management (#13162) 2021-01-05 16:20:12 -08:00
Basu Jindal 4e569ee20b Update multi_agent_independent_learning.py (#13196)
pettingzoo.utils.error.DeprecatedEnv: waterworld_v0 is now depreciated, use waterworld_v2 instead
2021-01-05 13:44:54 -08:00
Edward Oakes dc101fd087 [serve] Move controller state into separate files (#13204) 2021-01-05 14:37:16 -06:00
Edward Oakes d738610dc9 Disable atexit test on windows (#13207) 2021-01-05 14:33:51 -06:00
Kai Fricke 96c2d3d2b5 [tune] better signature check for tune.sample_from (#13171)
* [tune] better signature check for `tune.sample_from`

* Update python/ray/tune/sample.py

Co-authored-by: Sumanth Ratna <sumanthratna@gmail.com>
2021-01-05 08:04:18 -08:00
Edward Oakes e8162f1b1f [serve] Merge ActorReconciler and BackendState (#13139) 2021-01-05 09:56:22 -06:00
Hao Zhang 4150970226 [Collective][PR 2/6] Driver program declarative interfaces (#12874)
* scaffold of the code

* some scratch and options change

* NCCL mostly done, supporting API#1

* interface 2.1 2.2 scratch

* put code into ray and fix some importing issues

* add an additional Rendezvous class to safely meet at named actor

* fix some small bugs in nccl_util

* some small fix

* add a Backend class to make Backend string more robust

* add several useful APIs

* add some tests

* added allreduce test

* fix typos

* fix several bugs found via unittests

* fix and update torch test

* changed back actor

* rearrange a bit before importing distributed test

* add distributed test

* remove scratch code

* auto-linting

* linting 2

* linting 2

* linting 3

* linting 4

* linting 5

* linting 6

* 2.1 2.2

* fix small bugs

* minor updates

* linting again

* auto linting

* linting 2

* final linting

* Update python/ray/util/collective_utils.py

Co-authored-by: Richard Liaw <rliaw@berkeley.edu>

* Update python/ray/util/collective_utils.py

Co-authored-by: Richard Liaw <rliaw@berkeley.edu>

* Update python/ray/util/collective_utils.py

Co-authored-by: Richard Liaw <rliaw@berkeley.edu>

* added actor test

* lint

* remove local sh

* address most of richard's comments

* minor update

* remove the actor.option() interface to avoid changes in ray core

* minor updates

Co-authored-by: YLJALDC <dal177@ucsd.edu>
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2021-01-04 20:57:37 -08:00
Tao Wang c617291b27 [build]Update description and add some keywords (#13163) 2021-01-05 11:34:03 +08:00
Tao Wang a0bbf2bfc2 Notify listeners after registered node stored (#13069) 2021-01-05 11:18:03 +08:00
fangfengbin 88eaa87e3a Remove unused file(object_manager_integration_test.cc) (#12989) 2021-01-05 11:09:36 +08:00
Barak Michener 9643e44af6 [ray_client]: Move from experimental to util (#13176)
Change-Id: I9f054881f0429092d265cd6944d89804cce9d946
2021-01-04 17:51:56 -08:00
Eric Liang dfb326d4b5 Surface object store spilling statistics in ray memory (#13124) 2021-01-04 17:35:39 -08:00
Stephanie Wang b765914a1b Revert "Enabling the cancellation of non-actor tasks in a worker's queue (#12117)" (#13178)
This reverts commit b4d688b4a6.
2021-01-04 17:27:48 -08:00
Amog Kamsetty e181515dff [SGD] Fix Docstring for as_trainable (#13173) 2021-01-04 17:21:24 -08:00
Amog Kamsetty 15e86581bd [XGboost] Update Documentation (#13017)
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2021-01-04 17:21:04 -08:00
Siyuan (Ryans) Zhuang 46cf433f0e [Core] Remove Arrow dependencies (#13157)
* remove arrow ubsan

* remove arrow build depend

* remove arrow buffer
2021-01-04 11:19:09 -08:00
Max Fitton d018212db5 [Release] Update Release Process Documentation (#13123) 2021-01-04 11:09:43 -08:00