Commit Graph

  • 6c9088eb62 [core] refactor disconnect message processing and enrich WorkExitType (#13527) Keqiu Hu 2021-01-19 22:09:46 -08:00
  • e544c008df Fix restoration request dedup issues. (#13546) SangBin Cho 2021-01-19 15:28:54 -08:00
  • bfe147a6a8 Debug info to GCS pub sub (#13564) Stephanie Wang 2021-01-19 14:55:23 -08:00
  • a0d08c2cc6 Pipe monitor.err logs to driver Eric Liang 2021-01-19 12:27:07 -08:00
  • c963cbc038 Fix Docker Permission for Serve release test again (#13543) Simon Mo 2021-01-19 12:23:30 -08:00
  • 7b4a97c610 Make AWSNodeProvider.create_node return nodes created (#13498) Dmitri Gekhtman 2021-01-19 12:17:46 -08:00
  • 20016c983f [Tune] MLflow Credentials (#13533) Amog Kamsetty 2021-01-19 11:55:13 -08:00
  • 9b071eb449 [metrics] Better validation for tags (#13421) Edward Oakes 2021-01-19 13:26:51 -06:00
  • 99375c4cfc [Object Spilling] Remove retries and use a timer instead. (#13175) SangBin Cho 2021-01-19 11:01:45 -08:00
  • 86d5000047 Fix passing env on windows (#13253) fyrestone 2021-01-20 00:04:38 +08:00
  • 2e3655e8a9 [RLlib] Issue 9071 A3C w/ RNN not working due to VF assuming no RNN. (#13238) Sven Mika 2021-01-19 14:22:36 +01:00
  • e74947cc94 [RLlib] Env directory cleanup and tests. (#13082) Sven Mika 2021-01-19 10:09:39 +01:00
  • 93c0a5549b [RLlib] Deprecate vf_share_layers in top-level PPO/MAML/MB-MPO configs. (#13397) Sven Mika 2021-01-19 09:51:35 +01:00
  • a65ee92b69 [RLlib] MARWIL loss function test case and cleanup. (#13455) Sven Mika 2021-01-19 09:51:05 +01:00
  • 2506a6cd0e Remove PYTHON_MODE that is not defined in Ray so that import * will work from other packages. (#13544) Todd A. Anderson 2021-01-18 23:07:01 -08:00
  • 701038e410 Fix typo (#13098) SameerF 2021-01-18 19:28:10 -08:00
  • 7a2997ea8c [tune] support experiment checkpointing for grid search (#13357) Richard Liaw 2021-01-18 19:24:36 -08:00
  • 1fbc3ddfac Add ability to not start Monitor when calling ray start (#13505) Ameer Haj Ali 2021-01-19 04:31:53 +02:00
  • fb16dd5265 Add Dashboard Python Test to Buildkite (#13530) Simon Mo 2021-01-18 17:20:45 -08:00
  • 6341f1fa2e [Serve] Allow ObjectRef for Composition (#12592) Simon Mo 2021-01-18 15:26:35 -08:00
  • dc42abb2f5 [tune] placement group support (#13370) Kai Fricke 2021-01-18 20:58:57 +01:00
  • 1f00f834ac [RLlib] Solve PyTorch/TF-eager A3C async race condition between calling model and its value function. (#13467) Sven Mika 2021-01-18 19:29:03 +01:00
  • e3fc7729ac Fix fix_pending_workers Qing Wang 2021-01-19 01:24:01 +08:00
  • 516eb77080 [GCS] Remove task info publish as nowhere uses it (#13509) Tao Wang 2021-01-18 17:15:03 +08:00
  • 1e2adb335e [CI] Buildkite PR Environment for Simple Tests (#13130) Simon Mo 2021-01-18 00:44:24 -08:00
  • 3a0710130c [GCS]Only publish changed field when node dead (#13364) Tao Wang 2021-01-18 13:28:35 +08:00
  • 17ad2f64cf fix lint 12018 khu 2021-01-17 20:53:13 -08:00
  • 5251508548 fix test_error_isolation khu 2021-01-17 20:51:27 -08:00
  • 416b45e78e [core] remove pushing RayTaskError to driver khu 2021-01-17 16:48:09 -08:00
  • a4ebdbd7da Refactor node manager to eliminate new_scheduler_enabled_ (#12936) ZhuSenlin 2021-01-18 00:15:35 +08:00
  • 2cd51ce608 sync write internal config in gcs (#13197) ZhuSenlin 2021-01-17 12:00:01 +08:00
  • 8c8af2616e Minimal version of piping autoscaler events to driver logs (#13434) Eric Liang 2021-01-16 10:06:20 -08:00
  • 7e54911093 move message to debug (#13472) Dmitri Gekhtman 2021-01-16 10:04:41 -08:00
  • 86387504ee [tune] fix small docs typo (#13355) Richard Liaw 2021-01-16 00:49:17 -08:00
  • 1d3941e41a [Tests] Skip failing windows tests (#13495) Amog Kamsetty 2021-01-15 20:51:33 -08:00
  • 1179db1fc2 Remove an unnecessary file (#13499) SangBin Cho 2021-01-15 18:29:12 -08:00
  • ee6332dbb0 Bump dev branch to 2.0 to avoid endless version bump toil (#13497) Eric Liang 2021-01-15 17:41:17 -08:00
  • 68e3a0e0e1 [ray_client]: fix wrong reference in server_pickler (#13474) Barak Michener 2021-01-15 15:49:38 -08:00
  • d09df55b14 Update ID specification doc (#13356) SangBin Cho 2021-01-15 15:15:51 -08:00
  • 4aeb0ea550 Return version info from Ray client connect, to allow for discovering version mismatches Eric Liang 2021-01-15 14:27:26 -08:00
  • 7a0597d03f [CI] Fix Windows Bazel Upload (#13436) Simon Mo 2021-01-15 13:27:11 -08:00
  • 0ec9ddabc1 [docker/dashboard] Fix ray dashboard (#12899) Ian Rodney 2021-01-15 10:03:01 -08:00
  • dac8b3d58a [CI] Enable Dashboard tests for master (#13425) Simon Mo 2021-01-15 09:43:34 -08:00
  • f6d9996874 [Object Spilling] Dedup restore objects (#13470) SangBin Cho 2021-01-14 23:51:11 -08:00
  • ce1b208e41 [GCS]Remove unused class variable (#13454) fangfengbin 2021-01-15 14:48:18 +08:00
  • 84e110a949 [ray_client]: Support runtime_context as metadata (#13428) Barak Michener 2021-01-14 14:37:00 -08:00
  • 9a658b568f [Core] Ownership-based Object Directory: Consolidate location table and reference table. (#13220) Clark Zinzow 2021-01-14 14:48:10 -07:00
  • d1e9887be2 [Serialization] New custom serialization API (#13291) Siyuan (Ryans) Zhuang 2021-01-14 13:15:31 -08:00
  • 07e97fe4c2 [xgb] re-enable xgboost_ray tests (#13416) Amog Kamsetty 2021-01-14 13:14:44 -08:00
  • 7ba87b8abe Fix getting runtime context dict in driver (#13417) Edward Oakes 2021-01-14 14:41:53 -06:00
  • 411e37ce3f [serve] Properly obey SERVE_LOG_DEBUG=0 (#13460) Ian Rodney 2021-01-14 12:24:22 -08:00
  • 16e8c4a69f [Release] Fix Serve release test (#13303) Simon Mo 2021-01-14 12:23:53 -08:00
  • 321bbe1ffb [Dashboard] Fix GPU resource rendering issue (#13388) Simon Mo 2021-01-14 12:23:21 -08:00
  • e63da54931 [docs] Add more guideline on using ray in slurm cluster (#12819) PENG Zhenghao 2021-01-15 04:17:53 +08:00
  • d98235cc84 [RLlib] Deflake 2x remote & local inference tests (external env). (#13459) Sven Mika 2021-01-14 20:44:26 +01:00
  • c89ebdd94a [Core][CLI] ray status and ray memory no longer starts a new job (#13391) Micah Yong 2021-01-14 10:12:16 -08:00
  • 2d772a5a6d [kubernetes][minor] Operator garbage collection fix (#13392) Dmitri Gekhtman 2021-01-14 08:40:15 -08:00
  • 9c6d892eec [ray_client]: fix exceptions raised while executing on the server on behalf of the client (#13424) Barak Michener 2021-01-14 08:38:01 -08:00
  • 2f7ba25efb [joblib] joblib strikes again but this time on windows (#13212) Ameer Haj Ali 2021-01-14 18:36:52 +02:00
  • 4a6c53da46 [Core]Fix raylet scheduling bug (#13452) fangfengbin 2021-01-14 21:50:32 +08:00
  • 56878221ed [RLlib] Redo: Make TFModelV2 fully modular like TorchModelV2 (soft-deprecate register_variables, unify var names wrt torch). (#13363) Sven Mika 2021-01-14 14:44:33 +01:00
  • 33b092de28 [GCS]Add gcs resource scheduler (#13072) fangfengbin 2021-01-14 20:05:55 +08:00
  • b296642646 Fix linter error (#13451) Kai Fricke 2021-01-14 10:28:44 +01:00
  • 560299972c Revert "Enable Ray client server by default (#13350)" (#13429) Amog Kamsetty 2021-01-13 21:28:54 -08:00
  • 8697d67791 Fix raylet::MockWorker::GetProcess crashes (#13440) fyrestone 2021-01-14 12:19:21 +08:00
  • ad015cb7df Split out the part of get_node_ip_address for which the docstring is correct (#12796) dHannasch 2021-01-13 20:32:56 -07:00
  • 3f42e6bafe [Tune] Pin Transitive Dependencies (#13358) Amog Kamsetty 2021-01-13 19:10:21 -08:00
  • 062b7efc93 Remove unused handler methods (#13394) Tao Wang 2021-01-14 10:51:31 +08:00
  • 12e1175dd1 Revert "[Dashboard] Fix missing actor pid (#13229)" revert-13229-fix_missing_actor_pid Amog Kamsetty 2021-01-13 18:03:15 -08:00
  • 602c103eae Make request_resources() use internal kv instead of redis pub sub (#13410) Eric Liang 2021-01-13 17:30:43 -08:00
  • 9ef48b16b6 [serve] Pull out goal management logic into AsyncGoalManager class (#13341) Edward Oakes 2021-01-13 18:35:25 -06:00
  • c6fc7124d1 [tune] Fix f-string in error message (#13423) Edward Oakes 2021-01-13 18:34:21 -06:00
  • b257cb7d98 Add bazel logs upload to GHA (#13251) Simon Mo 2021-01-13 15:17:11 -08:00
  • 15501a4151 Fix Serve release test (#13385) Simon Mo 2021-01-13 15:06:23 -08:00
  • 1968b2f9d8 [autoscaler/k8s] [CI] Kubernetes test ray up, exec, down (#12514) Dmitri Gekhtman 2021-01-13 15:03:56 -08:00
  • 44acbdd82a [Serve] [Doc] Improve batching doc (#13389) Simon Mo 2021-01-13 14:39:42 -08:00
  • 6de5711690 Plumb retries update (#13411) Eric Liang 2021-01-13 13:49:57 -08:00
  • 8f48c64507 [ray_client]: Fix multiple attempts at checking connection (#13422) Barak Michener 2021-01-13 13:36:01 -08:00
  • 4853aa96cb [Dashboard] Fix missing actor pid (#13229) fyrestone 2021-01-13 16:45:12 +08:00
  • 0b22341bc9 [ray_client]: Wait for ready and retry on ray.connect() (#13376) Barak Michener 2021-01-13 00:19:15 -08:00
  • d49c3fae0b [RLlib] Trajectory View API: Atari framestacking. (#13315) Sven Mika 2021-01-13 08:53:34 +01:00
  • 912d0cbbf9 Enable Ray client server by default (#13350) Eric Liang 2021-01-12 21:31:01 -08:00
  • 8e0a2f669b [Doc] Remove trailing whitespaces (#13390) Simon Mo 2021-01-12 20:35:38 -08:00
  • f587b9a50c Remove unimplemented GetAll method in actor info accessor (#13362) Tao Wang 2021-01-13 09:55:27 +08:00
  • 0428537d0b [Object Spilling] Long running object spilling test (#13331) SangBin Cho 2021-01-12 16:53:13 -08:00
  • 4d83003992 trigger doc build for serve updates (#13373) Amog Kamsetty 2021-01-12 13:08:55 -08:00
  • 2e70743077 [Serve] Backend state unit tests (#13319) Ian Rodney 2021-01-12 12:54:04 -08:00
  • 3a3e4aed86 [RLlib] Add __len__() method to SampleBatch (#13371) Maltimore 2021-01-12 20:15:23 +01:00
  • e560933f9c [Serve] Add dependency management support for driver not running in a conda env (#13269) architkulkarni 2021-01-12 09:57:15 -08:00
  • 518427627b [tune] buffer trainable results (#13236) Kai Fricke 2021-01-12 18:52:47 +01:00
  • 9eebd090cf [Dependabot] [CI] Re-configure Dependabot and disable duplicate builds (#13359) Amog Kamsetty 2021-01-12 09:28:58 -08:00
  • 25f10a947a Revert "[RLlib] Make TFModelV2 behave more like TorchModelV2: Obsolete register_variables. Unify variable dicts. (#13339)" (#13361) Kai Fricke 2021-01-12 12:33:57 +01:00
  • 7166949194 [Kubernetes][Docs] GPU usage (#13325) Dmitri Gekhtman 2021-01-11 21:36:31 -08:00
  • a5ddc27bab Fix typo in Tune Docs (Checkpointing) (#13348) Edwin Goh 2021-01-11 23:27:18 -05:00
  • 470fda190a Forgot overwrite parameter in Ray client internal kv Eric Liang 2021-01-11 17:50:06 -08:00
  • 0452a3a435 [Tune] Rename MLFlow to MLflow (#13301) Amog Kamsetty 2021-01-11 17:36:55 -08:00
  • de5bc24c60 Implement internal kv in ray client (#13344) Eric Liang 2021-01-11 14:54:52 -08:00
  • fbb9795374 [client] Report number of currently active clients on connect (#13326) Eric Liang 2021-01-11 14:53:12 -08:00
  • e2b2abb88b [RLlib] Make TFModelV2 behave more like TorchModelV2: Obsolete register_variables. Unify variable dicts. (#13339) Sven Mika 2021-01-11 22:42:30 +01:00
  • c43fa12e73 [Serve] Support Starlette streaming response (#13328) architkulkarni 2021-01-11 13:27:44 -08:00