Commit Graph

  • f545418c3f [Dashboard] Fix dashboard regression caused by logCount and errCount being removed from worker payload (#11954) Max Fitton 2020-11-11 14:55:54 -08:00
  • 73a1cb702b Split _get_node_provider_cls off from _get_node_provider (#11949) Edward Oakes 2020-11-11 16:10:46 -06:00
  • 85197deece [autoscaler] Remove legacy autoscaler (#11802) Ameer Haj Ali 2020-11-11 23:36:48 +02:00
  • 72fc79740c [RLlib] Issue with pickle versions (breaks rollout test cases in RLlib). (#11939) Sven Mika 2020-11-11 21:52:21 +01:00
  • 396ae0b7c2 Add docstring for find_redis_address (#11884) dHannasch 2020-11-11 11:24:36 -07:00
  • 291c172d83 [RLlib] Support Simplex action spaces for SAC (torch and tf). (#11909) Sven Mika 2020-11-11 18:45:28 +01:00
  • 4735c032ed [Core] Fix C++ worker test (#11941) Kai Yang 2020-11-12 01:04:45 +08:00
  • 92286660e4 [Core] Lazy create node manager clients, and destroy then (#11928) Tao Wang 2020-11-12 00:51:40 +08:00
  • 7b8bd15702 [Stalebot] Fix issues. (#11930) SangBin Cho 2020-11-11 00:28:02 -08:00
  • b8dda0e3d0 [Serialization] Fix buffer alignment issues (#11888) Siyuan (Ryans) Zhuang 2020-11-10 23:44:16 -08:00
  • 1979ea9c0a fix disable javadoc lint (#11907) chaokunyang 2020-11-11 13:40:50 +08:00
  • 29cb32539e [Core] If failed to connect to redis, try to say why. (#11916) dHannasch 2020-11-10 19:22:10 -07:00
  • 433e4f32da [GCS]Reduce get operations of worker table (#11599) fangfengbin 2020-11-11 10:11:25 +08:00
  • 8afd2acdc1 [Autoscaler] simulator placement groups (#11777) Alex Wu 2020-11-10 18:10:36 -08:00
  • 46f3652102 Remove repeat push timeout from object manager (#11874) Eric Liang 2020-11-10 16:26:53 -08:00
  • 0c1bdaef59 [tune] TensorFlow Distributed Trainable (#11876) Keqiu Hu 2020-11-10 14:59:08 -08:00
  • 50dbf1a307 [core] Support configurable number of "check for redis" attempts (#11902) Richard Liaw 2020-11-10 14:57:57 -08:00
  • 1d158dda32 [serve] Rename to use replicas, not workers (#11822) Ian Rodney 2020-11-10 11:36:15 -08:00
  • 9b8218aabd [docs] Move all /latest links to /master (#11897) Eric Liang 2020-11-10 10:53:28 -08:00
  • 543f7809a6 [GCS]Add gcs dump log(Part1) (#11727) fangfengbin 2020-11-10 14:10:03 +08:00
  • aba9288615 [Autoscaler] Introduce callback system (#11674) Nikita Vemuri 2020-11-09 20:03:15 -08:00
  • ee2da0cf45 [Core] PushManager for reliable broadcast (#11869) Eric Liang 2020-11-09 18:01:47 -08:00
  • 1999266bba Updated pettingzoo env to acomidate api changes and fixes (#11873) Benjamin Black 2020-11-09 19:09:49 -05:00
  • a9cf0141a0 [autoscaler] Fix semantics of request_resources (#11820) Eric Liang 2020-11-09 14:57:40 -08:00
  • 1c132f2ff8 [serve] Improve DEBUG logging for understanding perf (#11838) Edward Oakes 2020-11-09 14:10:42 -06:00
  • adcaabcd64 [Serve] Reconfigure backend class at runtime (#11709) architkulkarni 2020-11-09 12:04:51 -08:00
  • 287aba6dc3 [tune] schedulers: Add test for context finalization (#11889) Kai Fricke 2020-11-09 20:37:05 +01:00
  • a09e49ee94 [core] Add retry for reading session name (#11844) Richard Liaw 2020-11-09 11:22:50 -08:00
  • 88be1ea20b [tune] Handle infinite and NaN values (#11835) Kai Fricke 2020-11-09 20:18:31 +01:00
  • 904f48ebd9 [Core] Multi-tenancy: Pass job ID from Raylet to worker via env variable (#11829) Kai Yang 2020-11-10 03:02:15 +08:00
  • 77e3163630 [GCS]Only pass node id to node failure detector (#11886) Tao Wang 2020-11-10 02:52:33 +08:00
  • 368b14a0da Stop dashboard from erroring when an actor does not have a corresponding core worker (#11870) Max Fitton 2020-11-09 09:36:34 -08:00
  • 2feba4409c [serve] Fix long running failure test (#11805) Edward Oakes 2020-11-09 11:21:03 -06:00
  • 407a212816 [GCS]Fix TestActorTableResubscribe bug (#11830) fangfengbin 2020-11-09 15:50:05 +08:00
  • 64ca30c060 [doc] Troubleshooting --dashboard-port (#11816) dHannasch 2020-11-08 16:53:50 -07:00
  • 0932320eb3 Move test_joblib back to new_scheduler_broken category (#11872) Eric Liang 2020-11-07 20:08:41 -08:00
  • 61e41257e7 [Object spilling] Queue failed object creation requests until objects have been spilled (#11796) Stephanie Wang 2020-11-06 18:22:19 -05:00
  • 2a86943b13 [Test] Ignore setproctitle for local mode (#11819) [ray-1.0.1] Kai Yang 2020-11-06 03:07:34 +08:00
  • 6e6cb6b1d0 [Core] Fix ray start failure to due to bug of redis address detection (#11735) Kai Yang 2020-11-05 04:04:44 +08:00
  • 8f3c315a99 [Metrics] Implement basic metrics changes (#11769) SangBin Cho 2020-11-05 11:07:05 -08:00
  • 900a48c19c [Tune] Better warnings/exceptions for fail_fast='raise' (#11842) Amog Kamsetty 2020-11-06 15:01:55 -08:00
  • 045fed5cd2 [examples] comment out rsync_ settings for K8S (#11862) Aaron Miller 2020-11-06 14:35:21 -08:00
  • e0ecf5d79d Revert "[GCS]Open light heartbeat by default (#11689)" (#11861) SangBin Cho 2020-11-06 14:34:59 -08:00
  • 871cde989a Re-Revert: [Serialization] Update CloudPickle to 1.6.0 (#9694) (#11837) Simon Mo 2020-11-06 12:24:36 -08:00
  • c5e6c90e1e [Core] Add name of actor in the result of ray.actors() (#11828) Kishan Sagathiya 2020-11-07 00:15:44 +05:30
  • 12ae0f20c6 [Metrics] Fix prometheus configuration doc (#11856) bermaker 2020-11-07 02:34:33 +08:00
  • 6b7a4dfaa0 [rllib] Forgot to pass ioctx to child json readers (#11839) Eric Liang 2020-11-05 22:07:57 -08:00
  • 28e7439cf0 [doc] Add documentation for Ray debugger (#11815) Philipp Moritz 2020-11-05 16:25:27 -08:00
  • 27c810a97e Basic protos for ray client (#11762) Barak Michener 2020-11-05 16:23:54 -08:00
  • f86c4f992c Fix RAY_ENABLE_NEW_SCHEDULER=1 pytest test_advanced_2.py::test_zero_cpus_actor (#11817) Eric Liang 2020-11-05 16:02:04 -08:00
  • 347e871409 [Serve] Add dependency management (#11743) architkulkarni 2020-11-05 14:39:37 -08:00
  • 1c0a52d0df rllib regression results Alex Wu 2020-11-05 14:10:17 -08:00
  • ffc267f94b [Test] Ignore setproctitle for local mode (#11819) Kai Yang 2020-11-06 03:07:34 +08:00
  • 3cd1d7f44a [Metrics] Implement basic metrics changes (#11769) SangBin Cho 2020-11-05 11:07:05 -08:00
  • 049df70289 [OSS] Introduce Stale bot (#11790) SangBin Cho 2020-11-05 11:02:37 -08:00
  • 603accf1c2 [tune] logger refactor part 3: Add ExperimentLogger class (#11749) Kai Fricke 2020-11-05 17:55:38 +01:00
  • f6717b8b03 [autoscaler] Support empty node list for kill node (#11810) Richard Liaw 2020-11-04 22:40:07 -08:00
  • d0f3befd9c Add --redis-shard-ports to the list of ports that need to be open on the head node. (#11808) dHannasch 2020-11-04 22:26:09 -07:00
  • efa07d5403 Revert "Revert "[tune] PB2 (#11466)" (#11795)" (#11812) Richard Liaw 2020-11-04 20:47:12 -08:00
  • 612ddb2dd1 [GCS]Open light heartbeat by default (#11689) Tao Wang 2020-11-05 12:11:00 +08:00
  • 50110b934c [Placement Group]Enhance create placement group java api (#11702) DK.Pino 2020-11-05 09:59:36 +08:00
  • 69145d6215 [hotfix] Bazel candidates not found due to raising too early Eric Liang 2020-11-04 16:08:51 -08:00
  • 22bbbc3171 [wheel] Fix Manylinux2014 Build (#11811) Ian Rodney 2020-11-04 14:50:38 -08:00
  • 92718de40c [SGD] Better support for custom DDP (#11771) Amog Kamsetty 2020-11-04 13:58:51 -08:00
  • 6147b6a1a3 [docs] Note that the printed IP address can be incorrect. (#11804) dHannasch 2020-11-04 14:48:03 -07:00
  • ebdf8ba3fa [autoscaler] Support legacy cluster configs with the new resource demand scheduler (#11751) Ameer Haj Ali 2020-11-04 22:05:48 +02:00
  • 31598338b3 [Core] Fix ray start failure to due to bug of redis address detection (#11735) Kai Yang 2020-11-05 04:04:44 +08:00
  • 53aac55739 [autoscaler] Autoscaler simulator (#11690) Alex Wu 2020-11-04 12:04:11 -08:00
  • d6c7c7c675 [RLlib] Make sure, DQN torch actions are of type=long before torch.nn.functional.one_hot() op. (#11800) Sven Mika 2020-11-04 18:04:03 +01:00
  • 9073e6507c WIP: Update to support the Food Collector environment (#11373) heng2j 2020-11-04 06:29:16 -05:00
  • 66605cfcbd [RLLib] Random Parametric Trainer (#11366) Pierre TASSEL 2020-11-04 11:12:51 +01:00
  • 4518fe790f [RLLIB] Convert torch state arrays to tensors during compute log likelihoods (#11708) mvindiola1 2020-11-04 03:33:56 -05:00
  • b7531fb4f5 [redis-py] change redis-py deprecated hmset usage to hset (#11776) Akash Patel 2020-11-04 06:23:02 +00:00
  • 7248d5f4ae Revert "[tune] PB2 (#11466)" (#11795) Amog Kamsetty 2020-11-03 21:05:00 -08:00
  • 007634fd1b [tune] logger refactor part 2: Add SyncerCallback (#11748) Kai Fricke 2020-11-04 06:04:40 +01:00
  • 4433015295 Release testing things Alex Wu 2020-11-03 19:37:19 -08:00
  • 05c4e3fb2a [build] Build wheels with manylinux2014 (#11621) Barak Michener 2020-11-03 19:36:32 -08:00
  • 9527220a86 [serve] Fix Controller Crashes on Win (#11792) Ian Rodney 2020-11-03 16:54:16 -08:00
  • 2ef707e440 Update advanced.rst (#11793) architkulkarni 2020-11-03 16:16:36 -08:00
  • 5b788ccb13 [RLlib] Trajectory view API (prep PR for switching on by default across all RLlib; plumbing only) (#11717) Sven Mika 2020-11-03 21:53:34 +01:00
  • c3074f559c [serve] Split out metadata for checkpointing (#11533) Ian Rodney 2020-11-03 12:41:24 -08:00
  • 39ce0eadbe Ray PDB support (#11739) Philipp Moritz 2020-11-03 09:49:23 -08:00
  • 952b71dc94 Fix windows build (#11786) Stephanie Wang 2020-11-03 12:38:45 -05:00
  • d352feadf0 [Dashboard] Memory Page Loading Wheel (#11651) Max Fitton 2020-11-03 09:37:30 -08:00
  • 08e0e8311a [autoscaler] Fixing AWS instance types autofill (#11758) Ameer Haj Ali 2020-11-03 19:34:14 +02:00
  • f7b19c41e3 [tune] logger refactor part 1: move classes and utilities to own files (#11746) Kai Fricke 2020-11-03 16:48:09 +01:00
  • 5af745c90d [RLlib] Implement the SlateQ algorithm (#11450) desktable 2020-11-03 00:52:04 -08:00
  • e735add268 [RLlib] Integration with SUMO Simulator (#11710) Lara Codeca 2020-11-03 08:45:03 +00:00
  • 0a6d24a727 [cli] Remove the deprecated old_style logging calls (#10776) Maksim Smolin 2020-11-02 23:40:18 -08:00
  • e7f7cb29c4 [docs] Show expected terminal output for manual cluster setup (#11752) dHannasch 2020-11-02 21:59:14 -07:00
  • 4d272dd35b [docker] Disable Readme push to avoid errors (#11770) Ian Rodney 2020-11-02 19:12:51 -08:00
  • 0de1776e1e [docker] Push to DockerHub in CI (#11442) Ian Rodney 2020-10-23 12:02:15 -07:00
  • 1f2ad54294 [Placement Group] Placement group automatic cleanup. (#11546) SangBin Cho 2020-10-30 10:55:43 -07:00
  • 6e89702508 [docker] Disable Readme push to avoid errors (#11770) Ian Rodney 2020-11-02 19:12:51 -08:00
  • 3202ff74c2 [Dashboard] Don't show GPU columns if no GPU in cluster (#11704) Max Fitton 2020-11-02 16:07:27 -08:00
  • 0ba777af99 [Object spilling] Add policy to automatically spill objects on OutOfMemory (#11673) Stephanie Wang 2020-11-02 15:42:02 -05:00
  • 8d74a04a42 [autoscaler] Flag flip for resource_demand_scheduler should take into account queue (#11615) Ameer Haj Ali 2020-11-02 22:41:22 +02:00
  • ffeaae9f8e [GCS]Decouple node failure detector with resoure related operations (#11465) Tao Wang 2020-10-28 06:52:42 +08:00
  • a18e84e338 [docker] Fix docker regex (#11726) Alex Wu 2020-11-02 11:23:06 -08:00
  • 10c2089061 [Hotfix] Pin Pydantic Version (#11622) Simon Mo 2020-10-26 16:52:19 -07:00