Commit Graph

  • e72abcd0aa Enable even more new scheduler tests (#12096) Eric Liang 2020-11-19 16:47:18 -08:00
  • dac09bd569 Fix actor_registry_ copied on each heartbeat; Improve receive object chunk debug messages (#12187) Eric Liang 2020-11-19 16:45:37 -08:00
  • 7bf5145d36 Lint plasma source files (#12171) Stephanie Wang 2020-11-19 19:08:18 -05:00
  • dfc796b8ec Add gdb stack dump command to docs (#12147) Eric Liang 2020-11-19 16:02:11 -08:00
  • 67544992b5 Remove the old operator directory (#12143) Eric Liang 2020-11-19 15:37:28 -08:00
  • d07ffc152b [rllib] Rrk/12079 custom filters (#12095) Raoul Khouri 2020-11-19 16:20:20 -05:00
  • f1ace386db [tune] detect docker and kubernetes syncers (#12108) Kai Fricke 2020-11-19 21:17:17 +01:00
  • de86d5aff7 ActorStatisticalData() debug metrics bog down raylet with 100% CPU (#12148) Eric Liang 2020-11-19 11:38:44 -08:00
  • 5fb410cfbf [Dashboard] New dashboard view data doesn't exist. (#12129) SangBin Cho 2020-11-19 11:04:59 -08:00
  • 7d67af6c2a [Metrics] Add stats to measure process startup time + scheduling stats. (#12100) SangBin Cho 2020-11-19 11:04:26 -08:00
  • 7fcce785ed [hotfix] Fix windows build (#12146) Ian Rodney 2020-11-19 11:00:19 -08:00
  • 96c1caccaf Use Ubuntu 18.04 so that a newer version of Docker will be available by default. (#12139) dHannasch 2020-11-19 11:40:07 -07:00
  • 6999075c75 [tune] Add seed parameter to BOHB (#12160) Kai Fricke 2020-11-19 19:27:16 +01:00
  • dab241dcc6 [RLlib] Fix inconsistency wrt batch size in SampleCollector (traj. view API). Makes DD-PPO work with traj. view API. (#12063) Sven Mika 2020-11-19 19:01:14 +01:00
  • ff82af1588 Clean up requirements.txt (#12136) Philipp Moritz 2020-11-19 09:27:09 -08:00
  • 9481ecd180 [data] MLDataset based on ParallelIterator (#11849) Xianyang Liu 2020-11-19 16:33:37 +08:00
  • 2fe1321c3f [ray_client] __getattr__ for the API Import interface (#12089) Barak Michener 2020-11-18 22:42:02 -08:00
  • a74f1885db Revert "[CLI] Fix ray commands when RAY_ADDRESS used (#11989)" (#12135) Ian Rodney 2020-11-18 22:41:10 -08:00
  • 5bc4976550 More informative error message if ray start fails to connect to Redis (#11880) dHannasch 2020-11-18 20:28:10 -07:00
  • 0d388c4d31 [autoscaler] remove unnecessary print output (#12131) Richard Liaw 2020-11-18 18:33:48 -08:00
  • 4490de356d Fix issues in release process doc (#12130) Eric Liang 2020-11-18 18:32:27 -08:00
  • 2bb6db5e64 [tune] temporary revert of verbosity changes (#12132) Richard Liaw 2020-11-18 18:27:41 -08:00
  • 4717fcd9c0 [autoscaler] give max_workers precedence over min_workers in resource demand scheduler (#12106) Ameer Haj Ali 2020-11-19 02:24:48 +02:00
  • d826452e0b [autoscaler] fix max_workers bug in resource_demand_scheduler by counting the head node (#12123) Ameer Haj Ali 2020-11-19 01:24:38 +02:00
  • e086ddc18f [core] Add Recursive task cancelation (#11923) Ian Rodney 2020-11-18 15:18:40 -08:00
  • e2a147d5fb [docs] Remove DL AMi reference (#12120) Ian Rodney 2020-11-18 12:40:19 -08:00
  • 8f2b447ba4 [docker pipeline] Base-Deps, Dataclasses & Releases (#12119) Ian Rodney 2020-11-18 12:34:04 -08:00
  • b343db9ad5 [docker] Modify script to allow for arbitrary name changes (#12092) Ian Rodney 2020-11-18 12:14:44 -08:00
  • 66c30e3ad0 Set version to 1.0.1.post1 (refs: ray-1.0.1.post1, releases/1.0.1.post1) Edward Oakes 2020-11-18 14:07:43 -06:00
  • 4b5769dab2 1.0.1 release logs (#12127) Alex Wu 2020-11-18 12:04:05 -08:00
  • e9c9ba9c9f [New Scheduler] Don't start tasks if the owner is dead (#12050) Alex Wu 2020-11-18 11:34:19 -08:00
  • eef624750c [ray client] ray wait() implementation (#12072) Ameer Haj Ali 2020-11-18 21:33:57 +02:00
  • 9ba8f72ff1 [autoscaler] Add the cluster_name to docker file mounts directory prefix to make it more unique (#11600) Ameer Haj Ali 2020-10-28 00:33:11 +02:00
  • 2b60c5774b [tune] cache checkpoint serialization (#12064) Kai Fricke 2020-11-18 18:03:53 +01:00
  • 6da4342822 [RLlib] Add on_learn_on_batch (Policy) callback to DefaultCallbacks. (#12070) Sven Mika 2020-11-18 15:39:23 +01:00
  • b41f4fdec2 Extract the connection logic to reduce duplication. (#12016) dHannasch 2020-11-18 01:12:58 -07:00
  • d23d326560 [CLI] Fix ray commands when RAY_ADDRESS used (#11989) Ian Rodney 2020-11-17 23:44:59 -08:00
  • d87af0da88 [PlacementGroup]Add gcs placement group manager debug info (#12061) fangfengbin 2020-11-18 11:15:38 +08:00
  • b96516e9d3 [core] Remove google dependency (#12085) Philipp Moritz 2020-11-17 19:01:00 -08:00
  • f400333841 [Placement Group]Placement Group supports gcs failover(Part2) (#12003) fangfengbin 2020-11-18 10:59:26 +08:00
  • c476037c97 [Core] Async API should raise on all RayError (#12043) Simon Mo 2020-11-17 17:20:30 -08:00
  • e8c018e8fc [C++ API] tests for the C++ API. (#12076) Ameer Haj Ali 2020-11-18 03:07:52 +02:00
  • f6bdd5ab17 [New Scheduler] Spillback from the queue of tasks assigned to the local node (#12084) Stephanie Wang 2020-11-17 19:13:59 -05:00
  • b5dfdb2a21 Log the Redis shard addresses as originally received from the head GCS. (#12011) dHannasch 2020-11-17 14:11:17 -07:00
  • 010e6cef3f Allow setting the RAY_BACKEND_LOG_LEVEL to trace. (#12012) dHannasch 2020-11-17 14:10:23 -07:00
  • f0dcf01807 Clarify that Ray is not yet retrying to connect. (#12013) dHannasch 2020-11-17 14:01:42 -07:00
  • ca44222e03 [minor] log info instead of error upon ray.init rerun (#12025) Richard Liaw 2020-11-17 12:59:24 -08:00
  • 7f050c706b [PlacementGroup]Skip flaky testcase (#12065) fangfengbin 2020-11-18 04:21:34 +08:00
  • bcc92f59fd [Dashboard] Patch issue in 1.0.1 release where worker stats are not present for a node (#12062) Max Fitton 2020-11-17 10:54:57 -08:00
  • d7c95a4a90 [Serve] Rewrite Router to be Embeddable (#12019) Simon Mo 2020-11-17 08:28:18 -08:00
  • 23926f3e6e [CLI] Docker Support (#11761) Maksim Smolin 2020-11-17 00:04:39 -08:00
  • bea0031491 fix linux wheel build (#9896) chaokunyang 2020-11-17 15:49:42 +08:00
  • 09d6ea5784 Clarify official releases vs nightly wheels Eric Liang 2020-11-16 23:30:40 -08:00
  • f10cef93c7 [sgd] support operator.device (#12056) Amog Kamsetty 2020-11-16 21:44:27 -08:00
  • 380df89069 Lazily initialize the global state accessor in Python workers (#12054) Eric Liang 2020-11-16 21:35:12 -08:00
  • 0f9e2fec12 [Placement Group] Add get / get all / remove interface for Placement Group Java api. (#11821) DK.Pino 2020-11-17 12:32:39 +08:00
  • 90574b66cc pin aiohttp to the 3.x.x version (#12051) Max Fitton 2020-11-16 18:54:16 -08:00
  • 51d277f2e4 [tests] fix mock for test_cli (#12055) Richard Liaw 2020-11-16 18:44:15 -08:00
  • d525e61288 [GCS]Open light heartbeat by default (#11968) Tao Wang 2020-11-17 10:21:47 +08:00
  • c49554fb7a Abstract plasma store creation request queue (#12039) Stephanie Wang 2020-11-16 20:09:15 -05:00
  • 9f5986ee58 [tune] logger migration to ExperimentLogger classes (#11984) Kai Fricke 2020-11-17 00:08:37 +01:00
  • 3dc68533a9 make some private rsync, and exec_cluster arguments public (#11958) Alan Guo 2020-11-16 14:31:41 -08:00
  • df2c2a7ce5 [cpp worker] support pass by reference on cluster mode (#11753) SongGuyang 2020-11-17 06:30:35 +08:00
  • 8d599bb3f5 [autoscaler] Move fill out resources to bootstrap config to cache the resources and avoid expensive boto3 calls (#12028) Ameer Haj Ali 2020-11-16 23:28:57 +02:00
  • 0c6bb745cd Fix dashboard agent use incorrect ip (#12038) fyrestone 2020-11-17 04:02:20 +08:00
  • f56d7c1a76 [Logging] Remove per worker job log file / support worker log rotation (#11927) SangBin Cho 2020-11-16 11:29:43 -08:00
  • b6b54f1c81 [RLlib] Trajectory view API: enable by default for SAC, DDPG, DQN, SimpleQ (#11827) Sven Mika 2020-11-16 19:54:35 +01:00
  • 8609e2dd90 [tune] refactor verbosity levels (#11767) Kai Fricke 2020-11-16 19:32:53 +01:00
  • a50128079d [tune/placement group] dist. training placement group support (#11934) Keqiu Hu 2020-11-16 01:11:39 -08:00
  • 8fb926565c [Placement Group]Placement Group supports gcs failover (Part1) (#11933) fangfengbin 2020-11-16 14:42:56 +08:00
  • d35de2272d [Core] Allow redis.ResponseError instead of redis.AuthenticationError (#12024) dHannasch 2020-11-15 16:04:56 -07:00
  • ac9610b19d [Autoscaler] Precisely match docker HOME (#12020) Simon Mo 2020-11-15 11:49:50 -08:00
  • 8b3f79f307 [tune] refactor and add examples (#11931) Richard Liaw 2020-11-14 20:43:28 -08:00
  • 5891759a3e Clarify get_node_ip_address docstring (#11881) dHannasch 2020-11-14 16:20:58 -07:00
  • 9fbeefd604 Distinguish a bad --redis-password from any other Redis error (#11893) dHannasch 2020-11-13 16:39:44 -07:00
  • 4f5d6274af [docs] Add links to Ray design patterns whitepaper (#12014) Eric Liang 2020-11-13 14:16:51 -08:00
  • 8bcb0bddc9 [serve] Fix API calls in global README (#12015) Edward Oakes 2020-11-13 16:05:00 -06:00
  • effa553077 [Doc] Explain how to know whether RAY_BACKEND_LOG_LEVEL worked (#12010) dHannasch 2020-11-13 15:02:57 -07:00
  • 277558895d [Serve] Introduce Long Polling (#11905) Simon Mo 2020-11-13 13:17:20 -08:00
  • 00ef1179c0 [object spilling] Autocreate dir if not exists (#11999) Eric Liang 2020-11-13 12:13:06 -08:00
  • f936ea35fe [hotfix] Fix ResourceDemandScheduler (#11996) Ian Rodney 2020-11-13 00:42:16 -08:00
  • f6f9b15299 . (#11998) SangBin Cho 2020-11-12 21:33:00 -08:00
  • 3b56a1a522 [docker] auto-populate shared memory size (#11953) Ian Rodney 2020-11-12 17:22:42 -08:00
  • 59bc1e6c09 [RLLib] MAML extension for all models except RNNs (#11337) Michael Luo 2020-11-12 16:51:40 -08:00
  • 272edcca94 [ray_client]: Implement function calls (#11922) Barak Michener 2020-11-12 16:49:34 -08:00
  • a6a8e777f3 [autoscaler] Interpret autoscaling_speed as 1/x-1 of previous target util fraction (#11961) Eric Liang 2020-11-12 16:23:50 -08:00
  • 0bd69edd71 [RLlib] Trajectory view API: enable by default for ES and ARS (#11826) Sven Mika 2020-11-12 19:33:10 +01:00
  • 6e6c680f14 MBMPO Cartpole (#11832) Michael Luo 2020-11-12 10:30:41 -08:00
  • 9254de0b02 [autoscaler] Fix custom node resources on head (#11896) Ian Rodney 2020-11-12 10:30:04 -08:00
  • ad639f12d8 [autoscaler/k8s] Preliminary k8s operator (#11929) Gekho457 2020-11-12 12:58:02 -05:00
  • 4744ed01f7 Queueing non-actor tasks at the workers (#11051) Gabriele Oliaro 2020-11-12 12:44:13 -05:00
  • 02c02369ca [tune] Fix hpo randint limits (#11946) Kai Fricke 2020-11-12 17:45:49 +01:00
  • 07f401d99d [tune] Fix unflatten dict (#11948) Kristian Hartikainen 2020-11-12 16:43:15 +00:00
  • 9920933e31 [docker] Support non-root container (#11407) Lee moon soo 2020-11-12 08:41:50 -08:00
  • 62c7ab5182 [RLlib] Trajectory view API: Enable by default for PPO, IMPALA, PG, A3C (tf and torch). (#11747) Sven Mika 2020-11-12 16:27:34 +01:00
  • 59ccbc0fc7 [RLlib] Model Annotations: Tensorflow (#11964) Michael Luo 2020-11-12 03:18:50 -08:00
  • b2984d1c34 [RLlib] Model Annotations to Torch Models (#9749) Michael Luo 2020-11-12 03:16:12 -08:00
  • 3fbd8be851 [Placement Group]Do not really subtract resources, just count (#11894) Tao Wang 2020-11-12 16:01:19 +08:00
  • f80d812799 [Object Spilling] Introduce SpillWorker & RestoreWorker Pool to avoid IO worker deadlock. (#11885) SangBin Cho 2020-11-11 18:20:14 -08:00
  • de6df51bd2 [redis, docs]: Bump redis and docs/Pillow dependencies (#11371) Barak Michener 2020-11-11 18:15:27 -08:00