Commit Graph

  • a62c5f40f6 [rllib] Document ModelV2 and clean up the models/ directory (#5277) Eric Liang 2019-07-27 02:08:16 -07:00
  • 9c00616cdc Retry and exception for hang on memory store full (#5143) Richard Liaw 2019-07-27 01:20:13 -07:00
  • 5e15b36d6e [tune] experiment_analysis split to Analysis (#5115) Richard Liaw 2019-07-27 01:10:52 -07:00
  • 7e715520e5 [sgd] Example for Training (#5292) Richard Liaw 2019-07-27 01:10:25 -07:00
  • 06fec63c87 [autoscaler] Add a 'request_cores' function for manual autoscaling (#4754) Daniel Edgecumbe 2019-07-27 01:14:45 +01:00
  • d9e81da3b8 [tune] configurable maximum length of trial identifier (#5287) lanlin 2019-07-27 08:09:54 +08:00
  • 6f737e6a50 Add CODEOWNERS file (#5259) Hao Chen 2019-07-26 12:40:07 +08:00
  • 827618254a [rllib] Configure learner queue timeout (#5270) Antoine Galataud 2019-07-26 06:18:05 +02:00
  • 6f682db99d avoid copying ActorTableData when NodeManager updates an actor to GCS (#5244) micafan 2019-07-26 11:17:24 +08:00
  • 3321555975 Increase timeout for ray.wait test (#5273) Stephanie Wang 2019-07-25 14:23:46 -07:00
  • bf9199ad77 [rllib] ModelV2 support for pytorch (#5249) Eric Liang 2019-07-25 11:02:53 -07:00
  • 40395acadf [gRPC] Migrate raylet client implementation to grpc (#5120) Joey Jiang 2019-07-25 14:48:56 +08:00
  • 60f59639c1 [rllib] Port DDPG to the build_tf_policy pattern (#5242) Eric Liang 2019-07-24 13:55:55 -07:00
  • 690b374581 [rllib] Add Keras LSTM example with ModelV2 (#5258) Eric Liang 2019-07-24 13:09:41 -07:00
  • 5b76238bce Fix two types of eviction hangs (#5225) Eric Liang 2019-07-23 21:20:17 -07:00
  • 97c43284a6 [rllib] Fix trainer state restore (#5257) Eric Liang 2019-07-23 21:18:58 -07:00
  • 9c651f47bb Add regression test for actor load balancing (#5224) Stephanie Wang 2019-07-23 15:11:55 -07:00
  • 15959b0f0d Leave ray.wait calls open until the task or actor exits (#5234) Stephanie Wang 2019-07-23 11:55:28 -07:00
  • a3d4f9f16d Fix the issue when passing multiple options in one string (#5241) Qing Wang 2019-07-23 12:28:54 +08:00
  • fc589050c9 [sgd] Deprecate old distributed SGD implementation (#5160) Peter Schafhalter 2019-07-22 15:47:10 -07:00
  • 80b976efcb Ray namespace added for k8s (#4111) Vince Jankovics 2019-07-22 23:45:05 +01:00
  • 7fc15dbf7f [autoscaler] Clean up error messages on setup failure (#5210) Richard Liaw 2019-07-22 11:27:51 -07:00
  • 53fb876a5f Improved KeyboardInterrupt Exception Handling (#5237) Richard Liaw 2019-07-22 02:29:56 -07:00
  • f9043cc49a [rllib] Remove experimental eager support Eric Liang 2019-07-21 12:27:17 -07:00
  • b0c0de49a2 [tune] Fixup exception messages (#5238) Richard Liaw 2019-07-20 22:36:27 -07:00
  • d58b986858 [rllib] MultiCategorical shouldn't return array for kl or entropy (#5215) Eric Liang 2019-07-19 12:12:04 -07:00
  • da7676c925 Removed the implicit sync barrier at the end of each training iteration (#5217) Jones Wong 2019-07-19 13:59:52 +08:00
  • 28e5c5555d [rllib] Move some inline defs to avoid deserialization errors (#5228) Eric Liang 2019-07-18 21:01:16 -07:00
  • aa42328874 [direct call] add local plasma provider (#5184) Zhijun Fu 2019-07-19 11:29:12 +08:00
  • b5b8c1d361 [GCS] introduce new gcs client and refactor actor table (#5058) micafan 2019-07-19 11:28:34 +08:00
  • 0af07bd493 Enable seeding actors for reproducible experiments (#5197) Jones Wong 2019-07-18 14:31:34 +08:00
  • 63f49f95dd Improve memory check (#5216) Qingqing Mao 2019-07-17 23:30:02 -07:00
  • 81d297f87e Remove redundant scaler of l2 reg (#5172) Jones Wong 2019-07-18 06:11:27 +08:00
  • ae03c42dd6 Fixed inconsistent action placeholder (#5213) Jones Wong 2019-07-18 01:55:14 +08:00
  • 214f09d969 [rllib] Make RLLib handle zero-length observation arrays (#5208) Sam Toyer 2019-07-16 22:37:57 -07:00
  • 3e0ad11ae0 Add heartbeat test + Fix monitor.py (#5191) Richard Liaw 2019-07-16 21:59:48 -07:00
  • 4fa2a6006c [rllib] Remove nested import (#5204) Eric Liang 2019-07-16 10:52:56 -07:00
  • 047f4ccd61 [rllib] Fix rollout.py with tuple action space (#5201) Eric Liang 2019-07-16 10:52:35 -07:00
  • 806524384b [Java worker] Refactor object store and worker context on top of core worker (#5079) Kai Yang 2019-07-16 20:58:02 +08:00
  • e5be5fd46d Remove dependencies from TaskExecutionSpecification (#5166) Edward Oakes 2019-07-15 18:15:21 -07:00
  • fd71ffde2f Improve release process 0.7.2 (#5187) Simon Mo 2019-07-15 14:46:54 -07:00
  • ea6aa6409a Reconstruct failed actors without sending tasks. (#5161) Hao Chen 2019-07-16 01:25:09 +08:00
  • 7342117710 Fix a multithreading bug in grpc ClientCall (#5196) Hao Chen 2019-07-15 14:49:53 +08:00
  • 5b13a7eb90 Keep parameter space noise consistent with action space noise (Fix 5173) (#5193) Jones Wong 2019-07-15 03:20:35 +08:00
  • 322b5166ad Update arrow to include user defined status for plasma (#5156) Philipp Moritz 2019-07-12 22:51:14 -07:00
  • f5a87b88a3 Fix: ServerCallFactory's destructor not marked as virtual (#5185) Hao Chen 2019-07-13 09:38:47 +08:00
  • b6509f46b0 Update wheels to 0.8.0dev2 (#5186) Richard Liaw 2019-07-12 17:27:03 -07:00
  • 1530389822 [tune] Fast Node Recovery (#5053) Richard Liaw 2019-07-12 13:47:30 -07:00
  • 0ec3a16bbd Fix Java MultithreadingTest (#5182) Hao Chen 2019-07-12 19:00:13 +08:00
  • f46c555e9e Only get actor ID if actor task (#5180) Stephanie Wang 2019-07-11 23:31:21 -07:00
  • 3b42d5ccb1 Track newly created actor's parent actor (#5098) vipulharsh 2019-07-11 14:52:04 -07:00
  • 3456afdea7 [autoscaler] Fix missing body argument in GCP getIamPolicy #5169 Kristian Hartikainen 2019-07-11 13:03:51 -07:00
  • ccee77aafd fix node_failures.py (#5167) Philipp Moritz 2019-07-11 11:40:13 -07:00
  • 1649f1370e [direct call] changes raylet to push tasks to worker (#5140) Zhijun Fu 2019-07-12 02:01:32 +08:00
  • fd835d107e Move task to common module and add checks in getter methods (#5147) Hao Chen 2019-07-11 17:07:04 +08:00
  • d8b50a5018 Fix GcsClient resource map (#5171) Kai Yang 2019-07-11 16:05:12 +08:00
  • f2293243cc [ID Refactor] Shorten the length of JobID to 4 bytes (#5110) Qing Wang 2019-07-11 14:25:16 +08:00
  • 88365d4112 Fix Java MultithreadingTest (#5170) Hao Chen 2019-07-11 13:40:40 +08:00
  • 43b6513d19 [GCS] Move node resource info from client table to resource table (#5050) Kai Yang 2019-07-11 13:17:19 +08:00
  • 691c9733f9 [tune] Document trainable attributes and enable user-checkpoint… (#4868) Richard Liaw 2019-07-10 18:51:11 -07:00
  • e6a81d40a5 [stability] Make task result for RemoveTask optional (#5146) Philipp Moritz 2019-07-10 13:33:41 -07:00
  • 0c34749779 Use bazel disk cache for all CI jobs (#5144) Hao Chen 2019-07-10 22:03:45 +08:00
  • 0b540ab492 [tune] Test example checkpointing (#4728) Richard Liaw 2019-07-10 01:58:26 -07:00
  • e55c8ca165 Fix crash because of the reference to deleted variable in grpc server call (#5158) Joey Jiang 2019-07-10 14:06:21 +08:00
  • 2b7b7c7547 Add linting pre-push hook (#5154) Edward Oakes 2019-07-09 21:49:12 -07:00
  • 5ab5017c67 [rllib] Fix impala stress test (#5101) Eric Liang 2019-07-09 20:22:30 -07:00
  • 5733690aa6 Add success and fail callback of grpc sending reply (#5141) Joey Jiang 2019-07-09 17:03:57 +08:00
  • 5aec750107 Add warning/error if object store memory exceeds available memory (#4893) Eric Liang 2019-07-08 21:37:08 -07:00
  • dfc94ce7bc [rllib]Add entropy coeff decay (#5043) Stefan Pantic 2019-07-09 03:30:32 +02:00
  • eeb67db861 [autoscaler] Log AWS NodeProvider create_instances (#4998) Daniel Edgecumbe 2019-07-08 21:22:26 +01:00
  • 8a30b93e42 Define common data structures with protobuf. (#5121) Hao Chen 2019-07-08 22:41:37 +08:00
  • b4e51c8aa1 Support clang-format whose version is not 7.0 (#5139) Joey Jiang 2019-07-08 17:15:09 +08:00
  • 7ad854d4c6 [tune] Use traceback.format_tb() (fixes #5135) (#5136) Sam Toyer 2019-07-08 01:13:06 -07:00
  • 274233962f Remove unused connection file in object manager (#5123) Joey Jiang 2019-07-08 10:59:36 +08:00
  • 893744b3be [rllib] Revert "use make template" which seems to break DQN/Atari (#5134) Eric Liang 2019-07-07 19:51:26 -07:00
  • 7e020e7183 [tune] tune.run keep_checkpoints_num (#5117) Morgan Giraud 2019-07-08 02:14:56 +02:00
  • 8f53364097 Improve local_mode (#5060) Edward Oakes 2019-07-08 02:10:50 +02:00
  • 932d6b2517 [rllib] Port IMPALA to ModelV2/build_tf_policy (#5130) Eric Liang 2019-07-07 15:06:41 -07:00
  • 6a14f1a540 [autoscaler] Small fixes for local cluster usability (#4864) Richard Liaw 2019-07-06 21:55:18 -07:00
  • 1798d4f077 [autoscaler] Add hard kill and monitor commands (#5082) Richard Liaw 2019-07-06 21:52:55 -07:00
  • 445bcb29b0 [hotfix] fix backward compat with older yaml libraries Eric Liang 2019-07-06 20:41:28 -07:00
  • c15ed3ac55 [rllib] Shuffle RNN sequences in PPO as well (#5129) Eric Liang 2019-07-06 20:40:49 -07:00
  • c04b69902c Updates for #5072 (#5091) Brandon Bertelsen 2019-07-06 18:05:50 -05:00
  • 0448847a02 Update protobuf version (#5128) Eric Liang 2019-07-06 15:59:55 -07:00
  • 09bde397c9 Multiagent experiment resume (#5102) Aleksei Petrenko 2019-07-06 11:38:17 -07:00
  • e9b88dcbed [wingman -> tune] Add system performance tracking (#4924) Dušan Josipović 2019-07-06 09:57:35 +02:00
  • c3e9d94b18 [tune][minor] Reduce checkpointing frequency (#4859) Richard Liaw 2019-07-06 00:54:24 -07:00
  • 4b56a5eb27 [tune] missing torch.load in mnist_pytorch_trainable.py (#5103) Kim Jeong Ju 2019-07-06 16:14:41 +09:00
  • c5253cc300 Add job table to state API (#5076) Philipp Moritz 2019-07-06 00:05:48 -07:00
  • 53d5a8a45f [tune] Fix sort (#5111) Richard Liaw 2019-07-05 16:05:10 -07:00
  • 4183303a2f Add bazel build options for plasma to use glog (#5108) Joey Jiang 2019-07-05 19:00:19 +08:00
  • 9cc4cc6a52 Fail format.sh if yapf/flake8 versions are incorrect. (#5083) Robert Nishihara 2019-07-04 23:22:01 -07:00
  • 54d5969cea [grpc] Add grpc server to worker (#5054) Zhijun Fu 2019-07-04 20:16:42 +08:00
  • 41a16c55ef [tune] Fixed bug with joining experiment_path twice. (#5106) ztangent 2019-07-04 13:48:08 +08:00
  • 1a543a6571 [serve] add missing __init__.py file under serve/utils (#4609) Patrick 2019-07-04 08:27:59 +08:00
  • 0dbb6c4911 [tune] PBT perturbing after first iteration (#5097) Richard Liaw 2019-07-03 17:27:26 -07:00
  • 34d054ff19 [rllib] ModelV2 API (#4926) Eric Liang 2019-07-03 15:59:47 -07:00
  • 9e0192bc0b [tune] Change the log syncing behavior (#4450) Kristian Hartikainen 2019-07-02 20:46:00 -07:00
  • 71d4637b75 [core worker] Refactor CoreWorker member classes (#5062) Stephanie Wang 2019-07-02 15:30:30 -07:00
  • 1cf7728f35 [Core worker] Serialize ActorHandle in core worker. Make ActorHandle thread safe. (#5034) Kai Yang 2019-07-02 16:48:43 +08:00