Commit Graph

6280 Commits

Author SHA1 Message Date
Kai Yang ffc267f94b [Test] Ignore setproctitle for local mode (#11819) 2020-11-05 11:07:34 -08:00
SangBin Cho 3cd1d7f44a [Metrics] Implement basic metrics changes (#11769)
* Implement basic metrics changes

* Addressed code review.

* Fix build issue.

* Fix build issue.
2020-11-05 11:07:05 -08:00
SangBin Cho 049df70289 [OSS] Introduce Stale bot (#11790)
* first iteration.

* Add a newline at the end of yaml

* Addressed code review.

* Addressed review.
2020-11-05 11:02:37 -08:00
Kai Fricke 603accf1c2 [tune] logger refactor part 3: Add ExperimentLogger class (#11749) 2020-11-05 08:55:38 -08:00
Richard Liaw f6717b8b03 [autoscaler] Support empty node list for kill node (#11810) 2020-11-04 22:40:07 -08:00
dHannasch d0f3befd9c Add --redis-shard-ports to the list of ports that need to be open on the head node. (#11808) 2020-11-04 21:26:09 -08:00
Richard Liaw efa07d5403 Revert "Revert "[tune] PB2 (#11466)" (#11795)" (#11812) 2020-11-04 20:47:12 -08:00
Tao Wang 612ddb2dd1 [GCS]Open light heartbeat by default (#11689) 2020-11-05 12:11:00 +08:00
DK.Pino 50110b934c [Placement Group]Enhance create placement group java api (#11702)
* enhance create pg java api

* add state for PlacementGroup

* fix comment

* move default pg

* make default pg name private

* add bundle size and bundle resource size check when creating a placement group
2020-11-05 09:59:36 +08:00
Eric Liang 69145d6215 [hotfix] Bazel candidates not found due to raising too early 2020-11-04 16:08:51 -08:00
Ian Rodney 22bbbc3171 [wheel] Fix Manylinux2014 Build (#11811) 2020-11-04 14:50:38 -08:00
Amog Kamsetty 92718de40c [SGD] Better support for custom DDP (#11771) 2020-11-04 13:58:51 -08:00
dHannasch 6147b6a1a3 [docs] Note that the printed IP address can be incorrect. (#11804)
* If the head node is on a subnet with NAT, then you will need a different IP address.

* Specify what you are checking firewall settings and network configuration *for*.

* reword following @amogkam

* Give the full error message.
2020-11-04 13:48:03 -08:00
Ameer Haj Ali ebdf8ba3fa [autoscaler] Support legacy cluster configs with the new resource demand scheduler (#11751) 2020-11-04 12:05:48 -08:00
Kai Yang 31598338b3 [Core] Fix ray start failure due to a bug in redis address detection (#11735)
* Fix ray start failure due to redis address detection bug

* Address comment
2020-11-04 12:04:44 -08:00
Alex Wu 53aac55739 [autoscaler] Autoscaler simulator (#11690) 2020-11-04 12:04:11 -08:00
Sven Mika d6c7c7c675 [RLlib] Make sure DQN torch actions are of type=long before the torch.nn.functional.one_hot() op. (#11800) 2020-11-04 18:04:03 +01:00
heng2j 9073e6507c WIP: Update to support the Food Collector environment (#11373)
* Update to support the Food Collector environment

Recently, I have been trying out ML Agent with Ray and trying to use the Food Collector environment. Since its observation space and action space haven't been defined in unity3d_env.py, I propose this change to add support for Food Collector. I have tried using this env in the [unity3d_env_local example](https://github.com/ray-project/ray/blob/master/rllib/examples/unity3d_env_local.py). Please let me know if this is the proper adjustment. Even though these are just a few lines of code, please let me know how I can make a proper contribution.

* Apply suggestions from code review
2020-11-04 12:29:16 +01:00
Pierre TASSEL 66605cfcbd [RLLib] Random Parametric Trainer (#11366) 2020-11-04 11:12:51 +01:00
mvindiola1 4518fe790f [RLLIB] Convert torch state arrays to tensors during compute log likelihoods (#11708) 2020-11-04 09:33:56 +01:00
Akash Patel b7531fb4f5 [redis-py] change redis-py deprecated hmset usage to hset (#11776) 2020-11-03 22:23:02 -08:00
Amog Kamsetty 7248d5f4ae Revert "[tune] PB2 (#11466)" (#11795)
This reverts commit e7aafd7d24.
2020-11-03 21:05:00 -08:00
Kai Fricke 007634fd1b [tune] logger refactor part 2: Add SyncerCallback (#11748)
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2020-11-03 21:04:40 -08:00
Barak Michener 05c4e3fb2a [build] Build wheels with manylinux2014 (#11621)
* necessary changes

* Split bazel install

* manylinux2014

* change references to manylinux2014

* Fix lint

* port alex's docker build changes

* fix config issue

* remove extra manylinux2010 requirement script

* revert SHA overwrite

* wip

* incompatible_linklibs

* fix nits
2020-11-03 19:36:32 -08:00
Ian Rodney 9527220a86 [serve] Fix Controller Crashes on Win (#11792) 2020-11-03 16:54:16 -08:00
architkulkarni 2ef707e440 Update advanced.rst (#11793) 2020-11-03 16:16:36 -08:00
Sven Mika 5b788ccb13 [RLlib] Trajectory view API (prep PR for switching on by default across all RLlib; plumbing only) (#11717) 2020-11-03 12:53:34 -08:00
Ian Rodney c3074f559c [serve] Split out metadata for checkpointing (#11533) 2020-11-03 12:41:24 -08:00
Philipp Moritz 39ce0eadbe Ray PDB support (#11739) 2020-11-03 09:49:23 -08:00
Stephanie Wang 952b71dc94 Fix windows build (#11786) 2020-11-03 12:38:45 -05:00
Max Fitton d352feadf0 [Dashboard] Memory Page Loading Wheel (#11651)
* Switch memory view loading message over to a loading wheel to make UX less confusing.

* lint

Co-authored-by: Max Fitton <max@semprehealth.com>
2020-11-03 09:37:30 -08:00
Ameer Haj Ali 08e0e8311a [autoscaler] Fixing AWS instance types autofill (#11758) 2020-11-03 09:34:14 -08:00
Kai Fricke f7b19c41e3 [tune] logger refactor part 1: move classes and utilities to own files (#11746)
* [tune] logger refactor part 1: move classes and utilities to own files

* Fix circular dependency

* Remove unneeded pretty print copy

* Apply suggestions from code review
2020-11-03 07:48:09 -08:00
desktable 5af745c90d [RLlib] Implement the SlateQ algorithm (#11450) 2020-11-03 09:52:04 +01:00
Lara Codeca e735add268 [RLlib] Integration with SUMO Simulator (#11710) 2020-11-03 09:45:03 +01:00
Maksim Smolin 0a6d24a727 [cli] Remove the deprecated old_style logging calls (#10776)
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2020-11-02 23:40:18 -08:00
dHannasch e7f7cb29c4 [docs] Show expected terminal output for manual cluster setup (#11752)
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2020-11-02 20:59:14 -08:00
Ian Rodney 6e89702508 [docker] Disable Readme push to avoid errors (#11770) 2020-11-02 19:12:51 -08:00
Max Fitton 3202ff74c2 [Dashboard] Don't show GPU columns if no GPU in cluster (#11704) 2020-11-02 18:07:27 -06:00
Stephanie Wang 0ba777af99 [Object spilling] Add policy to automatically spill objects on OutOfMemory (#11673) 2020-11-02 12:42:02 -08:00
Ameer Haj Ali 8d74a04a42 [autoscaler] Flag flip for resource_demand_scheduler should take into account queue (#11615) 2020-11-02 12:41:22 -08:00
Alex Wu cce91b51bd [docker] Fix docker regex (#11726)
Co-authored-by: Alex Wu <alex@anyscale.com>
2020-11-02 11:23:06 -08:00
Ian Rodney 171e02c684 [serve] re-enable serve-controller-crash test (#11579) 2020-11-02 11:22:09 -08:00
fangfengbin 4a7d0e059d [GCS]Optimize subscription perf (#11669)
* [GCS]Optimize subscription perf

* fix review comment

Co-authored-by: 灵洵 <fengbin.ffb@antfin.com>
2020-11-02 09:46:04 -08:00
dHannasch 8346dedc3a Fix the linter failure. (#11755) 2020-11-02 18:02:15 +01:00
bcahlit 26176ec570 [RLlib] Fix epsilon_greedy on nested_action_spaces only in pytorch (#11453)
* [RLlib] Fix epsilon_greedy on nested_action_spaces only in pytorch

* epsilon_greedy on Continuous action

* format

* Fix error

* fix format

* fix bug

* increase speed

* Update rllib/utils/exploration/epsilon_greedy.py

* Update rllib/utils/exploration/epsilon_greedy.py

* Update rllib/utils/exploration/epsilon_greedy.py

Co-authored-by: Sven Mika <sven@anyscale.io>
2020-11-02 12:22:33 +01:00
Sven Mika 54d85a6c2a [RLlib] Fix RNN learning for tf-eager/tf2.x. (#11720) 2020-11-02 11:18:41 +01:00
Sven Mika bfc4f95e01 [RLlib] Fix test_bc.py test case. (#11722)
* Fix large json test file.

* Fix large json test file.

* WIP.
2020-10-31 00:16:09 -07:00
Eric Liang 48dee789b3 Add random actor placement; fix cancellation callback; update test skips (#11684) 2020-10-30 18:36:35 -07:00
DK.Pino b10871a1f5 [Core]Fix get worker table bug (#11516)
* fix get_worker_table bug

* fix lint

* fix comment

* remove actor table

* fix comment

* fix get alive worker

* remove unused python import
2020-10-30 14:48:29 -07:00