Commit Graph

6812 Commits

Author SHA1 Message Date
Richard Liaw 50dbf1a307 [core] Support configurable number of "check for redis" attempts (#11902)
Signed-off-by: Richard Liaw <rliaw@berkeley.edu>
2020-11-10 14:57:57 -08:00
Ian Rodney 1d158dda32 [serve] Rename to use replicas, not workers (#11822) 2020-11-10 11:36:15 -08:00
Eric Liang 9b8218aabd [docs] Move all /latest links to /master (#11897)
* use master link

* remae

* revert non-ray

* more

* mre
2020-11-10 10:53:28 -08:00
fangfengbin 543f7809a6 [GCS]Add gcs dump log(Part1) (#11727)
* add part code

* fix compile bug

* Fix bug

* Add part code

* fix review comment

* fix review comment

* fix lint error

* fix review comment

Co-authored-by: 灵洵 <fengbin.ffb@antfin.com>
Co-authored-by: 灵洵 <fengbin.ffb@antgroup.com>
2020-11-10 14:10:03 +08:00
Nikita Vemuri aba9288615 [Autoscaler] Introduce callback system (#11674)
Co-authored-by: Nikita Vemuri <nikitavemuri@Nikitas-MacBook-Pro.local>
Co-authored-by: Xiayue Charles Lin <xcl@anyscale.com>
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2020-11-09 20:03:15 -08:00
Eric Liang ee2da0cf45 [Core] PushManager for reliable broadcast (#11869) 2020-11-09 18:01:47 -08:00
Benjamin Black 1999266bba Updated pettingzoo env to acomidate api changes and fixes (#11873)
* Updated pettingzoo env to acomidate api changes and fixes

* fixed test failure

* fixed linting issue

* fixed test failure
2020-11-09 16:09:49 -08:00
Eric Liang a9cf0141a0 [autoscaler] Fix semantics of request_resources (#11820) 2020-11-09 14:57:40 -08:00
Edward Oakes 1c132f2ff8 [serve] Improve DEBUG logging for understanding perf (#11838) 2020-11-09 14:10:42 -06:00
architkulkarni adcaabcd64 [Serve] Reconfigure backend class at runtime (#11709) 2020-11-09 14:04:51 -06:00
Kai Fricke 287aba6dc3 [tune] schedulers: Add test for context finalization (#11889) 2020-11-09 11:37:05 -08:00
Richard Liaw a09e49ee94 [core] Add retry for reading session name (#11844) 2020-11-09 11:22:50 -08:00
Kai Fricke 88be1ea20b [tune] Handle infinite and NaN values (#11835) 2020-11-09 11:18:31 -08:00
Kai Yang 904f48ebd9 [Core] Multi-tenancy: Pass job ID from Raylet to worker via env variable (#11829)
* Pass job ID from Raylet to worker via env variable

* fix

* fix

* fix

* lint

* fix

* fix test_object_spilling

* address comments

* lint

* fix
2020-11-09 11:02:15 -08:00
Tao Wang 77e3163630 [GCS]Only pass node id to node failure detector (#11886)
* [GCS]Only pass node id to node failure detector

* rename
2020-11-09 10:52:33 -08:00
Max Fitton 368b14a0da Stop dashboard from erroring when an actor does not have a corresponding core worker (#11870) 2020-11-09 11:36:34 -06:00
Edward Oakes 2feba4409c [serve] Fix long running failure test (#11805) 2020-11-09 11:21:03 -06:00
fangfengbin 407a212816 [GCS]Fix TestActorTableResubscribe bug (#11830)
* fix compile bug

* [GCS]Fix TestActorTableResubscribe bug

* rm unused code

* fix lint error

* fix review comment

* fix ut bug

Co-authored-by: 灵洵 <fengbin.ffb@antfin.com>
2020-11-08 23:50:05 -08:00
dHannasch 64ca30c060 [doc] Troubleshooting --dashboard-port (#11816) 2020-11-08 15:53:50 -08:00
Eric Liang 0932320eb3 Move test_joblib back to new_scheduler_broken category (#11872) 2020-11-07 20:08:41 -08:00
Stephanie Wang 61e41257e7 [Object spilling] Queue failed object creation requests until objects have been spilled (#11796)
* Queue creation requests

* Cleanup disconnected clients

* Remove unused

* todo

* FIFO order for create requests, remove warmup for IO workers

* test and lint

* disable test

* lint

* Skip on windows
2020-11-06 18:22:19 -05:00
Amog Kamsetty 900a48c19c [Tune] Better warnings/exceptions for fail_fast='raise' (#11842) 2020-11-06 15:01:55 -08:00
Aaron Miller 045fed5cd2 [examples] comment out rsync_ settings for K8S (#11862) 2020-11-06 14:35:21 -08:00
SangBin Cho e0ecf5d79d Revert "[GCS]Open light heartbeat by default (#11689)" (#11861)
This reverts commit 612ddb2dd1.
2020-11-06 14:34:59 -08:00
Simon Mo 871cde989a Re-Revert: [Serialization] Update CloudPickle to 1.6.0 (#9694) (#11837) 2020-11-06 12:24:36 -08:00
Kishan Sagathiya c5e6c90e1e [Core] Add name of actor in the result of ray.actors() (#11828)
Added name field to `actor_info`

Fixes #11112
2020-11-06 10:45:44 -08:00
bermaker 12ae0f20c6 [Metrics] Fix prometheus configuration doc (#11856) 2020-11-06 10:34:33 -08:00
Eric Liang 6b7a4dfaa0 [rllib] Forgot to pass ioctx to child json readers (#11839)
* fix ioctx

* fix
2020-11-05 22:07:57 -08:00
Philipp Moritz 28e7439cf0 [doc] Add documentation for Ray debugger (#11815)
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2020-11-05 16:25:27 -08:00
Barak Michener 27c810a97e Basic protos for ray client (#11762) 2020-11-05 16:23:54 -08:00
Eric Liang f86c4f992c Fix RAY_ENABLE_NEW_SCHEDULER=1 pytest test_advanced_2.py::test_zero_cpus_actor (#11817) 2020-11-05 16:02:04 -08:00
architkulkarni 347e871409 [Serve] Add dependency management (#11743) 2020-11-05 16:39:37 -06:00
Kai Yang ffc267f94b [Test] Ignore setproctitle for local mode (#11819) 2020-11-05 11:07:34 -08:00
SangBin Cho 3cd1d7f44a [Metrics] Implement basic metrics changes (#11769)
* Implement basic metrics changes

* Addressed code review.

* Fix build issue.

* Fix build issue.
2020-11-05 11:07:05 -08:00
SangBin Cho 049df70289 [OSS] Introduce Stale bot (#11790)
* first iteration.

* Add a newline at the end of yaml

* Addressed code review.

* Addressed review.
2020-11-05 11:02:37 -08:00
Kai Fricke 603accf1c2 [tune] logger refactor part 3: Add ExperimentLogger class (#11749) 2020-11-05 08:55:38 -08:00
Richard Liaw f6717b8b03 [autoscaler] Support empty node list for kill node (#11810) 2020-11-04 22:40:07 -08:00
dHannasch d0f3befd9c Add --redis-shard-ports to the list of ports that need to be open on the head node. (#11808) 2020-11-04 21:26:09 -08:00
Richard Liaw efa07d5403 Revert "Revert "[tune] PB2 (#11466)" (#11795)" (#11812) 2020-11-04 20:47:12 -08:00
Tao Wang 612ddb2dd1 [GCS]Open light heartbeat by default (#11689) 2020-11-05 12:11:00 +08:00
DK.Pino 50110b934c [Placement Group]Enhance create placement group java api (#11702)
* enhance create pg java api

* add state for PlacementGroup

* fix comment

* move default pg

* make default pg name private

* add bundle size and bundle resource size check when placement group create
2020-11-05 09:59:36 +08:00
Eric Liang 69145d6215 [hotfix] Bazel candidates not found due to raising too early 2020-11-04 16:08:51 -08:00
Ian Rodney 22bbbc3171 [wheel] Fix Manylinux2014 Build (#11811) 2020-11-04 14:50:38 -08:00
Amog Kamsetty 92718de40c [SGD] Better support for custom DDP (#11771) 2020-11-04 13:58:51 -08:00
dHannasch 6147b6a1a3 [docs] Note that the printed IP address can be incorrect. (#11804)
* If the head node is on a subnet with NAT, then you will need a different IP address.

* Specify what you are checking firewall settings and network configuration *for*.

* reword following @amogkam

* Give the full error message.
2020-11-04 13:48:03 -08:00
Ameer Haj Ali ebdf8ba3fa [autoscaler] Support legacy cluster configs with the new resource demand scheduler (#11751) 2020-11-04 12:05:48 -08:00
Kai Yang 31598338b3 [Core] Fix ray start failure to due to bug of redis address detection (#11735)
* Fix ray start failure to due redis address detection bug

* Address comment
2020-11-04 12:04:44 -08:00
Alex Wu 53aac55739 [autoscaler] Autoscaler simulator (#11690) 2020-11-04 12:04:11 -08:00
Sven Mika d6c7c7c675 [RLlib] Make sure, DQN torch actions are of type=long before torch.nn.functional.one_hot() op. (#11800) 2020-11-04 18:04:03 +01:00
heng2j 9073e6507c WIP: Update to support the Food Collector environment (#11373)
* Update to support the Food Collector environment 

Recently, I have been trying out ML Agent with Ray and trying to use the Food Collector environment. Since its observation space and action space are not yet defined in unity3d_env.py, I propose this change to add support for Food Collector. I have tried using this env in the [unity3d_env_local example](https://github.com/ray-project/ray/blob/master/rllib/examples/unity3d_env_local.py). Please let me know if this is the proper adjustment. Even though this is only a few lines of code, please let me know how I can make a proper contribution.

* Apply suggestions from code review
2020-11-04 12:29:16 +01:00