Edward Oakes | 40f77101d5 | 2020-12-20 16:44:20 -08:00
Check for raylet PID as ppid in dashboard agent fate-sharing (#12867)

fangfengbin | 1305f5d4e5 | 2020-12-10 13:46:43 -08:00
[GCS]GCS based Actor Scheduling support actor colocation (#12707)
* [GCS]GCS based Actor Scheduling support actor colocation
* fix review comment
Co-authored-by: 灵洵 <fengbin.ffb@antgroup.com>

fangfengbin | 93c0eb249c | 2020-12-08 10:29:57 +08:00
[PlacementGroup]Support acquire and return bundle resource from gcs resource manager (#12349)

fangfengbin | 7e1422e925 | 2020-12-07 21:50:28 +08:00
[PlacementGroup]Fix placement group strict spread bug when node dead (#12647)
* [PlacementGroup]Fix strict spread bug when node dead
* fix ut bug
Co-authored-by: 灵洵 <fengbin.ffb@antgroup.com>

Philipp Moritz | 73a1a232b9 | 2020-12-06 21:50:18 -08:00
Ray debugger stepping between tasks (#12075)

fangfengbin | 260b07cf0c | 2020-12-05 16:40:04 +08:00
[PlacementGroup]Add PlacementGroup wait java api (#12499)
* add part code
* add part code
* add part code
* add part code
* fix review comments
* fix compile bug
* fix compile bug
* fix review comments
* fix review comments
* fix code style
* add part code
* fix review comments
* fix review comments
* fix code style
* rebase master
* fix bug
* fix lint error
* fix compile bug
* fix newline issue
Co-authored-by: 灵洵 <fengbin.ffb@antgroup.com>

SangBin Cho | 0138c2dbb4 | 2020-12-04 00:06:21 -08:00
[Metrics] Remove redundant unit specification. (#12595)

Kai Yang | 21fcee28f9 | 2020-12-04 14:33:45 +08:00
[Java] Simplify Ray.init() by invoking ray start internally (#10762)

fangfengbin | ff34563539 | 2020-12-03 17:50:48 +08:00
[PlacementGroup]Fix bug that kill workers mistakenly when gcs restarts (#12568)

Stephanie Wang | 443339ab19 | 2020-12-02 13:25:54 -05:00
[core] Move out-of-memory handling into the plasma store and support async object creation (#12186)
* Refactor to extract creation request queue
* timer on oom
* move timer out
* Move evict_if_full and on_store_full into plasma store
* Remove client-side code
* revert
* Distinguish between transient and permanent OOM delays
* update
* Move out create request queue, unit test
* unit test
* Fix max retries
* test
* Do not pin restored objects
* First pass to add polling requests, unit test passes
* worker plasma client retries plasma requests
* cleanup
* Clean up after disconnected clients, check memory leaks
* Support immediate requests in request queue
* Option to try creating immediately
* lint
* Fix build, address comments
* doc
* fixes
* debug travis
* debug
* debug
* debug
* debug
* Revert "debug"
This reverts commit 6bf2f6ee5640e71630c4aecdb7ebf54911ea32db.
Revert "debug"
This reverts commit 73017099c9b06cdaae1217bf0e0f4d23ed68a9e5.
Revert "debug"
This reverts commit 5a155529e28cee9461a598b0cdf7b6a3cc194c93.
Revert "debug"
This reverts commit b50c2101afd45d4cf663daae857bfe1b40387703.
Revert "debug travis"
This reverts commit 012b8721dedf9bca46294ae75eee2815b160368b.
* Skip if new scheduler enabled
* error message
* merge

Ian Rodney | 786f839ff3 | 2020-12-02 09:37:40 -08:00
[Windows] Fix windows build (#12555)
* fix remote watch
* remove const
* unfix remote-watch
* format

Kai Fricke | 0a12eba603 | 2020-12-02 10:20:17 -05:00
Revert "Fix race condition between failure detection and references going out of scope (#12548)" (#12570)
This reverts commit 8801e87a

Stephanie Wang | 8801e87afd | 2020-12-01 20:52:30 -05:00
Fix race condition between failure detection and references going out of scope (#12548)
* fix
* lint

Barak Michener | 6412dfaf38 | 2020-12-01 13:12:08 -08:00
[ray_client] actors v0 (#12388)

SangBin Cho | 0e892908f7 | 2020-12-01 13:10:39 -08:00
[Object Spilling] Delete spilled objects when references are gone out of scope. (#12341)

Simon Mo | f596113fc7 | 2020-12-01 09:35:54 -08:00
[Core] Actor Retries Out of Order Tasks on Restart (#12338)

SangBin Cho | f6f3cc9af1 | 2020-12-01 08:58:36 -08:00
[Core]Remove checkpoint table (#12235)
* Delete an actor entry from node manager.
* Remove checkpoint table
* remote checkpoint interface
* remove checkpoint interface
* fix ExitActorTest
Co-authored-by: chaokunyang <shawn.ck.yang@gmail.com>

Tao Wang | b85c6abc3e | 2020-11-30 14:33:36 +08:00
Rename fields/variables from client id to node id (#12457)

Alex Wu | f1cc33a6a6 | 2020-11-29 20:55:50 -08:00
Actor resource backlog hotfix (#12471)
* prepare implemented
* works?
* deflek
* git
* deflek round 2
* .
* improve the test
Co-authored-by: Alex <alex@anyscale.com>
Co-authored-by: Eric Liang <ekhliang@gmail.com>

Eric Liang | 9ad0f173d6 | 2020-11-27 21:47:46 -08:00
Prestart workers to avoid slow start when multi-tenancy is enabled (#12430)

Eric Liang | 569eee5e71 | 2020-11-27 16:10:38 -08:00
Enable more new scheduler tests (#12421)

fangfengbin | d5215745e4 | 2020-11-26 11:21:58 +08:00
[PlacementGroup] Introduce GcsResourceManager and avoid copying resources when scheduling placement groups (#12253)

SangBin Cho | 2e4e285ef0 | 2020-11-25 10:13:32 -08:00
[Object Spilling] Fusion small objects (#12087)

Tao Wang | 4dd0aa7822 | 2020-11-25 11:40:29 +08:00
[GCS]make thread number of gcs rpc server configurable (#12257)

Tao Wang | 5d47d02f81 | 2020-11-25 11:36:44 +08:00
[GCS]add callback for RegisterSelf api, make it done first (#12252)

Tao Wang | e025b9e788 | 2020-11-25 11:21:24 +08:00
[TEST]Move all WaitReady together (#12254)

Tao Wang | 2af10c1b78 | 2020-11-25 11:18:26 +08:00
[GCS]Add new message ReportResourceUsage (#11848)

Tao Wang | e1075c0a82 | 2020-11-25 11:07:02 +08:00
[GCS]Fill resource fields when re-report heartbeat after gcs restarted (#12097)

fangfengbin | 1d909321c9 | 2020-11-25 11:02:43 +08:00
[PlacementGroup]Fix node manager release unused bundles bug (#12346)

fangfengbin | 5934b20b96 | 2020-11-25 09:45:26 +08:00
[PlacementGroup]Fix destroy bundle resources bug (#12336)
* [PlacementGroup]Fix destroy bundle resources bug
* revert AddBundleLocations code change
* add comment
* fix review comments
Co-authored-by: 灵洵 <fengbin.ffb@antgroup.com>

Lixin Wei | 462c7fb575 | 2020-11-24 10:16:12 -06:00
[streaming] export aligned_ symbols from raylet.so (#12345)

ZhuSenlin | 1ae4d2873a | 2020-11-24 21:11:18 +08:00
[GCS] refactor gcs initialization (#11890)

fangfengbin | be7938ee09 | 2020-11-24 16:57:17 +08:00
[PlacementGroup]Fix AddBundleLocations bug (#12330)
Co-authored-by: 灵洵 <fengbin.ffb@antgroup.com>

dHannasch | 2c4514a2c0 | 2020-11-23 20:39:50 -08:00
[minor] Refactor to expose RedisContext::PingPort (#12022)

fangfengbin | 084f03797b | 2020-11-23 16:57:58 +08:00
[Placement Group]Placement Group supports gcs failover(Part3) (#12036)

Eric Liang | dac09bd569 | 2020-11-19 16:45:37 -08:00
Fix actor_registry_ copied on each heartbeat; Improve receive object chunk debug messages (#12187)

Stephanie Wang | 7bf5145d36 | 2020-11-19 19:08:18 -05:00
Lint plasma source files (#12171)

Eric Liang | de86d5aff7 | 2020-11-19 11:38:44 -08:00
ActorStatisticalData() debug metrics bog down raylet with 100% CPU (#12148)
* comment out bad
* update

SangBin Cho | 7d67af6c2a | 2020-11-19 11:04:26 -08:00
[Metrics] Add stats to measure process startup time + scheduling stats. (#12100)
* Add new stats.
* Fix issues.

Ian Rodney | 7fcce785ed | 2020-11-19 11:00:19 -08:00
[hotfix] Fix windows build (#12146)
* [hotfix] fix windows
* remove debug logs

Ian Rodney | e086ddc18f | 2020-11-18 15:18:40 -08:00
[core] Add Recursive task cancelation (#11923)

Alex Wu | e9c9ba9c9f | 2020-11-18 11:34:19 -08:00
[New Scheduler] Don't start tasks if the owner is dead (#12050)

Ameer Haj Ali | eef624750c | 2020-11-18 11:33:57 -08:00
[ray client] ray wait() implementation (#12072)

dHannasch | b41f4fdec2 | 2020-11-18 00:12:58 -08:00
Extract the connection logic to reduce duplication. (#12016)

fangfengbin | d87af0da88 | 2020-11-18 11:15:38 +08:00
[PlacementGroup]Add gcs placement group manager debug info (#12061)
Co-authored-by: 灵洵 <fengbin.ffb@antgroup.com>

fangfengbin | f400333841 | 2020-11-18 10:59:26 +08:00
[Placement Group]Placement Group supports gcs failover(Part2) (#12003)
* add testcase
* fix ut
* fix review comment
* fix review comment
* fix review comments
* fix ut bug
* add part code
* add part code
* add part code
* add testcase
* add part code
* fix ut bug
* fix ut timeout bug
* fix ut bug
Co-authored-by: 灵洵 <fengbin.ffb@antgroup.com>

Stephanie Wang | f6bdd5ab17 | 2020-11-17 16:13:59 -08:00
[New Scheduler] Spillback from the queue of tasks assigned to the local node (#12084)

dHannasch | b5dfdb2a21 | 2020-11-17 13:11:17 -08:00
Log the Redis shard addresses as originally received from the head GCS. (#12011)

dHannasch | 010e6cef3f | 2020-11-17 13:10:23 -08:00
Allow setting the RAY_BACKEND_LOG_LEVEL to trace. (#12012)

dHannasch | f0dcf01807 | 2020-11-17 13:01:42 -08:00
Clarify that Ray is not yet retrying to connect. (#12013)