Simon Mo
f51c26bae6
Revert "[Core]Fix ray.kill doesn't cancel pending actor bug ( #13254 )" ( #14013 )
...
This reverts commit 2092b097ea.
2021-02-09 11:36:38 -08:00
fangfengbin
2092b097ea
[Core]Fix ray.kill doesn't cancel pending actor bug ( #13254 )
2021-02-09 10:59:14 +08:00
DK.Pino
fb89f9c2c8
[Placement Group] Support named placement group ( #13755 )
2021-02-05 11:04:51 +08:00
DK.Pino
7f6d326ad8
[Placement Group]Add detached support for placement group. ( #13582 )
2021-01-27 18:51:26 +08:00
fangfengbin
4a6c53da46
[Core]Fix raylet scheduling bug ( #13452 )
...
* [Core]Fix raylet scheduling bug
* fix lint error
* fix lint error
Co-authored-by: 灵洵 <fengbin.ffb@antgroup.com>
2021-01-14 14:50:32 +01:00
fangfengbin
91878d18b5
[PlacementGroup]Fix placement group wait api disorder bug ( #12827 )
...
* [PlacementGroup]Fix placement group wait api disorder bug
* fix review comment
* fix review comment
* fix review comment
* fix review comments
* increase num_heartbeats_timeout
Co-authored-by: 灵洵 <fengbin.ffb@antgroup.com>
2020-12-16 18:45:53 +08:00
Tao Wang
35f7d84dbe
Revert heartbeat interval to keep ci stable ( #12836 )
...
* Revert heartbeat interval to keep ci stable
* fix missing one
2020-12-14 16:58:40 +08:00
DK.Pino
153b24746c
[Placement Group] Refactor pg resource constrain in node manager ( #12538 )
...
* first version by pointer
* second version reference
* clean up
* add cpp ut
* lint
* extract LocalPlacementGroupManagerInterface
* lint
* fix comment
* add idempotency test
* lint
* fix pg ut
* fix pg ut
* python lint
* fix pg ut timeout
* python lint
* fix comment
* lint
* lint
2020-12-12 23:32:15 -08:00
Eric Liang
bdc6624da8
Revert "[PlacementGroup]Add PlacementGroup wait python api ( #12601 )" ( #12825 )
...
This reverts commit 401d342602.
2020-12-12 12:13:48 -08:00
Tao Wang
295b6e5ce4
Split heartbeat message ( #12535 )
...
* first
* xxx
* Split heartbeat message
* only report resource usage when changed
* Fix GetAllResourceUsage
* Fix report resource usage
* Increase default heartbeat interval
* regularize heartbeat interval in test case
2020-12-11 21:19:57 +08:00
fangfengbin
401d342602
[PlacementGroup]Add PlacementGroup wait python api ( #12601 )
2020-12-07 13:53:49 +08:00
fangfengbin
ff34563539
[PlacementGroup]Fix bug that kill workers mistakenly when gcs restarts ( #12568 )
2020-12-03 17:50:48 +08:00
Tao Wang
e1075c0a82
[GCS]Fill resource fields when re-report heartbeat after gcs restarted ( #12097 )
2020-11-25 11:07:02 +08:00
fangfengbin
f400333841
[Placement Group]Placement Group supports gcs failover(Part2) ( #12003 )
...
* add testcase
* fix ut
* fix review comment
* fix review comment
* fix review comments
* fix ut bug
* add part code
* add part code
* add part code
* add testcase
* add part code
* fix ut bug
* fix ut timeout bug
* fix ut bug
Co-authored-by: 灵洵 <fengbin.ffb@antgroup.com>
2020-11-18 10:59:26 +08:00
fangfengbin
7f050c706b
[PlacementGroup]Skip flaky testcase ( #12065 )
...
Co-authored-by: 灵洵 <fengbin.ffb@antgroup.com>
2020-11-17 12:21:34 -08:00
fangfengbin
8fb926565c
[Placement Group]Placement Group supports gcs failover (Part1) ( #11933 )
2020-11-16 14:42:56 +08:00
SangBin Cho
6e2a1eac36
[Placement Group] Placement group automatic cleanup. ( #11546 )
...
* In progress. Done with all placement group manager code.
* It is working with job.
* Finished detached actor implementation.
* Fix minor issue.
* In progress.
* Addressed code review.
* Addressed code review.
* Addressed code review.
* Fix a build error.
2020-10-30 10:55:43 -07:00
DK.Pino
9f804ade5f
[Placement Group]Add get all placement group api ( #11460 )
...
* add get all interface for placement group
* add get all interface for placement group
* make it work
* fix lint
* fix lint
* fix comment
* add cpp test
* fix python lint
2020-10-23 11:46:48 -07:00
SangBin Cho
666fcde8ca
[Placement group] Input validation ( #11152 )
...
* Add a basic input validation.
* Addressed code review.
2020-10-14 13:56:41 -07:00
SangBin Cho
1e39c40370
[Placement Group] Capture child tasks by default. ( #11025 )
...
* In progress.
* Finished up.
* Improve comment.
* Addressed code review.
* Fix test failure.
* Fix ci failures.
* Fix CI issues.
2020-09-27 19:33:00 -07:00
SangBin Cho
29663d89f1
[Placement Group] Remove warning msg for placement groups. ( #11034 )
...
* Done.
* Addressed code review.
* Fixed typo.
* Addressed code review.
2020-09-25 20:53:42 -07:00
SangBin Cho
5e6b887f2d
[Placement Group] Capture Child Task Part 1 ( #10968 )
...
* In progress.
* In progress.
* Done.
* Addressed code review.
* Increase timeout to make a test less flaky.
* Addressed code review.
* Addressed code review.
2020-09-24 09:02:03 -07:00
SangBin Cho
dcb9e03fde
[Placement Group] Atomic Creation using 2 phase protocol part 2. ( #10599 )
...
* In progress.
* In Progress
* Basic done.
* Fix build issues.
* Addressed code review.
* Change the confusing test name.
* Fix comments.
* Addressed code review.
2020-09-08 13:11:11 -07:00
Eric Liang
8ee7c182f5
[1.0] move placement groups from experimental to util. Note they are still undocumented. ( #10554 )
...
* move files
* Update __init__.py
* remove
* Update __init__.py
2020-09-04 19:01:24 -07:00
SangBin Cho
dc7fe1a4c5
[Placement Group] Atomic Placement Group Part 1, Basic Structure. ( #10482 )
...
* Write a test.
* Basic structure done.
* Reduce flakiness of tests.
* Addressed code review.
* Skipping tests because it is flaky for now.
* Fix linting issues.
* Increase sleep time to see lint messages.
* Lint issue fixed.
2020-09-02 18:14:46 -07:00
Eric Liang
2a204260a8
[api] Second round of 1.0 API changes: exceptions, num_return_vals ( #10377 )
2020-08-28 19:57:02 -07:00
Eric Liang
bd245a1c18
[api] Clean up and document Actor name / lifetime API ( #10332 )
2020-08-27 13:38:39 -07:00
SangBin Cho
3b3ca96a4e
[Placement Group] Wait ( #10259 )
...
* Initial progress done.
* Fix wrong test.
* Improve tests.
* Update code.
* Addressed code review and merge conflict.
* Addressed code review.
2020-08-24 20:14:48 -07:00
fangfengbin
b61a79efd7
[Placement Group]Fix SigSegv bug ( #10262 )
...
* fix SigSegv bug
* fix review comments
* fix ut bug
Co-authored-by: 灵洵 <fengbin.ffb@antfin.com>
2020-08-23 11:33:40 -07:00
fangfengbin
36c6c4b298
[Placement group] Check if placement group bundle index is valid ( #10194 )
...
* add part code
* rebase master
* add java testcase
* fix review comments
* fix lint error
* rebase master
* fix lint error
Co-authored-by: 灵洵 <fengbin.ffb@antfin.com>
2020-08-21 11:04:56 -07:00
fangfengbin
a462ae2747
[Placement Group]Add strict spread strategy ( #10174 )
...
* support STRICT_SPREAD strategy
* fix review comments
* rebase master
* fix lint error
* fix lint error
Co-authored-by: 灵洵 <fengbin.ffb@antfin.com>
2020-08-20 10:18:58 -07:00
SangBin Cho
224933b5e4
[Placement Group] Remove API part 2 ( #10215 )
...
* Initial progress done.
* Fix mistake.
* Addressed code review.
* Fix cpp build issue.
* Addressed code review.
2020-08-20 09:50:13 -07:00
fangfengbin
9734dbca3e
[Placement Group]Reschedule bundles when the node of bundles is dead ( #10021 )
2020-08-19 13:24:42 -07:00
SangBin Cho
263df6163c
[Placement Group] Placement group remove api part 1 ( #10063 )
...
* Added basic rpc calls.
* fix issues.
* Fix the gcs server not getting request issue.
* In Progress.
* Basic logic done. Tests are required.
* In progress.
* In progress in refactoring context.
* Revert "In progress in refactoring context."
This reverts commit 38236256cf1306c60dd203e75d45ceb4509c8106.
* Working now.
* Python test works.
* Lint.
* Addressed code review.
* Addressed code review.
* Lint.
* Added unit tests.
* Done, but one of the unit tests fails
* Addressed code review.
* Addressed the last code review.
* Fix the wrong test case.
2020-08-18 12:44:00 -07:00
SangBin Cho
053188dfbe
[Placement Group] Support Placement Group state table. ( #10090 )
...
* Done.
* Addressed code review.
* Linting.
* Fix lint.
* Fix lint.
* Fix a test.
* Lint.
* Add a lint sleep to test.
* Fix the lint issue.
* Fixed doc build error.
2020-08-17 09:24:50 -07:00
fangfengbin
edd783bc32
[Placement Group]Add soft pack strategy ( #10099 )
2020-08-17 12:01:34 +08:00
Eric Liang
c9f13b0833
[Placement Groups] Support CUDA_VISIBLE_DEVICES ( #10053 )
2020-08-13 18:00:04 -07:00
Eric Liang
7d4f204aa8
[Placement Group] Allow scheduling a task on any bundle (-1, default) ( #9885 )
...
* wip
* wip
* fix tests
* wip
* wip
* wip
* wip
* wip
* add test
* update
* update
* remove debug
* comments
2020-08-06 00:05:21 -07:00
Eric Liang
b73080c85f
Allow tasks to be used with placement groups ( #9738 )
2020-07-31 10:51:37 -07:00
Alisa
51e12ee97c
Python api of placement group ( #9243 )
2020-07-27 14:57:05 -07:00