1622 Commits

Author SHA1 Message Date
Tao Wang b85c6abc3e Rename fields/variables from client id to node id (#12457) 2020-11-30 14:33:36 +08:00
Alex Wu f1cc33a6a6 Actor resource backlog hotfix (#12471)
* prepare implemented

* works?

* deflek

* git

* deflek round 2

* .

* improve the test

Co-authored-by: Alex <alex@anyscale.com>
Co-authored-by: Eric Liang <ekhliang@gmail.com>
2020-11-29 20:55:50 -08:00
Eric Liang 9ad0f173d6 Prestart workers to avoid slow start when multi-tenancy is enabled (#12430) 2020-11-27 21:47:46 -08:00
Eric Liang 569eee5e71 Enable more new scheduler tests (#12421) 2020-11-27 16:10:38 -08:00
fangfengbin d5215745e4 [PlacementGroup] Introduce GcsResourceManager and avoid copying resources when scheduling placement groups (#12253) 2020-11-26 11:21:58 +08:00
SangBin Cho 2e4e285ef0 [Object Spilling] Fusion small objects (#12087) 2020-11-25 10:13:32 -08:00
Tao Wang 4dd0aa7822 [GCS]make thread number of gcs rpc server configurable (#12257) 2020-11-25 11:40:29 +08:00
Tao Wang 5d47d02f81 [GCS]add callback for RegisterSelf api, make it done first (#12252) 2020-11-25 11:36:44 +08:00
Tao Wang e025b9e788 [TEST]Move all WaitReady together (#12254) 2020-11-25 11:21:24 +08:00
Tao Wang 2af10c1b78 [GCS]Add new message ReportResourceUsage (#11848) 2020-11-25 11:18:26 +08:00
Tao Wang e1075c0a82 [GCS]Fill resource fields when re-report heartbeat after gcs restarted (#12097) 2020-11-25 11:07:02 +08:00
fangfengbin 1d909321c9 [PlacementGroup]Fix node manager release unused bundles bug (#12346) 2020-11-25 11:02:43 +08:00
fangfengbin 5934b20b96 [PlacementGroup]Fix destroy bundle resources bug (#12336)
* [PlacementGroup]Fix destroy bundle resources bug

* revert AddBundleLocations code change

* add comment

* fix review comments

Co-authored-by: 灵洵 <fengbin.ffb@antgroup.com>
2020-11-25 09:45:26 +08:00
Lixin Wei 462c7fb575 [streaming] export aligned_ symbols from raylet.so (#12345) 2020-11-24 10:16:12 -06:00
ZhuSenlin 1ae4d2873a [GCS] refactor gcs initialization (#11890) 2020-11-24 21:11:18 +08:00
fangfengbin be7938ee09 [PlacementGroup]Fix AddBundleLocations bug (#12330)
Co-authored-by: 灵洵 <fengbin.ffb@antgroup.com>
2020-11-24 16:57:17 +08:00
dHannasch 2c4514a2c0 [minor] Refactor to expose RedisContext::PingPort (#12022) 2020-11-23 20:39:50 -08:00
fangfengbin 084f03797b [Placement Group]Placement Group supports gcs failover(Part3) (#12036) 2020-11-23 16:57:58 +08:00
Eric Liang dac09bd569 Fix actor_registry_ copied on each heartbeat; Improve receive object chunk debug messages (#12187) 2020-11-19 16:45:37 -08:00
Stephanie Wang 7bf5145d36 Lint plasma source files (#12171) 2020-11-19 19:08:18 -05:00
Eric Liang de86d5aff7 ActorStatisticalData() debug metrics bog down raylet with 100% CPU (#12148)
* comment out bad

* update
2020-11-19 11:38:44 -08:00
SangBin Cho 7d67af6c2a [Metrics] Add stats to measure process startup time + scheduling stats. (#12100)
* Add new stats.

* Fix issues.
2020-11-19 11:04:26 -08:00
Ian Rodney 7fcce785ed [hotfix] Fix windows build (#12146)
* [hotfix] fix windows

* remove debug logs
2020-11-19 11:00:19 -08:00
Ian Rodney e086ddc18f [core] Add Recursive task cancelation (#11923) 2020-11-18 15:18:40 -08:00
Alex Wu e9c9ba9c9f [New Scheduler] Don't start tasks if the owner is dead (#12050) 2020-11-18 11:34:19 -08:00
Ameer Haj Ali eef624750c [ray client] ray wait() implementation (#12072) 2020-11-18 11:33:57 -08:00
dHannasch b41f4fdec2 Extract the connection logic to reduce duplication. (#12016) 2020-11-18 00:12:58 -08:00
fangfengbin d87af0da88 [PlacementGroup]Add gcs placement group manager debug info (#12061)
Co-authored-by: 灵洵 <fengbin.ffb@antgroup.com>
2020-11-18 11:15:38 +08:00
fangfengbin f400333841 [Placement Group]Placement Group supports gcs failover(Part2) (#12003)
* add testcase

* fix ut

* fix review comment

* fix review comment

* fix review comments

* fix ut bug

* add part code

* add part code

* add part code

* add testcase

* add part code

* fix ut bug

* fix ut timeout bug

* fix ut bug

Co-authored-by: 灵洵 <fengbin.ffb@antgroup.com>
2020-11-18 10:59:26 +08:00
Stephanie Wang f6bdd5ab17 [New Scheduler] Spillback from the queue of tasks assigned to the local node (#12084) 2020-11-17 16:13:59 -08:00
dHannasch b5dfdb2a21 Log the Redis shard addresses as originally received from the head GCS. (#12011) 2020-11-17 13:11:17 -08:00
dHannasch 010e6cef3f Allow setting the RAY_BACKEND_LOG_LEVEL to trace. (#12012) 2020-11-17 13:10:23 -08:00
dHannasch f0dcf01807 Clarify that Ray is not yet retrying to connect. (#12013) 2020-11-17 13:01:42 -08:00
DK.Pino 0f9e2fec12 [Placement Group] Add get / get all / remove interface for Placement Group Java api. (#11821)
* add placement group java get/get all interface

* add remove placement group api

* fix some issue like: Placement Group -> placement group

* extract duplicate code to placement group utils

* specify running mode for placement group ut

* update checkGlobalStateAccessorPointerValid -> validateGlobalStateAccessorPointer

* use THROW_EXCEPTION_AND_RETURN_IF_NOT_OK

* update pg log print
2020-11-17 12:32:39 +08:00
Tao Wang d525e61288 [GCS]Open light heartbeat by default (#11968)
* [GCS]Open light heartbeat by default (#11689)

* Add some unit tests
2020-11-16 18:21:47 -08:00
Stephanie Wang c49554fb7a Abstract plasma store creation request queue (#12039) 2020-11-16 17:09:15 -08:00
fangfengbin 8fb926565c [Placement Group]Placement Group supports gcs failover (Part1) (#11933) 2020-11-16 14:42:56 +08:00
Gabriele Oliaro 4744ed01f7 Queueing non-actor tasks at the workers (#11051)
* separated adding tasks to queue and executing them (worker side)

* linting

* first review

* second rev

* rev3, all tests passing locally

* linting

* rev4

* linting

* finished rev4, all tests passing locally (mac)

* rev4, all tests passing locally

* linting

* rev5

* bug fix

* hopefully fixed build

* nvm

* ptr cast

* linting

* no special treatment for actor creation tasks
2020-11-12 12:44:13 -05:00
Tao Wang 3fbd8be851 [Placement Group]Do not really subtract resources, just count (#11894)
* [Placement Group]Do not really subtract resources, just count

* add todo
2020-11-12 00:01:19 -08:00
SangBin Cho f80d812799 [Object Spilling] Introduce SpillWorker & RestoreWorker Pool to avoid IO worker deadlock. (#11885) 2020-11-11 18:20:14 -08:00
Tao Wang 92286660e4 [Core] Lazy create node manager clients, and destroy them (#11928) 2020-11-11 08:51:40 -08:00
Siyuan (Ryans) Zhuang b8dda0e3d0 [Serialization] Fix buffer alignment issues (#11888)
* fix buffer alignment issues

* remove unused fields

* aligned memory allocation

* windows compat

* license. fix compiler warnings

* fix compilation error

* reinterpret_cast
2020-11-10 23:44:16 -08:00
dHannasch 29cb32539e [Core] If failed to connect to redis, try to say why. (#11916) 2020-11-10 18:22:10 -08:00
fangfengbin 433e4f32da [GCS]Reduce get operations of worker table (#11599)
* [GCS]Reduce get operations of worker table

* fix ut bug

* fix ut bug

* fix review comment

Co-authored-by: 灵洵 <fengbin.ffb@antfin.com>
2020-11-10 18:11:25 -08:00
Eric Liang 46f3652102 Remove repeat push timeout from object manager (#11874) 2020-11-10 16:26:53 -08:00
fangfengbin 543f7809a6 [GCS]Add gcs dump log(Part1) (#11727)
* add part code

* fix compile bug

* Fix bug

* Add part code

* fix review comment

* fix review comment

* fix lint error

* fix review comment

Co-authored-by: 灵洵 <fengbin.ffb@antfin.com>
Co-authored-by: 灵洵 <fengbin.ffb@antgroup.com>
2020-11-10 14:10:03 +08:00
Eric Liang ee2da0cf45 [Core] PushManager for reliable broadcast (#11869) 2020-11-09 18:01:47 -08:00
Kai Yang 904f48ebd9 [Core] Multi-tenancy: Pass job ID from Raylet to worker via env variable (#11829)
* Pass job ID from Raylet to worker via env variable

* fix

* fix

* fix

* lint

* fix

* fix test_object_spilling

* address comments

* lint

* fix
2020-11-09 11:02:15 -08:00
Tao Wang 77e3163630 [GCS]Only pass node id to node failure detector (#11886)
* [GCS]Only pass node id to node failure detector

* rename
2020-11-09 10:52:33 -08:00
fangfengbin 407a212816 [GCS]Fix TestActorTableResubscribe bug (#11830)
* fix compile bug

* [GCS]Fix TestActorTableResubscribe bug

* rm unused code

* fix lint error

* fix review comment

* fix ut bug

Co-authored-by: 灵洵 <fengbin.ffb@antfin.com>
2020-11-08 23:50:05 -08:00