[xray] raylet scheduling mechanism with a simple spillback policy (#2749)

## What do these changes do?
* distribute load and resource information on a heartbeat
* for each raylet, maintain total and available resource capacity as well as measure of current load
* this PR introduces a new notion of load, defined as a sum of all resource demand induced by queued ready tasks on the local raylet. This provides a heterogeneity-aware measure of load that supersedes legacy Ray's task count as a proxy for load.
* modify the scheduling policy to perform *capacity-based*, *load-aware*, *optimistically concurrent* resource allocation
* perform task spillover to the heartbeating node in response to a heartbeat, implementing  heterogeneity-aware late-binding/work-stealing.
This commit is contained in:
Alexey Tumanov
2018-08-28 00:03:34 -07:00
committed by Robert Nishihara
parent 90ae8f11df
commit de047daea7
17 changed files with 585 additions and 174 deletions
+1 -1
View File
@@ -337,7 +337,7 @@ class Monitor(object):
static_resources[static] = message.ResourcesTotalCapacity(i)
# Update the load metrics for this local scheduler.
client_id = message.ClientId().decode("utf-8")
client_id = ray.utils.binary_to_hex(message.ClientId())
ip = self.local_scheduler_id_to_ip_map.get(client_id)
if ip:
self.load_metrics.update(ip, static_resources, dynamic_resources)