af0c1174cd
This includes most of the TF code used for the OSDI experiment.

Perf sanity check on p3.16xl instances:

Overall scaling looks ok, with the multi-node results within 5% of OSDI
final numbers. This seems reasonable given that hugepages are not enabled
here, and the param server shards are placed randomly.

    $ RAY_USE_XRAY=1 ./test_sgd.py --gpu --batch-size=64 --num-workers=N \
        --devices-per-worker=M --strategy=<simple|ps> \
        --warmup --object-store-memory=10000000000

Images per second total

    gpus total                | simple | ps
    ==========================================
    1                         | 218    |
    2 (1 worker)              | 388    |
    4 (1 worker)              | 759    |
    4 (2 workers)             | 176    | 623
    8 (1 worker)              | 985    |
    8 (2 workers)             | 349    | 1031
    16 (2 nodes, 2 workers)   | 600    | 1661
    16 (2 nodes, 4 workers)   | 468    | 1712  <--- OSDI perf was 1817
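
To make the scaling numbers easier to read, here is a small sketch that turns the reported throughputs into scaling efficiency (relative to the 1-GPU run) and into a relative gap against the OSDI figure. The helper names `scaling_efficiency` and `relative_gap` are hypothetical and not part of `test_sgd.py`. The choice of the 1-GPU result as the baseline is an assumption made for illustration. The data are copied from the results above.

```python
# Illustrative only: these helpers are NOT part of test_sgd.py.
# Throughputs (images/s) are copied from the reported results.

SINGLE_GPU = 218    # 1 GPU result, used here as the baseline (assumption)
OSDI_16_GPU = 1817  # OSDI figure quoted next to the 16-GPU, 4-worker run

# (total gpus, configuration, strategy) -> images per second
RESULTS = {
    (4, "2 workers", "simple"): 176,
    (4, "2 workers", "ps"): 623,
    (8, "2 workers", "simple"): 349,
    (8, "2 workers", "ps"): 1031,
    (16, "2 nodes, 2 workers", "simple"): 600,
    (16, "2 nodes, 2 workers", "ps"): 1661,
    (16, "2 nodes, 4 workers", "simple"): 468,
    (16, "2 nodes, 4 workers", "ps"): 1712,
}


def scaling_efficiency(images_per_sec, num_gpus, baseline=SINGLE_GPU):
    """Fraction of ideal linear scaling achieved relative to the baseline."""
    return images_per_sec / (num_gpus * baseline)


def relative_gap(measured, reference):
    """How far `measured` falls short of `reference`, as a fraction."""
    return (reference - measured) / reference


if __name__ == "__main__":
    for (gpus, config, strategy), ips in sorted(RESULTS.items()):
        eff = scaling_efficiency(ips, gpus)
        print(f"{gpus:>2} gpus ({config}), {strategy:>6}: "
              f"{ips:>5} img/s, {eff:.1%} of linear")

    best = RESULTS[(16, "2 nodes, 4 workers", "ps")]
    print(f"gap to OSDI figure: {relative_gap(best, OSDI_16_GPU):.1%}")
```

Running the script prints one line per multi-worker configuration and then the gap between the 16-GPU `ps` result and the quoted OSDI number.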