From b70f31339c40a6aaff3fc1a67515fa9cf2c5ee9b Mon Sep 17 00:00:00 2001
From: Richard Liaw
Date: Wed, 11 Mar 2020 13:08:27 -0700
Subject: [PATCH] [sgd] Benchmark Fixes (#7553)

* fix

* fix
---
 doc/source/raysgd/raysgd_pytorch.rst          | 16 +++++++-------
 .../sgd/torch/examples/benchmarks/README.rst  | 22 +++++++++----------
 2 files changed, 19 insertions(+), 19 deletions(-)

diff --git a/doc/source/raysgd/raysgd_pytorch.rst b/doc/source/raysgd/raysgd_pytorch.rst
index a7cc457aa..06b0a97ed 100644
--- a/doc/source/raysgd/raysgd_pytorch.rst
+++ b/doc/source/raysgd/raysgd_pytorch.rst
@@ -544,10 +544,10 @@ RaySGD TorchTrainer provides comparable or better performance than other existin
   Number   DataParallel  Ray (PyTorch)  DataParallel  Ray (PyTorch)
   of GPUs                               + Apex        + Apex
   =======  ============  =============  ============  ==============
-  1        2769.7        5143           2962.7        6172
-  2        5492.2        9463           5886.1        10052.8
-  4        10733.4       18807          11705.9       20319.5
-  8        21872.5       36911.8        23317.9       38642
+  1        355.5         356            776           770
+  2        656           701            1303          1346
+  4        1289          1401           2606          2695
+  8        2521          2795           4795          5862

 **Multi-node benchmarks**:

@@ -561,10 +561,10 @@ RaySGD TorchTrainer provides comparable or better performance than other existin
   Number   Horovod  Ray (PyTorch)  Horovod  Ray (PyTorch)
   of GPUs                          + Apex   + Apex
   =======  =======  =============  =======  ==============
-  1 * 8    2769.7   5143           2962.7   6172
-  2 * 8    5492.2   9463           5886.1   10052.8
-  4 * 8    10733.4  18807          11705.9  20319.5
-  8 * 8    21872.5  36911.8        23317.9  38642
+  1 * 8    2769.7   2962.7         5143     6172
+  2 * 8    5492.2   5886.1         9463     10052.8
+  4 * 8    10733.4  11705.9        18807    20319.5
+  8 * 8    21872.5  23317.9        36911.8  38642

diff --git a/python/ray/util/sgd/torch/examples/benchmarks/README.rst b/python/ray/util/sgd/torch/examples/benchmarks/README.rst
index 42aa31f96..0b532dba5 100644
--- a/python/ray/util/sgd/torch/examples/benchmarks/README.rst
+++ b/python/ray/util/sgd/torch/examples/benchmarks/README.rst
@@ -13,7 +13,7 @@ Single Node Results
 Here are benchmarking results comparing the following:

 * torch.nn.DataParallel
-* torch.nn.Parallel with ``apex.amp`` enabled (``O1``)
+* torch.nn.DataParallel with ``apex.amp`` enabled (``O1``)
 * Ray (wrapping Pytorch DistributedDataParallel)
 * Ray (wrapping Pytorch DistributedDataParallel) with ``apex.amp`` enabled (``O1``)

@@ -36,10 +36,10 @@ Framework versions used:
   Number   DataParallel  Ray (PyTorch)  DataParallel  Ray (PyTorch)
   of GPUs                               + Apex        + Apex
   =======  ============  =============  ============  ==============
-  1        2769.7        5143           2962.7        6172
-  2        5492.2        9463           5886.1        10052.8
-  4        10733.4       18807          11705.9       20319.5
-  8        21872.5       36911.8        23317.9       38642
+  1        355.5         356            776           770
+  2        656           701            1303          1346
+  4        1289          1401           2606          2695
+  8        2521          2795           4795          5862

 .. image:: raysgd_multigpu_benchmark.png

@@ -54,8 +54,8 @@ Here are benchmarking results comparing the following:

 * Horovod
 * Horovod with ``apex.amp`` enabled (``O1``)
-* Pytorch DistributedDataParallel
-* Pytorch DistributedDataParallel with ``apex.amp`` enabled (``O1``)
+* Ray (wrapping Pytorch DistributedDataParallel)
+* Ray (wrapping Pytorch DistributedDataParallel) with ``apex.amp`` enabled (``O1``)

 on synthetic ImageNet data (via ``benchmark.py`` and ``horovod_benchmark_apex.py``) as of 03/04/2020.

@@ -77,10 +77,10 @@ Framework versions used:
   Number   Horovod  Ray (PyTorch)  Horovod  Ray (PyTorch)
   of GPUs                          + Apex   + Apex
   =======  =======  =============  =======  ==============
-  1 * 8    2769.7   5143           2962.7   6172
-  2 * 8    5492.2   9463           5886.1   10052.8
-  4 * 8    10733.4  18807          11705.9  20319.5
-  8 * 8    21872.5  36911.8        23317.9  38642
+  1 * 8    2769.7   2962.7         5143     6172
+  2 * 8    5492.2   5886.1         9463     10052.8
+  4 * 8    10733.4  11705.9        18807    20319.5
+  8 * 8    21872.5  23317.9        36911.8  38642

 .. image:: raysgd_multinode_benchmark.png