2019-04-12 Friday - Some Recent ImageNet ML Training Benchmarks

A few notes I've collected on recent ImageNet ML performance benchmark achievements. They help illustrate some architecture considerations, viewed through the lens of infrastructure requirements, when trying to balance the potentially conflicting project constraints of accuracy, cost, and performance.

This post is a placeholder that provides a handy link for sharing these types of examples in the future.

Context:
2019:
  • New Technique Cuts AI Training Time By More Than 60 Percent
    • https://news.ncsu.edu/2019/04/new-technique-cuts-ai-training-time-by-more-than-60-percent/
      • "Adaptive Deep Reuse cut training time for AlexNet by 69 percent; for VGG-19 by 68 percent; and for CifarNet by 63 percent – all without accuracy loss."
      • "The paper, “Adaptive Deep Reuse: Accelerating CNN Training on the Fly,” will be presented at the 35th IEEE International Conference on Data Engineering, being held April 8-11 in Macau SAR, China. The work was done with support from the National Science Foundation under grant numbers CCF-1525609, CNS-1717425 and CCF-1703487." 
  • SenseTime Trains ImageNet/AlexNet In Record 1.5 minutes
    • https://medium.com/syncedreview/sensetime-trains-imagenet-alexnet-in-record-1-5-minutes-e944ab049b2c
      • "Researchers from Beijing-based AI unicorn SenseTime and Nanyang Technological University have trained ImageNet/AlexNet in a record-breaking 1.5 minutes, a significant 2.6 times speedup over the previous record of 4 minutes."
      • "...a single NVIDIA M40 GPU requires 14 days to complete 90-epoch ResNet-50 training"
      • "Researchers used 512 Volta GPUs for ImageNet/AlexNet training and achieved 58.2 percent accuracy in 1.5 minutes, with a corresponding training throughput of 1514.3k images/s and a 410.2 speedup ratio."
      • "The previous record was held by a Tencent Machine Learning (腾讯机智, Jizhi) team, which used 1024 GPUs to train AlexNet on the ImageNet dataset in 4 minutes.
2018:  
  • ImageNet Training in Minutes 
    • https://arxiv.org/abs/1709.05011
      • "Finishing 90-epoch ImageNet-1k training with ResNet-50 on a NVIDIA M40 GPU takes 14 days. This training requires 10^18 single precision operations in total. On the other hand, the world's current fastest supercomputer can finish 2 * 10^17 single precision operations per second (Dongarra et al 2017, this https URL). If we can make full use of the supercomputer for DNN training, we should be able to finish the 90-epoch ResNet-50 training in one minute. However, the current bottleneck for fast DNN training is in the algorithm level. Specifically, the current batch size (e.g. 512) is too small to make efficient use of many processors. For large-scale DNN training, we focus on using large-batch data-parallelism synchronous SGD without losing accuracy in the fixed epochs. The LARS algorithm (You, Gitman, Ginsburg, 2017, arXiv:1708.03888) enables us to scale the batch size to extremely large case (e.g. 32K). We finish the 100-epoch ImageNet training with AlexNet in 11 minutes on 1024 CPUs. About three times faster than Facebook's result (Goyal et al 2017, arXiv:1706.02677), we finish the 90-epoch ImageNet training with ResNet-50 in 20 minutes on 2048 KNLs without losing accuracy. State-of-the-art ImageNet training speed with ResNet-50 is 74.9% top-1 test accuracy in 15 minutes. We got 74.9% top-1 test accuracy in 64 epochs, which only needs 14 minutes. "  
  • Now anyone can train Imagenet in 18 minutes
    • https://www.fast.ai/2018/08/10/fastai-diu-imagenet/ 
      • "...train Imagenet to 93% accuracy in just 18 minutes, using 16 public AWS cloud instances, each with 8 NVIDIA V100 GPUs, running the fastai and PyTorch libraries. This is a new speed record for training Imagenet to this accuracy on publicly available infrastructure, and is 40% faster than Google’s DAWNBench record on their proprietary TPU Pod cluster. Our approach uses the same number of processing units as Google’s benchmark (128) and costs around $40 to run."
  • Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour 
    • https://arxiv.org/pdf/1706.02677.pdf
      • "In this paper, we empirically show that on the ImageNet dataset large minibatches cause optimization difficulties, but when these are addressed the trained networks exhibit good generalization. Specifically, we show no loss of accuracy when training with large minibatch sizes up to 8192 images. To achieve this result, we adopt a hyperparameter-free linear scaling rule for adjusting learning rates as a function of minibatch size and develop a new warmup scheme that overcomes optimization challenges early in training. With these simple techniques, our Caffe2-based system trains ResNet-50 with a minibatch size of 8192 on 256 GPUs in one hour, while matching small minibatch accuracy. Using commodity hardware, our implementation achieves ∼90% scaling efficiency when moving from 8 to 256 GPUs."