New Optimizations Improve Deep Learning Frameworks For CPUs
https://www.nextplatform.com/2017/10/13/new-optimizations-improve-deep-learning-frameworks-cpus/
"Intel has been reported to claim that processing in BigDL is “orders of magnitude faster than out-of-box open source Caffe, Torch, or TensorFlow on a single-node Xeon processor (i.e., comparable with mainstream GPU).”
2017-08-09
TensorFlow* Optimizations on Modern Intel® Architecture
https://software.intel.com/en-us/articles/tensorflow-optimizations-on-modern-intel-architecture
"TensorFlow benchmarks, with CPU optimizations added, see CPU performance gain as much as 72X"
A paper presented at the 2017 International Conference on Machine Learning (ICML):
- Deep Tensor Convolution on Multicores
- https://arxiv.org/abs/1611.06565
- "...Another important reason to look at CPUs is when batch size is 1, as may be the case in Reinforcement Learning, where it is not worthwhile to move data between CPU and GPU."
- "Deep convolutional neural networks (ConvNets) of 3-dimensional kernels allow
joint modeling of spatiotemporal features. These networks have improved
performance of video and volumetric image analysis, but have been limited in
size due to the low memory ceiling of GPU hardware. Existing CPU
implementations overcome this constraint but are impractically slow. Here we
extend and optimize the faster Winograd-class of convolutional algorithms to
the
N -dimensional case and specifically for CPU hardware. First, we remove the need to manually hand-craft algorithms by exploiting the relaxed constraints and cheap sparse access of CPU memory. Second, we maximize CPU utilization and multicore scalability by transforming data matrices to be cache-aware, integer multiples of AVX vector widths. Treating 2-dimensional ConvNets as a special (and the least beneficial) case of our approach, we demonstrate a 5 to 25-fold improvement in throughput compared to previous state-of-the-art."
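For intuition about the Winograd-class algorithms the paper generalizes, here is a minimal 1-D F(2,3) example in NumPy. This is only the textbook base case, not the paper's N-dimensional, cache-aware implementation; the function name and test data are illustrative. It computes two outputs of a 3-tap correlation with 4 multiplications instead of the 6 a direct computation needs:

```python
import numpy as np

def winograd_f23(d, g):
    """Winograd F(2,3): two outputs of correlating a 4-element
    input tile d with a 3-tap filter g, using 4 multiplies."""
    m1 = (d[0] - d[2]) * g[0]
    m2 = (d[1] + d[2]) * (g[0] + g[1] + g[2]) / 2
    m3 = (d[2] - d[1]) * (g[0] - g[1] + g[2]) / 2
    m4 = (d[1] - d[3]) * g[2]
    return np.array([m1 + m2 + m3, m2 - m3 - m4])

d = np.random.randn(4)  # input tile
g = np.random.randn(3)  # filter
direct = np.array([d[0]*g[0] + d[1]*g[1] + d[2]*g[2],
                   d[1]*g[0] + d[2]*g[1] + d[3]*g[2]])
assert np.allclose(winograd_f23(d, g), direct)
```

In practice the filter-side factors (e.g. (g[0] + g[1] + g[2]) / 2) are precomputed once per filter and reused across all input tiles, which is where the multiply savings pay off; the paper's contribution is deriving such transforms automatically for N dimensions and laying data out in cache-aware, integer multiples of the AVX vector width.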