Accelerating Data-Parallel Neural Network Training with Weighted-Averaging Reparameterisation
Abstract
Recent advances in artificial intelligence have shown a direct correlation between the performance of a network and the number of hidden layers it contains. The Compute Unified Device Architecture (CUDA) framework moves heavy computation from the CPU to the graphics processing unit (GPU) and is widely used to accelerate the training of neural networks. In this paper, we consider the problem of data-parallel neural network training. We compare the performance of training the same neural network on the GPU with and without data parallelism; when data parallelism is used, we compare both the conventional averaging of coefficients and our proposed method. We set out to show that not all sub-networks are equal and therefore should not be treated as equals when normalising the weight vectors. The proposed method reached state-of-the-art accuracy faster than conventional training and, in some cases, achieved better classification performance.
Communicated by Laura Antonelli
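
To make the abstract's central idea concrete: in data-parallel training, each worker (e.g. one GPU) trains a replica of the network on its own data shard, and the replicas' coefficients are then merged. The minimal NumPy sketch below contrasts the conventional uniform average with a performance-weighted average of sub-network coefficients. It is an illustration only, not the paper's algorithm: the inverse-validation-loss weighting, the toy logistic-regression "sub-networks", and all names in the code are assumptions, since the exact scheme is not specified in this excerpt.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic binary-classification task; the last 200 rows are held out
# for weighting/evaluation and never seen by the workers.
X = rng.normal(size=(1400, 10))
w_true = rng.normal(size=10)
y = (X @ w_true + 0.1 * rng.normal(size=1400) > 0).astype(float)
X_train, y_train, X_val, y_val = X[:1200], y[:1200], X[1200:], y[1200:]
shards = np.array_split(np.arange(1200), 4)  # one shard per simulated worker

def train(Xs, ys, steps=300, lr=0.5):
    """Gradient descent on logistic loss; stands in for one sub-network."""
    w = np.zeros(Xs.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(Xs @ w)))
        w -= lr * Xs.T @ (p - ys) / len(ys)
    return w

def logloss(w, Xs, ys):
    p = np.clip(1.0 / (1.0 + np.exp(-(Xs @ w))), 1e-9, 1 - 1e-9)
    return -np.mean(ys * np.log(p) + (1 - ys) * np.log(1 - p))

# Each worker trains independently on its own shard.
sub_ws = [train(X_train[i], y_train[i]) for i in shards]

# Conventional combination: a uniform average of the coefficients.
w_uniform = np.mean(sub_ws, axis=0)

# Weighted combination: sub-networks that perform better on held-out
# data contribute more. Inverse-validation-loss weights are an assumed
# stand-in for the paper's weighting scheme.
losses = np.array([logloss(w, X_val, y_val) for w in sub_ws])
alphas = (1.0 / losses) / np.sum(1.0 / losses)
w_weighted = np.sum(alphas[:, None] * np.array(sub_ws), axis=0)

print("uniform  average, val loss:", logloss(w_uniform, X_val, y_val))
print("weighted average, val loss:", logloss(w_weighted, X_val, y_val))
```

The design intent the abstract appeals to is visible in the weighting step: when shards are uneven in quality, a uniform merge lets weak replicas dilute the result, whereas normalised per-replica weights let stronger sub-networks dominate the combined coefficient vector.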