Accelerating Data-Parallel Neural Network Training with Weighted-Averaging Reparameterisation

    Recent advances in artificial intelligence have shown a direct correlation between the performance of a neural network and the number of its hidden layers. The Compute Unified Device Architecture (CUDA) framework moves heavy computation from the CPU to the graphics processing unit (GPU) and is widely used to accelerate the training of neural networks. In this paper, we consider the problem of data-parallel neural network training. We compare the performance of training the same neural network on the GPU with and without data parallelism and, when data parallelism is used, compare both the conventional averaging of coefficients and our proposed method. We set out to show that not all sub-networks are equal and should therefore not be treated as equals when normalising weight vectors. The proposed method reached state-of-the-art accuracy faster than conventional training and, in some cases, achieved better classification performance.

    Communicated by Laura Antonelli
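
    The contrast between conventional coefficient averaging and a weighted variant can be sketched as follows. This is a minimal illustration, not the paper's implementation: the abstract does not specify the weighting criterion, so scoring each sub-network by, e.g., its validation accuracy is an assumption made here purely for demonstration.

```python
import numpy as np

def average_weights(weight_sets):
    # Conventional data-parallel synchronisation: the coefficient
    # tensors of all sub-networks are merged with equal weight.
    return [np.mean(np.stack(layer), axis=0) for layer in zip(*weight_sets)]

def weighted_average_weights(weight_sets, scores):
    # Hypothetical weighted variant: sub-networks with higher scores
    # (the scoring criterion is an assumption; the abstract does not
    # specify it) contribute proportionally more to the merged weights.
    w = np.asarray(scores, dtype=float)
    w = w / w.sum()  # normalise so the result is a convex combination
    return [np.tensordot(w, np.stack(layer), axes=1)
            for layer in zip(*weight_sets)]

# Two toy "sub-networks", each with one weight matrix and one bias vector.
net_a = [np.ones((2, 2)), np.zeros(2)]
net_b = [3 * np.ones((2, 2)), np.ones(2)]

plain = average_weights([net_a, net_b])
biased = weighted_average_weights([net_a, net_b], scores=[0.25, 0.75])
```

    Plain averaging gives every sub-network the same influence regardless of quality, whereas the weighted combination pulls the merged coefficients toward the better-performing sub-network.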
