Machine Learning to Design an Auto-tuning System for the Best Compressed Format Detection for Parallel Sparse Computations

    Many applications in scientific computing process very large sparse matrices on parallel architectures. The work presented in this paper is part of a project whose general aim is to develop an auto-tuning system for selecting the best matrix compression format in the context of high-performance computing. The target smart system automatically selects the best compression format for a given sparse matrix, the numerical method processing this matrix, a parallel programming model and a target architecture. This paper describes the design and implementation of the proposed concept. We consider a case study consisting of a numerical method reduced to the sparse matrix-vector product (SpMV), several compression formats, the data-parallel programming model, and a distributed multi-core platform as the target architecture. This study allows us to extract a set of important novel metrics and parameters relative to the considered programming model. These metrics are used as input to a machine-learning algorithm that predicts the best matrix compression format. An experimental study targeting a distributed multi-core platform and processing random and real-world matrices shows that our system can improve the accuracy of the machine learning by up to 7% on average.
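    The case study above reduces the numerical method to the sparse matrix-vector product over a compressed format. As a minimal illustrative sketch (not the authors' implementation), the following plain-Python code shows SpMV with the widely used CSR (Compressed Sparse Row) format, one of the candidate formats such an auto-tuner would choose among:

```python
def to_csr(dense):
    """Convert a dense matrix (list of rows) to CSR arrays:
    non-zero values, their column indices, and row pointers."""
    values, col_idx, row_ptr = [], [], [0]
    for row in dense:
        for j, v in enumerate(row):
            if v != 0:
                values.append(v)
                col_idx.append(j)
        row_ptr.append(len(values))  # end of this row in `values`
    return values, col_idx, row_ptr

def spmv_csr(values, col_idx, row_ptr, x):
    """Compute y = A @ x with A stored in CSR, touching only non-zeros."""
    n = len(row_ptr) - 1
    y = [0.0] * n
    for i in range(n):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            y[i] += values[k] * x[col_idx[k]]
    return y

A = [[5, 0, 0],
     [0, 0, 3],
     [2, 0, 1]]
vals, cols, ptrs = to_csr(A)
print(spmv_csr(vals, cols, ptrs, [1.0, 2.0, 3.0]))  # -> [5.0, 9.0, 5.0]
```

    An auto-tuner of the kind the paper targets would either benchmark kernels like this one for each candidate format (CSR, COO, ELL, ...) or, as proposed here, predict the winning format from matrix and platform metrics without running them all.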

    Communicated by Sun-Yuan Hsieh
