Machine Learning to Design an Auto-tuning System for the Best Compressed Format Detection for Parallel Sparse Computations
Abstract
Many applications in scientific computing process very large sparse matrices on parallel architectures. The work presented in this paper is part of a project whose general aim is to develop an auto-tuning system for selecting the best matrix compression format in the context of high-performance computing. The target smart system automatically selects the best compression format for a given sparse matrix, a numerical method processing this matrix, a parallel programming model, and a target architecture. This paper describes the design and implementation of the proposed concept. We consider a case study consisting of a numerical method reduced to the sparse matrix-vector product (SpMV), several compression formats, the data-parallel programming model, and a distributed multi-core platform as the target architecture. This study allows us to extract a set of important novel metrics and parameters relative to the considered programming model. These metrics are used as input to a machine-learning algorithm that predicts the best matrix compression format. An experimental study targeting a distributed multi-core platform and processing random and real-world matrices shows that our system can improve the machine-learning prediction accuracy by up to 7% on average.
Communicated by Sun-Yuan Hsieh
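To make the case study concrete, the SpMV kernel at the heart of the abstract can be sketched for one common compression format, CSR (Compressed Sparse Row) — one of the candidate formats such an auto-tuner chooses among. This is a minimal illustrative sketch, not the paper's actual implementation; all names here are assumptions.

```python
def spmv_csr(values, col_idx, row_ptr, x):
    """Compute y = A @ x where A is stored in CSR form:
    values  -- nonzero entries, row by row
    col_idx -- column index of each nonzero
    row_ptr -- row_ptr[i]..row_ptr[i+1] delimits row i's nonzeros
    """
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        # Accumulate only the stored (nonzero) entries of row i.
        for k in range(row_ptr[i], row_ptr[i + 1]):
            y[i] += values[k] * x[col_idx[k]]
    return y

# Example matrix A = [[1, 0, 2],
#                     [0, 3, 0],
#                     [4, 0, 5]]
values  = [1.0, 2.0, 3.0, 4.0, 5.0]
col_idx = [0, 2, 1, 0, 2]
row_ptr = [0, 2, 3, 5]
print(spmv_csr(values, col_idx, row_ptr, [1.0, 1.0, 1.0]))  # [3.0, 3.0, 9.0]
```

Other formats mentioned in the literature (COO, ELL, hybrid schemes) trade storage overhead against access regularity; which one performs best depends on the matrix's sparsity structure, the programming model, and the target platform — precisely the decision the proposed auto-tuner learns to make.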