World Scientific
Batched Computation of the Singular Value Decompositions of Order Two by the AVX-512 Vectorization

    In this paper, a vectorized algorithm is proposed for simultaneously computing up to eight singular value decompositions (SVDs, each of the form A = UΣV*) of real or complex matrices of order two. The algorithm extends to a batch of matrices of arbitrary length n, which arises, for example, in the annihilation part of the parallel Kogbetliantz algorithm for the SVD of matrices of order 2n. The SVD method for a single matrix of order two is derived first. It scales the input matrix A, in most instances error-free, such that the scaled singular values cannot overflow whenever the elements of A are finite. It then computes the URV factorization of the scaled matrix, followed by the SVD of the non-negative upper-triangular middle factor. A vector-friendly data layout for the batch is then introduced, in which the same-indexed elements of each of the input and output matrices form vectors, and the algorithm's steps over such vectors are described. The vectorized approach is shown to be about three times faster than processing each matrix in the batch separately, while slightly improving accuracy over the straightforward method for the 2×2 SVD.
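
    The two ideas in the abstract that lend themselves to a small illustration are the power-of-two prescaling and the batch layout in which same-indexed matrix elements are stored together. The following is a minimal pure-Python sketch of both for real matrices only. It is not the paper's algorithm: instead of the URV factorization it uses a textbook closed-form 2×2 SVD based on sums and differences of the elements, and the "batch" is a plain loop rather than AVX-512 code. The names `svd2x2`, `svd2x2_batch` and `reconstruct`, and the choice to return the scaling exponent separately, are assumptions made for this example.

```python
import math

def svd2x2(a11, a12, a21, a22):
    """SVD of one real 2x2 matrix with finite elements.

    Returns (U, (s1, s2), Vt, e) such that A = 2**e * U * diag(s1, s2) * Vt,
    with s1 >= s2 >= 0 and U, Vt given as row-major 2x2 tuples.
    """
    # Scale by a power of two so the largest element has magnitude below 1.
    # The scaling is exact unless a tiny element underflows.
    e = max(math.frexp(x)[1] for x in (a11, a12, a21, a22))
    a, b, c, d = (math.ldexp(x, -e) for x in (a11, a12, a21, a22))

    # Textbook closed form (an assumption here, not the paper's URV route).
    E, F = (a + d) / 2, (a - d) / 2
    G, H = (c + b) / 2, (c - b) / 2
    Q, R = math.hypot(E, H), math.hypot(F, G)
    s1, s2 = Q + R, Q - R              # s2 may be negative at this point
    a1, a2 = math.atan2(G, F), math.atan2(H, E)
    theta, phi = (a2 - a1) / 2, (a2 + a1) / 2
    cp, sp = math.cos(phi), math.sin(phi)
    ct, st = math.cos(theta), math.sin(theta)
    U = ((cp, -sp), (sp, cp))
    Vt = ((ct, -st), (st, ct))
    if s2 < 0:                         # move the sign into the second row of Vt
        s2 = -s2
        Vt = (Vt[0], (-st, -ct))
    return U, (s1, s2), Vt, e

def svd2x2_batch(A11, A12, A21, A22):
    """Batch layout: four sequences, each holding the same-indexed element
    of every matrix. Each position plays the role of one vector lane."""
    return [svd2x2(*lane) for lane in zip(A11, A12, A21, A22)]

def reconstruct(U, s, Vt, e):
    """Return (a11, a12, a21, a22) of 2**e * U * diag(s) * Vt."""
    (u11, u12), (u21, u22) = U
    (v11, v12), (v21, v22) = Vt
    s1, s2 = s
    return tuple(math.ldexp(x, e) for x in (
        u11 * s1 * v11 + u12 * s2 * v21, u11 * s1 * v12 + u12 * s2 * v22,
        u21 * s1 * v11 + u22 * s2 * v21, u21 * s1 * v12 + u22 * s2 * v22))
```

    Keeping the exponent `e` apart from the scaled singular values means that nothing overflows even for inputs near the largest finite value. The caller decides whether the singular values can safely be scaled back.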

    Supplementary material is available in the https://github.com/venovako/VecKog repository.

    This work is dedicated to the memory of Saša Singer.

    AMSC: 65F15, 65Y05, 65Y10
