Efficient Complex High-Precision Computations on GPUs without Precision Loss

    https://doi.org/10.1142/S0218126617501870
    Cited by: 2 (Source: Crossref)

    General-purpose computing on graphics processing units (GPGPU) is the use of a graphics processing unit (GPU) to perform computations traditionally handled by the central processing unit. Many attempts have been made to implement well-known algorithms on embedded and mobile GPUs. Unfortunately, these algorithms are computationally complex and often require high-precision arithmetic, whereas embedded and mobile GPUs are designed specifically for graphics and are therefore very restrictive in terms of input/output, precision, programming style and available primitives.

    This paper studies how to implement efficient and accurate high-precision algorithms on embedded GPUs using the OpenGL ES language. We discuss the problems that arise during the design phase and detail our implementation choices, focusing on the SIFT and ALP key-point detectors. We transform standard floating-point computations, i.e., single (or double) precision ones, into reduced-precision GPU arithmetic without precision loss. We develop a desktop framework to simulate Gaussian Scale Space transforms on all possible target embedded GPU platforms, with all possible ranges and precisions of arithmetic. We illustrate how to re-engineer standard Gaussian Scale Space computations for mobile multi-core parallel GPUs using the OpenGL ES language. We present experiments on a large set of standard images, showing how efficiency and accuracy can be maintained on different target platforms. In summary, we present a complete framework that minimizes future programming effort by making it easy to check, on different embedded platforms, the accuracy and performance of complex algorithms requiring high-precision computations.
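
    The idea of simulating reduced-precision Gaussian filtering on a desktop can be illustrated with a minimal sketch. This is not the paper's framework: the function names, the 1D test signal, the edge-clamping policy and the mantissa widths (10, 16 and 23 fraction bits) are all assumptions chosen for illustration, and the sketch models only the mantissa width, not the limited exponent range of a real embedded GPU.

```python
import math


def round_to_precision(x, fraction_bits):
    """Round x to the nearest value with `fraction_bits` explicit mantissa bits.

    Hypothetical helper: only the mantissa width is modelled; exponent
    range limits (overflow, underflow) are deliberately ignored.
    """
    if x == 0.0 or not math.isfinite(x):
        return x
    m, e = math.frexp(x)                  # x == m * 2**e with 0.5 <= |m| < 1
    scale = 2.0 ** (fraction_bits + 1)    # implicit leading bit + fraction bits
    return math.ldexp(round(m * scale) / scale, e)


def gaussian_kernel(sigma, radius):
    """Normalised 1D Gaussian kernel with 2*radius + 1 taps."""
    weights = [math.exp(-(i * i) / (2.0 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def convolve(signal, kernel, fraction_bits=None):
    """1D convolution with clamped edges.

    If fraction_bits is None, native double precision is used; otherwise
    every input, product and partial sum is rounded to the given width.
    """
    if fraction_bits is None:
        q = lambda v: v
    else:
        q = lambda v: round_to_precision(v, fraction_bits)
    radius = len(kernel) // 2
    last = len(signal) - 1
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = min(max(i + k - radius, 0), last)   # clamp index to the edge
            acc = q(acc + q(q(w) * q(signal[j])))
        out.append(acc)
    return out


def max_abs_error(signal, kernel, fraction_bits):
    """Largest deviation of the reduced-precision result from double precision."""
    reference = convolve(signal, kernel)
    reduced = convolve(signal, kernel, fraction_bits)
    return max(abs(a - b) for a, b in zip(reference, reduced))


# Smooth synthetic test signal with values in [0, 1] (an assumption, not real image data).
SIGNAL = [0.5 + 0.5 * math.sin(0.1 * i) for i in range(64)]
KERNEL = gaussian_kernel(sigma=1.6, radius=4)

if __name__ == "__main__":
    for bits in (10, 16, 23):
        print(bits, max_abs_error(SIGNAL, KERNEL, bits))
```

    Running the script prints the worst-case deviation from the double-precision result for each simulated mantissa width, which is the kind of accuracy check that can be done before porting a filter to a specific embedded target.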

    This paper was recommended by Regional Editor Kshirasagar Naik.
