Efficient Complex High-Precision Computations on GPUs without Precision Loss
Abstract
General-purpose computing on graphics processing units is the use of a graphics processing unit (GPU) to perform computations traditionally handled by the central processing unit (CPU). Many attempts have been made to implement well-known algorithms on embedded and mobile GPUs. Unfortunately, these algorithms are computationally complex and often require high-precision arithmetic, whereas embedded and mobile GPUs are designed specifically for graphics and are therefore very restrictive in terms of input/output, precision, programming style, and available primitives.
This paper studies how to implement efficient and accurate high-precision algorithms on embedded GPUs using the OpenGL ES language. We discuss the problems arising during the design phase and detail our implementation choices, focusing on the SIFT and ALP key-point detectors. We transform standard single- (or double-) precision floating-point computations into reduced-precision GPU arithmetic without precision loss. We develop a desktop framework to simulate Gaussian Scale Space transforms on all possible target embedded GPU platforms and with all possible arithmetic ranges and precisions. We illustrate how to re-engineer standard Gaussian Scale Space computations for mobile multi-core parallel GPUs using the OpenGL ES language. We present experiments on a large set of standard images, showing how efficiency and accuracy can be maintained on different target platforms. In summary, we present a complete framework that minimizes future programming effort by making it easy to check, on different embedded platforms, the accuracy and performance of complex algorithms requiring high-precision computations.
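As a rough illustration of what simulating reduced-precision arithmetic on a desktop can look like, the following Python sketch rounds every intermediate result of a one-dimensional Gaussian blur to a chosen number of significand bits and compares the output with a double-precision reference. It is not the framework described in this paper: the function names (`quantize`, `gaussian_kernel`, `blur_1d`, `max_abs_error`), the clamp-to-edge border handling, and the default values of `sigma` and `radius` are assumptions made for the example, and the exponent range is left unrestricted for simplicity.

```python
import math


def quantize(x, significand_bits):
    """Round x to the nearest value with the given number of significand bits.

    Assumption: only the precision is reduced; the exponent range is not limited.
    """
    if x == 0.0 or not math.isfinite(x):
        return x
    m, e = math.frexp(x)  # x == m * 2**e with 0.5 <= |m| < 1
    scale = 2.0 ** significand_bits
    return math.ldexp(round(m * scale) / scale, e)


def gaussian_kernel(sigma, radius):
    """Return a normalized 1D Gaussian kernel with 2 * radius + 1 taps."""
    weights = [math.exp(-(i * i) / (2.0 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def blur_1d(signal, kernel, significand_bits=None):
    """Convolve signal with kernel, optionally rounding every intermediate value."""
    if significand_bits is None:
        q = lambda v: v  # full double precision (reference)
    else:
        q = lambda v: quantize(v, significand_bits)
    radius = len(kernel) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for k, w in enumerate(kernel):
            # Clamp-to-edge border handling (an assumption of this sketch).
            j = min(max(i + k - radius, 0), len(signal) - 1)
            acc = q(acc + q(q(w) * q(signal[j])))
        out.append(acc)
    return out


def max_abs_error(signal, sigma=1.6, radius=4, significand_bits=10):
    """Largest absolute deviation of the reduced-precision blur from the reference."""
    kernel = gaussian_kernel(sigma, radius)
    reference = blur_1d(signal, kernel)
    reduced = blur_1d(signal, kernel, significand_bits)
    return max(abs(a - b) for a, b in zip(reference, reduced))


if __name__ == "__main__":
    test_signal = [float((i * 37) % 256) for i in range(64)]
    for bits in (8, 10, 16, 23):
        print(bits, max_abs_error(test_signal, significand_bits=bits))
```

Sweeping `significand_bits` in this way gives a quick estimate of how much precision one blur stage needs before the error becomes negligible. A real target would also require modelling the limited exponent range and the rounding mode of the specific GPU.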
This paper was recommended by Regional Editor Kshirasagar Naik.