HARNESSING THE POWER OF IDLE GPUS FOR ACCELERATION OF BIOLOGICAL SEQUENCE ALIGNMENT
Abstract
This paper presents a parallel system capable of accelerating biological sequence alignment on the graphics processing unit (GPU) grid. The GPU grid in this paper is a desktop grid system that utilizes idle GPUs and CPUs in the office and home. Our parallel implementation employs a master-worker paradigm to accelerate an OpenGL-based algorithm that runs on a single GPU. We integrate this implementation into a screensaver-based grid system that detects idle resources on which the alignment code can run. We also show some experimental results comparing our implementation with three different implementations running on a single GPU, a single CPU, or multiple CPUs. As a result, we find that a single non-dedicated GPU can provide us almost the same throughput as two dedicated CPUs in our laboratory environment, where GPU-equipped machines are ordinarily used to develop GPU applications. In a dedicated environment, the GPU-accelerated code achieves five times higher throughput than the CPU-based code. Furthermore, a linear speedup of 30.7X is observed on a 32-node cluster of dedicated GPUs. We also implement a compute unified device architecture (CUDA) based algorithm to demonstrate further acceleration.
This work was partly supported by JSPS Grant-in-Aid for Scientific Research (A)(20240002), Young Researchers (B)(19700061), and the Global COE Program "in silico medicine" at Osaka University.
References
- Proceedings of the IEEE 96(5), 879 (2008), DOI: 10.1109/JPROC.2008.917757. Crossref, ISI, Google Scholar
- Computer Graphics Forum 26(1), 80 (2007), DOI: 10.1111/j.1467-8659.2007.01012.x. Crossref, ISI, Google Scholar
M. Silberstein , Efficient computation of sum-products on GPUs through software-managed cache, Proc. 22nd ACM Int'l Conf. Supercomputing (ICS'08) (2008) pp. 309–318. Google Scholar- ACM Trans. Graphics 22(3), 896 (2003), DOI: 10.1145/882262.882362. Crossref, ISI, Google Scholar
-
R. J. Rost , OpenGL Shading Language , 2nd edn. ( Addison-Wesley , Reading, MA , 2006 ) . Google Scholar - nVIDIA Corporation, "CUDA Programming Guide Version 1.1, " Nov. 2007, http://developer.nvidia.com/cuda/ . Google Scholar
- AMD, "ATI CTM Guide, " 2006, http://ati.amd.com/companyinfo/researcher/documents/ATI_CTM_Guide.pdf . Google Scholar
- Khronos OpenCL Working Group, "The OpenCL specification, " 2009, http://www. khronos.org/registry/cl/ . Google Scholar
- J. Parallel and Distributed Computing 63(5), 597 (2003), DOI: 10.1016/S0743-7315(03)00006-6. Crossref, ISI, Google Scholar
- The [email protected] Project, "[email protected] distributed computing, " 2008, http://folding.stanford.edu/ . Google Scholar
- GPUGRID.net, 2008, http://www.gpugrid.net/ . Google Scholar
- J. Grid Computing 6(4), 399 (2008), DOI: 10.1007/s10723-008-9099-7. Crossref, ISI, Google Scholar
- J. Molecular Biology 147, 195 (1981), DOI: 10.1016/0022-2836(81)90087-5. Crossref, ISI, Google Scholar
- IEEE Trans. Parallel and Distributed Systems 18(9), 1270 (2007). ISI, Google Scholar
-
Y. Munekawa , F. Ino and K. Hagihara , Design and implementation of the Smith-Waterman algorithm on the CUDA-compatible GPU , Proc. 8th IEEE Int'l Conf. Bioinformatics and Bioengineering (BIBE'08) ( 2008 ) . Google Scholar S. Yamagiwa and L. Sousa , Design and implementation of a stream-based distributed computing platform using graphics processing units, Proc. 4th Int'l Conf. Computing Frontiers (CF'07) (2007) pp. 197–204. Google Scholar-
Z. Fan , GPU cluster for high performance computing , Proc. Int'l Conf. High Performance Computing, Networking and Storage (SC'04) ( 2004 ) . Google Scholar M. Strengert , CUDASA: Compute unified device and systems architecture, Proc. 7th Eurographics Symp. Parallel Graphics and Visualization (EGPGV'08) (2008) pp. 49–56. Google Scholar- BMC Bioinformatics 9(10), (2008), DOI: 10.1186/1471-2105-9-S2-S10. Google Scholar
- M. Farrar, "Optimizing Smith-Waterman for the cell broadband engine, " 2008, http://farrar.michael.googlepages.com/ . Google Scholar
P. Zhang , G. Tan and G. R. Gao , Implementation of the Smith-Waterman algorithm on a reconfigurable supercomputing platform, Proc. 1st Workshop High-performance reconfigurable computing technology and applications (HPRCTA'06) (2007) pp. 39–48. Google ScholarD. Shreiner , OpenGL Programming Guide, 5th edn. (Addison-Wesley, Reading, MA, 2005). Google Scholar- Genomics 11(3), 635 (1991), DOI: 10.1016/0888-7543(91)90071-L. Crossref, ISI, Google Scholar
R. Raman , M. Livny and M. Solomon , Resource management through multilateral matchmaking, Proc. 9th IEEE Int'l Symp. High Performance Distributed Computing (HPDC'00) (2000) pp. 290–291. Google ScholarI. Buck , K. Fatahalian and P. Hanrahan , GPUBench: Evaluating GPU performance for numerical and scientific applications, Proc. 1st ACM Workshop General-Purpose Computing on Graphics Processors (GP '04) (2004) p. C-20. Google Scholar- OpenGL Extension Registry, "Gl_ext_framebuffer_object, "2006, http://oss.sgi.com/projects/ogl-sample/registry/EXT/framebuffer_object.txt . Google Scholar
- Nucleic Acids Research 25(1), 31 (1997), DOI: 10.1093/nar/25.1.31. Crossref, ISI, Google Scholar
X. Meng and V. Chaudhary , Exploiting multi-level parallelism for homology search using general purpose processors, Proc. 11th Int'l Conf. Parallel and Distributed Systems (ICPADS'05), Volume II Workshops (2005) pp. 331–335. Google Scholar-
V. Volkov and J. W. Demmel , Benchmarking GPUs to tune dense linear algebra , Proc. Int'l Conf. High Performance Computing, Networking and Storage (SC'08) ( 2008 ) . Google Scholar - Computer 40(12), 50 (2007), DOI: 10.1109/MC.2007.445. Crossref, ISI, Google Scholar
J. Dean and S. Ghemawat , MapReduce: Simplified data processing on large clusters, Proc. 6th Symp. Operating System Design and Implementation (OSDI'04) (2004) pp. 137–150. Google Scholar-
N. Maruyama , A. Nukada and S. Matsuoka , Software-based ECC for GPUs , Proc. Symp. Application Accelerators in High Performance Computing (SAAHPC'09) ( 2009 ) . Google Scholar - Parallel Computing 31(2), 205 (2005), DOI: 10.1016/j.parco.2005.02.006. Crossref, ISI, Google Scholar
T. Okuyama , F. Ino and K. Hagihara , A task parallel algorithm for computing the costs of all-pairs shortest paths on the CUDA-compatible GPU, Proc. 6th Int'l Symp. Parallel and Distributed Processing and Applications (ISPA '08) (2008) pp. 284–291. Google Scholar


