World Scientific
  • Search
Skip main navigation

Cookies Notification

We use cookies on this site to enhance your user experience. By continuing to browse the site, you consent to the use of our cookies. Learn More
×
Our website is made possible by displaying certain online content using javascript.
In order to view the full content, please disable your ad blocker or whitelist our website www.worldscientific.com.

System Upgrade on Tue, Oct 25th, 2022 at 2am (EDT)

Existing users will be able to log into the site and access content. However, E-commerce and registration of new users may not be available for up to 12 hours.
For online purchase, please visit us again. Contact us at [email protected] for any enquiries.

HARNESSING THE POWER OF IDLE GPUS FOR ACCELERATION OF BIOLOGICAL SEQUENCE ALIGNMENT

    This paper presents a parallel system capable of accelerating biological sequence alignment on the graphics processing unit (GPU) grid. The GPU grid in this paper is a desktop grid system that utilizes idle GPUs and CPUs in the office and home. Our parallel implementation employs a master-worker paradigm to accelerate an OpenGL-based algorithm that runs on a single GPU. We integrate this implementation into a screensaver-based grid system that detects idle resources on which the alignment code can run. We also show some experimental results comparing our implementation with three different implementations running on a single GPU, a single CPU, or multiple CPUs. As a result, we find that a single non-dedicated GPU can provide us almost the same throughput as two dedicated CPUs in our laboratory environment, where GPU-equipped machines are ordinarily used to develop GPU applications. In a dedicated environment, the GPU-accelerated code achieves five times higher throughput than the CPU-based code. Furthermore, a linear speedup of 30.7X is observed on a 32-node cluster of dedicated GPUs. We also implement a compute unified device architecture (CUDA) based algorithm to demonstrate further acceleration.

    This work was partly supported by JSPS Grant-in-Aid for Scientific Research (A)(20240002), Young Researchers (B)(19700061), and the Global COE Program "in silico medicine" at Osaka University.

    References

    • J. D. Owenset al., Proceedings of the IEEE 96(5), 879 (2008), DOI: 10.1109/JPROC.2008.917757. Crossref, ISIGoogle Scholar
    • J. D. Owenset al., Computer Graphics Forum 26(1), 80 (2007), DOI: 10.1111/j.1467-8659.2007.01012.x. Crossref, ISIGoogle Scholar
    • M. Silbersteinet al., Efficient computation of sum-products on GPUs through software-managed cache, Proc. 22nd ACM Int'l Conf. Supercomputing (ICS'08) (2008) pp. 309–318. Google Scholar
    • W. R. Market al., ACM Trans. Graphics 22(3), 896 (2003), DOI: 10.1145/882262.882362. Crossref, ISIGoogle Scholar
    • R. J.   Rost , OpenGL Shading Language , 2nd edn. ( Addison-Wesley , Reading, MA , 2006 ) . Google Scholar
    • nVIDIA Corporation, "CUDA Programming Guide Version 1.1, " Nov. 2007, http://developer.nvidia.com/cuda/ . Google Scholar
    • AMD, "ATI CTM Guide, " 2006, http://ati.amd.com/companyinfo/researcher/documents/ATI_CTM_Guide.pdf . Google Scholar
    • Khronos OpenCL Working Group, "The OpenCL specification, " 2009, http://www. khronos.org/registry/cl/ . Google Scholar
    • A. Chienet al., J. Parallel and Distributed Computing 63(5), 597 (2003), DOI: 10.1016/S0743-7315(03)00006-6. Crossref, ISIGoogle Scholar
    • The [email protected] Project, "[email protected] distributed computing, " 2008, http://folding.stanford.edu/ . Google Scholar
    • GPUGRID.net, 2008, http://www.gpugrid.net/ . Google Scholar
    • Y. Kotani, F. Ino and K. Hagihara, J. Grid Computing 6(4), 399 (2008), DOI: 10.1007/s10723-008-9099-7. Crossref, ISIGoogle Scholar
    • T. F. Smith and M. S. Waterman, J. Molecular Biology 147, 195 (1981), DOI: 10.1016/0022-2836(81)90087-5. Crossref, ISIGoogle Scholar
    • W. Liuet al., IEEE Trans. Parallel and Distributed Systems 18(9), 1270 (2007). ISIGoogle Scholar
    • Y.   Munekawa , F.   Ino and K.   Hagihara , Design and implementation of the Smith-Waterman algorithm on the CUDA-compatible GPU , Proc. 8th IEEE Int'l Conf. Bioinformatics and Bioengineering (BIBE'08) ( 2008 ) . Google Scholar
    • S. Yamagiwa and L. Sousa, Design and implementation of a stream-based distributed computing platform using graphics processing units, Proc. 4th Int'l Conf. Computing Frontiers (CF'07) (2007) pp. 197–204. Google Scholar
    • Z.   Fan et al. , GPU cluster for high performance computing , Proc. Int'l Conf. High Performance Computing, Networking and Storage (SC'04) ( 2004 ) . Google Scholar
    • M. Strengertet al., CUDASA: Compute unified device and systems architecture, Proc. 7th Eurographics Symp. Parallel Graphics and Visualization (EGPGV'08) (2008) pp. 49–56. Google Scholar
    • S. A. Manavski and G. Valle, BMC Bioinformatics 9(10), (2008), DOI: 10.1186/1471-2105-9-S2-S10. Google Scholar
    • M. Farrar, "Optimizing Smith-Waterman for the cell broadband engine, " 2008, http://farrar.michael.googlepages.com/ . Google Scholar
    • P. Zhang, G. Tan and G. R. Gao, Implementation of the Smith-Waterman algorithm on a reconfigurable supercomputing platform, Proc. 1st Workshop High-performance reconfigurable computing technology and applications (HPRCTA'06) (2007) pp. 39–48. Google Scholar
    • D. Shreineret al., OpenGL Programming Guide, 5th edn. (Addison-Wesley, Reading, MA, 2005). Google Scholar
    • W. R. Pearson, Genomics 11(3), 635 (1991), DOI: 10.1016/0888-7543(91)90071-L. Crossref, ISIGoogle Scholar
    • R. Raman, M. Livny and M. Solomon, Resource management through multilateral matchmaking, Proc. 9th IEEE Int'l Symp. High Performance Distributed Computing (HPDC'00) (2000) pp. 290–291. Google Scholar
    • I. Buck, K. Fatahalian and P. Hanrahan, GPUBench: Evaluating GPU performance for numerical and scientific applications, Proc. 1st ACM Workshop General-Purpose Computing on Graphics Processors (GP '04) (2004) p. C-20. Google Scholar
    • OpenGL Extension Registry, "Gl_ext_framebuffer_object, "2006, http://oss.sgi.com/projects/ogl-sample/registry/EXT/framebuffer_object.txt . Google Scholar
    • A. Bairoch and R. Apweiler, Nucleic Acids Research 25(1), 31 (1997), DOI: 10.1093/nar/25.1.31. Crossref, ISIGoogle Scholar
    • X. Meng and V. Chaudhary, Exploiting multi-level parallelism for homology search using general purpose processors, Proc. 11th Int'l Conf. Parallel and Distributed Systems (ICPADS'05), Volume II Workshops (2005) pp. 331–335. Google Scholar
    • V.   Volkov and J. W.   Demmel , Benchmarking GPUs to tune dense linear algebra , Proc. Int'l Conf. High Performance Computing, Networking and Storage (SC'08) ( 2008 ) . Google Scholar
    • W. chun Feng and K. Cameron, Computer 40(12), 50 (2007), DOI: 10.1109/MC.2007.445. Crossref, ISIGoogle Scholar
    • J. Dean and S. Ghemawat, MapReduce: Simplified data processing on large clusters, Proc. 6th Symp. Operating System Design and Implementation (OSDI'04) (2004) pp. 137–150. Google Scholar
    • N.   Maruyama , A.   Nukada and S.   Matsuoka , Software-based ECC for GPUs , Proc. Symp. Application Accelerators in High Performance Computing (SAAHPC'09) ( 2009 ) . Google Scholar
    • M. Strengertet al., Parallel Computing 31(2), 205 (2005), DOI: 10.1016/j.parco.2005.02.006. Crossref, ISIGoogle Scholar
    • T. Okuyama, F. Ino and K. Hagihara, A task parallel algorithm for computing the costs of all-pairs shortest paths on the CUDA-compatible GPU, Proc. 6th Int'l Symp. Parallel and Distributed Processing and Applications (ISPA '08) (2008) pp. 284–291. Google Scholar