DETERMINISTIC SAMPLE SORT FOR GPUS
Abstract
We demonstrate that parallel deterministic sample sort for many-core GPUs (GPU BUCKET SORT) is not only considerably faster than the best comparison-based sorting algorithm for GPUs (THRUST MERGE [Satish et.al., Proc. IPDPS 2009]) but also as fast as randomized sample sort for GPUs (GPU SAMPLE SORT [Leischner et.al., Proc. IPDPS 2010]). However, deterministic sample sort has the advantage that bucket sizes are guaranteed and therefore its running time does not have the input data dependent fluctuations that can occur for randomized sample sort.
Research partially supported by the Natural Sciences and Engineering Research Council of Canada (NSERC) and the IBM Center for Advanced Studies Canada.
References
- NVIDIA CUDA Programming Guide. nVIDIA Corporation , www.nvidia.com . Google Scholar
- NVIDIA GPU Technical Specifications. nVIDIA Corporation , www.nvidia.com . Google Scholar
- The OpenCL Specification 1.0. Khronos OpenCL Working Group, 2009 . Google Scholar
- SIAM J. Comput. 18(2), 216 (1989). Crossref, ISI, Google Scholar
D. Cederman and P. Tsigas , A practical quicksort algorithm for graphics processors, Proc. European Symposium on Algorithms (ESA)5193,LNCS (2008) pp. 246–258. Google Scholar- M. Garland. Private communication. nVIDIA Corporation, 2010 . Google Scholar
N. Govindaraju , GPUTeraSort: high performance graphics co-processor sorting for large database management, Proc. International Conference on Management of Data (SIGMOD) (2006) pp. 325–336. Google Scholar- GPGPU.ORG. General-purpose computation on graphics hardware . Google Scholar
-
A. Greb and G. Zachmann , GPU-ABiSort: Optimal parallel sorting on stream architectures , Proc. Int'l Parallel and Distributed Processing Symposium (IPDPS) ( 2006 ) . Google Scholar - J. Concurrency and Computation: Practice and Experience 23, 681 (2011). Crossref, ISI, Google Scholar
N. Leischner , V. Osipov and P. Sanders , GPU sample sort, Proc. Int'l Parallel and Distributed Processing Symposium (IPDPS) (2010) pp. 1–10. Google Scholar- IEEE Micro 28(2), 39 (2008). Crossref, ISI, Google Scholar
-
N. Satish , M. Harris and M. Garland , Designing efficient sorting algorithms for many-core GPUs , Proc. Int'l Parallel and Distributed Processing Symposium (IPDPS) ( 2009 ) . Google Scholar -
Nadathur Satish , Fast sort on cpus and gpus: a case for bandwidth oblivious simd sort , Proc. International Conference on Management of Data (SIGMOD) ( 2010 ) . Google Scholar - J. Par. and Dist. Comp. 14, 362 (1992). Google Scholar
- J. of Parallel and Distributed Computing 68(10), 1381 (2008). Crossref, ISI, Google Scholar


