World Scientific
  • Search
Skip main navigation

Cookies Notification

We use cookies on this site to enhance your user experience. By continuing to browse the site, you consent to the use of our cookies. Learn More
×
Our website is made possible by displaying certain online content using javascript.
In order to view the full content, please disable your ad blocker or whitelist our website www.worldscientific.com.

System Upgrade on Tue, Oct 25th, 2022 at 2am (EDT)

Existing users will be able to log into the site and access content. However, E-commerce and registration of new users may not be available for up to 12 hours.
For online purchase, please visit us again. Contact us at [email protected] for any enquiries.

DETERMINISTIC SAMPLE SORT FOR GPUS

    We demonstrate that parallel deterministic sample sort for many-core GPUs (GPU BUCKET SORT) is not only considerably faster than the best comparison-based sorting algorithm for GPUs (THRUST MERGE [Satish et.al., Proc. IPDPS 2009]) but also as fast as randomized sample sort for GPUs (GPU SAMPLE SORT [Leischner et.al., Proc. IPDPS 2010]). However, deterministic sample sort has the advantage that bucket sizes are guaranteed and therefore its running time does not have the input data dependent fluctuations that can occur for randomized sample sort.

    Research partially supported by the Natural Sciences and Engineering Research Council of Canada (NSERC) and the IBM Center for Advanced Studies Canada.

    References

    • NVIDIA CUDA Programming Guide. nVIDIA Corporation , www.nvidia.com . Google Scholar
    • NVIDIA GPU Technical Specifications. nVIDIA Corporation , www.nvidia.com . Google Scholar
    • The OpenCL Specification 1.0. Khronos OpenCL Working Group, 2009 . Google Scholar
    • G. Bilardi and A. Nicolau, SIAM J. Comput. 18(2), 216 (1989). Crossref, ISIGoogle Scholar
    • D. Cederman and P. Tsigas, A practical quicksort algorithm for graphics processors, Proc. European Symposium on Algorithms (ESA)5193, LNCS (2008) pp. 246–258. Google Scholar
    • M. Garland. Private communication. nVIDIA Corporation, 2010 . Google Scholar
    • N. Govindarajuet al., GPUTeraSort: high performance graphics co-processor sorting for large database management, Proc. International Conference on Management of Data (SIGMOD) (2006) pp. 325–336. Google Scholar
    • GPGPU.ORG. General-purpose computation on graphics hardware . Google Scholar
    • A.   Greb and G.   Zachmann , GPU-ABiSort: Optimal parallel sorting on stream architectures , Proc. Int'l Parallel and Distributed Processing Symposium (IPDPS) ( 2006 ) . Google Scholar
    • H. Peters, O. Schulz-Hildebrandt and N. Luttenberger, J. Concurrency and Computation: Practice and Experience 23, 681 (2011). Crossref, ISIGoogle Scholar
    • N. Leischner, V. Osipov and P. Sanders, GPU sample sort, Proc. Int'l Parallel and Distributed Processing Symposium (IPDPS) (2010) pp. 1–10. Google Scholar
    • E. Lindholmet al., IEEE Micro 28(2), 39 (2008). Crossref, ISIGoogle Scholar
    • N.   Satish , M.   Harris and M.   Garland , Designing efficient sorting algorithms for many-core GPUs , Proc. Int'l Parallel and Distributed Processing Symposium (IPDPS) ( 2009 ) . Google Scholar
    • Nadathur   Satish et al. , Fast sort on cpus and gpus: a case for bandwidth oblivious simd sort , Proc. International Conference on Management of Data (SIGMOD) ( 2010 ) . Google Scholar
    • H. Shi and J. Schaeffer, J. Par. and Dist. Comp. 14, 362 (1992). Google Scholar
    • E. Sintorn and U. Assarsson, J. of Parallel and Distributed Computing 68(10), 1381 (2008). Crossref, ISIGoogle Scholar