World Scientific
  • Search
Skip main navigation

Cookies Notification

We use cookies on this site to enhance your user experience. By continuing to browse the site, you consent to the use of our cookies. Learn More
×
Our website is made possible by displaying certain online content using javascript.
In order to view the full content, please disable your ad blocker or whitelist our website www.worldscientific.com.

System Upgrade on Tue, Oct 25th, 2022 at 2am (EDT)

Existing users will be able to log into the site and access content. However, E-commerce and registration of new users may not be available for up to 12 hours.
For online purchase, please visit us again. Contact us at [email protected] for any enquiries.

Integer sorting on multicores and GPUs can be realized by a variety of approaches that include variants of distribution-based methods such as radix-sort, comparison-oriented algorithms such as deterministic regular sampling and random sampling parallel sorting, and network-based algorithms such as Batcher’s bitonic sorting algorithm.

In this work we present an experimental study of integer sorting on multicore processors. We have implemented serial and parallel radix-sort for various radixes, deterministic regular oversampling, and random oversampling parallel sorting, including new variants of ours, and also some previously little explored or unexplored variants of bitonic-sort and odd-even transposition sort. The study uses multithreading and multiprocessing parallel programming libraries with the same C language code working under Open MPI, MulticoreBSP, and BSPlib. We first provide some general high-level observations on the performance of these implementations. If we can conclude anything is that accurate prediction of performance by taking into consideration architecture dependent features such as the structure and characteristics of multiple memory hierarchies is difficult and more often than not untenable. To some degree this is affected by the overhead imposed by the high-level library used in the programming effort.

Another objective is to model the performance of these algorithms and their implementations under the MBSP (Multi-memory BSP) model. Despite the limitations mentioned above, we can still draw some reliable conclusions and reason about the performance of these implementations using the MBSP model, thus making MBSP useful and usable.

References

  • 1. K. Batcher, Sorting networks and their applications, in Proc. of the AFIPS Spring Joint Computing Conference (1968), pp. 307–314. Google Scholar
  • 2. G. Baudet and D. Stevenson, Optimal sorting algorithms for parallel computers, IEEE Transactions on Computers C-27(1) (1978) 84–87. CrossrefGoogle Scholar
  • 3. G. E. Blelloch, C. E. Leiserson, B. M. Maggs, C. G. Plaxton, S. J. Smith and M. Zagha, A comparison of sorting algorithms for the connection machine CM-2, in Proc. of the Symposium on Parallel Algorithms and Architectures (SPAA ’91) (Hilton Head, SC, USA, July 21–24) (ACM Press, 1991), pp. 3–16. Google Scholar
  • 4. B. Bramas, A Novel Hybrid Quicksort Algorithm Vectorized using AVX-512 on Intel Skylake, arXiv:1704.08579, 24 Apr 2017 and revised 12 Mar 2018. Google Scholar
  • 5. D. Cederman and P. Tsingas, On sorting and load balancing on GPUs, ACM SIGARCH Computer Architecture News 36(5) (December 2008) 11–18. CrossrefGoogle Scholar
  • 6. Z. Cheng, K. Qi, L. Jun and Yi-Ran Huang, Thread-level parallel algorithm for sorting integer sequence on multi-core computers, in Proc. International Symposium on Parallel Architectures, Algorithms and Programming (Tianjin, China, December 9–11, 2011) (IEEE Press), pp. 37–41, http://doi.ieeecomputersociety.org/10.1109/PAAP.2011.57. Google Scholar
  • 7. T. H. Cormen, C. E. Leiserson, R. L. Rivest and C. Stein, Introduction to Algorithms, 3rd edn. (The MIT Press, 2009). Google Scholar
  • 8. A. Chan, F. Dehne and H. Zaboli, A note on coarse grained parallel integer sorting, Parallel Processing Letters 9(4) (1999) 533–538. LinkGoogle Scholar
  • 9. F. Dehne and H. Zaboli, Deterministic sample sort for GPUs, Parallel Processing Letters 22(3) (2012) 1250008-1–1250008-14. LinkGoogle Scholar
  • 10. F. Dehne and H. Zaboli, Parallel sorting for GPUs, in eds. A. Adamatzky, Emergent Computation, Emergence, Complexity and Computation, Vol. 24 (Springer, 2017), pp. 293–302. CrossrefGoogle Scholar
  • 11. W. D. Frazer and A. C. McKellar, Samplesort: A sampling approach to minimal storage tree sorting, Journal of the ACM 17(3) (1970) 496–507. Crossref, ISIGoogle Scholar
  • 12. B. A. Garber, Dan Hoeflinger, Xiaoming Li, M. Jesus Garzaran and D. Padua, Automatic generation of a parallel sorting algorithm, in IEEE International Symposium on Parallel and Distributed Processing (April 14–18, 2008), pp. 1–5. Google Scholar
  • 13. A. V. Gerbessiotis and C. J. Siniolakis, Deterministic sorting and randomized median finding on the BSP model, in Proc. of the 8th Annual ACM Symposium on Parallel Algorithms and Architectures (Padua, Italy, June 1996), pp. 223–232. Google Scholar
  • 14. A. V. Gerbessiotis and C. J. Siniolakis, An experimental study of BSP sorting algorithms, in Proc. of 6th Euromicro Workshop on Parallel and Distributed Processing (Madrid, Spain, January 1998) (IEEE Computer Society Press). Google Scholar
  • 15. A. V. Gerbessiotis and C. J. Siniolakis, Efficient deterministic sorting on the BSP model, Parallel Processing Letters 9(1) (1999) 69–79. LinkGoogle Scholar
  • 16. A. V. Gerbessiotis and L. G. Valiant, Direct bulk-synchronous algorithms, Journal of Parallel and Distributed Computing 22 (1994) 251–267. Crossref, ISIGoogle Scholar
  • 17. A. V. Gerbessiotis, Extending the BSP model for multi-core and out-of-core computing: MBSP, Parallel Computing 41 (2015) 90–102. CrossrefGoogle Scholar
  • 18. A. V. Gerbessiotis, http://web.njit.edu/~alexg/cluster/software.html (2018). Google Scholar
  • 19. B. Goglin, Managing the topology of heterogeneous cluster nodes with hardware locality (hwloc), in 2014 International Conference on High Performance Computing & Simulation (HPCS) (Bologna, 2014), pp. 74–81. https://doi.org/10.1109/HPCSim.2014.6903671 Google Scholar
  • 20. D. B. Skillicorn, J. M. D. Hill and W. F. McColl, Questions and answers about BSP, Scientific Programming 6 (1997) 249–274. CrossrefGoogle Scholar
  • 21. C. A. R. Hoare, Quicksort, The Computer Journal 5 (1962) 10–15. Crossref, ISIGoogle Scholar
  • 22. J. S. Huang and Y. C. Chow, Parallel sorting and data partitioning by sampling, in IEEE Computer Society’s Seventh International Computer Software and Applications Conference (November 1983), pp. 627–631. Google Scholar
  • 23. H. Inoue, T. Moriyama, H. Komatsu and T. Nakatani, A high-performance sorting algorithm for multicore single-instruction multiple-data processors, Softw. Pract. Exper. 42 (2012) 753–777. https://doi.org/10.1002/spe.1102 CrossrefGoogle Scholar
  • 24. M. F. Ionescu and K. E. Schauser, Optimizing parallel bitonic sort, in Proc. IEEE Parallel Processing Symposium (Geneva, Switzerland) (IEEE, 1997), pp. 303–309. Google Scholar
  • 25. D. E. Knuth, The Art of Computer Programming, Vol. III: Sorting and Searching (Addison-Wesley, Reading, 1973). Google Scholar
  • 26. D. Langr, P. Tvrdik and I. Simecek, AQsort: Scalable multi-array in-place sorting with OpenMP, Scalable Computing: Practice and Experience 17(4) (2016) 369–391. CrossrefGoogle Scholar
  • 27. F. T. Leighton, Introduction to Parallel Algorithms and Architectures: Arrays, Trees, Hypercubes (Morgan Kaufmann, California, 1991). Google Scholar
  • 28. D. Nassimi and S. Sahni, Parallel permutation and sorting algorithms and a new generalized connection network, Journal of the ACM 29(3) (July 1982) 642–667. CrossrefGoogle Scholar
  • 29. A. Maus, A full parallel radix sorting algorithm for multicore processors, in Proc. Norsk Informatikkonferanse (NIK 2011) (2011), pp. 37–48. Google Scholar
  • 30. R. L. Graham, T. S. Woodall and J. M. Squyres, Open MPI: A flexible high performance MPI, in Proc. 6th Annual International Conference on Parallel Processing and Applied Mathematics (Poznan, Poland, September 2005) [Springer Verlag Lecture Series in Computer Science, Lecture Notes in Computer Science, Vol. 3911, pp. 228–239]. Google Scholar
  • 31. H. Peters, O. Schulz-Hildebrandt and N. Luttenberger, Fast in-place, comparison-based sorting with CUDA: A study with bitonic sort, Concurrency Computat.: Pract. Exper. 23 (2011) 681–693. https://doi.org/10.1002/cpe.1686 Crossref, ISIGoogle Scholar
  • 32. S. Rathi, Optimizing sorting algorithms using ubiquitous multi-core massively parallel GPGPU processors, in Proc. 7th Int. Conference on Communication, Computing, and Vizualization (2016) [Procedia Computer Science 79 (2016) 231–237]. Google Scholar
  • 33. H. J. Reif and L. G. Valiant, A logarithmic time sort for linear size networks, Journal of the ACM 34 (January 1987) 60–76. Crossref, ISIGoogle Scholar
  • 34. R. Reischuk, Probabilistic parallel algorithms for sorting and selection SIAM Journal on Computing 14(2) (1985) 396–409. Crossref, ISIGoogle Scholar
  • 35. H. Shi and J. Schaeffer, Parallel sorting by regular sampling, Journal of Parallel and Distributed Computing 14 (1992) 362–372. Crossref, ISIGoogle Scholar
  • 36. A. Snavely, L. Carrington, N. Wolter, J. Labarta, R. Badia and A. Purkayastha, A framework for performance modeling and prediction, in Proc. of the 2002 ACM/IEEE Conference on Supercomputing (SC ’02) (IEEE Computer Society Press, Los Alamitos, CA, USA, 2002), pp. 1–17. Google Scholar
  • 37. L. G. Valiant, A bridging model for parallel computation Comm. of the ACM 33(8) (August 1990) 103–111. Crossref, ISIGoogle Scholar
  • 38. A. N. Yzelman, R. H. Bisseling, D. Roose and K. Meerbergen, MulticoreBSP for C: A high-performance library for shared-memory parallel programming, International Journal of Parallel Programming 42(4) (August 2014) 619–642. CrossrefGoogle Scholar
  • 39. L. G. Valiant, A bridging model for multi-core computing Journal of Computer and System Sciences 77(1) (2011) 154–166. CrossrefGoogle Scholar