NESTED OPENMP PARALLELIZATION OF A HIERARCHICAL DATA CLUSTERING ALGORITHM
Abstract
This paper presents a high performance parallel implementation of a hierarchical data clustering algorithm. The OpenMP programming model, either enhanced with our lightweight runtime support or through its tasking model, deals with the high irregularity of the algorithm and allows for efficient exploitation of the inherent loop-level nested parallelism. Thorough experimental evaluation demonstrates the performance scalability of our parallelization and the effective utilization of computational resources, which results in a clustering approach able to provide high quality clustering of very large datasets.
References
- Patterns Analysis and Applications 4, 108 (2001), DOI: 10.1007/s100440170011. Crossref, ISI, Google Scholar
D. Arlia and M. Coppola , Experiments in Parallel Clustering using DBSCAN, Proceedings of Euro-Par (2001) pp. 326–331. Google Scholar- OpenMP Architecture Review Board. OpenMP specifications. Available at , http://www.openmp.org . Google Scholar
-
M. Dash , S. Petrutiu and P. Scheuermann , Efficient Parallel Hierarchical Clustering , Proceedings of the European Conference on Parallel Computing (EUROPAR 2004) . Google Scholar -
V. V. Dimakopoulos , P. E. Hadjidoukas and G. Ch. Philos , A Microbenchmark Study of OpenMP Overheads Under Nested Parallelism , Proceedings of the 4th International Workshop on OpenMP (IWOMP 2003) . Google Scholar -
V. V. Dimakopoulos , X. Tzoumas and E. Leontiadis , A Portable Compiler for OpenMP v. 2.0 , Proceedings of the 5th European Workshop on OpenMP (EWOMP 2003) . Google Scholar - NeuroComputing 70, 809 (2007). Crossref, ISI, Google Scholar
- International Journal of Supercomputer Applications and High Performance Computing 6(3–4), 159 (1994). Google Scholar
M. Gonzalez , Openmp extensions for thread groups and their runtime support, In Workshop on Languages and Compilers for Parallel Computing (Springer-Verlag, 2000) pp. 317–331. Google Scholar-
S. Guha , R. Rastogi and K. Shim , CURE: An Efficient Clustering Algorithm for Large DataBases , Proceedings of the ACM SIGMOD International Conference on Management of Data . Google Scholar -
P. E. Hadjidoukas and V. V. Dimakopoulos , Nested Parallelism in the OMPi OpenMP/C Compiler , Proceedings of Europar '07 . Google Scholar - [12]J. P. Hoeflinger. Extending OpenMP to Clusters, 2006. White Paper, Intel Corporation . Google Scholar
-
E. Januzaj , H.-P. Kriegel and M. Pfeifle , Towards Effective and Efficient Distributed Clustering , Proceedings of Workshop on Clustering Large Data Sets (ICDM2003) ( 1998 ) . Google Scholar - IEEE Transactions on Pattern Analysis and Machine Intelligence 20, 871 (1998), DOI: 10.1109/34.709614. Crossref, ISI, Google Scholar
- WSEAS Transactions on Information Science and Applications 1, 73 (2004), DOI: 10.1109/34.709614. Google Scholar
-
H. S. Nagesh , S. Goil and A. Choudhary , A Scalable Parallel Subspace Clustering Algorithm for Massive Data Sets , Proceedings of the International Conference on Parallel Processing (ICPP'2000) ( 2000 ) . Google Scholar - Parallel Computing 21, 1313 (1995), DOI: 10.1016/0167-8191(95)00017-I. Crossref, ISI, Google Scholar
- OpenMP Architecture Review Board. Openmp application program interface, version 3.0, May 2008 . Google Scholar
- IEEE Transactions on Knowledge and Data Engineering 15(3), (2003), DOI: 10.1109/TKDE.2003.1198395. Google Scholar
K. Stoffel and A. Belkoniene , Parallel K-Means Clustering for Large Data Sets, Proceedings of EuroPar '99 (1999) pp. 1451–1454. Google Scholar-
Y. Tanaka , Performance Evaluation of OpenMP Applications with Nested Parallelism , Proceedings of the Fifth Workshop on Languages, Compilers and Run-Time Systems for Scalable Computers (LCR '2000) ( 2000 ) . Google Scholar -
S. Thibault , An Efficient OpenMP Runtime System for Hierarchical Architectures , Proceedings of the 3rd International Workshop on OpenMP (IWOMP '2008) ( 2007 ) . Google Scholar - IEEE Transactions on Neural Networks 16(3), 645 (2005), DOI: 10.1109/TNN.2005.845141. Crossref, ISI, Google Scholar
- B. Zang and M. Hsu. Scale Up Center-Based Data Clustering Algorithms by Parallelism. In Technical Report HPL-2000-6, Software Technology Laboratory, HP Laboratories, Palo Alto, January 2000 . Google Scholar


