A Parallel Local Search Algorithm for Clustering Large Biological Networks
Abstract
Clustering is an effective technique that can be used to analyze and extract useful information from large biological networks. Popular clustering solutions often require user input for several algorithm options that can seem very arbitrary without experimentation. These algorithms can provide good results in a reasonable time period but they are not above improvements. We present a local search based clustering algorithm free of such required input that can be used to improve the cluster quality of a set of given clusters taken from any existing algorithm or clusters produced via any arbitrary assignment. We implement this local search using a modern GPU based approach to allow for efficient runtime. The proposed algorithm shows promising results for improving the quality of clusters. With already high quality input clusters we can achieve cluster rating improvements upto to 33%.
References
- 1. , “ Knowledge-based analysis of microarray gene expression data by using support vector machines,” Proceedings of the National Academy of Sciences, Vol. 97, No. 1, pp. 262–267, 2000. Google Scholar
- 2. , “ From patterns to pathways: Gene expression data analysis comes of age,” Nature genetics, Vol. 32, p. 502, 2002. Crossref, ISI, Google Scholar
- 3. , , “ Global mapping of the yeast genetic interaction network,” Science, Vol. 303, No. 5659, pp. 808–813, 2004. Crossref, ISI, Google Scholar
- 4. , “ A roadmap of clustering algorithms: Finding a match for a biomedical application,” Briefings in Bioinformatics, Vol. 10, No. 3, pp. 297–314, 2009. Crossref, ISI, Google Scholar
- 5. , “ How does gene expression clustering work?,” Nature Biotechnology, Vol. 23, No. 12, pp. 1499–1502, 2005. Crossref, ISI, Google Scholar
- 6. , “ Impact of heuristics in clustering large biological networks,” Computational Biology and Chemistry, Vol. 59, pp. 28–36, 2015. Crossref, ISI, Google Scholar
- 7. , “ A min-max cut algorithm for graph partitioning and data clustering,” in Proceedings IEEE International Conference on Data Mining, 2001. ICDM 2001, pp. 107–114, IEEE, 2001. Google Scholar
- 8. , “ Microarray data mining using landmark gene-guided clustering,” BMC Bioinformatics, Vol. 9, No. 1, p. 92, 2008. Crossref, Google Scholar
- 9. S. Choudhury, Approximation algorithms for a graph-cut problem with applications to a clustering problem in bioinformatics. PhD thesis, Lethbridge, Alta.: University of Lethbridge, Department of Mathematics and Computer Science, 2008. Google Scholar
- 10. , “ Finding k cuts within twice the optimal,” SIAM Journal on Computing, Vol. 24, No. 1, pp. 101–108, 1995. Crossref, ISI, Google Scholar
- 11. , “ An approximation algorithm for max k-uncut with capacity constraints,” Optimization, Vol. 61, No. 2, pp. 143–150, 2012. Crossref, ISI, Google Scholar
- 12. , “ A local search algorithm for clustering large biological networks,” in Proceedings IEEE International Conference on Bioinformatics and Computational Biology, pp. 73–78, ISCA, 2017. Google Scholar
- 13. , Data Clustering: Algorithms and Applications, CRC Press, 2013. Crossref, Google Scholar
- 14. , “ SPICi: A fast clustering algorithm for large biological networks,” Bioinformatics, Vol. 26, No. 8, pp. 1105–1111, 2010. Crossref, ISI, Google Scholar
- 15. S. M. Van Dongen, “Graph clustering by flow simulation,” 2001. Google Scholar
- 16. , “ Protein complex prediction via cost-based clustering,” Bioinformatics, Vol. 20, No. 17, pp. 3013–3020, 2004. Crossref, ISI, Google Scholar
- 17. , Graph Clustering with Restricted Neighbourhood Search, University of Toronto, Toronto, 2004. Google Scholar
- 18. , “ Evaluation of clustering algorithms for protein-protein interaction networks,” BMC Bioinformatics, Vol. 7, No. 1, p. 488, 2006. Crossref, Google Scholar
- 19. , “ An automated method for finding molecular complexes in large protein interaction networks,” BMC Bioinformatics, Vol. 4, No. 1, p. 2, 2003. Crossref, Google Scholar
- 20. , “ A GPU implementation of parallel constraint-based local search,” in Proceedings of Distributed and Network-Based Processing, pp. 648–655, IEEE, 2014. Google Scholar
- 21. ,
Local Search Algorithms on Graphics Processing Units. A Case Study: The Permutation Perceptron Problem , pp. 264–275. Berlin, Heidelberg: Springer Berlin Heidelberg, 2010. Google Scholar - 22. , “ Hybrid of genetic algorithm and local search to solve MAX-SAT problem using nVidia CUDA framework,” Genetic Programming and Evolvable Machines, Vol. 10, No. 4, p. 391, 2009. Crossref, ISI, Google Scholar
- 23. , “
Parallel GPU implementation of iterated local search for the travelling salesman problem ,” Learning and Intelligent Optimization, pp. 372–377, 2012. Crossref, Google Scholar - 24. , “ A GPU implementation of local search operators for symmetric travelling salesman problem,” PROMET-Traffic&Transportation, Vol. 25, No. 3, pp. 225–234, 2013. Crossref, ISI, Google Scholar
- 25. , “ Parallel local search,” Journal of Heuristics, Vol. 1, pp. 43–65, September 1995. Crossref, Google Scholar
- 26. , et al.., “ The biogrid interaction database: 2008 update,” Nucleic Acids Research, Vol. 36, Suppl. 1, pp. D637–D640, 2008. Google Scholar
- 27. , “ Arrayprospector: A web resource of functional associations inferred from microarray expression data,” Nucleic Acids Research, Vol. 32, Suppl. 2, pp. W445–W448, 2004. Crossref, Google Scholar
- 28. , “ How and when should interactome-derived clusters be used to predict functional modules and protein function?,” Bioinformatics, Vol. 25, No. 23, pp. 3143–3150, 2009. Crossref, ISI, Google Scholar


