CLUSTERING ALGORITHMS FOR DETECTING FUNCTIONAL MODULES IN PROTEIN INTERACTION NETWORKS
Abstract
Protein–Protein Interaction (PPI) networks are believed to be important sources of information related to biological processes and complex metabolic functions of the cell. When studying the workings of a biological cell, it is useful to be able to detect known and predict still undiscovered protein complexes within the cell's PPI networks. Such predictions may be used as an inexpensive tool to direct biological experiments. The increasing amount of available PPI data necessitate a fast, accurate approach to biological complex identification. Because of its importance in the studies of protein interaction network, there are different models and algorithms in identifying functional modules in PPI networks. In this paper, we review some representative algorithms, focusing on the algorithms underlying the approaches and how the algorithms relate to each other. In particular, a comparison is given based on the property of the algorithms. Since the PPI network is noisy and still incomplete, some methods which consider other additional properties for preprocessing and purifying of PPI data are presented. We also give a discussion about the functional annotation and validation of protein complexes. Finally, new progress and future research directions are discussed from the computational viewpoint.
References
- Nature 402, C47 (1999), DOI: 10.1038/35011540. Crossref, Medline, Google Scholar
- Nature Reviews Genetics 5(2), 101 (2004), DOI: 10.1038/nrg1272. Crossref, Medline, Google Scholar
- Science 297(5586), 1551 (2002), DOI: 10.1126/science.1073374. Crossref, Medline, Google Scholar
- Proc. Natl. Acad. Sci. USA. 100(3), 1128 (2003), DOI: 10.1073/pnas.0237338100. Crossref, Medline, Google Scholar
- Nucleic. Acids. Res. 30(1), 303 (2002), DOI: 10.1093/nar/30.1.303. Crossref, Medline, Google Scholar
- IEEE Trans. Knowledge and Data Engineering 20(2), 172 (2008), DOI: 10.1109/TKDE.2007.190689. Crossref, Google Scholar
- BMC Bioinformatics 7(1), 207 (2006), DOI: 10.1186/1471-2105-7-207. Crossref, Medline, Google Scholar
- J. Comput. Biol. 12(6), 835 (2005), DOI: 10.1089/cmb.2005.12.835. Crossref, Medline, Google Scholar
- Genome Biology 5(8), R57 (2004), DOI: 10.1186/gb-2004-5-8-r57. Crossref, Medline, Google Scholar
- BMC Bioinformatics 8, 1 (2007), DOI: 10.1186/1471-2105-8-265. Medline, Google Scholar
- Phys. Rev. E. 75, 045102 (2007), DOI: 10.1103/PhysRevE.75.045102. Crossref, Google Scholar
S. Zhang , A graph-theoretic method for mining functional modules in large sparse protein interaction networks, Proceeding of Sixth IEEE International Conference on Data Mining—Workshops (ICDMW'06) (IEEE Computer Society Press, Los Alamitos, 2006) pp. 130–135. Google Scholar- Przulj N, Analyzing Large Biological Networks: Protein–protein interactions example, Ph.D. thesis, University of Toronto, Canada, 2005 . Google Scholar
- Proc. Natl. Acad. Sci. USA. 100(21), 12123 (2003), DOI: 10.1073/pnas.2032324100. Crossref, Medline, Google Scholar
- BMC Bioinformatics 4(1), 2 (2003), DOI: 10.1186/1471-2105-4-2. Crossref, Medline, Google Scholar
- Pac. Symp. Biocomp. 10, 221 (2005). Google Scholar
- Bioinformatics 22(18), 2283 (2006), DOI: 10.1093/bioinformatics/btl370. Crossref, Medline, Google Scholar
- Bioinformatics 20(3), 340 (2004), DOI: 10.1093/bioinformatics/btg415. Crossref, Medline, Google Scholar
- Information Processing Letters 76(4), 175 (2000), DOI: 10.1016/S0020-0190(00)00142-3. Crossref, Google Scholar
- IEEE/ACM Trans. Comput. Biol. Bioinform. 4, 233 (2007), DOI: 10.1109/TCBB.2007.070210. Crossref, Medline, Google Scholar
- Bioinformatics 20(17), 3013 (2004), DOI: 10.1093/bioinformatics/bth351. Crossref, Medline, Google Scholar
- Bioinformatics 21(3), 364 (2005), DOI: 10.1093/bioinformatics/bti021. Crossref, Medline, Google Scholar
- Van Dongen S, Graph clustering by flow simulation, Ph.D. thesis, Centers for Mathematics and Computer Science (CWI), University of Utrecht, 2000 . Google Scholar
- Nucleic. Acids. Res. 30(7), 1575 (2002), DOI: 10.1093/nar/30.7.1575. Crossref, Medline, Google Scholar
- Proteins 54(1), 49 (2004), DOI: 10.1002/prot.10505. Crossref, Medline, Google Scholar
- Proc. Natl. Acad. Sci. USA. 99, 7821 (2002), DOI: 10.1073/pnas.122653799. Crossref, Medline, Google Scholar
- Phys. Rev. E. 69(6), 066133 (2004), DOI: 10.1103/PhysRevE.69.066133. Crossref, Google Scholar
- Proc. Natl. Acad. Sci. USA. 101(1), 2658 (2004), DOI: 10.1073/pnas.0400054101. Crossref, Medline, Google Scholar
- Bioinformatics 23(2), 207 (2007), DOI: 10.1093/bioinformatics/btl562. Crossref, Medline, Google Scholar
- BMC Bioinformatics 6, 39 (2005), DOI: 10.1186/1471-2105-6-39. Crossref, Medline, Google Scholar
- Bioinformatics 22(24), 3106 (2006). Medline, Google Scholar
- BMC Bioinformatics 7, 1 (2006), DOI: 10.1186/1471-2105-7-488. Medline, Google Scholar
- Phys. Rev. Lett. 76, 3251 (1996), DOI: 10.1103/PhysRevLett.76.3251. Crossref, Medline, Google Scholar
- Nature 435(18), 814 (2005), DOI: 10.1038/nature03607. Crossref, Medline, Google Scholar
- Bioinformatics 22(8), 1021 (2006), DOI: 10.1093/bioinformatics/btl039. Crossref, Medline, Google Scholar
- FEBS. J. 272(Suppl. 1), 434 (2005). Google Scholar
- Comput. Biol. Chem. 30(6), 445 (2006), DOI: 10.1016/j.compbiolchem.2006.10.001. Crossref, Medline, Google Scholar
- J. Proteome. Res. 5, 801 (2006), DOI: 10.1021/pr050366g. Crossref, Medline, Google Scholar
- Bioinformatics 23(13), i29 (2007), DOI: 10.1093/bioinformatics/btm212. Crossref, Medline, Google Scholar
- BMC Bioinformatics 8, 482 (2007), DOI: 10.1186/1471-2105-8-482. Crossref, Medline, Google Scholar
- Comput. Syst. Bioinformatics. Conf. 6, 157 (2007), DOI: 10.1142/9781860948732_0019. Medline, Google Scholar
- J. Bioinform. Comput. Biol. 6(3), 435 (2008). Link, Google Scholar
- Int. J. Bioinform. Res. Appl. 3(1), 65 (2007), DOI: 10.1504/IJBRA.2007.011835. Crossref, Medline, Google Scholar
- ICIC2007, Lecture Notes in Artificial Intelligence 1090 (2007). Google Scholar
- Bioinformatics 23(1), 1124 (2007), DOI: 10.1093/bioinformatics/btm064. Crossref, Medline, Google Scholar
- BMC Bioinformatics 8, S5 (2007), DOI: 10.1186/1471-2105-8-S2-S5. Crossref, Medline, Google Scholar
-
W. T. Tutte , Graph Theory ( Cambridge University Press , Cambridge , 2001 ) . Google Scholar - , Knowledge Discovery in Proteomics , eds.
I. Jurisica and D. A. Wigle ( CRC Press , 2005 ) . Google Scholar - Computer Science Review 1(1), 27 (2007). Google Scholar
- Phys. Rev. Lett. 76(18), 3251 (1996), DOI: 10.1103/PhysRevLett.76.3251. Crossref, Medline, Google Scholar
- Proc. Natl. Acad. Sci. USA. 97(22), 12079 (2000), DOI: 10.1073/pnas.210134797. Crossref, Medline, Google Scholar
- Proteins 46, 405 (2002), DOI: 10.1002/prot.1176. Crossref, Medline, Google Scholar
- Nature 415(6868), 141 (2002), DOI: 10.1038/415141a. Crossref, Medline, Google Scholar
- Nucleic. Acids. Res. 32, D41 (2004), DOI: 10.1093/nar/gkh092. Crossref, Medline, Google Scholar
- , Knowledge Discovery in Bioinformatics: Techniques, Methods and Application , eds.
X. Hu and Y. Pan ( Wiley , 2007 ) . Google Scholar P. Pei and A. Zhang , A topological measurement for weighted protein interaction network, Proceedings of the IEEE Computer Society Bioinformatics Conference (2005) pp. 268–278. Google Scholar- Phys. Rev. E. 67, 061901 (2003), DOI: 10.1103/PhysRevE.67.061901. Crossref, Google Scholar
- Phys. Rev. E. 67, 041908 (2003), DOI: 10.1103/PhysRevE.67.041908. Crossref, Google Scholar
- Phys. Rev. E. 69, 066133 (2004), DOI: 10.1103/PhysRevE.69.066133. Crossref, Google Scholar
- Nature 433(7028), 895 (2005), DOI: 10.1038/nature03288. Crossref, Medline, Google Scholar
- Statistics and Computing 17(4), 395 (2007). Crossref, Google Scholar
R. Kannan , S. Vempala and A. Vetta , On clusterings: Good, bad, and spectral, Proceedings of the 41st Annual Symposium on the Foundation of Computer Science (IEEE Computer Society, Silver Spring, MD, 2000) pp. 367–380. Google Scholar- IEEE. Trans. Pattern. Anal. Mach. Intell. 22(8), 888 (2000). Google Scholar
- IEEE. Trans. Neural. Networks. 16(3), 645 (2005), DOI: 10.1109/TNN.2005.845141. Crossref, Medline, Google Scholar
- IEEE. Trans. Pattern. Anal. Mach. Intell. 29(11), 1944 (2007), DOI: 10.1109/TPAMI.2007.1115. Crossref, Medline, Google Scholar
- Dhillon IS, Guan Y, Kulis B, A unified view of kernel k-means, spectral clustering and graph partitioning, Technical Report, UTCS, TR-04-25, 2005 . Google Scholar
- Verma D, Meila M, A comparison of spectral clustering algorithms, Technical Report, Department of CSE University of Washington Seattle, WA, pp. 98195–92350, 2005 . Google Scholar
- Pattern Recognition 41(1), 176 (2008). Crossref, Google Scholar
- Nucleic. Acids. Res. 31(1), 2443 (2003), DOI: 10.1093/nar/gkg340. Crossref, Medline, Google Scholar
- Phys. Rev. E. 74, 036104 (2006), DOI: 10.1103/PhysRevE.74.036104. Crossref, Google Scholar
- Neural Information Processing Systems 19, 1017 (2006). Google Scholar
- Molecular Systems Biology 3(88), 1 (2007), DOI: 10.1038/msb4100129. Google Scholar
- Genome. Res. 14(6), 1170 (2004), DOI: 10.1101/gr.2203804. Crossref, Medline, Google Scholar
- Bioinformatics 19(15), 1869 (2003), DOI: 10.1093/bioinformatics/btg358. Crossref, Medline, Google Scholar
D. D. Wu and X. Hu , An efficient approach to detect a protein community from a seed, IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology (CIBCB2005) (2005) pp. 135–141. Google Scholar- IEEE. Trans. Nanobioscience. 6(1), 43 (2007), DOI: 10.1109/TNB.2007.891900. Crossref, Medline, Google Scholar
A. Fred and A. K. Jain , Data clustering using evidence accumulation, Proceedings of the 16th Int Conference on Pattern Recognition (2002) pp. 276–280. Google ScholarA. Topchy , Analysis of consensus partition in cluster ensemble, Proceedings of the Fourth IEEE International Conference on Data Mining (ICDM'04) (2004) pp. 225–232. Google Scholar- Bioinformatics 23(2), e170 (2007). Medline, Google Scholar
- J. Comput. Biol. 14, 747 (2007), DOI: 10.1089/cmb.2007.R014. Crossref, Medline, Google Scholar
- Proc. Natl. Acad. Sci. USA. 104(4), 1283 (2007), DOI: 10.1073/pnas.0606914104. Crossref, Medline, Google Scholar
- Phys. Rev. E. 75, 045102 (2007), DOI: 10.1103/PhysRevE.75.045102. Crossref, Google Scholar
- Proc. Natl. Acad. Sci. USA. 105(23), 7907 (2008). Medline, Google Scholar
- Proc. Natl. Acad. Sci. USA. 104(18), 7327 (2007), DOI: 10.1073/pnas.0611034104. Crossref, Medline, Google Scholar
- Proc. Natl. Acad. Sci. USA. 105, 1118 (2008). Medline, Google Scholar
- Bioinformatics 24, 250 (2008), DOI: 10.1093/bioinformatics/btn164. Crossref, Medline, Google Scholar
S. Zhang , Prediction of protein complexes based on protein interaction data and functional annotation data using kernel methods, ICIC 2006, LNBI 4115 (2006) pp. 514–524. Google Scholar- New. J. Phys. 9, 180 (2007), DOI: 10.1088/1367-2630/9/6/180. Crossref, Google Scholar
- PLoS Computational Biology 4(7), e1000118 (2008), DOI: 10.1371/journal.pcbi.1000118. Crossref, Medline, Google Scholar
- Proc. Natl. Acad. Sci. USA. 98(8), 4569 (2001), DOI: 10.1073/pnas.061034498. Crossref, Medline, Google Scholar
- Nature 415(6868), 180 (2002), DOI: 10.1038/415180a. Crossref, Medline, Google Scholar
- Nature 425, 686 (2003), DOI: 10.1038/nature02026. Crossref, Medline, Google Scholar
- Curr. Opin. Microbiol. 7, 535 (2004), DOI: 10.1016/j.mib.2004.08.012. Crossref, Medline, Google Scholar
- Nucleic. Acids. Res. 30(20), 31 (2002), DOI: 10.1093/nar/30.1.31. Crossref, Medline, Google Scholar
- Nature 440(7084), 631 (2006), DOI: 10.1038/nature04532. Crossref, Medline, Google Scholar
- Nucleic. Acids. Res. 32(18), 5539 (2004), DOI: 10.1093/nar/gkh894. Crossref, Medline, Google Scholar
- Bioinformatics 22(13), 1623 (2006), DOI: 10.1093/bioinformatics/btl145. Crossref, Medline, Google Scholar


