GI-Cluster: Detecting genomic islands via consensus clustering on multiple features
Abstract
Accurate detection of genomic islands (GIs) in microbial genomes is important for both evolutionary studies and medical research, because GIs may promote genome evolution and contain genes involved in pathogenesis. Various computational methods have been developed over the years to predict GIs. However, most of them do not make full use of GI-associated features, which limits their performance. Additionally, many methods cannot be directly applied to newly sequenced genomes. We develop a new method called GI-Cluster, which provides an effective way to integrate multiple GI-related features via consensus clustering. GI-Cluster requires neither training datasets nor existing genome annotations, yet it achieves performance comparable to or better than that of supervised learning methods in comprehensive evaluations. Moreover, GI-Cluster is widely applicable: it can be applied to complete or incomplete genomes as well as to initial GI predictions from other programs. GI-Cluster also provides plots to visualize the distribution of predicted GIs and related features. GI-Cluster is available at https://github.com/icelu/GI_Cluster.
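To make the idea of integrating several features by consensus clustering concrete, the following is a minimal sketch and not GI-Cluster's actual implementation. It assumes that each genome segment already has one numeric score per feature (higher meaning more GI-like), clusters each feature separately into two groups with a plain one-dimensional k-means, and then labels a segment as a candidate GI when a majority of features agree. The feature names, the example values, the helper names `two_means_1d` and `consensus_labels`, and the majority-vote rule are all illustrative assumptions.

```python
from statistics import mean


def two_means_1d(values, iters=50):
    """Cluster 1-D values into two groups with plain k-means (k=2).

    Returns one 0/1 label per value; 1 marks the cluster with the higher mean.
    Hypothetical helper for illustration only.
    """
    lo, hi = min(values), max(values)
    if lo == hi:
        # No variation in this feature: nothing stands out.
        return [0] * len(values)
    for _ in range(iters):
        # Assign each value to the nearer of the two centroids.
        labels = [1 if abs(v - hi) < abs(v - lo) else 0 for v in values]
        new_lo = mean(v for v, lab in zip(values, labels) if lab == 0)
        new_hi = mean(v for v, lab in zip(values, labels) if lab == 1)
        if (new_lo, new_hi) == (lo, hi):
            break
        lo, hi = new_lo, new_hi
    return labels


def consensus_labels(feature_table, min_votes=None):
    """Combine per-feature clusterings by voting.

    feature_table maps a feature name to a list of per-segment scores.
    A segment is labeled 1 when at least min_votes features place it in
    their high-scoring cluster (default: strict majority).
    """
    per_feature = [two_means_1d(scores) for scores in feature_table.values()]
    if min_votes is None:
        min_votes = len(per_feature) // 2 + 1
    totals = [sum(column) for column in zip(*per_feature)]
    return [1 if total >= min_votes else 0 for total in totals]


# Hypothetical scores for six genome segments (assumed values, not real data).
features = {
    "gc_deviation": [0.01, 0.02, 0.15, 0.14, 0.02, 0.01],
    "kmer_distance": [0.10, 0.12, 0.45, 0.11, 0.09, 0.50],
    "mobility_genes": [0, 0, 3, 2, 0, 0],
}

print(consensus_labels(features))  # [0, 0, 1, 1, 0, 0]
```

In this toy input, the sixth segment stands out in only one feature and is therefore not called, which illustrates why combining features can be more robust than relying on any single one. No labeled training data is used at any step.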