World Scientific

Protein–protein interaction site prediction using random forest proximity distance

    A front-end method based on random forest proximity distance (PD) is used to screen the test set in order to improve protein–protein interaction site (PPIS) prediction. A distance metric is assessed under the assumption that a higher-quality distance definition leads to better classification performance. On an independent test set, statistical inference shows that PD outperforms both the Mahalanobis and cosine distances. Because PD depends on the composition of trees in the random forest model, an iterative method is designed to optimize it by changing the size of the training set, which in turn changes the tree composition. The iterative method yields two PD metrics, 75PD and 50PD. On two independent test sets, 75PD achieved a higher Matthews correlation coefficient (MCC) and F1 score than the PD produced by the original training set, and the differences were statistically significant. All numerical experiments show that the closer the test data are to the training data, the better the predictor performs. These results indicate that the iterative method can optimize the proximity distance definition and that the distance information provided by PD can be used to indicate the reliability of prediction results.
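To make the proximity distance concrete, here is a minimal sketch. It assumes Breiman's standard definition: the proximity of two samples is the fraction of trees in which they land in the same terminal leaf, and the distance is 1 − proximity. It uses scikit-learn's `RandomForestClassifier.apply`, which returns the leaf index of each sample in every tree. The screening rule (score each test sample by its smallest PD to any training sample and keep the closest fraction), the helper names `proximity_distance` and `screen_test_set`, the `keep_fraction` parameter, and the synthetic data are all hypothetical illustrations, not the authors' published procedure.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier


def proximity_distance(forest, X_a, X_b):
    """Pairwise random forest proximity distance between rows of X_a and X_b.

    Proximity = fraction of trees in which two samples share a terminal leaf;
    distance = 1 - proximity, so identical leaf paths give distance 0.
    """
    leaves_a = forest.apply(X_a)  # shape (n_a, n_trees): leaf index per tree
    leaves_b = forest.apply(X_b)  # shape (n_b, n_trees)
    # Compare leaf indices tree by tree, then average over the trees.
    same_leaf = leaves_a[:, None, :] == leaves_b[None, :, :]
    proximity = same_leaf.mean(axis=2)  # shape (n_a, n_b), values in [0, 1]
    return 1.0 - proximity


def screen_test_set(forest, X_train, X_test, keep_fraction=0.5):
    """Hypothetical screening rule: keep the test samples closest to training data.

    Each test sample is scored by its smallest PD to any training sample;
    the keep_fraction of test samples with the lowest scores is retained.
    """
    dist = proximity_distance(forest, X_test, X_train)
    score = dist.min(axis=1)
    n_keep = max(1, int(round(keep_fraction * len(X_test))))
    keep_idx = np.argsort(score, kind="stable")[:n_keep]
    return keep_idx, score


# Synthetic stand-in for residue feature vectors (hypothetical data).
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, y_train = X[:400], y[:400]
X_test = X[400:]

forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X_train, y_train)

keep_idx, score = screen_test_set(forest, X_train, X_test, keep_fraction=0.5)
print(len(keep_idx))  # 50
```

Predictions would then be reported only for (or weighted toward) the retained test samples, in line with the observation that test data closer to the training data are predicted more reliably. The same screening can be repeated with `scipy.spatial.distance.cdist(X_test, X_train, metric="cosine")` or `metric="mahalanobis"` to mirror the comparison against cosine and Mahalanobis distances described above.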

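For reference, the two metrics used to compare 75PD with the original PD are conventionally computed from the confusion-matrix counts of interface residues (positives, TP and FN) and non-interface residues (negatives, TN and FP). The standard forms are given here on the assumption that the paper uses the usual definitions:

```latex
\mathrm{MCC} = \frac{TP \cdot TN - FP \cdot FN}
                    {\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}},
\qquad
F_1 = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}
           {\mathrm{Precision} + \mathrm{Recall}}
    = \frac{2\,TP}{2\,TP + FP + FN}
```

MCC ranges from −1 to 1 and uses all four counts, which makes it informative when interface residues are a small minority. F1 ignores TN and focuses on how well the positive (interface) class is recovered.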
    Published: 19 November 2020