EVALUATING MIXTURE MODELS FOR BUILDING RNA KNOWLEDGE-BASED POTENTIALS
Abstract
Ribonucleic acid (RNA) molecules play important roles in a variety of biological processes. To function properly, RNA molecules usually have to fold into specific structures, so understanding RNA structure is vital to comprehending how RNA functions. One approach to understanding and predicting biomolecular structure is to use knowledge-based potentials built from experimentally determined structures. These potentials have been shown to be effective for predicting both protein and RNA structures, but their utility is limited by their significantly rugged nature. This ruggedness (and hence the potential's usefulness) depends heavily on the bin width used to sort structural information (e.g. distances), but the appropriate bin width is not known a priori. To circumvent the binning problem, we compared knowledge-based potentials built from inter-atomic distances in RNA structures using different mixture models (Kernel Density Estimation, Expectation Maximization and Dirichlet Process). We show that the smooth knowledge-based potential built from the Dirichlet process mixture model selects native-like RNA models from different sets of structural decoys with efficacy comparable to that of a potential developed by spline-fitting binned distance histograms, a commonly taken approach. The less rugged nature of our potential suggests its applicability in diverse types of structural modeling.
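The binning problem described above can be illustrated with a minimal sketch that builds a distance-dependent score from a smooth density estimate rather than from a histogram. Everything in it is an assumption made for illustration and is not taken from the paper: the distances are synthetic, the reference state is a uniform sample, the bandwidth and bin width are arbitrary, and the score uses the common inverse-Boltzmann form -ln(p_obs / p_ref) in units of kT. Only Gaussian kernel density estimation is shown; the Expectation Maximization and Dirichlet process mixtures are not implemented here.

```python
import math
import random


def gaussian_kde(samples, bandwidth):
    """Return a function estimating the density of `samples` with Gaussian kernels."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )

    return density


def histogram_density(samples, bin_width, lo):
    """Return a piecewise-constant density estimate with bins starting at `lo`."""
    counts = {}
    for s in samples:
        idx = int((s - lo) // bin_width)
        counts[idx] = counts.get(idx, 0) + 1
    n = len(samples)

    def density(x):
        return counts.get(int((x - lo) // bin_width), 0) / (n * bin_width)

    return density


def potential(p_obs, p_ref, x, floor=1e-12):
    """Inverse-Boltzmann style score in units of kT: -ln(p_obs(x) / p_ref(x)).

    `floor` avoids log(0) where an estimate is empty (an illustrative choice).
    """
    return -math.log(max(p_obs(x), floor) / max(p_ref(x), floor))


# Synthetic data (hypothetical): "observed" distances with two preferred
# separations, and a uniform "reference" sample over the same range.
rng = random.Random(0)
observed = [rng.gauss(4.0, 0.3) for _ in range(300)] + [
    rng.gauss(7.0, 0.5) for _ in range(300)
]
reference = [rng.uniform(2.0, 10.0) for _ in range(600)]

# Smooth potential from kernel density estimates (bandwidth is an assumption).
kde_obs = gaussian_kde(observed, bandwidth=0.3)
kde_ref = gaussian_kde(reference, bandwidth=0.3)

# Binned potential for contrast; its shape changes with the chosen bin width.
hist_obs = histogram_density(observed, bin_width=0.1, lo=0.0)
hist_ref = histogram_density(reference, bin_width=0.1, lo=0.0)

if __name__ == "__main__":
    for d in (3.0, 4.0, 5.5, 7.0, 9.0):
        smooth = potential(kde_obs, kde_ref, d)
        binned = potential(hist_obs, hist_ref, d)
        print(f"d={d:4.1f}  smooth={smooth:8.3f}  binned={binned:8.3f}")
```

Re-running the script with different `bin_width` values changes the binned column noticeably, while the smooth column varies more gradually with `bandwidth`. Choosing the bandwidth is itself a modeling decision, so a smooth estimator relocates the tuning question rather than removing it.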