DETERMINING RELEVANT FEATURES TO RECOGNIZE ELECTRON DENSITY PATTERNS IN X-RAY PROTEIN CRYSTALLOGRAPHY
Abstract
High-throughput computational methods in X-ray protein crystallography are indispensable to meet the goals of structural genomics. In particular, automated interpretation of electron density maps, especially those at mediocre resolution, can significantly speed up the protein structure determination process. TEXTAL™ is a software application that uses pattern recognition, case-based reasoning and nearest neighbor learning to produce reasonably refined molecular models, even with average quality data. In this work, we discuss a key issue in enabling fast and accurate interpretation of typically noisy electron density data: what features should be used to characterize the density patterns, and how relevant are they? We discuss the challenges of constructing features in this domain, and describe SLIDER, an algorithm to determine the weights of these features. SLIDER searches a space of weights, using the ranking of matching patterns (relative to mismatching ones) as its evaluation function. Since exhaustive search is intractable, SLIDER adopts a greedy approach that judiciously restricts the search space to only those weight values that cause the ranking of good matches to change. We show that SLIDER contributes significantly to finding the similarity between density patterns, and discuss the sensitivity of feature relevance to the underlying similarity metric.
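The search strategy summarized in the abstract can be illustrated with a minimal Python sketch. This is not the SLIDER implementation; it only shows the general idea of tuning feature weights by the ranking of a known match against known mismatches, and of trying only weight values at which that ranking can change. The weighted squared-difference distance, the one-feature-at-a-time update, the `w_max` bound, and all function names are assumptions made for this example.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]
# One training case: (query, known good match, known mismatches).
Case = Tuple[Vector, Vector, List[Vector]]


def sq_diffs(a: Vector, b: Vector) -> List[float]:
    """Per-feature squared differences between two feature vectors."""
    return [(x - y) ** 2 for x, y in zip(a, b)]


def weighted_dist(w: Vector, d: Vector) -> float:
    """Weighted sum of per-feature squared differences (assumed metric)."""
    return sum(wi * di for wi, di in zip(w, d))


def ranking_error(w: Vector, cases: List[Case]) -> int:
    """Count mismatches ranked at least as close to the query as the match."""
    errors = 0
    for query, match, mismatches in cases:
        d_match = weighted_dist(w, sq_diffs(query, match))
        for mm in mismatches:
            if weighted_dist(w, sq_diffs(query, mm)) <= d_match:
                errors += 1
    return errors


def crossover_points(w: Vector, i: int, cases: List[Case],
                     w_max: float = 1.0) -> List[float]:
    """Values of w[i] at which some match/mismatch pair swaps rank.

    With the other weights fixed, d(match) - d(mismatch) is linear in
    w[i] (a + w[i] * b), so each pair has at most one crossover.
    """
    points = []
    for query, match, mismatches in cases:
        dm = sq_diffs(query, match)
        for mm in mismatches:
            dn = sq_diffs(query, mm)
            a = sum(w[j] * (dm[j] - dn[j]) for j in range(len(w)) if j != i)
            b = dm[i] - dn[i]
            if b != 0:
                x = -a / b
                if 0.0 < x < w_max:
                    points.append(x)
    return sorted(set(points))


def tune_weights(cases: List[Case], n_features: int, passes: int = 3,
                 w_max: float = 1.0) -> Tuple[List[float], int]:
    """Greedy search: adjust one weight at a time, keep any improvement."""
    w = [w_max / 2] * n_features
    best = ranking_error(w, cases)
    for _ in range(passes):
        improved = False
        for i in range(n_features):
            cuts = [0.0] + crossover_points(w, i, cases, w_max) + [w_max]
            # The ranking is constant between crossovers, so one candidate
            # per interval (its midpoint) plus the two endpoints suffices.
            candidates = [0.0, w_max] + [
                (lo + hi) / 2 for lo, hi in zip(cuts, cuts[1:])
            ]
            for c in candidates:
                trial = w[:i] + [c] + w[i + 1:]
                err = ranking_error(trial, cases)
                if err < best:
                    best, w, improved = err, trial, True
        if not improved:
            break
    return w, best
```

Calling `tune_weights(cases, n_features)` on a list of such cases returns the tuned weights together with the remaining number of ranking errors. Because the search is greedy, the result is a local optimum and depends on the starting weights and on the order in which features are visited.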