KULLBACK–LEIBLER MARKOV CHAIN MONTE CARLO — A NEW ALGORITHM FOR FINITE MIXTURE ANALYSIS AND ITS APPLICATION TO GENE EXPRESSION DATA
Abstract
In this paper, we study Bayesian analysis of nonlinear hierarchical mixture models with a finite but unknown number of components. Our approach is based on Markov chain Monte Carlo (MCMC) methods, and one of its applications is the clustering problem in gene expression analysis. From a mathematical and statistical point of view, we discuss three topics: theoretical and practical convergence problems of the MCMC method; determination of the number of components in the mixture; and computational problems associated with likelihood calculations. In the existing literature, these problems have mainly been addressed in the linear case; one of the main contributions of this paper is a method for the nonlinear case. Our approach combines Gibbs sampling, random permutation sampling, birth-death MCMC, and the Kullback–Leibler distance.
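The abstract does not say how the Kullback–Leibler distance is evaluated within the algorithm. Purely as an illustration of the quantity itself, the sketch below computes the closed-form Kullback–Leibler divergence between two univariate normal densities, which is one simple way to measure how far apart two mixture components are. The function names `kl_normal` and `symmetric_kl_normal`, the choice of normal components, and the symmetrized form are assumptions made for this example and are not taken from the paper.

```python
import math


def kl_normal(mu_p, sigma_p, mu_q, sigma_q):
    """KL(p || q) for p = N(mu_p, sigma_p^2) and q = N(mu_q, sigma_q^2).

    Hypothetical helper for illustration; uses the standard closed form
    log(sigma_q / sigma_p) + (sigma_p^2 + (mu_p - mu_q)^2) / (2 sigma_q^2) - 1/2.
    """
    if sigma_p <= 0 or sigma_q <= 0:
        raise ValueError("standard deviations must be positive")
    return (
        math.log(sigma_q / sigma_p)
        + (sigma_p ** 2 + (mu_p - mu_q) ** 2) / (2.0 * sigma_q ** 2)
        - 0.5
    )


def symmetric_kl_normal(mu_p, sigma_p, mu_q, sigma_q):
    """Symmetrized divergence KL(p || q) + KL(q || p) (an assumed choice)."""
    return (
        kl_normal(mu_p, sigma_p, mu_q, sigma_q)
        + kl_normal(mu_q, sigma_q, mu_p, sigma_p)
    )


if __name__ == "__main__":
    # Identical components are at distance zero.
    print(kl_normal(0.0, 1.0, 0.0, 1.0))
    # Components that differ only in their means.
    print(kl_normal(0.0, 1.0, 1.0, 1.0))
    # The plain divergence is not symmetric in its arguments.
    print(kl_normal(0.0, 1.0, 0.0, 2.0), kl_normal(0.0, 2.0, 0.0, 1.0))
```

The plain divergence depends on the order of its arguments, so a symmetrized version is often more convenient when the goal is to decide whether two components are nearly identical.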


