World Scientific

A model-based clustering algorithm with covariates adjustment and its application to lung cancer stratification

    https://doi.org/10.1142/S0219720023500191

    Clustering is often the first step in a data analysis. It can reveal patterns that were not noticed before and help raise new hypotheses. However, one challenge when analyzing empirical data is the presence of covariates, which may mask the clustering structure of interest. For example, suppose we want to cluster a set of individuals into controls and cancer patients. A clustering algorithm could instead group the subjects into young and elderly, because age at diagnosis is associated with cancer. We therefore developed CEM-Co, a model-based clustering algorithm that removes or minimizes the effects of undesirable covariates during the clustering process. We applied CEM-Co to a gene expression dataset of 129 stage I non-small cell lung cancer patients and identified a subgroup with a poorer prognosis, which standard clustering algorithms failed to detect.
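The abstract does not spell out the CEM-Co model, so the following is only a minimal sketch of the general idea of adjusting for a covariate inside the clustering loop; it is not the authors' implementation. It assumes, purely for illustration, a one-dimensional feature x, a single covariate w with a linear effect shared by all clusters, equal-variance clusters, and hard (classification-style) assignments. All function names, the toy data, and the quantile-based initialization are hypothetical.

```python
import random


def make_toy_data(n, seed=0):
    """Toy data (an assumption): x = 2*cluster + 10*covariate + noise."""
    rng = random.Random(seed)
    x, w, z = [], [], []
    for _ in range(n):
        zi = rng.randrange(2)          # hidden cluster label
        wi = rng.uniform(0.0, 1.0)     # covariate, e.g. rescaled age
        x.append(2.0 * zi + 10.0 * wi + rng.gauss(0.0, 0.3))
        w.append(wi)
        z.append(zi)
    return x, w, z


def sse(x, w, labels, mu, beta):
    """Sum of squared residuals of x_i = mu[z_i] + beta * w_i."""
    return sum((xi - beta * wi - mu[zi]) ** 2
               for xi, wi, zi in zip(x, w, labels))


def m_step(x, w, labels, mu, beta):
    """Least-squares update of cluster means and shared slope, labels fixed."""
    k = len(mu)
    n_k, sx, sw = [0] * k, [0.0] * k, [0.0] * k
    for xi, wi, zi in zip(x, w, labels):
        n_k[zi] += 1
        sx[zi] += xi
        sw[zi] += wi
    xbar = [sx[j] / n_k[j] if n_k[j] else 0.0 for j in range(k)]
    wbar = [sw[j] / n_k[j] if n_k[j] else 0.0 for j in range(k)]
    # Slope from within-cluster centered data (OLS with cluster intercepts).
    num = sum((wi - wbar[zi]) * (xi - xbar[zi])
              for xi, wi, zi in zip(x, w, labels))
    den = sum((wi - wbar[zi]) ** 2 for wi, zi in zip(w, labels))
    if den > 0:
        beta = num / den
    # Empty clusters keep their previous mean.
    new_mu = [xbar[j] - beta * wbar[j] if n_k[j] else mu[j] for j in range(k)]
    return new_mu, beta


def c_step(x, w, mu, beta):
    """Assign each point to the closest mean after removing beta * w."""
    return [min(range(len(mu)), key=lambda j: (xi - beta * wi - mu[j]) ** 2)
            for xi, wi in zip(x, w)]


def fit_cem_with_covariate(x, w, k=2, max_iter=100):
    """Alternate M- and C-steps until the labels stop changing."""
    n = len(x)
    # Naive start that ignores the covariate: split x into k quantile groups.
    order = sorted(range(n), key=lambda i: x[i])
    labels = [0] * n
    for rank, i in enumerate(order):
        labels[i] = rank * k // n
    mu, beta = [0.0] * k, 0.0
    history = []  # objective after each M-step; it never increases
    for _ in range(max_iter):
        mu, beta = m_step(x, w, labels, mu, beta)
        history.append(sse(x, w, labels, mu, beta))
        new_labels = c_step(x, w, mu, beta)
        if new_labels == labels:
            break
        labels = new_labels
    return labels, mu, beta, history


if __name__ == "__main__":
    x, w, z = make_toy_data(200, seed=0)
    labels, mu, beta, history = fit_cem_with_covariate(x, w, k=2)
    print("cluster means:", mu, "covariate slope:", beta)
```

Because each step can only lower the sum of squared residuals, the loop converges, but only to a local optimum that depends on the starting labels. A real analysis would use several initializations and a multivariate model.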
