Despite its rapid development, the field of data mining and knowledge discovery (DMKD) remains vaguely defined and lacks integrated descriptions. This situation causes difficulties in teaching, learning, research, and application. This paper surveys a large collection of DMKD literature to provide a comprehensive picture of current DMKD research and classify these research activities into high-level categories using a grounded theory approach; it also evaluates the longitudinal changes in DMKD research activities during the last decade.
A short 5-page version of this paper appeared previously at the IEEE ICDM workshops, 18–22 December 2006.
