A DESCRIPTIVE FRAMEWORK FOR THE FIELD OF DATA MINING AND KNOWLEDGE DISCOVERY
Abstract
Despite its rapid development, the field of data mining and knowledge discovery (DMKD) remains vaguely defined and lacks an integrated description. This situation causes difficulties in teaching, learning, research, and application. This paper surveys a large collection of DMKD literature to provide a comprehensive picture of current DMKD research and to classify these research activities into high-level categories using a grounded theory approach; it also evaluates the longitudinal changes in DMKD research activities during the last decade.
A short 5-page version of this paper appeared previously at the IEEE ICDM workshops, 18–22 December 2006.