PERFORMANCE EVALUATION OF IMPUTATION METHODS FOR INCOMPLETE DATASETS
Abstract
In this study, we compare the performance of four imputation strategies, ranging from the commonly used Listwise Deletion to model-based approaches such as Maximum Likelihood, in improving the completeness of incomplete software project data sets. We evaluate the impact of each method by applying it to six real-time software project data sets, which are classified into categories based on their inherent properties. The reliability of the data sets constructed with these techniques is further tested by building prediction models using stepwise regression. The experimental results are reported and the findings are discussed.