The Development of Iterated Weighted Jacknife Method and Regression (IWJR) for Estimating Missing Data

Authors

  • จำลอง วงษ์ประเสริฐ
  • บุญชม ศรีสะอาด

Keywords:

Missing data; missing completely at random (MCAR); simple random sampling

Abstract

This study had two purposes: first, to develop the iterated weighted jackknife method and regression (IWJR) for estimating missing data; and second, to compare its efficiency in estimating the population mean, population variance, and population correlation, as well as the statistical power of a test, under missing completely at random (MCAR) data and simple random sampling, against four other well-established methods, including listwise deletion (LD), mean imputation (MI), and regression imputation (RI). Using simulated and secondary data, the comparisons were made under the following conditions:
i) three sample sizes (100, 200, and 500); ii) three levels of correlation between variables (low = .3, moderate = .5, and high = .7); and iii) four percentages of missing data (5%, 10%, 15%, and 20%). In addition, for the simulated data, interactions among sample size, correlation between variables, percentage of missing data, and missing-data method were examined at the .05 significance level.
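The design above can be illustrated with a minimal sketch of one simulated condition (n = 200, correlation .5, 10% missing) and the three named comparison methods. This is not the authors' simulation code: the bivariate normal population, the choice to delete values only from `y`, the seed, and all function names are assumptions made for illustration, and IWJR itself is not implemented because the abstract does not specify it.

```python
import random
import statistics

def simulate_pair(n, rho, rng):
    """Draw n (x, y) pairs from a standard bivariate normal with correlation rho (assumed population)."""
    data = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        e = rng.gauss(0.0, 1.0)
        data.append((x, rho * x + (1.0 - rho ** 2) ** 0.5 * e))
    return data

def make_mcar(data, frac, rng):
    """MCAR: blank out y for a subset of rows chosen completely at random."""
    missing = set(rng.sample(range(len(data)), round(frac * len(data))))
    return [(x, None if i in missing else y) for i, (x, y) in enumerate(data)]

def listwise_deletion(data):
    """LD: analyse only the rows where y was observed."""
    return [y for _, y in data if y is not None]

def mean_imputation(data):
    """MI: replace each missing y with the mean of the observed y values."""
    y_bar = statistics.fmean(listwise_deletion(data))
    return [y_bar if y is None else y for _, y in data]

def regression_imputation(data):
    """RI: replace each missing y with the least-squares prediction from x, fitted on complete rows."""
    complete = [(x, y) for x, y in data if y is not None]
    x_bar = statistics.fmean(x for x, _ in complete)
    y_bar = statistics.fmean(y for _, y in complete)
    sxx = sum((x - x_bar) ** 2 for x, _ in complete)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in complete)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return [intercept + slope * x if y is None else y for x, y in data]

rng = random.Random(0)  # fixed seed, chosen arbitrarily for reproducibility
full = simulate_pair(n=200, rho=0.5, rng=rng)
observed = make_mcar(full, frac=0.10, rng=rng)
ld = listwise_deletion(observed)
mi = mean_imputation(observed)
ri = regression_imputation(observed)
for name, ys in (("LD", ld), ("MI", mi), ("RI", ri)):
    print(name, len(ys), round(statistics.fmean(ys), 3), round(statistics.variance(ys), 3))
```

Mean imputation reproduces the listwise-deletion mean exactly but shrinks the sample variance, which is one reason the methods can rank differently depending on whether the mean, the variance, or the correlation is being estimated.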

The findings are summarized as follows:

  1. With the simulated data:
    1) Classified by sample size. For population mean estimation, i) with 100 and 200 samples, RI and IWJR performed most effectively, and ii) with 500 samples, LD and IWJR performed most effectively. For population variance and population correlation estimation, i) with 100 samples, IWJR performed most effectively, and ii) with 200 and 500 samples, LD and IWJR performed most effectively.
    2) Classified by correlation between variables. For population mean estimation, i) at low correlation, MI, RI, and IWJR performed most effectively, and ii) at moderate and high correlation, RI and IWJR performed most effectively. For population variance estimation, LD and IWJR performed most effectively at low, moderate, and high correlation. For population correlation estimation, i) at low correlation, IWJR performed most effectively, ii) at moderate correlation, LD and IWJR performed most effectively, and iii) at high correlation, MI, RI, and IWJR performed most effectively.
    3) Classified by percentage of missing data. For population mean estimation, i) at 5% missing data, IWJR performed most effectively, ii) at 10%, RI and IWJR performed most effectively, and iii) at 15% and 20%, LD, MI, RI, and IWJR performed most effectively. For population variance estimation, i) at 5% and 10% missing data, IWJR performed most effectively, and ii) at 15% and 20%, LD and IWJR performed most effectively. For population correlation estimation, i) at 5% and 10% missing data, IWJR performed most effectively, and ii) at 15% and 20%, LD and IWJR performed most effectively.
  2. No statistically significant difference in statistical power was found among the five missing-data methods.
  3. Three-way interactions were found. For population mean estimation: i) sample size, correlation between variables, and percentage of missing data; and ii) correlation between variables, percentage of missing data, and missing-data method. For population variance estimation: i) sample size, correlation between variables, and percentage of missing data; ii) correlation between variables, percentage of missing data, and missing-data method; and iii) sample size, percentage of missing data, and missing-data method. For population correlation estimation: i) sample size, correlation between variables, and percentage of missing data; ii) sample size, correlation between variables, and missing-data method; and iii) correlation between variables, percentage of missing data, and missing-data method.
  4. Across the estimation of the population mean, variance, and correlation, the IWJR method performed consistently well on both the simulated and the primary data, and was robust to i) sample size, ii) correlation between variables, and iii) percentage of missing data.
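The abstract does not spell out the IWJR algorithm, so it is not reproduced here. For readers unfamiliar with the jackknife component named in the method, the classical delete-one jackknife can be sketched as below. This is the textbook procedure only, not the iterated weighted variant developed in the study; the function name `jackknife` and the toy sample are hypothetical.

```python
import statistics

def jackknife(values, estimator):
    """Classical delete-one jackknife: return (bias-corrected estimate, standard error)."""
    n = len(values)
    theta_hat = estimator(values)
    # Leave-one-out estimates: recompute the estimator with observation i removed.
    loo = [estimator(values[:i] + values[i + 1:]) for i in range(n)]
    loo_mean = statistics.fmean(loo)
    # Jackknife bias estimate and standard error from the leave-one-out spread.
    bias = (n - 1) * (loo_mean - theta_hat)
    std_err = ((n - 1) / n * sum((t - loo_mean) ** 2 for t in loo)) ** 0.5
    return theta_hat - bias, std_err

sample = [2.0, 4.0, 4.0, 5.0, 7.0, 9.0]  # toy data, not from the study

# Applied to the divide-by-n variance, the bias correction recovers the
# usual divide-by-(n - 1) sample variance.
jk_var, jk_var_se = jackknife(sample, statistics.pvariance)

# Applied to the mean, the estimate is unchanged and the standard error
# equals s / sqrt(n).
jk_mean, jk_mean_se = jackknife(sample, statistics.fmean)
print(jk_var, jk_mean, jk_mean_se)
```

The variance case shows why the jackknife is attractive for variance and correlation estimators, whose plug-in versions are biased in small samples.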

Published

2021-08-27

How to Cite

วงษ์ประเสริฐ จ., & ศรีสะอาด บ. (2021). The Development of Iterated Weighted Jacknife Method and Regression (IWJR) for Estimating Missing Data. Ubon Ratchathani Journal of Research and Evaluation, 1(1), 143–154. Retrieved from https://so06.tci-thaijo.org/index.php/ubonreseva/article/view/250804