A Comparison of Imputation Methods in Estimating PM2.5 Concentrations Using LSTM Neural Networks
Keywords:
Imputation, PM2.5, Neural Network, LSTM
Abstract
Missing data is a common problem in data collection for research. It often reduces the accuracy and reliability of the analysis, and therefore of the study's findings and conclusions. Incomplete data can be handled either by deletion or by imputation, that is, by replacing missing values with estimates. In this study, we compare four imputation methods for PM2.5 concentration prediction using long short-term memory (LSTM) neural networks: mean imputation, linear interpolation (LI), k-nearest neighbors (KNN), and multiple imputation by chained equations (MICE). The root mean square error (RMSE) and mean absolute error (MAE) of the PM2.5 predictions were used to measure the effectiveness of each imputation method. Each imputed dataset was applied to four LSTM forecast models. The results showed that the KNN (k=11) and MICE methods gave the lowest errors and were therefore the best imputation techniques; both outperformed mean imputation and LI for filling in missing observations. Model 2 gave the most accurate forecasts across all the imputation methods.
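As a rough illustration of the two simplest methods and the two error measures named in the abstract, the following sketch fills gaps in a short series by mean imputation and by linear interpolation, then computes RMSE and MAE. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names and the toy values are hypothetical, the errors are computed directly against held-out true values (the study instead scores downstream LSTM forecasts), and KNN and MICE are omitted because they need more machinery than fits here.

```python
import math

def mean_impute(series):
    """Replace each None with the mean of the observed values."""
    observed = [x for x in series if x is not None]
    mean = sum(observed) / len(observed)
    return [mean if x is None else x for x in series]

def linear_interpolate(series):
    """Fill each gap on a straight line between its nearest observed
    neighbours; gaps at either end copy the nearest observed value."""
    known = [i for i, x in enumerate(series) if x is not None]
    filled = list(series)
    for i, x in enumerate(series):
        if x is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            filled[i] = series[right]
        elif right is None:
            filled[i] = series[left]
        else:
            weight = (i - left) / (right - left)
            filled[i] = series[left] + weight * (series[right] - series[left])
    return filled

def rmse(actual, predicted):
    """Root mean square error."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)

def mae(actual, predicted):
    """Mean absolute error."""
    n = len(actual)
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / n

# Hypothetical PM2.5 readings (made up for illustration); None marks a gap.
truth = [10.0, 12.0, 14.0, 16.0, 18.0]
observed = [10.0, None, 14.0, None, 18.0]

mean_filled = mean_impute(observed)
li_filled = linear_interpolate(observed)

print("mean imputation:", mean_filled, rmse(truth, mean_filled), mae(truth, mean_filled))
print("linear interp. :", li_filled, rmse(truth, li_filled), mae(truth, li_filled))
```

On this toy series the true values happen to lie on a straight line, so linear interpolation recovers them exactly while mean imputation does not; real PM2.5 series are far less regular, which is why the comparison in the study is an empirical one.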
License
Copyright (c) 2023 Journal of Learning Innovation and Technology

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Publishing Ethics
- The editorial team reserves the right to consider only articles that fully meet the required format and specifications. Articles that do not meet the editorial requirements may be refused publication.
- A letter of acceptance for publication is issued by the editorial office on request, and only once the article is ready to be published unconditionally.
- The peer review decision of the Journal of Learning Innovation and Technology is final. An article will not be published in the specified volume until it has been reviewed and is ready for publication.
- Research involving human or animal subjects must be reviewed by an Institutional Review Board (IRB).
- Submitted articles must not have been published elsewhere and must not be under consideration by other journals. Published articles are the copyright of JLIT.