DYNAMIC FORECASTING OF AIR POLLUTION IN DELHI ZONE USING MACHINE LEARNING ALGORITHM
Keywords: air pollution, machine learning, support vector machine, regression, classification
Air pollution is a major problem in urban areas, particularly in cities such as New Delhi, where elevated levels of toxic gases have reduced air quality. Predictive analytics therefore plays a significant role in forecasting future air quality from historical data, and such forecasts are essential for mitigating the consequences of pollution. Several machine learning algorithms are now widely used for this kind of prediction, including random forest, support vector machine, and other regression and classification methods. The main pollutants present in the air are PM2.5, PM10, CO, NO2, SO2 and O3. In this paper we focus on a data set from New Delhi to predict ambient air pollution and air quality using several machine learning algorithms.