American Journal of Applied Mathematics and Statistics. 2023, 11(3), 89-97
DOI: 10.12691/AJAMS-11-3-2
Original Research

Beyond Tides and Time: Machine Learning’s Triumph in Water Quality Forecasting

Yinpu Li1, Siqi Mao1, Yaping Yuan2, Ziren Wang3, Yixin Kang1 and Yuanxin Yao1

1Department of Statistics, Florida State University, Tallahassee, USA

2Department of Mathematics and Statistics, University of Massachusetts Amherst, Amherst, USA

3Department of Statistics, Rice University, Houston, USA

Pub. Date: November 02, 2023

Cite this paper

Yinpu Li, Siqi Mao, Yaping Yuan, Ziren Wang, Yixin Kang and Yuanxin Yao. Beyond Tides and Time: Machine Learning’s Triumph in Water Quality Forecasting. American Journal of Applied Mathematics and Statistics. 2023; 11(3):89-97. doi: 10.12691/AJAMS-11-3-2

Abstract

Water resources are essential for sustaining human livelihoods and environmental well-being. Accurate water quality prediction plays a pivotal role in effective resource management and pollution mitigation. In this study, we assess the effectiveness of five distinct predictive models—linear regression, Random Forest, XGBoost, LightGBM, and MLP neural network—in forecasting pH values within the geographical context of Georgia, USA. Notably, LightGBM emerges as the top-performing model, achieving the highest average precision. Our analysis underscores the supremacy of tree-based models in addressing regression challenges, while revealing the sensitivity of MLP neural networks to feature scaling. Intriguingly, our findings shed light on a counter-intuitive discovery: machine learning models, which do not explicitly account for time dependencies and spatial considerations, outperform spatial-temporal models. This unexpected superiority of machine learning models challenges conventional assumptions and highlights their potential for practical applications in water quality prediction. Our research aims to establish a robust predictive pipeline accessible to both data science experts and those without domain-specific knowledge. In essence, we present a novel perspective on achieving high prediction accuracy and interpretability in data science methodologies. Through this study, we redefine the boundaries of water quality forecasting, emphasizing the significance of data-driven approaches over traditional spatial-temporal models. Our findings offer valuable insights into the evolving landscape of water resource management and environmental protection.
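
The scaling sensitivity mentioned in the abstract can be illustrated with a minimal sketch. It does not reproduce the paper's models or data. It trains a single linear neuron, used here as the simplest stand-in for an MLP, by full-batch gradient descent on synthetic data. The feature names `conductivity` and `turbidity`, the coefficients, the learning rates, and the step count are all assumptions chosen for illustration.

```python
import random
import statistics


def make_data(n=200, seed=0):
    """Synthetic pH-like data; feature names and coefficients are invented."""
    rng = random.Random(seed)
    rows, targets = [], []
    for _ in range(n):
        conductivity = rng.uniform(0.0, 1000.0)  # hypothetical large-scale feature
        turbidity = rng.uniform(0.0, 1.0)        # hypothetical small-scale feature
        noise = rng.gauss(0.0, 0.05)
        rows.append([conductivity, turbidity])
        targets.append(7.0 + 0.001 * conductivity + 0.5 * turbidity + noise)
    return rows, targets


def standardize(rows):
    """Rescale every column to zero mean and unit variance."""
    cols = list(zip(*rows))
    means = [statistics.fmean(c) for c in cols]
    stds = [statistics.pstdev(c) for c in cols]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


def predict(w, b, x):
    """Output of one linear neuron: bias plus weighted sum of inputs."""
    return b + sum(wj * xj for wj, xj in zip(w, x))


def fit_mse(rows, targets, lr, steps=500):
    """Run full-batch gradient descent on squared error; return the final MSE."""
    n = len(rows)
    w, b = [0.0] * len(rows[0]), 0.0
    for _ in range(steps):
        errs = [predict(w, b, x) - y for x, y in zip(rows, targets)]
        grad_b = 2.0 * sum(errs) / n
        grad_w = [2.0 * sum(e * x[j] for e, x in zip(errs, rows)) / n
                  for j in range(len(w))]
        b -= lr * grad_b
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
    return statistics.fmean([(predict(w, b, x) - y) ** 2
                             for x, y in zip(rows, targets)])


X, y = make_data()
# Raw features need a tiny learning rate to stay stable, so the bias barely moves.
mse_raw = fit_mse(X, y, lr=1e-6)
# Standardized features tolerate a much larger step and converge within the budget.
mse_scaled = fit_mse(standardize(X), y, lr=0.1)
print(f"MSE on raw features:          {mse_raw:.4f}")
print(f"MSE on standardized features: {mse_scaled:.4f}")
```

With the same number of steps, the run on raw features should finish with a much larger error than the run on standardized features, because the large-scale column forces a learning rate too small for the other parameters to make progress. Tree-based models split on feature thresholds, so rescaling a feature does not change the partitions they can form, which is consistent with the contrast the abstract draws.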

Keywords

water quality prediction, linear regression, Random Forest, XGBoost, LightGBM, MLP neural network

Copyright

This work is licensed under a Creative Commons Attribution 4.0 International License. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/
