Affiliations 

  • 1 Geographic Information Science Research Group, Ton Duc Thang University, Ho Chi Minh City, Viet Nam; Faculty of Environment and Labour Safety, Ton Duc Thang University, Ho Chi Minh City, Viet Nam. Electronic address: buitiendieu@tdtu.edu.vn
  • 2 School of Engineering, University of Guelph, ON, Canada
  • 3 Department of Watershed Management, Sari Agricultural Science and Natural Resources University, Sari, Iran
  • 4 University of Campania "Luigi Vanvitelli", Department of Environmental, Biological and Pharmaceutical Sciences and Technologies, Via Vivaldi 43, 81100, Caserta, Italy
  • 5 Department of Civil Engineering, Faculty of Engineering & Built Environment, Universiti Kebangsaan Malaysia
  • 6 Institute of Research and Development, Duy Tan University, Da Nang 550000, Viet Nam. Electronic address: nguyenhoang23@duytan.edu.vn
  • 7 University of Campania "Luigi Vanvitelli", Department of Environmental, Biological and Pharmaceutical Sciences and Technologies, Via Vivaldi 43, 81100, Caserta, Italy; Istituto Nazionale di Geofisica e Vulcanologia, sezione di Napoli - Osservatorio Vesuviano, Via Diocleziano 328 - Napoli, Italy
  • 8 Aristotle University of Thessaloniki, Department of Geology, Lab. of Engineering Geology & Hydrogeology, 54124 Thessaloniki, Greece. Electronic address: kazakis@geo.auth.gr
Sci Total Environ, 2020 May 01;715:136836.
PMID: 32007881 DOI: 10.1016/j.scitotenv.2020.136836

Abstract

Groundwater resources constitute the main source of clean fresh water for domestic use and are essential for food production in the agricultural sector. Groundwater plays a vital role in the water supply of the Campanian Plain in Italy, and hence the future sustainability of the resource is essential for the region. In this paper, novel data mining algorithms, including the Gaussian Process (GP), were applied to a large groundwater quality database to predict the concentrations of nitrate (a contaminant) and strontium (potentially increasing in the future) in groundwater. The results were compared with those of the M5P, random forest (RF) and random tree (RT) algorithms as a benchmark to test the robustness of the modeling process. The dataset includes 246 groundwater quality samples originating from different wells, both municipal and agricultural. For the modeling process, it was divided into two subgroups using the 10-fold cross-validation technique: 173 samples for model building (training dataset) and 73 samples for model validation (testing dataset). Different water quality variables, including T, pH, EC, HCO₃⁻, F⁻, Cl⁻, SO₄²⁻, Na⁺, K⁺, Mg²⁺, and Ca²⁺, were used as inputs to the models. In the first stage, different input combinations were constructed based on the correlation coefficient, and the optimal combination was then chosen for the modeling phase. Several quantitative criteria, together with a visual comparison approach, were used to evaluate the modeling capability. The results revealed that, to obtain reliable predictions, variables with low correlation should also be considered as inputs to the models, together with those variables showing high correlation coefficients. According to the model evaluation criteria, the GP algorithm outperformed all the other models in predicting both nitrate and strontium concentrations, followed by RF, M5P and RT, respectively. The results also revealed that the model's structure, together with the accuracy and structure of the data, can have a considerable impact on the model's results.
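
The input-selection and data-partition steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the use of the absolute Pearson coefficient for ranking, the random shuffling with a fixed seed, and the choice of RMSE as one possible quantitative criterion are all assumptions made for illustration. Only the sample counts (246 in total, 173 for training, 73 for testing) come from the abstract.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if spread_x == 0 or spread_y == 0:
        return 0.0  # a constant series carries no linear information
    return cov / (spread_x * spread_y)


def nested_input_combinations(features, target):
    """Rank inputs by |r| with the target, then build nested combinations.

    `features` maps a variable name (e.g. "EC") to its measured values.
    The result is [top-1 inputs, top-2 inputs, ..., all inputs], so the
    last combinations also contain the weakly correlated variables.
    """
    ranked = sorted(
        features,
        key=lambda name: abs(pearson(features[name], target)),
        reverse=True,
    )
    return [ranked[:k] for k in range(1, len(ranked) + 1)]


def holdout_split(n_samples, n_train, seed=0):
    """Shuffle sample indices and split them into training and testing sets."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return indices[:n_train], indices[n_train:]


def rmse(observed, predicted):
    """Root mean square error, one possible quantitative evaluation criterion."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Partition sizes reported in the abstract: 246 samples, 173 / 73.
train_idx, test_idx = holdout_split(n_samples=246, n_train=173)
```

Each combination returned by `nested_input_combinations` would then be passed to a candidate model (GP, RF, M5P or RT), fitted on the training indices and scored on the testing indices, and the combination with the best score kept for the modeling phase.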

* Title and MeSH Headings from MEDLINE®/PubMed®, a database of the U.S. National Library of Medicine.