Affiliations 

  • 1 Department of Earth and Environmental Studies, Montclair State University, New Jersey, USA; The Center for Artificial Intelligence and Environmental Sustainability (CAIES) Foundation, Patna, Bihar, India. Electronic address: sushantorama@gmail.com
  • 2 Department of Earth and Environmental Studies, Montclair State University, New Jersey, USA. Electronic address: taylorr@mail.montclair.edu
  • 3 Centre for Advanced Modelling and Geospatial Information Systems (CAMGIS), School of Civil and Environmental Engineering, University of Technology Sydney, NSW 2007, Australia; Department of Energy and Mineral Resources Engineering, Sejong University, Choongmu-gwan, 209 Neungdong-ro Gwangjin-gu, Seoul 05006, Republic of Korea; Center of Excellence for Climate Change Research, King Abdulaziz University, P. O. Box 80234, Jeddah 21589, Saudi Arabia; Earth Observation Centre, Institute of Climate Change, Universiti Kebangsaan Malaysia, 43600 UKM, Bangi, Selangor, Malaysia. Electronic address: Biswajeet.Pradhan@uts.edu.au
  • 4 College of Natural Resources, Department of Rangeland and Watershed Management Sciences, University of Kurdistan, Sanandaj, Iran. Electronic address: a.shirzadi@uok.ac.ir
  • 5 Department of Geotechnical Engineering, University of Transport Technology, 54 Trieu Khuc, Thanh Xuan, Ha Noi, Viet Nam. Electronic address: binhpt@utt.edu.vn
Ecotoxicol Environ Saf, 2022 Feb 01;232:113271.
PMID: 35121252 DOI: 10.1016/j.ecoenv.2022.113271

Abstract

This study evaluates state-of-the-art machine learning (ML) models for predicting the most sustainable arsenic mitigation preference. A Gaussian distribution-based Naïve Bayes (NB) classifier achieved the highest Area Under the Receiver Operating Characteristic curve (AUC = 0.82), followed by Nu Support Vector Classification (0.80) and K-Neighbors (0.79). Ensemble classifiers scored AUC values above 0.70, with Random Forest the top performer (0.77), and the Decision Tree model ranked fourth with an AUC of 0.77. The multilayer perceptron model also performed well (AUC = 0.75). Most linear classifiers underperformed, with the Ridge classifier at the top (AUC = 0.73) and the perceptron at the bottom (AUC = 0.57). A Bernoulli distribution-based Naïve Bayes classifier was the poorest model (AUC = 0.50). Gaussian NB was also the most robust ML model, showing the smallest variation in Kappa score between training (0.58) and test data (0.64). The results suggest that nonlinear or ensemble classifiers could more accurately capture the complex relationships in socio-environmental data and help develop accurate and robust prediction models for sustainable arsenic mitigation. Furthermore, Gaussian NB is the best option when data are scarce.
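To make the two reported metrics concrete, the following is a minimal, standard-library-only Python sketch of how ROC AUC and a kappa score are commonly computed for a binary classifier. It assumes binary 0/1 labels and that "Kappa" refers to Cohen's kappa; the function names `roc_auc`, `cohen_kappa`, and `kappa_gap` are hypothetical, and this is not the authors' evaluation code. Treating "variation" between training and test kappa as a simple absolute difference is likewise an assumption. For reference, an AUC of 0.50 is what an uninformative scorer (for example, one that gives every sample the same score) obtains.

```python
from collections import Counter
from typing import Sequence


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the probability that a random positive outranks a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def cohen_kappa(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(y_true)
    observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    # Chance agreement from the marginal label frequencies of truth and prediction.
    expected = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1.0 - expected)


def kappa_gap(kappa_train: float, kappa_test: float) -> float:
    """Hypothetical robustness measure: absolute train/test kappa difference."""
    return abs(kappa_test - kappa_train)


if __name__ == "__main__":
    # Toy data only; these are not the study's data.
    print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
    print(cohen_kappa([0, 0, 1, 1], [0, 1, 1, 1]))  # 0.5
    # Using the Gaussian NB kappa values reported in the abstract.
    print(round(kappa_gap(0.58, 0.64), 2))  # 0.06
```

The pairwise AUC loop is quadratic in the number of samples, which keeps the definition transparent but would be replaced by a rank-based computation (or a library routine) for large datasets.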
