This study evaluates state-of-the-art machine learning (ML) models for predicting the most sustainable arsenic mitigation preference. A Gaussian distribution-based Naïve Bayes (NB) classifier achieved the highest Area Under the Curve (AUC) of the Receiver Operating Characteristic curve (0.82), followed by Nu Support Vector Classification (0.80) and K-Neighbors (0.79). Ensemble classifiers scored above 0.70 AUC, with Random Forest the top performer among them (0.77), and the Decision Tree model ranked fourth with an AUC of 0.77. The multilayer perceptron model also achieved high performance (AUC=0.75). Most linear classifiers underperformed, with the Ridge classifier at the top (AUC=0.73) and the perceptron at the bottom (AUC=0.57). The Bernoulli distribution-based Naïve Bayes classifier was the poorest model (AUC=0.50). The Gaussian NB was also the most robust ML model, with the smallest difference in Kappa score between training (0.58) and test data (0.64). The results suggest that nonlinear or ensemble classifiers can more accurately capture the complex relationships in socio-environmental data and help develop accurate and robust prediction models of sustainable arsenic mitigation. Furthermore, Gaussian NB is the best option when data are scarce.
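The two metrics used above, AUC and the Kappa score, can be illustrated with a minimal sketch. It assumes binary labels coded as 0 and 1 and computes both metrics from their textbook definitions; the function names `roc_auc` and `cohen_kappa` are hypothetical and are not taken from the study, which does not specify its implementation.

```python
from collections import Counter


def roc_auc(y_true, scores):
    """AUC as the probability that a randomly chosen positive is scored
    higher than a randomly chosen negative (ties count as one half)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(y_true)
    observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    # Chance agreement from the marginal label frequencies.
    expected = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when only one class is present")
    return (observed - expected) / (1.0 - expected)


# Toy data (illustrative only, not from the study).
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

auc = roc_auc(y_true, scores)
kappa = cohen_kappa(y_true, y_pred)
```

Under this reading, the robustness comparison amounts to the absolute difference between the training and test Kappa scores, which for the reported Gaussian NB values (0.58 and 0.64) is about 0.06.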
In this study, we introduced a novel hybrid artificial intelligence approach, called RF-ADTree, which uses rotation forest (RF) as a meta/ensemble classifier with an alternating decision tree (ADTree) as the base classifier, to spatially predict gully erosion in the Klocheh watershed of Kurdistan province, Iran. A total of 915 gully erosion locations and 22 gully conditioning factors were used to construct a database. Several soft computing benchmark models (SCBM), including the ADTree, the Support Vector Machine with two kernel functions, Polynomial and Radial Basis Function (SVM-Polynomial and SVM-RBF), the Logistic Regression (LR), and the Naïve Bayes Multinomial Updatable (NBMU) models, were used for comparison with the proposed model. Results indicated that 19 conditioning factors were effective, among which distance to river, geomorphology, land use, hydrological group, lithology, and slope angle were the most important for the gully modeling process. Additionally, the modeling results showed that the RF-ADTree ensemble model (area under the curve (AUC) = 0.906) significantly improved on the prediction accuracy of the ADTree model (AUC = 0.882). The proposed model also had the highest performance (AUC = 0.913) in comparison with the SVM-Polynomial model (AUC = 0.879), the SVM-RBF model (AUC = 0.867), the LR model (AUC = 0.75), the ADTree model (AUC = 0.861), and the NBMU model (AUC = 0.811).