Soil erosion by wind poses a significant threat to many regions worldwide, including the drylands of the Middle East and Iran. Wind erosion hazard maps help identify the areas at highest risk and are a valuable tool for mitigating its destructive consequences. This study aims to map wind erosion hazard by developing an interpretable (explainable) model based on machine learning (ML) and the SHapley Additive exPlanations (SHAP) interpretation technique. Four ML models were used: random forest (RF), support vector machine (SVM), extreme gradient boosting (XGB), and quadratic discriminant analysis (QDA). Thirteen features associated with wind erosion were mapped spatially and subjected to a multivariate adaptive regression spline (MARS) feature selection algorithm; tolerance coefficient (TC) and variance inflation factor (VIF) statistical tests were then used to explore multicollinearity among the variables. The MARS analysis showed that eight features were the most effective predictors of wind erosion: elevation (DEM), soil bulk density, precipitation, aspect, slope, soil sand content, vegetation cover (NDVI), and lithology. No collinearity was found among these variables. The ML models were used to rank the effective features, and the study introduces an interpretable ML model for interpreting the predictive model's output. The feature ranking by RF, taken as the most typical ML model, showed that elevation and soil bulk density were the two most important features. According to the area under the receiver operating characteristic curve (AUROC > 90%) and the precision-recall (PR) curve (value > 90%), all four ML models performed with high accuracy.
According to the PR curve, the SVM model performed slightly better than the others, and its results showed that 20.9%, 23%, and 16.6% of the total area of Hormozgan Province fall in the moderate, high, and very high wind erosion hazard classes, respectively. SHAP revealed that soil sand content and elevation are the variables that contribute most to the predictive model output. Overall, our research is one of the pioneering applications of interpretable ML models to mapping wind erosion hazard in southern Iran. We recommend that future research address interpretability in order to better understand predictive model outputs.
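
To make concrete how SHAP attributes a prediction to individual features, the following is a minimal sketch of an exact Shapley value computation. It is not the study's pipeline and does not use the SHAP library: the scoring function `toy_hazard`, its coefficients, the three feature names, and the `baseline` and `sample` values are all hypothetical, chosen only for illustration. Features left out of a coalition are replaced by their baseline value, which is one common simplification.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one sample.

    Features outside a coalition are set to their baseline value.
    Cost grows as 2^n, so this is only practical for a few features.
    """
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight shared by every coalition of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for coalition in combinations(others, size):
                without_i = [x[j] if j in coalition else baseline[j]
                             for j in range(n)]
                with_i = list(without_i)
                with_i[i] = x[i]
                # Marginal contribution of feature i to this coalition
                phi[i] += weight * (predict(with_i) - predict(without_i))
    return phi


# Hypothetical feature names and toy model (assumptions, not from the study)
FEATURES = ["sand_content", "elevation", "ndvi"]


def toy_hazard(v):
    sand, elevation, ndvi = v
    # Linear terms plus one sand-by-vegetation interaction
    return 0.5 * sand - 0.3 * elevation - 0.2 * ndvi + 0.4 * sand * (1 - ndvi)


baseline = [0.4, 0.5, 0.5]   # hypothetical "average" site, scaled to 0..1
sample = [0.9, 0.2, 0.1]     # hypothetical sandy, low-lying, sparsely vegetated site

phi = shapley_values(toy_hazard, sample, baseline)
for name, value in sorted(zip(FEATURES, phi), key=lambda p: -abs(p[1])):
    print(f"{name}: {value:+.3f}")
```

The attributions sum to the difference between the prediction for the sample and the prediction for the baseline, which is the additivity property that gives SHAP its name. For real models with many features, approximate estimators are generally used instead of this exhaustive enumeration.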