Due to excessive streamflow (SF), Peninsular Malaysia has historically experienced floods and droughts. Forecasting streamflow to mitigate municipal and environmental damage is therefore crucial. Streamflow prediction has been extensively demonstrated in the literature as a means of estimating continuous values of streamflow level. However, predicting continuous values is unnecessary in several applications and, at the same time, is a very challenging task because of uncertainty. Streamflow category prediction is better suited to addressing the uncertainty in numerical point forecasting, because its predictions are expressed as a propensity to belong to pre-defined classes. Here, we formulate streamflow prediction as a time series classification problem over discrete ranges of values, each representing a class, and classify streamflow into five or ten classes using machine learning approaches for various rivers in Malaysia. The findings reveal that several models, specifically LSTM, outperform others in predicting the following n time steps of streamflow, because LSTM learns the mapping between streamflow time series 2 or 3 days ahead better than support vector machine (SVM) and gradient boosting (GB). LSTM produces a higher F1 score in various rivers (by 5% in Johor; 2% in Kelantan, Melaka and Selangor; 4% in Perlis) in the 2-days-ahead scenario. Furthermore, ensemble stacking of SVM and GB achieves high performance in terms of F1 score and quadratic weighted kappa, giving a 3% higher F1 score for the Perak river than SVM and gradient boosting.
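To make the classification framing concrete, the following is a minimal sketch of how continuous streamflow could be discretized into ordinal classes and how quadratic weighted kappa could be computed. The quantile-based class edges and the function names `to_classes` and `quadratic_weighted_kappa` are assumptions for illustration; the abstract does not state how the class ranges were defined.

```python
import numpy as np

def to_classes(flow, n_classes=5):
    """Map continuous streamflow to ordinal classes 0..n_classes-1.

    Assumption: class edges are equal-frequency quantiles of the data;
    the study may have used different (e.g. fixed-width) ranges.
    """
    flow = np.asarray(flow, dtype=float)
    # Interior edges only, e.g. the 20/40/60/80th percentiles for five classes.
    edges = np.quantile(flow, np.linspace(0, 1, n_classes + 1)[1:-1])
    return np.digitize(flow, edges)

def quadratic_weighted_kappa(y_true, y_pred, n_classes):
    """Agreement between ordinal labels, penalizing distant misclassifications more."""
    y_true = np.asarray(y_true, dtype=int)
    y_pred = np.asarray(y_pred, dtype=int)
    # Observed confusion matrix (rows: true class, columns: predicted class).
    observed = np.zeros((n_classes, n_classes))
    for t, p in zip(y_true, y_pred):
        observed[t, p] += 1
    # Quadratic disagreement weights, zero on the diagonal.
    idx = np.arange(n_classes)
    weights = (idx[:, None] - idx[None, :]) ** 2 / (n_classes - 1) ** 2
    # Confusion matrix expected under chance agreement, from the marginals.
    expected = np.outer(observed.sum(axis=1), observed.sum(axis=0)) / observed.sum()
    return 1.0 - (weights * observed).sum() / (weights * expected).sum()

# Illustrative values only, not data from the study.
flow = np.arange(1, 11)
classes = to_classes(flow, n_classes=5)
kappa_perfect = quadratic_weighted_kappa(classes, classes, 5)
```

Unlike plain accuracy or F1, the quadratic weighting treats predicting class 4 for a true class 0 as a larger error than predicting class 1, which suits ordered streamflow categories.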
Groundwater, the world's most abundant source of freshwater, is rapidly depleting in many regions due to a variety of factors. Accurate forecasting of groundwater level (GWL) is essential for effective management of this vital resource, but it remains a complex and challenging task. In recent years, there has been a notable increase in the use of machine learning (ML) techniques to model GWL, with many studies reporting exceptional results. In this paper, we present a comprehensive review of 142 relevant articles indexed by the Web of Science from 2017 to 2023, focusing on key ML models, including artificial neural networks (ANN), adaptive neuro-fuzzy inference systems (ANFIS), support vector regression (SVR), evolutionary computing (EC), deep learning (DL), ensemble learning (EN), and hybrid-modeling (HM). We also discussed key modeling concepts such as dataset size, data splitting, input variable selection, forecasting time-step, performance metrics (PM), study zones, and aquifers, highlighting best practices for optimal GWL forecasting with ML. This review provides valuable insights and recommendations for researchers and water management agencies working in the field of groundwater management and hydrology.
The impact of the suspended sediment load (SSL) on environmental health, agricultural operations, and water resources planning is significant. The deposition of SSL restricts the streamflow region, affecting aquatic life migration and eventually causing a shift in the river course. As a result, data on suspended sediments and their fluctuations are essential for a number of authorities, especially water resources decision makers. SSL prediction is often difficult due to a number of issues, such as site-specific data, site-specific models, the lack of several substantial components to use in prediction, and the complexity of its pattern. In the past two decades, many machine learning algorithms have shown huge potential for river SSL prediction. However, these models did not provide very reliable results, which led to the conclusion that the accuracy of SSL prediction should be improved. To address these concerns, this research proposes a Long Short-Term Memory (LSTM) model for SSL prediction. The proposed model was applied to SSL prediction in the Johor River, Malaysia. The study used suspended sediment load and river flow data for the period 2010 to 2020. Four alternative models, namely the Multi-Layer Perceptron (MLP) neural network, Support Vector Regression (SVR), Random Forest (RF), and Long Short-Term Memory (LSTM), were investigated to predict the suspended sediment load. The proposed model attained a high correlation between predicted and actual SSL (0.97), with a minimum RMSE (148.4 ton/day) and a minimum MAE (33.43 ton/day), and can thus be generalized for application in similar rivers around the world.
A Box-Wilson design (BWD) model was applied to determine the optimum values of the parameters influencing anaerobic fermentation for hydrogen production using Clostridium saccharoperbutylacetonicum N1-4 (ATCC 13564). The main focus of the study was to find the optimal relationship between the hydrogen yield and three variables: initial substrate concentration, initial medium pH and reaction temperature. Microbial growth kinetic parameters for hydrogen production under anaerobic conditions were determined using the Monod model with incorporation of a substrate inhibition term. The values of μmax (maximum specific growth rate) and Ks (saturation constant) were 0.398 h⁻¹ and 5.509 g L⁻¹, respectively, using glucose as the substrate. The experimental substrate and biomass-concentration profiles were in good agreement with those obtained by the kinetic-model predictions. By varying the initial substrate concentration (1-40 g L⁻¹), reaction temperature (25-40 °C) and initial medium pH (4-8), the model predicted a maximum hydrogen yield of 3.24 mol H2 (mol glucose)⁻¹. The experimental data collected using this design were successfully fitted to a second-order polynomial model. An optimum operating condition of 10 g L⁻¹ initial substrate concentration, 37 °C reaction temperature and 6.0 ± 0.2 initial medium pH gave 80% of the predicted maximum hydrogen yield, whereas the experimental yield obtained in this study was 77.75%, showing close agreement between estimated and experimental values. This is the first report to predict bio-hydrogen yield by applying the Box-Wilson design to anaerobic fermentation while optimizing the effects of the prevailing environmental factors.
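As an illustration of the growth kinetics, the sketch below evaluates a Monod rate law with a substrate inhibition term. The abstract does not give the form of the inhibition term or an inhibition constant, so the Andrews (Haldane) form and the value `k_i = 50.0` g/L used here are assumptions; only the maximum specific growth rate (0.398 per hour) and the saturation constant (5.509 g/L) come from the text.

```python
def specific_growth_rate(s, mu_max=0.398, k_s=5.509, k_i=None):
    """Specific growth rate (1/h) at substrate concentration s (g/L).

    mu_max and k_s default to the values reported in the abstract.
    k_i is a HYPOTHETICAL inhibition constant (g/L); when it is None the
    plain Monod equation is used. The Andrews (Haldane) inhibition form
    is assumed, not confirmed by the source.
    """
    if k_i is None:
        return mu_max * s / (k_s + s)
    return mu_max * s / (k_s + s + s ** 2 / k_i)

# Compare the two rate laws over the substrate range studied (1-40 g/L).
for s in (1.0, 10.0, 40.0):
    plain = specific_growth_rate(s)
    inhibited = specific_growth_rate(s, k_i=50.0)  # illustrative k_i only
    print(f"S = {s:5.1f} g/L  Monod = {plain:.3f}  with inhibition = {inhibited:.3f}")
```

With an inhibition term the rate no longer rises monotonically toward the maximum: beyond some concentration, additional substrate lowers the growth rate, which is why an intermediate initial substrate concentration can be optimal.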
A modeling framework utilizing the coactive neuro-fuzzy inference system (CANFIS) has been developed for multi-lead time groundwater level (GWL) forecasting in four different wells located in Texas and Florida, USA. Various model input combinations, including GWL, precipitation, temperature, and surface water level variables, have been derived based on proposed correlation analysis using singular spectrum analysis (SSA) remainders. The models have been trained on data subsets of varying lengths to identify the optimal training data duration. Additionally, we have introduced the bagging ensemble learning method to enhance the performance of the CANFIS model. As part of a comprehensive model evaluation process, the best-performing CANFIS model for each forecasting scenario has undergone uncertainty analysis using bootstrap sampling. Our results reveal that the CANFIS model performs satisfactorily for daily forecasting but leaves room for improvement in monthly forecasting, particularly for two-month and three-month ahead forecasts. Moreover, we have identified several optimal input combinations, highlighting the significance of the temperature variable in monthly forecasting. Furthermore, our findings indicate that additional training data does not necessarily lead to improved performance. The ensemble CANFIS model has demonstrated significant performance enhancement, particularly for monthly forecasting. Finally, the CANFIS model uncertainty analysis has shown satisfactory results for daily forecasting scenarios, while monthly forecasting models exhibit higher uncertainties, particularly during periods with distinctly different GWL fluctuation patterns.
In the past few decades, there has been a rapid growth in the concentration of nitrogenous compounds such as nitrate-nitrogen and ammonia-nitrogen in rivers, primarily due to increasing agricultural and industrial activities. These nitrogenous compounds are mainly responsible for eutrophication when present in river water, and for 'blue baby syndrome' when present in drinking water. High concentrations of these compounds in rivers may eventually lead to the closure of treatment plants. This study presents a training and a selection approach to develop an optimum artificial neural network model for predicting monthly average nitrate-N and monthly average ammonia-N. Several studies have predicted these compounds, but most of the proposed procedures do not involve testing various model architectures in order to achieve the optimum predicting model. Additionally, none of the models have been trained for hydrological conditions such as the case of Malaysia. This study presents models trained on the hydrological data from 1981 to 2017 for the Langat River in Selangor, Malaysia. The model architectures used for training are General Regression Neural Network (GRNN), Multilayer Neural Network and Radial Basis Function Neural Network (RBFNN). These models were trained for various combinations of internal parameters, input variables and model architectures. Post-training, the optimum performing model was selected based on the regression and error values and plot of predicted versus observed values. Optimum models provide promising results with a minimum overall regression value of 0.92.
Following the publication of the article it has come to the authors' attention that the first panel of Fig. 11 has been repeated with the second panel of Fig. 11.
Suspended sediment load (SSL) estimation is a required exercise in water resource management. This article proposes the use of hybrid artificial neural network (ANN) models, for the prediction of SSL, based on previous SSL values. Different input scenarios of daily SSL were used to evaluate the capacity of the ANN-ant lion optimization (ALO), ANN-bat algorithm (BA) and ANN-particle swarm optimization (PSO). The Goorganrood basin in Iran was selected for this study. First, the lagged SSL data were used as the inputs to the models. Next, the rainfall and temperature data were used. Optimization algorithms were used to fine-tune the parameters of the ANN model. Three statistical indexes were used to evaluate the accuracy of the models: the root-mean-square error (RMSE), mean absolute error (MAE) and Nash-Sutcliffe efficiency (NSE). An uncertainty analysis of the predicting models was performed to evaluate the capability of the hybrid ANN models. A comparison of models indicated that the ANN-ALO improved the RMSE accuracy of the ANN-BA and ANN-PSO models by 18% and 26%, respectively. Based on the uncertainty analysis, it can be surmised that the ANN-ALO has an acceptable degree of uncertainty in predicting daily SSL. Generally, the results indicate that the ANN-ALO is applicable for a variety of water resource management operations.
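The three statistical indexes named above have standard definitions, sketched here with NumPy. The function names and the sample arrays are illustrative assumptions, not values from the study.

```python
import numpy as np

def rmse(obs, sim):
    """Root-mean-square error between observed and simulated values."""
    obs, sim = np.asarray(obs, dtype=float), np.asarray(sim, dtype=float)
    return float(np.sqrt(np.mean((sim - obs) ** 2)))

def mae(obs, sim):
    """Mean absolute error between observed and simulated values."""
    obs, sim = np.asarray(obs, dtype=float), np.asarray(sim, dtype=float)
    return float(np.mean(np.abs(sim - obs)))

def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 matches the observed mean."""
    obs, sim = np.asarray(obs, dtype=float), np.asarray(sim, dtype=float)
    return float(1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2))

# Illustrative values only.
observed = [1.0, 2.0, 3.0, 4.0]
simulated = [2.0, 3.0, 4.0, 5.0]
print(rmse(observed, simulated), mae(observed, simulated), nse(observed, simulated))
```

RMSE and MAE carry the units of the predicted variable, while NSE is dimensionless, so NSE is the more convenient index for comparing models across basins with different sediment loads.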
Iraq is facing a dire water crisis due to the decrease in the quantity of water flowing in the Tigris and Euphrates Rivers. Taking population growth into account, several studies have estimated the water shortage in 2035 at 44 billion cubic meters (BCM). Thus, a Water Budget-Salt Balance Model (WBSBM) has been developed, applied and examined for the Euphrates River basin to compute the net water saving from Non-Conventional Water Resources (NCWRs). The WBSBM includes four stages. The first identifies the required data corresponding to the conventional water resources in the study area. The second represents the activities of the water users. The third develops the model through the proposed NCWR projects that reflect the required data. The final stage computes the net water saving when all the NCWR projects are applied simultaneously. The results show that the optimal potential net water savings are 6.823 and 6.626 BCM/year in 2025 and 2035, respectively. In conclusion, the proposed WBSBM has comprehensively examined different scenarios of utilizing NCWRs and has determined the optimal potential net water saving amounts.
To maintain human health and the purity of drinking water, it is crucial to eliminate harmful chemicals such as nitrophenols and azo dyes, given their presence in the environment. In this study, machine learning techniques were used to estimate the performance of reduction catalysis for the ecologically detrimental contaminants nitrophenols and azo dyes. The catalyst used in the experiments was Ag@CMC, which proved highly effective in eliminating various contaminants found in water, such as 4-nitrophenol (4-NP). The experiments were conducted at various time intervals, and all of the machine learning procedures in this study were used to forecast catalytic performance. The performance of these algorithms was evaluated by means of the Mean Absolute Error. The findings indicate that the ADAM and LSTM algorithm exhibited the most favourable performance for toxic compounds such as 4-NP. Moreover, the Ag@CMC catalyst demonstrated a reduction efficiency of 98% against nitrophenol in just 8 min. Based on these results, it can be concluded that Ag@CMC is a highly effective catalyst for practical, real-world applications.
In consideration of the distinct behavior of machine learning (ML) algorithms, six well-defined ML algorithms were applied in this study to predict sea level on a day-to-day basis. Data compiled from 1985 to 2018 were used for training and testing the developed models. An assessment of the multiple statistics-driven regression algorithms showed that each tested location was associated with a particular preferred model. The best models for the respective study areas were as follows: in Peninsular Malaysia, the interactions linear regression model was the best at Pulau Langkawi (RMSE = 19.066), the Matern 5/2 Gaussian process regression model at Geting (RMSE = 49.891), and the trilayered artificial neural network at Pulau Pinang (RMSE = 20.026), while the linear regression model was the best at Sandakan in Sabah, East Malaysia (RMSE = 14.054). Other metrics, such as MAE and R-square, were also at their best values for these models, further substantiating the RMSE results at each of the study areas. These metrics also revealed that, despite employing sea level as the sole parameter, the results were exceptionally better when utilizing a 7-day lag, regardless of the model used. Notably, lag variables with less than a 7-day lag could degrade the model's accuracy in representing ground reality. The study emphasizes the importance of thorough training and testing of ML models to aid decision-makers in developing mitigation actions for sea level rise, a climate change phenomenon, through reliable ML.
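The lagged-input setup described above can be sketched as follows. The abstract does not say whether "a 7-day lag" means the single value from seven days earlier or all of the previous seven days; this sketch assumes the latter, and the function name `make_lagged` is hypothetical.

```python
import numpy as np

def make_lagged(series, n_lags=7):
    """Build a supervised-learning design matrix from a univariate daily series.

    Row i of X holds the n_lags values preceding target y[i], ordered from
    lag 1 (yesterday) to lag n_lags. Assumes all lags 1..n_lags are used.
    """
    series = np.asarray(series, dtype=float)
    n = len(series)
    # Column k-1 is the series shifted back by k days, aligned with y.
    X = np.column_stack([series[n_lags - k : n - k] for k in range(1, n_lags + 1)])
    y = series[n_lags:]
    return X, y

# Illustrative series only: ten consecutive "daily" values 0..9.
X, y = make_lagged(np.arange(10), n_lags=7)
print(X.shape, y)
```

The resulting `X` and `y` can be passed to any regression model; the first `n_lags` observations are consumed as history, so the usable sample is shorter than the original record.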
Multi-walled carbon nanotubes (CNTs) functionalized with a deep eutectic solvent (DES) were utilized to remove mercury ions from water. An artificial neural network (ANN) technique was used for modelling the adsorption capacity of the functionalized CNTs. The adsorbent dosage, contact time, mercury ion concentration and pH were varied, and the effect of these parameters on the adsorption capacity of the functionalized CNTs was observed. The nonlinear autoregressive network with exogenous inputs (NARX), the feed-forward neural network (FFNN) and the layer recurrent (LR) neural network were used. The model performance was compared using different indicators, including the root mean square error (RMSE), relative root mean square error (RRMSE), mean absolute percentage error (MAPE), mean square error (MSE), correlation coefficient (R²) and relative error (RE). Three kinetic models were applied to the experimental and predicted data; the pseudo second-order model was the best at describing the data. The maximum RE, R² and MSE were 9.79%, 0.9701 and 1.15 × 10⁻³, respectively, for the NARX model; 15.02%, 0.9304 and 2.2 × 10⁻³ for the LR model; and 16.4%, 0.9313 and 2.27 × 10⁻³ for the FFNN model. The NARX model accurately predicted the adsorption capacity with better performance than the FFNN and LR models.
Reference evapotranspiration (ET0) plays a fundamental role in irrigated agriculture. The objective of this study is to simulate monthly ET0 at a meteorological station in India using a new method, an improved support vector machine (SVM) based on the cuckoo algorithm (CA), which is known as SVM-CA. Maximum temperature, minimum temperature, relative humidity, wind speed and sunshine hours were selected as inputs for the models used in the simulation. The results of the simulation using SVM-CA were compared with those from experimental models, genetic programming (GP), model tree (M5T) and the adaptive neuro-fuzzy inference system (ANFIS). The results demonstrate that the proposed SVM-CA model is able to simulate ET0 more accurately than the GP, M5T and ANFIS models. Two major indicators, namely root mean square error (RMSE) and mean absolute error (MAE), indicated that the SVM-CA outperformed the other methods, with respective reductions of 5-15% and 5-17% compared with the GP model, 12-21% and 10-22% compared with the M5T model, and 7-15% and 5-18% compared with the ANFIS model. Therefore, the proposed SVM-CA model has high potential for accurate simulation of monthly ET0 values compared with the other models.
In nature, streamflow patterns are characterized by high non-linearity and non-stationarity. Developing an accurate streamflow forecasting model is essential for several applications in the field of water resources engineering. One of the main contributors to modeling reliability is the optimization of the input variables, and the main step of modeling is the selection of the proper input combinations. Hence, developing an algorithm that can determine the optimal input combinations is crucial. This study introduces the genetic algorithm (GA) for better input combination selection. The radial basis function neural network (RBFNN) is used for monthly streamflow time series forecasting due to its simplicity and the effectiveness of its integration with the selection algorithm. The integrated RBFNN-GA model was applied to forecast streamflow at the High Aswan Dam on the Nile River. The results showed that the proposed model provided high accuracy and that the GA can successfully determine effective input parameters in streamflow time series forecasting.
Solar energy is a major type of renewable energy, and its estimation is important for decision-makers. This study introduces a new prediction model for solar radiation based on support vector regression (SVR) and the improved particle swarm optimization (IPSO) algorithm. The new version of the algorithm attempts to enhance the global search ability of PSO. In practice, the SVR method has a few parameters that must be determined through a trial-and-error procedure while developing the prediction model. This procedure usually leads to non-optimal choices for these parameters and, hence, poor prediction accuracy. Therefore, the IPSO algorithm is integrated with SVR as an optimizer to obtain optimal values for the SVR parameters. To examine the proposed model, the solar radiation stations at Adana, Antakya and Konya in Turkey are considered for this study. In addition, different models have been tested for this prediction, namely the M5 tree model (M5T), genetic programming (GP), SVR integrated with four different optimization algorithms (SVR-PSO, SVR-IPSO, SVR with the genetic algorithm (SVR-GA) and SVR with the firefly algorithm (SVR-FFA)), and the multivariate adaptive regression (MARS) model. A sensitivity analysis is performed to achieve the highest prediction accuracy by choosing different input parameters. Several performance indices have been considered to examine the efficiency of all the prediction methods. The results show that SVR-IPSO outperformed M5T and MARS.