The extraction of relevant wavelengths from a large Near Infrared Spectroscopy (NIRS) dataset is a significant challenge in vibrational spectroscopy research. Nonetheless, this process improves chemical interpretability by emphasizing the chemical entities related to the chemical parameters of samples. Given the complexity of the dataset, irrelevant wavelengths may still be included in the multivariate calibration. This makes the computational process unnecessarily complex and decreases the accuracy and robustness of the model. In multivariate analysis, Partial Least Squares Regression (PLSR) is a method commonly used to build a predictive model from NIR spectral data. However, neither the PLSR method nor common commercial chemometrics software applies a standard wavelength selection procedure to screen out irrelevant wavelengths. In this study, a new robust wavelength selection procedure called the modified VIP-MCUVE (mod-VIP-MCUVE), which uses a Filter-Wrapper method and an input scaling strategy, is introduced. The proposed method combines the modified Variable Importance in Projection (VIP) and the modified Monte Carlo Uninformative Variable Elimination (MCUVE) to calculate the scale matrix of the input variables. The modified VIP uses the orthogonal components of Partial Least Squares (PLS) to investigate the informative variables in the model by applying the amount of variation in both X and y {SSX, SSY} simultaneously. The modified MCUVE uses a robust reliability coefficient and a robust tolerance interval in the selection procedure. To assess the superiority of the proposed method, the classical VIP, MCUVE, and the autoscaling procedure in classical PLSR were also included in the evaluation. Using artificial data with Monte Carlo simulation and NIR spectral data of oil palm (Elaeis guineensis Jacq.) fruit mesocarp, the study shows that the proposed method improves model interpretability, is computationally extensive, and produces better model accuracy.
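For orientation, the classical VIP score that serves as a baseline above can be sketched as follows. This is a minimal illustration of the textbook VIP formula only, not the modified VIP or the mod-VIP-MCUVE procedure, whose details are not given here. The function name `vip_scores`, the example weights, the explained-variance values, and the selection threshold of 1.0 are all assumptions made for illustration.

```python
import math


def vip_scores(weights, ssy):
    """Classical VIP score for each variable (textbook formula).

    weights: list of A PLS weight vectors, each of length p (one per component).
    ssy:     list of A values, the variation in y explained by each component.
    """
    n_vars = len(weights[0])
    total_ssy = sum(ssy)
    scores = []
    for j in range(n_vars):
        acc = 0.0
        for w_a, ssy_a in zip(weights, ssy):
            norm_sq = sum(w * w for w in w_a)
            # Contribution of variable j to component a, weighted by SSY of a.
            acc += ssy_a * (w_a[j] ** 2) / norm_sq
        scores.append(math.sqrt(n_vars * acc / total_ssy))
    return scores


# Hypothetical two-component model with four wavelengths (illustrative values).
weights = [[0.8, 0.6, 0.0, 0.0], [0.0, 0.1, 0.7, 0.7]]
ssy = [9.0, 1.0]

vip = vip_scores(weights, ssy)
# A common rule of thumb keeps variables with VIP > 1 (threshold is an assumption).
selected = [j for j, v in enumerate(vip) if v > 1.0]
print(vip, selected)
```

In this toy case the first two wavelengths dominate because they load on the component that explains most of the variation in y. The squared VIP scores always sum to the number of variables, which is why 1.0 is often used as a cut-off.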
In cancer studies, the prediction of cancer outcome based on a set of prognostic variables has been a long-standing topic of interest. Current statistical methods for survival analysis offer the possibility of modelling cancer survivability but require unrealistic assumptions about the survival time distribution or proportionality of hazards. Therefore, attention must be paid to developing nonlinear models with less restrictive assumptions. Artificial neural network (ANN) models are primarily useful in prediction when nonlinear approaches are required to sift through the plethora of available information. The applications of ANN models for prognostic and diagnostic classification in medicine have attracted a lot of interest. The applications of ANN models in modelling the survival of patients with gastric cancer have been discussed in some studies without completely considering the censored data. This study proposes an ANN model for predicting gastric cancer survivability, considering the censored data. Five separate single time-point ANN models were developed to predict the outcome of patients after 1, 2, 3, 4, and 5 years. The performance of the ANN models in predicting the probabilities of death is consistently high for all time points according to the accuracy and the area under the receiver operating characteristic curve.
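The abstract does not state how censored observations were handled, so the following is only one common convention, shown as a hedged sketch: for each time horizon, a patient whose follow-up ended without an event before that horizon has an unknown status and is left out of that horizon's training set. The function name `labels_at_horizon` and the patient records are hypothetical.

```python
def labels_at_horizon(patients, years):
    """Build binary labels for one single time-point model.

    patients: list of (follow_up_years, died) tuples.
    Returns (index, label) pairs with label 1 = died within the horizon,
    0 = known to be alive at the horizon. Patients censored before the
    horizon are skipped because their status at that time is unknown.
    """
    labelled = []
    for idx, (follow_up, died) in enumerate(patients):
        if died and follow_up <= years:
            labelled.append((idx, 1))
        elif follow_up >= years:
            labelled.append((idx, 0))
        # Otherwise: censored before the horizon, so no label is assigned.
    return labelled


# Hypothetical follow-up records: (years of follow-up, death observed?).
patients = [(0.5, True), (1.5, False), (3.0, True), (6.0, False), (2.0, False)]

# One label set per horizon, mirroring the five single time-point models.
datasets = {t: labels_at_horizon(patients, t) for t in (1, 2, 3, 4, 5)}
for horizon, rows in datasets.items():
    print(horizon, rows)
```

Under this convention the usable sample shrinks as the horizon grows, which is one reason separate models per time point can behave differently.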
In practice, the collected spectra are very often composed of complex overtones and many overlapping peaks, which may lead to misinterpretation because of their significant nonlinear characteristics. A linear solution might therefore not be appropriate. In addition, with a high-dimensional dataset arising from a large number of observations and data points, classical multiple regression will fail to fit. These complexities commonly lead to a multicollinearity problem; furthermore, the risk of contamination by multiple outliers and high leverage points also increases. To address these problems, a new method called Kernel Partial Diagnostic Robust Potential (KPDRGP) is introduced. The method allows a nonlinear solution that nonlinearly maps the original input X matrix into a higher-dimensional feature space corresponding to a Reproducing Kernel Hilbert Space (RKHS). For dimension reduction, the method replaces the dot-product calculation between elements of the mapped data with a nonlinear function in the original input space. To prevent contamination by multiple outliers and high leverage points, a robust procedure based on the Diagnostic Robust Generalized Potentials (DRGP) algorithm was used. Results on simulated and real data verified that the proposed KPDRGP method was superior to non-kernel methods and to some other robust methods with a kernel solution.
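The idea of replacing feature-space dot products with a function evaluated in the input space (the kernel trick) can be sketched as below. This covers only the generic kernel step, not the KPDRGP method or the DRGP algorithm. The choice of a Gaussian (RBF) kernel, the `gamma` value, and the sample data are assumptions; the abstract does not say which kernel was used.

```python
import math


def rbf_kernel(x, z, gamma):
    """Gaussian kernel: stands in for the dot product of the mapped points."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)


def gram_matrix(X, gamma):
    """Kernel (Gram) matrix K with K[i][j] = k(x_i, x_j)."""
    return [[rbf_kernel(xi, xj, gamma) for xj in X] for xi in X]


def center_gram(K):
    """Center K so the implicitly mapped data have zero mean in feature space."""
    n = len(K)
    row_means = [sum(row) / n for row in K]
    total_mean = sum(row_means) / n
    # K is symmetric, so column means equal row means.
    return [
        [K[i][j] - row_means[i] - row_means[j] + total_mean for j in range(n)]
        for i in range(n)
    ]


# Hypothetical spectra: three samples, three data points each.
X = [[0.10, 0.20, 0.30], [0.15, 0.25, 0.35], [0.90, 0.80, 0.70]]
K = gram_matrix(X, gamma=1.0)
K_centered = center_gram(K)
print(K_centered)
```

The mapped data are never computed explicitly; downstream steps such as kernel PLS work on the n-by-n matrix `K_centered`, whose size depends on the number of samples rather than the number of spectral variables.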
Since the first coronavirus disease 2019 (COVID-19) outbreak appeared in Wuhan, mainland China on December 31, 2019, the geographical spread of the epidemic has been swift. Malaysia is one of the countries that were hit substantially by the outbreak, particularly in the second wave. This study aims to simulate the infectious trend and trajectory of COVID-19 to understand the severity of the disease and determine the approximate number of days required for the trend to decline. The numbers of confirmed positive infectious cases [as reported by the Ministry of Health, Malaysia (MOH)] from January 25, 2020 to March 31, 2020 were used. This study simulated the infectious count for the same duration to assess the predictive capability of the Susceptible-Infectious-Recovered (SIR) model. The same model was used to project the simulation trajectory of confirmed positive infectious cases for 80 days from the beginning of the outbreak, and the trajectory was extended for another 30 days to obtain an overall picture of the severity of the disease in Malaysia. The transmission rate, β, was also utilized to predict the cumulative number of infectious individuals. Using the SIR model, the simulated infectious case count obtained was not far from the actual count. The simulated trend was able to mimic the actual count and capture the actual spikes approximately. The infectious trajectory simulation for 80 days and the extended trajectory for 110 days depict that the rising trend has peaked and ended and will decline towards late April 2020. Furthermore, the predicted cumulative number of infectious individuals tallies with the preparations undertaken by the MOH. The simulation indicates the severity of COVID-19 disease in Malaysia, suggesting a peak of infectiousness in mid-March 2020 and a probable decline in late April 2020. Overall, the study findings indicate that outbreak control measures such as the Movement Control Order (MCO), social distancing, and increased hygienic awareness are needed to control the transmission of the outbreak in Malaysia.
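As a rough illustration of how an SIR trajectory of this kind is produced, the sketch below integrates the standard SIR equations with a simple Euler scheme. The population size, initial infected count, transmission rate `beta`, and recovery rate `gamma` are placeholder values chosen for illustration; they are not the parameters estimated in the study, and the study's own fitting procedure is not reproduced here.

```python
def simulate_sir(population, infected0, beta, gamma, days, steps_per_day=10):
    """Euler integration of the standard SIR model.

    dS/dt = -beta * S * I / N
    dI/dt =  beta * S * I / N - gamma * I
    dR/dt =  gamma * I
    Returns one (S, I, R) tuple per day, including day 0.
    """
    s = float(population - infected0)
    i = float(infected0)
    r = 0.0
    dt = 1.0 / steps_per_day
    trajectory = [(s, i, r)]
    for _ in range(days):
        for _ in range(steps_per_day):
            new_infections = beta * s * i / population * dt
            new_recoveries = gamma * i * dt
            s -= new_infections
            i += new_infections - new_recoveries
            r += new_recoveries
        trajectory.append((s, i, r))
    return trajectory


# Placeholder parameters (assumptions, not values from the study).
POPULATION = 1_000_000
trajectory = simulate_sir(POPULATION, infected0=10, beta=0.4, gamma=0.2, days=110)

infected = [i for _, i, _ in trajectory]
peak_day = infected.index(max(infected))
print("peak day:", peak_day, "infectious at day 110:", round(infected[-1]))
```

With these placeholder rates the basic reproduction number is beta / gamma = 2, so the infectious count rises, peaks, and then declines within the 110-day window. Lowering `beta`, which is the effect that control measures aim for, flattens and delays that peak.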