Crude oil prices play a significant role in the global economy and are a key input into option pricing formulas, portfolio allocation, and risk measurement. In this paper, a hybrid model integrating the wavelet transform and multiple linear regression (MLR), denoted WMLR, is proposed for crude oil price forecasting. In this model, the Mallat wavelet transform is first used to decompose the original time series into several subseries at different scales. Principal component analysis (PCA) is then used to process the subseries data that feed the MLR. Particle swarm optimization (PSO) is used to select the optimal parameters of the MLR model. To assess the effectiveness of the model, daily data from the West Texas Intermediate (WTI) crude oil market are used as the case study. The time series prediction performance of the WMLR model is compared with that of the MLR, ARIMA, and GARCH models using various statistical measures. The experimental results show that the proposed model outperforms the individual models in forecasting the crude oil price series.
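The decomposition step can be illustrated with a minimal sketch of the Mallat pyramid algorithm. The abstract does not name the wavelet family, so the Haar filter pair is an assumption made for simplicity, and the price values, `haar_decompose`, and `haar_reconstruct` are hypothetical names and data, not the authors' implementation.

```python
import math

def haar_step(signal):
    """One pyramid level: split into approximation and detail coefficients."""
    s = 1 / math.sqrt(2)
    approx = [(signal[i] + signal[i + 1]) * s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) * s for i in range(0, len(signal), 2)]
    return approx, detail

def haar_decompose(signal, levels):
    """Return [detail_1, ..., detail_levels, approx_levels].

    Assumes len(signal) is divisible by 2**levels.
    """
    subseries = []
    approx = list(signal)
    for _ in range(levels):
        approx, detail = haar_step(approx)
        subseries.append(detail)
    subseries.append(approx)
    return subseries

def haar_reconstruct(subseries):
    """Invert haar_decompose by merging levels from coarsest to finest."""
    s = 1 / math.sqrt(2)
    approx = subseries[-1]
    for detail in reversed(subseries[:-1]):
        rebuilt = []
        for a, d in zip(approx, detail):
            rebuilt.extend([(a + d) * s, (a - d) * s])
        approx = rebuilt
    return approx

# Synthetic daily prices (illustrative only, not WTI data).
prices = [70.1, 70.9, 71.4, 70.2, 69.8, 70.5, 72.0, 71.1]
parts = haar_decompose(prices, levels=2)
restored = haar_reconstruct(parts)
```

Each element of `parts` is one subseries at a given scale; in a hybrid scheme of the kind described, such subseries would be the inputs to the PCA and regression stages.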
Matched MeSH terms: Principal Component Analysis/methods*
The motivation behind this research is to combine methods such as wavelet analysis, principal component analysis (PCA), and artificial neural networks (ANN) in a novel way to analyze trading in today's increasingly difficult and volatile financial futures markets. The main focus of this study is to facilitate forecasting by applying an enhanced denoising process to market data, treated as a multivariate signal, in order to remove the same noise from the open-high-low-close signal of a market. This research offers evidence on the predictive ability and the profitability of abnormal returns of a new hybrid forecasting model using Wavelet-PCA denoising and ANN (named WPCA-NN) on futures contracts of Hong Kong's Hang Seng futures, Japan's NIKKEI 225 futures, Singapore's MSCI futures, South Korea's KOSPI 200 futures, and Taiwan's TAIEX futures from 2005 to 2014. Using a host of technical analysis indicators consisting of RSI, MACD, MACD Signal, Stochastic Fast %K, Stochastic Slow %K, Stochastic %D, and Ultimate Oscillator, empirical results show that the annual mean returns of WPCA-NN exceed the buy-and-hold threshold for the validation, test, and evaluation periods; this is inconsistent with the traditional random walk hypothesis, which holds that mechanical rules cannot outperform the buy-and-hold threshold. The findings, however, are consistent with literature that advocates technical analysis.
The location model proposed in the past is a predictive discriminant rule that classifies new observations into one of two predefined groups based on mixtures of continuous and categorical variables. The ability of the location model to discriminate new observations correctly depends strongly on the number of multinomial cells created by the categorical variables. This study conducts a preliminary investigation showing that the location model based on maximum likelihood estimation has a high misclassification rate, up to 45% on average, when dealing with more than six categorical variables across all 36 data sets tested. The model predicts poorly when the number of categorical variables is large, even with a large sample size. To alleviate the high misclassification rate, a new strategy is embedded in the discriminant rule by introducing nonlinear principal component analysis (NPCA) into the classical location model (cLM), mainly to handle the large number of categorical variables. This new strategy is investigated on simulated and real data sets by estimating the misclassification rate with the leave-one-out method. The numerical results demonstrate the feasibility of the proposed model: the misclassification rate decreases dramatically compared with the cLM for all 18 data settings. A practical application to a real data set shows a significant improvement and yields results comparable to the best of the methods compared. Overall, the findings reveal that the proposed model extends the applicability of the location model, which was previously limited to six categorical variables for acceptable performance. This study shows that the proposed model, with its new discrimination procedure, can serve as an alternative for classification problems with mixed variables, primarily when the number of categorical variables is large.
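The leave-one-out estimate of the misclassification rate mentioned above can be sketched as follows. This is a minimal illustration only: the nearest-mean classifier is a simple stand-in, not the location model, and the function names and the six data points are hypothetical.

```python
def nearest_mean_predict(train, x):
    """Assign x to the group whose mean vector is closest (stand-in classifier)."""
    best_label, best_dist = None, float("inf")
    for label in sorted({lab for _, lab in train}):
        rows = [f for f, lab in train if lab == label]
        mean = [sum(col) / len(rows) for col in zip(*rows)]
        dist = sum((a - b) ** 2 for a, b in zip(x, mean))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label

def loo_misclassification_rate(data, predict):
    """Hold out each observation once, train on the rest, count errors."""
    errors = 0
    for i, (features, label) in enumerate(data):
        train = data[:i] + data[i + 1:]
        if predict(train, features) != label:
            errors += 1
    return errors / len(data)

# Synthetic two-group data (illustrative only).
data = [([1.0, 1.2], 0), ([0.8, 1.0], 0), ([1.1, 0.9], 0),
        ([3.0, 3.1], 1), ([2.9, 3.3], 1), ([3.2, 2.8], 1)]
rate = loo_misclassification_rate(data, nearest_mean_predict)
```

Any classifier with the same `predict(train, x)` shape could be passed in, so the same loop would serve for comparing a classical rule against a modified one.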
A study was carried out to determine the concentrations of dissolved heavy metals in the Sungai Semenyih and to use environmetric methods to evaluate the influence of different pollution sources on heavy metal concentrations. Cluster analysis (CA) classified the 8 sampling stations into two clusters based on the similarity of their characteristics: cluster 1 included stations 1, 2, 3 and 4 (low pollution area), whereas cluster 2 comprised stations 5, 6, 7 and 8 (high pollution area). Principal component analysis (PCA) of the two data sets yielded two factors for the low pollution area and three factors for the high pollution area at eigenvalues >1, representing 92.544% and 100% of the total variance in the respective heavy metal data sets, and allowed the selected heavy metals to be grouped according to anthropogenic and lithologic sources of contamination.
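The rule of retaining factors with eigenvalues >1 can be sketched with NumPy. This assumes the PCA is run on the correlation matrix, which the abstract does not state; the data are synthetic and `kaiser_components` is a hypothetical helper name.

```python
import numpy as np

def kaiser_components(X):
    """Eigenvalues of the correlation matrix, count of those > 1,
    and the share of total variance the retained components explain."""
    R = np.corrcoef(X, rowvar=False)
    eigvals = np.linalg.eigvalsh(R)[::-1]  # sorted in descending order
    keep = eigvals > 1.0
    explained = eigvals / eigvals.sum()
    return eigvals, int(keep.sum()), float(explained[keep].sum())

# Synthetic data: six variables driven by two latent sources plus noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(40, 2))
loadings = np.array([[1.0, 0.9, 0.8, 0.0, 0.0, 0.0],
                     [0.0, 0.0, 0.0, 1.0, 0.9, 0.8]])
X = latent @ loadings + 0.1 * rng.normal(size=(40, 6))

eigvals, n_keep, cum_var = kaiser_components(X)
```

Because the eigenvalues of a correlation matrix sum to the number of variables, an eigenvalue above 1 marks a component that carries more variance than any single standardized variable.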
Statistical classification remains the most useful statistical tool for forensic chemists to assess the relationships between samples. Many clustering techniques such as principal component analysis and hierarchical cluster analysis have been employed to analyze chemical data for pattern recognition. Because novice drug chemists often lack a firm foundation in statistics, a tetrahedron method was designed to simulate how advanced chemometrics operates. In this paper, the development of the graphical tetrahedron and the computational matrices derived from the possible tetrahedrons are discussed. The tetrahedron method was applied to four selected parameters obtained from nine illicit heroin samples. Pattern analysis and mathematical computation of the differences in areas, used to assess the dissimilarity between the nine tetrahedrons, were found to be convenient and straightforward for novice cluster analysts.
Visible and near infrared spectroscopy is a non-destructive, green, and rapid technology that can be used to estimate the components of interest in a sample without conditioning it, in contrast to classical analytical methods. The objective of this paper is to compare the performance of an artificial neural network (ANN) (a nonlinear model) and principal component regression (PCR) (a linear model) based on visible and shortwave near infrared (VIS-SWNIR) (400-1000 nm) spectra in the non-destructive measurement of the soluble solids content of an apple. First, multiplicative scattering correction was used to pre-process the spectral data. Second, PCR was applied to estimate the optimal number of input variables. Third, this optimal number of input variables was used as the input to both the multiple linear regression and ANN models. The initial weights and the number of hidden neurons were adjusted to optimize the performance of the ANN. The findings suggest that the ANN with two hidden neurons outperforms PCR in predictive performance.
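The multiplicative scattering correction step can be sketched as below. This is a minimal version assuming the common formulation in which each spectrum is regressed on the mean spectrum and then corrected by the fitted offset and slope; the abstract gives no implementation details, and the spectra and the name `msc` are hypothetical.

```python
def msc(spectra):
    """Multiplicative scatter correction against the mean spectrum.

    Each spectrum s is fitted as s ~ offset + slope * ref by least squares,
    then corrected to (s - offset) / slope.
    """
    n = len(spectra[0])
    ref = [sum(s[j] for s in spectra) / len(spectra) for j in range(n)]
    ref_mean = sum(ref) / n
    ref_ss = sum((r - ref_mean) ** 2 for r in ref)
    corrected = []
    for s in spectra:
        s_mean = sum(s) / n
        slope = sum((r - ref_mean) * (v - s_mean) for r, v in zip(ref, s)) / ref_ss
        offset = s_mean - slope * ref_mean
        corrected.append([(v - offset) / slope for v in s])
    return corrected

# Synthetic spectra: one underlying shape with different scatter effects.
base = [0.2, 0.5, 0.9, 0.4]
spectra = [
    [1.0 * b + 0.0 for b in base],
    [1.5 * b + 0.1 for b in base],
    [0.5 * b - 0.1 for b in base],
]
corrected = msc(spectra)
```

In this toy case every spectrum is an exact scaled and shifted copy of `base`, so the correction collapses all three onto the same curve; real spectra would retain their chemical differences after the scatter terms are removed.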
A combination of Support Vector Machine (SVM) and Principal Component Analysis (PCA) was used to recognize infant cries with asphyxia. An SVM classifier based on features selected by PCA was trained to differentiate between pathological and healthy cries. PCA was applied to reduce the dimensionality of the vectors that serve as inputs to the SVM. The performance of the SVM with linear and RBF kernels was examined. Experimental results showed that the SVM with the RBF kernel yields good performance. The accuracy of classifying infant cries with asphyxia using the SVM-PCA is 95.86%.
Matched MeSH terms: Principal Component Analysis/methods*
An improved classification of Orthosiphon stamineus using a data fusion technique is presented. Five different commercial sources along with freshly prepared samples were discriminated using an electronic nose (e-nose) and an electronic tongue (e-tongue). Samples from the different commercial brands were evaluated by the e-tongue and then followed by the e-nose. Applying Principal Component Analysis (PCA) separately on the respective e-tongue and e-nose data, only five distinct groups were projected. However, by employing a low level data fusion technique, six distinct groupings were achieved. Hence, this technique can enhance the ability of PCA to analyze the complex samples of Orthosiphon stamineus. Linear Discriminant Analysis (LDA) was then used to further validate and classify the samples. It was found that the LDA performance was also improved when the responses from the e-nose and e-tongue were fused together.
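Low-level data fusion of the kind described can be sketched as concatenating the two sensor blocks sample by sample before PCA. Autoscaling each block first is an assumption, as are the sensor counts, the synthetic readings, and the helper names; none of these details come from the abstract.

```python
import numpy as np

def autoscale(block):
    """Center each column and scale it to unit sample standard deviation."""
    return (block - block.mean(axis=0)) / block.std(axis=0, ddof=1)

def fuse_low_level(*blocks):
    """Low-level fusion: join autoscaled sensor blocks column-wise."""
    return np.hstack([autoscale(b) for b in blocks])

def pca_scores(X, n_components=2):
    """Project the centered data onto its leading principal components via SVD."""
    Xc = X - X.mean(axis=0)
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :n_components] * S[:n_components]

# Synthetic readings for 12 samples (illustrative sensor counts).
rng = np.random.default_rng(1)
enose = rng.normal(size=(12, 8))
etongue = rng.normal(size=(12, 5))

fused = fuse_low_level(enose, etongue)
scores = pca_scores(fused)
```

Scaling the blocks before joining them keeps the instrument with the larger raw signal range from dominating the fused principal components.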
Matched MeSH terms: Principal Component Analysis/methods
Color is one of the most prominent features of an image and is used in many skin and face detection applications. Color space transformation is widely used by researchers to improve face and skin detection performance. Despite the substantial research efforts in this area, choosing a proper color space in terms of skin and face classification performance which can address issues like illumination variations, various camera characteristics and diversity in skin color tones has remained an open issue. This research proposes a new three-dimensional hybrid color space termed SKN by employing the Genetic Algorithm heuristic and Principal Component Analysis to find the optimal representation of human skin color in over seventeen existing color spaces. The Genetic Algorithm heuristic is used to find the optimal color component combination setup in terms of skin detection accuracy, while Principal Component Analysis projects the optimal Genetic Algorithm solution to a less complex dimension. Pixel-wise skin detection was used to evaluate the performance of the proposed color space. We have employed four classifiers including Random Forest, Naïve Bayes, Support Vector Machine and Multilayer Perceptron in order to generate the human skin color predictive model. The proposed color space was compared to some existing color spaces and shows superior results in terms of pixel-wise skin detection accuracy. Experimental results show that by using the Random Forest classifier, the proposed SKN color space obtained an average F-score and True Positive Rate of 0.953 and a False Positive Rate of 0.0482, which outperformed the existing color spaces in terms of pixel-wise skin detection accuracy. The results also indicate that among the classifiers used in this study, Random Forest is the most suitable classifier for pixel-wise skin detection applications.
Principal component analysis (PCA) is capable of handling large sets of data. However, the lack of a consistent method of data pre-treatment, and limited recognition of its importance, are limitations in PCA applications. This study examined pre-treatment methods (log (x + 1) transformation, outlier removal, and granulometric and geochemical normalization) on a dataset of mangrove surface sediment from Mengkabong Lagoon, Sabah, at high and low tides. The study revealed that geochemical normalization using Al combined with outlier removal resulted in a better classification of the mangrove surface sediment than outlier removal, granulometric normalization using clay, or log (x + 1) transformation. The PCA output using geochemical normalization with outlier removal demonstrated associations between environmental variables and tides for the mangrove surface sediment of Mengkabong Lagoon, Sabah. The PCA outputs at high and low tides also helped to better interpret information about the sediment and its controlling factors in the intertidal zone. The study showed data pre-treatment to be a useful procedure for standardizing the datasets and reducing the influence of outliers.
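Two of the pre-treatments named above can be sketched in a few lines. The logarithm base is not given in the abstract, so base 10 is an assumption here; treating geochemical normalization as a sample-wise ratio to the Al concentration is likewise an assumed reading, and the numbers and function names are hypothetical.

```python
import math

def log_x_plus_1(values):
    """log (x + 1) transformation (base 10 assumed)."""
    return [math.log10(v + 1) for v in values]

def normalize_to_reference(values, reference):
    """Divide each concentration by the reference element measured in the
    same sample (for example Al for geochemical normalization)."""
    return [v / r for v, r in zip(values, reference)]

# Synthetic concentrations for two samples (illustrative only).
metal = [10.0, 30.0]
aluminium = [5.0, 6.0]

transformed = log_x_plus_1([0.0, 9.0, 99.0])
ratio = normalize_to_reference(metal, aluminium)
```

The same ratio function would cover granulometric normalization if the clay fraction were passed as the reference instead of Al.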
Matched MeSH terms: Principal Component Analysis/methods*
In this dataset, we distinguish 15 accessions of Garcinia mangostana from Peninsular Malaysia using Fourier transform-infrared spectroscopy coupled with chemometric analysis. We found that the position and intensity of characteristic peaks at 3600-3100 cm⁻¹ in the IR spectra allowed discrimination of G. mangostana from different locations. Further principal component analysis (PCA) of all the accessions suggests that two main clusters were formed: samples from Johor, Melaka, and Negeri Sembilan (South) were clustered together in one group, while samples from Perak, Kedah, Penang, Selangor, Kelantan, and Terengganu (North and East Coast) formed another group.
Parkinson's disease (PD) is a member of a larger group of neuromotor diseases marked by the progressive death of dopamine-producing cells in the brain. Computational tools for Parkinson's disease that draw on data sets of medical information are highly desirable, both for alleviating symptoms and for helping the many people who want to learn their risk of the disease at an early stage. This paper proposes a new hybrid intelligent system for the prediction of PD progression using noise removal, clustering and prediction methods. Principal Component Analysis (PCA) and Expectation Maximization (EM) are employed, respectively, to address the multi-collinearity problems in the experimental datasets and to cluster the data. We then apply Adaptive Neuro-Fuzzy Inference System (ANFIS) and Support Vector Regression (SVR) for prediction of PD progression. Experimental results on public Parkinson's datasets show that the proposed method remarkably improves the accuracy of prediction of PD progression. The hybrid intelligent system can assist medical practitioners in healthcare practice with the early detection of Parkinson's disease.
Pyrolysis-gas chromatography-mass spectrometry (Py-GC-MS) has been recognised as an effective technique to analyse car paint. This study was conducted to assess the combination of Py-GC-MS and chemometric techniques to classify car paint primer, the inner layer of the car paint system. Fifty car paint primer samples from various manufacturers were analysed using Py-GC-MS, and the data set of identified pyrolysis products was subjected to principal component analysis (PCA) and discriminant analysis (DA). The PCA rendered 16 principal components accounting for 86.33% of the total variance. The DA was useful to classify the car paint primer samples according to their types (1k and 2k primer), with 100% correct classification in the test set for all three modes (standard, stepwise forward and stepwise backward). Three compounds, indolizine, 1,3-benzenedicarbonitrile and p-terphenyl, were the most significant compounds in discriminating the car paint primer samples.
The issue of classifying objects into groups when measured variables in an experiment are mixed has attracted the attention of statisticians. The Smoothed Location Model (SLM) appears to be a popular classification method to handle data containing both continuous and binary variables simultaneously. However, SLM is infeasible for a large number of binary variables due to the occurrence of numerous empty cells. Therefore, this study aims to construct new SLMs by integrating SLM with two variable extraction techniques, Principal Component Analysis (PCA) and two types of Multiple Correspondence Analysis (MCA), in order to reduce the large number of mixed variables, primarily the binary ones. The performance of the newly constructed models, namely SLM+PCA+Indicator MCA and SLM+PCA+Burt MCA, is examined based on the misclassification rate. Results from simulation studies for a sample size of n=60 show that the SLM+PCA+Indicator MCA model provides perfect classification when the number of binary variables (b) is 5 or 10. For b=20, the SLM+PCA+Indicator MCA model produces misclassification rates of 0.3833, 0.6667 and 0.3221 for n=60, n=120 and n=180, respectively. Meanwhile, the SLM+PCA+Burt MCA model provides perfect classification when the number of binary variables is 5, 10, 15 or 20 and yields a small misclassification rate of 0.0167 when b=25. Investigations on a real dataset demonstrate that both of the newly constructed models yield low misclassification rates, 0.3066 and 0.2336 respectively, with the SLM+PCA+Burt MCA model performing best among all the classification methods compared. The findings reveal that the two new models of SLM integrated with two variable extraction techniques can be good alternative methods for classification in mixed variable problems, mainly when dealing with a large number of binary variables.
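The two MCA inputs named above, the indicator matrix and the Burt matrix, can be sketched for binary variables as follows. This shows only how the matrices are built, not the SLM or the MCA itself; the four observations and the function names are hypothetical. The last line counts the cells that b binary variables create, which is why empty cells become unavoidable when b is large relative to n.

```python
def indicator_matrix(rows):
    """One-hot code each binary variable into two columns: (is 0, is 1)."""
    return [[flag for v in row for flag in (1 - v, v)] for row in rows]

def burt_matrix(Z):
    """Burt matrix: cross-tabulation of all category pairs, Z transposed times Z."""
    cols = list(zip(*Z))
    return [[sum(a * b for a, b in zip(ci, cj)) for cj in cols] for ci in cols]

# Four synthetic observations of three binary variables.
rows = [[0, 1, 1], [1, 1, 0], [0, 0, 1], [1, 1, 1]]
Z = indicator_matrix(rows)
B = burt_matrix(Z)

# b binary variables define 2**b multinomial cells.
n_cells = 2 ** len(rows[0])
```

The diagonal of `B` holds the category counts, and with b=20 the same cell count would be 1,048,576, far more than a sample of n=60 can populate.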
In this dataset, we differentiate four different tissues of Cosmos caudatus Kunth (leaves, flowers, stem and root) obtained from the UKM Bangi plot, based on Fourier transform-infrared spectroscopy. The different tissues of C. caudatus showed distinct positions and intensities of characteristic peaks at 4000-450 cm⁻¹. Principal component analysis (PCA) shows that three main groups were formed. The samples from leaves and flowers were clustered together in one group, while the samples from stems and roots were clustered into two separate groups. These data provide an insight into the fingerprint identification and distribution of metabolites in the different organs of this species.
The study was conducted to evaluate the significance of solar irradiance, ambient temperature and relative humidity as predictors, and to quantify the relative contribution of these ambient parameters, in a photovoltaic module temperature model. The module temperature model was developed from experimental data of mono-crystalline and poly-crystalline PV modules retrofitted on a metal roof in the Klang Valley. The model was developed and analyzed using Multiple Linear Regression (MLR) and Principal Component Analysis (PCA) techniques. Solar irradiance, ambient temperature and relative humidity were shown to be significant predictors of module temperature. For the poly-crystalline PV module, the relative contributions of solar irradiance, ambient temperature and relative humidity are 64.28 %, 17.45 % and 12.64 % respectively. For the mono-crystalline PV module, the relative contributions of solar irradiance, ambient temperature and relative humidity are 66.12 %, 17.46 % and 12.48 % respectively. Thus, there is no significant difference between poly-crystalline and mono-crystalline PV module technologies in the relative contribution of these ambient parameters to photovoltaic module temperature.
Spectral data often need to be pre-processed before a multivariate modelling technique is applied. Baseline correction of spectral data is one of the most important and frequently applied pre-processing procedures. This preliminary study aims to investigate the impact of six types of baseline correction algorithms on classifying 150 infrared spectra of three varieties of paper. The algorithms investigated were Iterative Restricted Least Squares, Asymmetric Least Squares (ALS), Low-pass FFT Filter, Median Window (MW), Fill Peaks (FP) and Modified Polynomial Fitting. The processed spectral data were then analysed using Principal Component Analysis (PCA) to visually examine the clustering among the three varieties of paper. The results show that separation among the three varieties of paper is greatly improved after baseline correction via the ALS, FP and MW algorithms.
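Of the algorithms listed, Asymmetric Least Squares can be sketched compactly. This is a minimal dense-matrix version of the widely used formulation with a second-difference smoothness penalty and asymmetric weights; the parameter values `lam` and `p`, the iteration count, and the synthetic spectrum are assumptions, and the study's own settings are not given in the abstract.

```python
import numpy as np

def als_baseline(y, lam=1e4, p=0.01, n_iter=10):
    """Estimate a smooth baseline under the peaks of y.

    Minimizes sum(w * (y - z)**2) + lam * sum(second differences of z)**2,
    re-weighting so points above the baseline (peaks) get weight p and
    points below it get weight 1 - p.
    """
    y = np.asarray(y, dtype=float)
    n = y.size
    D = np.diff(np.eye(n), n=2, axis=0)  # second-difference operator, shape (n-2, n)
    penalty = lam * D.T @ D
    w = np.ones(n)
    for _ in range(n_iter):
        z = np.linalg.solve(np.diag(w) + penalty, w * y)
        w = np.where(y > z, p, 1 - p)
    return z

# Synthetic spectrum: a sloping baseline plus one narrow peak.
x = np.linspace(0, 1, 200)
baseline_true = 0.5 + 0.8 * x
peak = np.exp(-((x - 0.5) ** 2) / (2 * 0.01 ** 2))
spectrum = baseline_true + peak

corrected = spectrum - als_baseline(spectrum)
```

A larger `lam` gives a stiffer baseline and a smaller `p` pushes it further beneath the peaks; a dense solve is fine for a few hundred points, while long spectra would normally call for sparse matrices.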
The first carpometacarpal (CMC) joint, located at the base of the thumb and formed by the junction between the first metacarpal and trapezium, is a common site for osteoarthritis of the hand. The shape of both the first metacarpal and trapezium contributes to the intrinsic bony stability of the joint, and variability in the morphology of both these bones can affect the joint's function. The objectives of this study were to quantify the morphological variation in the complete metacarpal and trapezium and determine any correlation between anatomical features of these two components of the first CMC joint. A multi-object statistical shape modelling pipeline, consisting of scaling, hierarchical rigid registration, non-rigid registration and projection pursuit principal component analysis, was implemented. Four anatomical measures were quantified from the shape model, namely the first metacarpal articular tilt and torsion angles and the trapezium length and width. Variations in the first metacarpal articular tilt angle (- 6.3°
Water pollution has become a growing threat to human society and natural ecosystems in recent decades, increasing the need to better understand the variability of pollutants within aquatic systems. This study presents the application of two chemometric techniques, namely cluster analysis (CA) and principal component analysis (PCA), to classify the water quality variables into groups of similarity or dissimilarity and to determine their significance. Six stations along the Kinta River, Perak, were monitored for 30 physical and chemical parameters during the period 1997-2006. Using CA, the 30 physical and chemical parameters were classified into 4 clusters; PCA was applied to the datasets and resulted in 10 varifactors with a total variance of 78.06%. The varifactors obtained indicated the significance of each of the variables to the pollution of the Kinta River.
The purpose of this study is to determine the concentrations of selected elements in dust in a multi-storey hostel. Dust samples were taken from three random rooms at each level of the student hostel by sweeping the floor. The concentrations of the elements (Cd, Cu, Fe, Pb and Zn) were determined using an Inductively Coupled Plasma-Optical Emission Spectrometer (ICP-OES) after digestion with nitric acid and sulfuric acid solutions. Analysis of the dust samples showed that the level at which a sample was taken does not affect the concentration of the elements. The concentrations of elements in the investigated microenvironment were in the order Fe > Zn > Cu > Pb > Cd. Correlation analysis was applied to the element variables in order to identify the sources of the airborne contaminants. A strong positive correlation was found between Cu and Zn, which indicates that these elements come from traffic emissions and street dust. This result was supported by Principal Component Analysis (PCA), which revealed that the elements present in the student hostel originated from outdoor sources.