Robust statistical tools were applied to water quality datasets to determine the most significant parameters and their contribution to temporal water quality variation. Surface water samples were collected from four sampling points during the dry and wet seasons and analyzed for their physicochemical constituents. Discriminant analysis (DA) showed strong discriminatory ability: for the dry season it used five parameters (P < 0.05) and gave more than 96% correct assignation, while for the wet season the forward and backward stepwise modes used five and six parameters (P < 0.05) and gave 68.20% and 82% correct assignation, respectively. Partial correlation results revealed strong (r_p = 0.829) and moderate (r_p = 0.614) relationships between five-day biochemical oxygen demand (BOD5) and chemical oxygen demand (COD), and between total solids (TS) and dissolved solids (DS), controlling for the linear effect of nitrogen in the form of ammonia (NH3) and of conductivity, for the dry and wet seasons, respectively. Multiple linear regression identified the contribution of each variable, with r = 0.988, R² = 0.976 and r = 0.970, R² = 0.942 (P < 0.05) for the dry and wet seasons, respectively. A repeated-measures t-test confirmed that surface water quality varies significantly between the seasons (P < 0.05).
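The abstract does not state how the partial correlations were computed. As a minimal sketch, assuming a single control variable z, the standard first-order formula derives the partial correlation of x and y from the three pairwise Pearson correlations. The function name and the input values below are hypothetical and are not taken from the study.

```python
import math


def partial_corr(r_xy: float, r_xz: float, r_yz: float) -> float:
    """First-order partial correlation of x and y, controlling for z."""
    denom = math.sqrt((1.0 - r_xz ** 2) * (1.0 - r_yz ** 2))
    if denom == 0.0:
        raise ValueError("z is perfectly correlated with x or y")
    return (r_xy - r_xz * r_yz) / denom


# Hypothetical pairwise correlations, for illustration only.
example = partial_corr(r_xy=0.8, r_xz=0.5, r_yz=0.5)
```

When the control variable is uncorrelated with both x and y, the partial correlation reduces to the ordinary correlation r_xy.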
The study aimed to determine the variation in taxonomic diversity of Polynemus paradiseus based on morphometric and meristic analyses of samples collected from three coastal rivers of Bangladesh (Payra, Tentulia and Kirtonkhola). A total of 105 individuals ranging from 10 to 20 cm in total length (TL) and from 7.91 to 60.64 g in body weight (BW) were sampled using Been nets and Kachal and Veshal nets. Significant differences were observed in 24 out of 25 morphometric measurements and 6 out of 10 meristic counts among the populations. For the morphometric measurements, the first discriminant function (DF1) accounted for 78.6% and the second discriminant function (DF2) for 21.4% of the among-group variability, together explaining 100% of the total among-group variability. A dendrogram based on morphometric data showed that the Tentulia and Kirtankhola populations overlapped to a high degree and differed markedly from the Payra river population. The canonical graph also showed that the Tentulia and Kirtankhola populations were more closely related to each other than to the Payra river population for isometric condition. These findings may provide useful information for the conservation and sustainable management of this important fish.
Ariid catfishes, which belong to the family Ariidae, are considered one of the taxonomically problematic groups and are still under review by fish taxonomists globally. Species-level identification of some ariids often results in misidentification because of their complex characters and the very similar morphological characters within genera. Vigilant and detailed observation is therefore very important during species-level identification of ariids. In this context, this study was carried out to determine the morphological variations of one of the ariid genera, Plicofollis, which has given rise to misleading taxonomic information in south-east Asian countries. A truss network technique was used throughout the study. The study was based on 20 truss measurements using 22 to 23 specimens per species, namely P. argyropleuron, P. nella and P. tenuispinis, found in Peninsular Malaysian waters. Morphological variations were determined using the multivariate technique of discriminant function analysis (DFA). The results showed that discriminant analysis using truss network measurements produced very clear separations of all the species in the Plicofollis group. Several important morphological characters were identified, which represent the body depth and caudal regions of the fish. These variables could be considered constructive functional features that enable species within this complex Ariidae family to be distinguished more accurately.
Wild stocks of the endangered mrigal carp, Cirrhinus cirrhosus (Bloch 1795), continue to decline rapidly in the Indo-Ganges river basin. To evaluate its population status, landmark-based morphometric and meristic variations were studied among three different stocks in Bangladesh, viz., hatchery (Jessore), baor (Gopalganj) and river (Faridpur). Significant differences were observed among the stocks in 10 of the 15 morphometric measurements (head length, standard length, fork length, length of base of spinous, pre-orbital length, eye length, post-orbital length, length of upper jaw, height of pelvic fin and barbel length), two of the 8 meristic counts (scales above the lateral line and pectoral fin rays) and 10 of the 22 truss network measurements (1 to 10, 2 to 3, 2 to 8, 2 to 9, 2 to 10, 3 to 4, 3 to 8, 4 to 5, 4 to 7 and 9 to 10). For morphometric and landmark measurements, the 1st discriminant function (DF) accounted for 58.1% and the 2nd DF accounted for 41.9% of the among-group variability. In discriminant space, the river stock was isolated from the other two stocks, whereas the baor and hatchery stocks formed a very compact cluster. A dendrogram based on hierarchical cluster analysis of the morphometric and truss distance data placed the hatchery and baor stocks in one cluster and the river stock in another, and the distance between the river and hatchery populations was the highest. Morphological differences among stocks are expected because of their geographical isolation and their origin from different ancestors. The baseline information derived from the present study would be useful for genetic studies and in the assessment of environmental impacts on C. cirrhosus populations in Bangladesh.
The approach of this paper is to predict the sand mass distribution in an urban stormwater holding pond at the Stormwater Management And Road Tunnel (SMART) Control Centre, Malaysia, using the simulated depth-averaged velocity of floodwater diverted into the holding pond during storm events. Discriminant analysis (DA) was applied to derive the classification function that spatially distinguishes areas of relatively high and low sand mass composition, based on the simulated water velocity variations at the locations where the sand mass composition of surface sediment samples was measured gravimetrically. Three inflow parameter values, 16, 40 and 80 m³ s⁻¹, representing diverted floodwater discharge for three storm event conditions, were fixed as input parameters of the hydrodynamic model. The sand (grain size > 0.063 mm) mass composition of the surface sediment measured at 29 sampling locations ranges from 3.7 to 45.5%. The sampling locations were spatially clustered into two groups based on the sand mass composition. The sand mass composition of group 1 is relatively lower (3.69 to 12.20%) than that of group 2 (16.90 to 45.55%). Two Fisher's linear discriminant functions, F1 and F2, were generated to predict areas of relatively higher and lower sand mass composition based on the relationship between the simulated flow velocity and the measured surface sand composition at corresponding sampling locations: F1 = -9.405 + 4232.119 × A - 1795.805 × B + 281.224 × C, and F2 = -2.842 + 2725.137 × A - 1307.688 × B + 231.353 × C, where A, B and C represent the simulated flow velocity generated by inflow parameter values of 16, 40 and 80 m³ s⁻¹, respectively. The model correctly predicts 88.9 and 100.0% of sampling locations consisting of relatively high and low sand mass percentages, respectively, with the cross-validated classification showing that, overall, 82.8% are correctly classified. The model predicts that 31.4% of the model domain areas consist of high-sand mass composition areas and the remaining 68.6% comprise low-sand mass composition areas.
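The way Fisher classification functions are normally applied can be sketched with the coefficients reported above: both functions are evaluated at a location's three simulated velocities, and the location is assigned to the function with the larger score. The abstract does not say which sand-mass group each function represents or which velocity units are used, so the sketch returns only the label of the winning function, and the input velocities are hypothetical.

```python
# Coefficients as reported in the abstract: (intercept, A, B, C).
F1_COEF = (-9.405, 4232.119, -1795.805, 281.224)
F2_COEF = (-2.842, 2725.137, -1307.688, 231.353)


def score(coef, a, b, c):
    """Evaluate one linear classification function at velocities a, b, c."""
    intercept, ca, cb, cc = coef
    return intercept + ca * a + cb * b + cc * c


def classify(a, b, c):
    """Return the label of the higher-scoring function and both scores."""
    f1 = score(F1_COEF, a, b, c)
    f2 = score(F2_COEF, a, b, c)
    return ("F1" if f1 > f2 else "F2"), f1, f2


# Hypothetical velocities (units assumed, not stated in the abstract).
label, f1, f2 = classify(0.01, 0.01, 0.01)
```

Assigning each observation to the group whose function scores highest is the usual rule for Fisher classification functions. Whether the authors applied exactly this rule is an assumption here.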
Pyrolysis-gas chromatography-mass spectrometry (Py-GC-MS) has been recognised as an effective technique for analysing car paint. This study assessed the combination of Py-GC-MS and chemometric techniques for classifying car paint primer, the inner layer of the car paint system. Fifty car paint primer samples from various manufacturers were analysed using Py-GC-MS, and the data set of identified pyrolysis products was subjected to principal component analysis (PCA) and discriminant analysis (DA). The PCA rendered 16 principal components accounting for 86.33% of the total variance. The DA classified the car paint primer samples according to their types (1k and 2k primer) with 100% correct classification in the test set for all three modes (standard, stepwise forward and stepwise backward). Three compounds, indolizine, 1,3-benzenedicarbonitrile and p-terphenyl, were the most significant in discriminating the car paint primer samples.
Physical properties of ripe banana flour were studied in Cavendish and Dream banana, in order to distinguish the two varieties. Flour was analyzed for pH, total soluble solids (TSS), water holding capacity (WHC) and oil holding capacity (OHC) at 40, 60 and 80 °C, color values L*, a* and b*, back extrusion force and viscosity. Physical properties data were analyzed by cluster analysis (CA) and discriminant analysis (DA). CA showed that the two types of flour were different in terms of selected physical properties. DA indicated that WHC at 60 °C was the main contributor in discriminating the two types of flour.
Fish vocalisation is often a major component of underwater soundscapes. Therefore, interpretation of these soundscapes requires an understanding of the vocalisation characteristics of common soniferous fish species. This study of captive female bluefin gurnard, Chelidonichthys kumu, aims to formally characterise their vocalisation sounds and daily pattern of sound production. Four types of sound were produced and characterised, twice as many as previously reported in this species. These sounds fit two aural categories, grunt and growl, with mean peak frequencies ranging from 129 to 215 Hz. This species vocalised throughout the 24-hour period at an average rate of 18.5 ± 2.0 sounds fish⁻¹ h⁻¹, with an increase in vocalisation rate at dawn and dusk. Competitive feeding did not elevate vocalisation, as has been found in other gurnard species. Bluefin gurnard are common in coastal waters of New Zealand, Australia and Japan and, given their vocalisation rate, are likely to be significant contributors to the ambient underwater soundscape in these areas.
Red fruit (Pandanus conoideus Lam) is a plant endemic to Papua, Indonesia and Papua New Guinea. The price of its oil (red fruit oil, RFO) is 10-15 times higher than that of common vegetable oils; consequently, RFO is subject to adulteration with lower-priced oils. Among common vegetable oils, canola oil (CaO) and rice bran oil (RBO) have fatty acid profiles similar to that of RFO, as indicated by the score plot of principal component analysis; therefore, CaO and RBO are potential adulterants in RFO.
The task of identifying firearms from forensic ballistics specimens has been an exacting one in crime investigation over the last two decades. Every firearm, regardless of its size, make and model, has its own unique 'fingerprint'. When a firearm is fired, these fingerprints are transferred to the fired bullet and cartridge case. The components involved in producing these unique characteristics are the firing chamber, breech face, firing pin, ejector, extractor and the rifling of the barrel. These unique characteristics are the critical features in identifying firearms: they allow investigators to decide which particular firearm fired the bullet. Traditionally, the comparison of ballistic evidence has been a tedious and time-consuming process requiring highly skilled examiners. Therefore, the main objective of this study is the extraction and identification of suitable features from the firing pin impression in cartridge case images for firearm recognition. Previous studies have shown that the firing pin impression on the cartridge case is one of the most important characteristics used for identifying an individual firearm. In this study, data were gathered from 747 cartridge case images captured from five different pistols of type 9mm Parabellum Vektor SP1, made in South Africa. All the cartridge case images were segmented into three regions, forming three different sets of images: the firing pin impression image, the centre of firing pin impression image and the ring of firing pin impression image. Geometric moments up to the sixth order were then generated from each part of the images to form a set of numerical features. These 48 features were found to be significantly different using the MANOVA test. This high-dimensional feature set was then reduced to only 11 significant features using correlation analysis. Classification results using cross-validation under discriminant analysis show that 96.7% of the images were classified correctly. These results demonstrate the value of the geometric moments technique for producing a set of numerical features on which the identification of firearms can be based.
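To make the feature type concrete, the raw geometric moment of order (p, q) of a grey-level image I is the sum of x^p · y^q · I(x, y) over all pixels. The sketch below computes these moments for a small nested-list image. It is an illustration under assumptions: the function names and the toy image are hypothetical, and the abstract does not state which moments or normalisations made up the 48 features.

```python
def geometric_moment(image, p, q):
    """Raw geometric moment m_pq; x is the column index, y the row index."""
    return sum(
        (x ** p) * (y ** q) * value
        for y, row in enumerate(image)
        for x, value in enumerate(row)
    )


def moments_up_to(image, max_order):
    """All raw moments m_pq with p + q <= max_order, keyed by (p, q)."""
    return {
        (p, q): geometric_moment(image, p, q)
        for p in range(max_order + 1)
        for q in range(max_order + 1 - p)
    }


# Toy 2 x 2 grey-level image, for illustration only.
tiny = [[1, 2],
        [3, 4]]
features = moments_up_to(tiny, 6)
```

Here m_00 is the total intensity, and m_10 / m_00 and m_01 / m_00 give the intensity centroid, which is why low-order moments are often used as compact shape descriptors.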
Tongkat Ali (Eurycoma longifolia) is one of the most popular tropical herbal plants, as it is believed to enhance virility and sexual prowess. This study examined the chromatographic fingerprint of Tongkat Ali roots and products, generated using online solid-phase extraction liquid chromatography (SPE-LC) combined with chemometric approaches, with the aim of determining their quality. The pressurised liquid extraction (PLE) technique was used prior to online SPE-LC using polystyrene divinyl benzene (PSDVB) and C18 columns. Seventeen Tongkat Ali roots and 10 products (capsules) were analysed. The chromatographic dataset was subjected to chemometric techniques, namely cluster analysis (CA), discriminant analysis (DA) and principal component analysis (PCA), using 37 selected peaks. The samples were grouped into three clusters based on their quality. The PCA resulted in 11 latent factors describing 90.8% of the whole variance. Pattern matching analysis showed no significant difference (p > 0.05) between the roots and products within the same CA grouping. The findings showed that the combination of chromatographic fingerprinting and chemometric techniques provided a comprehensive evaluation for efficient quality control of Tongkat Ali formulations.
Partial least squares-discriminant analysis (PLS-DA) is a versatile algorithm that can be used for predictive and descriptive modelling as well as for discriminative variable selection. However, versatility is both a blessing and a curse, and the user needs to optimize a wealth of parameters before reaching reliable and valid outcomes. Over the past two decades, PLS-DA has demonstrated great success in modelling high-dimensional datasets for diverse purposes, e.g. product authentication in food analysis, disease classification in medical diagnosis, and evidence analysis in forensic science. Despite that, in practice, many users have yet to grasp the essence of constructing a valid and reliable PLS-DA model. As technology progresses, datasets in every discipline are evolving into more complex forms, i.e. multi-class, imbalanced and colossal. Indeed, the community is welcoming a new era called big data. In this context, the aim of the article is two-fold: (a) to review, outline and describe contemporary PLS-DA modelling practice strategies, and (b) to critically discuss the respective knowledge gaps that have emerged in response to the present big data era. This work could complement other available reviews or tutorials on PLS-DA, to provide a timely and user-friendly guide to researchers, especially those working in applied research.
This paper outlines the application of chemometrics and pattern recognition tools to classify palm oil using Fourier transform mid-infrared (FT-MIR) spectroscopy. FT-MIR spectroscopy is used as an analytical tool to categorise the oil as either unused palm oil or palm oil used for frying. The samples consist of 28 types of pure palm oil and 28 types of frying palm oil. FT-MIR spectra were obtained in absorbance mode over the spectral range from 650 cm⁻¹ to 4000 cm⁻¹ using FT-MIR-ATR sample handling. The aim of this work is to develop a fast method for discriminating the palm oils by implementing Partial Least Squares Discriminant Analysis (PLS-DA), Learning Vector Quantisation (LVQ) and Support Vector Machine (SVM). Raw FT-MIR spectra were subjected to Savitzky-Golay smoothing and standardized before developing the classification models. Each classification model was validated by the percentage of test-set samples correctly classified, in order to show which classifier provided the best classification. To improve the performance of the classification models, a variable selection method known as the t-statistic method was applied, through which the significant variables for developing the classification model were selected. The results revealed that the PLS-DA classifier applied to the standardized data with t-statistic variable selection showed the best performance, with the highest percentage correctly classified among the classifiers.
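The abstract does not define the t-statistic method. One common reading, assumed here, is to compute a two-sample t-statistic for every spectral variable and rank the variables by its absolute value. The sketch below uses Welch's unequal-variance form; the function names and the toy data are hypothetical.

```python
import math
from statistics import mean, variance


def t_statistic(a, b):
    """Welch two-sample t-statistic for two lists of values."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se


def rank_variables(class_a, class_b):
    """Rank variable indices by |t|, most discriminating first.

    class_a and class_b are lists of samples; each sample is a list
    holding one value per variable (e.g. one absorbance per wavenumber).
    """
    n_vars = len(class_a[0])
    scores = []
    for j in range(n_vars):
        col_a = [row[j] for row in class_a]
        col_b = [row[j] for row in class_b]
        scores.append((abs(t_statistic(col_a, col_b)), j))
    return [j for _, j in sorted(scores, reverse=True)]


# Toy data: variable 0 separates the classes, variable 1 barely does.
unused = [[1.0, 10.0], [2.0, 11.0], [3.0, 12.0]]
used = [[4.0, 10.5], [5.0, 11.5], [6.0, 12.5]]
ranking = rank_variables(unused, used)
```

A classifier would then be trained on the top-ranked variables only. The statistics should be computed on the training set alone so that the test-set accuracy stays unbiased.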
BACKGROUND: This study was carried out to evaluate the accuracy of sex estimation by discriminant analysis and stepwise discriminant analysis equations generated from metatarsal bones in a Thai population.
MATERIAL AND METHODS: The testing samples utilized in this study consisted of 50 skeletons (25 males and 25 females) obtained from the Khon Kaen University Skeletal Collection, Department of Anatomy, Faculty of Medicine, Khon Kaen University. Seven measurements of metatarsal bones were measured in centimeters, using either a mini-osteometric board (MOB) or a sliding caliper. The values measured from the Khon Kaen Skeletal Collection were used to determine the accuracy and applicability of sex determination, as predicted by Y1-Y6 equations which were generated from a Chiang Mai Skeletal Collection.
RESULTS: The sex determination accuracies predicted from the Y1-Y6 equations ranged from 80% to 95.6%.
CONCLUSIONS: The Chiang Mai sex determination equations, generated from metatarsal bones by discriminant analysis (Y1-Y3) and stepwise discriminant analysis (Y4-Y6), demonstrated high accuracy rates of prediction, suggesting that these equations may be useful for sex determination within the Thai population.
KEYWORDS: Foot; Metatarsal bones; Sex determination; Thailand
In response to our review paper [L. C. Lee et al., Analyst, 2018, 143, 3526-3539], we present a study that compares empirical differences between PLS1-DA and PLS2-DA algorithms in modelling a colossal ATR-FTIR spectral dataset. Over the past two decades, partial least squares-discriminant analysis (PLS-DA) has gained wide acceptance and huge popularity in the field of applied research, partly due to its dimensionality reduction capability and ability to handle multicollinear and correlated variables. To solve a K-class problem (K > 2) using PLS-DA and high-dimensional data like infrared spectra, one can construct either K one-versus-all PLS1-DA models or only one PLS2-DA model. The aim of this work is to explore empirical differences between the two PLS-DA algorithms in modeling a colossal ATR-FTIR spectral dataset. The practical task is to build a prediction model using the imbalanced, high dimensional, colossal and multi-class ATR-FTIR spectra of blue gel pen inks. Four different sub-datasets were prepared from the principal dataset by considering the raw and asymmetric least squares (AsLS) preprocessed forms: (a) Raw-global region; (b) Raw-local region; (c) AsLS-global region; and (d) AsLS-local region. A series of 50 models which includes the first 50 PLS components incrementally was constructed repeatedly using the four sub-datasets. Each model was evaluated using six different variants of v-fold cross validation, autoprediction and external testing methods. As a result, each PLS-DA algorithm was represented by a number of figures of merit. The differences between PLS1-DA and PLS2-DA algorithms were assessed using hypothesis tests with respect to model accuracy, stability and fitting. On the other hand, confusion matrices of the two PLS-DA algorithms were inspected carefully for assessment of model parsimony. Overall, both the algorithms presented satisfactory model accuracy and stability. Nonetheless, PLS1-DA models showed significantly higher accuracy rates than PLS2-DA models, whereas PLS2-DA models seem to be much more stable compared to PLS1-DA models. Eventually, PLS2-DA also proved to be less prone to overfitting and is more parsimonious than PLS1-DA. In conclusion, the relatively high accuracy of the PLS1-DA algorithm is achieved at the cost of rather low parsimony and stability, and with an increased risk of overfitting.
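The difference between the two set-ups lies in how the class labels are presented to the regression. The sketch below shows only that response encoding, not a PLS implementation: K separate binary target vectors for the one-versus-all PLS1-DA models, and a single N × K dummy matrix for PLS2-DA. The 0/1 coding, the function names and the toy labels are assumptions; some implementations use -1/+1 coding instead.

```python
def one_hot(labels, classes):
    """N x K dummy matrix: the single response block used by PLS2-DA."""
    return [[1 if label == c else 0 for c in classes] for label in labels]


def one_vs_all_targets(labels, classes):
    """K binary vectors, one per class: a separate response for each
    one-versus-all PLS1-DA model."""
    return {c: [1 if label == c else 0 for label in labels] for c in classes}


# Toy labels, for illustration only.
labels = ["A", "B", "C", "A"]
classes = sorted(set(labels))

y_pls2 = one_hot(labels, classes)
y_pls1 = one_vs_all_targets(labels, classes)
```

Column k of the PLS2-DA matrix is identical to the k-th PLS1-DA target vector. The two approaches therefore differ only in whether the K columns are modelled jointly with shared latent components or separately with their own components.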
Examination of the brain's condition with the Electroencephalogram (EEG) can be helpful to predict abnormality and cerebral activities. The purpose of this study was to develop an Automated Diagnostic Tool (ADT) to investigate and classify the EEG signal patterns into normal and schizophrenia classes. The ADT implements a sequence of events, such as EEG series splitting, non-linear features mining, t-test assisted feature selection, classification and validation. The proposed ADT is employed to evaluate a 19-channel EEG signal collected from normal and schizophrenia class volunteers. A dataset was created by splitting the raw 19-channel EEG into a sequence of 6250 sample points, which was helpful to produce 1142 features of normal and schizophrenia class patterns. Non-linear feature extraction was then implemented to mine 157 features from each EEG pattern, from which 14 of the principal features were identified based on significance. Finally, a signal classification practice with Decision-Tree (DT), Linear-Discriminant analysis (LD), k-Nearest-Neighbour (KNN), Probabilistic-Neural-Network (PNN), and Support-Vector-Machine (SVM) with various kernels was implemented. The experimental outcome showed that the SVM with Radial-Basis-Function (SVM-RBF) offered a superior average performance value of 92.91% on the considered EEG dataset, as compared to other classifiers implemented in this work.
The prediction models of MWQI in mangrove and estuarine zones were constructed. The 2011-2015 data employed in this study comprised 13 parameters from six monitoring stations in West Malaysia. Spatial discriminant analysis (SDA) recommended seven significant parameters for developing the MWQI: DO, TSS, O&G, PO4, Cd, Cr and Zn. These selected parameters were then used to develop prediction models for the MWQI using artificial neural network (ANN) and multiple linear regression (MLR). The SDA-ANN model had higher R² values for training (0.9044) and validation (0.7113) than the SDA-MLR model and was chosen as the best model for the mangrove estuarine zone. The SDA-ANN model also showed a lower RMSE (5.224) than the SDA-MLR model (12.7755). In summary, this work suggests that ANN is an effective tool for computing the MWQI in the mangrove estuarine zone and a powerful alternative to the other modelling methods.
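For readers comparing the two figures of merit, the sketch below computes RMSE and R² from paired observed and predicted values. It assumes R² is the coefficient of determination, 1 - SS_res / SS_tot; the study may instead have used the squared correlation. The function names and the sample values are hypothetical.

```python
import math


def rmse(observed, predicted):
    """Root mean squared error between observed and predicted values."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def r_squared(observed, predicted):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical index values, for illustration only.
observed = [1.0, 2.0, 3.0]
predicted = [1.1, 1.9, 3.2]
error = rmse(observed, predicted)
fit = r_squared(observed, predicted)
```

RMSE is expressed in the units of the index itself, whereas R² is unitless, so the two measures are complementary rather than interchangeable when ranking models.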
The authentication of virgin coconut oil (VCO) has become very important because of the possible adulteration of VCO with cheaper plant oils such as corn oil (CO) and sunflower oil (SFO). Methods involving Fourier transform mid-infrared (FT-MIR) spectroscopy combined with chemometric techniques (partial least squares (PLS) and discriminant analysis (DA)) were developed for the quantification and classification of CO and SFO in VCO. MIR spectra of oil samples were recorded over the frequency region 4000-650 cm⁻¹ on a horizontal attenuated total reflectance (HATR) attachment of the FTIR. DA successfully classified pure VCO and VCO adulterated with CO and SFO using 10 principal components. Furthermore, the PLS model correlated the actual and FTIR-estimated values of the oil adulterants (CO and SFO) with a coefficient of determination (R²) of 0.999.
Age-related Macular Degeneration (AMD) affects the central vision of aged people. It can be diagnosed from the presence of drusen, Geographic Atrophy (GA) and Choroidal Neovascularization (CNV) in fundus images. Screening these images is labor-intensive and time-consuming for ophthalmologists. An automated screening system based on digital fundus photography can overcome these drawbacks. Such a safe, non-contact and cost-effective platform can be used as a screening system for dry AMD. In this paper, we propose a novel algorithm using the Radon Transform (RT) and Discrete Wavelet Transform (DWT) coupled with Locality Sensitive Discriminant Analysis (LSDA) for automated diagnosis of AMD. First, the image is subjected to RT followed by DWT. The extracted features are subjected to dimension reduction using LSDA and ranked using the t-test. The performance of various supervised classifiers, namely Decision Tree (DT), Support Vector Machine (SVM), Probabilistic Neural Network (PNN) and k-Nearest Neighbor (k-NN), is compared for automatically discriminating normal and AMD classes using the ranked LSDA components. The proposed approach is evaluated using a private dataset and the public ARIA and STARE datasets. The highest classification accuracies of 99.49%, 96.89% and 100% are reported for the private, ARIA and STARE datasets, respectively. In addition, an AMD index is devised using two LSDA components to distinguish the two classes accurately. Hence, the proposed system can be extended to mass AMD screening.
This study aims to test the translated Hausa version of the Stroke Impact Scale (SIS 3.0) and further evaluate its psychometric properties. The SIS 3.0 was translated from English into Hausa and was tested for its reliability and validity on a stratified random sample of adult stroke survivors attending rehabilitation services at stroke referral hospitals in Kano, Nigeria. Psychometric analysis of the Hausa-SIS 3.0 involved face, content, criterion, and construct validity tests as well as internal and test-retest reliability. In reliability analyses, the Cronbach's alpha values for the items in the Strength, Hand function, Mobility, ADL/IADL, Memory and thinking, Communication, Emotion, and Social participation domains were 0.80, 0.92, 0.90, 0.78, 0.84, 0.89, 0.58, and 0.74, respectively. The SIS 3.0 has 8 domains; in confirmatory factor analysis, some of the items in the Hausa-SIS questionnaire had to be dropped due to lack of discriminant validity. In the final analysis, a parsimonious model was obtained with two items per construct for the 8 constructs (Chi-square/df < 3, TLI and CFI > 0.9, and RMSEA < 0.08). Cross validation with 1000 bootstrap samples gave a satisfactory result (P = 0.011). In conclusion, the shorter 16-item Hausa-SIS seems to measure adequately the QOL outcomes in the 8 domains.
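The internal-consistency figures quoted above are Cronbach's alpha values. As a minimal sketch of the standard formula, alpha = k / (k - 1) × (1 - Σ item variances / variance of total scores), the function below takes one row of item scores per respondent. The function name and the toy responses are hypothetical and are not data from the study.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents' item-score rows."""
    k = len(responses[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Sample variance of each item across respondents.
    item_vars = [variance([row[j] for row in responses]) for j in range(k)]
    # Sample variance of each respondent's total score.
    total_var = variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1.0 - sum(item_vars) / total_var)


# Toy data: three respondents answering a two-item domain.
toy_responses = [[1, 2], [2, 3], [3, 5]]
alpha = cronbach_alpha(toy_responses)
```

Alpha approaches 1 when the items in a domain rise and fall together across respondents, and it drops when they vary independently of one another.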