Robust statistical tools were applied to water quality datasets to determine the most significant parameters and their contribution to temporal variation in water quality. Surface water samples were collected from four sampling points during the dry and wet seasons and analyzed for their physicochemical constituents. Discriminant analysis (DA) showed strong discriminatory ability. For the dry season, five parameters (P < 0.05) gave more than 96% correct assignment. For the wet season, the forward and backward stepwise modes used five and six parameters (P < 0.05) and gave 68.20% and 82% correct assignment, respectively. Partial correlation revealed a strong relationship (partial r = 0.829) between five-day biochemical oxygen demand (BOD5) and chemical oxygen demand (COD), controlling for the linear effect of ammonia nitrogen (NH3), in the dry season. It also revealed a moderate relationship (partial r = 0.614) between total solids (TS) and dissolved solids (DS), controlling for the linear effect of conductivity, in the wet season. Multiple linear regression identified the contribution of each variable, with r = 0.988 and R² = 0.976 for the dry season and r = 0.970 and R² = 0.942 for the wet season (P < 0.05). A repeated-measures t-test confirmed that surface water quality varies significantly between seasons (P < 0.05).
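The partial correlations above can be illustrated with a minimal sketch. It assumes the usual residual definition: regress both variables on the control variable (with an intercept), then correlate the residuals. The data are synthetic, and the names bod5, cod and nh3 are hypothetical stand-ins, not the study's measurements.

```python
import numpy as np

def partial_corr(x, y, controls):
    """Partial correlation of x and y after removing the linear effect of `controls`.

    Both variables are regressed (with an intercept) on the controls by least
    squares; the Pearson correlation of the two residual vectors is returned.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    Z = np.column_stack([np.ones(len(x)), np.asarray(controls, dtype=float)])
    beta_x, *_ = np.linalg.lstsq(Z, x, rcond=None)
    beta_y, *_ = np.linalg.lstsq(Z, y, rcond=None)
    res_x = x - Z @ beta_x
    res_y = y - Z @ beta_y
    return float(np.corrcoef(res_x, res_y)[0, 1])

# Hypothetical synthetic data: nh3 drives both bod5 and cod.
rng = np.random.default_rng(0)
nh3 = rng.normal(size=200)
bod5 = 2.0 * nh3 + rng.normal(size=200)
cod = 3.0 * nh3 + 0.5 * bod5 + rng.normal(size=200)
r_partial = partial_corr(bod5, cod, nh3)
```

With a single control variable, the result equals the textbook first-order formula (r_xy - r_xz r_yz) / sqrt((1 - r_xz²)(1 - r_yz²)).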
Ariid catfishes (family Ariidae) are considered one of the taxonomically problematic groups and are still under review by fish taxonomists globally. Species-level identification of some ariids often results in misidentification, because their morphological characters are complex and very similar within genera. Careful and detailed observation is therefore essential when identifying ariids to species level. In this context, this study was carried out to determine the morphological variation within one ariid genus, Plicofollis, which has been a source of misleading taxonomic information in Southeast Asian countries. The truss network technique was used throughout the study. The study was based on 20 truss measurements taken from 22 to 23 specimens of each of three species found in Peninsular Malaysian waters: P. argyropleuron, P. nella and P. tenuispinis. Morphological variation was determined using the multivariate technique of discriminant function analysis (DFA). The results showed that discriminant analysis of the truss network measurements produced very clear separation of all species in the Plicofollis group. Several important morphological characters were identified, representing the body depth and caudal regions of the fish. Documenting these variables could provide useful functional features for distinguishing species more accurately within the complex family Ariidae.
Wild stocks of the endangered mrigal carp, Cirrhinus cirrhosus (Bloch 1795), continue to decline rapidly in the Indo-Ganges river basin. To evaluate its population status, landmark-based morphometric and meristic variations were studied among three stocks in Bangladesh: hatchery (Jessore), baor (Gopalganj) and river (Faridpur). Significant differences among the stocks were observed in three sets of measurements:
- 10 of the 15 morphometric measurements: head length, standard length, fork length, length of base of spinous, pre-orbital length, eye length, post-orbital length, length of upper jaw, height of pelvic fin and barbel length.
- 2 of the 8 meristic counts: scales above the lateral line and pectoral fin rays.
- 10 of the 22 truss network measurements: 1 to 10, 2 to 3, 2 to 8, 2 to 9, 2 to 10, 3 to 4, 3 to 8, 4 to 5, 4 to 7 and 9 to 10.

For the morphometric and landmark measurements, the first discriminant function (DF) accounted for 58.1% of the among-group variability and the second DF for 41.9%. In discriminant space, the river stock was isolated from the other two stocks, whereas the baor and hatchery stocks formed a very compact cluster. A dendrogram from hierarchical cluster analysis of the morphometric and truss distance data placed the hatchery and baor stocks in one cluster and the river stock in another. The distance between the river and hatchery populations was the greatest. Morphological differences among the stocks are expected because of their geographical isolation and their origin from different ancestors. The baseline information from this study should be useful for genetic studies and for assessing environmental impacts on C. cirrhosus populations in Bangladesh.
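Truss network measurements are straight-line distances between pairs of anatomical landmarks. A minimal sketch is shown below. It assumes the landmarks are digitised as (x, y) image coordinates and keyed by the landmark numbers used in the text; the coordinates are hypothetical.

```python
import math

def truss_distances(landmarks, pairs):
    """Euclidean distance for each landmark pair, e.g. (1, 10) for '1 to 10'.

    landmarks: mapping from landmark number to (x, y) coordinates.
    pairs: iterable of (a, b) landmark-number pairs.
    """
    return {f"{a}-{b}": math.dist(landmarks[a], landmarks[b]) for a, b in pairs}

# Hypothetical digitised landmarks (image units) for one specimen.
landmarks = {1: (0.0, 0.0), 2: (3.0, 4.0), 3: (3.0, 0.0), 10: (6.0, 8.0)}
truss = truss_distances(landmarks, [(1, 10), (2, 3)])
```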
This study aimed to determine variation in the taxonomic diversity of Polynemus paradiseus, using morphometric and meristic analyses of samples collected from three coastal rivers of Bangladesh (Payra, Tentulia and Kirtankhola). A total of 105 individuals were sampled using Been nets and Kachal and Veshal nets. They ranged from 10 to 20 cm in total length (TL) and from 7.91 to 60.64 g in body weight (BW). Significant differences among the populations were observed in 24 of 25 morphometric measurements and 6 of 10 meristic counts. For the morphometric measurements, the first discriminant function (DF1) accounted for 78.6% of the among-group variability and the second (DF2) for 21.4%, together explaining all of it. A dendrogram based on morphometric data showed that the Tentulia and Kirtankhola populations overlapped to a high degree and that both differed strongly from the Payra River population. For the isometric condition, the canonical graph likewise showed that the Tentulia and Kirtankhola populations were more closely related to each other than to the Payra River population. These findings may provide useful information for the conservation and sustainable management of this important fish.
Augmented reality, also called the internet of reality, has been regarded as a breakthrough and a critical shift. Its emphasis on data mining has led several of its stakeholders to question some of its assumptions. In this work, we study the web-usage foundations of these technologies as part of the healthcare infrastructure of an Internet of Things (IoT) system. We used several data mining techniques to evaluate the online advertisement data set. This data set is high-dimensional (1,553 attributes) and imbalanced, and we use it to simulate an IoT discrimination problem. The proposed methodology applies Fisher linear discriminant analysis (FLDA) and quadratic discriminant analysis (QDA) after random projection (RP) filters. It compares their runtime and accuracy with those of support vector machine (SVM), K-nearest neighbor (KNN) and multilayer perceptron (MLP) classifiers in IoT-based systems. Finally, the effect of the number of projections was examined experimentally, and both the precision and the runtime of FLDA and QDA were found to be sensitive to it. The modeling results show improvements in both accuracy and runtime. Compared with SVM, KNN and MLP, QDA and FLDA shortened runtime by a factor of 20 on our chosen data set, which was simulated for a healthcare framework. Using RP filtering for attribute selection in the preprocessing stage reduces the model's runtime and is a useful approach for the IoT industry. Index Terms: Data Mining, Random Projection, Fisher Linear Discriminant Analysis, Online Advertisement Dataset, Quadratic Discriminant Analysis, Feature Selection, Internet of Things.
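The RP + FLDA pipeline can be outlined with a minimal sketch. It assumes a Gaussian random projection matrix and a two-class problem. The decision threshold sits at the midpoint of the projected class means, which ignores class imbalance. The synthetic data, the dimensions (d = 200, k = 20) and the function names are hypothetical, and QDA and the comparison classifiers are not shown.

```python
import numpy as np

def random_projection(X, k, seed=0):
    """Project X (n x d) onto k random Gaussian directions; returns (X @ R, R)."""
    rng = np.random.default_rng(seed)
    R = rng.normal(size=(X.shape[1], k)) / np.sqrt(k)
    return X @ R, R

def fit_flda(X, y):
    """Two-class Fisher LDA: w = Sw^-1 (m1 - m0), threshold at the midpoint of the projected means."""
    X0, X1 = X[y == 0], X[y == 1]
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    # Within-class scatter matrix (sum of the two class scatter matrices).
    Sw = np.cov(X0, rowvar=False) * (len(X0) - 1) + np.cov(X1, rowvar=False) * (len(X1) - 1)
    Sw += 1e-6 * np.eye(X.shape[1])  # small ridge for numerical stability
    w = np.linalg.solve(Sw, m1 - m0)
    threshold = 0.5 * (m0 + m1) @ w
    return w, threshold

def predict_flda(X, w, threshold):
    return (X @ w > threshold).astype(int)

# Hypothetical imbalanced data: 300 vs 60 samples, class 1 shifted in 50 of 200 attributes.
rng = np.random.default_rng(1)
d, k = 200, 20
shift = np.zeros(d)
shift[:50] = 2.0
X_train = np.vstack([rng.normal(size=(300, d)), rng.normal(size=(60, d)) + shift])
y_train = np.array([0] * 300 + [1] * 60)
X_test = np.vstack([rng.normal(size=(300, d)), rng.normal(size=(60, d)) + shift])
y_test = y_train.copy()

Z_train, R = random_projection(X_train, k)
w, t = fit_flda(Z_train, y_train)
accuracy = (predict_flda(X_test @ R, w, t) == y_test).mean()
```

The projection matrix is drawn once and reused for the test data, so the classifier only ever works in the k-dimensional space. That reduction is the source of the runtime savings the abstract describes.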
This paper predicts the sand mass distribution in an urban stormwater holding pond at the Stormwater Management And Road Tunnel (SMART) Control Centre, Malaysia. The prediction uses the simulated depth-averaged velocity of floodwater diverted into the holding pond during storm events. Discriminant analysis (DA) was applied to derive classification functions that spatially distinguish areas of relatively high and low sand mass composition. These functions are based on the simulated water velocity at the locations where the sand mass composition of surface sediment samples was measured gravimetrically. Three inflow values, 16, 40 and 80 m³ s⁻¹, representing the diverted floodwater discharge for three storm event conditions, were fixed as input parameters of the hydrodynamic model. The sand (grain size > 0.063 mm) mass composition of the surface sediment measured at 29 sampling locations ranged from 3.7 to 45.5%. The sampling locations were spatially clustered into two groups based on sand mass composition. Group 1 had a relatively low composition (3.69 to 12.20%) compared with group 2 (16.90 to 45.55%). Two Fisher's linear discriminant functions, F1 and F2, were generated to predict areas of relatively higher and lower sand mass composition. They are based on the relationship between the simulated flow velocity and the measured surface sand composition at the corresponding sampling locations: F1 = -9.405 + 4232.119 × A - 1795.805 × B + 281.224 × C, and F2 = -2.842 + 2725.137 × A - 1307.688 × B + 231.353 × C. Here A, B and C are the simulated flow velocities generated by inflow values of 16, 40 and 80 m³ s⁻¹, respectively. The model correctly predicts 88.9% of the sampling locations with relatively high sand mass percentages and 100.0% of those with relatively low percentages. Cross-validated classification shows that 82.8% of locations are correctly classified overall. The model predicts that 31.4% of the model domain consists of high-sand mass composition areas and the remaining 68.6% of low-sand mass composition areas.
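The two reported functions act as Fisher classification functions: a location is assigned to the group whose function gives the higher score. The sketch below uses the published coefficients. It rests on two assumptions. First, F1 belongs to group 1 (low sand) and F2 to group 2 (high sand); the numbering implies this, but the text does not state it. Second, A, B and C are in whatever velocity units the hydrodynamic model used. The example velocities are hypothetical.

```python
# Published coefficients: (constant, A, B, C).
F1_COEF = (-9.405, 4232.119, -1795.805, 281.224)
F2_COEF = (-2.842, 2725.137, -1307.688, 231.353)

def scores(a, b, c):
    """Evaluate F1 and F2 for simulated velocities a, b, c (16, 40, 80 m3/s inflow scenarios)."""
    def evaluate(coef):
        const, ka, kb, kc = coef
        return const + ka * a + kb * b + kc * c
    return evaluate(F1_COEF), evaluate(F2_COEF)

def classify(a, b, c):
    """Return 1 (assumed low-sand group) if F1 scores higher, else 2 (assumed high-sand group)."""
    f1, f2 = scores(a, b, c)
    return 1 if f1 > f2 else 2

# Hypothetical velocities at one grid cell.
group = classify(0.01, 0.01, 0.01)
```

Applying `classify` to every cell of the simulated velocity fields is one way to produce a map of predicted high- and low-sand areas, such as the 31.4% / 68.6% split reported.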
Physical properties of ripe banana flour from Cavendish and Dream bananas were studied to distinguish the two varieties. The flour was analyzed for the following properties:
- pH
- total soluble solids (TSS)
- water holding capacity (WHC) and oil holding capacity (OHC) at 40, 60 and 80 °C
- color values (L*, a* and b*)
- back extrusion force
- viscosity

The physical property data were analyzed by cluster analysis (CA) and discriminant analysis (DA). CA showed that the two types of flour differed in the selected physical properties. DA indicated that WHC at 60 °C was the main contributor to discriminating the two types of flour.
Pyrolysis-gas chromatography-mass spectrometry (Py-GC-MS) has been recognised as an effective technique for analysing car paint. This study assessed the combination of Py-GC-MS and chemometric techniques for classifying car paint primer, the inner layer of a car paint system. Fifty car paint primer samples from various manufacturers were analysed by Py-GC-MS. The data set of identified pyrolysis products was then subjected to principal component analysis (PCA) and discriminant analysis (DA). PCA yielded 16 principal components explaining 86.33% of the total variance. DA classified the car paint primer samples by type (1k and 2k primer), with 100% correct classification of the test set in all three modes (standard, stepwise forward and stepwise backward). Three compounds, indolizine, 1,3-benzenedicarbonitrile and p-terphenyl, were the most significant in discriminating the car paint primer samples.
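Figures such as "16 principal components explaining 86.33% of the total variance" come from the eigenvalues of the data's covariance matrix. A minimal SVD-based sketch is shown below, assuming mean-centred data (optionally autoscaled). The tiny matrix is hypothetical.

```python
import numpy as np

def explained_variance(X, standardize=False):
    """Fraction of total variance explained by each principal component, largest first."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)
    if standardize:
        Xc = Xc / Xc.std(axis=0, ddof=1)  # autoscaling: unit variance per variable
    s = np.linalg.svd(Xc, compute_uv=False)  # singular values, descending
    var = s ** 2
    return var / var.sum()

def n_components_for(X, target, standardize=False):
    """Smallest number of components whose cumulative explained variance reaches `target`."""
    cumulative = np.cumsum(explained_variance(X, standardize))
    return int(np.searchsorted(cumulative, target) + 1), cumulative

# Hypothetical 4-sample, 2-variable data set (already mean-centred).
X = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 0.1], [0.0, -0.1]])
ratios = explained_variance(X)
k, cum = n_components_for(X, 0.95)
```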
Partial least squares discriminant analysis (PLS-DA) is a well-known technique for feature extraction and discriminant analysis in chemometrics. Despite its popularity, it has been observed that PLS-DA does not automatically lead to the extraction of relevant features. Feature learning and extraction depend on how well the discriminant subspace is captured. In this paper, discriminant subspace learning of chemical data is discussed from the perspective of PLS-DA and of a recent extension of PLS-DA known as locality preserving partial least squares discriminant analysis (LPPLS-DA). The objective is twofold: (a) to introduce the LPPLS-DA algorithm to the chemometrics community and (b) to demonstrate the superior discrimination capabilities of LPPLS-DA and how it can be a powerful alternative to PLS-DA. Four chemical data sets are used: three spectroscopic data sets and one containing compositional data. Comparative performance is measured on the discrimination and classification of these data sets. To compare classification performance, the data samples are projected onto the PLS-DA and LPPLS-DA subspaces. Each projected sample is then assigned to one of the groups (classes) using the nearest-neighbor classifier. We also compare the two techniques in a data visualization (discrimination) task. Our results clearly show the ability of LPPLS-DA to group samples from the same class while maximizing the between-class separation. Compared with PLS-DA, the separation of data in the projected LPPLS-DA subspace is better defined.
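The PLS-DA side of the comparison (project, then classify with the nearest neighbour) can be sketched as below. The sketch assumes a two-class problem coded as a 0/1 indicator and a NIPALS PLS1 fit. LPPLS-DA is not shown, and the synthetic data and function names are hypothetical.

```python
import numpy as np

def pls1_fit(X, y, n_components):
    """NIPALS PLS1 on a 0/1 class indicator.

    Returns the X mean and the rotation R = W (P^T W)^-1, so that the scores of
    any samples are (X - x_mean) @ R.
    """
    x_mean = X.mean(axis=0)
    Xk = X - x_mean
    yk = y - y.mean()
    W, P = [], []
    for _ in range(n_components):
        w = Xk.T @ yk
        w = w / np.linalg.norm(w)  # weight vector
        t = Xk @ w  # scores
        tt = t @ t
        p = Xk.T @ t / tt  # X loadings
        q = (yk @ t) / tt  # y loading
        Xk = Xk - np.outer(t, p)  # deflate X
        yk = yk - q * t  # deflate y
        W.append(w)
        P.append(p)
    W, P = np.array(W).T, np.array(P).T
    return x_mean, W @ np.linalg.inv(P.T @ W)

def nn_classify(train_scores, train_labels, test_scores):
    """1-nearest-neighbour (Euclidean) classification in the projected subspace."""
    d = ((test_scores[:, None, :] - train_scores[None, :, :]) ** 2).sum(axis=2)
    return train_labels[d.argmin(axis=1)]

# Hypothetical 50-variable "spectra": class 1 differs in the first 5 variables.
rng = np.random.default_rng(2)
shift = np.r_[np.full(5, 2.0), np.zeros(45)]

def make(n):
    X = np.vstack([rng.normal(size=(n, 50)), rng.normal(size=(n, 50)) + shift])
    return X, np.r_[np.zeros(n), np.ones(n)]

X_tr, y_tr = make(40)
X_te, y_te = make(40)
x_mean, R = pls1_fit(X_tr, y_tr, n_components=2)
pred = nn_classify((X_tr - x_mean) @ R, y_tr, (X_te - x_mean) @ R)
accuracy = (pred == y_te).mean()
```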
The genus Rasbora is one of the most species-rich genera of freshwater fishes. Over the past four decades, cryptic diversity has been a major hindrance to species identification in this genus because of the species' high morphological similarity. This study aimed to investigate this issue both morphologically and molecularly. A total of 23 morphometric parameters were used to differentiate 103 Rasbora fish samples collected from different regions of the state of Sarawak, Malaysia, using multivariate stepwise discriminant function analysis (SDFA). The cytochrome oxidase subunit I (COI) gene was then used to further distinguish 33 of these fishes, followed by sequence and phylogenetic analysis. Our results revealed pre-anal length as the strongest morphometric discriminant (100%). They also showed that all eight Rasbora species tested are monophyletic except for R. sumatrana and R. caudimaculata, revealing possible cryptic Rasbora species. Further investigation is needed to extend these data for future studies of Rasbora cryptic diversity and conservation.
Statistical validation is crucial for clustering unknown samples. This study aims to demonstrate how statistical techniques can be optimized using simulated heroin samples with analyte concentrations similar to those of case samples. Eight simulated heroin distribution links, comprising 64 postcut samples, were prepared by mixing eight precut samples in different proportions with one of two paracetamol-caffeine-dextromethorphan mixtures. Analyte contents and the compositional variation of the prepared samples were investigated. Several data pretreatments were evaluated by associating the postcut samples with their corresponding precut samples using principal component analysis and discriminant analysis. Combinations of seven linkage methods and five distance measures were then explored using hierarchical cluster analysis. Ward linkage with Manhattan distance (Ward-Manhattan) gave better distinction between unrelated links. It also clustered all related samples at very close distances under their known links in the dendrogram. A similar discriminative outcome was obtained when 90 unknown case samples were clustered using Ward-Manhattan.
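The Ward-Manhattan combination can be sketched as agglomerative clustering that starts from Manhattan (city-block) distances and merges clusters with Ward's Lance-Williams update. This is an assumption about how the combination is computed. Strictly, Ward's criterion is defined for Euclidean distances, so with Manhattan distances the update is a heuristic. The sample profiles and helper names are hypothetical.

```python
from itertools import combinations

def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def ward_manhattan(samples):
    """Agglomerative clustering: Manhattan start distances, Ward (Lance-Williams) updates.

    Returns the merge history as (members_a, members_b, distance) tuples,
    where members are frozensets of sample indices.
    """
    clusters = {i: frozenset([i]) for i in range(len(samples))}
    dist = {frozenset((i, j)): manhattan(samples[i], samples[j])
            for i, j in combinations(range(len(samples)), 2)}
    merges, next_id = [], len(samples)
    while len(clusters) > 1:
        pair = min(dist, key=dist.get)  # closest pair of clusters
        i, j = tuple(pair)
        d_ij = dist.pop(pair)
        a, b = clusters.pop(i), clusters.pop(j)
        n_i, n_j = len(a), len(b)
        for k, members in clusters.items():
            n_k = len(members)
            d_ki = dist.pop(frozenset((k, i)))
            d_kj = dist.pop(frozenset((k, j)))
            # Lance-Williams update for Ward's method.
            dist[frozenset((k, next_id))] = (
                (n_i + n_k) * d_ki + (n_j + n_k) * d_kj - n_k * d_ij
            ) / (n_i + n_j + n_k)
        clusters[next_id] = a | b
        merges.append((a, b, d_ij))
        next_id += 1
    return merges

def cut(n_samples, merges, n_clusters):
    """Replay merges until n_clusters groups remain (a flat cut of the dendrogram)."""
    groups = [frozenset([i]) for i in range(n_samples)]
    for a, b, _ in merges:
        if len(groups) <= n_clusters:
            break
        groups.remove(a)
        groups.remove(b)
        groups.append(a | b)
    return groups

# Hypothetical two-analyte profiles: two related links.
samples = [(0, 0), (0, 1), (10, 10), (10, 11), (10.5, 10)]
merges = ward_manhattan(samples)
groups = cut(len(samples), merges, 2)
```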
Red fruit (Pandanus conoideus Lam.) is a plant endemic to Papua, Indonesia, and Papua New Guinea. The price of its oil (red fruit oil, RFO) is 10-15 times higher than that of common vegetable oils, so RFO is subject to adulteration with cheaper oils. The principal component analysis score plot indicates that, among common vegetable oils, canola oil (CaO) and rice bran oil (RBO) have fatty acid profiles similar to that of RFO. CaO and RBO are therefore potential adulterants of RFO.
Identifying firearms from forensic ballistics specimens has been a demanding task in crime investigation for the last two decades. Every firearm, regardless of its size, make and model, has its own unique 'fingerprint', which is transferred to the bullet and cartridge case when the firearm is fired. The components involved in producing these unique characteristics are the firing chamber, breech face, firing pin, ejector, extractor and the rifling of the barrel. These unique characteristics are the critical features for identifying firearms, as they allow investigators to determine which particular firearm fired a bullet. Traditionally, comparing ballistic evidence has been a tedious and time-consuming process requiring highly skilled examiners. The main objective of this study is therefore to extract and identify suitable features for firearm recognition from the firing pin impressions in cartridge case images. Previous studies have shown that the firing pin impression on a cartridge case is one of the most important characteristics for identifying an individual firearm. In this study, data were gathered from 747 cartridge case images captured from five different 9 mm Parabellum Vektor SP1 pistols made in South Africa. Each cartridge case image was segmented into three regions, forming three different sets of images: the firing pin impression, the centre of the firing pin impression and the ring of the firing pin impression. Geometric moments up to the sixth order were then computed from each image region to form a set of numerical features. The MANOVA test showed that these 48 features differed significantly. This high-dimensional feature set was then reduced to only 11 significant features using correlation analysis. Cross-validated classification with discriminant analysis showed that 96.7% of the images were classified correctly. These results demonstrate the value of the geometric moments technique for producing a set of numerical features on which firearm identification can be based.
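Geometric moments up to order six can be computed as below. This sketch assumes raw (non-central, unnormalised) moments of a grey-level image region. The text does not say whether central or normalised moments were used, and the 28 moments per region produced here need not match the study's 48-feature set. The image patch is hypothetical.

```python
def geometric_moments(image, max_order=6):
    """Raw geometric moments m_pq = sum over pixels of x^p * y^q * I(x, y), for p + q <= max_order.

    image: 2-D list of grey levels indexed image[row][col]; x = column, y = row.
    Returns a dict keyed by (p, q).
    """
    moments = {}
    for p in range(max_order + 1):
        for q in range(max_order + 1 - p):
            moments[(p, q)] = sum(
                (x ** p) * (y ** q) * value
                for y, row in enumerate(image)
                for x, value in enumerate(row)
            )
    return moments

# Hypothetical binary patch (a small cross).
patch = [[0, 1, 0],
         [1, 1, 1],
         [0, 1, 0]]
m = geometric_moments(patch)
```

The ratios m_10 / m_00 and m_01 / m_00 give the intensity centroid. Central moments computed about that point are one common way to make the features independent of where the impression sits in the image.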
Fish vocalisation is often a major component of underwater soundscapes, so interpreting these soundscapes requires an understanding of the vocalisation characteristics of common soniferous fish species. This study of captive female bluefin gurnard, Chelidonichthys kumu, aims to formally characterise their vocalisation sounds and daily pattern of sound production. Four types of sound were produced and characterised, twice as many as previously reported in this species. These sounds fall into two aural categories, grunt and growl, with mean peak frequencies ranging from 129 to 215 Hz. The fish vocalised throughout the 24-hour period at an average rate of 18.5 ± 2.0 sounds fish⁻¹ h⁻¹, with increased vocalisation rates at dawn and dusk. Competitive feeding did not elevate vocalisation, unlike in some other gurnard species. Bluefin gurnard are common in the coastal waters of New Zealand, Australia and Japan. Given their vocalisation rate, they are likely to be significant contributors to the ambient underwater soundscape in these areas.
Background. Fish species may be identified based on their unique otolith shape or contour. Several pattern recognition methods have been proposed to classify fish species using morphological features of otolith contours. However, no fully automated species identification model has achieved an accuracy higher than 80%. The purpose of the current study is to develop a fully automated model, based on otolith contours, that identifies fish species with high classification accuracy. Methods. Images of the right sagittal otoliths of 14 fish species from three families, namely Sciaenidae, Ariidae and Engraulidae, were used to develop the proposed identification model. Short-time Fourier transform (STFT) was used, for the first time in otolith shape analysis, to extract important features of the otolith contours. Discriminant analysis (DA) was used as the classification technique to train and test the model on the extracted features. Results. The performance of the model was demonstrated using species from each of the three families separately, as well as all species combined. The overall classification accuracy of the model was greater than 90% in all cases. The effects of the STFT variables on the performance of the identification model were also explored. Conclusions. The short-time Fourier transform could determine important features of otolith outlines. The fully automated model proposed in this study (STFT-DA) could predict the species of an unknown specimen with acceptable identification accuracy. The model codes can be accessed at http://mybiodiversityontologies.um.edu.my/Otolith/ and https://peerj.com/preprints/1517/. The current model is flexible enough to be used for more species and families in future studies.
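One way to feed an otolith contour to an STFT is to turn it into a one-dimensional centroid-distance signal and compute windowed FFT magnitudes. This is a sketch under assumptions. The contour representation, the Hann window, the frame length of 32 and the hop of 16 are all hypothetical choices, not the authors' settings; their code is at the URLs given above.

```python
import numpy as np

def centroid_distance(contour):
    """Distance from the contour centroid to each (x, y) contour point, in traversal order."""
    pts = np.asarray(contour, dtype=float)
    return np.linalg.norm(pts - pts.mean(axis=0), axis=1)

def stft_magnitude(signal, frame_len=32, hop=16):
    """Magnitude STFT: Hann-windowed frames, one real FFT per frame (rows = frames)."""
    window = np.hanning(frame_len)
    starts = range(0, len(signal) - frame_len + 1, hop)
    frames = np.array([signal[s:s + frame_len] * window for s in starts])
    return np.abs(np.fft.rfft(frames, axis=1))

# Hypothetical contour: a circle of radius 5 sampled at 256 points.
theta = np.linspace(0, 2 * np.pi, 256, endpoint=False)
circle = np.column_stack([5 * np.cos(theta), 5 * np.sin(theta)])
features = stft_magnitude(centroid_distance(circle))
feature_vector = features.ravel()  # one possible input vector for DA
```

For a circle, every frame sees a constant radius, so all rows are identical. Irregular outlines spread energy into the higher-frequency bins, and that spread is what a classifier can use to separate species.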
The pollinators of 29 ginger species representing 11 genera in a mixed-dipterocarp forest in Borneo were investigated in relation to certain floral morphological characteristics. Of the 29 species studied, eight were pollinated by spiderhunters (Nectariniidae), 11 by medium-sized Amegilla bees (Anthophoridae) and ten by small halictid bees. These pollination guilds of gingers in Sarawak are comparable to the pollination guilds of neotropical Zingiberales, i.e., the hummingbird- and euglossine-bee-pollinated guilds. Canonical discriminant analysis revealed significant correlations between floral morphology and pollination guild, suggesting the importance of plant-pollinator interactions in the evolution of floral morphology. Most species in the three guilds were separated on the plot of the first and second canonical variables. Spiderhunter-pollinated flowers had longer floral tubes. Amegilla-pollinated flowers had wider lips than the others, which function as a platform for the pollinators. Pistils and stamens of halictid-pollinated flowers were smaller than those of the others. Gingers with diverse morphologies in a forest of high species diversity fell into only three pollination guilds, and the pollinators themselves showed low species diversity. Together, these findings suggest that many species of rare understory plants have evolved without segregating pollinators within each pollination guild.
This paper outlines the application of chemometrics and pattern recognition tools to classify palm oil using Fourier transform mid-infrared (FT-MIR) spectroscopy. FT-MIR spectroscopy is used as an effective analytical tool to categorise the oil as either unused palm oil or palm oil used for frying. The samples used in this study consist of 28 types of pure palm oil and 28 types of frying palm oil. FT-MIR spectra were obtained in absorbance mode over the spectral range 650 to 4000 cm⁻¹ using attenuated total reflectance (FT-MIR-ATR) sample handling. The aim of this work is to develop a fast method for discriminating the palm oils using three classifiers: partial least squares discriminant analysis (PLS-DA), learning vector quantisation (LVQ) and support vector machine (SVM). Raw FT-MIR spectra were subjected to Savitzky-Golay smoothing and standardised before the classification models were developed. Each classification model was validated by its percentage of test-set samples correctly classified, to determine which classifier performed best. To improve the performance of the classification models, a variable selection method known as the t-statistic method was applied to select the significant variables for model development. The results revealed that the PLS-DA classifier, applied to standardised data with t-statistic variable selection, performed best, giving the highest percentage correctly classified among the classifiers.
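The standardisation and t-statistic variable selection steps can be sketched as below. The sketch assumes a Welch two-sample t statistic per wavenumber and keeps the variables with the largest |t|. The text specifies neither the t-test variant nor the number of variables kept. Savitzky-Golay smoothing (for example, `scipy.signal.savgol_filter`) is omitted, and the data are synthetic.

```python
import numpy as np

def autoscale(X):
    """Column-wise standardisation (mean 0, unit sample SD) of a spectra matrix."""
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

def t_statistics(X, y):
    """Welch two-sample t statistic for every variable (column) between classes 0 and 1."""
    A, B = X[y == 0], X[y == 1]
    se = np.sqrt(A.var(axis=0, ddof=1) / len(A) + B.var(axis=0, ddof=1) / len(B))
    return (A.mean(axis=0) - B.mean(axis=0)) / se

def select_by_t(X, y, n_keep):
    """Indices of the n_keep variables with the largest |t|, in descending order of |t|."""
    t = t_statistics(X, y)
    return np.argsort(-np.abs(t))[:n_keep]

# Hypothetical data: 28 unused + 28 frying-oil spectra, 10 wavenumbers each.
rng = np.random.default_rng(3)
X = rng.normal(size=(56, 10))
y = np.r_[np.zeros(28, int), np.ones(28, int)]
X[y == 1, 7] += 3.0  # make variable 7 differ between the classes
selected = select_by_t(autoscale(X), y, n_keep=3)
```

Autoscaling applies the same shift and scale to both classes in each column, so it leaves the t statistics unchanged. Its effect is on the downstream classifier, which then sees every selected wavenumber on the same scale.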
Tongkat Ali (Eurycoma longifolia) is one of the most popular tropical herbal plants, as it is believed to enhance virility and sexual prowess. This study examined chromatographic fingerprints of Tongkat Ali roots and products, generated using online solid phase extraction-liquid chromatography (SPE-LC) combined with chemometric approaches, to determine their quality. Pressurised liquid extraction (PLE) was used prior to online SPE-LC with polystyrene divinyl benzene (PSDVB) and C18 columns. Seventeen Tongkat Ali roots and 10 products (capsules) were analysed. The chromatographic data set, based on 37 selected peaks, was subjected to three chemometric techniques: cluster analysis (CA), discriminant analysis (DA) and principal component analysis (PCA). The samples were grouped into three clusters based on their quality. PCA yielded 11 latent factors describing 90.8% of the total variance. Pattern matching analysis showed no significant difference (p > 0.05) between roots and products within the same CA group. The findings showed that combining chromatographic fingerprints with chemometric techniques provided a comprehensive evaluation for efficient quality control of Tongkat Ali formulations.
Partial least squares-discriminant analysis (PLS-DA) is a versatile algorithm that can be used for predictive and descriptive modelling as well as for discriminative variable selection. However, versatility is both a blessing and a curse, and the user needs to optimize a wealth of parameters before reaching reliable and valid outcomes. Over the past two decades, PLS-DA has demonstrated great success in modelling high-dimensional datasets for diverse purposes, e.g. product authentication in food analysis, disease classification in medical diagnosis and evidence analysis in forensic science. Despite this, many users have yet to grasp in practice the essentials of constructing a valid and reliable PLS-DA model. As technology progresses, datasets in every discipline are becoming more complex, i.e. multi-class, imbalanced and colossal. Indeed, the community is entering a new era called big data. In this context, the aim of the article is twofold: (a) to review, outline and describe contemporary PLS-DA modelling strategies, and (b) to critically discuss the knowledge gaps that have emerged in the present big data era. This work could complement other available reviews and tutorials on PLS-DA, providing a timely and user-friendly guide for researchers, especially those working in applied research.
Callus was induced from young purple-red leaves of mangosteen (Garcinia mangostana L.) on Murashige and Skoog basal medium with various combinations of plant growth regulators. Murashige and Skoog medium with 4.44 µM 6-benzylaminopurine and 4.52 µM 2,4-dichlorophenoxyacetic acid was best for friable callus induction, and this friable callus was used to initiate cell suspension cultures. The effects of different combinations of 6-benzylaminopurine and 2,4-dichlorophenoxyacetic acid, carbon sources and inoculum sizes were tested. The best combination for cell growth was 2.22 µM 6-benzylaminopurine + 2.26 µM 2,4-dichlorophenoxyacetic acid, glucose (30 g/l) and an inoculum size of 1.5 g/50 ml. Callus and cell suspension cultures were then treated either with 100 µM methyl jasmonate as an elicitor for 5 days or with 0.5 g/l casein hydrolysate as an organic supplement for 7 days. Metabolites were then extracted and profiled using liquid chromatography-time of flight mass spectrometry. Multivariate discriminant analyses revealed significant metabolite differences (P ≤ 0.05) in callus and suspension cells treated with either methyl jasmonate or casein hydrolysate. Based on MS/MS data, methyl jasmonate stimulated the production of an alkaloid (thalsimine) and a fatty acid (phosphatidyl ethanolamine) in suspension cells. In callus, it stimulated the production of an alkaloid (thiacremonone) and a glucosinolate (7-methylthioheptanaldoxime). Meanwhile, casein hydrolysate stimulated the production of alkaloids such as 3β,6β-dihydroxynortropane and cis-hinokiresinol and triterpenoids such as schidigerasaponin and talinumoside in suspension cells. This study provides evidence of the potential for secondary metabolite production from in vitro cultures of mangosteen.