In gene expression studies, missing values are a common problem with important consequences for the interpretation of the final data (Satija et al., Nat Biotechnol 33(5):495, 2015). Numerous bioinformatics analysis tools used for cancer prediction operate on the data set matrix (Bailey et al., Cell 173(2):371-385, 2018); thus, it is necessary to resolve the problem of missing-values imputation. This chapter presents a review of the research on missing-values imputation approaches for gene expression data. We focus mainly on how the algorithms differ in their use of local and global correlation in the data. We classified the algorithms as global, hybrid, local, or knowledge-based techniques. Additionally, this chapter presents suitable assessments of the different approaches. The purpose of this review is to focus scientists on developments in the current techniques rather than on applying different or newly developed algorithms with identical functional goals. The aim is to adapt the algorithms to the characteristics of the data.
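As an illustration of the "local" class of techniques mentioned above, the following is a minimal sketch of nearest-neighbour imputation. It is not taken from the chapter: the function name `knn_impute`, the genes-by-samples layout, the choice of root-mean-square distance, and the restriction of neighbours to genes with no missing values are all assumptions made for the example.

```python
import numpy as np

def knn_impute(X, k=2):
    """Fill NaNs in a genes x samples matrix from the k most similar complete genes.

    Hypothetical helper for illustration; layout and distance are assumptions.
    """
    X = np.asarray(X, dtype=float)
    filled = X.copy()
    # Genes (rows) with no missing values serve as candidate neighbours.
    complete = [j for j in range(X.shape[0]) if not np.isnan(X[j]).any()]
    for i in np.where(np.isnan(X).any(axis=1))[0]:
        missing = np.isnan(X[i])
        if missing.all() or not complete:
            continue  # nothing to compare on, leave the row untouched
        # Root-mean-square distance over the samples observed for gene i.
        dists = np.array([
            np.sqrt(np.mean((X[i, ~missing] - X[j, ~missing]) ** 2))
            for j in complete
        ])
        nearest = np.array(complete)[np.argsort(dists)[:k]]
        # Replace each missing entry with the neighbours' mean in that sample.
        filled[i, missing] = X[nearest][:, missing].mean(axis=0)
    return filled

# Toy matrix: gene 1 is missing its value in the third sample.
toy = np.array([
    [1.0, 2.0, 3.0],
    [1.0, 2.0, np.nan],
    [10.0, 20.0, 30.0],
    [1.1, 2.1, 3.2],
])
imputed = knn_impute(toy, k=2)
```

A global technique would instead estimate the missing entry from structure shared across the whole matrix, which is the contrast the classification above draws.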
The fabrication of Metal-DNA-Metal (MDM) structure-based high-sensitivity sensors from DNA micro- and nanoarray strands is a key issue in their development. The tunable semiconducting response of DNA in the presence of external electromagnetic and thermal fields is a gift for molecular electronics. The impact of temperature (25-55 °C) and magnetic field (0-1200 mT) on the current-voltage (I-V) characteristics of Au-DNA-Au (GDG) structures with an optimum gap of 10 μm is reported. The I-V characteristics acquired in the presence and absence of magnetic fields demonstrated the semiconducting diode nature of DNA in GDG structures with high temperature sensitivity. The saturation current in the absence of a magnetic field was found to increase sharply with increasing temperature up to 45 °C and to decrease rapidly thereafter. This increase was attributed to the temperature-assisted conversion of double bonds into single bonds in DNA structures. Furthermore, the potential barrier height and Richardson constant for all the structures increased steadily with increasing external magnetic field, irrespective of temperature variations. Our observations on the magnetic field and temperature sensitivity of the I-V response in GDG sandwiches may contribute towards the development of DNA-based magnetic sensors.
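Barrier height and Richardson constant are the parameters of the standard thermionic emission description of a diode's saturation current. The abstract does not state which model was fitted, so the relation below is an assumption offered only for orientation. The symbols $A$ (contact area), $A^{*}$ (Richardson constant), $\phi_B$ (barrier height in volts), $T$ (absolute temperature), $q$ (elementary charge), and $k$ (Boltzmann constant) are introduced here and do not appear in the original text.

```latex
I_0 = A A^{*} T^{2} \exp\!\left(-\frac{q\,\phi_B}{kT}\right),
\qquad
\ln\!\left(\frac{I_0}{T^{2}}\right) = \ln\!\left(A A^{*}\right) - \frac{q\,\phi_B}{kT}
```

Under this assumption, a plot of $\ln(I_0/T^{2})$ against $1/T$ is linear, with the slope giving $\phi_B$ and the intercept giving $A^{*}$.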
When gene expression data are too large to be processed, they are transformed into a reduced representation set of genes. Transforming large-scale gene expression data into such a reduced set is called feature extraction. If the extracted genes are carefully chosen, this gene set captures the relevant information from the large-scale gene expression data, allowing further analysis to use the reduced representation instead of the full-size data. In this paper, we review numerous software applications that can be used for feature extraction. The software reviewed is mainly for Principal Component Analysis (PCA), Independent Component Analysis (ICA), Partial Least Squares (PLS), and Locally Linear Embedding (LLE). A summary and sources of the software are provided in the last section for each feature extraction method.
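To make the idea of a reduced representation concrete, here is a minimal sketch of PCA computed through a singular value decomposition. It does not correspond to any of the reviewed software packages. The function name `pca_reduce` and the samples-by-genes layout are assumptions for the example.

```python
import numpy as np

def pca_reduce(X, n_components=2):
    """Project a samples x genes matrix onto its leading principal components.

    Hypothetical helper for illustration; returns (scores, components, ratio).
    """
    X = np.asarray(X, dtype=float)
    # Centre each gene (column) so components describe variance, not the mean.
    Xc = X - X.mean(axis=0)
    # Rows of Vt are the principal axes, ordered by decreasing singular value.
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]
    # Share of total variance carried by each retained component.
    ratio = (S ** 2 / np.sum(S ** 2))[:n_components]
    return scores, Vt[:n_components], ratio

# Toy data: 6 samples measured on 20 genes, reduced to 2 features per sample.
rng = np.random.default_rng(0)
expression = rng.normal(size=(6, 20))
scores, components, ratio = pca_reduce(expression, n_components=2)
```

Downstream analysis would then work with `scores` (one short row per sample) rather than the full expression matrix.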
Gene expression data are expected to be of significant help in the development of efficient cancer diagnosis and classification platforms. To select a small subset of informative genes from the data for cancer classification, many researchers have recently been analyzing gene expression data using various computational intelligence methods. However, because of the small number of samples compared to the huge number of genes (high dimension), irrelevant genes, and noisy genes, many of these computational methods have difficulty selecting the small subset. Thus, we propose an improved (modified) binary particle swarm optimization to select the small subset of informative genes that is relevant for cancer classification. In the proposed method, we introduce a particle's speed, which gives the rate at which the particle changes its position, and we propose a rule for updating particles' positions. By performing experiments on ten different gene expression datasets, we have found that the performance of the proposed method is superior to that of previous related works, including the conventional version of binary particle swarm optimization (BPSO), in terms of classification accuracy and the number of selected genes. The proposed method also produces lower running times compared to BPSO.
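For orientation, the conventional BPSO baseline referred to above can be sketched as follows. This is the standard sigmoid-based binary variant, not the modified method proposed in the abstract, whose speed definition and update rule are not given here. The names `bpso` and `toy_fitness`, all parameter values, and the toy fitness itself are assumptions. In practice the fitness would be a classifier's accuracy on the selected genes.

```python
import numpy as np

def toy_fitness(mask):
    """Hypothetical fitness: reward genes 0 and 3, penalise subset size."""
    informative = np.array([0, 3])
    return float(mask[informative].sum()) - 0.1 * float(mask.sum())

def bpso(fitness, n_bits, n_particles=10, n_iter=30,
         w=0.7, c1=1.5, c2=1.5, v_max=4.0, seed=0):
    """Conventional binary PSO; each bit marks a gene as selected (1) or not (0)."""
    rng = np.random.default_rng(seed)
    x = rng.integers(0, 2, size=(n_particles, n_bits))
    v = rng.uniform(-1.0, 1.0, size=(n_particles, n_bits))
    pbest = x.copy()
    pbest_fit = np.array([fitness(p) for p in x], dtype=float)
    g = int(np.argmax(pbest_fit))
    gbest, gbest_fit = pbest[g].copy(), pbest_fit[g]
    for _ in range(n_iter):
        r1 = rng.random((n_particles, n_bits))
        r2 = rng.random((n_particles, n_bits))
        # Velocity is pulled toward each particle's best and the swarm's best.
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
        v = np.clip(v, -v_max, v_max)
        # Sigmoid turns velocity into the probability that a bit is set to 1.
        prob = 1.0 / (1.0 + np.exp(-v))
        x = (rng.random((n_particles, n_bits)) < prob).astype(int)
        fit = np.array([fitness(p) for p in x], dtype=float)
        improved = fit > pbest_fit
        pbest[improved] = x[improved]
        pbest_fit[improved] = fit[improved]
        g = int(np.argmax(pbest_fit))
        if pbest_fit[g] > gbest_fit:
            gbest, gbest_fit = pbest[g].copy(), pbest_fit[g]
    return gbest, float(gbest_fit)

# Search over 8 candidate genes with the toy fitness.
best_mask, best_fit = bpso(toy_fitness, n_bits=8)
```

The size penalty in `toy_fitness` stands in for the second objective named in the abstract, a small number of selected genes.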
Light regulates photosynthesis, growth and reproduction, yield and properties of phycocolloids, and starch contents in seaweeds. Despite its importance as an environmental cue that regulates many developmental, physiological, and biochemical processes, the network of genes involved during light deprivation is obscure. In this study, we profiled the transcriptome of Gracilaria changii at two different irradiance levels using a cDNA microarray containing more than 3,000 cDNA probes. Microarray analysis revealed that 93 and 105 genes were up- and down-regulated more than 3-fold under light deprivation, respectively. However, only 50% of the transcripts had significant matches to the nonredundant peptide sequences in the database. The transcripts that accumulated under light deprivation include vanadium chloroperoxidase, thioredoxin, ferredoxin component, and reduced nicotinamide adenine dinucleotide dehydrogenase. Among the genes that were down-regulated under light deprivation were genes encoding light harvesting protein, light harvesting complex I, phycobilisome 7.8 kDa linker polypeptide, low molecular weight early light-inducible protein, and vanadium bromoperoxidase. Our findings also provide important clues to the functions of many unknown sequences that could not be annotated using sequence comparison.
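The 3-fold criterion used above can be sketched as a simple threshold on log ratios. This is an illustration only. The function name `fold_change_calls`, the input layout (one normalised, strictly positive intensity per gene and condition), and the toy numbers are assumptions, and the study's actual normalisation and filtering steps are not described in the abstract.

```python
import numpy as np

def fold_change_calls(treated, control, fold=3.0):
    """Flag genes changed by more than `fold` between two conditions.

    Hypothetical helper; expects strictly positive, normalised intensities.
    """
    treated = np.asarray(treated, dtype=float)
    control = np.asarray(control, dtype=float)
    # Work on a log2 scale so up- and down-regulation are symmetric.
    log_ratio = np.log2(treated / control)
    threshold = np.log2(fold)
    up = log_ratio > threshold
    down = log_ratio < -threshold
    return up, down

# Toy intensities for four genes (light-deprived vs. control).
up, down = fold_change_calls([8.0, 1.0, 2.0, 10.0], [2.0, 4.0, 2.0, 3.0])
n_up, n_down = int(up.sum()), int(down.sum())
```

Counting the flagged genes in each direction gives figures of the same kind as the 93 and 105 reported in the abstract.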
This study identified genes contributing to the pathogenesis of endometrioid endometrial cancer (EEC). Eight pairs of microdissected EEC samples matched with normal glandular epithelium were analyzed using microarrays. Unsupervised analysis identified 162 transcripts (58 up- and 104 down-regulated) that were differentially expressed (p < .01, fold change ≥ 1.5) between the two groups. Quantitative real-time polymerase chain reaction (qPCR) validated the genes of interest: SLC7A5, SATB1, H19, and ZAK (p < .05). Pathway analysis revealed genes involved in amino acid transport, translation, and chromatin remodeling (p < .05). Laser capture microdissection (LCM) followed by microarray analysis enabled precise assessment of a homogeneous cell population and identified putative genes for endometrial carcinogenesis.
Systemic infections of Candida albicans, the most prevalent fungal pathogen in humans, have been on the rise in recent years. However, the exact mode of pathogenesis of this fungus is still not well elucidated. Previous studies using C. albicans mutants locked into the yeast form via gene deletion found that this form was avirulent and did not induce significant differential expression of host genes in vitro. In this study, a high density of C. albicans was used to infect human umbilical vein endothelial cells (HUVEC), resulting in yeast-form infections, whilst a low density of C. albicans resulted in hyphal-form infections. Transcriptional profiling of the HUVEC response to these infections showed that high densities of C. albicans induced a stronger, broader transcriptional response from HUVEC than low densities of C. albicans. Many of the genes that were significantly differentially expressed were involved in apoptosis and cell death. In addition, conditioned media from the high-density infections caused a significant reduction in HUVEC viability, suggesting that certain molecules released during C. albicans and HUVEC interactions were capable of causing cell death. This study has shown that C. albicans yeast forms, at high densities, cannot be dismissed as avirulent, but instead could contribute to C. albicans pathogenesis.
Ureaplasma parvum colonizes human mucosal surfaces, primarily in the respiratory and urogenital tracts, causing a wide spectrum of diseases, from non-gonococcal urethritis to pneumonitis in immunocompromised hosts. Although the basis for these diverse clinical outcomes is not yet understood, more severe disease may be associated with strains harboring a certain set of strain-specific genes. To investigate this, whole genome DNA macroarrays were constructed and used to assess genomic diversity in 10 U. parvum clinical strains. We found that 7.6% of U. parvum genes were dispensable in one or more strains, thus defining a minimal functional core of 538 U. parvum genes. Most of the strain-specific genes (79%) were of unknown function and were unique to U. parvum. Four hypervariable plasticity regions containing 93% of the variability in the gene pool were identified in the genome (UU32-UU33, UU145-UU170, UU440-UU447 and UU527-UU529). We hypothesized that one of them (UU145-UU170) was a pathogenicity island in U. parvum and we characterized it. Thus, we propose that the clinical outcome of U. parvum infection is probably associated with this newly identified pathogenicity island.
The aims of the present study were to undertake gene expression profiling of the blood of glioma patients to determine key genetic components of signaling pathways and to develop a panel of genes that could be used as a potential blood-based biomarker to differentiate between high- and low-grade gliomas, non-gliomas and control samples. In this study, blood samples were obtained from glioma patients, non-glioma patients and control subjects. Ten samples each were obtained from patients with high- and low-grade tumours, ten samples from non-glioma patients and twenty samples from control subjects. Total RNA was isolated from each sample, after which first and second strand synthesis was performed. The resulting cRNA was then hybridized with the Agilent Whole Human Genome (4x44K) microarray chip according to the manufacturer's instructions. Universal Human Reference RNA and samples were labeled with Cy3 CTP and Cy5 CTP, respectively. Microarray data were analyzed with the Agilent GeneSpring 12.1V software using stringent criteria, which included at least a 2-fold difference in gene expression between samples. Statistical analysis was performed using the unpaired Student's t-test with a significance threshold of p<0.01. Pathway enrichment was also performed, with key genes selected for validation using droplet digital polymerase chain reaction (ddPCR). The gene expression profiling indicated that there were a substantial number of genes that were differentially expressed with more than a 2-fold change (p<0.01) between each of the four different conditions. We selected key genes within significant pathways that were identified through pathway enrichment. These key genes included regulators of cell proliferation, transcription factors, cytokines and tumour suppressor genes.
In the present study, we showed that key genes involved in significant and well-established pathways could be used as a potential blood-based biomarker to differentiate between high- and low-grade gliomas, non-gliomas and control samples.
Microfluidics-based lab-on-chip (LOC) systems are an active research area that is revolutionising high-throughput sequencing for the fast, sensitive and accurate detection of a variety of pathogens. LOCs also serve as portable diagnostic tools. The devices provide optimum control of nanolitre volumes of fluids and integrate various bioassay operations that allow them to rapidly sense pathogenic threat agents for environmental monitoring. LOC systems, such as microfluidic biochips, offer advantages over conventional identification procedures, which are tedious, expensive and time-consuming. This paper aims to provide a broad overview of the need for devices that are easy to operate, sensitive, fast, portable and sufficiently reliable to be used as complementary tools for the control of pathogenic agents that damage the environment.
The Para rubber tree, Hevea brasiliensis, is the main crop used for industrial natural rubber production because of the superior quality of its rubber. The Hevea bark is commercially exploited to obtain latex, which is produced from the articulated secondary laticifer. The morphology of the laticifer is well defined; however, only a few genes associated with its development have been reported. We successfully induced secondary laticifer in jasmonic acid (JA)-treated and linolenic acid (LA)-treated Hevea bark, but secondary laticifer was not observed in ethephon (ET)-treated or untreated Hevea bark. In this study, we analysed 27,195 gene models using NimbleGen microarrays based on the Hevea draft genome. A total of 491 filtered differentially expressed (FDE) transcripts that are common to both JA- and LA-treated bark samples but not ET-treated bark samples were identified. In the Eukaryotic Orthologous Group (KOG) analysis, the 491 FDE transcripts belong to different functional categories that reflect the diverse processes and pathways involved in laticifer differentiation. In the Kyoto Encyclopedia of Genes and Genomes (KEGG) and KOG analyses, the profile of the FDE transcripts suggests that JA- and LA-treated bark samples have a sufficient molecular basis for secondary laticifer differentiation, especially regarding secondary metabolite metabolism. FDE genes in this category are from the cytochrome P450 (CYP) family, ATP-binding cassette (ABC) transporter family, short-chain dehydrogenase/reductase (SDR) family, or cinnamyl alcohol dehydrogenase (CAD) family. The data include many genes involved in cell division, cell wall synthesis, and cell differentiation. The most abundant transcript in the FDE list was SDR65C, reflecting its importance in laticifer differentiation.
Using the Basic Local Alignment Search Tool (BLAST) as part of annotation and functional prediction, several characterized as well as uncharacterized transcription factors and genes were found in the dataset. Hence, further characterization of these genes is necessary to unveil their role in laticifer differentiation. This study provides a platform for the further characterization and identification of the key genes involved in secondary laticifer differentiation.