Kosakonia radicincitans (formerly known as Enterobacter radicincitans), an endophytic bacterium, was isolated from the symptomatic tissues of a banana (Musa spp.) plant affected by bacterial wilt in Malaysia. The total genome size of K. radicincitans UMEnt01/12 is 5 783 769 bp with 5463 coding sequences (CDS), 75 tRNAs, and 9 rRNAs. The annotated draft genome of the K. radicincitans UMEnt01/12 strain might shed light on its role as a bacterial wilt-associated bacterium.
To evaluate the contribution of non-synonymous-coding variants of known familial and genome-wide association studies (GWAS)-linked genes for Parkinson's disease (PD) to PD risk in the East Asian population, we sequenced all the coding exons of 39 PD-related disease genes and evaluated the accumulation of rare non-synonymous-coding variants in 375 early-onset PD cases and 399 controls. We also genotyped 782 non-synonymous-coding variants of these genes in 710 late-onset PD cases and 9046 population controls. Significant enrichment of LRRK2 variants was observed in both early- and late-onset PD (odds ratio = 1.58; 95% confidence interval = 1.29-1.93; P = 8.05 × 10⁻⁶). Moderate enrichment was also observed in FGF20, MCCC1, GBA and ITGA8. Half of the rare variants anticipated to cause loss of function of these genes were present in healthy controls. Overall, non-synonymous-coding variants of known familial and GWAS-linked genes appear to make a limited contribution to PD risk, suggesting that clinical sequencing of these genes will provide limited information for risk prediction and molecular diagnosis.
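As a rough illustration of how an odds ratio and its 95% confidence interval can be obtained from a 2×2 table of variant carriers, the sketch below uses the standard log (Woolf) method. The abstract does not state which method the authors used, and the counts in the example are hypothetical placeholders, not data from the study; the function name `odds_ratio_ci` is likewise an assumption made for illustration.

```python
import math

def odds_ratio_ci(case_carriers, case_noncarriers,
                  control_carriers, control_noncarriers, z=1.96):
    """Odds ratio with a Wald-type confidence interval (log/Woolf method).

    z = 1.96 corresponds to a 95% interval. All four counts must be > 0.
    """
    a, b = case_carriers, case_noncarriers
    c, d = control_carriers, control_noncarriers
    if min(a, b, c, d) <= 0:
        raise ValueError("all cell counts must be positive")
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) is the root of the summed reciprocal counts.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    low = math.exp(math.log(odds_ratio) - z * se_log_or)
    high = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, low, high

# Hypothetical counts, for illustration only (not taken from the study).
or_, low, high = odds_ratio_ci(20, 80, 10, 90)
print(f"OR = {or_:.2f}, 95% CI = {low:.2f}-{high:.2f}")
```

An interval that excludes 1, as the reported 1.29-1.93 does, indicates enrichment at the chosen confidence level.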
A multilocus sequence analysis using mitochondria-encoded cytochrome c oxidase subunit I (COI), cytochrome B (CytB), NADH dehydrogenase subunit 5 (ND5); nuclear encoded 18S ribosomal RNA (18S) and 28S ribosomal RNA (28S) genes was performed to determine the levels of genetic variation between the closely related species Haematobia irritans Linnaeus and Haematobia exigua de Meijere. Among these five genes, ND5 and CytB genes were found to be more variable and informative in resolving the interspecific relationships of both species. In contrast, the COI gene was more valuable in inferring the intraspecific relationships. The ribosomal 18S and 28S sequences of H. irritans and H. exigua were highly conserved with limited intra- and inter-specific variation. Molecular evidence presented in this study demonstrated that both flies are genetically distinct and could be differentiated based on sequence analysis of mitochondrial genes.
The complete mitochondrial genome of the commercial freshwater crayfish Cherax quadricarinatus was recovered from partial genome sequencing using the MiSeq Personal Sequencer. The mitogenome has 15,869 base pairs consisting of 13 protein-coding genes, 2 ribosomal subunit genes, 22 transfer RNAs, and a non-coding AT-rich region. The base composition of C. quadricarinatus is 32.16% for T, 23.39% for C, 33.26% for A, and 11.19% for G, with an AT bias of 65.42%.
The mitogenome of the black yabby, Geocharax gracilis, was sequenced using the MiSeq Personal Sequencer. It has 15,924 base pairs consisting of 13 protein-coding genes, 2 ribosomal subunit genes, 23 transfer RNAs, and a non-coding AT-rich region. The base composition of G. gracilis mitogenome is 32.18% for T, 22.32% for C, 34.83% for A, and 10.68% for G, with an AT bias of 67.01%. The mitogenome gene order is typical for that of parastacid crayfish with the exception of some minor rearrangements involving tRNA genes.
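The base composition and AT bias quoted in the two mitogenome abstracts above are simple per-base percentages, with the AT bias being the sum of the A and T percentages. A minimal sketch of that calculation follows; the function names and the toy sequence are assumptions made for illustration, not part of either study.

```python
from collections import Counter

def base_composition(seq):
    """Return the percentage of A, C, G and T in a DNA sequence.

    Characters other than A, C, G and T (for example N) are ignored.
    """
    counts = Counter(base for base in seq.upper() if base in "ACGT")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return {base: 100.0 * counts[base] / total for base in "ACGT"}

def at_bias(composition):
    """AT bias is the combined percentage of A and T."""
    return composition["A"] + composition["T"]

# Toy sequence, for illustration only.
composition = base_composition("ATATGCATTA")
print(composition, at_bias(composition))
```

Applied to the reported figures, 33.26% A plus 32.16% T gives the stated 65.42% for C. quadricarinatus, and 34.83% A plus 32.18% T gives 67.01% for G. gracilis.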
High-throughput RNA sequencing (RNA-seq) is considered a powerful tool for novel gene discovery and fine-tuned transcriptional profiling. The digital nature of RNA-seq is also believed to simplify meta-analysis and to reduce background noise associated with hybridization-based approaches. The development of multiplex sequencing enables efficient and economic parallel analysis of gene expression. In addition, RNA-seq is of particular value when low RNA expression or modest changes between samples are monitored. However, recent data uncovered severe bias in the sequencing of small non-protein coding RNA (small RNA-seq or sRNA-seq), such that the expression levels of some RNAs appeared to be artificially enhanced and others diminished or even undetectable. The use of different adapters and barcodes during ligation as well as complex RNA structures and modifications drastically influence cDNA synthesis efficacies and exemplify sources of bias in deep sequencing. In addition, variable specific RNA G/C-content is associated with unequal polymerase chain reaction amplification efficiencies. Given the central importance of RNA-seq to molecular biology and personalized medicine, we review recent findings that challenge small non-protein coding RNA-seq data and suggest approaches and precautions to overcome or minimize bias.
The structural comparison of proteins is a vital step in structural biology that is used to predict and analyse the function of a new, unknown protein. Although a number of different techniques have been explored, the development of new alternative methods is still an active research area. The present paper introduces a text modelling-based technique for the structural comparison of proteins. The method models the secondary and tertiary structure of proteins in two linear sequences and then applies them to the comparison of two structures. The technique used for pairwise comparison of the sequences has been adopted from computational linguistics and its well-known techniques for analysing and quantifying textual sequences. To this end, an n-gram modelling technique is used to capture regularities between sequences, and then the cross-entropy concept is employed to measure their similarities. Several experiments are conducted to evaluate the performance of the method and compare it with other commonly used programs. The information retrieval evaluation demonstrates that the technique has a high running speed, similar to that of other linear encoding methods such as 3D-BLAST, SARST, and TS-AMIR, whereas its accuracy is comparable to that of CE and TM-align, which are high-accuracy comparison tools. Accordingly, the results demonstrate that the algorithm has high efficiency compared with other state-of-the-art methods.
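The general idea of scoring one linear sequence against an n-gram model of another with cross-entropy can be sketched as follows. This is not the paper's implementation: the three-letter secondary-structure alphabet (H, E, C), the trigram order, and the add-one smoothing are all assumptions chosen to keep the example small.

```python
import math
from collections import Counter

def train_ngram(seq, n):
    """Count n-grams and their (n-1)-symbol contexts in a sequence."""
    grams, contexts = Counter(), Counter()
    for i in range(len(seq) - n + 1):
        gram = seq[i:i + n]
        grams[gram] += 1
        contexts[gram[:-1]] += 1
    return grams, contexts

def cross_entropy(model_seq, test_seq, n=3, alphabet="HEC"):
    """Average bits per n-gram of test_seq under a model built from model_seq.

    Uses add-one smoothing over the assumed alphabet; lower values mean
    the two sequences share more local regularities.
    """
    grams, contexts = train_ngram(model_seq, n)
    positions = len(test_seq) - n + 1
    if positions <= 0:
        raise ValueError("test sequence is shorter than n")
    total_bits = 0.0
    for i in range(positions):
        gram = test_seq[i:i + n]
        prob = (grams[gram] + 1) / (contexts[gram[:-1]] + len(alphabet))
        total_bits -= math.log2(prob)
    return total_bits / positions

# Toy secondary-structure strings, for illustration only.
reference = "HHHHHHCCCEEEEECCCHHHHHH"
similar = "HHHHHCCCEEEECCCHHHHH"
different = "CECECECECECECECE"
print(cross_entropy(reference, similar), cross_entropy(reference, different))
```

Cross-entropy is not symmetric, so a practical comparison might average the score in both directions; that choice is left open here.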
Breast cancer is the most common malignancy in women worldwide. The incidence of breast cancer in Malaysia is lower than international statistics indicate, with peak occurrence in the age group between 50 and 59 years and a mortality rate of 18.6%. Despite current diagnostic and prognostic methods, the outcome for individual subjects remains poor. This is in part due to the wide genetic heterogeneity of breast cancers. Various platforms for genetic studies are now employed to determine the identity of these genetic abnormalities, including microarray methods such as high-density single-nucleotide-polymorphism (SNP) oligonucleotide arrays, which combine the power of chromosomal comparative genomic hybridization (cCGH) and loss of heterozygosity (LOH) analysis to offer higher-resolution mapping. These platforms and their applications in highlighting the genomic alteration frameworks manifested in breast carcinoma will be discussed.
Remote protein homology detection and fold recognition refer to the detection of structural homology in proteins where there is little or no similarity in sequence. To detect protein structural classes from protein primary sequence information, homology-based methods have been developed, which can be divided into three types: discriminative classifiers, generative models for protein families and pairwise sequence comparisons. Support Vector Machines (SVM) and Neural Networks (NN) are two popular discriminative methods. Recent studies have shown that SVM trains faster and is more accurate and efficient than NN. We present a comprehensive method based on two-layer classifiers. The first layer detects up to the superfamily and family levels of the SCOP hierarchy using optimized binary SVM classification rules. It uses a kernel function known as the Bio-kernel, which incorporates biological information into the classification process. The second layer uses a discriminative SVM algorithm with a string kernel to detect up to the protein fold level of the SCOP hierarchy. The results were evaluated using mean ROC and mean MRFP, and their significance was tested with a pairwise t-test. Experimental results show that our approach significantly improves the performance of remote protein homology detection and fold recognition on all three versions of the SCOP dataset (1.53, 1.67 and 1.73). We achieved improvements in mean ROC of 4.19% in SCOP 1.53, 4.75% in SCOP 1.67 and 4.03% in SCOP 1.73 when compared with the results produced by well-known methods. The combination of the first and second layers of BioSVM-2L performs well in remote homology detection and fold recognition across the three versions of the dataset.
Gene expression data are expected to be of significant help in the development of efficient cancer diagnosis and classification platforms. To select a small subset of informative genes from the data for cancer classification, many researchers have recently been analyzing gene expression data using various computational intelligence methods. However, because the number of samples is small compared to the huge number of genes (high dimension) and because the data contain irrelevant and noisy genes, many of these computational methods have difficulty selecting the small subset. Thus, we propose an improved (modified) binary particle swarm optimization to select a small subset of informative genes that is relevant for cancer classification. In the proposed method, we introduce a particle's speed, which gives the rate at which the particle changes its position, and we propose a rule for updating particles' positions. By performing experiments on ten different gene expression datasets, we have found that the proposed method outperforms previous related works, including the conventional version of binary particle swarm optimization (BPSO), in terms of classification accuracy and the number of selected genes. The proposed method also has lower running times than BPSO.
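For orientation, one update step of the conventional BPSO that the abstract uses as its baseline can be sketched as below, with each bit marking whether a gene is selected. This is the standard sigmoid-transfer rule, not the modified speed and position-update rule proposed in the paper, whose details the abstract does not give. The parameter values (`w`, `c1`, `c2`, `v_max`) and the function name are assumptions.

```python
import math
import random

def bpso_step(position, velocity, personal_best, global_best, rng,
              w=0.7, c1=2.0, c2=2.0, v_max=4.0):
    """One conventional BPSO update for a single particle.

    position, personal_best and global_best are lists of 0/1 bits
    (1 = gene selected); velocity is a list of floats.
    """
    new_position, new_velocity = [], []
    for x, v, p, g in zip(position, velocity, personal_best, global_best):
        # Velocity update pulls the bit towards the personal and global bests.
        v = w * v + c1 * rng.random() * (p - x) + c2 * rng.random() * (g - x)
        v = max(-v_max, min(v_max, v))  # clamp to avoid sigmoid saturation
        # The sigmoid turns the velocity into the probability of the bit being 1.
        prob_one = 1.0 / (1.0 + math.exp(-v))
        new_velocity.append(v)
        new_position.append(1 if rng.random() < prob_one else 0)
    return new_position, new_velocity

# Toy particle over four hypothetical genes.
rng = random.Random(0)
pos, vel = bpso_step([0, 1, 0, 1], [0.0] * 4, [1, 1, 0, 0], [1, 0, 0, 1], rng)
print(pos, vel)
```

In a full gene-selection loop, each new position would be scored by a classifier's accuracy on the selected genes, and the personal and global bests updated accordingly.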
Array-based comparative genomic hybridization (array CGH) is a new molecular technique that has the potential to revolutionize cytogenetics. However, use of high resolution array CGH in the clinical setting is plagued by the problem of widespread copy number variations (CNV) in the human genome. Constitutional microarray, containing only clones that interrogate regions of known constitutional syndromes, may circumvent the dilemma of detecting CNV of unknown clinical significance.
To isolate and identify the pathogen of Dengue fever from Shenzhen city in 2005 - 2006, and to analyze the molecular characteristics of the isolated Dengue virus strain as well as to explore its possible origin.
Matched MeSH terms: Sequence Analysis, RNA; Sequence Analysis, Protein
The full-length genomes of two DENV-1 viruses isolated during the 2005-2006 dengue incidents in Brunei were sequenced. Twenty-five primer sets were designed to amplify contiguous overlapping fragments of approximately 500-600 base pairs spanning the entire sequence of the genome. The amplified PCR products were sent to a commercial laboratory for sequencing, and the nucleotide and deduced amino acid sequences were determined. Sequence analysis of the envelope gene at the nucleotide and amino acid levels between the two isolates showed 92% and 96% identity, respectively. Comparison of the envelope gene sequences with 68 other DENV-1 viruses of known genotypes placed the two isolates into two different genotypic groups. Isolate DS06/210505 belongs to genotype V together with some of the recent isolates from India (2003) and older isolates from Singapore (1990) and Burma (1976), while isolate DS212/110306 was clustered in genotype IV with the prototype Nauru strain (1974) and with some of the recent isolates from Indonesia (2004) and the Philippines (2002, 2001). In the full-length genome analysis at the nucleotide level, isolate DS06/210505 showed 94% identity to the French Guyana strain (1989) in genotype V, while isolate DS212/110306 had 96% identity to the Nauru Island strain (1974) in genotype IV. This work constitutes the first complete genetic characterization not only of DENV-1 isolates from Brunei but also of any strain from Borneo Island. This study was the first to report the isolation of dengue virus in the country.
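Percent identity figures such as those above are typically computed over an alignment of the two sequences. The sketch below assumes the sequences are already aligned to equal length with `-` marking gaps, and it skips gapped columns; both the function name and the gap-handling convention are assumptions, since the abstract does not describe the software or settings used.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity between two pre-aligned sequences of equal length.

    Columns where either sequence has a gap ('-') are excluded from
    both the match count and the denominator.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == "-" or y == "-":
            continue
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared

# Toy aligned fragments, for illustration only.
print(percent_identity("ATGCATGCAT", "ATGCTTGCAT"))
```

Other conventions count gapped columns as mismatches, which lowers the reported identity, so the convention should be stated alongside any figure.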
Previously, direct-proportional length-based DNA computing (DPLB-DNAC) for solving weighted graph problems has been reported. The proposed DPLB-DNAC has been successfully applied to solve the shortest path problem, which is an instance of weighted graph problems. The design and development of DPLB-DNAC is important in order to extend the capability of DNA computing for solving numerical optimization problems. According to DPLB-DNAC, after the initial pool generation, the initial solution is subjected to amplification by polymerase chain reaction and, finally, the output of the computation is visualized by gel electrophoresis. In this paper, however, we give more attention to the initial pool generation of DPLB-DNAC. For this purpose, two kinds of initial pool generation methods that are generally used for solving weighted graph problems are evaluated: hybridization-ligation and parallel overlap assembly (POA). It is found that, for DPLB-DNAC, POA is better than the hybridization-ligation method in terms of population size, generation time, material usage, and efficiency, as supported by the results of actual experiments.
As the topological properties of each spot in DNA microarray images may vary from one another, we employed granulometries to understand the shape-size content contributed due to a significant intensity value within a spot. Analysis was performed on the microarray image that consisted of 240 spots by using concepts from mathematical morphology. In order to find out indices for each spot and to further classify them, we adopted morphological multiscale openings, which provided microarrays at multiple scales. Successive opened microarrays were subtracted to identify the protrusions that were smaller than the size of structuring element. Spot-wise details, in terms of probability of these observed protrusions, were computed by placing a regularly spaced grid on microarray such that each spot was centered in each grid. Based on the probability of size distribution functions of these protrusions isolated at each level, we estimated the mean size and texture index for each spot. With these characteristics, we classified the spots in a microarray image into bright and dull categories through pattern spectrum and shape-size complexity measures. These segregated spots can be compared with those of hybridization levels.
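The multiscale-opening idea can be illustrated on a one-dimensional intensity profile: open the signal with structuring elements of increasing size, and take the difference between successive openings as the amount of signal removed at each scale (the pattern spectrum). The study works on two-dimensional microarray images, so this 1D version with flat, centred windows is a deliberate simplification, and the function names are assumptions.

```python
def erode(signal, radius):
    """Grey-scale erosion: minimum over a centred window, truncated at borders."""
    return [min(signal[max(0, i - radius):i + radius + 1])
            for i in range(len(signal))]

def dilate(signal, radius):
    """Grey-scale dilation: maximum over a centred window, truncated at borders."""
    return [max(signal[max(0, i - radius):i + radius + 1])
            for i in range(len(signal))]

def opening(signal, radius):
    """Opening = erosion followed by dilation; removes peaks narrower than the window."""
    return dilate(erode(signal, radius), radius)

def pattern_spectrum(signal, max_radius):
    """Signal removed between successive openings, for radii 0..max_radius."""
    sums = [sum(opening(signal, r)) for r in range(max_radius + 2)]
    return [sums[r] - sums[r + 1] for r in range(max_radius + 1)]

# Toy profile: a one-pixel spike of height 5 and a three-pixel plateau of height 3.
profile = [0, 0, 5, 0, 0, 3, 3, 3, 0, 0]
print(pattern_spectrum(profile, 1))
```

Dividing each entry by the total gives a size distribution from which a mean size per spot could be estimated, in the spirit of the indices described above.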
Cancer starts when cells in a part of the body start to grow out of control. Cells become cancer cells because of DNA damage. A DNA walk of a genome represents how the frequency of each nucleotide of a pairing nucleotide couple changes locally. In this research, in order to study cancer genes, DNA walk plots of the genomes of patients with lung cancer were generated using a program written in MATLAB. The data so obtained were checked for the fractal property by computing the fractal dimension, also using a program written in MATLAB. In addition, the correlation of damaged DNA was studied using the Hurst exponent. We found that the damaged DNA sequences exhibit a higher degree of fractality and less correlation than normal DNA sequences. We therefore conclude that this method can be used for early detection of lung cancer. The method introduced in this research is not only useful for the diagnosis of lung cancer but can also be applied to the detection and growth analysis of other types of cancer.
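A DNA walk for one pairing couple can be sketched as a running sum that steps up for one nucleotide of the pair and down for the other. The authors' program was written in MATLAB and is not reproduced here; the Python sketch below, its step rule (+1 for A, -1 for T, 0 otherwise by default), and the function name are assumptions made for illustration.

```python
def dna_walk(seq, up="A", down="T"):
    """Cumulative walk: +1 for bases in `up`, -1 for bases in `down`, 0 otherwise.

    The defaults track the A/T couple; passing up="AG", down="CT" gives
    the purine/pyrimidine variant instead.
    """
    walk, height = [], 0
    for base in seq.upper():
        if base in up:
            height += 1
        elif base in down:
            height -= 1
        walk.append(height)
    return walk

# Toy sequence, for illustration only.
print(dna_walk("AATGCT"))
```

The resulting series is what would then be passed to a fractal dimension or Hurst exponent estimator.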
Matched MeSH terms: Sequence Analysis, DNA/statistics & numerical data
Hevea brasiliensis Muell. Arg, a member of the family Euphorbiaceae, is the sole natural resource exploited for commercial production of high-quality natural rubber. The properties of natural rubber latex are almost irreplaceable by synthetic counterparts for many industrial applications. A paucity of knowledge on the molecular mechanisms of rubber biosynthesis in high yield traits still persists. Here we report the comprehensive genome-wide analysis of the widely planted H. brasiliensis clone, RRIM 600. The genome was assembled based on ~155-fold combined coverage with Illumina and PacBio sequence data and has a total length of 1.55 Gb with 72.5% comprising repetitive DNA sequences. A total of 84,440 high-confidence protein-coding genes were predicted. Comparative genomic analysis revealed strong synteny between H. brasiliensis and other Euphorbiaceae genomes. Our data suggest that H. brasiliensis's capacity to produce high levels of latex can be attributed to the expansion of rubber biosynthesis-related genes in its genome and the high expression of these genes in latex. Using cap analysis gene expression data, we illustrate the tissue-specific transcription profiles of rubber biosynthesis-related genes, revealing alternative means of transcriptional regulation. Our study adds to the understanding of H. brasiliensis biology and provides valuable genomic resources for future agronomic-related improvement of the rubber tree.
Perspicuous assessments of taxonomic boundaries and discovery of cryptic taxa are of paramount importance in interpreting ecological and evolutionary phenomena among black flies (Simuliidae) and combating associated vector-borne diseases. Simulium tani Takaoka & Davies is the largest and perhaps the most taxonomically challenging species complex of black flies in the Oriental Region. We use a DNA sequence-based method to delineate currently recognized chromosomal and morphological taxa in the S. tani complex on the Southeast Asian mainland and Taiwan, while elucidating their phylogenetic relationships. A molecular approach using multiple genes, coupled with morphological and chromosomal data, supported recognition of cytoform K and morphoform 'b' as valid species; indicated that S. xuandei, cytoform L, and morphoform 'a' contain possible cryptic species; and suggested that cytoform B is in the early stages of reproductive isolation whereas lineage sorting is incomplete in cytoforms A, C, and G.
Magnocellular neurons (MCNs) in the hypothalamo-neurohypophysial system (HNS) are highly specialized to release large amounts of arginine vasopressin (Avp) or oxytocin (Oxt) into the blood stream and play critical roles in the regulation of body fluid homeostasis. The MCNs are osmosensory neurons and are excited by exposure to hypertonic solutions and inhibited by hypotonic solutions. The MCNs respond to systemic hypertonic and hypotonic stimulation with large changes in the expression of their Avp and Oxt genes, and microarray studies have shown that these osmotic perturbations also cause large changes in global gene expression in the HNS. In this paper, we examine gene expression in the rat supraoptic nucleus (SON) under normosmotic and chronic salt-loading (SL) conditions, for the first time using "new-generation" RNA sequencing (RNA-Seq) methods. We reliably detect 9,709 genes as present in the SON by RNA-Seq, and 552 of these genes were changed in expression as a result of chronic SL. These genes reflect diverse functions, and 42 of these are involved in either transcriptional or translational processes. In addition, we compare the SON transcriptomes resolved by RNA-Seq methods with the SON transcriptomes determined by Affymetrix microarray methods in rats under the same osmotic conditions, and find that there are 6,466 genes present in the SON that are represented in both data sets, although 1,040 of the expressed genes were found only in the microarray data, and 2,762 of the expressed genes are selectively found in the RNA-Seq data and not the microarray data. These data provide the research community with a comprehensive view of the transcriptome in the SON under normosmotic conditions and the changes in specific gene expression evoked by salt loading.
Invasive phytoplasmas wreak havoc on coconut palms worldwide, leading to high loss of income, food insecurity and extreme poverty of farmers in producing countries. Phytoplasmas, as strictly biotrophic insect-transmitted bacterial pathogens, instigate distinct changes in developmental processes and defence responses of the infected plants and manipulate plants to their own advantage; however, little is known about the cellular and molecular mechanisms underlying host-phytoplasma interactions. Further, phytoplasma-mediated transcriptional alterations in coconut palm genes have not yet been identified. This study evaluated the whole transcriptome profiles of naturally infected leaves of Cocos nucifera ecotype Malayan Red Dwarf in response to yellow decline phytoplasma from group 16SrXIV, using the RNA-Seq technique. The transcriptomics-based analysis reported here identified genes involved in coconut innate immunity. The number of genes down-regulated in response to phytoplasma infection exceeded the number up-regulated. Of the 39,873 differentially expressed unigenes, 21,860 were suppressed and 18,013 were induced following infection. Comparative analysis revealed that genes associated with defence signalling against biotic stimuli were significantly overexpressed in phytoplasma-infected leaves versus healthy coconut leaves. Genes involved in cell rescue and defence, cellular transport, oxidative stress, hormone stimulus and metabolism, photosynthesis reduction, transcription and biosynthesis of secondary metabolites were differentially represented. Our transcriptome analysis unveiled a core set of genes associated with defence of coconut in response to phytoplasma attack, although several novel defence response candidate genes with unknown function have also been identified. This study constitutes a valuable sequence resource for uncovering the resistance genes and/or susceptibility genes that can be used as genetic tools in disease resistance breeding.