The discordant prevalence of Helicobacter pylori and its related diseases has long fostered certain enigmatic situations observed in the countries of the southern world. Variation in H. pylori infection rates and disease outcomes among different populations in multi-ethnic Malaysia provides a unique opportunity to understand the dynamics of host-pathogen interaction and genome evolution. In this study, we extensively analyzed and compared the genomes of 27 Malaysian H. pylori isolates and identified three major phylogeographic lineages: hspEastAsia, hpEurope and hpSouthIndia. The analysis of the virulence genes within the core genome, however, revealed a comparable pathogenic potential of the strains. In addition, we identified four genes limited to strains of East-Asian lineage. Our analyses identified a few strain-specific genes encoding restriction modification systems and outlined 311 core genes possibly under differential evolutionary constraints among the strains representing different ethnic groups. The cagA and vacA genes also showed variations in accordance with the host genetic background of the strains. Moreover, restriction modification genes were found to be significantly enriched in East-Asian strains. An understanding of these variations in the genome content would provide significant insights into various adaptive and host modulation strategies harnessed by H. pylori to effectively persist in a host-specific manner.
We have sequenced miRNA libraries from human embryonic, neural and foetal mesenchymal stem cells. We report that the majority of miRNA genes encode mature isomers that vary in size by one or more bases at the 3' and/or 5' end of the miRNA. Northern blotting for individual miRNAs showed that the proportions of isomiRs expressed by a single miRNA gene often differ between cell and tissue types. IsomiRs were readily co-immunoprecipitated with Argonaute proteins in vivo and were active in luciferase assays, indicating that they are functional. Bioinformatics analysis predicts substantial differences in targeting between miRNAs with minor 5' differences and in support of this we report that a 5' isomiR-9-1 gained the ability to inhibit the expression of DNMT3B and NCAM2 but lost the ability to inhibit CDH1 in vitro. This result was confirmed by the use of isomiR-specific sponges. Our analysis of the miRGator database indicates that a small percentage of human miRNA genes express isomiRs as the dominant transcript in certain cell types and analysis of miRBase shows that 5' isomiRs have replaced canonical miRNAs many times during evolution. This strongly indicates that isomiRs are of functional importance and have contributed to the evolution of miRNA genes.
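The effect of a 5' shift on targeting can be illustrated with a minimal sketch. It assumes the common convention that the seed spans nucleotides 2 to 8 of the mature miRNA. The precursor fragment, the start offsets and the helper names (`mature_from_precursor`, `seed`) are illustrative assumptions, not sequences or methods taken from the study.

```python
def mature_from_precursor(precursor, start, length=22):
    """Cut a mature miRNA of the given length out of a precursor (0-based start)."""
    return precursor[start:start + length]


def seed(mature):
    """Return the seed, assumed here to be nucleotides 2-8 (1-based, inclusive)."""
    return mature[1:8]


# Illustrative precursor fragment (hypothetical, not verified against miRBase).
precursor = "GGUCUUUGGUUAUCUAGCUGUAUGAGU"

canonical = mature_from_precursor(precursor, start=2)
isomir_plus1 = mature_from_precursor(precursor, start=3)   # 5' end shifted by +1
isomir_minus1 = mature_from_precursor(precursor, start=1)  # 5' end shifted by -1

for name, seq in [("canonical", canonical),
                  ("5' isomiR +1", isomir_plus1),
                  ("5' isomiR -1", isomir_minus1)]:
    print(f"{name:13s} {seq}  seed={seed(seq)}")
```

A shift of a single base at the 5' end changes every position of the seed window, which is why minor 5' differences are predicted to alter the target set, whereas a 3' change leaves the seed untouched.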
The Fe(II)- and 2-oxoglutarate-dependent oxygenase Jmjd6 has been shown to hydroxylate lysine residues in the essential splice factor U2 auxiliary factor 65 kDa subunit (U2AF65) and to act as a modulator of alternative splicing. We describe further evidence for the role of Jmjd6 in the regulation of pre-mRNA processing, including interactions of Jmjd6 with multiple arginine-serine-rich (RS)-domains of SR- and SR-related proteins including U2AF65, Luc7-like protein 3 (Luc7L3), SRSF11 and Acinus S', but not with the bona fide RS-domain of SRSF1. The identified Jmjd6 target proteins are involved in different mRNA processing steps and play roles in exon-dependent alternative splicing and exon definition. Moreover, we show that Jmjd6 modifies splicing of a constitutive splice reporter, binds RNA derived from the reporter plasmid and punctually co-localises with nascent RNA. We propose that Jmjd6 exerts its splice modulatory function by interacting with specific SR-related proteins during splicing in an RNA-dependent manner.
Hydrogen bonds are crucial factors that stabilize a complex ribonucleic acid (RNA) molecule's three-dimensional (3D) structure. Minute conformational changes can result in variations in the hydrogen bond interactions in a particular structure. Furthermore, networks of hydrogen bonds, especially those found in tight clusters, may be important elements in structure stabilization or function and can therefore be regarded as potential tertiary motifs. In this paper, we describe a graph theoretical algorithm implemented as a web server that is able to search for unbroken networks of hydrogen-bonded base interactions and thus provide an accounting of such interactions in RNA 3D structures. This server, COGNAC (COnnection tables Graphs for Nucleic ACids), is also able to compare the hydrogen bond networks between two structures and from such annotations enable the mapping of atomic level differences that may have resulted from conformational changes due to mutations or binding events. The COGNAC server can be accessed at http://mfrlab.org/grafss/cognac.
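The idea of searching for unbroken networks of hydrogen-bonded bases can be sketched as a connected-components search on a graph whose nodes are bases and whose edges are hydrogen-bonded base interactions. This is a minimal illustration under that assumption, not the COGNAC implementation; the function names and the base identifiers are hypothetical.

```python
from collections import defaultdict


def hbond_networks(hbonds):
    """Group bases into connected networks from a list of hydrogen-bonded base pairs."""
    adjacency = defaultdict(set)
    for a, b in hbonds:
        adjacency[a].add(b)
        adjacency[b].add(a)

    seen = set()
    networks = []
    for start in sorted(adjacency):
        if start in seen:
            continue
        # Depth-first traversal collects every base reachable from `start`.
        stack = [start]
        seen.add(start)
        component = []
        while stack:
            node = stack.pop()
            component.append(node)
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        networks.append(sorted(component))
    return networks


def compare_networks(hbonds_a, hbonds_b):
    """Return the base pairs found only in structure A and only in structure B."""
    def normalise(pairs):
        return {tuple(sorted(pair)) for pair in pairs}

    a, b = normalise(hbonds_a), normalise(hbonds_b)
    return sorted(a - b), sorted(b - a)


# Hypothetical base identifiers, for illustration only.
structure_a = [("A10", "U25"), ("U25", "G40"), ("C5", "G60")]
structure_b = [("U25", "A10")]

print(hbond_networks(structure_a))
print(compare_networks(structure_a, structure_b))
```

Comparing the two edge sets directly, rather than the components, keeps the difference at the level of individual interactions, which is the granularity needed to map changes caused by a mutation or a binding event.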
High-throughput RNA sequencing (RNA-seq) is considered a powerful tool for novel gene discovery and fine-tuned transcriptional profiling. The digital nature of RNA-seq is also believed to simplify meta-analysis and to reduce background noise associated with hybridization-based approaches. The development of multiplex sequencing enables efficient and economic parallel analysis of gene expression. In addition, RNA-seq is of particular value when low RNA expression or modest changes between samples are monitored. However, recent data uncovered severe bias in the sequencing of small non-protein coding RNA (small RNA-seq or sRNA-seq), such that the expression levels of some RNAs appeared to be artificially enhanced and others diminished or even undetectable. The use of different adapters and barcodes during ligation as well as complex RNA structures and modifications drastically influence cDNA synthesis efficacies and exemplify sources of bias in deep sequencing. In addition, variable specific RNA G/C-content is associated with unequal polymerase chain reaction amplification efficiencies. Given the central importance of RNA-seq to molecular biology and personalized medicine, we review recent findings that challenge small non-protein coding RNA-seq data and suggest approaches and precautions to overcome or minimize bias.
We describe a server that allows the interrogation of the Protein Data Bank for hypothetical 3D side chain patterns that are not limited to known patterns from existing 3D structures. A minimal side chain description allows a variety of side chain orientations to exist within the pattern, and generic side chain types such as acid, base and hydroxyl-containing can be additionally deployed in the search query. Moreover, only a subset of distances between the side chains need be specified. We illustrate these capabilities in case studies involving arginine stacks, serine-acid group arrangements and multiple catalytic triad-like configurations. The IMAAAGINE server can be accessed at http://mfrlab.org/grafss/imaaagine/.
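A query that uses generic side chain types and only a subset of inter-side-chain distances can be sketched as a brute-force search. The sketch below is not the IMAAAGINE algorithm: the membership of the generic groups, the single-point side chain representation, the function names and the coordinates are all assumptions made for illustration.

```python
import math
from itertools import permutations

# Assumed membership of the generic types; the server's own definitions may differ.
GENERIC = {
    "acid": {"ASP", "GLU"},
    "base": {"ARG", "LYS", "HIS"},
    "hydroxyl": {"SER", "THR", "TYR"},
}


def type_matches(query_type, residue_name):
    """A query type is either a generic group or a specific residue name."""
    return residue_name in GENERIC.get(query_type, {query_type})


def search(pattern_types, constraints, residues):
    """Find residue tuples matching the pattern.

    pattern_types: query types, e.g. ["hydroxyl", "HIS", "acid"]
    constraints:   {(i, j): (min_dist, max_dist)} for any subset of pattern pairs
    residues:      list of (name, residue_number, (x, y, z))
    """
    hits = []
    for combo in permutations(range(len(residues)), len(pattern_types)):
        if not all(type_matches(t, residues[k][0])
                   for t, k in zip(pattern_types, combo)):
            continue
        # Only the specified pairs are checked; unspecified distances are free.
        if all(lo <= math.dist(residues[combo[i]][2], residues[combo[j]][2]) <= hi
               for (i, j), (lo, hi) in constraints.items()):
            hits.append(tuple(residues[k][1] for k in combo))
    return hits


# Made-up coordinates in angstroms, for illustration only.
residues = [
    ("SER", 195, (0.0, 0.0, 0.0)),
    ("HIS", 57, (3.0, 0.0, 0.0)),
    ("ASP", 102, (6.0, 0.0, 0.0)),
    ("GLU", 70, (20.0, 0.0, 0.0)),
]
pattern = ["hydroxyl", "HIS", "acid"]
constraints = {(0, 1): (2.0, 4.0), (1, 2): (2.0, 4.0)}  # pair (0, 2) left unspecified

print(search(pattern, constraints, residues))
```

Enumerating permutations is exponential in pattern size and reports symmetric duplicates when two pattern positions share a type, so it is only suitable for small examples; a graph-matching approach would be needed at Protein Data Bank scale.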
Similarities in the 3D patterns of RNA base interactions or arrangements can provide insights into their functions and roles in stabilization of the RNA 3D structure. Nucleic Acids Search for Substructures and Motifs (NASSAM) is a graph theoretical program that can search for 3D patterns of base arrangements by representing the bases as pseudo-atoms. The geometric relationship of the pseudo-atoms to each other as a pattern can be represented as a labeled graph where the pseudo-atoms are the graph's nodes while the edges are the inter-pseudo-atomic distances. The input files for NASSAM are PDB formatted 3D coordinates. This web server can be used to identify matches of base arrangement patterns in a query structure to annotated patterns that have been reported in the literature or that have possible functional and structural stabilization implications. The NASSAM program is freely accessible without any login requirement at http://mfrlab.org/grafss/nassam/.
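The labeled-graph representation described above can be sketched as follows. For simplicity, each base is reduced to a single pseudo-atom position, which is an assumption of this sketch and may not match how NASSAM defines its pseudo-atoms; the function name, identifiers and coordinates are hypothetical.

```python
import math
from itertools import combinations


def build_distance_graph(pseudo_atoms):
    """Build a labeled graph from pseudo-atoms.

    pseudo_atoms: {node_id: (label, (x, y, z))}
    Returns (nodes, edges), where nodes maps node_id to its label and
    edges maps each unordered node pair to the inter-pseudo-atomic distance.
    """
    nodes = {node_id: label for node_id, (label, _) in pseudo_atoms.items()}
    edges = {}
    for p, q in combinations(sorted(pseudo_atoms), 2):
        edges[(p, q)] = math.dist(pseudo_atoms[p][1], pseudo_atoms[q][1])
    return nodes, edges


# Made-up coordinates in angstroms, for illustration only.
base_triple = {
    "b1": ("A", (0.0, 0.0, 0.0)),
    "b2": ("U", (3.0, 4.0, 0.0)),
    "b3": ("G", (0.0, 0.0, 5.0)),
}

nodes, edges = build_distance_graph(base_triple)
print(nodes)
for pair, distance in edges.items():
    print(pair, round(distance, 2))
```

Once both a query structure and a reference pattern are expressed this way, finding a match becomes a subgraph search in which node labels must agree and edge distances must fall within a chosen tolerance.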
Similarities in the 3D patterns of amino acid side chains can provide insights into their function despite the absence of any detectable sequence or fold similarities. Search for protein sites (SPRITE) and amino acid pattern search for substructures and motifs (ASSAM) are graph theoretical programs that can search for 3D amino side chain matches in protein structures, by representing the amino acid side chains as pseudo-atoms. The geometric relationship of the pseudo-atoms to each other as a pattern can be represented as a labeled graph where the pseudo-atoms are the graph's nodes while the edges are the inter-pseudo-atomic distances. Both programs require the input file to be in the PDB format. The objective of using SPRITE is to identify matches of side chains in a query structure to patterns with characterized function. In contrast, a 3D pattern of interest can be searched for existing occurrences in available PDB structures using ASSAM. Both programs are freely accessible without any login requirement. SPRITE is available at http://mfrlab.org/grafss/sprite/ while ASSAM can be accessed at http://mfrlab.org/grafss/assam/.
Hydrogen-bonded interactions between the base residues are a major component of RNA structure stabilization. The importance and biological relevance of large clusters of base interactions can be much more easily investigated when their occurrences have been systematically detected, catalogued and compared. In this paper, we describe the database InterRNA (INTERactions in RNA structures database; http://mfrlab.org/interrna/), which contains records of known RNA 3D motifs as well as records for clusters of bases that are interconnected by hydrogen bonds. The contents of the database were compiled from RNA structural annotations carried out by the NASSAM (http://mfrlab.org/grafss/nassam) and COGNAC (http://mfrlab.org/grafss/cognac) computer programs. An analysis of the database content and comparisons with the existing corpus of knowledge regarding RNA 3D motifs clearly show that InterRNA is able to extend the annotations for known motifs and to provide novel interactions for further investigation.
Genome3D (https://www.genome3d.eu) is a freely available resource that provides consensus structural annotations for representative protein sequences taken from a selection of model organisms. Since the last NAR update in 2015, the method of data submission has been overhauled, with annotations now being 'pushed' to the database via an API. As a result, contributing groups are now able to manage their own structural annotations, making the resource more flexible and maintainable. The new submission protocol brings a number of additional benefits, including instant validation of data and no requirement to synchronise releases between resources. It also makes it possible to implement the submission of these structural annotations as an automated part of existing internal workflows. In turn, these improvements facilitate Genome3D being opened up to new prediction algorithms and groups. For the latest release of Genome3D (v2.1), the underlying dataset of sequences used as prediction targets has been updated using the latest reference proteomes available in UniProtKB. A number of new reference proteomes of particular interest to the wider scientific community have also been added: cow, pig, wheat and Mycobacterium tuberculosis. These additions, along with improvements to the underlying predictions from contributing resources, have ensured that the number of annotations in Genome3D has nearly doubled since the last NAR update article. The new API has also been used to facilitate the dissemination of Genome3D data into InterPro, thereby widening the visibility of both the annotation data and annotation algorithms.
Reactive oxygen species (ROS) are toxic by-products of normal aerobic metabolism. ROS can damage mRNAs and the translational apparatus, resulting in translational defects and aberrant protein production. Three mRNA quality control systems monitor mRNAs for translational errors: nonsense-mediated decay, non-stop decay (NSD) and no-go decay (NGD) pathways. Here, we show that factors required for the recognition of NSD substrates and components of the SKI complex are required for oxidant tolerance. We found an overlapping requirement for Ski7, which bridges the interaction between the SKI complex and the exosome, and NGD components (Dom34/Hbs1) which have been shown to function in both NSD and NGD. We show that ski7 dom34 and ski7 hbs1 mutants are sensitive to hydrogen peroxide stress and accumulate an NSD substrate. We further show that NSD substrates are generated during ROS exposure as a result of aggregation of the Sup35 translation termination factor, which increases stop codon read-through, allowing ribosomes to translate into the 3'-end of mRNAs. Overexpression of Sup35 decreases stop codon read-through and rescues oxidant tolerance, consistent with this model. Our data reveal an unanticipated requirement for the NSD pathway during oxidative stress conditions which prevents the production of aberrant proteins from NSD mRNAs.
The IntFOLD server provides a unified resource for the automated prediction of: protein tertiary structures with built-in estimates of model accuracy (EMA), protein structural domain boundaries, natively unstructured or disordered regions in proteins, and protein-ligand interactions. The component methods have been independently evaluated via the successive blind CASP experiments and the continual CAMEO benchmarking project. The IntFOLD server has established its ranking as one of the best performing publicly available servers, based on independent official evaluation metrics. Here, we describe significant updates to the server back end, where we have focused on performance improvements in tertiary structure predictions, in terms of global 3D model quality and accuracy self-estimates (ASE), which we achieve using our newly improved ModFOLD7_rank algorithm. We also report on various upgrades to the front end including: a streamlined submission process, enhanced visualization of models, new confidence scores for ranking, and links for accessing all annotated model data. Furthermore, we now include an option for users to submit selected models for further refinement via convenient push buttons. The IntFOLD server is freely available at: http://www.reading.ac.uk/bioinf/IntFOLD/.
DNA from healthy Malaysian newborns was studied on gene maps after digestion with different restriction endonucleases. Of 65 newborns, two were found to be carriers of two different variants of triplicated alpha-globin loci. In variant no. 1, found in a Malay, the three alpha-globin genes are in an elongated DNA fragment on digestion with Eco RI and Bam HI. The third alpha-globin gene was found in an additional 3.7-kb fragment on digestion with Hpa I, Bgl II and Hind III. In variant no. 2, a new type of triplicated alpha-globin loci, found in a Chinese, the three alpha-globin genes reside in an elongated DNA fragment longer than that of variant no. 1 on digestion with Eco RI and Bam HI. The third alpha-globin gene was found in an additional 4.2-kb fragment on digestion with Hpa I and Hind III. Digestion of this variant DNA with Bgl II produced an abnormal 16.7-kb fragment in addition to the normal 7.0-kb Bgl II fragment. The locations of the restriction sites in the two types of triplicated alpha-globin loci are compatible with a mechanism of unequal crossing over following two different modes of misalignment.