RESULTS: This study describes a large-scale, systematic bioinformatics approach for identification and characterization of shared sequences between the host and pathogen. An application of the approach is demonstrated through identification and characterization of the Flaviviridae-human share-ome. A total of 2430 nonamers represented the Flaviviridae-human share-ome with 100% identity. Although the share-ome represented a small fraction of the repertoire of Flaviviridae (~ 0.12%) and human (~ 0.013%) non-redundant nonamers, the 2430 shared nonamers mapped to 16,946 Flaviviridae and 7506 human non-redundant protein sequences. The shared nonamer sequences mapped to 125 species of Flaviviridae, including several with unclassified genus. The majority (~ 68%) of the shared sequences mapped to Hepacivirus C species; West Nile, dengue and Zika viruses of the Flavivirus genus accounted for ~ 11%, ~ 7%, and ~ 3%, respectively, of the Flaviviridae protein sequences (16,946) mapped by the share-ome. Further characterization of the share-ome provided important structural-functional insights to Flaviviridae-human interactions.
CONCLUSION: Mapping of the host-pathogen share-ome has important implications for the design of vaccines and drugs, diagnostics, disease surveillance and the discovery of unknown, potential host-pathogen interactions. The generic workflow presented herein is potentially applicable to a variety of pathogens, such as of viral, bacterial or parasitic origin.
RESULTS: We re-sequenced the H. gammarus mitogenome on an Oxford Nanopore Minion flowcell and performed a long-read only assembly, generating a complete mitogenome assembly for H. gammarus. In contrast to previous reporting, we found an intact mitochondrial nad2 gene in the H. gammarus mitogenome and showed that its gene organization is broadly similar to that of the American lobster (H. americanus) except for the presence of a large tandemly duplicated region with evidence of pseudogenization in one of each duplicated protein-coding genes.
CONCLUSIONS: Using the European lobster as an example, we demonstrate the value of Oxford Nanopore long read technology in resolving problematic mitogenome assemblies. The increasing accessibility of Oxford Nanopore technology will make it an attractive and useful tool for evolutionary biologists to verify new and existing unusual mitochondrial gene rearrangements recovered using first and second generation sequencing technologies, particularly those used to make phylogenetic inferences of evolutionary scenarios.
RESULTS: We analyzed the whole-genome deep sequencing data (~ 30×) of five native trios from Peninsular Malaysia and North Borneo, and characterized the genomic variants, including single nucleotide variants (SNVs), small insertions and deletions (indels) and copy number variants (CNVs). We discovered approximately 6.9 million SNVs, 1.2 million indels, and 9000 CNVs in the 15 samples, of which 2.7% SNVs, 2.3% indels and 22% CNVs were novel, implying the insufficient coverage of population diversity in existing databases. We identified a higher proportion of novel variants in the Orang Asli (OA) samples, i.e., the indigenous people from Peninsular Malaysia, than that of the North Bornean (NB) samples, likely due to more complex demographic history and long-time isolation of the OA groups. We used the pedigree information to identify de novo variants and estimated the autosomal mutation rates to be 0.81 × 10- 8 - 1.33 × 10- 8, 1.0 × 10- 9 - 2.9 × 10- 9, and ~ 0.001 per site per generation for SNVs, indels, and CNVs, respectively. The trio-genomes also allowed for haplotype phasing with high accuracy, which serves as references to the future genomic studies of OA and NB populations. In addition, high-frequency inherited CNVs specific to OA or NB were identified. One example is a 50-kb duplication in DEFA1B detected only in the Negrito trios, implying plausible effects on host defense against the exposure of diverse microbial in tropical rainforest environment of these hunter-gatherers. The CNVs shared between OA and NB groups were much fewer than those specific to each group. Nevertheless, we identified a 142-kb duplication in AMY1A in all the 15 samples, and this gene is associated with the high-starch diet. Moreover, novel insertions shared with archaic hominids were identified in our samples.
CONCLUSION: Our study presents a full catalogue of the genome variants of the native Malaysian populations, which is a complement of the genome diversity in Southeast Asians. It implies specific population history of the native inhabitants, and demonstrated the necessity of more genome sequencing efforts on the multi-ethnic native groups of Malaysia and Southeast Asia.
RESULTS: The study generated 846,762 high quality sequence reads, with an average length of 334 bp and totalling 283 Mbp. De novo assembly generated 36,384 and 35,269 unigene sequences for M. acuminata Calcutta 4 and Cavendish Grande Naine, respectively. A total of 64.4% of the unigenes were annotated through Basic Local Alignment Search Tool (BLAST) similarity analyses against public databases.Assembled sequences were functionally mapped to Gene Ontology (GO) terms, with unigene functions covering a diverse range of molecular functions, biological processes and cellular components. Genes from a number of defense-related pathways were observed in transcripts from each cDNA library. Over 99% of contig unigenes mapped to exon regions in the reference M. acuminata DH Pahang whole genome sequence. A total of 4068 genic-SSR loci were identified in Calcutta 4 and 4095 in Cavendish Grande Naine. A subset of 95 potential defense-related gene-derived simple sequence repeat (SSR) loci were validated for specific amplification and polymorphism across M. acuminata accessions. Fourteen loci were polymorphic, with alleles per polymorphic locus ranging from 3 to 8 and polymorphism information content ranging from 0.34 to 0.82.
CONCLUSIONS: A large set of unigenes were characterized in this study for both M. acuminata Calcutta 4 and Cavendish Grande Naine, increasing the number of public domain Musa ESTs. This transcriptome is an invaluable resource for furthering our understanding of biological processes elicited during biotic stresses in Musa. Gene-based markers will facilitate molecular breeding strategies, forming the basis of genetic linkage mapping and analysis of quantitative trait loci.
RESULTS: More than 15,000 partial sequences were generated from the 5' and 3' ends of clones randomly selected from an E. tenella second generation merozoite full-length cDNA library. Clustering of these sequences produced 1,529 unique transcripts (UTs). Based on the transcript assembly and subsequently primer walking, 433 full-length cDNA sequences were successfully generated. These sequences varied in length, ranging from 441 bp to 3,083 bp, with an average size of 1,647 bp. Simple sequence repeat (SSR) analysis identified CAG as the most abundant trinucleotide motif, while codon usage analysis revealed that the ten most infrequently used codons in E. tenella are UAU, UGU, GUA, CAU, AUA, CGA, UUA, CUA, CGU and AGU. Subsequent analysis of the E. tenella complete coding sequences identified 25 putative secretory and 60 putative surface proteins, all of which are now rational candidates for development as recombinant vaccines or drug targets in the effort to control avian coccidiosis.
CONCLUSIONS: This paper describes the generation and characterisation of full-length cDNA sequences from E. tenella second generation merozoites and provides new insights into the E. tenella transcriptome. The data generated will be useful for the development and validation of diagnostic and control strategies for coccidiosis and will be of value in annotation of the E. tenella genome sequence.
RESULTS: Here, we reported the first Oceanospirillum phage, vB_OliS_GJ44, which was assembled into a 33,786 bp linear dsDNA genome, which includes abundant tail-related and recombinant proteins. The recombinant module was highly adapted to the host, according to the tetranucleotides correlations. Genomic and morphological analyses identified vB_OliS_GJ44 as a siphovirus, however, due to the distant evolutionary relationship with any other known siphovirus, it is proposed that this virus could be classified as the type phage of a new Oceanospirivirus genus within the Siphoviridae family. vB_OliS_GJ44 showed synteny with six uncultured phages, which supports its representation in uncultured environmental viral contigs from metagenomics. Homologs of several vB_OliS_GJ44 genes have mostly been found in marine metagenomes, suggesting the prevalence of this phage genus in the oceans.
CONCLUSIONS: These results describe the first Oceanospirillum phage, vB_OliS_GJ44, that represents a novel viral cluster and exhibits interesting genetic features related to phage-host interactions and evolution. Thus, we propose a new viral genus Oceanospirivirus within the Siphoviridae family to reconcile this cluster, with vB_OliS_GJ44 as a representative member.
RESULTS: Using a combination of short (10X Genomics) and long read (PacBio HiFi, PacBio CLR) sequencing and a genetic map for the GIFT strain, we generated a chromosome level genome assembly for the GIFT. Using genomes of two closely related species (O. mossambicus, O. aureus), we characterised the extent of introgression between these species and O. niloticus that has occurred during the breeding process. Over 11 Mb of O. mossambicus genomic material could be identified within the GIFT genome, including genes associated with immunity but also with traits of interest such as growth rate.
CONCLUSION: Because of the breeding history of elite strains, current reference genomes might not be the most suitable to support further studies into the GIFT strain. We generated a chromosome level assembly of the GIFT strain, characterising its mixed origins, and the potential contributions of introgressed regions to selected traits.
RESULTS: The chloroplast genome of Gracilaria firma maps as a circular molecule of 187,001 bp and contains 252 genes, which are distributed on both strands and consist of 35 RNA genes (3 rRNAs, 30 tRNAs, tmRNA and a ribonuclease P RNA component) and 217 protein-coding genes, including the unidentified open reading frames. The chloroplast genome of G. firma is by far the largest reported for Gracilariaceae, featuring a unique intergenic region of about 7000 bp with discontinuous vestiges of red algal plasmid DNA sequences interspersed between the nblA and cpeB genes. This chloroplast genome shows similar gene content and order to other Florideophycean taxa. Phylogenomic analyses based on the concatenated amino acid sequences of 146 protein-coding genes confirmed the monophyly of the classes Bangiophyceae and Florideophyceae with full nodal support. Relationships within the subclass Rhodymeniophycidae in Florideophyceae received moderate to strong nodal support, and the monotypic family of Gracilariales were resolved with maximum support.
CONCLUSIONS: Chloroplast genomes hold substantial information that can be tapped for resolving the phylogenetic relationships of difficult regions in the Rhodymeniophycidae, which are perceived to have experienced rapid radiation and thus received low nodal support, as exemplified in this study. The present study shows that chloroplast genome of G. firma could serve as a key link to the full resolution of Gracilaria sensu lato complex and recognition of Hydropuntia as a genus distinct from Gracilaria sensu stricto.
RESULTS: Remarkably, modules could be grouped into just four functional themes: transcription regulation, immunological, extracellular, and neurological, with module generation frequently driven by lncRNA tissue specificity. Notably, three modules associated with the extracellular matrix represented potential networks of lncRNAs regulating key events in tumour progression. These included a tumour-specific signature of 33 lncRNAs that may play a role in inducing epithelial-mesenchymal transition through modulation of TGFβ signalling, and two stromal-specific modules comprising 26 lncRNAs linked to a tumour suppressive microenvironment and 12 lncRNAs related to cancer-associated fibroblasts. One member of the 12-lncRNA signature was experimentally supported by siRNA knockdown, which resulted in attenuated differentiation of quiescent fibroblasts to a cancer-associated phenotype.
CONCLUSIONS: Overall, the study provides a unique pan-cancer perspective on the lncRNA functional landscape, acting as a global source of novel hypotheses on lncRNA contribution to tumour progression.
RESULTS: In this study we generated Whole Exome Sequencing (WES), Reduced Representation Bisulfite Sequencing (RRBS) and RNA sequencing (RNA-seq) data from samples with known mixtures of mouse and human DNA or RNA and from a cohort of human breast cancers and their derived PDTXs. We show that using an In silico Combined human-mouse Reference Genome (ICRG) for alignment discriminates between human and mouse reads with up to 99.9% accuracy and decreases the number of false positive somatic mutations caused by misalignment by >99.9%. We also derived a model to estimate the human DNA content in independent PDTX samples. For RNA-seq and RRBS data analysis, the use of the ICRG allows dissecting computationally the transcriptome and methylome of human tumour cells and mouse stroma. In a direct comparison with previously reported approaches, our method showed similar or higher accuracy while requiring significantly less computing time.
CONCLUSIONS: The computational pipeline we describe here is a valuable tool for the molecular analysis of PDTXs as well as any other mixture of DNA or RNA species.
RESULTS: A gene-oriented database is a valuable support for rubber research. Based on our original draft genome sequence of H. brasiliensis RRIM600, we constructed a rubber tree genome and transcriptome DB. Our DB provides genome information including gene functional annotations and multi-transcriptome data of RNA-seq, full-length cDNAs including PacBio Isoform sequencing (Iso-Seq), ESTs and genome wide transcription start sites (TSSs) derived from CAGE technology. Using our original and publically available RNA-seq data, we calculated co-expressed genes for identifying functionally related gene sets and/or genes regulated by the same transcription factor (TF). Users can access multi-transcriptome data through both a gene-oriented web page and a genome browser. For the gene searching system, we provide keyword search, sequence homology search and gene expression search; users can also select their expression threshold easily.
CONCLUSION: The rubber genome and transcriptome DB provides rubber tree genome sequence and multi-transcriptomics data. This DB is useful for comprehensive understanding of the rubber transcriptome. This will assist both industrial and academic researchers for rubber and economically important close relatives such as R. communis, M. esculenta and J. curcas. The Rubber Transcriptome DB release 2017.03 is accessible at http://matsui-lab.riken.jp/rubber/ .