MATERIALS AND METHODS: The biofilm yield of 32 Helicobacter pylori strains (a standard strain and 31 clinical strains) was determined by crystal-violet assay, and the strains were grouped as poor, moderate or good biofilm formers. Whole genome sequencing of all 32 strains was performed on the Illumina MiSeq platform. The genomes were annotated and compared using RAST (Rapid Annotation using Subsystem Technology) and the SEED Viewer. The identified genes were confirmed by PCR.
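The abstract does not state the cut-offs used to assign strains to the poor, moderate and good groups. A common approach for crystal-violet assays is to derive a cut-off optical density (ODc) from negative-control wells (mean plus three standard deviations) and compare each strain's mean OD against multiples of it. The sketch below assumes that scheme and one particular mapping of its bands onto three groups; the function names, strain labels and OD values are hypothetical.

```python
from statistics import mean, stdev


def od_cutoff(negative_control_ods):
    """Cut-off OD (ODc): mean + 3 SD of the negative-control wells (assumed criterion)."""
    return mean(negative_control_ods) + 3 * stdev(negative_control_ods)


def classify_biofilm(replicate_ods, odc):
    """Assign a strain to a biofilm group from the mean OD of its replicate wells.

    Assumed mapping of ODc multiples onto the three groups:
      mean OD <= 2 * ODc           -> "poor"
      2 * ODc < mean OD <= 4 * ODc -> "moderate"
      mean OD > 4 * ODc            -> "good"
    """
    od = mean(replicate_ods)
    if od <= 2 * odc:
        return "poor"
    if od <= 4 * odc:
        return "moderate"
    return "good"


if __name__ == "__main__":
    odc = od_cutoff([0.08, 0.09, 0.10])  # hypothetical blank wells; ODc is about 0.12
    strains = {  # hypothetical strain IDs and crystal-violet OD readings
        "HP-A": [0.15, 0.17],
        "HP-B": [0.35, 0.40],
        "HP-C": [0.90, 1.10],
    }
    for name, ods in strains.items():
        print(name, classify_biofilm(ods, odc))  # HP-A poor, HP-B moderate, HP-C good
```

Using multiples of a control-derived cut-off, rather than fixed OD values, keeps the grouping comparable across plates with different background staining.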
RESULTS: Genes found to be associated with biofilm formation in H. pylori include those encoding alpha (1,3)-fucosyltransferase, a flagellar protein, three hypothetical proteins, an outer membrane protein and a cag pathogenicity island protein. These genes play roles in bacterial motility, lipopolysaccharide (LPS) synthesis, Lewis antigen synthesis, adhesion and/or the type IV secretion system (T4SS). Deletion of cagA and of the cagPAI confirmed that CagA and the T4SS are involved in H. pylori biofilm formation.
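One simple way to look for genes linked to the biofilm phenotype is to compare annotated gene content between phenotype groups. The sketch below is not the authors' RAST/SEED workflow; it assumes each strain's annotation has been reduced to a set of gene names and keeps genes present in every strain of one group and absent from every strain of another. Strain and gene names are placeholders.

```python
def group_specific_genes(gene_sets, groups, present_in, absent_in):
    """Genes found in every strain of group `present_in` and in no strain of `absent_in`.

    gene_sets: strain ID -> set of annotated gene names
    groups:    strain ID -> biofilm group label
    """
    inside = [gene_sets[s] for s, g in groups.items() if g == present_in]
    outside = [gene_sets[s] for s, g in groups.items() if g == absent_in]
    if not inside:
        return set()
    shared = set.intersection(*inside)      # genes common to all strains of the first group
    seen_elsewhere = set().union(*outside)  # every gene seen in the second group
    return shared - seen_elsewhere


if __name__ == "__main__":
    gene_sets = {  # placeholder annotations
        "S1": {"cagA", "geneX", "geneY"},
        "S2": {"cagA", "geneX"},
        "S3": {"geneY"},
        "S4": {"geneY", "geneZ"},
    }
    groups = {"S1": "good", "S2": "good", "S3": "poor", "S4": "poor"}
    print(sorted(group_specific_genes(gene_sets, groups, "good", "poor")))  # ['cagA', 'geneX']
    print(sorted(group_specific_genes(gene_sets, groups, "poor", "good")))  # []
```

Strict all-or-none presence is a simplification: a frequency threshold would tolerate annotation gaps in draft genomes, and any candidate would still need experimental confirmation, as with the PCR and deletion experiments described here.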
CONCLUSIONS: The results of this study suggest that biofilm formation in H. pylori may be genetically determined and influenced by multiple genes. Good, moderate and poor biofilm-forming strains may differ during the initiation of biofilm formation.
RESULTS: In this study, we generated whole exome sequencing (WES), reduced representation bisulfite sequencing (RRBS) and RNA sequencing (RNA-seq) data from samples containing known mixtures of mouse and human DNA or RNA, and from a cohort of human breast cancers and the patient-derived tumour xenografts (PDTXs) established from them. We show that aligning reads to an in silico combined human-mouse reference genome (ICRG) discriminates between human and mouse reads with up to 99.9% accuracy and reduces the number of false-positive somatic mutations caused by misalignment by more than 99.9%. We also derived a model to estimate the human DNA content of independent PDTX samples. For RNA-seq and RRBS data, the ICRG allows the transcriptome and methylome of human tumour cells and mouse stroma to be dissected computationally. In a direct comparison with previously reported approaches, our method showed similar or higher accuracy while requiring significantly less computing time.
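The key idea of a combined reference is that human and mouse sequences compete for each read in a single alignment, so the species of the best hit identifies the read's origin. The sketch below assumes alignments have already been reduced to (read ID, reference name, MAPQ) records, as could be parsed from the QNAME, RNAME and MAPQ fields of a SAM file, and that contigs in the combined reference carry hypothetical "human_" and "mouse_" prefixes. Reads that align equally well to homologous human and mouse regions typically receive a low MAPQ, so a threshold (20 here, an assumed value) sets them aside. The abstract does not describe the authors' model for estimating human DNA content; a least-squares line fitted on known mixtures is used here purely as a stand-in.

```python
from collections import Counter

HUMAN_PREFIX = "human_"  # hypothetical contig-name prefix for human sequences
MOUSE_PREFIX = "mouse_"  # hypothetical contig-name prefix for mouse sequences


def assign_species(alignments, min_mapq=20):
    """Count reads by species from (read_id, reference_name, mapq) records.

    Unmapped reads (reference '*') and ambiguous, low-MAPQ reads are 'unassigned'.
    """
    counts = Counter()
    for _read_id, ref, mapq in alignments:
        if ref == "*" or mapq < min_mapq:
            counts["unassigned"] += 1
        elif ref.startswith(HUMAN_PREFIX):
            counts["human"] += 1
        elif ref.startswith(MOUSE_PREFIX):
            counts["mouse"] += 1
        else:
            counts["unassigned"] += 1
    return counts


def human_fraction(counts):
    """Fraction of species-assigned reads that are human."""
    assigned = counts["human"] + counts["mouse"]
    return counts["human"] / assigned if assigned else 0.0


def fit_calibration(observed, known):
    """Ordinary least-squares line mapping observed read fraction to known DNA fraction."""
    n = len(observed)
    mx = sum(observed) / n
    my = sum(known) / n
    sxx = sum((x - mx) ** 2 for x in observed)
    sxy = sum((x - mx) * (y - my) for x, y in zip(observed, known))
    slope = sxy / sxx
    return slope, my - slope * mx


if __name__ == "__main__":
    alignments = [  # hypothetical records
        ("r1", "human_chr1", 60),
        ("r2", "mouse_chr2", 60),
        ("r3", "human_chr7", 0),  # multi-mapping, MAPQ 0
        ("r4", "*", 0),           # unmapped
        ("r5", "human_chrX", 42),
    ]
    counts = assign_species(alignments)
    print(round(human_fraction(counts), 3))  # 0.667
    # Hypothetical calibration mixtures: observed read fractions vs known human DNA fractions.
    slope, intercept = fit_calibration([0.1, 0.5, 0.9], [0.2, 0.5, 0.8])
    print(round(slope, 3), round(intercept, 3))  # 0.75 0.125
    estimate = slope * human_fraction(counts) + intercept
```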
CONCLUSIONS: The computational pipeline described here is a valuable tool for the molecular analysis of PDTXs, as well as of any other mixture of DNA or RNA from different species.
RESULTS: We present Seqping, an automated gene prediction pipeline that uses self-trained hidden Markov models (HMMs) and transcriptomic data. The pipeline processes the genome and transcriptome sequences of the target species with GlimmerHMM, SNAP and AUGUSTUS, followed by MAKER2, which combines the predictions from the three tools with the transcriptomic evidence. Seqping generates species-specific HMMs that can provide unbiased gene predictions. The pipeline was evaluated on the Oryza sativa and Arabidopsis thaliana genomes. Benchmarking Universal Single-Copy Orthologs (BUSCO) analysis showed that the pipeline identified at least 95% of the BUSCO plantae dataset. Our evaluation shows that Seqping generated better gene predictions than three HMM-based programs (MAKER2, GlimmerHMM and AUGUSTUS) run with their respective available HMMs. Seqping had the highest accuracy in rice (0.5648 for CDS, 0.4468 for exon and 0.6695 for nucleotide structure) and A. thaliana (0.5808 for CDS, 0.5955 for exon and 0.8839 for nucleotide structure).
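Accuracy at the nucleotide and exon levels is usually computed by comparing predicted gene structures against a reference annotation. The abstract does not say which summary statistic its single accuracy values represent, so the sketch below simply reports sensitivity, specificity (in the gene-finding sense, i.e. the fraction of predicted features that are correct) and their harmonic mean (F1). It assumes exons are 1-based inclusive (start, end) intervals on a single sequence and strand; the example coordinates are hypothetical.

```python
def positions(exons):
    """Expand 1-based inclusive (start, end) intervals into a set of coordinates."""
    covered = set()
    for start, end in exons:
        covered.update(range(start, end + 1))
    return covered


def sn_sp_f1(tp, fn, fp):
    """Sensitivity TP/(TP+FN), specificity TP/(TP+FP) and their harmonic mean."""
    sn = tp / (tp + fn) if tp + fn else 0.0
    sp = tp / (tp + fp) if tp + fp else 0.0
    f1 = 2 * sn * sp / (sn + sp) if sn + sp else 0.0
    return sn, sp, f1


def nucleotide_level(reference, predicted):
    """Score every coding base independently."""
    ref, pred = positions(reference), positions(predicted)
    return sn_sp_f1(len(ref & pred), len(ref - pred), len(pred - ref))


def exon_level(reference, predicted):
    """An exon counts as correct only if both of its boundaries match exactly."""
    ref, pred = set(reference), set(predicted)
    return sn_sp_f1(len(ref & pred), len(ref - pred), len(pred - ref))


if __name__ == "__main__":
    reference = [(1, 100), (201, 300)]              # hypothetical annotated exons
    predicted = [(1, 100), (221, 300), (401, 450)]  # hypothetical predicted exons
    print([round(v, 3) for v in nucleotide_level(reference, predicted)])  # [0.9, 0.783, 0.837]
    print([round(v, 3) for v in exon_level(reference, predicted)])        # [0.5, 0.333, 0.4]
```

The example shows why exon-level scores tend to be lower than nucleotide-level ones: the prediction (221, 300) overlaps 80 reference bases but still counts as a wrong exon because its start boundary is off.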
CONCLUSIONS: Seqping provides researchers with a seamless pipeline to train species-specific HMMs and predict genes in newly sequenced or less-studied genomes. We conclude that Seqping predictions are more accurate than those obtained with the other three approaches using their default or available HMMs.