RESULTS: We present an automated gene prediction pipeline, Seqping, that uses self-training HMMs and transcriptomic data. The pipeline processes the genome and transcriptome sequences of the target species using the GlimmerHMM, SNAP, and AUGUSTUS pipelines, followed by the MAKER2 program to combine predictions from the three tools in association with the transcriptomic evidence. Seqping generates species-specific HMMs that are able to offer unbiased gene predictions. The pipeline was evaluated using the Oryza sativa and Arabidopsis thaliana genomes. Benchmarking Universal Single-Copy Orthologs (BUSCO) analysis showed that the pipeline was able to identify at least 95% of BUSCO's plantae dataset. Our evaluation shows that Seqping generated better gene predictions than three HMM-based programs (MAKER2, GlimmerHMM and AUGUSTUS) run with their respective available HMMs. Seqping had the highest accuracy in rice (0.5648 for CDS, 0.4468 for exon, and 0.6695 for nucleotide structure) and A. thaliana (0.5808 for CDS, 0.5955 for exon, and 0.8839 for nucleotide structure).
CONCLUSIONS: Seqping provides researchers with a seamless pipeline to train species-specific HMMs and predict genes in newly sequenced or less-studied genomes. We conclude that the Seqping pipeline's predictions are more accurate than gene predictions from the other three approaches using the default or available HMMs.
METHODS: The genome sequence was used as a reference to study gene expression during growth in a carbon (C)- and nitrogen (N)-starved environment with minimal sugar and sawdust as initial energy sources. This study was conducted to mimic possible limitations of the C-N nutrient sources during the growth of G. boninense in oil palm plantations.
RESULTS: Genome sequencing of an isolate collected from a palm tree in West Malaysia generated an assembly of 67.12 Mb encoding 19,851 predicted genes. Transcriptomic analysis of a time course experiment during growth in this starvation medium identified differentially expressed genes (DEGs) that were associated with 29 metabolic pathways. During the active growth phase, 26 DEGs were related to four pathways: secondary metabolite biosynthesis, carbohydrate metabolism, glycan metabolism and mycotoxin biosynthesis. G. boninense genes involved in the carbohydrate metabolism pathway that contribute to the degradation of plant cell walls were up-regulated. Interestingly, several genes associated with the mycotoxin biosynthesis pathway were identified as playing a possible role in pathogen-host interaction. In addition, metabolomics analysis revealed six metabolites (maltose, xylobiose, glucooligosaccharide, glycylproline, dimethylfumaric acid and arabitol) that were up-regulated on day 2 of the time course experiment.
CONCLUSIONS: This study provides information on genes expressed by G. boninense in metabolic pathways that may play a role in the initial infection of the host.