RESULTS: We present an automated gene prediction pipeline, Seqping, that uses self-training HMMs and transcriptomic data. The pipeline processes the genome and transcriptome sequences of the target species using the GlimmerHMM, SNAP, and AUGUSTUS pipelines, followed by the MAKER2 program to combine predictions from the three tools in association with the transcriptomic evidence. Seqping generates species-specific HMMs that are able to offer unbiased gene predictions. The pipeline was evaluated using the Oryza sativa and Arabidopsis thaliana genomes. Benchmarking Universal Single-Copy Orthologs (BUSCO) analysis showed that the pipeline was able to identify at least 95% of BUSCO's plantae dataset. Our evaluation shows that Seqping generated better gene predictions than three HMM-based programs (MAKER2, GlimmerHMM and AUGUSTUS) run with their respective available HMMs. Seqping had the highest accuracy in rice (0.5648 for CDS, 0.4468 for exon, and 0.6695 for nucleotide structure) and A. thaliana (0.5808 for CDS, 0.5955 for exon, and 0.8839 for nucleotide structure).
CONCLUSIONS: Seqping provides researchers with a seamless pipeline to train species-specific HMMs and predict genes in newly sequenced or less-studied genomes. We conclude that Seqping's predictions are more accurate than those of the other three approaches run with the default or available HMMs.
METHODS: Fourteen datasets extracted from three published papers were used in a meta-analysis to examine the cyclic behaviour of the Arabidopsis thaliana photosynthesis-related gene CAB2 and the clock oscillator genes TOC1 and LHY in T cycles and N-H cycles.
KEY RESULTS: Changes in the rhythms of CAB2, TOC1 and LHY in plants subjected to non-24-h light:dark cycles matched the hypothesized changes in their behaviour as predicted by the solar clock model, thus validating it. The analysis further showed that TOC1 expression peaked ∼5·5 h after mid-day, CAB2 peaked close to noon, while LHY peaked ∼7·5 h after midnight, regardless of the cycle period, the photoperiod or the light:dark period ratio. The solar clock model correctly predicted the zeitgeber timing of these genes under 11 different lighting regimes comprising combinations of seven light periods, nine dark periods, four cycle periods and four light:dark period ratios. In short cycles that terminated before LHY could be expressed, the solar clock correctly predicted zeitgeber timing of its expression in the following cycle.
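The timing rule reported above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that "mid-day" is the midpoint of the light period, that "midnight" is the midpoint of the dark period, and that zeitgeber time (ZT) is counted in hours from lights-on. The function name `predicted_peaks` and its return format are hypothetical. Only the offsets (5.5 h, noon, 7.5 h) come from the text.

```python
# Peak offsets in hours, taken from the reported results.
TOC1_AFTER_MIDDAY_H = 5.5
CAB2_AFTER_MIDDAY_H = 0.0   # "close to noon", approximated here as exactly mid-day
LHY_AFTER_MIDNIGHT_H = 7.5


def predicted_peaks(light_h, dark_h):
    """Return {gene: (zt_hours, cycles_later)} for one light:dark regime.

    Assumptions (not stated in the source): mid-day is the midpoint of the
    light period, midnight is the midpoint of the dark period, and ZT 0 is
    lights-on. cycles_later is 1 when the peak falls in the following cycle.
    """
    cycle_h = light_h + dark_h
    midday = light_h / 2.0
    midnight = light_h + dark_h / 2.0

    # Peak times in hours since lights-on of the current cycle.
    raw = {
        "CAB2": midday + CAB2_AFTER_MIDDAY_H,
        "TOC1": midday + TOC1_AFTER_MIDDAY_H,
        "LHY": midnight + LHY_AFTER_MIDNIGHT_H,
    }

    peaks = {}
    for gene, hours in raw.items():
        # Wrap peaks that fall after the end of the cycle into the next one.
        cycles_later, zt = divmod(hours, cycle_h)
        peaks[gene] = (zt, int(cycles_later))
    return peaks


if __name__ == "__main__":
    # Illustrative regimes only; not necessarily among those analysed.
    for light_h, dark_h in [(12, 12), (6, 6)]:
        print(light_h, dark_h, predicted_peaks(light_h, dark_h))
```

Under these assumptions, a 12 h light : 12 h dark cycle places LHY at ZT 1.5 of the following cycle. This mirrors the statement that, in cycles ending before LHY could be expressed, its expression is predicted in the next cycle.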
CONCLUSIONS: Regulation of gene phases by the solar clock enables the plant to tell the time, by which means a large number of genes are regulated. This facilitates the initiation of gene expression even before the arrival of sunrise, sunset or noon, thus allowing the plant to 'anticipate' dawn, dusk or mid-day respectively, independently of the photoperiod.