Affiliations 

  • 1 Centre for Frontier Sciences, Faculty of Science and Technology, Universiti Kebangsaan Malaysia, UKM, 43600, Bangi, Selangor, Malaysia
  • 2 Centre for Frontier Sciences, Faculty of Science and Technology, Universiti Kebangsaan Malaysia, UKM, 43600, Bangi, Selangor, Malaysia. firdaus@mfrlab.org
BMC Bioinformatics, 2019 Feb 04;19(Suppl 13):551.
PMID: 30717662 DOI: 10.1186/s12859-018-2550-2

Abstract

BACKGROUND: Small open reading frames (smORFs/sORFs), which encode short protein sequences, are often overlooked during standard gene prediction, leaving many sORFs undiscovered and/or misannotated. For many genomes, a second round of sORF-targeted gene prediction can complement the existing annotation. In this study, we specifically targeted the identification of ORFs encoding 80 amino acid residues or fewer in 31 fungal genomes. We then compared the predicted sORFs and analysed those that are highly conserved among the genomes.

RESULTS: A first set of sORFs that met the 80-residue maximum was identified from existing annotations. A second set was predicted using parameters that specifically searched for ORF candidates of 80 codons or fewer in the exonic, intronic and intergenic sequences of the subject genomes. A total of 1986 conserved sORFs were predicted and characterized.
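To illustrate the kind of length-restricted search described above, the following is a minimal sketch of a six-frame scan for ORFs of 80 codons or fewer. It is not the prediction tool or parameter set used in the study. The function names, the requirement of an ATG start codon, the standard stop codons, and the choice to exclude the stop codon from the codon count are all assumptions made for illustration.

```python
from typing import List, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}   # assumption: standard nuclear stop codons
START_CODON = "ATG"                   # assumption: canonical start codon only
MAX_CODONS = 80                       # length cut-off stated in the abstract


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an A/C/G/T sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_sorfs(seq: str, max_codons: int = MAX_CODONS) -> List[Tuple[str, int, str]]:
    """Scan all six reading frames for complete ORFs (ATG ... stop).

    Returns (strand, start offset on the scanned strand, ORF DNA without
    the stop codon) for every ORF with at most `max_codons` codons.
    Assumption: the stop codon is not counted towards the length.
    """
    hits = []
    forward = seq.upper()
    for strand, s in (("+", forward), ("-", reverse_complement(forward))):
        for frame in range(3):
            i = frame
            while i + 3 <= len(s):
                if s[i:i + 3] != START_CODON:
                    i += 3
                    continue
                # Extend codon by codon until a stop codon or the sequence end.
                j = i
                while j + 3 <= len(s) and s[j:j + 3] not in STOP_CODONS:
                    j += 3
                if j + 3 > len(s):
                    break  # no in-frame stop: nothing complete remains in this frame
                if (j - i) // 3 <= max_codons:
                    hits.append((strand, i, s[i:j]))
                i = j + 3  # resume the scan after the stop codon
    return hits
```

For example, `find_sorfs("CCATGAAATTTTAGCC")` reports a single three-codon ORF on the forward strand starting at offset 2. Applying such a scan separately to exonic, intronic and intergenic sequences would require coordinates from an existing annotation, which this sketch does not handle.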

CONCLUSIONS: Numerous open reading frames that could potentially encode polypeptides of 80 amino acid residues or fewer are evidently overlooked during standard gene prediction and annotation. Our results show that additional targeted reannotation can complement standard genome annotation to identify sORFs. Given the lack of, and limitations of, experimental validation, we propose that a simple conservation analysis can provide an acceptable means of ensuring that the predicted sORFs are sufficiently clear of gene prediction artefacts.
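As a rough illustration of the conservation idea, the sketch below keeps only predicted peptides that occur in a minimum number of genomes. The abstract does not state how conservation was assessed, so two assumptions are made here: exact sequence identity stands in for a proper similarity search, and `min_genomes` is a hypothetical threshold. The function name and input layout are also illustrative.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


def conserved_peptides(
    peptides_by_genome: Dict[str, Iterable[str]],
    min_genomes: int,
) -> Dict[str, Set[str]]:
    """Map each peptide to the genomes containing it, keeping only peptides
    found in at least `min_genomes` genomes.

    Assumption: identical sequences are treated as the same sORF; a real
    analysis would cluster similar sequences rather than exact matches.
    """
    genomes_with = defaultdict(set)
    for genome, peptides in peptides_by_genome.items():
        for peptide in peptides:
            genomes_with[peptide].add(genome)
    return {
        peptide: genomes
        for peptide, genomes in genomes_with.items()
        if len(genomes) >= min_genomes
    }
```

A prediction seen in only one genome is dropped by this filter, which reflects the reasoning that an artefact of gene prediction is unlikely to recur across several genomes.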
