RESULTS: A first set of sORFs was identified from existing annotations that met the criterion of a maximum of 80 residues. A second set was predicted using parameters that specifically searched for ORF candidates of 80 codons or fewer in the exonic, intronic and intergenic sequences of the subject genomes. A total of 1986 conserved sORFs were predicted and characterized.
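To make the length criterion concrete, the following is a minimal sketch of a six-frame scan for short ORFs. It is not the prediction pipeline used in this study. It assumes that an ORF starts at ATG and ends at the first in-frame standard stop codon, and that the 80-codon limit is counted without the stop codon. The function name `find_sorfs` and the `min_codons` lower bound are hypothetical, introduced only for illustration.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_sorfs(seq, max_codons=80, min_codons=10):
    """Scan all six reading frames for ATG...stop ORFs of limited length.

    Returns a list of (strand, start, end, orf_sequence) tuples.
    Coordinates are 0-based, end-exclusive, and relative to the scanned
    strand (for "-" they refer to the reverse complement).
    The codon count excludes the stop codon (an assumption of this sketch).
    """
    seq = seq.upper()
    hits = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None  # position of the first ATG since the last stop
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOP_CODONS:
                    n_codons = (i - start) // 3
                    if min_codons <= n_codons <= max_codons:
                        hits.append((strand, start, i + 3, s[start:i + 3]))
                    start = None  # resume the search after this stop
    return hits
```

A real reannotation would add the conservation filter described in this abstract and would likely handle ambiguous bases and alternative start codons, which this sketch ignores.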
CONCLUSIONS: Numerous open reading frames that could potentially encode polypeptides of 80 amino acid residues or fewer are evidently overlooked during standard gene prediction and annotation. Our results show that additional targeted reannotation of genomes can complement standard genome annotation in identifying sORFs. Given the lack and limitations of experimental validation, we propose that a simple conservation analysis can provide an acceptable means of ensuring that the predicted sORFs are sufficiently free of gene prediction artefacts.
Methods: The microarray expression dataset GSE22255 was retrieved from the Gene Expression Omnibus (GEO) database. It includes messenger ribonucleic acid (mRNA) expression data for the peripheral blood mononuclear cells of 20 controls and 20 IS patients. The Bioconductor package 'affy' was used to calculate expression values, and a pairwise t-test was applied to screen for DEGs (P < 0.01). GSEA was then used to determine the enrichment of DEGs in specific gene ontology (GO) annotations.
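The screening step can be illustrated with a small sketch. It is not the authors' R/Bioconductor workflow. It assumes a two-sample Student's t-test with pooled variance applied gene by gene, and it uses only the Python standard library, obtaining the two-sided p-value by numerically integrating the t density. All function names and the input layout (`expr` mapping a gene to its control and patient values) are hypothetical.

```python
import math
from statistics import mean, variance


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p(t, df, steps=2000):
    """Two-sided p-value: 1 - 2 * integral of the density over [0, |t|].

    The integral is approximated with Simpson's rule (steps must be even).
    """
    a = abs(t)
    if a == 0:
        return 1.0
    h = a / steps
    total = t_pdf(0, df) + t_pdf(a, df)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_pdf(k * h, df)
    return max(0.0, 1.0 - 2.0 * total * h / 3.0)


def student_t_test(x, y):
    """Pooled-variance two-sample t-test; returns (t, two-sided p)."""
    n1, n2 = len(x), len(y)
    df = n1 + n2 - 2
    pooled = ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / df
    t = (mean(x) - mean(y)) / math.sqrt(pooled * (1 / n1 + 1 / n2))
    return t, two_sided_p(t, df)


def screen_degs(expr, alpha=0.01):
    """Return the genes whose t-test p-value is below alpha.

    expr maps gene name -> (control values, patient values).
    """
    return [gene for gene, (ctrl, case) in expr.items()
            if student_t_test(ctrl, case)[1] < alpha]
```

The sketch does not guard against genes with zero variance in both groups and applies no multiple-testing correction. In practice a library routine such as `scipy.stats.ttest_ind` would replace the hand-written test.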
Results: Among all the DEGs (n = 881), GSEA revealed 21 genes, enriched in multiple pathways, to be plausible gene markers. Ten gene sets were found to be core enriched in specific GO annotations. JunD, NCX3 and fibroblast growth factor receptor 4 (FGFR4) were under-represented and glycoprotein M6-B (GPM6B) was persistently over-represented.
Conclusion: The identified genes are either associated with the pathophysiology of IS or they affect post-IS neuronal regeneration, thereby influencing clinical outcome. These genes should, therefore, be evaluated for their utility as suitable markers for predicting IS in clinical scenarios.