Affiliations 

  • 1 Center for Bioinformatics and Genome Biology, Fundacion Ciencia & Vida, Zañartu 1482, Ñuñoa Santiago 7780132, Chile
  • 2 Department of Global Surveillance, Technical University of Denmark, Kemitorvet building 204, 2800 Kgs. Lyngby, Denmark
  • 3 Section for Evolutionary Genomics, The GLOBE Institute, University of Copenhagen, Hovedstaden, Øster Voldgade 5-7, Copenhagen 1350, Denmark
Database (Oxford), 2021 01 28;2021.
PMID: 33507271 DOI: 10.1093/database/baab002

Abstract

Single-exon coding sequences (CDSs), also known as 'single-exon genes' (SEGs), are defined as nuclear, protein-coding genes that lack introns in their CDSs. They have been studied not only to determine their origin and evolution but also because their expression has been linked to several types of human cancers and neurological/developmental disorders, and many exhibit tissue-specific transcription. We previously developed SinEx DB, which houses DNA and protein sequence information for SEGs from 10 mammalian genomes, including human. SinEx DB includes their functional predictions (KOG, euKaryotic Orthologous Groups) and the relative distribution of these functions within species. Here, we report SinEx DB 2.0, a major update of SinEx DB that includes information on the occurrence, distribution and functional prediction of SEGs from 60 completely sequenced eukaryotic genomes, representing animals, fungi, protists and plants. The information is stored in a relational database built with MySQL Server 5.7, and the complete dataset of SEG sequences and their GO (Gene Ontology) functional assignments are available for download. SinEx DB 2.0 was built with a novel pipeline that helps disambiguate single-exon isoforms from SEGs. SinEx DB 2.0 is the largest available database for SEGs and provides a rich source of information for advancing our understanding of the evolution and function of SEGs and their associations with disorders including cancers and neurological and developmental diseases. Database URL: http://v2.sinex.cl/.
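
The distinction drawn above between SEGs and single-exon isoforms can be illustrated with a minimal sketch. This is not the SinEx DB 2.0 pipeline, whose details are not given in the abstract; it only assumes that annotation is available as hypothetical (gene, transcript, start, end) CDS segment records, and that a gene counts as an SEG when every one of its coding transcripts has a single uninterrupted CDS segment. The function name, record layout and labels are illustrative assumptions.

```python
from collections import defaultdict


def classify_genes(cds_rows):
    """Label genes from CDS segment records (hypothetical input layout).

    cds_rows: iterable of (gene_id, transcript_id, start, end) tuples,
    one tuple per CDS segment. Introns in UTRs are ignored because only
    CDS segments are counted, matching the definition of an SEG as a
    gene without introns in its CDS.
    """
    # Count CDS segments per transcript, grouped by gene.
    segments = defaultdict(lambda: defaultdict(int))
    for gene_id, transcript_id, _start, _end in cds_rows:
        segments[gene_id][transcript_id] += 1

    labels = {}
    for gene_id, transcripts in segments.items():
        counts = list(transcripts.values())
        if all(n == 1 for n in counts):
            # Every coding transcript has an uninterrupted CDS.
            labels[gene_id] = "SEG"
        elif any(n == 1 for n in counts):
            # Assumption: a gene with both intronless and intron-containing
            # CDS isoforms is treated as having a single-exon isoform only.
            labels[gene_id] = "single-exon isoform"
        else:
            labels[gene_id] = "multi-exon"
    return labels


# Toy records, invented purely for illustration.
example_rows = [
    ("g1", "t1", 1, 300),
    ("g2", "t1", 1, 100),
    ("g2", "t1", 200, 300),
    ("g2", "t2", 1, 300),
    ("g3", "t1", 1, 50),
    ("g3", "t1", 80, 120),
]
example_labels = classify_genes(example_rows)
```

With these toy records, `g1` is labelled as an SEG, `g2` as a gene with a single-exon isoform, and `g3` as multi-exon. A real pipeline would also need to handle issues such as annotation errors and pseudogenes, which this sketch does not attempt.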
