MyMedR

Displaying all 9 publications

FusoBase: an online Fusobacterium comparative genomic analysis platform

Ang MY, Heydari H, Jakubovics NS, Mahmud MI, Dutta A, Wee WY, et al.

Database (Oxford), 2014;2014.
PMID: 25149689 DOI: 10.1093/database/bau082

Fusobacterium are anaerobic gram-negative bacteria that have been associated with a wide spectrum of human infections and diseases. As the biology of Fusobacterium is still not well understood, comparative genomic analysis of members of this genus will provide further insights into their taxonomy, phylogeny, pathogenicity and other information that may contribute to better management of infections and diseases. To facilitate the ongoing genomic research on Fusobacterium, a specialized database with easy-to-use analysis tools is necessary. Here we present FusoBase, an online database providing access to genome-wide annotated sequences of Fusobacterium strains as well as bioinformatics tools, to support the expanding scientific community. Using our custom-developed Pairwise Genome Comparison tool, we demonstrate how differences between two user-defined genomes, and the insertion of putative prophages, can be identified. In addition, the Pathogenomics Profiling Tool is capable of clustering predicted genes across Fusobacterium strains and visualizing the results in the form of a heat map with dendrogram.
StaphyloBase: a specialized genomic resource for the staphylococcal research community

Heydari H, Mutha NV, Mahmud MI, Siow CC, Wee WY, Wong GJ, et al.

Database (Oxford), 2014;2014:bau010.
PMID: 24578355 DOI: 10.1093/database/bau010

With the advent of high-throughput sequencing technologies, many staphylococcal genomes have been sequenced. Comparative analysis of these strains will provide better understanding of their biology, phylogeny, virulence and taxonomy, which may contribute to better management of diseases caused by staphylococcal pathogens. We developed StaphyloBase with the goal of having a one-stop genomic resource platform for the scientific community to access, retrieve, download, browse, search, visualize and analyse the staphylococcal genomic data and annotations. We anticipate this resource platform will facilitate the analysis of staphylococcal genomic data, particularly in comparative analyses. StaphyloBase currently has a collection of 754 032 protein-coding sequences (CDSs), 19 258 rRNAs and 15 965 tRNAs from 292 genomes of different staphylococcal species. Information about these features is also included, such as putative functions, subcellular localizations and gene/protein sequences. Our web implementation supports diverse query types and the exploration of CDS- and RNA-type information in detail using an AJAX-based real-time search system. JBrowse has also been incorporated to allow rapid and seamless browsing of staphylococcal genomes. The Pairwise Genome Comparison tool is designed for comparative genomic analysis, for example, to reveal the relationships between two user-defined staphylococcal genomes. A newly designed Pathogenomics Profiling Tool (PathoProT) is also included in this platform to facilitate comparative pathogenomics analysis of staphylococcal strains. In conclusion, StaphyloBase offers access to a range of staphylococcal genomic resources as well as analysis tools for comparative analyses. Database URL: http://staphylococcus.um.edu.my/.
DemaDb: an integrated dematiaceous fungal genomes database

Kuan CS, Yew SM, Chan CL, Toh YF, Lee KW, Cheong WH, et al.

Database (Oxford), 2016;2016.
PMID: 26980516 DOI: 10.1093/database/baw008

Many species of dematiaceous fungi are associated with allergic reactions and potentially fatal diseases in humans, especially in tropical climates. Over the past 10 years, we have isolated more than 400 dematiaceous fungi from various clinical samples. In this study, DemaDb, an integrated database, was designed to support the integration and analysis of dematiaceous fungal genomes. A total of 92 072 putative genes and 6527 pathways that were identified in eight dematiaceous fungi (Bipolaris papendorfii UM 226, Daldinia eschscholtzii UM 1400, D. eschscholtzii UM 1020, Pyrenochaeta unguis-hominis UM 256, Ochroconis mirabilis UM 578, Cladosporium sphaerospermum UM 843, Herpotrichiellaceae sp. UM 238 and Pleosporales sp. UM 1110) were deposited in DemaDb. DemaDb includes functional annotations for all predicted gene models in all genomes, such as Gene Ontology, EuKaryotic Orthologous Groups, Kyoto Encyclopedia of Genes and Genomes (KEGG), Pfam and InterProScan. All predicted protein models were further functionally annotated to Carbohydrate-Active enzymes, peptidases, secondary metabolites and virulence factors. The DemaDb Genome Browser enables users to browse and visualize entire genomes with annotation data including gene prediction, structure, orientation and custom feature tracks. The Pathway Browser, based on the KEGG pathway database, allows users to look into molecular interaction and reaction networks for all KEGG-annotated genes. Downloadable files containing assembly, nucleic acid and protein data allow direct retrieval for further downstream work. DemaDb is a useful resource for the fungal research community, especially those involved in genome-scale analysis, functional genomics, genetics and disease studies of dematiaceous fungi. Database URL: http://fungaldb.um.edu.my.
SinEx DB 2.0 update 2020: database for eukaryotic single-exon coding sequences

Jorquera R, González C, Clausen PTLC, Petersen B, Holmes DS

Database (Oxford), 2021 01 28;2021.
PMID: 33507271 DOI: 10.1093/database/baab002

Single-exon coding sequences (CDSs), also known as 'single-exon genes' (SEGs), are defined as nuclear, protein-coding genes that lack introns in their CDSs. They have been studied not only to determine their origin and evolution but also because their expression has been linked to several types of human cancers and neurological/developmental disorders, and many exhibit tissue-specific transcription. We developed SinEx DB, which houses DNA and protein sequence information of SEGs from 10 mammalian genomes including human. SinEx DB includes their functional predictions (KOG (euKaryotic Orthologous Groups)) and the relative distribution of these functions within species. Here, we report SinEx 2.0, a major update of SinEx DB that includes information on the occurrence, distribution and functional prediction of SEGs from 60 completely sequenced eukaryotic genomes, representing animals, fungi, protists and plants. The information is stored in a relational database built with MySQL Server 5.7, and the complete dataset of SEG sequences and their GO (Gene Ontology) functional assignments are available for downloading. SinEx DB 2.0 was built with a novel pipeline that helps disambiguate single-exon isoforms from SEGs. SinEx DB 2.0 is the largest available database for SEGs and provides a rich source of information for advancing our understanding of the evolution and function of SEGs and their associations with disorders including cancers and neurological and developmental diseases. Database URL: http://v2.sinex.cl/.
PGD: a pangolin genome hub for the research community

Tan TK, Tan KY, Hari R, Mohamed Yusoff A, Wong GJ, Siow CC, et al.

Database (Oxford), 2016;2016.
PMID: 27616775 DOI: 10.1093/database/baw063

Pangolins (order Pholidota) are the only mammals covered by scales. We have recently sequenced and analyzed the genomes of two critically endangered Asian pangolin species, namely the Malayan pangolin (Manis javanica) and the Chinese pangolin (Manis pentadactyla). These complete genome sequences will serve as reference sequences for future research to address issues of species conservation and to advance knowledge in mammalian biology and evolution. To further facilitate the global research effort in pangolin biology, we developed the Pangolin Genome Database (PGD) as a future hub for hosting pangolin genomic and transcriptomic data and annotations, with useful analysis tools for the research community. Currently, the PGD provides the reference pangolin genome and transcriptome data, gene sequences and functional information, expressed transcripts, pseudogenes, genomic variations, organ-specific expression data and other useful annotations. We anticipate that the PGD will be an invaluable platform for researchers who are interested in pangolin and mammalian research. We will continue updating this hub by including more data, annotations and analysis tools, particularly from our research consortium. Database URL: http://pangolin-genome.um.edu.my.
PalmXplore: oil palm gene database

Sanusi NSNM, Rosli R, Halim MAA, Chan KL, Nagappan J, Azizi N, et al.

Database (Oxford), 2018 01 01;2018.
PMID: 30239681 DOI: 10.1093/database/bay095

A set of Elaeis guineensis genes was generated by combining two gene prediction pipelines: Fgenesh++, developed by Softberry, and Seqping, developed by the Malaysian Palm Oil Board. PalmXplore was developed to provide a scalable data repository and a user-friendly search engine system to efficiently store, manage and retrieve the oil palm gene sequences and annotations. Information deposited in PalmXplore includes predicted genes, their genomic coordinates, as well as the annotations derived from external databases, such as Pfam, Gene Ontology and Kyoto Encyclopedia of Genes and Genomes. Information about genes related to important traits, such as those involved in fatty acid biosynthesis (FAB) and disease resistance, is also provided. The system offers Basic Local Alignment Search Tool homology search, where the results can be downloaded or visualized in the oil palm genome browser (MYPalmViewer). PalmXplore is regularly updated, offering new features, improvements to genome annotation and new genomic sequences. The system is freely accessible at http://palmxplore.mpob.gov.my.
Improved ontology for eukaryotic single-exon coding sequences in biological databases

Jorquera R, González C, Clausen P, Petersen B, Holmes DS

Database (Oxford), 2018 01 01;2018:1-6.
PMID: 30239665 DOI: 10.1093/database/bay089

Efficient extraction of knowledge from biological data requires the development of structured vocabularies to unambiguously define biological terms. This paper proposes descriptions and definitions to disambiguate the term 'single-exon gene'. Eukaryotic Single-Exon Genes (SEGs) have been defined as genes that do not have introns in their protein coding sequences. They have been studied not only to determine their origin and evolution but also because their expression has been linked to several types of human cancer and neurological/developmental disorders, and many exhibit tissue-specific transcription. Unfortunately, the term 'SEGs' is rife with ambiguity, leading to biological misinterpretations. In the classic definition, no distinction is made between SEGs that harbor introns in their untranslated regions (UTRs) and those that do not. This distinction is important because the presence of introns in UTRs affects transcriptional regulation and post-transcriptional processing of the mRNA. In addition, recent whole-transcriptome shotgun sequencing has led to the discovery of many examples of single-exon mRNAs that arise from alternative splicing of multi-exon genes; these single-exon isoforms are being confused with SEGs despite their clearly different origin. The increasing expansion of RNA-seq datasets makes it imperative to distinguish the different SEG types before annotation errors become indelibly propagated in biological databases. This paper develops a structured vocabulary for their disambiguation, allowing a major reassessment of their evolutionary trajectories, regulation, RNA processing and transport, and provides the opportunity to improve the detection of gene associations with disorders including cancers, neurological and developmental diseases.
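
The distinctions drawn in this abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the `Transcript` model, the `classify_gene` function and the four label strings are hypothetical and are not the vocabulary or pipeline defined in the paper. The sketch only shows how introns in the CDS, introns in the UTRs and the presence of multi-exon isoforms could separate the cases the abstract describes.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Transcript:
    """Hypothetical minimal transcript model (not from the paper)."""
    cds_exons: int    # number of exons overlapping the coding sequence
    utr_introns: int  # number of introns located in the 5' or 3' UTR


def classify_gene(transcripts: List[Transcript]) -> str:
    """Assign one of four illustrative labels to a protein-coding gene."""
    if not transcripts:
        raise ValueError("gene has no transcripts")

    # Transcripts whose CDS is uninterrupted by introns.
    single = [t for t in transcripts if t.cds_exons == 1]

    if not single:
        return "multi-exon gene"
    if len(single) < len(transcripts):
        # Other isoforms have introns in the CDS, so the single-exon
        # transcripts come from alternative splicing, not from a SEG.
        return "single-exon isoform of multi-exon gene"
    # Simplifying assumption: one UTR intron in any transcript is enough
    # to place the gene in the "with UTR introns" class.
    if any(t.utr_introns > 0 for t in single):
        return "SEG with UTR introns"
    return "SEG without introns"


if __name__ == "__main__":
    print(classify_gene([Transcript(cds_exons=1, utr_introns=0)]))
    print(classify_gene([Transcript(1, 0), Transcript(4, 0)]))
```

A real annotation pipeline would derive these counts from genome annotation files and would need rules for partial or conflicting transcript models, which this sketch ignores.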
PCOSBase: a manually curated database of polycystic ovarian syndrome

Afiqah-Aleng N, Harun S, A-Rahman MRA, Nor Muhammad NA, Mohamed-Hussein ZA

Database (Oxford), 2017 Jan 01;2017.
PMID: 31725861 DOI: 10.1093/database/bax098

Polycystic ovarian syndrome (PCOS) is one of the main causes of infertility and affects 5-20% of women of reproductive age. Despite the increased prevalence of PCOS, the mechanisms involved in its pathogenesis and pathophysiology remain unclear. The expansion of omics studies on the mechanisms of PCOS has led to a vast number of PCOS-related proteins, making it a challenge to collate and deposit this deluge of data in one place. A knowledge-based repository named PCOSBase was developed to systematically store all proteins related to PCOS. These proteins were compiled from various online databases and published expression studies. Rigorous criteria were developed to identify those that were highly related to PCOS. They were manually curated and analysed to provide additional information on gene ontologies, pathways, domains, tissue localizations and diseases associated with PCOS. Other proteins that might interact with the PCOS-related proteins identified in this study were also included. Currently, 8185 PCOS-related proteins have been identified and assigned to 13 237 gene ontology terms, 1004 pathways, 7936 domains, 29 disease classes, 1928 diseases, 91 tissues and 320 472 interactions. All publications related to PCOS are also indexed in PCOSBase. Data entries are searchable from the main page and from the search, browse and datasets tabs. A protein advanced search is provided to search for specific proteins. To date, PCOSBase has the largest collection of PCOS-related proteins. PCOSBase aims to become a self-contained database that can be used to further understand PCOS pathogenesis and to identify potential PCOS biomarkers. Database URL: http://pcosbase.org.
SuCComBase: a manually curated repository of plant sulfur-containing compounds

Harun S, Abdullah-Zawawi MR, A-Rahman MRA, Muhammad NAN, Mohamed-Hussein ZA

Database (Oxford), 2019 01 01;2019.
PMID: 30793170 DOI: 10.1093/database/baz021

Plants produce a wide range of secondary metabolites that play important roles in plant defense and immunity, their interaction with the environment and symbiotic associations. Sulfur-containing compounds (SCCs) are a group of important secondary metabolites produced in members of the Brassicales order. SCCs constitute various groups of phytochemicals, but not much is known about them. Findings from previous studies on SCCs are scattered across the published literature; hence, SuCComBase was developed to store all molecular information related to the biosynthesis of SCCs. Information on genes, proteins and compounds involved in the SCC biosynthetic pathway was manually identified from databases and the published scientific literature. Sets of co-expression data were analyzed to search for other possible (previously unknown) genes that might be involved in the biosynthesis of SCCs. These genes were named potential SCC-related encoding genes. A total of 147 known and 92 putative Arabidopsis thaliana SCC-related genes from the literature were used to identify other potential SCC-related encoding genes. We identified 778 potential SCC-related encoding genes, 4026 homologs to the SCC-related encoding genes and 116 SCCs, as shown on the SuCComBase homepage. Data entries are searchable from the Main page, Search, Browse and Datasets tabs. Users can easily download all data stored in SuCComBase. All publications related to SCCs are also indexed in SuCComBase, which is currently the first and only database dedicated to plant SCCs. SuCComBase aims to become a manually curated and up-to-date knowledge-based repository for plant SCCs.