MyMedR

Displaying publications 1 - 20 of 95 in total

Chromosome-level assembly and gene annotation of Kappaphycus striatus genome

Zhou Z, Ma Y, Zhang J, Firdaus M, Roleda MY, Duan D

Sci Data, 2025 Feb 12;12(1):249.
PMID: 39939323 DOI: 10.1038/s41597-025-04583-y

Kappaphycus striatus is a carrageenan-producing red alga found primarily in tropical and subtropical coastal regions. Its global distribution is mainly in the Philippines, Indonesia, and Malaysia, among other locations. Here, using PacBio HiFi and Hi-C sequencing data, we assembled a high-quality chromosome-level genome with a total size of 211.46 Mb, a contig N50 length of 5.04 Mb and a scaffold N50 length of 5.39 Mb. After Hi-C assembly and manual adjustment of the heatmap, we deduced that 199.42 Mb of genomic sequence was anchored to 33 presumed chromosomes, accounting for 94.31% of the entire genome. A total of 14,596 protein-coding genes and 1,673 non-coding RNAs were identified, and 100.96 Mb of repetitive sequences accounted for 47.73% of the assembled genome. Our chromosome-level genome assembly provides a valuable reference for future K. striatus nursery and breeding work, and will be useful for functional genomics interpretation and evolutionary studies of eukaryotes.
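
The N50 and anchoring figures quoted in this abstract follow standard assembly-statistics definitions. As a minimal sketch (the function names are illustrative and not taken from the paper), N50 is the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the assembly:

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths."""
    if not lengths:
        raise ValueError("at least one sequence length is required")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


def percent_of_assembly(part_mb, total_mb):
    """Express part of an assembly as a percentage of its total size."""
    return round(100 * part_mb / total_mb, 2)


# Toy lengths (hypothetical, not data from the paper).
print(n50([2, 3, 4, 5, 6]))
# Anchored fraction, using the two sizes reported in the abstract.
print(percent_of_assembly(199.42, 211.46))
```

With the sizes reported above, the anchored fraction works out to the 94.31% stated in the abstract.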

Matched MeSH terms: Molecular Sequence Annotation*

NASSAM: a server to search for and annotate tertiary interactions and motifs in three-dimensional structures of complex RNA molecules

Hamdani HY, Appasamy SD, Willett P, Artymiuk PJ, Firdaus-Raih M

Nucleic Acids Res, 2012 Jul;40(Web Server issue):W35-41.
PMID: 22661578 DOI: 10.1093/nar/gks513

Similarities in the 3D patterns of RNA base interactions or arrangements can provide insights into their functions and roles in stabilization of the RNA 3D structure. Nucleic Acids Search for Substructures and Motifs (NASSAM) is a graph theoretical program that can search for 3D patterns of base arrangements by representing the bases as pseudo-atoms. The geometric relationship of the pseudo-atoms to each other as a pattern can be represented as a labeled graph where the pseudo-atoms are the graph's nodes while the edges are the inter-pseudo-atomic distances. The input files for NASSAM are PDB formatted 3D coordinates. This web server can be used to identify matches of base arrangement patterns in a query structure to annotated patterns that have been reported in the literature or that have possible functional and structural stabilization implications. The NASSAM program is freely accessible without any login requirement at http://mfrlab.org/grafss/nassam/.
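
The labeled-graph representation described in this abstract can be illustrated with a small sketch. This is not the NASSAM implementation: the brute-force matcher, the function names, the coordinates and the distance tolerance are all assumptions made for illustration. Nodes are labeled pseudo-atoms and edges are the distances between them:

```python
from itertools import combinations, permutations
from math import dist


def distance_graph(pseudo_atoms):
    """Build the edge set of a labeled graph.

    pseudo_atoms is a list of (label, (x, y, z)) tuples; the result maps
    each node pair (i, j) with i < j to the inter-pseudo-atom distance.
    """
    return {
        (i, j): dist(pseudo_atoms[i][1], pseudo_atoms[j][1])
        for i, j in combinations(range(len(pseudo_atoms)), 2)
    }


def find_matches(pattern, structure, tolerance=1.0):
    """Brute-force search for node assignments whose labels agree and whose
    pairwise distances agree with the pattern within the given tolerance."""
    pattern_edges = distance_graph(pattern)
    matches = []
    for candidate in permutations(range(len(structure)), len(pattern)):
        if any(pattern[k][0] != structure[c][0] for k, c in enumerate(candidate)):
            continue
        if all(
            abs(d - dist(structure[candidate[i]][1], structure[candidate[j]][1]))
            <= tolerance
            for (i, j), d in pattern_edges.items()
        ):
            matches.append(candidate)
    return matches


# Hypothetical three-base pattern and a four-base structure containing it.
pattern = [("A", (0, 0, 0)), ("G", (3, 0, 0)), ("U", (0, 4, 0))]
structure = [
    ("C", (9, 9, 9)),
    ("A", (10, 10, 10)),
    ("G", (13, 10, 10)),
    ("U", (10, 14, 10)),
]
print(find_matches(pattern, structure, tolerance=0.1))
```

Exhaustive enumeration is only practical for very small patterns; a production search would use a proper subgraph-matching algorithm.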

Matched MeSH terms: Molecular Sequence Annotation*

Graph Theoretical Methods and Workflows for Searching and Annotation of RNA Tertiary Base Motifs and Substructures

Emrizal R, Hamdani HY, Firdaus-Raih M

Int J Mol Sci, 2021 Aug 09;22(16).
PMID: 34445259 DOI: 10.3390/ijms22168553

The increasing number and complexity of structures containing RNA chains in the Protein Data Bank (PDB) have led to the need for automated structure annotation methods to replace or complement expert visual curation. This is especially true when searching for tertiary base motifs and substructures. Such base arrangements and motifs have diverse roles that range from contributions to structural stability to more direct involvement in the molecule's functions, such as the sites for ligand binding and catalytic activity. We review the utility of computational approaches in annotating RNA tertiary base motifs in a dataset of PDB structures, particularly the use of graph theoretical algorithms that can search for such base motifs and annotate them or find and annotate clusters of hydrogen-bond-connected bases. We also demonstrate how such graph theoretical algorithms can be integrated into a workflow that allows for functional analysis and comparisons of base arrangements and sub-structures, such as those involved in ligand binding. The capacity to carry out such automatic curations has led to the discovery of novel motifs and can give new context to known motifs as well as enable the rapid compilation of RNA 3D motifs into a database.

Matched MeSH terms: Molecular Sequence Annotation*

Evidence-based gene models for structural and functional annotations of the oil palm genome

Chan KL, Tatarinova TV, Rosli R, Amiruddin N, Azizi N, Halim MAA, et al.

Biol. Direct, 2017 Sep 08;12(1):21.
PMID: 28886750 DOI: 10.1186/s13062-017-0191-4

BACKGROUND: Oil palm is an important source of edible oil. The importance of the crop, as well as its long breeding cycle (10-12 years), led to the sequencing of its genome in 2013 to pave the way for genomics-guided breeding. Nevertheless, the first set of gene predictions, although useful, had many fragmented genes. Classification and characterization of genes associated with traits of interest, such as those for fatty acid biosynthesis and disease resistance, were also limited. Lipid-related genes, especially fatty acid (FA)-related genes, are of particular interest for the oil palm as they specify oil yields and quality. This paper presents the characterization of the oil palm genome using different gene prediction methods and comparative genomics analysis, identification of FA biosynthesis and disease resistance genes, and the development of an annotation database and bioinformatics tools.
RESULTS: Using two independent gene-prediction pipelines, Fgenesh++ and Seqping, 26,059 oil palm genes with transcriptome and RefSeq support were identified from the oil palm genome. These coding regions of the genome have a characteristic broad distribution of GC3 (fraction of cytosine and guanine in the third position of a codon) with over half the GC3-rich genes (GC3 ≥ 0.75286) being intronless. In comparison, only one-seventh of the oil palm genes identified are intronless. Using comparative genomics analysis, characterization of conserved domains and active sites, and expression analysis, 42 key genes involved in FA biosynthesis in oil palm were identified. For three of them, namely EgFABF, EgFABH and EgFAD3, segmental duplication events were detected. Our analysis also identified 210 candidate resistance genes in six classes, grouped by their protein domain structures.
CONCLUSIONS: We present an accurate and comprehensive annotation of the oil palm genome, focusing on analysis of important categories of genes (GC3-rich and intronless), as well as those associated with important functions, such as FA biosynthesis and disease resistance. The study demonstrated the advantages of having an integrated approach to gene prediction and developed a computational framework for combining multiple genome annotations. These results, available in the oil palm annotation database (http://palmxplore.mpob.gov.my), will provide important resources for studies on the genomes of oil palm and related crops.
REVIEWERS: This article was reviewed by Alexander Kel, Igor Rogozin, and Vladimir A. Kuznetsov.
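
GC3, as defined in the RESULTS above, is the fraction of cytosine and guanine in the third position of each codon. A minimal sketch of that calculation follows; the function names and the example sequence are illustrative assumptions, and only the 0.75286 threshold comes from the abstract:

```python
GC3_RICH_THRESHOLD = 0.75286  # threshold quoted in the abstract


def gc3(cds):
    """Fraction of G or C at the third position of each codon in a CDS."""
    cds = cds.upper()
    if len(cds) == 0 or len(cds) % 3 != 0:
        raise ValueError("CDS length must be a non-zero multiple of three")
    third_positions = cds[2::3]
    return sum(base in "GC" for base in third_positions) / len(third_positions)


def is_gc3_rich(cds):
    """Apply the abstract's GC3 >= 0.75286 cut-off to one coding sequence."""
    return gc3(cds) >= GC3_RICH_THRESHOLD


# Hypothetical coding sequence: codons ATG, GCC, AAA, TAA.
print(gc3("ATGGCCAAATAA"))
```

In this toy sequence the third positions are G, C, A and A, so half of them are G or C.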

Matched MeSH terms: Molecular Sequence Annotation*

Chromosome-scale Elaeis guineensis and E. oleifera assemblies: comparative genomics of oil palm and other Arecaceae

Low EL, Chan KL, Zaki NM, Taranenko E, Ordway JM, Wischmeyer C, et al.

G3 (Bethesda), 2024 Sep 04;14(9).
PMID: 38918881 DOI: 10.1093/g3journal/jkae135

Elaeis guineensis and E. oleifera are the two species of oil palm. E. guineensis is the most widely cultivated commercial species, and introgression of desirable traits from E. oleifera is ongoing. We report an improved E. guineensis genome assembly with substantially increased continuity and completeness, as well as the first chromosome-scale E. oleifera genome assembly. Each assembly was obtained by integration of long-read sequencing, proximity ligation sequencing, optical mapping, and genetic mapping. High interspecific genome conservation is observed between the two species. The study provides the most extensive gene annotation to date, including 46,697 E. guineensis and 38,658 E. oleifera gene predictions. Analyses of repetitive element families further resolve the DNA repeat architecture of both genomes. Comparative genomic analyses identified experimentally validated small structural variants between the oil palm species and resolved the mechanism of chromosomal fusions responsible for the evolutionary descending dysploidy from 18 to 16 chromosomes.

Matched MeSH terms: Molecular Sequence Annotation

COGNAC: a web server for searching and annotating hydrogen-bonded base interactions in RNA three-dimensional structures

Firdaus-Raih M, Hamdani HY, Nadzirin N, Ramlan EI, Willett P, Artymiuk PJ

Nucleic Acids Res, 2014 Jul;42(Web Server issue):W382-8.
PMID: 24831543 DOI: 10.1093/nar/gku438

Hydrogen bonds are crucial factors that stabilize a complex ribonucleic acid (RNA) molecule's three-dimensional (3D) structure. Minute conformational changes can result in variations in the hydrogen bond interactions in a particular structure. Furthermore, networks of hydrogen bonds, especially those found in tight clusters, may be important elements in structure stabilization or function and can therefore be regarded as potential tertiary motifs. In this paper, we describe a graph theoretical algorithm implemented as a web server that is able to search for unbroken networks of hydrogen-bonded base interactions and thus provide an accounting of such interactions in RNA 3D structures. This server, COGNAC (COnnection tables Graphs for Nucleic ACids), is also able to compare the hydrogen bond networks between two structures and from such annotations enable the mapping of atomic level differences that may have resulted from conformational changes due to mutations or binding events. The COGNAC server can be accessed at http://mfrlab.org/grafss/cognac.
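
The idea of an unbroken network of hydrogen-bonded bases maps naturally onto connected components of a graph. The sketch below is not the COGNAC algorithm; the residue identifiers, function names and breadth-first search are assumptions chosen to illustrate grouping bases into networks and comparing the bonds of two structures:

```python
from collections import defaultdict, deque


def hbond_networks(bonds):
    """Group bases into connected networks, given (base, base) bond pairs."""
    adjacency = defaultdict(set)
    for a, b in bonds:
        adjacency[a].add(b)
        adjacency[b].add(a)

    seen = set()
    networks = []
    for start in sorted(adjacency):
        if start in seen:
            continue
        # Breadth-first search collects every base reachable from `start`.
        component = set()
        queue = deque([start])
        seen.add(start)
        while queue:
            node = queue.popleft()
            component.add(node)
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        networks.append(sorted(component))
    return networks


def bond_differences(bonds_a, bonds_b):
    """Return the bonds found only in structure A and only in structure B."""
    set_a = {frozenset(pair) for pair in bonds_a}
    set_b = {frozenset(pair) for pair in bonds_b}
    return set_a - set_b, set_b - set_a


# Hypothetical base identifiers (residue type plus residue number).
bonds = [("G1", "C10"), ("C10", "A25"), ("U40", "A41")]
print(hbond_networks(bonds))
```

Treating each bond as an unordered pair means ("G1", "C10") and ("C10", "G1") compare as the same interaction.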

Matched MeSH terms: Molecular Sequence Annotation

InterRNA: a database of base interactions in RNA structures

Appasamy SD, Hamdani HY, Ramlan EI, Firdaus-Raih M

Nucleic Acids Res, 2016 Jan 4;44(D1):D266-71.
PMID: 26553798 DOI: 10.1093/nar/gkv1186

A major component of RNA structure stabilization is the set of hydrogen-bonded interactions between the base residues. The importance and biological relevance of large clusters of base interactions can be much more easily investigated when their occurrences have been systematically detected, catalogued and compared. In this paper, we describe the database InterRNA (INTERactions in RNA structures database; http://mfrlab.org/interrna/), which contains records of known RNA 3D motifs as well as records for clusters of bases that are interconnected by hydrogen bonds. The contents of the database were compiled from RNA structural annotations carried out by the NASSAM (http://mfrlab.org/grafss/nassam) and COGNAC (http://mfrlab.org/grafss/cognac) computer programs. An analysis of the database content and comparisons with the existing corpus of knowledge regarding RNA 3D motifs clearly show that InterRNA is able to extend the annotations for known motifs and to provide novel interactions for further investigation.

Matched MeSH terms: Molecular Sequence Annotation

DemaDb: an integrated dematiaceous fungal genomes database

Kuan CS, Yew SM, Chan CL, Toh YF, Lee KW, Cheong WH, et al.

Database (Oxford), 2016;2016.
PMID: 26980516 DOI: 10.1093/database/baw008

Many species of dematiaceous fungi are associated with allergic reactions and potentially fatal diseases in humans, especially in tropical climates. Over the past 10 years, we have isolated more than 400 dematiaceous fungi from various clinical samples. In this study, DemaDb, an integrated database, was designed to support the integration and analysis of dematiaceous fungal genomes. A total of 92 072 putative genes and 6527 pathways identified in eight dematiaceous fungi (Bipolaris papendorfii UM 226, Daldinia eschscholtzii UM 1400, D. eschscholtzii UM 1020, Pyrenochaeta unguis-hominis UM 256, Ochroconis mirabilis UM 578, Cladosporium sphaerospermum UM 843, Herpotrichiellaceae sp. UM 238 and Pleosporales sp. UM 1110) were deposited in DemaDb. DemaDb includes functional annotations for all predicted gene models in all genomes, such as Gene Ontology, EuKaryotic Orthologous Groups, Kyoto Encyclopedia of Genes and Genomes (KEGG), Pfam and InterProScan. All predicted protein models were further functionally annotated to Carbohydrate-Active enzymes, peptidases, secondary metabolites and virulence factors. The DemaDb Genome Browser enables users to browse and visualize entire genomes with annotation data including gene prediction, structure, orientation and custom feature tracks. The Pathway Browser, based on the KEGG pathway database, allows users to look into molecular interaction and reaction networks for all KEGG-annotated genes. The availability of downloadable files containing assembly, nucleic acid and protein data allows direct retrieval for downstream work. DemaDb is a useful resource for the fungal research community, especially those involved in genome-scale analysis, functional genomics, genetics and disease studies of dematiaceous fungi. Database URL: http://fungaldb.um.edu.my.

Matched MeSH terms: Molecular Sequence Annotation

PlantFuncSSR: Integrating First and Next Generation Transcriptomics for Mining of SSR-Functional Domains Markers

Sablok G, Pérez-Pulido AJ, Do T, Seong TY, Casimiro-Soriguer CS, La Porta N, et al.

Front Plant Sci, 2016;7:878.
PMID: 27446111 DOI: 10.3389/fpls.2016.00878

Analysis of repetitive DNA sequence content and divergence among the repetitive functional classes is a well-accepted approach for estimation of inter- and intra-generic differences in plant genomes. Among these elements, microsatellites, or Simple Sequence Repeats (SSRs), have been widely demonstrated to be powerful genetic markers for discriminating species and varieties. We present the PlantFuncSSRs platform, which covers more than 364 plant species and more than 2 million functional SSRs. The SSRs are provided with detailed annotations for easy functional browsing and with information on primer pairs and associated functional domains. PlantFuncSSRs can be leveraged to identify function-based genic variability among the species of interest, which might be of particular interest in developing functional markers in plants. This comprehensive online portal unifies mining of SSRs from first and next generation sequencing datasets, corresponding primer pairs and associated in-depth functional annotation such as gene ontology annotation, gene interactions and their identification from reference protein databases. PlantFuncSSRs is freely accessible at: http://www.bioinfocabd.upo.es/plantssr.

Matched MeSH terms: Molecular Sequence Annotation

PalmXplore: oil palm gene database

Sanusi NSNM, Rosli R, Halim MAA, Chan KL, Nagappan J, Azizi N, et al.

Database (Oxford), 2018 01 01;2018.
PMID: 30239681 DOI: 10.1093/database/bay095

A set of Elaeis guineensis genes had been generated by combining two gene prediction pipelines: Fgenesh++ developed by Softberry and Seqping by the Malaysian Palm Oil Board. PalmXplore was developed to provide a scalable data repository and a user-friendly search engine system to efficiently store, manage and retrieve the oil palm gene sequences and annotations. Information deposited in PalmXplore includes predicted genes, their genomic coordinates, as well as the annotations derived from external databases, such as Pfam, Gene Ontology and Kyoto Encyclopedia of Genes and Genomes. Information about genes related to important traits, such as those involved in fatty acid biosynthesis (FAB) and disease resistance, is also provided. The system offers Basic Local Alignment Search Tool homology search, where the results can be downloaded or visualized in the oil palm genome browser (MYPalmViewer). PalmXplore is regularly updated offering new features, improvements to genome annotation and new genomic sequences. The system is freely accessible at http://palmxplore.mpob.gov.my.

Matched MeSH terms: Molecular Sequence Annotation

High-quality Schistosoma haematobium genome achieved by single-molecule and long-range sequencing

Stroehlein AJ, Korhonen PK, Chong TM, Lim YL, Chan KG, Webster B, et al.

Gigascience, 2019 Sep 01;8(9).
PMID: 31494670 DOI: 10.1093/gigascience/giz108

BACKGROUND: Schistosoma haematobium causes urogenital schistosomiasis, a neglected tropical disease affecting >100 million people worldwide. Chronic infection with this parasitic trematode can lead to urogenital conditions including female genital schistosomiasis and bladder cancer. At the molecular level, little is known about this blood fluke and the pathogenesis of the disease that it causes. To support molecular studies of this carcinogenic worm, we reported a draft genome for S. haematobium in 2012. Although a useful resource, its utility has been somewhat limited by its fragmentation.
FINDINGS: Here, we systematically enhanced the draft genome of S. haematobium using a single-molecule and long-range DNA-sequencing approach. We achieved a major improvement in the accuracy and contiguity of the genome assembly, making it superior or comparable to assemblies for other schistosome species. We transferred curated gene models to this assembly and, using enhanced gene annotation pipelines, inferred a gene set with as many or more complete gene models as those of other well-studied schistosomes. Using conserved, single-copy orthologs, we assessed the phylogenetic position of S. haematobium in relation to other parasitic flatworms for which draft genomes were available.
CONCLUSIONS: We report a substantially enhanced genomic resource that represents a solid foundation for molecular research on S. haematobium and is poised to better underpin population and functional genomic investigations and to accelerate the search for new disease interventions.

Matched MeSH terms: Molecular Sequence Annotation

MabsBase: a Mycobacterium abscessus genome and annotation database

Heydari H, Wee WY, Lokanathan N, Hari R, Mohamed Yusoff A, Beh CY, et al.

PLoS One, 2013;8(4):e62443.
PMID: 23658631 DOI: 10.1371/journal.pone.0062443

Mycobacterium abscessus is a rapidly growing non-tuberculous mycobacterial species that has been associated with a wide spectrum of human infections. As the classification and biology of this organism are still not well understood, comparative genomic analysis of members of this species may provide further insights into their taxonomy, phylogeny, pathogenicity and other information that may contribute to better management of infections. The MabsBase described in this paper is a user-friendly database providing access to whole-genome sequences of newly discovered M. abscessus strains as well as resources for whole-genome annotations and computational predictions, to support the expanding scientific community interested in M. abscessus research. The MabsBase is freely available at http://mabscessus.um.edu.my.

Matched MeSH terms: Molecular Sequence Annotation*

De novo assembly, characterization and functional annotation of pineapple fruit transcriptome through massively parallel sequencing

Ong WD, Voo LY, Kumar VS

PLoS One, 2012;7(10):e46937.
PMID: 23091603 DOI: 10.1371/journal.pone.0046937

BACKGROUND: Pineapple (Ananas comosus var. comosus) is an important tropical non-climacteric fruit with high commercial potential. Understanding the mechanisms and processes underlying fruit ripening would enable scientists to improve quality traits such as flavor, texture, appearance and fruit sweetness. Although the pineapple is an important fruit, there is insufficient transcriptomic or genomic information available in public databases. Application of high-throughput transcriptome sequencing to profile the pineapple fruit transcripts is therefore needed.
METHODOLOGY/PRINCIPAL FINDINGS: To facilitate this, we have performed transcriptome sequencing of ripe yellow pineapple fruit flesh using Illumina technology. About 4.7 million Illumina paired-end reads were generated and assembled using the Velvet de novo assembler. The assembly produced 28,728 unique transcripts with a mean length of approximately 200 bp. A sequence similarity search against the non-redundant NCBI database identified a total of 16,932 unique transcripts (58.93%) with significant hits. Of these, 15,507 unique transcripts were assigned to gene ontology terms. Functional annotation against the Kyoto Encyclopedia of Genes and Genomes pathway database identified 13,598 unique transcripts (47.33%), which were mapped to 126 pathways. The assembly revealed many transcripts that were previously unknown.
CONCLUSIONS: The unique transcripts derived from this work have rapidly increased the number of pineapple fruit mRNA transcripts now available in public databases. This information can be further utilized in gene expression, genomics and other functional genomics studies in pineapple.

Matched MeSH terms: Molecular Sequence Annotation*

Computational discovery and annotation of conserved small open reading frames in fungal genomes

Mat-Sharani S, Firdaus-Raih M

BMC Bioinformatics, 2019 Feb 04;19(Suppl 13):551.
PMID: 30717662 DOI: 10.1186/s12859-018-2550-2

BACKGROUND: Small open reading frames (smORF/sORFs) that encode short protein sequences are often overlooked during the standard gene prediction process thus leading to many sORFs being left undiscovered and/or misannotated. For many genomes, a second round of sORF targeted gene prediction can complement the existing annotation. In this study, we specifically targeted the identification of ORFs encoding for 80 amino acid residues or less from 31 fungal genomes. We then compared the predicted sORFs and analysed those that are highly conserved among the genomes.
RESULTS: A first set of sORFs was identified from existing annotations that fitted the maximum of 80 residues criterion. A second set was predicted using parameters that specifically searched for ORF candidates of 80 codons or less in the exonic, intronic and intergenic sequences of the subject genomes. A total of 1986 conserved sORFs were predicted and characterized.
CONCLUSIONS: It is evident that numerous open reading frames that could potentially encode for polypeptides consisting of 80 amino acid residues or less are overlooked during standard gene prediction and annotation. From our results, additional targeted reannotation of genomes is clearly able to complement standard genome annotation to identify sORFs. Due to the lack of, and limitations with experimental validation, we propose that a simple conservation analysis can provide an acceptable means of ensuring that the predicted sORFs are sufficiently clear of gene prediction artefacts.
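
A targeted search for short ORFs of the kind described above can be sketched as a simple frame scan. This is not the authors' prediction pipeline: the function name, the forward-strand-only scan, the ATG-only start rule and the example sequence are simplifying assumptions. Only the 80-codon ceiling comes from the abstract:

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_sorfs(seq, max_codons=80):
    """Scan the three forward reading frames for ATG...stop ORFs.

    Returns (start, end, n_codons) tuples, where end is exclusive and
    includes the stop codon, and n_codons counts the coding codons
    (the start codon included, the stop codon excluded).
    """
    seq = seq.upper()
    found = []
    for frame in range(3):
        start = None
        for pos in range(frame, len(seq) - 2, 3):
            codon = seq[pos:pos + 3]
            if start is None and codon == "ATG":
                start = pos
            elif start is not None and codon in STOP_CODONS:
                n_codons = (pos - start) // 3
                if n_codons <= max_codons:
                    found.append((start, pos + 3, n_codons))
                start = None
    return found


# Hypothetical sequence with one short ORF: ATG AAA TTT TAG.
print(find_sorfs("CCATGAAATTTTAGGG"))
```

A fuller scan would also cover the reverse complement and, as the abstract proposes, filter the candidates by conservation across genomes.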

Matched MeSH terms: Molecular Sequence Annotation/methods*

The complete mitogenome of the soldier crab Mictyris longicarpus (Latreille, 1806) (Crustacea: Decapoda: Mictyridae)

Tan MH, Gan HM, Lee YP, Austin CM

Mitochondrial DNA A DNA Mapp Seq Anal, 2016 05;27(3):2121-2.
PMID: 25423510 DOI: 10.3109/19401736.2014.982585

The complete mitochondrial genome sequence of Mictyris longicarpus (soldier crab) is reported, making it the first for the family Mictyridae and the second for the superfamily Ocypodoidea. The mitogenome is 15,548 base pairs long and is made up of 13 protein-coding genes, 2 ribosomal subunit genes, 22 transfer RNAs and a non-coding AT-rich region. The soldier crab mitogenome gene order is characteristic of brachyuran crabs, with a base composition of 36.58% for T, 19.15% for C, 32.43% for A and 11.83% for G, and an AT bias of 69.01%.
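
Base composition and AT content of the kind reported here are simple proportions of the sequence. A minimal sketch, with illustrative function names and a toy sequence rather than the actual mitogenome:

```python
from collections import Counter


def base_composition(seq):
    """Percentage of A, C, G and T in a nucleotide sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    counts = Counter(seq)
    return {base: 100 * counts[base] / len(seq) for base in "ACGT"}


def at_content(seq):
    """Combined percentage of A and T."""
    composition = base_composition(seq)
    return composition["A"] + composition["T"]


# Toy sequence (hypothetical): 3 A, 3 T, 2 C, 2 G.
print(base_composition("AAATTTCCGG"))
print(at_content("AAATTTCCGG"))
```

The abstract's figures are internally consistent on this definition: 36.58% T plus 32.43% A gives the stated AT bias of 69.01%.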

Matched MeSH terms: Molecular Sequence Annotation

The complete mitogenome of the river blackfish, Gadopsis marmoratus (Richardson, 1848) (Teleostei: Percichthyidae)

Gan HM, Tan MH, Lee YP, Austin CM

Mitochondrial DNA A DNA Mapp Seq Anal, 2016 05;27(3):2030-1.
PMID: 25329292 DOI: 10.3109/19401736.2014.974174

The mitogenome of the Australian freshwater blackfish, Gadopsis marmoratus, was recovered by genome skimming using the MiSeq sequencer (GenBank Accession Number: NC_024436). The blackfish mitogenome has 16,407 base pairs made up of 13 protein-coding genes, 2 ribosomal subunit genes, 22 transfer RNAs, and an 819 bp non-coding AT-rich region. This is the 5th mitogenome sequence to be reported for the family Percichthyidae.

Matched MeSH terms: Molecular Sequence Annotation

The complete mitogenome of the Australian tadpole shrimp Triops australiensis (Spencer & Hall, 1895) (Crustacea: Branchiopoda: Notostraca)

Gan HM, Tan MH, Lee YP, Austin CM

Mitochondrial DNA A DNA Mapp Seq Anal, 2016 05;27(3):2028-9.
PMID: 25329290 DOI: 10.3109/19401736.2014.974173

The mitochondrial genome sequence of the Australian tadpole shrimp, Triops australiensis is presented (GenBank Accession Number: NC_024439) and compared with other Triops species. Triops australiensis has a mitochondrial genome of 15,125 base pairs consisting of 13 protein-coding genes, 2 ribosomal subunit genes, 22 transfer RNAs, and a non-coding AT-rich region. The T. australiensis mitogenome is composed of 36.4% A, 16.1% C, 12.3% G and 35.1% T. The mitogenome gene order conforms to the primitive arrangement for Branchiopod crustaceans, which is also conserved within the Pancrustacean.

Matched MeSH terms: Molecular Sequence Annotation

The complete mitogenome of the moon crab Ashtoret lunaris (Forskal, 1775), (Crustacea; Decapoda; Matutidae)

Tan MH, Gan HM, Lee YP, Austin CM

Mitochondrial DNA A DNA Mapp Seq Anal, 2016;27(2):1313-4.
PMID: 25090387 DOI: 10.3109/19401736.2014.945572

The complete mitochondrial genome of the moon crab Ashtoret lunaris was obtained from a partial genome scan using the MiSeq sequencing system. The Ashtoret lunaris mitogenome is 15,807 base pairs in length (70% A + T content) and made up of 13 protein-coding genes, 2 ribosomal subunit genes, 22 transfer RNAs, and a putative 956 bp non-coding AT-rich region. This A. lunaris mitogenome sequence is the first for the genus, as well as the family Matutidae and superfamily Calappoidea.

Matched MeSH terms: Molecular Sequence Annotation

Development of expressed sequence tag resources for Vanda Mimi Palmer and data mining for EST-SSR

Teh SL, Chan WS, Abdullah JO, Namasivayam P

Mol Biol Rep, 2011 Aug;38(6):3903-9.
PMID: 21116862 DOI: 10.1007/s11033-010-0506-3

Vanda Mimi Palmer (VMP) is a highly sought-after fragrant orchid hybrid in Malaysia. It is economically important in the cosmetic and beauty industries and is also a popular potted ornamental plant. To date, no work on fragrance-related genes of vandaceous orchids has been reported by other research groups, although floral fragrance and volatiles have been extensively studied. An expressed sequence tag (EST) resource was developed for VMP, principally to mine potential fragrance-related expressed sequence tag-simple sequence repeats (EST-SSRs) for future development as markers in the identification of fragrant vandaceous orchids endemic to Malaysia. Clustering, annotation and assembly of the ESTs identified 1,196 unigenes, comprising 966 singletons and 230 contigs. The VMP dbEST was functionally classified by gene ontology (GO) into three groups: molecular functions (51.2%), cellular components (16.4%) and biological processes (24.6%), while the remaining 7.8% showed no hits with a GO identifier. A total of 112 EST-SSRs (9.4%) were mined, in which at least five units of di-, tri-, tetra-, penta-, or hexa-nucleotide repeats were predicted. Di-nucleotide motif repeats appeared to be the most frequent among the detected SSRs, with the AT/TA types the most abundant among the dimerics, while the AAG/TTC and AGA/TCT types were the most frequent trimerics. The mined EST-SSRs are believed to be useful in the development of EST-SSR markers that are applicable in the screening and characterization of fragrance-related transcripts in closely related species.
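
The SSR criterion stated in this abstract, at least five units of a di- to hexa-nucleotide motif, can be expressed as a regular-expression scan. The sketch below is not the tool used in the study; the function name, the regex approach and the example sequences are assumptions made for illustration:

```python
import re


def find_ssrs(seq, min_units=5, unit_sizes=range(2, 7)):
    """Find perfect tandem repeats of di- to hexa-nucleotide motifs.

    Returns sorted (start, motif, n_units) tuples for every run with at
    least min_units consecutive copies of the motif.
    """
    seq = seq.upper()
    hits = []
    for k in unit_sizes:
        # One motif of length k followed by at least min_units - 1 more copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_units - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            # Skip motifs that are themselves repeats of a shorter unit,
            # such as "AA" or "ATAT".
            if any(
                k % d == 0 and motif == motif[:d] * (k // d)
                for d in range(1, k)
            ):
                continue
            hits.append((match.start(), motif, len(match.group(0)) // k))
    return sorted(hits)


# Hypothetical EST fragments.
print(find_ssrs("GGATATATATATCC"))
print(find_ssrs("AAGAAGAAGAAGAAG"))
```

Only perfect repeats are reported; imperfect or compound SSRs would need a more permissive search.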

Matched MeSH terms: Molecular Sequence Annotation

Finding Nemo: hybrid assembly with Oxford Nanopore and Illumina reads greatly improves the clownfish (Amphiprion ocellaris) genome assembly

Tan MH, Austin CM, Hammer MP, Lee YP, Croft LJ, Gan HM

Gigascience, 2018 03 01;7(3):1-6.
PMID: 29342277 DOI: 10.1093/gigascience/gix137

Background: Some of the most widely recognized coral reef fishes are clownfish or anemonefish, members of the family Pomacentridae (subfamily: Amphiprioninae). They are popular aquarium species due to their bright colours, adaptability to captivity, and fascinating behavior. Their breeding biology (sequential hermaphrodites) and symbiotic mutualism with sea anemones have attracted much scientific interest. Moreover, there are some curious geographic-based phenotypes that warrant investigation. Leveraging on the advancement in Nanopore long read technology, we report the first hybrid assembly of the clown anemonefish (Amphiprion ocellaris) genome utilizing Illumina and Nanopore reads, further demonstrating the substantial impact of modest long read sequencing data sets on improving genome assembly statistics.
Results: We generated 43 Gb of short Illumina reads and 9 Gb of long Nanopore reads, representing approximate genome coverage of 54× and 11×, respectively, based on the range of estimated k-mer-predicted genome sizes of between 791 and 967 Mbp. The final assembled genome is contained in 6404 scaffolds with an accumulated length of 880 Mb (96.3% BUSCO-calculated genome completeness). Compared with the Illumina-only assembly, the hybrid approach generated 94% fewer scaffolds with an 18-fold increase in N50 length (401 kb) and increased the genome completeness by an additional 16%. A total of 27 240 high-quality protein-coding genes were predicted from the clown anemonefish, 26 211 (96%) of which were annotated functionally with information from either sequence homology or protein signature searches.
Conclusions: We present the first genome of any anemonefish and demonstrate the value of low coverage (∼11×) long Nanopore read sequencing in improving both genome assembly contiguity and completeness. The near-complete assembly of the A. ocellaris genome will be an invaluable molecular resource for supporting a range of genetic, genomic, and phylogenetic studies specifically for clownfish and more generally for other related fish species of the family Pomacentridae.
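
The coverage figures in the Results follow from dividing the sequenced bases by the genome size. A minimal sketch of that arithmetic, using the read yields and the genome-size range from the abstract (the function and constant names are illustrative):

```python
def fold_coverage(sequenced_bases, genome_size):
    """Approximate fold coverage: total sequenced bases over genome size."""
    if genome_size <= 0:
        raise ValueError("genome size must be positive")
    return sequenced_bases / genome_size


ILLUMINA_BASES = 43e9  # 43 Gb of short reads, from the abstract
NANOPORE_BASES = 9e9   # 9 Gb of long reads, from the abstract

# Lower and upper k-mer-based genome size estimates, from the abstract.
for genome_size in (791e6, 967e6):
    print(
        f"{genome_size / 1e6:.0f} Mbp: "
        f"Illumina {fold_coverage(ILLUMINA_BASES, genome_size):.1f}x, "
        f"Nanopore {fold_coverage(NANOPORE_BASES, genome_size):.1f}x"
    )
```

By this arithmetic, the quoted 54× and 11× correspond to the lower (791 Mbp) size estimate; the upper estimate gives roughly 44× and 9×.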

Matched MeSH terms: Molecular Sequence Annotation
