MyMedR

Displaying publications 1 - 20 of 104 in total

Molecular characterization of serous ovarian carcinoma using a multigene next generation sequencing cancer panel approach

Ab Mutalib NS, Syafruddin SE, Md Zain RR, Mohd Dali AZ, Mohd Yunos RI, Saidin S, et al.

BMC Res Notes, 2014;7:805.
PMID: 25404506 DOI: 10.1186/1756-0500-7-805

High grade serous ovarian cancer is one of the poorly characterized malignancies. This study aimed to elucidate the mutational events in Malaysian patients with high grade serous ovarian cancer by performing targeted sequencing on 50 cancer hotspot genes.

Matched MeSH terms: Databases, Genetic
Isolation and characterization of novel microsatellite loci for Asian sea bass, Lates calcarifer from genome sequence survey database

Abdul Rahman Z, Choay-Hoong L, Mat Khairuddin R, Ab Razak S, Othman AS

J Genet, 2012 Aug;91(2):e82-5.
PMID: 22932425
Matched MeSH terms: Databases, Genetic
Feature Selection for High-Dimensional and Imbalanced Biomedical Data Based on Robust Correlation Based Redundancy and Binary Grasshopper Optimization Algorithm

Abdulrauf Sharifai G, Zainol Z

Genes (Basel), 2020 06 27;11(7).
PMID: 32605144 DOI: 10.3390/genes11070717

Training a machine learning algorithm on an imbalanced data set is an inherently challenging task. It becomes more demanding with limited samples but with a massive number of features (high dimensionality). The high dimensional and imbalanced data set has posed severe challenges in many real-world applications, such as biomedical data sets. Numerous researchers investigated either imbalanced class or high dimensional data sets and came up with various methods. Nonetheless, few approaches reported in the literature have addressed the intersection of the high dimensional and imbalanced class problem due to their complicated interactions. Lately, feature selection has become a well-known technique that has been used to overcome this problem by selecting discriminative features that represent the minority and majority classes. This paper proposes a new method called Robust Correlation Based Redundancy and Binary Grasshopper Optimization Algorithm (rCBR-BGOA); rCBR-BGOA employs an ensemble of multi-filters coupled with the Correlation-Based Redundancy method to select optimal feature subsets. A binary Grasshopper optimisation algorithm (BGOA) is used to construct the feature selection process as an optimisation problem to select the best (near-optimal) combination of features from the majority and minority classes. The obtained results, supported by the proper statistical analysis, indicate that rCBR-BGOA can improve the classification performance for high dimensional and imbalanced datasets in terms of G-mean and the Area Under the Curve (AUC) performance metrics.
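
The abstract above reports results in terms of G-mean, a metric often preferred over plain accuracy on imbalanced data. As a minimal sketch (not the authors' code), assuming the common binary definition of G-mean as the geometric mean of sensitivity and specificity, it can be computed as follows. The function name `g_mean` and the toy labels are hypothetical.

```python
import math


def g_mean(y_true, y_pred, positive=1):
    """Geometric mean of sensitivity and specificity for binary labels.

    Assumption: the common binary definition of G-mean; the abstract
    does not spell out the exact formula used in the paper.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    # Guard against classes that are absent from y_true.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(sensitivity * specificity)


# Toy data: 2 minority (positive) samples among 10.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
always_majority = [0] * 10
print(g_mean(y_true, always_majority))
```

A classifier that always predicts the majority class is 80% accurate on this toy data, yet its G-mean is zero because it never detects the minority class. That is why the metric suits imbalanced problems.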

Matched MeSH terms: Databases, Genetic/standards*
Initiating a Human Variome Project Country Node

AlAama J, Smith TD, Lo A, Howard H, Kline AA, Lange M, et al.

Hum Mutat, 2011 May;32(5):501-6.
PMID: 21305654 DOI: 10.1002/humu.21463

Genetic diseases are a pressing global health problem that requires comprehensive access to basic clinical and genetic data to counter. The creation of regional and international databases that can be easily accessed by clinicians and diagnostic labs will greatly improve our ability to accurately diagnose and treat patients with genetic disorders. The Human Variome Project is currently working in conjunction with human genetics societies to achieve this by establishing systems to collect every mutation reported by a diagnostic laboratory, clinic, or research laboratory in a country and store these within a national repository, or HVP Country Node. Nodes have already been initiated in Australia, Belgium, China, Egypt, Malaysia, and Kuwait. Each is examining how to systematically collect and share genetic, clinical, and biochemical information in a country-specific manner that is sensitive to local ethical and cultural issues. This article gathers cases of genetic data collection within countries and takes recommendations from the global community to develop a procedure for countries wishing to establish their own collection system as part of the Human Variome Project. We hope this may lead to standard practices to facilitate global collection of data and allow efficient use in clinical practice, research and therapy.

Matched MeSH terms: Databases, Genetic*
FusoBase: an online Fusobacterium comparative genomic analysis platform

Ang MY, Heydari H, Jakubovics NS, Mahmud MI, Dutta A, Wee WY, et al.

Database (Oxford), 2014;2014.
PMID: 25149689 DOI: 10.1093/database/bau082

Fusobacterium are anaerobic gram-negative bacteria that have been associated with a wide spectrum of human infections and diseases. As the biology of Fusobacterium is still not well understood, comparative genomic analysis on members of this species will provide further insights on their taxonomy, phylogeny, pathogenicity and other information that may contribute to better management of infections and diseases. To facilitate the ongoing genomic research on Fusobacterium, a specialized database with easy-to-use analysis tools is necessary. Here we present FusoBase, an online database providing access to genome-wide annotated sequences of Fusobacterium strains as well as bioinformatics tools, to support the expanding scientific community. Using our custom-developed Pairwise Genome Comparison tool, we demonstrate how differences between two user-defined genomes and how insertion of putative prophages can be identified. In addition, Pathogenomics Profiling Tool is capable of clustering predicted genes across Fusobacterium strains and visualizing the results in the form of a heat map with dendrogram.

Matched MeSH terms: Databases, Genetic*
DENVirDB: a web portal of dengue virus sequence information on Asian isolates

Asnet MJ, Rubia AG, Ramya G, Nagalakshmi RN, Shenbagarathai R

J Vector Borne Dis, 2014 Jun;51(2):82-5.
PMID: 24947213

DENVirDB is a web portal that provides the sequence information and computationally curated information of dengue viral proteins. The advent of genomic technology has increased the sequences available in the public databases. In order to create relevant concise information on Dengue Virus (DENV), the genomic sequences were collected, analysed with the bioinformatics tools and presented as DENVirDB. It provides comprehensive information on the complete genome sequences of dengue virus isolates of Southeast Asia, viz. India, Bangladesh, Sri Lanka, East Timor, Philippines, Malaysia, Papua New Guinea, Brunei and China. DENVirDB also includes the structural and non-structural protein sequences of DENV. It intends to provide integrated information on the physicochemical properties, topology, secondary structure, domain and structural properties for each protein sequence. It contains over 99 entries in complete genome sequences and 990 entries in protein sequences, respectively. Therefore, DENVirDB could serve as a user-friendly database for researchers in acquiring sequences and proteomic information in one platform.

Matched MeSH terms: Databases, Genetic*
Marantodes pumilum: Systematic computational approach to identify their therapeutic potential and effectiveness

Azfaralariff A, Farahfaiqah F, Shahid M, Sanusi SA, Law D, Mohd Isa AR, et al.

J Ethnopharmacol, 2022 Jan 30;283:114751.
PMID: 34662662 DOI: 10.1016/j.jep.2021.114751

ETHNOPHARMACOLOGICAL RELEVANCE: Marantodes pumilum (MP) herbs, locally known as Kacip Fatimah, are widely used traditionally to improve women's health. The herb is frequently used for gynecological issues such as menstrual problems, facilitating and quickening delivery, and post-partum medication, and to treat flatulence and dysentery. MP extracts are thought to aid in the firming and toning of abdominal muscles, to tighten breasts and vaginal muscles, and to relieve dysmenorrhea. It was also used for the treatment of gonorrhea and hemorrhoids. As MP products have recently been produced commercially, more in-depth studies should be conducted. The presence of numerous active compounds in MP might provide a synergistic effect and potentially offer other health benefits than those already identified and known.
AIM OF THE STUDY: This study aimed to use a computational target fishing approach to predict the possible therapeutic effects of Marantodes pumilum and to evaluate their effectiveness.
MATERIALS AND METHODS: This study involves a computational approach to identify the potential targets by using target fishing. Several databases were used: PubChem database to obtain the chemical structures of the compounds of interest; Traditional Chinese Medicine Systems Pharmacology Database and Analysis Platform (TCMSP) server and the SWISSADME web tool to identify and select the compounds having drug-likeness properties; PharmMapper was used to identify the top ten target proteins of the selected compounds and Online Mendelian Inheritance in Man (OMIM) was used to predict human genetic problems; the gene id of top-10 proteins was obtained from UniProtKB to be analyzed by using GeneMANIA server to check the genes' function and their co-expression; Gene Pathway established by Kyoto Encyclopedia of Genes and Genomes (KEGG) and Gene Ontology (GO) of the selected targets were analyzed by using EnrichR server and confirmed by using DAVID (The Database for Annotation, Visualization and Integrated Discovery) version 6.8 and STRING database. All the interaction data was analyzed by Cytoscape version 3.7.2 software. The protein structure of most putative proteins was obtained from the RCSB protein data bank. The docking analysis was conducted using PyRx biological software v0.8 and illustrated by BIOVIA Discovery Studio Visualizer version 20.1.0. As a preliminary evaluation, a cell viability assay using Sulforhodamine B was conducted to evaluate the potential of the predicted therapeutic effect.
RESULTS: It was found that four studied compounds are highly correlated with three proteins: EGFR, CDK2, and ESR1. These proteins are highly associated with cancer pathways, especially breast cancer and prostate cancer. Qualitatively, the cell proliferation assay showed that the extract has an IC50 of 88.69 μg/ml against MCF-7 and 66.51 μg/ml against MDA-MB-231.
CONCLUSIONS: Natural herbs are one of the most common forms of complementary and alternative medicine, and they play an important role in disease treatment. The results of this study show that in addition to being used traditionally to maintain women's health, the use of Marantodes pumilum indirectly has the potential to protect against the development of cancer cells, especially breast cancer. Therefore, further research is necessary to confirm the potential of this plant to be used in the development of anti-cancer drugs, especially for breast cancer.

Matched MeSH terms: Databases, Genetic
New insights into the haplotype diversity of the cosmopolitan cat flea Ctenocephalides felis (Siphonaptera: Pulicidae)

Azrizal-Wahid N, Sofian-Azirun M, Low VL

Vet Parasitol, 2020 May;281:109102.
PMID: 32289653 DOI: 10.1016/j.vetpar.2020.109102

The present study investigated the genetic profile of the cosmopolitan cat flea, Ctenocephalides felis (Siphonaptera: Pulicidae) from Malaysia and the reference data available in the National Center for Biotechnology Information (NCBI) GenBank. A set of sequences of 100 Malaysian samples aligned as 550 characters of the cytochrome c oxidase subunit I (cox1) and 706 characters of the subunit II (cox2) genes revealed ten haplotypes (A1-A10) and eight haplotypes (B1-B8), respectively. The concatenated sequences of cox1 and cox2 genes with a total of 1256 characters revealed 15 haplotypes (AB1-AB15). Analyses indicated that haplotype AB1 was the most frequent and the most widespread haplotype in Malaysia. Overall haplotype and nucleotide diversities of the concatenated sequences were 0.52909 and 0.00424, respectively, with moderate genetic differentiation (FST = 0.17522) and high gene flow (Nm = 1.18). The western population presented the highest genetic diversity (Hd = 0.78333, Pi = 0.01269, Nh = 9), whereas the southern population demonstrated the lowest diversity (Hd = 0.15667, Pi = 0.00019, Nh = 3). The concatenated sequences showed genetic distances ranging from 0.08 % to 4.39 %. There were three aberrant haplotypes in cox2 sequences that were highly divergent, suggesting the presence of cryptic species or occurrence of introgression. From a global point of view, the aligned sequences of C. felis revealed 65 haplotypes (AA1-AA65) by the cox1 gene (n = 586), and 27 haplotypes (BB1-BB27) by the cox2 gene (n = 204). Mapping of the haplotype network showed that Malaysian C. felis possesses seven unique haplotypes in both genes, with the common haplotypes demonstrating genetic affinity with C. felis from Southeast Asia for cox1 and South America for cox2. The topologies of cox1 and cox2 phylogenetic trees were concordant with the relevant grouping pattern of haplotypes in the network but revealed two major lineages by which Malaysian haplotypes were closely related with haplotypes from the tropical region.
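
The abstract above reports haplotype diversity (Hd) and a gene-flow estimate (Nm) derived from FST. As a rough sketch only, assuming Nei's standard estimator for Hd and the textbook island-model relation Nm = (1/FST - 1)/4, the two quantities can be computed as below. The abstract does not say which software or formulas the authors used; both function names and the sample haplotype labels are hypothetical.

```python
from collections import Counter


def haplotype_diversity(haplotypes):
    """Nei's haplotype diversity: Hd = n/(n-1) * (1 - sum(p_i^2)).

    `haplotypes` is a list with one haplotype label per sampled individual.
    """
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two sampled individuals")
    counts = Counter(haplotypes)
    sum_sq_freq = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1.0 - sum_sq_freq)


def gene_flow_from_fst(fst):
    """Island-model estimate Nm = (1/FST - 1) / 4.

    Assumption: this divisor of 4 is one common convention; it is not
    stated in the abstract.
    """
    if not 0.0 < fst <= 1.0:
        raise ValueError("FST must be in (0, 1]")
    return (1.0 / fst - 1.0) / 4.0


# Hypothetical sample: three individuals share one haplotype, one differs.
print(haplotype_diversity(["AB1", "AB1", "AB1", "AB2"]))
print(round(gene_flow_from_fst(0.17522), 2))
```

Under this assumed relation, the reported FST of 0.17522 gives Nm of about 1.18, which matches the value in the abstract. This is a consistency check and does not confirm the authors' method.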

Matched MeSH terms: Databases, Genetic
A pattern matching approach for the estimation of alignment between any two given DNA sequences

Basu K, Sriraam N, Richard RJ

J Med Syst, 2007 Aug;31(4):247-53.
PMID: 17685148

For a given DNA sequence, it is well known that pairwise alignment schemes are used to determine the similarity with the DNA sequences available in the databanks. The efficiency of the alignment decides the type of amino acids and its corresponding proteins. In order to evaluate the given DNA sequence for its proteomic identity, a pattern matching approach is proposed in this paper. A block-based semi-global alignment scheme is introduced to determine the similarity between the DNA sequences (known and given). The two DNA sequences are divided into blocks of equal length and alignment is performed, which minimizes the computational complexity. The efficiency of the alignment scheme is evaluated using the parameter percentage of similarity (POS). Four essential DNA versions of the amino acids that emphasize the importance of proteomic functionalities are chosen as patterns, and matching is performed with the known and given DNA sequences to determine the similarity between them. The ratio of amino acid counts between the two sequences is estimated and the results are compared with that of the POS value. It is found from the experimental results that the higher the POS value and the pattern matching, the higher the similarity between the two DNA sequences. The optimal block is also identified based on the POS value and amino acids count.

Matched MeSH terms: Databases, Genetic/statistics & numerical data*
Application of the threshold model for modelling and forecasting of exchange rate in selected ASEAN countries

Behrooz Gharleghi, Abu Hassan Shaari Md Nor, Tamat Sarmidi

Sains Malaysiana, 2014;43:1609-1622.

Linear time series models are not able to capture the behaviour of many financial time series, as in the cases of exchange rates and stock market data. Some phenomena, such as volatility and structural breaks in time series data, cannot be modelled implicitly using linear time series models. Therefore, nonlinear time series models are typically designed to accommodate such nonlinear features. In the present study, a nonlinearity test and a structural change test are used to detect the nonlinearity and the break date in three ASEAN currencies, namely the Indonesian Rupiah (IDR), the Malaysian Ringgit (MYR) and the Thai Baht (THB). The study finds that the null hypothesis of linearity is rejected and evidence of structural breaks exists in the exchange rates series. Therefore, the decision to use the self-exciting threshold autoregressive (SETAR) model in the present study is justified. The results showed that the SETAR model, as a regime switching model, can explain abrupt changes in a time series. To evaluate the prediction performance of the SETAR model, an Autoregressive Integrated Moving Average (ARIMA) model is used as a benchmark. In order to increase the accuracy of prediction, both models are combined with an exponential generalised autoregressive conditional heteroscedasticity (EGARCH) model. The prediction results showed that the constructed SETAR-EGARCH model performs better than the ARIMA model and the combined ARIMA and EGARCH model. The results indicated that nonlinear models give better fitting than linear models.

Matched MeSH terms: Databases, Genetic
Multi-platform discovery of haplotype-resolved structural variation in human genomes

Chaisson MJP, Sanders AD, Zhao X, Malhotra A, Porubsky D, Rausch T, et al.

Nat Commun, 2019 04 16;10(1):1784.
PMID: 30992455 DOI: 10.1038/s41467-018-08148-z

The incomplete identification of structural variants (SVs) from whole-genome sequencing data limits studies of human genetic diversity and disease association. Here, we apply a suite of long-read, short-read, strand-specific sequencing technologies, optical mapping, and variant discovery algorithms to comprehensively analyze three trios to define the full spectrum of human genetic variation in a haplotype-resolved manner. We identify 818,054 indel variants (<50 bp) and 27,622 SVs (≥50 bp) per genome. We also discover 156 inversions per genome and 58 of the inversions intersect with the critical regions of recurrent microdeletion and microduplication syndromes. Taken together, our SV callsets represent a three to sevenfold increase in SV detection compared to most standard high-throughput sequencing studies, including those from the 1000 Genomes Project. The methods and the dataset presented serve as a gold standard for the scientific community allowing us to make recommendations for maximizing structural variation sensitivity for future genome sequencing studies.

Matched MeSH terms: Databases, Genetic
Identification of four functionally important microRNA families with contrasting differential expression profiles between drought-tolerant and susceptible rice leaf at vegetative stage

Cheah BH, Nadarajah K, Divate MD, Wickneswari R

BMC Genomics, 2015;16:692.
PMID: 26369665 DOI: 10.1186/s12864-015-1851-3

Developing drought-tolerant rice varieties with higher yield under water stressed conditions provides a viable solution to the serious yield-reduction impact of drought. Understanding the molecular regulation of this polygenic trait is crucial for the eventual success of rice molecular breeding programmes. microRNAs have received tremendous attention recently due to their importance in negative regulation. In plants, apart from regulating developmental and physiological processes, microRNAs have also been associated with different biotic and abiotic stresses. Hence here we chose to analyze the differential expression profiles of microRNAs in three drought treated rice varieties: Vandana (drought-tolerant), Aday Sel (drought-tolerant) and IR64 (drought-susceptible) in greenhouse conditions via high-throughput sequencing.

Matched MeSH terms: Databases, Genetic
Interethnic comparisons of important pharmacology genes using SNP databases: potential application to drug regulatory assessments

Chen J, Teo YY, Toh DS, Sung C

Pharmacogenomics, 2010 Aug;11(8):1077-94.
PMID: 20712526 DOI: 10.2217/pgs.10.79

The frequencies of alleles implicated in drug-response variability provide vital information for public health management. Differences in frequencies between genetically diverse groups of individuals can hamper drug assessments, particularly in populations where clinical data are not readily available.

Matched MeSH terms: Databases, Genetic*
The HARX-GJR-GARCH skewed-t multipower realized volatility modelling for S&P 500

Chin WC, Nadira Mohamed Isa, Lee MC, Poo KH

Sains Malaysiana, 2017;46:107-116.

The heterogeneous autoregressive (HAR) models are used in modeling high frequency multipower realized volatility of the S&P 500 index. Extended from the standard realized volatility, the multipower realized volatility representations have the advantage of handling the possible abrupt jumps by smoothing the consecutive volatility. In order to accommodate the volatility clustering and asymmetry of multipower realized volatility, the HAR model is extended by the threshold autoregressive conditional heteroscedastic (GJR-GARCH) component. In addition, the innovations of the multipower realized volatility are characterized by the skewed student-t distributions. The extended model provides the best performing in-sample and out-of-sample forecast evaluations.

Matched MeSH terms: Databases, Genetic
In silico analysis of Burkholderia pseudomallei genome sequence for potential drug targets

Chong CE, Lim BS, Nathan S, Mohamed R

In Silico Biol. (Gedrukt), 2006;6(4):341-6.
PMID: 16922696

Recent advances in DNA sequencing technology have enabled elucidation of whole genome information from a plethora of organisms. In parallel with this technology, various bioinformatics tools have driven the comparative analysis of the genome sequences between species and within isolates. While drawing meaningful conclusions from a large amount of raw material, computer-aided identification of suitable targets for further experimental analysis and characterization has also led to the prediction of non-human homologous essential genes in bacteria as promising candidates for novel drug discovery. Here, we present a comparative genomic analysis to identify essential genes in Burkholderia pseudomallei. Our in silico prediction has identified 312 essential genes which could also be potential drug candidates. These genes encode essential proteins to support the survival of B. pseudomallei including outer-inner membrane and surface structures, regulators, proteins involved in pathogenicity, adaptation, chaperones as well as degradation of small and macromolecules, energy metabolism, information transfer, central/intermediate/miscellaneous metabolism pathways and some conserved hypothetical proteins of unknown function. Therefore, our in silico approach has enabled rapid screening and identification of potential drug targets for further characterization in the laboratory.

Matched MeSH terms: Databases, Genetic
MycoCAP - Mycobacterium Comparative Analysis Platform

Choo SW, Ang MY, Dutta A, Tan SY, Siow CC, Heydari H, et al.

Sci Rep, 2015 Dec 15;5:18227.
PMID: 26666970 DOI: 10.1038/srep18227

Mycobacterium spp. are renowned for being the causative agent of diseases like leprosy, Buruli ulcer and tuberculosis in human beings. With more and more mycobacterial genomes being sequenced, any knowledge generated from comparative genomic analysis would provide better insights into the biology, evolution, phylogeny and pathogenicity of this genus, thus helping in better management of diseases caused by Mycobacterium spp. With this motivation, we constructed MycoCAP, a new comparative analysis platform dedicated to the important genus Mycobacterium. This platform currently provides information on 2108 genome sequences of at least 55 Mycobacterium spp. A number of intuitive web-based tools have been integrated in MycoCAP particularly for comparative analysis, including the PGC tool for comparison between two genomes, PathoProT for comparing the virulence genes among the Mycobacterium strains and the SuperClassification tool for the phylogenic classification of the Mycobacterium strains and a specialized classification system for strains of Mycobacterium abscessus. We hope the broad range of functions and easy-to-use tools provided in MycoCAP makes it an invaluable analysis platform to speed up the research discovery on mycobacteria for researchers. Database URL: http://mycobacterium.um.edu.my.

Matched MeSH terms: Databases, Genetic
HelicoBase: a Helicobacter genomic resource and analysis platform

Choo SW, Ang MY, Fouladi H, Tan SY, Siow CC, Mutha NV, et al.

BMC Genomics, 2014;15:600.
PMID: 25030426 DOI: 10.1186/1471-2164-15-600

Helicobacter is a genus of Gram-negative bacteria, possessing a characteristic helical shape that has been associated with a wide spectrum of human diseases. Although much research has been done on Helicobacter and many genomes have been sequenced, currently there is no specialized Helicobacter genomic resource and analysis platform to facilitate analysis of these genomes. With the increasing number of Helicobacter genomes being sequenced, comparative genomic analysis on members of this species will provide further insights on their taxonomy, phylogeny, pathogenicity and other information that may contribute to better management of diseases caused by Helicobacter pathogens.

Matched MeSH terms: Databases, Genetic*
VibrioBase: a model for next-generation genome and annotation database development

Choo SW, Heydari H, Tan TK, Siow CC, Beh CY, Wee WY, et al.

ScientificWorldJournal, 2014;2014:569324.
PMID: 25243218 DOI: 10.1155/2014/569324

To facilitate the ongoing research of Vibrio spp., a dedicated platform for the Vibrio research community is needed to host the fast-growing amount of genomic data and facilitate the analysis of these data. We present VibrioBase, a useful resource platform, providing all basic features of a sequence database with the addition of unique analysis tools which could be valuable for the Vibrio research community. VibrioBase currently houses a total of 252 Vibrio genomes developed in a user-friendly manner and useful to enable the analysis of these genomic data, particularly in the field of comparative genomics. Besides general data browsing features, VibrioBase offers analysis tools such as BLAST interfaces and JBrowse genome browser. Other important features of this platform include our newly developed in-house tools, the pairwise genome comparison (PGC) tool, and pathogenomics profiling tool (PathoProT). The PGC tool is useful in the identification and comparative analysis of two genomes, whereas PathoProT is designed for comparative pathogenomics analysis of Vibrio strains. Both of these tools will enable researchers with little experience in bioinformatics to get meaningful information from Vibrio genomes with ease. We have tested the validity and suitability of these tools and features for use in the next-generation database development.

Matched MeSH terms: Databases, Genetic/trends*
Genomic reconnaissance of clinical isolates of emerging human pathogen Mycobacterium abscessus reveals high evolutionary potential

Choo SW, Wee WY, Ngeow YF, Mitchell W, Tan JL, Wong GJ, et al.

Sci Rep, 2014;4:4061.
PMID: 24515248 DOI: 10.1038/srep04061

Mycobacterium abscessus (Ma) is an emerging human pathogen that causes both soft tissue infections and systemic disease. We present the first comparative whole-genome study of Ma strains isolated from patients of wide geographical origin. We found a high proportion of accessory strain-specific genes indicating an open, non-conservative pan-genome structure, and clear evidence of rapid phage-mediated evolution. Although we found fewer virulence factors in Ma compared to M. tuberculosis, our data indicated that Ma evolves rapidly and therefore should be monitored closely for the acquisition of more pathogenic traits. This comparative study provides a better understanding of Ma and forms the basis for future functional work on this important pathogen.

Matched MeSH terms: Databases, Genetic
RNA sequencing read depth requirement for optimal transcriptome coverage in Hevea brasiliensis

Chow KS, Ghazali AK, Hoh CC, Mohd-Zainuddin Z

BMC Res Notes, 2014 Feb 01;7:69.
PMID: 24484543 DOI: 10.1186/1756-0500-7-69

BACKGROUND: One of the concerns of assembling de novo transcriptomes is determining the amount of read sequences required to ensure a comprehensive coverage of genes expressed in a particular sample. In this report, we describe the use of Illumina paired-end RNA-Seq (PE RNA-Seq) reads from Hevea brasiliensis (rubber tree) bark to devise a transcript mapping approach for the estimation of the read amount needed for deep transcriptome coverage.
FINDINGS: We optimized the assembly of a Hevea bark transcriptome based on 16 Gb Illumina PE RNA-Seq reads using the Oases assembler across a range of k-mer sizes. We then assessed assembly quality based on transcript N50 length and transcript mapping statistics in relation to (a) known Hevea cDNAs with complete open reading frames, (b) a set of core eukaryotic genes and (c) Hevea genome scaffolds. This was followed by a systematic transcript mapping process where sub-assemblies from a series of incremental amounts of bark transcripts were aligned to transcripts from the entire bark transcriptome assembly. The exercise served to relate read amounts to the degree of transcript mapping level, the latter being an indicator of the coverage of gene transcripts expressed in the sample. As read amounts or datasize increased toward 16 Gb, the number of transcripts mapped to the entire bark assembly approached saturation. A colour matrix was subsequently generated to illustrate sequencing depth requirement in relation to the degree of coverage of total sample transcripts.
CONCLUSIONS: We devised a procedure, the "transcript mapping saturation test", to estimate the amount of RNA-Seq reads needed for deep coverage of transcriptomes. For Hevea de novo assembly, we propose generating between 5-8 Gb reads, whereby around 90% transcript coverage could be achieved with optimized k-mers and transcript N50 length. The principle behind this methodology may also be applied to other non-model plants, or with reads from other second generation sequencing platforms.
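
The abstract above assesses assembly quality partly by transcript N50 length. As a minimal sketch, assuming the usual definition (the length L such that sequences of length L or longer hold at least half of the total assembled bases), N50 can be computed as below. The function name `n50` and the example lengths are hypothetical and unrelated to the Hevea data.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Lengths are visited from longest to shortest; N50 is the length at
    which the running total first reaches half of the overall total.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Compare doubled running sum to avoid fractional halves.
        if running * 2 >= total:
            return length


# Hypothetical transcript lengths (bp).
print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))
```

In this toy set the total is 54, and the three longest sequences (10 + 9 + 8 = 27) already cover half of it, so the N50 is 8.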

Matched MeSH terms: Databases, Genetic
