Displaying publications 1 - 20 of 96 in total

  1. A Rahaman SN, Mat Yusop J, Mohamed-Hussein ZA, Aizat WM, Ho KL, Teh AH, et al.
    PeerJ, 2018;6:e5377.
    PMID: 30280012 DOI: 10.7717/peerj.5377
    Proteins of the DUF866 superfamily are found exclusively in eukaryotic cells. A member of this superfamily, C1ORF123, is a human protein encoded by open reading frame 123 of chromosome 1. The physiological role of C1ORF123 has yet to be determined. The only available protein structure from the DUF866 family shares just 26% sequence similarity with C1ORF123 and does not contain a zinc-binding motif. Here, we present the crystal structure of recombinant human C1ORF123 protein (rC1ORF123). The structure has 2-fold internal symmetry that divides the monomeric protein into two mirrored halves with distinct electrostatic potentials. The N-terminal half of rC1ORF123 includes a zinc-binding domain that interacts with a zinc ion near a potential ligand-binding cavity. Functional studies of human C1ORF123 and its homologue in the fission yeast Schizosaccharomyces pombe (SpEss1) point to a role for DUF866 proteins in mitochondrial oxidative phosphorylation.
    Matched MeSH terms: Open Reading Frames
  2. Amerizadeh A, Idris ZM, Khoo BY, Kotresha D, Yunus MH, Karim IZ, et al.
    Microb Pathog, 2013 Jan;54:60-6.
    PMID: 23044055 DOI: 10.1016/j.micpath.2012.09.006
    Toxoplasmosis is an infection caused by the parasite Toxoplasma gondii. Chronically infected individuals with a compromised immune system are at risk of reactivation of the disease. In-vivo induced antigen technology (IVIAT) is a promising method for identifying antigens expressed in vivo. The aim of the present study was to apply IVIAT to identify antigens expressed in vivo during T. gondii infection, using sera from individuals with chronic toxoplasmosis. Forty serum samples were pooled, pre-adsorbed against three different antigen preparations from each of in-vitro grown T. gondii and Escherichia coli XLBlue MRF', and then used to screen a T. gondii cDNA expression library. Sequencing of DNA inserts from positive clones revealed eight open reading frames with high homology to T. gondii genes. Expression analysis using quantitative real-time PCR showed that SAG1-related sequence 3 (SRS3) and two hypothetical genes were up-regulated in vivo relative to their expression levels in vitro. These three proteins also showed high sensitivity and specificity when tested with individual serum samples. Five other proteins, namely M16 domain peptidase, microneme protein, elongation factor 1-alpha, pre-mRNA-splicing factor and small nuclear ribonucleoprotein F, had lower RNA expression in vivo than in vitro. SRS3 and the two hypothetical proteins warrant further investigation into their roles in the pathogenesis of toxoplasmosis.
    Matched MeSH terms: Open Reading Frames
  3. Amiruddin N, Lee XW, Blake DP, Suzuki Y, Tay YL, Lim LS, et al.
    BMC Genomics, 2012 Jan 13;13:21.
    PMID: 22244352 DOI: 10.1186/1471-2164-13-21
    BACKGROUND: Eimeria tenella is an apicomplexan parasite that causes coccidiosis in the domestic fowl. Infection with this parasite is diagnosed frequently in intensively reared poultry, and its control is usually accorded a high priority, especially in chickens raised for meat. Prophylactic chemotherapy has been the primary method used to control coccidiosis. However, drug efficacy can be compromised by drug-resistant parasites, and the lack of new drugs highlights the demand for alternative control strategies, including vaccination. In the long term, sustainable control of coccidiosis will most likely be achieved through integrated drug and vaccination programmes. Characterisation of the E. tenella transcriptome may provide a better understanding of the biology of the parasite and aid in the development of more effective control of coccidiosis.

    RESULTS: More than 15,000 partial sequences were generated from the 5' and 3' ends of clones randomly selected from an E. tenella second generation merozoite full-length cDNA library. Clustering of these sequences produced 1,529 unique transcripts (UTs). Based on the transcript assembly and subsequent primer walking, 433 full-length cDNA sequences were successfully generated. These sequences varied in length from 441 bp to 3,083 bp, with an average size of 1,647 bp. Simple sequence repeat (SSR) analysis identified CAG as the most abundant trinucleotide motif, while codon usage analysis revealed that the ten most infrequently used codons in E. tenella are UAU, UGU, GUA, CAU, AUA, CGA, UUA, CUA, CGU and AGU. Subsequent analysis of the E. tenella complete coding sequences identified 25 putative secretory and 60 putative surface proteins, all of which are now rational candidates for development as recombinant vaccines or drug targets in the effort to control avian coccidiosis.

    CONCLUSIONS: This paper describes the generation and characterisation of full-length cDNA sequences from E. tenella second generation merozoites and provides new insights into the E. tenella transcriptome. The data generated will be useful for the development and validation of diagnostic and control strategies for coccidiosis and will be of value in annotation of the E. tenella genome sequence.

    Matched MeSH terms: Open Reading Frames
  4. Arockiaraj J, Easwvaran S, Vanaraja P, Singh A, Othman RY, Bhassu S
    Fish Shellfish Immunol, 2012 May;32(5):929-33.
    PMID: 22361112 DOI: 10.1016/j.fsi.2012.02.011
    This study reports the first full-length gene of interferon-related developmental regulator-1 (designated MrIRDR-1), identified from the transcriptome of Macrobrachium rosenbergii. The complete MrIRDR-1 gene sequence is 2459 base pairs long, with an open reading frame of 1308 base pairs encoding a predicted protein of 436 amino acids with a calculated molecular mass of 48 kDa. The MrIRDR-1 protein contains a long interferon-related developmental regulator superfamily domain between amino acids 30 and 330. The mRNA expression of MrIRDR-1 in healthy and infectious hypodermal and hematopoietic necrosis virus (IHHNV)-infected M. rosenbergii was examined using qRT-PCR. MrIRDR-1 is highly expressed in the hepatopancreas and is also expressed in all other tissues examined (walking leg, gills, muscle, haemocyte, pleopods, brain, stomach, intestine and eye stalk). After IHHNV infection, its expression is highly upregulated in the hepatopancreas. This result indicates an important role for MrIRDR-1 in the prawn defense system.
    Matched MeSH terms: Open Reading Frames
  5. Arockiaraj J, Easwvaran S, Vanaraja P, Singh A, Othman RY, Bhassu S
    Mol Biol Rep, 2012 Feb;39(2):1377-86.
    PMID: 21614523 DOI: 10.1007/s11033-011-0872-5
    The prophenoloxidase activating system is an important innate immune response against microbial infections in invertebrates. The major enzyme, phenoloxidase, is synthesized as an inactive precursor, and its activation is mediated by a cascade of clip-domain serine proteinases. In this study, a cDNA encoding prophenoloxidase activating enzyme-III from the giant freshwater prawn Macrobrachium rosenbergii, designated MrProAE-III, was identified and characterized. The full-length cDNA contains an open reading frame of 1110 base pairs (bp) encoding a predicted protein of 370 amino acids, including a 22-amino-acid signal peptide. The MrProAE-III protein exhibits the characteristic structure of clip-domain serine proteases: a long serine protease (trypsin) domain and N- and C-terminal trypsin-family serine protease histidine active sites. Sequence analysis showed that MrProAE-III had the highest amino acid sequence similarity (63%) to ProAE-III from the Atlantic blue crab, Callinectes sapidus. MrProAE-III mRNA and enzyme activity were detectable in all examined tissues, including hepatopancreas, hemocytes, pleopods, walking legs, eye stalk, gill, stomach, intestine, brain and muscle, with the highest levels of both in the hepatopancreas. Its expression is regulated after systemic infectious hypodermal and hematopoietic necrosis virus infection, supporting its identification as an immune-responsive gene. These results indicate that MrProAE-III functions in the proPO system and is an important component of the prawn immune system.
    Matched MeSH terms: Open Reading Frames/genetics
  6. Arsad H, Sudesh K, Nazalan N, Muhammad TS, Wahab H, Razip Samian M
    Trop Life Sci Res, 2009 Dec;20(2):1-14.
    PMID: 24575175 MyJurnal
    The (R)-3-hydroxyacyl-ACP-CoA transferase catalyses the conversion of (R)-3-hydroxyacyl-ACP to (R)-3-hydroxyacyl-CoA derivatives, which serve as the ultimate precursors for polyhydroxyalkanoate (PHA) polymerisation from unrelated substrates in pseudomonads. PhaG was found to be responsible for channelling precursors from the de novo fatty acid biosynthesis pathway to PHA synthase when cells are cultured on carbohydrates such as glucose or gluconate. The phaG gene was cloned from Pseudomonas sp. USM 4-55 using a homologous probe. The gene was located on a 3660 bp Sal I fragment (GenBank accession number EU305558). The open reading frame (ORF) was 885 bp long and encoded a 295-amino-acid protein with a predicted molecular weight of 33,251 Da, which showed 62% identity to PhaG of Pseudomonas aeruginosa. The function of the cloned phaG of Pseudomonas sp. USM 4-55 was confirmed by complementation studies. Plasmid pBCS39, which harboured the 3660 bp Sal I fragment, complemented the PhaG mutant heterologous host, Pseudomonas putida PhaGN-21. P. putida PhaGN-21 harbouring pBCS39 accumulated PHA up to 18% of its cellular dry weight (CDW), whereas P. putida PhaGN-21 harbouring the vector alone (pBBR1MCS-2) accumulated only 0.6% CDW of PHA.
    Matched MeSH terms: Open Reading Frames
  7. Atago Y, Shimodaira J, Araki N, Bin Othman N, Zakaria Z, Fukuda M, et al.
    Biosci Biotechnol Biochem, 2016 May;80(5):1012-9.
    PMID: 26828632 DOI: 10.1080/09168451.2015.1127134
    Rhodococcus jostii RHA1 (RHA1) degrades polychlorinated biphenyl (PCB) via co-metabolism with biphenyl. To identify novel open reading frames (ORFs) that contribute to PCB/biphenyl metabolism in RHA1, we compared chromatin immunoprecipitation chip and transcriptomic data. Six novel ORFs involved in PCB/biphenyl metabolism were identified. Gene deletion mutants of these six ORFs were constructed and tested for their ability to grow on biphenyl. Interestingly, only the ro10225 deletion mutant showed deficient growth on biphenyl. Analysis of Ro10225 protein function showed that growth of the ro10225 deletion mutant on biphenyl was restored when exogenous recombinant Ro10225 protein was added to the culture medium. Although Ro10225 protein has no putative secretion signal sequence, partially degraded Ro10225 protein was detected in conditioned medium from wild-type RHA1 grown on biphenyl. This Ro10225 fragment appeared to form a complex with another PCB/biphenyl oxidation enzyme. These results indicate that Ro10225 protein is essential for the formation of the PCB/biphenyl dioxygenase complex in RHA1.
    Matched MeSH terms: Open Reading Frames
  8. Atif AB, Halim-Fikri AH, Zilfalil BA
    MyJurnal
    In the human genome, point variations are the most common type of variation (Nachman & Crowell, 2000) and are well understood. These variations, when present in more than 1% of the population, are referred to as single nucleotide polymorphisms (SNPs) and can fall in the coding region of a gene, in non-coding regions or in intergenic regions.
    Matched MeSH terms: Open Reading Frames
  9. Austin CM, Tan MH, Gan HY, Gan HM
    Mitochondrial DNA A DNA Mapp Seq Anal, 2016 Nov;27(6):4176-4177.
    PMID: 25630729
    Next-generation sequencing was used to recover the complete mitochondrial genome of Cherax tenuimanus. The mitogenome consists of 15,797 base pairs (68.14% A + T content) containing 13 protein-coding genes, two ribosomal subunit genes, 22 transfer RNAs, and a 779 bp non-coding AT-rich region. Mitogenomes have now been recovered for all six species of Cherax native to Western Australia.
    Matched MeSH terms: Open Reading Frames
  10. Austin CM, Tan MH, Lee YP, Croft LJ, Meekan MG, Gan HM
    PMID: 25103432 DOI: 10.3109/19401736.2014.947586
    The complete mitogenome of the ray Pastinachus atrus was recovered from a partial genome scan using the HiSeq sequencing system. The P. atrus mitogenome has 18,162 base pairs (61% A + T content) made up of 13 protein-coding genes, 2 ribosomal subunit genes, 22 transfer RNAs, and a 2516 bp non-coding AT-rich region. This mitogenome sequence is the first for a ray from Australian waters, the first for the Genus Pastinachus, and the 6th for the family Dasyatidae.
    Matched MeSH terms: Open Reading Frames
  11. Azlan A, Obeidat SM, Yunus MA, Azzam G
    Sci Rep, 2019 Aug 21;9(1):12147.
    PMID: 31434910 DOI: 10.1038/s41598-019-47506-9
    Long noncoding RNAs (lncRNAs) play diverse roles in biological processes. Aedes aegypti (Ae. aegypti), a blood-sucking mosquito, is the principal vector responsible for the replication and transmission of arboviruses including dengue, Zika, and Chikungunya virus. Systematic identification and developmental characterisation of Ae. aegypti lncRNAs are still limited. We performed genome-wide identification of lncRNAs, followed by developmental profiling of lncRNAs in Ae. aegypti. We identified a total of 4,689 novel lncRNA transcripts, of which 2,064, 2,076, and 549 were intergenic, intronic, and antisense, respectively. Ae. aegypti lncRNAs share many characteristics with those of other species, including low expression, low GC content, short length, and low conservation. In addition, the expression of Ae. aegypti lncRNAs tends to be correlated with that of neighbouring and antisense protein-coding genes. A subset of lncRNAs shows evidence of maternal inheritance, suggesting a potential role for lncRNAs in early-stage embryos. Moreover, lncRNAs show a higher tendency to be expressed in a developmental stage- and time-specific manner. The results of this study provide a foundation for future investigation of the function of Ae. aegypti lncRNAs.
    Matched MeSH terms: Open Reading Frames
  12. Balakrishnan KN, Abdullah AA, Bala J, Abba Y, Sarah SA, Jesse FFA, et al.
    Infect Genet Evol, 2017 Oct;54:81-90.
    PMID: 28642159 DOI: 10.1016/j.meegid.2017.06.020
    BACKGROUND: Rat cytomegalovirus ALL-03 (Malaysian strain), which was isolated from the placenta and uterus of a house rat, Rattus rattus diardii, is able to cross the placenta and infect the fetus. To further elucidate the pathogenesis of the Malaysian strain of rat cytomegalovirus ALL-03 (RCMV ALL-03), detailed analysis of the viral genome sequence is crucial.

    METHODS: The genome of RCMV ALL-03 was sequenced in order to identify open reading frames (ORFs), compare ORF homology with other CMV strains, perform phylogenetic analysis, classify ORFs by their corresponding conserved genes, and determine functional proteins and group gene families, so as to obtain fundamental knowledge of the genome.

    RESULTS: The present study revealed a total of 123 coding DNA sequences (CDS) in RCMV ALL-03, including 37 conserved ORF domains, as in all herpesvirus genomes. All the CDS share similar functions with those of RCMV-England, followed by RCMV-Berlin, RCMV-Maastricht and human CMV. Phylogenetic analysis of RCMV ALL-03 based on conserved herpesvirus genes showed that the Malaysian RCMV isolate is closest to the RCMV-England and RCMV-Berlin strains, with 99% and 97% homology, respectively. It also demonstrated an evolutionary relationship between RCMV ALL-03 and other herpesvirus strains from all three subfamilies. Interestingly, the betaherpesvirus subfamily, which has been shown to be more closely related to gammaherpesviruses than to alphaherpesviruses, shares some of the functional ORFs. In addition, the gene block arrangement conserved among herpesvirus family members was also observed in the RCMV ALL-03 genome.

    CONCLUSION: Genomic analysis of RCMV ALL-03 provided an overall picture of whole-genome organization and serves as a good platform for further understanding of divergence within the family Herpesviridae.

    Matched MeSH terms: Open Reading Frames/genetics*
  13. Bellini WJ, Harcourt BH, Bowden N, Rota PA
    J Neurovirol, 2005 Oct;11(5):481-7.
    PMID: 16287690
    Nipah virus is a recently emergent paramyxovirus capable of causing severe disease in both humans and animals. The first outbreak of Nipah virus occurred in Malaysia and Singapore in 1999, and, more recently, outbreaks have been detected in Bangladesh. In humans, Nipah virus causes febrile encephalitis with respiratory syndrome and has a high mortality rate. The reservoir for Nipah virus is believed to be fruit bats, and humans are infected through contact with infected bats or with an intermediate animal host such as pigs. Person-to-person spread of the virus has also been described. Nipah virus retains many of the genetic and biologic properties found in other paramyxoviruses, though it also has several unique characteristics. However, the virologic characteristics that allow the virus to cause severe disease over a broad host range, and the epidemiologic, environmental and virologic features that favor transmission to humans, are unknown. This review summarizes what is known about the virology, epidemiology, pathology, diagnosis and control of this novel pathogen.
    Matched MeSH terms: Open Reading Frames
  14. Boon Yin K, Najimudin N, Muhammad TS
    Biochem Biophys Res Commun, 2008 Jun 27;371(2):177-9.
    PMID: 18413145 DOI: 10.1016/j.bbrc.2008.04.013
    Peroxisome proliferator-activated receptor gamma (PPARgamma) is a ligand-activated transcription factor that plays many essential biological roles in higher organisms. PPARgamma is mainly expressed in adipose tissue. It regulates the transcriptional activity of genes by binding with other transcription factors. Our group and other research groups have found the PPARgamma coding region to be closest to that of the monkey. Thus, the monkey is a more suitable animal model for future PPARgamma studies, although mice and rats are frequently used. PPARgamma is involved in regulating alterations in adipose tissue mass that result from changes in mature adipocyte size and/or number, through a complex process called adipogenesis. However, the role of PPARgamma in negatively regulating adipogenesis remains unclear. This review may help in investigating the differential expression of key transcription factors in adipose tissue in response to a visceral obesity-inducing diet in vivo. The study may also provide valuable information for defining a more appropriate physiological condition for adipogenesis, which may help to prevent diseases caused by negative regulation of transcription factors in adipose tissue.
    Matched MeSH terms: Open Reading Frames
  15. Cha TS, Habib Shah F
    Plant Sci, 2001 Apr;160(5):913-923.
    PMID: 11297788
    The mRNA differential display method was used to identify and isolate cDNAs corresponding to transcripts that accumulate during the period of lipid synthesis, 12-20 weeks after anthesis (WAA), in the kernel of Elaeis guineensis var. Tenera. We isolated two cDNA clones, KT7 (312 bp) and KT8 (266 bp). Interestingly, the two clones share 79% nucleotide sequence identity, suggesting that they encode isoforms of the same protein. We screened a kernel (15 WAA) cDNA library and isolated clone pKT7 (587 bp) using KT7 as a probe, and isolated another isoform, designated pKT9 (900 bp), using KT8 as a probe. Clone pKT9 has 93% nucleotide identity to KT8 but only 46% to pKT7 in their 3'-untranslated regions. All three clones displayed significant amino acid sequence identity to the seed storage proteins glutelin from monocotyledons and globulin from dicotyledons. The coding sequence of KT8 (106 bp) shows 76% and 97% identity to pKT9 and pKT7, respectively. Therefore, we suggest that clones KT8 and pKT7 are members of the same subfamily (A), while pKT9 belongs to another subfamily (B) of the glutelin multigene family. Southern analysis shows that subfamily B has at least four members. Northern analysis shows that these three members of the glutelin family are co-ordinately expressed and developmentally regulated during kernel development. The transcripts begin to accumulate at 12 WAA, increase at 15 WAA and show a significant reduction at 17 WAA.
    Matched MeSH terms: Open Reading Frames
  16. Chang Y, Liu H, Liu M, Liao X, Sahu SK, Fu Y, et al.
    Gigascience, 2019 Mar 01;8(3).
    PMID: 30535374 DOI: 10.1093/gigascience/giy152
    BACKGROUND: The expanding world population is expected to double the worldwide demand for food by 2050. Eighty-eight percent of countries currently face a serious burden of malnutrition, especially in Africa and south and southeast Asia. About 95% of the food energy needs of humans are fulfilled by just 30 species, of which wheat, maize, and rice provide the majority of calories. Therefore, to diversify and stabilize the global food supply, enhance agricultural productivity, and tackle malnutrition, greater use of neglected or underutilized local plants (so-called orphan crops, but also including a few plants of special significance to agriculture, agroforestry, and nutrition) could be a partial solution.

    RESULTS: Here, we present draft genome information for five agriculturally, biologically, medicinally, and economically important underutilized plants native to Africa: Vigna subterranea, Lablab purpureus, Faidherbia albida, Sclerocarya birrea, and Moringa oleifera. Assembled genomes range in size from 217 to 654 Mb. In V. subterranea, L. purpureus, F. albida, S. birrea, and M. oleifera, we have predicted 31,707, 20,946, 28,979, 18,937, and 18,451 protein-coding genes, respectively. By further analyzing the expansion and contraction of selected gene families, we have characterized root nodule symbiosis genes, transcription factors, and starch biosynthesis-related genes in these genomes.

    CONCLUSIONS: These genome data will be useful to identify and characterize agronomically important genes and understand their modes of action, enabling genomics-based, evolutionary studies, and breeding strategies to design faster, more focused, and predictable crop improvement programs.

    Matched MeSH terms: Open Reading Frames/genetics
  17. Chen Y, Guo R, Liang Y, Luo L, Han Y, Wang H, et al.
    Virus Res, 2023 Sep;334:199183.
    PMID: 37499764 DOI: 10.1016/j.virusres.2023.199183
    Stutzerimonas stutzeri is an opportunistic pathogen that is widely distributed in the environment and displays diverse metabolic capabilities. In this study, a novel lytic S. stutzeri phage, named vB_PstM_ZRG1, was isolated from seawater in the East China Sea (29°09'N, 123°39'E). vB_PstM_ZRG1 was stable at temperatures ranging from -20°C to 65°C and across a wide pH range of 3 to 10. The genome of vB_PstM_ZRG1 is a double-stranded DNA of 52,767 bp containing 78 putative open reading frames (ORFs). Three auxiliary metabolic genes were predicted in the vB_PstM_ZRG1 genome, encoding a Toll/interleukin-1 receptor (TIR) domain, a proline-alanine-alanine-arginine (PAAR) protein and an SGNH (Ser-Gly-Asn-His) family hydrolase; the TIR domain in particular is uncommon in isolated phages. Phylogenetic and network analyses showed that vB_PstM_ZRG1 has low similarity to other phage genomes in the GenBank and IMG/VR databases and may represent a novel viral genus, named Elithevirus. Additionally, distribution mapping indicated that vB_PstM_ZRG1 could infect both extreme cold-type and warm-type hosts in the marine environment. In summary, our findings provide basic information for further research on the relationship between S. stutzeri and its phages, and expand our understanding of the genomic characteristics, phylogenetic diversity and distribution of Elithevirus.
    Matched MeSH terms: Open Reading Frames
  18. Cheng TH, Saidin J, Danish-Daniel M, Gan HM, Mat Isa MN, Abu Bakar MF, et al.
    Genome Announc, 2018 Feb 08;6(6).
    PMID: 29439033 DOI: 10.1128/genomeA.00022-18
    Serratia marcescens subsp. sakuensis strain K27 was isolated from a sponge (Haliclona amboinensis). The genome of this strain consists of 5,325,727 bp, with 5,140 open reading frames (ORFs), 3 rRNAs, and 67 tRNAs. It contains genes for the production of amylases, lipases, and proteases. Gene clusters for the biosynthesis of nonribosomal peptides and a thiopeptide were also identified.
    Matched MeSH terms: Open Reading Frames
  19. Choo SW, Ang MY, Fouladi H, Tan SY, Siow CC, Mutha NV, et al.
    BMC Genomics, 2014;15:600.
    PMID: 25030426 DOI: 10.1186/1471-2164-15-600
    Helicobacter is a genus of Gram-negative bacteria with a characteristic helical shape, and its members have been associated with a wide spectrum of human diseases. Although much research has been done on Helicobacter and many genomes have been sequenced, there is currently no specialized Helicobacter genomic resource and analysis platform to facilitate analysis of these genomes. With the increasing number of Helicobacter genomes being sequenced, comparative genomic analysis of members of this genus will provide further insights into their taxonomy, phylogeny, pathogenicity and other information that may contribute to better management of diseases caused by Helicobacter pathogens.
    Matched MeSH terms: Open Reading Frames
  20. Chow KS, Ghazali AK, Hoh CC, Mohd-Zainuddin Z
    BMC Res Notes, 2014 Feb 01;7:69.
    PMID: 24484543 DOI: 10.1186/1756-0500-7-69
    BACKGROUND: One of the concerns in assembling de novo transcriptomes is determining the amount of read sequences required to ensure comprehensive coverage of the genes expressed in a particular sample. In this report, we describe the use of Illumina paired-end RNA-Seq (PE RNA-Seq) reads from Hevea brasiliensis (rubber tree) bark to devise a transcript mapping approach for estimating the read amount needed for deep transcriptome coverage.

    FINDINGS: We optimized the assembly of a Hevea bark transcriptome based on 16 Gb of Illumina PE RNA-Seq reads using the Oases assembler across a range of k-mer sizes. We then assessed assembly quality based on transcript N50 length and transcript mapping statistics in relation to (a) known Hevea cDNAs with complete open reading frames, (b) a set of core eukaryotic genes and (c) Hevea genome scaffolds. This was followed by a systematic transcript mapping process in which sub-assemblies from a series of incremental amounts of bark transcripts were aligned to transcripts from the entire bark transcriptome assembly. The exercise served to relate read amounts to the transcript mapping level, the latter being an indicator of the coverage of gene transcripts expressed in the sample. As read amounts or data size increased toward 16 Gb, the number of transcripts mapped to the entire bark assembly approached saturation. A colour matrix was subsequently generated to illustrate sequencing depth requirements in relation to the degree of coverage of total sample transcripts.

    CONCLUSIONS: We devised a procedure, the "transcript mapping saturation test", to estimate the amount of RNA-Seq reads needed for deep coverage of transcriptomes. For Hevea de novo assembly, we propose generating 5-8 Gb of reads, with which around 90% transcript coverage could be achieved using optimized k-mers and transcript N50 length. The principle behind this methodology may also be applied to other non-model plants, or to reads from other second-generation sequencing platforms.

    Matched MeSH terms: Open Reading Frames