FINDINGS: We optimized the assembly of a Hevea bark transcriptome from 16 Gb of Illumina paired-end (PE) RNA-Seq reads using the Oases assembler across a range of k-mer sizes. We then assessed assembly quality using transcript N50 length and transcript mapping statistics in relation to (a) known Hevea cDNAs with complete open reading frames, (b) a set of core eukaryotic genes and (c) Hevea genome scaffolds. This was followed by a systematic transcript mapping process in which sub-assemblies from a series of incremental amounts of bark transcripts were aligned to transcripts from the entire bark transcriptome assembly. The exercise served to relate read amounts to the transcript mapping level, the latter being an indicator of the coverage of gene transcripts expressed in the sample. As the read amount (data size) increased toward 16 Gb, the number of transcripts mapped to the entire bark assembly approached saturation. A colour matrix was subsequently generated to illustrate the sequencing depth required for a given degree of coverage of total sample transcripts.
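The transcript N50 length used above as a quality measure can be illustrated with a minimal sketch. It assumes the common definition of N50 (the length of the shortest transcript in the smallest set of longest transcripts that together cover at least half of the assembled bases). The function name `n50` and the example lengths are illustrative only and are not taken from the study.

```python
def n50(lengths):
    """Return the N50 of a collection of transcript lengths.

    Assumes the usual definition: sort lengths from longest to shortest,
    accumulate them, and report the length at which the running total
    first reaches half of the total assembled bases.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("n50() needs at least one transcript length")


# Toy lengths (hypothetical, in bp): total is 1000, half is 500.
# 400 alone is below 500; 400 + 300 reaches it, so the N50 is 300.
example_lengths = [100, 200, 300, 400]
example_n50 = n50(example_lengths)
```

Comparing this value across assemblies built with different k-mer sizes is one way such a statistic could feed into choosing a k-mer.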
CONCLUSIONS: We devised a procedure, the "transcript mapping saturation test", to estimate the amount of RNA-Seq reads needed for deep coverage of transcriptomes. For Hevea de novo assembly, we propose generating between 5 and 8 Gb of reads, with which around 90% transcript coverage could be achieved using optimized k-mers and transcript N50 length. The principle behind this methodology may also be applied to other non-model plants, or to reads from other second-generation sequencing platforms.
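The idea behind the saturation test can be sketched as follows, under stated assumptions. The sketch assumes that, for each data size, an external alignment step has already produced the identifiers of full-assembly transcripts hit by that sub-assembly. The function names, transcript identifiers and data sizes below are hypothetical toy values, not results or code from the study.

```python
def mapped_fraction(reference_ids, mapped_ids):
    """Fraction of full-assembly transcripts hit by a sub-assembly."""
    reference = set(reference_ids)
    if not reference:
        raise ValueError("reference transcript set is empty")
    return len(reference & set(mapped_ids)) / len(reference)


def smallest_size_reaching(curve, target):
    """Smallest data size whose mapped fraction meets the target.

    `curve` is an iterable of (data_size_gb, mapped_fraction) pairs.
    Returns None if no data size reaches the target.
    """
    for size_gb, fraction in sorted(curve):
        if fraction >= target:
            return size_gb
    return None


# Toy inputs (hypothetical): five reference transcripts and the
# transcripts hit by sub-assemblies built from 1, 2, 4 and 8 Gb.
reference = ["t1", "t2", "t3", "t4", "t5"]
hits_by_size = {
    1: ["t1", "t2"],
    2: ["t1", "t2", "t3"],
    4: ["t1", "t2", "t3", "t4"],
    8: ["t1", "t2", "t3", "t4", "t5"],
}
curve = [(size, mapped_fraction(reference, hits))
         for size, hits in hits_by_size.items()]
size_for_80_percent = smallest_size_reaching(curve, 0.8)
```

Tabulating the same lookup over several target fractions would give a grid of data size against coverage, similar in spirit to the colour matrix described above.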
RESULTS: A total of 12 standard cDNA libraries, representing three main developmental stages in oil palm tissue culture, were generated in this study. Random sequencing of clones from these cDNA libraries generated 17,599 expressed sequence tags (ESTs). The ESTs were analysed, annotated and assembled to generate 9,584 putative unigenes distributed in 3,268 consensi and 6,316 singletons. These unigenes were assigned putative functions based on similarity and gene ontology annotations. Cluster analysis, which surveyed the relatedness of each library based on the abundance of ESTs in each consensus, revealed that lipid transfer proteins were highly expressed in embryogenic tissues. A glutathione S-transferase was found to be highly expressed in non-embryogenic callus. Further analysis of the unigenes identified 648 non-redundant simple sequence repeats and 211 putative full-length open reading frames.
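How simple sequence repeats might be located in unigene sequences can be sketched with a regular expression. This is a minimal illustration, not the tool or criteria used in the study, which the text does not state. The function name `find_ssrs`, the motif length range (2 to 6 bp) and the minimum repeat count (5) are all assumptions.

```python
import re


def find_ssrs(sequence, min_motif=2, max_motif=6, min_repeats=5):
    """Return (start, motif, repeat_count) for each tandem repeat found.

    Thresholds are assumed defaults, not values from the study.
    Simplification: homopolymer runs are reported with a 2 bp motif
    such as "AA".
    """
    # A motif of min_motif..max_motif bases (shortest tried first),
    # followed by at least (min_repeats - 1) further copies of itself.
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_motif, max_motif, min_repeats - 1)
    )
    results = []
    for match in pattern.finditer(sequence.upper()):
        motif = match.group(1)
        repeats = len(match.group(0)) // len(motif)
        results.append((match.start(), motif, repeats))
    return results


# Toy sequence (hypothetical): "CA" repeated six times from index 2.
ssrs = find_ssrs("GGCACACACACACATT")
```

De-duplicating the hits across unigenes would be a further step toward a non-redundant set; that step is not shown here.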
CONCLUSION: This study has provided an overview of genes expressed during oil palm tissue culture. Candidate genes whose expression is modulated during tissue culture were identified. However, to confirm whether these genes are suitable as early markers for embryogenesis, they need to be tested on earlier stages of tissue culture and across a wider range of genotypes. This collection of ESTs is an important resource for genetic and genome analyses of the oil palm, particularly during tissue culture development.