Results: We generated 43 Gb of short Illumina reads and 9 Gb of long Nanopore reads, representing approximate genome coverage of 54× and 11×, respectively, based on k-mer-predicted genome size estimates ranging from 791 to 967 Mb. The final assembled genome is contained in 6404 scaffolds with a cumulative length of 880 Mb (96.3% BUSCO-calculated genome completeness). Compared with the Illumina-only assembly, the hybrid approach generated 94% fewer scaffolds with an 18-fold increase in N50 length (401 kb) and increased the genome completeness by an additional 16%. A total of 27 240 high-quality protein-coding genes were predicted for the clown anemonefish, 26 211 (96%) of which were functionally annotated using information from either sequence homology or protein signature searches.
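The coverage and N50 figures above rest on two simple calculations, sketched here for illustration. This is a minimal sketch, not the pipeline used in the study: the function names `coverage` and `n50` are hypothetical, coverage is assumed to be total sequenced bases divided by estimated genome size, and N50 is assumed to follow the common definition (the length of the shortest sequence in the smallest set of longest sequences that together hold at least half of the total assembly length). Under these assumptions, the reported 54× and 11× values correspond to the lower bound (791 Mb) of the estimated genome size range.

```python
from typing import Iterable


def coverage(total_bases: float, genome_size: float) -> float:
    """Approximate fold-coverage: sequenced bases divided by genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return total_bases / genome_size


def n50(lengths: Iterable[int]) -> int:
    """N50 under the common definition: walk sequences from longest to
    shortest and return the length at which the running sum first
    reaches at least half of the total length."""
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("lengths must contain at least one sequence")


if __name__ == "__main__":
    # Read yields and genome size bounds as stated in the abstract.
    illumina_bases, nanopore_bases = 43e9, 9e9
    size_low, size_high = 791e6, 967e6

    # Coverage at the lower genome size bound.
    print(f"{coverage(illumina_bases, size_low):.1f}")  # 54.4
    print(f"{coverage(nanopore_bases, size_low):.1f}")  # 11.4

    # Coverage at the upper genome size bound.
    print(f"{coverage(illumina_bases, size_high):.1f}")  # 44.5
    print(f"{coverage(nanopore_bases, size_high):.1f}")  # 9.3

    # Toy scaffold lengths (made up for illustration, not study data).
    print(n50([2, 2, 2, 3, 3, 4, 8, 8]))  # 8
```

Because N50 depends only on the sorted length distribution, merging many short scaffolds into fewer long ones raises it sharply, which is consistent with the reported increase when long reads are added.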
Conclusions: We present the first genome of any anemonefish and demonstrate the value of low-coverage (∼11×) Nanopore long-read sequencing in improving both the contiguity and the completeness of a genome assembly. The near-complete assembly of the A. ocellaris genome will be an invaluable molecular resource for supporting a range of genetic, genomic, and phylogenetic studies, both specifically for clownfish and more generally for other related fish species of the family Pomacentridae.
RESULTS: Both laboratory approaches yielded complete mtDNA genomes from M. f. fascicularis with high accuracy and/or coverage. According to our phylogenetic reconstructions, M. f. fascicularis initially diverged into two clades 1.70 million years ago (Ma): one comprising haplotypes from mainland Southeast Asia, the Malay Peninsula, and North Sumatra (Clade A), and the other comprising haplotypes from the islands of Bangka, Java, Borneo, Timor, and the Philippines (Clade B). The three geographical populations of Clade A appear as paraphyletic groups, whereas local populations of Clade B form monophyletic clades, with the exception of a Philippine individual that is nested within the Borneo clade. Further, in Clade B the branching pattern among the main clades/lineages remains largely unresolved, most likely owing to their relatively rapid diversification 0.93-0.84 Ma.
CONCLUSIONS: Both laboratory methods proved powerful for generating complete mtDNA genome data with similarly high accuracy. The DNA-capture and high-throughput sequencing approach is the most promising, and the only practical, option for obtaining such data from highly degraded DNA in a timely manner and at relatively low cost. The application of complete mtDNA genomes yields new insights into the evolutionary history of M. f. fascicularis by providing a more robust phylogeny and more reliable divergence age estimates than earlier studies.