FINDINGS: We present a draft genome assembly built from 200 Gb of Illumina reads, 4 Gb of Moleculo synthetic long reads, and 108 Gb of Chicago libraries. Its final size matches the estimated genome size of 2.7 Gb, and its scaffold N50 is 4.8 Mb. We also present an alternative assembly that incorporates 27 Gb of raw reads generated on the Pacific Biosciences platform. In addition, to assist genome annotation, we sequenced the proteome of the same individual and RNA from 3 different tissue types from 3 other species of squid (Onychoteuthis banksii, Dosidicus gigas, and Sthenoteuthis oualaniensis). We annotated 33,406 protein-coding genes supported by evidence, and genome completeness as estimated by BUSCO reached 92%. Repetitive regions cover 49.17% of the genome.
CONCLUSIONS: This annotated draft genome of A. dux provides a critical resource to investigate the unique traits of this species, including its gigantism and key adaptations to deep-sea environments.
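For readers unfamiliar with the scaffold N50 statistic quoted in the findings, the following is a minimal sketch of how it is conventionally defined: the length of the shortest scaffold in the smallest set of longest scaffolds that together cover at least half of the assembly. The function name `n50` and the toy lengths are illustrative assumptions, not data or code from this study.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold lengths.

    N50 is the length L such that scaffolds of length >= L
    together contain at least half of the total assembled bases.
    """
    if not lengths:
        raise ValueError("need at least one scaffold length")
    total = sum(lengths)
    running = 0
    # Walk from the longest scaffold downwards, accumulating bases.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Hypothetical scaffold lengths (arbitrary units), not from the assembly.
toy_lengths = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_lengths))
```

With these toy values the total is 300, and the two longest scaffolds (80 + 70) already reach half of it, so the N50 is 70.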
METHODS: Using data from 12 microsatellite loci, we assessed the genetic diversity and genetic/geographic structure for 353 cempedak and 175 bangkong accessions from Malaysia and neighboring countries and employed clonal analysis to characterize cempedak cultivars. We conducted haplotype network analyses on the trnH-psbA region in a subset of these samples. We also analyzed key vegetative characters that reportedly differentiate cempedak and bangkong.
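As an illustration of how genetic diversity can be summarized from microsatellite genotypes, the sketch below computes expected heterozygosity at a single locus, He = 1 - sum(p_i^2), where p_i is the frequency of allele i. This abstract does not state which diversity statistics were used, so the choice of statistic, the function name, and the toy allele sizes are all assumptions made for illustration.

```python
from collections import Counter


def expected_heterozygosity(genotypes):
    """Expected heterozygosity at one locus.

    genotypes: list of (allele_a, allele_b) tuples, one per diploid
    individual. Returns 1 - sum of squared allele frequencies
    (no small-sample correction is applied in this sketch).
    """
    alleles = [allele for pair in genotypes for allele in pair]
    n = len(alleles)
    if n == 0:
        raise ValueError("need at least one genotype")
    counts = Counter(alleles)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


# Hypothetical allele sizes (bp) for four individuals at one locus.
toy_locus = [(152, 152), (152, 156), (156, 160), (152, 160)]
print(expected_heterozygosity(toy_locus))
```

In the toy data the allele frequencies are 0.5, 0.25, and 0.25, giving He = 1 - 0.375 = 0.625. Averaging such per-locus values across loci gives one population-level summary.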
KEY RESULTS: We show that cempedak and bangkong are sister taxa that are genetically and morphologically distinct, but the direction of domestication remains unclear. Genetic diversity was generally higher in bangkong than in cempedak. Cempedak from Borneo formed a genetic cluster distinct from cempedak from Peninsular Malaysia. Finally, cempedak cultivars with the same names did not always share the same genetic fingerprint.
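The observation that same-named cultivars can carry different genetic fingerprints can be checked, in its simplest form, by counting distinct multilocus genotypes per cultivar name. The sketch below assumes exact genotype matching with no allowance for genotyping error or missing data, which real clonal analyses typically handle. The function name, cultivar names, and genotypes are hypothetical.

```python
from collections import defaultdict


def genotypes_per_cultivar(records):
    """Count distinct multilocus genotypes observed under each name.

    records: iterable of (cultivar_name, genotype) pairs, where a
    genotype is a sequence of (allele_a, allele_b) pairs, one per locus.
    """
    seen = defaultdict(set)
    for name, genotype in records:
        # Sort alleles within each locus so (152, 156) equals (156, 152).
        fingerprint = tuple(tuple(sorted(locus)) for locus in genotype)
        seen[name].add(fingerprint)
    return {name: len(prints) for name, prints in seen.items()}


# Hypothetical accessions scored at two loci.
toy_records = [
    ("cultivar_A", [(152, 156), (200, 204)]),
    ("cultivar_A", [(156, 152), (200, 204)]),  # same fingerprint, alleles swapped
    ("cultivar_A", [(152, 152), (200, 208)]),  # different fingerprint
    ("cultivar_B", [(160, 160), (204, 204)]),
]
print(genotypes_per_cultivar(toy_records))
```

A name that maps to more than one fingerprint, such as `cultivar_A` here, would be a candidate for mislabeling or for multiple clones sharing one name.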
CONCLUSIONS: Cempedak origins are complex, with likely admixture and hybridization with bangkong, and warrant further investigation. We provide a baseline of genetic diversity for cempedak and bangkong in Malaysia and show that germplasm collections in Malaysia represent diverse coverage of the four cempedak genetic clusters detected.