METHODS: Using the full-length sequences of SARS-CoV-2 with intact geographic, demographic, and temporal information worldwide from the GISAID database during 26 December 2019 and 30 November 2020, we constructed the transmission tree to depict the evolutionary process by the R package "outbreaker". The affinity of the mutated receptor-binding region of the spike protein to angiotensin-converting enzyme 2 (ACE2) was predicted using mCSM-PPI2 software. Viral infectivity and antigenicity were tested in ACE2-transfected HEK293T cells by pseudovirus transfection and neutralizing antibody test.
RESULTS: From 26 December 2019 to 8 March 2020, early stage of the COVID-19 pandemic, SARS-CoV-2 strains identified worldwide were mainly composed of three clusters: the Europe-based cluster including two USA-based sub-clusters; the Asia-based cluster including isolates in China, Japan, the USA, Singapore, Australia, Malaysia, and Italy; and the USA-based cluster. The SARS-CoV-2 strains identified in the USA formed four independent clades while those identified in China formed one clade. After 8 March 2020, the clusters of SARS-CoV-2 strains tended to be independent and became "pure" in each of the major countries. Twenty-two of 60 mutations in the receptor-binding domain of the spike protein were predicted to increase the binding affinity of SARS-CoV-2 to ACE2. Of all predicted mutants, the number of E484K was the largest one with 86 585 sequences, followed by S477N with 55 442 sequences worldwide. In more than ten countries, the frequencies of the isolates with E484K and S477N increased significantly. V367F and N354D mutations increased the infectivity of SARS-CoV-2 pseudoviruses (P
RESULTS: As part of the Vertebrate Genomes Project (VGP) we develop mitoVGP, a fully automated pipeline for similarity-based identification of mitochondrial reads and de novo assembly of mitochondrial genomes that incorporates both long (> 10 kbp, PacBio or Nanopore) and short (100-300 bp, Illumina) reads. Our pipeline leads to successful complete mitogenome assemblies of 100 vertebrate species of the VGP. We observe that tissue type and library size selection have considerable impact on mitogenome sequencing and assembly. Comparing our assemblies to purportedly complete reference mitogenomes based on short-read sequencing, we identify errors, missing sequences, and incomplete genes in those references, particularly in repetitive regions. Our assemblies also identify novel gene region duplications. The presence of repeats and duplications in over half of the species herein assembled indicates that their occurrence is a principle of mitochondrial structure rather than an exception, shedding new light on mitochondrial genome evolution and organization.
CONCLUSIONS: Our results indicate that even in the "simple" case of vertebrate mitogenomes the completeness of many currently available reference sequences can be further improved, and caution should be exercised before claiming the complete assembly of a mitogenome, particularly from short reads alone.