Displaying publications 1 - 20 of 39 in total

  1. Gopalakrishnan S, Ebenesersdóttir SS, Lundstrøm IKC, Turner-Walker G, Moore KHS, Luisi P, et al.
    Curr Biol, 2022 Nov 07;32(21):4743-4751.e6.
    PMID: 36182700 DOI: 10.1016/j.cub.2022.09.023
    Human populations have been shaped by catastrophes that may have left long-lasting signatures in their genomes. One notable example is the second plague pandemic that entered Europe in ca. 1347 CE and repeatedly returned for over 300 years, with typical village and town mortality estimated at 10%-40% [1]. It is assumed that this high mortality affected the gene pools of these populations. First, local population crashes reduced genetic diversity. Second, a change in frequency is expected for sequence variants that may have affected survival or susceptibility to the etiologic agent (Yersinia pestis) [2]. Third, mass mortality might alter the local gene pools through its impact on subsequent migration patterns. We explored these factors using the Norwegian city of Trondheim as a model, by sequencing 54 genomes spanning three time periods: (1) prior to the plague striking Trondheim in 1349 CE, (2) the 17th-19th century, and (3) the present. We find that the pandemic period shaped the gene pool by reducing long distance immigration, in particular from the British Isles, and inducing a bottleneck that reduced genetic diversity. Although we also observe an excess of large FST values at multiple loci in the genome, these are shaped by reference biases introduced by mapping our relatively low genome coverage degraded DNA to the reference genome. This implies that attempts to detect selection using ancient DNA (aDNA) datasets that vary by read length and depth of sequencing coverage may be particularly challenging until methods have been developed to account for the impact of differential reference bias on test statistics.
  2. Reeve AH, Kennedy JD, Pujolar JM, Petersen B, Blom MPK, Alström P, et al.
    Nat Commun, 2023 Dec 11;14(1):8215.
    PMID: 38081809 DOI: 10.1038/s41467-023-43964-y
    The processes generating the earth's montane biodiversity remain a matter of debate. Two contrasting hypotheses have been advanced to explain how montane populations form: via direct colonization from other mountains, or, alternatively, via upslope range shifts from adjacent lowland areas. We seek to reconcile these apparently conflicting hypotheses by asking whether a species' ancestral geographic origin determines its mode of mountain colonization. Island-dwelling passerine birds at the faunal crossroads between Eurasia and Australo-Papua provide an ideal study system. We recover the phylogenetic relationships of the region's montane species and reconstruct their ancestral geographic ranges, elevational ranges, and migratory behavior. We also perform genomic population studies of three super-dispersive montane species/clades with broad island distributions. Eurasian-origin species populated archipelagos via direct colonization between mountains. This mode of colonization appears related to ancestral adaptations to cold and seasonal climates, specifically short-distance migration. Australo-Papuan-origin mountain populations, by contrast, evolved from lowland ancestors, and highland distribution mostly precludes their further colonization of island mountains. Our study explains much of the distributional variation within a complex biological system, and provides a synthesis of two seemingly discordant hypotheses for montane community formation.
  3. Cai Z, Petersen B, Sahana G, Madsen LB, Larsen K, Thomsen B, et al.
    Sci Rep, 2017 Nov 06;7(1):14564.
    PMID: 29109430 DOI: 10.1038/s41598-017-15169-z
    The American mink (Neovison vison) is a semiaquatic species of mustelid native to North America. It's an important animal for the fur industry. Many efforts have been made to locate genes influencing fur quality and color, but this search has been impeded by the lack of a reference genome. Here we present the first draft genome of mink. In our study, two mink individuals were sequenced by Illumina sequencing with 797 Gb sequence generated. Assembly yielded 7,175 scaffolds with an N50 of 6.3 Mb and length of 2.4 Gb including gaps. Repeat sequences constitute around 31% of the genome, which is lower than for dog and cat genomes. The alignments of mink, ferret and dog genomes help to illustrate the chromosomal rearrangements. Gene annotation identified 21,053 protein-coding sequences present in the mink genome. The reference genome's structure is consistent with the microsatellite-based genetic map. Mapping of well-studied genes known to be involved in coat quality and coat color, and previously located fur quality QTL provide new knowledge about putative candidate genes for fur traits. The draft genome shows great potential to facilitate genomic research towards improved breeding for high fur quality animals and strengthen our understanding of the evolution of Carnivora.
  4. Jørgensen TS, Nielsen BLH, Petersen B, Browne PD, Hansen BW, Hansen LH
    G3 (Bethesda), 2019 05 07;9(5):1295-1302.
    PMID: 30923136 DOI: 10.1534/g3.119.400085
    Copepoda is one of the most ecologically important animal groups on Earth, yet very few genetic resources are available for this subclass. Here, we present the first whole genome sequence (WGS, acc. UYDY01) and the first mRNA transcriptome assembly (TSA, acc. GHAJ01) for the tropical cyclopoid copepod species Apocyclops royi. Until now, only the 18S small subunit of ribosomal RNA gene and the COI gene have been available from A. royi, and WGS resources were only available from one other cyclopoid copepod species. Overall, the provided resources are the 8th copepod species to have WGS resources available and the 19th copepod species with TSA information available. We analyze the length and GC content of the provided WGS scaffolds as well as the coverage and gene content of both the WGS and the TSA assembly. Finally, we place the resources within the copepod order Cyclopoida as a member of the Apocyclops genus. We estimate the total genome size of A. royi to be 450 Mb, with 181 Mb assembled nonrepetitive sequence, 76 Mb assembled repeats and 193 Mb unassembled sequence. The TSA assembly consists of 29,737 genes and an additional 45,756 isoforms. In the WGS and TSA assemblies, >80% and >95% of core genes can be found, though many in fragmented versions. The provided resources will allow researchers to conduct physiological experiments on A. royi, and also increase the possibilities for copepod gene set analysis, as it adds substantially to the copepod datasets available.
  5. Cerca J, Armstrong EE, Vizueta J, Fernández R, Dimitrov D, Petersen B, et al.
    Genome Biol Evol, 2021 Dec 01;13(12).
    PMID: 34849853 DOI: 10.1093/gbe/evab262
    Spiders (Araneae) have a diverse spectrum of morphologies, behaviors, and physiologies. Attempts to understand the genomic-basis of this diversity are often hindered by their large, heterozygous, and AT-rich genomes with high repeat content resulting in highly fragmented, poor-quality assemblies. As a result, the key attributes of spider genomes, including gene family evolution, repeat content, and gene function, remain poorly understood. Here, we used Illumina and Dovetail Chicago technologies to sequence the genome of the long-jawed spider Tetragnatha kauaiensis, producing an assembly distributed along 3,925 scaffolds with an N50 of ∼2 Mb. Using comparative genomics tools, we explore genome evolution across available spider assemblies. Our findings suggest that the previously reported and vast genome size variation in spiders is linked to the different representation and number of transposable elements. Using statistical tools to uncover gene-family level evolution, we find expansions associated with the sensory perception of taste, immunity, and metabolism. In addition, we report strikingly different histories of chemosensory, venom, and silk gene families, with the first two evolving much earlier, affected by the ancestral whole genome duplication in Arachnopulmonata (∼450 Ma) and exhibiting higher numbers. Together, our findings reveal that spider genomes are highly variable and that genomic novelty may have been driven by the burst of an ancient whole genome duplication, followed by gene family and transposable element expansion.
  6. Jørgensen TS, Petersen B, Petersen HCB, Browne PD, Prost S, Stillman JH, et al.
    Genome Biol Evol, 2019 May 01;11(5):1440-1450.
    PMID: 30918947 DOI: 10.1093/gbe/evz067
    Members of the crustacean subclass Copepoda are likely the most abundant metazoans worldwide. Pelagic marine species are critical in converting planktonic microalgae to animal biomass, supporting oceanic food webs. Despite their abundance and ecological importance, only six copepod genomes are publicly available, owing to a number of factors including large genome size, repetitiveness, GC-content, and small animal size. Here, we report the seventh representative copepod genome and the first genome and the first transcriptome from the calanoid copepod species Acartia tonsa Dana, which is among the most numerous mesozooplankton in boreal coastal and estuarine waters. The ecology, physiology, and behavior of A. tonsa have been studied extensively. The genetic resources contributed in this work will allow researchers to link experimental results to molecular mechanisms. From PCR-free whole genome sequence and mRNA Illumina data, we assemble the largest copepod genome to date. We estimate that A. tonsa has a total genome size of 2.5 Gb including repetitive elements we could not resolve. The nonrepetitive fraction of the genome assembly is estimated to be 566 Mb. Our DNA sequencing-based analyses suggest there is a 14-fold difference in genome size between the six members of Copepoda with available genomic information. This finding complements nucleus staining genome size estimations, where 100-fold difference has been reported within 70 species. We briefly analyze the repeat structure in the existing copepod whole genome sequence data sets. The information presented here confirms the evolution of genome size in Copepoda and expands the scope for evolutionary inferences in Copepoda by providing several levels of genetic information from a key planktonic crustacean species.
  7. Jorquera R, González C, Clausen PTLC, Petersen B, Holmes DS
    Database (Oxford), 2021 01 28;2021.
    PMID: 33507271 DOI: 10.1093/database/baab002
    Single-exon coding sequences (CDSs), also known as 'single-exon genes' (SEGs), are defined as nuclear, protein-coding genes that lack introns in their CDSs. They have been studied not only to determine their origin and evolution but also because their expression has been linked to several types of human cancers and neurological/developmental disorders, and many exhibit tissue-specific transcription. We developed SinEx DB that houses DNA and protein sequence information of SEGs from 10 mammalian genomes including human. SinEx DB includes their functional predictions (KOG (euKaryotic Orthologous Groups)) and the relative distribution of these functions within species. Here, we report SinEx 2.0, a major update of SinEx DB that includes information of the occurrence, distribution and functional prediction of SEGs from 60 completely sequenced eukaryotic genomes, representing animals, fungi, protists and plants. The information is stored in a relational database built with MySQL Server 5.7, and the complete dataset of SEG sequences and their GO (Gene Ontology) functional assignations are available for downloading. SinEx DB 2.0 was built with a novel pipeline that helps disambiguate single-exon isoforms from SEGs. SinEx DB 2.0 is the largest available database for SEGs and provides a rich source of information for advancing our understanding of the evolution, function of SEGs and their associations with disorders including cancers and neurological and developmental diseases. Database URL: http://v2.sinex.cl/.
  8. Reeve AH, Gower G, Pujolar JM, Smith BT, Petersen B, Olsson U, et al.
    Evol Lett, 2023 Feb 01;7(1):24-36.
    PMID: 37065434 DOI: 10.1093/evlett/qrac006
    Tropical islands are renowned as natural laboratories for evolutionary study. Lineage radiations across tropical archipelagos are ideal systems for investigating how colonization, speciation, and extinction processes shape biodiversity patterns. The expansion of the island thrush across the Indo-Pacific represents one of the largest yet most perplexing island radiations of any songbird species. The island thrush exhibits a complex mosaic of pronounced plumage variation across its range and is arguably the world's most polytypic bird. It is a sedentary species largely restricted to mountain forests, yet it has colonized a vast island region spanning a quarter of the globe. We conducted a comprehensive sampling of island thrush populations and obtained genome-wide SNP data, which we used to reconstruct its phylogeny, population structure, gene flow, and demographic history. The island thrush evolved from migratory Palearctic ancestors and radiated explosively across the Indo-Pacific during the Pleistocene, with numerous instances of gene flow between populations. Its bewildering plumage variation masks a biogeographically intuitive stepping stone colonization path from the Philippines through the Greater Sundas, Wallacea, and New Guinea to Polynesia. The island thrush's success in colonizing Indo-Pacific mountains can be understood in light of its ancestral mobility and adaptation to cool climates; however, shifts in elevational range, degree of plumage variation and apparent dispersal rates in the eastern part of its range raise further intriguing questions about its biology.
  9. Sinding MS, Gopalakrishan S, Vieira FG, Samaniego Castruita JA, Raundrup K, Heide Jørgensen MP, et al.
    PLoS Genet, 2018 11;14(11):e1007745.
    PMID: 30419012 DOI: 10.1371/journal.pgen.1007745
    North America is currently home to a number of grey wolf (Canis lupus) and wolf-like canid populations, including the coyote (Canis latrans) and the taxonomically controversial red, Eastern timber and Great Lakes wolves. We explored their population structure and regional gene flow using a dataset of 40 full genome sequences that represent the extant diversity of North American wolves and wolf-like canid populations. This included 15 new genomes (13 North American grey wolves, 1 red wolf and 1 Eastern timber/Great Lakes wolf), ranging from 0.4 to 15x coverage. In addition to providing full genome support for the previously proposed coyote-wolf admixture origin for the taxonomically controversial red, Eastern timber and Great Lakes wolves, the discriminatory power offered by our dataset suggests all North American grey wolves, including the Mexican form, are monophyletic, and thus share a common ancestor to the exclusion of all other wolves. Furthermore, we identify three distinct populations in the high arctic, one being a previously unidentified "Polar wolf" population endemic to Ellesmere Island and Greenland. Genetic diversity analyses reveal particularly high inbreeding and low heterozygosity in these Polar wolves, consistent with long-term isolation from the other North American wolves.
  10. Mutusamy P, Banga Singh KK, Su Yin L, Petersen B, Sicheritz-Ponten T, Clokie MRJ, et al.
    Int J Mol Sci, 2023 Feb 12;24(4).
    PMID: 36835084 DOI: 10.3390/ijms24043678
    Salmonella infections across the globe are becoming more challenging to control due to the emergence of multidrug-resistant (MDR) strains. Lytic phages may be suitable alternatives for treating these multidrug-resistant Salmonella infections. Most Salmonella phages to date were collected from human-impacted environments. To further explore the Salmonella phage space, and to potentially identify phages with novel characteristics, we characterized Salmonella-specific phages isolated from the Penang National Park, a conserved rainforest. Four phages with a broad lytic spectrum (kills >5 Salmonella serovars) were further characterized; they have isometric heads and cone-shaped tails, and genomes of ~39,900 bp, encoding 49 CDSs. As the genomes share a <95% sequence similarity to known genomes, the phages were classified as a new species within the genus Kayfunavirus. Interestingly, the phages displayed obvious differences in their lytic spectrum and pH stability, despite having a high sequence similarity (~99% ANI). Subsequent analysis revealed that the phages differed in the nucleotide sequence in the tail spike proteins, tail tubular proteins, and portal proteins, suggesting that the SNPs were responsible for their differing phenotypes. Our findings highlight the diversity of novel Salmonella bacteriophages from rainforest regions, which can be explored as an antimicrobial agent against MDR-Salmonella strains.
  11. Ramos-Madrigal J, Runge AKW, Bouby L, Lacombe T, Samaniego Castruita JA, Adam-Blondon AF, et al.
    Nat Plants, 2019 Jun;5(6):595-603.
    PMID: 31182840 DOI: 10.1038/s41477-019-0437-5
    The Eurasian grapevine (Vitis vinifera) has long been important for wine production as well as being a food source. Despite being clonally propagated, modern cultivars exhibit great morphological and genetic diversity, with thousands of varieties described in historic and contemporaneous records. Through historical accounts, some varieties can be traced to the Middle Ages, but the genetic relationships between ancient and modern vines remain unknown. We present target-enriched genome-wide sequencing data from 28 archaeological grape seeds dating to the Iron Age, Roman era and medieval period. When compared with domesticated and wild accessions, we found that the archaeological samples were closely related to western European cultivars used for winemaking today. We identified seeds with identical genetic signatures present at different Roman sites, as well as seeds sharing parent-offspring relationships with varieties grown today. Furthermore, we discovered that one seed dated to ~1100 CE was a genetic match to 'Savagnin Blanc', providing evidence for 900 years of uninterrupted vegetative propagation.
  12. Høie MH, Kiehl EN, Petersen B, Nielsen M, Winther O, Nielsen H, et al.
    Nucleic Acids Res, 2022 Jul 05;50(W1):W510-W515.
    PMID: 35648435 DOI: 10.1093/nar/gkac439
    Recent advances in machine learning and natural language processing have made it possible to profoundly advance our ability to accurately predict protein structures and their functions. While such improvements are significantly impacting the fields of biology and biotechnology at large, such methods have the downside of high demands in terms of computing power and runtime, hampering their applicability to large datasets. Here, we present NetSurfP-3.0, a tool for predicting solvent accessibility, secondary structure, structural disorder and backbone dihedral angles for each residue of an amino acid sequence. This NetSurfP update exploits recent advances in pre-trained protein language models to drastically improve the runtime of its predecessor by two orders of magnitude, while displaying similar prediction performance. We assessed the accuracy of NetSurfP-3.0 on several independent test datasets and found it to consistently produce state-of-the-art predictions for each of its output features, with a runtime that is up to 600 times faster than the most commonly available methods performing the same tasks. The tool is freely available as a web server with a user-friendly interface to navigate the results, as well as a standalone downloadable package.
  13. Klausen MS, Jespersen MC, Nielsen H, Jensen KK, Jurtz VI, Sønderby CK, et al.
    Proteins, 2019 06;87(6):520-527.
    PMID: 30785653 DOI: 10.1002/prot.25674
    The ability to predict local structural features of a protein from the primary sequence is of paramount importance for unraveling its function in absence of experimental structural information. Two main factors affect the utility of potential prediction tools: their accuracy must enable extraction of reliable structural information on the proteins of interest, and their runtime must be low to keep pace with sequencing data being generated at a constantly increasing speed. Here, we present NetSurfP-2.0, a novel tool that can predict the most important local structural features with unprecedented accuracy and runtime. NetSurfP-2.0 is sequence-based and uses an architecture composed of convolutional and long short-term memory neural networks trained on solved protein structures. Using a single integrated model, NetSurfP-2.0 predicts solvent accessibility, secondary structure, structural disorder, and backbone dihedral angles for each residue of the input sequences. We assessed the accuracy of NetSurfP-2.0 on several independent test datasets and found it to consistently produce state-of-the-art predictions for each of its output features. We observe a correlation of 80% between predictions and experimental data for solvent accessibility, and a precision of 85% on secondary structure 3-class predictions. In addition to improved accuracy, the processing time has been optimized to allow predicting more than 1000 proteins in less than 2 hours, and complete proteomes in less than 1 day.
  14. Westbury MV, Petersen B, Garde E, Heide-Jørgensen MP, Lorenzen ED
    iScience, 2019 Apr 08.
    PMID: 31054839 DOI: 10.1016/j.isci.2019.03.023
    The narwhal (Monodon monoceros) is a highly specialized endemic Arctic cetacean, restricted to the Arctic seas bordering the North Atlantic. Low levels of genetic diversity have been observed across several narwhal populations using mitochondrial DNA and microsatellites. Despite this, the global abundance of narwhals was recently estimated at ∼170,000 individuals. However, the species is still considered vulnerable to changing climates due to its high specialization and restricted Arctic distribution. We assembled and annotated a genome from a narwhal from West Greenland. We find relatively low diversity at the genomic scale and show that this did not arise by recent inbreeding, but rather has been stable over an extended evolutionary timescale. We also find that the current large global abundance most likely reflects a recent rapid expansion from a much smaller founding population.
  15. Rey-Iglesia A, Gopalakrishan S, Carøe C, Alquezar-Planas DE, Ahlmann Nielsen A, Röder T, et al.
    Mol Ecol Resour, 2019 Mar;19(2):512-525.
    PMID: 30575257 DOI: 10.1111/1755-0998.12984
    In recent years, the availability of reduced representation library (RRL) methods has catalysed an expansion of genome-scale studies to characterize both model and non-model organisms. Most of these methods rely on the use of restriction enzymes to obtain DNA sequences at a genome-wide level. These approaches have been widely used to sequence thousands of markers across individuals for many organisms at a reasonable cost, revolutionizing the field of population genomics. However, there are still some limitations associated with these methods, in particular the high molecular weight DNA required as starting material, the reduced number of common loci among investigated samples, and the short length of the sequenced site-associated DNA. Here, we present MobiSeq, an RRL protocol exploiting simple laboratory techniques, that generates genomic data based on PCR targeted enrichment of transposable elements and the sequencing of the associated flanking region. We validate its performance across 103 DNA extracts derived from three mammalian species: grey wolf (Canis lupus), red deer complex (Cervus sp.) and brown rat (Rattus norvegicus). MobiSeq enables the sequencing of hundreds of thousands of loci across the genome and performs SNP discovery with relatively low rates of clonality. Given the ease and flexibility of the MobiSeq protocol, the method has the potential to be implemented for marker discovery and population genomics across a wide range of organisms, enabling the exploration of diverse evolutionary and conservation questions.
  16. Battlay P, Wilson J, Bieker VC, Lee C, Prapas D, Petersen B, et al.
    Nat Commun, 2023 Mar 27;14(1):1717.
    PMID: 36973251 DOI: 10.1038/s41467-023-37303-4
    Adaptation is the central feature and leading explanation for the evolutionary diversification of life. Adaptation is also notoriously difficult to study in nature, owing to its complexity and logistically prohibitive timescale. Here, we leverage extensive contemporary and historical collections of Ambrosia artemisiifolia, an aggressively invasive weed and primary cause of pollen-induced hayfever, to track the phenotypic and genetic causes of recent local adaptation across its native and invasive ranges in North America and Europe, respectively. Large haploblocks, indicative of chromosomal inversions, contain a disproportionate share (26%) of genomic regions conferring parallel adaptation to local climates between ranges, are associated with rapidly adapting traits, and exhibit dramatic frequency shifts over space and time. These results highlight the importance of large-effect standing variants in rapid adaptation, which have been critical to A. artemisiifolia's global spread across vast climatic gradients.
  17. Sinding MS, Ciucani MM, Ramos-Madrigal J, Carmagnini A, Rasmussen JA, Feng S, et al.
    iScience, 2021 Nov 19;24(11):103226.
    PMID: 34712923 DOI: 10.1016/j.isci.2021.103226
    The evolution of the genera Bos and Bison, and the nature of gene flow between wild and domestic species, is poorly understood, with genomic data of wild species being limited. We generated two genomes from the likely extinct kouprey (Bos sauveli) and analyzed them alongside other Bos and Bison genomes. We found that B. sauveli possessed genomic signatures characteristic of an independent species closely related to Bos javanicus and Bos gaurus. We found evidence for extensive incomplete lineage sorting across the three species, consistent with a polytomic diversification of the major ancestry in the group, potentially followed by secondary gene flow. Finally, we detected significant gene flow from an unsampled Asian Bos-like source into East Asian zebu cattle, demonstrating both that the full genomic diversity and evolutionary history of the Bos complex has yet to be elucidated and that museum specimens and ancient DNA are valuable resources to do so.
  18. Gopalakrishnan S, Sinding MS, Ramos-Madrigal J, Niemann J, Samaniego Castruita JA, Vieira FG, et al.
    Curr Biol, 2018 11 05;28(21):3441-3449.e5.
    PMID: 30344120 DOI: 10.1016/j.cub.2018.08.041
    The evolutionary history of the wolf-like canids of the genus Canis has been heavily debated, especially regarding the number of distinct species and their relationships at the population and species level [1-6]. We assembled a dataset of 48 resequenced genomes spanning all members of the genus Canis except the black-backed and side-striped jackals, encompassing the global diversity of seven extant canid lineages. This includes eight new genomes, including the first resequenced Ethiopian wolf (Canis simensis), one dhole (Cuon alpinus), two East African hunting dogs (Lycaon pictus), two Eurasian golden jackals (Canis aureus), and two Middle Eastern gray wolves (Canis lupus). The relationships between the Ethiopian wolf, African golden wolf, and golden jackal were resolved. We highlight the role of interspecific hybridization in the evolution of this charismatic group. Specifically, we find gene flow between the ancestors of the dhole and African hunting dog and admixture between the gray wolf, coyote (Canis latrans), golden jackal, and African golden wolf. Additionally, we report gene flow from gray and Ethiopian wolves to the African golden wolf, suggesting that the African golden wolf originated through hybridization between these species. Finally, we hypothesize that coyotes and gray wolves carry genetic material derived from a "ghost" basal canid lineage.
  19. Jorquera R, González C, Clausen P, Petersen B, Holmes DS
    Database (Oxford), 2018 01 01;2018:1-6.
    PMID: 30239665 DOI: 10.1093/database/bay089
    Efficient extraction of knowledge from biological data requires the development of structured vocabularies to unambiguously define biological terms. This paper proposes descriptions and definitions to disambiguate the term 'single-exon gene'. Eukaryotic Single-Exon Genes (SEGs) have been defined as genes that do not have introns in their protein coding sequences. They have been studied not only to determine their origin and evolution but also because their expression has been linked to several types of human cancer and neurological/developmental disorders and many exhibit tissue-specific transcription. Unfortunately, the term 'SEGs' is rife with ambiguity, leading to biological misinterpretations. In the classic definition, no distinction is made between SEGs that harbor introns in their untranslated regions (UTRs) versus those without. This distinction is important to make because the presence of introns in UTRs affects transcriptional regulation and post-transcriptional processing of the mRNA. In addition, recent whole-transcriptome shotgun sequencing has led to the discovery of many examples of single-exon mRNAs that arise from alternative splicing of multi-exon genes; these single-exon isoforms are being confused with SEGs despite their clearly different origin. The increasing expansion of RNA-seq datasets makes it imperative to distinguish the different SEG types before annotation errors become indelibly propagated in biological databases. This paper develops a structured vocabulary for their disambiguation, allowing a major reassessment of their evolutionary trajectories, regulation, RNA processing and transport, and provides the opportunity to improve the detection of gene associations with disorders including cancers, neurological and developmental diseases.
  20. Renaud G, Petersen B, Seguin-Orlando A, Bertelsen MF, Waller A, Newton R, et al.
    Sci Adv, 2018 04;4(4):eaaq0392.
    PMID: 29740610 DOI: 10.1126/sciadv.aaq0392
    Donkeys and horses share a common ancestor dating back to about 4 million years ago. Although a high-quality genome assembly at the chromosomal level is available for the horse, current assemblies available for the donkey are limited to moderately sized scaffolds. The absence of a better-quality assembly for the donkey has hampered studies involving the characterization of patterns of genetic variation at the genome-wide scale. These range from the application of genomic tools to selective breeding and conservation to the more fundamental characterization of the genomic loci underlying speciation and domestication. We present a new high-quality donkey genome assembly obtained using the Chicago HiRise assembly technology, providing scaffolds of subchromosomal size. We make use of this new assembly to obtain more accurate measures of heterozygosity for equine species other than the horse, both genome-wide and locally, and to detect runs of homozygosity potentially pertaining to positive selection in domestic donkeys. Finally, this new assembly allowed us to identify fine-scale chromosomal rearrangements between the horse and the donkey that likely played an active role in their divergence and, ultimately, speciation.