Durian (Durio zibethinus) is a Southeast Asian tropical plant known for its hefty, spine-covered fruit and sulfury and onion-like odor. Here we present a draft genome assembly of D. zibethinus, representing the third plant genus in the Malvales order and first in the Helicteroideae subfamily to be sequenced. Single-molecule sequencing and chromosome contact maps enabled assembly of the highly heterozygous durian genome at chromosome-scale resolution. Transcriptomic analysis showed upregulation of sulfur-, ethylene-, and lipid-related pathways in durian fruits. We observed paleopolyploidization events shared by durian and cotton and durian-specific gene expansions in MGL (methionine γ-lyase), associated with production of volatile sulfur compounds (VSCs). MGL and the ethylene-related gene ACS (aminocyclopropane-1-carboxylic acid synthase) were upregulated in fruits concomitantly with their downstream metabolites (VSCs and ethylene), suggesting a potential association between ethylene biosynthesis and methionine regeneration via the Yang cycle. The durian genome provides a resource for tropical fruit biology and agronomy.
Noncoding repeat expansions cause various neuromuscular diseases, including myotonic dystrophies, fragile X tremor/ataxia syndrome, some spinocerebellar ataxias, amyotrophic lateral sclerosis and benign adult familial myoclonic epilepsies. Inspired by the striking similarities in the clinical and neuroimaging findings between neuronal intranuclear inclusion disease (NIID) and fragile X tremor/ataxia syndrome caused by noncoding CGG repeat expansions in FMR1, we directly searched for repeat expansion mutations and identified noncoding CGG repeat expansions in NBPF19 (NOTCH2NLC) as the causative mutations for NIID. Further prompted by the similarities in the clinical and neuroimaging findings with NIID, we identified similar noncoding CGG repeat expansions in two other diseases: oculopharyngeal myopathy with leukoencephalopathy and oculopharyngodistal myopathy, in LOC642361/NUTM2B-AS1 and LRP12, respectively. These findings expand our knowledge of the clinical spectra of diseases caused by expansions of the same repeat motif, and further highlight how directly searching for expanded repeats can help identify mutations underlying diseases.
Galloway-Mowat syndrome (GAMOS) is an autosomal-recessive disease characterized by the combination of early-onset steroid-resistant nephrotic syndrome (SRNS) and microcephaly with brain anomalies. Here we identified recessive mutations in OSGEP, TP53RK, TPRKB, and LAGE3, genes encoding the four subunits of the KEOPS complex, in 37 individuals from 32 families with GAMOS. CRISPR-Cas9 knockout in zebrafish and mice recapitulated the human phenotype of primary microcephaly and resulted in early lethality. Knockdown of OSGEP, TP53RK, or TPRKB inhibited cell proliferation, which human mutations did not rescue. Furthermore, knockdown of these genes impaired protein translation, caused endoplasmic reticulum stress, activated DNA-damage-response signaling, and ultimately induced apoptosis. Knockdown of OSGEP or TP53RK induced defects in the actin cytoskeleton and decreased the migration rate of human podocytes, an established intermediate phenotype of SRNS. We thus identify four new monogenic causes of GAMOS, describe a link between KEOPS function and human disease, and delineate potential pathogenic mechanisms.
We conducted a combined genome-wide association study (GWAS) of 7,481 individuals with bipolar disorder (cases) and 9,250 controls as part of the Psychiatric GWAS Consortium. Our replication study tested 34 SNPs in 4,496 independent cases with bipolar disorder and 42,422 independent controls and found that 18 of 34 SNPs had P < 0.05, with 31 of 34 SNPs having signals with the same direction of effect (P = 3.8 × 10⁻⁷). An analysis of all 11,974 bipolar disorder cases and 51,792 controls confirmed genome-wide significant evidence of association for CACNA1C and identified a new intronic variant in ODZ4. We identified a pathway composed of subunits of calcium channels enriched in bipolar disorder association intervals. Finally, a combined GWAS analysis of schizophrenia and bipolar disorder yielded strong association evidence for SNPs in CACNA1C and in the region of NEK4-ITIH1-ITIH3-ITIH4. Our replication results imply that increasing sample sizes in bipolar disorder will confirm many additional loci.
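The direction-of-effect result in the bipolar disorder replication can be checked with a one-sided binomial sign test. The sketch below assumes that, under the null hypothesis, each of the 34 replication SNPs is equally likely to match or oppose the discovery direction and that the SNPs are independent; the function name `sign_test_p` is illustrative and does not come from the study.

```python
from math import comb


def sign_test_p(n_same: int, n_total: int) -> float:
    """One-sided sign test: P(X >= n_same) for X ~ Binomial(n_total, 0.5)."""
    # Count the outcomes with at least n_same concordant SNPs (assumed independent).
    tail = sum(comb(n_total, k) for k in range(n_same, n_total + 1))
    return tail / 2 ** n_total


# 31 of 34 SNPs shared the discovery direction of effect.
p_value = sign_test_p(31, 34)
print(f"{p_value:.2e}")
```

Under these assumptions the tail probability is about 3.8 × 10⁻⁷, in line with the reported value.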
Most common breast cancer susceptibility variants have been identified through genome-wide association studies (GWAS) of predominantly estrogen receptor (ER)-positive disease. We conducted a GWAS using 21,468 ER-negative cases and 100,594 controls combined with 18,908 BRCA1 mutation carriers (9,414 with breast cancer), all of European origin. We identified independent associations at P < 5 × 10⁻⁸ with ten variants at nine new loci. At P < 0.05, we replicated associations with 10 of 11 variants previously reported in ER-negative disease or BRCA1 mutation carrier GWAS and observed consistent associations with ER-negative disease for 105 susceptibility variants identified by other studies. These 125 variants explain approximately 16% of the familial risk of this breast cancer subtype. There was high genetic correlation (0.72) between risk of ER-negative breast cancer and breast cancer risk for BRCA1 mutation carriers. These findings may lead to improved risk prediction and inform further fine-mapping and functional work to better understand the biological basis of ER-negative breast cancer.
Genome-wide association studies (GWAS) have identified 12 epithelial ovarian cancer (EOC) susceptibility alleles. The pattern of association at these loci is consistent in BRCA1 and BRCA2 mutation carriers who are at high risk of EOC. After imputation to 1000 Genomes Project data, we assessed associations of 11 million genetic variants with EOC risk from 15,437 cases unselected for family history and 30,845 controls and from 15,252 BRCA1 mutation carriers and 8,211 BRCA2 mutation carriers (3,096 with ovarian cancer), and we combined the results in a meta-analysis. This new study design yielded increased statistical power, leading to the discovery of six new EOC susceptibility loci. Variants at 1p36 (nearest gene, WNT4), 4q26 (SYNPO2), 9q34.2 (ABO) and 17q11.2 (ATAD5) were associated with EOC risk, and at 1p34.3 (RSPO1) and 6p22.1 (GPX6) variants were specifically associated with the serous EOC subtype, all with P < 5 × 10⁻⁸. Incorporating these variants into risk assessment tools will improve clinical risk predictions for BRCA1 and BRCA2 mutation carriers.
Autism spectrum disorder (ASD) is a highly heritable and heterogeneous group of neurodevelopmental phenotypes diagnosed in more than 1% of children. Common genetic variants contribute substantially to ASD susceptibility, but to date no individual variants have been robustly associated with ASD. With a marked sample-size increase from a unique Danish population resource, we report a genome-wide association meta-analysis of 18,381 individuals with ASD and 27,969 controls that identified five genome-wide-significant loci. Leveraging GWAS results from three phenotypes with significantly overlapping genetic architectures (schizophrenia, major depression, and educational attainment), we identified seven additional loci shared with other traits at equally strict significance levels. Dissecting the polygenic architecture, we found both quantitative and qualitative polygenic heterogeneity across ASD subtypes. These results highlight biological insights, particularly relating to neuronal function and corticogenesis, and establish that GWAS performed at scale will be much more productive in the near term in ASD.
To identify common alleles associated with different histotypes of epithelial ovarian cancer (EOC), we pooled data from multiple genome-wide genotyping projects totaling 25,509 EOC cases and 40,941 controls. We identified nine new susceptibility loci for different EOC histotypes: six for serous EOC histotypes (3q28, 4q32.3, 8q21.11, 10q24.33, 18q11.2 and 22q12.1), two for mucinous EOC (3q22.3 and 9q31.1) and one for endometrioid EOC (5q12.3). We then performed meta-analysis on the results for high-grade serous ovarian cancer with the results from analysis of 31,448 BRCA1 and BRCA2 mutation carriers, including 3,887 mutation carriers with EOC. This identified three additional susceptibility loci at 2q13, 8q24.1 and 12q24.31. Integrated analyses of genes and regulatory biofeatures at each locus predicted candidate susceptibility genes, including OBFC1, a new candidate susceptibility gene for low-grade and borderline serous EOC.
Transforming growth factor (TGF)-β1 (encoded by TGFB1) is the prototypic member of the TGF-β family of 33 proteins that orchestrate embryogenesis, development and tissue homeostasis [1,2]. Following its discovery [3], enormous interest and numerous controversies have emerged about the role of TGF-β in coordinating the balance of pro- and anti-oncogenic properties [4,5], pro- and anti-inflammatory effects [6], or pro- and anti-fibrinogenic characteristics [7]. Here we describe three individuals from two pedigrees with biallelic loss-of-function mutations in the TGFB1 gene who presented with severe infantile inflammatory bowel disease (IBD) and central nervous system (CNS) disease associated with epilepsy, brain atrophy and posterior leukoencephalopathy. The proteins encoded by the mutated TGFB1 alleles were characterized by impaired secretion, function or stability of the TGF-β1-LAP complex, which is suggestive of perturbed bioavailability of TGF-β1. Our study shows that TGF-β1 has a critical and nonredundant role in the development and homeostasis of intestinal immunity and the CNS in humans.
Systemic lupus erythematosus (SLE) has a strong but incompletely understood genetic architecture. We conducted an association study with replication in 4,478 SLE cases and 12,656 controls from six East Asian cohorts to identify new SLE susceptibility loci and better localize known loci. We identified ten new loci and confirmed 20 known loci with genome-wide significance. Among the new loci, the most significant locus was GTF2IRD1-GTF2I at 7q11.23 (rs73366469, Pmeta = 3.75 × 10⁻¹¹⁷, odds ratio (OR) = 2.38), followed by DEF6, IL12B, TCF7, TERT, CD226, PCNXL3, RASGRP1, SYNGR1 and SIGLEC6. We identified the most likely functional variants at each locus by analyzing epigenetic marks and gene expression data. Ten candidate variants are known to alter gene expression in cis or in trans. Enrichment analysis highlights the importance of these loci in B cell and T cell biology. The new loci, together with previously known loci, increase the explained heritability of SLE to 24%. The new loci share functional and ontological characteristics with previously reported loci and are possible drug targets for SLE therapeutics.
The widespread distribution and relapsing nature of Plasmodium vivax infection present major challenges for the elimination of malaria. To characterize the genetic diversity of this parasite in individual infections and across the population, we performed deep genome sequencing of >200 clinical samples collected across the Asia-Pacific region and analyzed data on >300,000 SNPs and nine regions of the genome with large copy number variations. Individual infections showed complex patterns of genetic structure, with variation not only in the number of dominant clones but also in their level of relatedness and inbreeding. At the population level, we observed strong signals of recent evolutionary selection both in known drug resistance genes and at new loci, and these varied markedly between geographical locations. These findings demonstrate a dynamic landscape of local evolutionary adaptation in the parasite population and provide a foundation for genomic surveillance to guide effective strategies for control and elimination of P. vivax.
Primary angle closure glaucoma (PACG) is a major cause of blindness worldwide. We conducted a genome-wide association study (GWAS) followed by replication in a combined total of 10,503 PACG cases and 29,567 controls drawn from 24 countries across Asia, Australia, Europe, North America, and South America. We observed significant evidence of disease association at five new genetic loci upon meta-analysis of all patient collections. These loci are at EPDR1 rs3816415 (odds ratio (OR) = 1.24, P = 5.94 × 10⁻¹⁵), CHAT rs1258267 (OR = 1.22, P = 2.85 × 10⁻¹⁶), GLIS3 rs736893 (OR = 1.18, P = 1.43 × 10⁻¹⁴), FERMT2 rs7494379 (OR = 1.14, P = 3.43 × 10⁻¹¹), and DPM2-FAM102A rs3739821 (OR = 1.15, P = 8.32 × 10⁻¹²). We also confirmed significant association at three previously described loci (P < 5 × 10⁻⁸ for each sentinel SNP at PLEKHA7, COL11A1, and PCMTD1-ST18), providing new insights into the biology of PACG.
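When only an odds ratio and a P value are reported, as for the PACG loci above, an approximate confidence interval can be back-calculated. The sketch below assumes that the P value comes from a two-sided Wald test on the log odds ratio, which the abstract does not state, and it works from rounded published values, so the result is indicative only. The function name `wald_ci_from_or_and_p` is hypothetical.

```python
from math import exp, log
from statistics import NormalDist


def wald_ci_from_or_and_p(odds_ratio: float, p_value: float, level: float = 0.95):
    """Approximate |z|, SE of log(OR) and a confidence interval for the OR.

    Assumes a two-sided Wald test on log(OR); this is an illustrative
    assumption, not a method described in the abstract.
    """
    std_normal = NormalDist()
    z = -std_normal.inv_cdf(p_value / 2)         # |z| implied by the two-sided P value
    se = abs(log(odds_ratio)) / z                # standard error of log(OR)
    crit = std_normal.inv_cdf(0.5 + level / 2)   # about 1.96 for a 95% interval
    lower = exp(log(odds_ratio) - crit * se)
    upper = exp(log(odds_ratio) + crit * se)
    return z, se, lower, upper


# Reported values for EPDR1 rs3816415: OR = 1.24, P = 5.94e-15.
z, se, lower, upper = wald_ci_from_or_and_p(1.24, 5.94e-15)
print(f"z = {z:.2f}, SE = {se:.4f}, {lower:.2f} to {upper:.2f}")
```

The same call can be repeated for the other sentinel SNPs. Working on the log scale keeps the interval symmetric around log(OR), which is the usual convention for odds ratios.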
Breast cancer susceptibility variants frequently show heterogeneity in associations by tumor subtype [1-3]. To identify novel loci, we performed a genome-wide association study including 133,384 breast cancer cases and 113,789 controls, plus 18,908 BRCA1 mutation carriers (9,414 with breast cancer) of European ancestry, using both standard and novel methodologies that account for underlying tumor heterogeneity by estrogen receptor, progesterone receptor and human epidermal growth factor receptor 2 status and tumor grade. We identified 32 novel susceptibility loci (P
Genome-wide association studies (GWAS) and large-scale replication studies have identified common variants in 79 loci associated with breast cancer, explaining ∼14% of the familial risk of the disease. To identify new susceptibility loci, we performed a meta-analysis of 11 GWAS, comprising 15,748 breast cancer cases and 18,084 controls together with 46,785 cases and 42,892 controls from 41 studies genotyped on a 211,155-marker custom array (iCOGS). Analyses were restricted to women of European ancestry. We generated genotypes for more than 11 million SNPs by imputation using the 1000 Genomes Project reference panel, and we identified 15 new loci associated with breast cancer at P < 5 × 10⁻⁸. Combining association analysis with ChIP-seq chromatin binding data in mammary cell lines and ChIA-PET chromatin interaction data from ENCODE, we identified likely target genes in two regions: SETBP1 at 18q12.3 and RNF115 and PDZK1 at 1q21.1. One association appears to be driven by an amino acid substitution encoded in EXO1.
In a three-stage genome-wide association study among East Asian women including 22,780 cases and 24,181 controls, we identified 3 genetic loci newly associated with breast cancer risk, including rs4951011 at 1q32.1 (in intron 2 of the ZC3H11A gene; P = 8.82 × 10⁻⁹), rs10474352 at 5q14.3 (near the ARRDC3 gene; P = 1.67 × 10⁻⁹) and rs2290203 at 15q26.1 (in intron 14 of the PRC1 gene; P = 4.25 × 10⁻⁸). We replicated these associations in 16,003 cases and 41,335 controls of European ancestry (P = 0.030, 0.004 and 0.010, respectively). Data from the ENCODE Project suggest that variants rs4951011 and rs10474352 might be located in an enhancer region and transcription factor binding sites, respectively. This study provides additional insights into the genetics and biology of breast cancer.
Primary angle closure glaucoma (PACG) is a major cause of blindness worldwide. We conducted a genome-wide association study including 1,854 PACG cases and 9,608 controls across 5 sample collections in Asia. Replication experiments were conducted in 1,917 PACG cases and 8,943 controls collected from a further 6 sample collections. We report significant associations at three new loci: rs11024102 in PLEKHA7 (per-allele odds ratio (OR) = 1.22; P = 5.33 × 10⁻¹²), rs3753841 in COL11A1 (per-allele OR = 1.20; P = 9.22 × 10⁻¹⁰) and rs1015213 located between PCMTD1 and ST18 on chromosome 8q (per-allele OR = 1.50; P = 3.29 × 10⁻⁹). Our findings, accumulated across these independent worldwide collections, suggest possible mechanisms explaining the pathogenesis of PACG.
We conducted a genome-wide association study of oral cavity and pharyngeal cancer in 6,034 cases and 6,585 controls from Europe, North America and South America. We detected eight significantly associated loci (P < 5 × 10⁻⁸), seven of which are new for these cancer sites. Oral and pharyngeal cancers combined were associated with loci at 6p21.32 (rs3828805, HLA-DQB1), 10q26.13 (rs201982221, LHPP) and 11p15.4 (rs1453414, OR52N2-TRIM5). Oral cancer was associated with two new regions, 2p23.3 (rs6547741, GPN1) and 9q34.12 (rs928674, LAMC3), and with known cancer-related loci at 9p21.3 (rs8181047, CDKN2B-AS1) and 5p15.33 (rs10462706, CLPTM1L). Oropharyngeal cancer associations were limited to the human leukocyte antigen (HLA) region, and classical HLA allele imputation showed a protective association with the class II haplotype HLA-DRB1*1301-HLA-DQA1*0103-HLA-DQB1*0603 (odds ratio (OR) = 0.59, P = 2.7 × 10⁻⁹). Stratified analyses on a subgroup of oropharyngeal cases with information available on human papillomavirus (HPV) status indicated that this association was considerably stronger in HPV-positive (OR = 0.23, P = 1.6 × 10⁻⁶) than in HPV-negative (OR = 0.75, P = 0.16) cancers.
Genome-wide association studies have identified breast cancer risk variants in over 150 genomic regions, but the mechanisms underlying risk remain largely unknown. These regions were explored by combining association analysis with in silico genomic feature annotations. We defined 205 independent risk-associated signals with the set of credible causal variants in each one. In parallel, we used a Bayesian approach (PAINTOR) that combines genetic association, linkage disequilibrium and enriched genomic features to determine variants with high posterior probabilities of being causal. Potentially causal variants were significantly over-represented in active gene regulatory regions and transcription factor binding sites. We applied our INQUISIT pipeline for prioritizing genes as targets of those potentially causal variants, using gene expression (expression quantitative trait loci), chromatin interaction and functional annotations. Known cancer drivers, transcription factors and genes in the developmental, apoptosis, immune system and DNA integrity checkpoint gene ontology pathways were over-represented among the highest-confidence target genes.
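To illustrate how posterior probabilities of causality can be turned into a set of credible causal variants for one signal, the sketch below builds a generic 95% credible set by ranking variants and accumulating posterior mass. This is a textbook construction offered as an assumption; it is neither PAINTOR nor the criteria the authors used, and the variant names and probabilities are hypothetical.

```python
def credible_set(posteriors: dict[str, float], coverage: float = 0.95) -> list[str]:
    """Smallest set of variants whose normalised posteriors sum to >= coverage."""
    total = sum(posteriors.values())
    # Rank variants from most to least probable.
    ranked = sorted(posteriors.items(), key=lambda item: item[1], reverse=True)
    selected: list[str] = []
    cumulative = 0.0
    for variant, prob in ranked:
        selected.append(variant)
        cumulative += prob / total  # normalise in case inputs do not sum to 1
        if cumulative >= coverage:
            break
    return selected


# Hypothetical posterior probabilities for one signal (not data from the study).
example = {"var_a": 0.62, "var_b": 0.25, "var_c": 0.09, "var_d": 0.03, "var_e": 0.01}
print(credible_set(example))
```

With these made-up inputs the first three variants already carry 96% of the posterior mass, so the set stops there. A signal dominated by a single variant would yield a set of size one.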
Coronary artery disease (CAD) is a leading cause of morbidity and mortality worldwide. Although 58 genomic regions have been associated with CAD thus far, most of the heritability is unexplained, indicating that additional susceptibility loci await identification. An efficient discovery strategy may be larger-scale evaluation of promising associations suggested by genome-wide association studies (GWAS). Hence, we genotyped 56,309 participants using a targeted gene array derived from earlier GWAS results and performed meta-analysis of results with 194,427 participants previously genotyped, totaling 88,192 CAD cases and 162,544 controls. We identified 25 new SNP-CAD associations (P < 5 × 10⁻⁸, in fixed-effects meta-analysis) from 15 genomic regions, including SNPs in or near genes involved in cellular adhesion, leukocyte migration and atherosclerosis (PECAM1, rs1867624), coagulation and inflammation (PROCR, rs867186 (p.Ser219Gly)) and vascular smooth muscle cell differentiation (LMOD1, rs2820315). Correlation of these regions with cell-type-specific gene expression and plasma protein levels sheds light on potential disease mechanisms.
Allelic heterogeneity in disease-causing genes presents a substantial challenge to the translation of genomic variation into clinical practice. Few of the almost 2,000 variants in the cystic fibrosis transmembrane conductance regulator gene CFTR have empirical evidence that they cause cystic fibrosis. To address this gap, we collected both genotype and phenotype data for 39,696 individuals with cystic fibrosis in registries and clinics in North America and Europe. In these individuals, 159 CFTR variants had an allele frequency of ≥0.01%. These variants were evaluated for both clinical severity and functional consequence, with 127 (80%) meeting both clinical and functional criteria consistent with disease. Assessment of disease penetrance in 2,188 fathers of individuals with cystic fibrosis enabled assignment of 12 of the remaining 32 variants as neutral, whereas the other 20 variants remained of indeterminate effect. This study illustrates that sourcing data directly from well-phenotyped subjects can address the gap in our ability to interpret clinically relevant genomic variation.
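The variant counts reported for CFTR can be reconciled with a short tally. The sketch below only restates the arithmetic of the abstract; the category labels are illustrative shorthand and not the study's terminology.

```python
# Counts taken from the abstract; the dictionary keys are illustrative labels.
evaluated = 159  # CFTR variants at or above the allele frequency cut-off
classified = {
    "disease_causing": 127,  # met both clinical and functional criteria
    "neutral": 12,           # assigned after penetrance analysis in fathers
    "indeterminate": 20,     # effect still unresolved
}

remaining = evaluated - classified["disease_causing"]
share_disease = classified["disease_causing"] / evaluated

print(remaining)                # variants left after the first evaluation
print(f"{share_disease:.1%}")   # fraction meeting both criteria
```

The three categories sum to the 159 evaluated variants, and 127 of 159 is just under 80%, consistent with the rounded percentage in the text.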