Whole-genome sequencing across multiple samples in a population provides an unprecedented opportunity for comprehensively characterizing the polymorphic variants in the population. Although the 1000 Genomes Project (1KGP) has offered brief insights into the value of population-level sequencing, the low coverage has compromised the ability to confidently detect rare and low-frequency variants. In addition, the composition of populations in the 1KGP is not complete, despite the fact that the study design has been extended to more than 2,500 samples from more than 20 population groups. The Malays are one of the Austronesian groups predominantly present in Southeast Asia and Oceania, and the Singapore Sequencing Malay Project (SSMP) aims to perform deep whole-genome sequencing of 100 healthy Malays. By sequencing at a minimum of 30× coverage, we have illustrated the higher sensitivity in detecting low-frequency and rare variants and the ability to investigate the presence of hotspots of functional mutations. Compared to the low-pass sequencing in the 1KGP, the deeper coverage allows more functional variants to be identified for each person. A comparison of the fidelity of genotype imputation of Malays indicated that a population-specific reference panel, such as the SSMP, outperforms a cosmopolitan panel with a larger number of individuals for common SNPs. For lower-frequency (<5%) markers, a larger number of individuals might have to be whole-genome sequenced so that the accuracy currently afforded by the 1KGP can be achieved. The SSMP data are expected to be the benchmark for evaluating the value of deep population-level sequencing versus low-pass sequencing, especially in populations that are poorly represented in population-genetics studies.
Tibetan high-altitude adaptation (HAA) has been studied extensively, and many candidate genes have been reported. Subsequent efforts targeting HAA functional variants, however, have not been that successful (e.g., no functional variant has been suggested for the top candidate HAA gene, EPAS1). With WinXPCNVer, a method developed in this study, we detected in microarray data a Tibetan-enriched deletion (TED) carried by 90% of Tibetans; 50% were homozygous for the deletion, whereas only 3% carried the TED and 0% carried the homozygous deletion in 2,792 worldwide samples (p < 10⁻¹⁵). We employed long PCR and Sanger sequencing technologies to determine the exact copy number and breakpoints of the TED in 70 additional Tibetan and 182 diverse samples. The TED had identical boundaries (chr2: 46,694,276-46,697,683; hg19) and was 80 kb downstream of EPAS1. Notably, the TED was in strong linkage disequilibrium (LD; r² = 0.8) with EPAS1 variants associated with reduced blood concentrations of hemoglobin. It was also in complete LD with the 5-SNP motif, which was suspected to be introgressed from Denisovans, but the deletion itself was absent from the Denisovan sequence. Correspondingly, we detected that footprints of positive selection for the TED occurred 12,803 (95% confidence interval = 12,075-14,725) years ago. We further whole-genome deep sequenced (>60×) seven Tibetans and verified the TED but failed to identify any other copy-number variations with comparable patterns, giving this TED top priority for further study. We speculate that the specific patterns of the TED resulted from its own functionality in HAA of Tibetans or LD with a functional variant of EPAS1.
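As a quick consistency check on the carrier figures in the preceding abstract, the deletion allele frequency among Tibetans can be derived from the reported 90% carrier and 50% homozygote proportions. The sketch below is illustrative only: the function names are hypothetical, and the Hardy-Weinberg comparison is an assumption that the abstract does not state.

```python
def allele_frequency(carrier_rate, homozygote_rate):
    """Deletion allele frequency from carrier and homozygote proportions."""
    heterozygote_rate = carrier_rate - homozygote_rate
    # Homozygotes carry two copies, heterozygotes carry one.
    return homozygote_rate + heterozygote_rate / 2


def hardy_weinberg(q):
    """Expected (homozygote, heterozygote, carrier) proportions for allele frequency q."""
    hom = q * q
    het = 2 * q * (1 - q)
    return hom, het, hom + het


# 90% carriers and 50% homozygotes are the figures reported for Tibetans.
ted_freq = allele_frequency(0.90, 0.50)

# Assumption: random mating (Hardy-Weinberg proportions), not stated in the abstract.
expected_hom, expected_het, expected_carrier = hardy_weinberg(ted_freq)

print(f"TED allele frequency: {ted_freq:.2f}")
print(f"Expected homozygotes: {expected_hom:.2f}, carriers: {expected_carrier:.2f}")
```

Under these assumptions the allele frequency is about 0.70, and the expected homozygote and carrier proportions (about 49% and 91%) sit close to the reported 50% and 90%.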
The jungle habitat of the Temuan aborigines harbors a variety of infectious diseases, the most notable being malaria. Our study of 15 genetic systems in the Temuan revealed substantial polymorphism and within-population genetic diversity. The polymorphisms for Hb beta, G6PD, and El are of interest in regard to genetic adaptation to malaria. Among the polymorphisms investigated we conclude that G6PD deficiency and elliptocytosis are likely to have malaria-resistant effects as evidenced by their low association with malarial parasitemia or their higher frequency in adults than in children. These findings suggest that the malarial habitat of the Temuans is livable in the long-range sense for them because of the cluster of malaria-resistant alleles in their gene pool (G6PD-, El, and possibly, but not tested here because of its low frequency, Hb beta E). The same condition probably holds for the Semai, the nearest aborigine neighbors of the Temuan (although the Semai have not been tested for malarial parasitemia and for these polymorphisms simultaneously), since the Semai have substantial Hb beta E, G6PD-, and El. The Temuan have a cultural identity system of rituals, beliefs, and certain aspects of language which effectively isolates them genetically from Malays and other nonaborigines. This system hinders the dilution of the malaria-resistant alleles of the Temuan gene pool with the malaria-susceptible alleles of the nonaborigine gene pools.
De novo variants (DNVs) cause many genetic diseases. When DNVs are examined in the whole coding regions of genes in next-generation sequencing analyses, pathogenic DNVs often cluster in a specific region. One such region is the last exon and the last 50 bp of the penultimate exon, where truncating DNVs cause escape from nonsense-mediated mRNA decay [NMD(-) region]. Such variants can have dominant-negative or gain-of-function effects. Here, we first developed a resource of rates of truncating DNVs in NMD(-) regions under the null model of DNVs. Utilizing this resource, we performed enrichment analysis of truncating DNVs in NMD(-) regions in 346 developmental and epileptic encephalopathy (DEE) trios. We observed statistically significant enrichment of truncating DNVs in semaphorin 6B (SEMA6B) (p value: 2.8 × 10⁻⁸; exome-wide threshold: 2.5 × 10⁻⁶). The initial analysis of the 346 individuals and additional screening of 1,406 and 4,293 independent individuals affected by DEE and developmental disorders collectively identified four truncating DNVs in the SEMA6B NMD(-) region in five individuals who came from unrelated families (p value: 1.9 × 10⁻¹³) and consistently showed progressive myoclonic epilepsy. RNA analysis of lymphoblastoid cells established from an affected individual showed that the mutant allele escaped NMD, indicating stable production of the truncated protein. Importantly, heterozygous truncating variants in the NMD(+) region of SEMA6B are observed in general populations, and SEMA6B is most likely loss-of-function tolerant. Zebrafish expressing truncating variants in the NMD(-) region of SEMA6B orthologs displayed defective development of brain neurons and enhanced pentylenetetrazole-induced seizure behavior. In summary, we show that truncating DNVs in the final exon of SEMA6B cause progressive myoclonic epilepsy.
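The enrichment analysis in the preceding abstract compares observed truncating DNVs against a null expectation. A minimal sketch of one common way to frame such a test is a Poisson upper-tail probability. The abstract does not specify the test statistic, so the Poisson model, the per-chromosome rate, and the observed count below are all assumptions made for illustration; only the trio count (346) comes from the abstract.

```python
import math


def poisson_upper_tail(k, lam, terms=100):
    """P(X >= k) for X ~ Poisson(lam), summed directly so tiny tails stay accurate."""
    term = math.exp(-lam) * lam ** k / math.factorial(k)  # P(X = k)
    total = 0.0
    for i in range(k, k + terms):
        total += term
        term *= lam / (i + 1)  # step from P(X = i) to P(X = i + 1)
    return total


N_TRIOS = 346                # trio count reported in the abstract
RATE_PER_CHROMOSOME = 1e-6   # hypothetical truncating-DNV rate for one NMD(-) region
OBSERVED_DNVS = 2            # hypothetical observed count, not from the abstract

# Each proband contributes two chromosomes on which a DNV could arise.
expected = 2 * N_TRIOS * RATE_PER_CHROMOSOME
p_value = poisson_upper_tail(OBSERVED_DNVS, expected)

# Assumption: a Bonferroni correction of 0.05 over roughly 20,000 genes.
exome_wide_threshold = 0.05 / 20_000

print(f"expected DNVs under the null: {expected:.2e}")
print(f"p value: {p_value:.2e} (threshold {exome_wide_threshold:.1e})")
```

A Bonferroni correction of 0.05 over 20,000 genes equals the stated exome-wide threshold of 2.5 × 10⁻⁶ numerically; whether the authors derived the threshold this way is an assumption.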
The identification of disease alleles underlying human autoinflammatory diseases can provide important insights into the mechanisms that maintain neutrophil homeostasis. Here, we focused our attention on generalized pustular psoriasis (GPP), a potentially life-threatening disorder presenting with cutaneous and systemic neutrophilia. Following the whole-exome sequencing of 19 unrelated affected individuals, we identified a subject harboring a homozygous splice-site mutation (c.2031-2A>C) in MPO. This encodes myeloperoxidase, an essential component of neutrophil azurophil granules. MPO screening in conditions phenotypically related to GPP uncovered further disease alleles in one subject with acral pustular psoriasis (c.2031-2A>C;c.2031-2A>C) and in two individuals with acute generalized exanthematous pustulosis (c.1705C>T;c.2031-2A>C and c.1552_1565del;c.1552_1565del). A subsequent analysis of UK Biobank data demonstrated that the c.2031-2A>C and c.1705C>T (p.Arg569Trp) disease alleles were also associated with increased neutrophil abundance in the general population (p = 5.1 × 10⁻⁶ and p = 3.6 × 10⁻⁵, respectively). The same applied to three further deleterious variants that had been genotyped in the cohort, with two alleles (c.995C>T [p.Ala332Val] and c.752T>C [p.Met251Thr]) yielding p values < 10⁻¹⁰. Finally, treatment of healthy neutrophils with an MPO inhibitor (4-aminobenzoic acid hydrazide) increased cell viability and delayed apoptosis, highlighting a mechanism whereby MPO mutations affect granulocyte numbers. These findings identify MPO as a genetic determinant of pustular skin disease and neutrophil abundance. Given the recent interest in the development of MPO antagonists for the treatment of neurodegenerative disease, our results also suggest that the pro-inflammatory effects of these agents should be closely monitored.
We ascertained a multi-generation Malaysian family with Joubert syndrome (JS). The presence of asymptomatic obligate carrier females suggested an X-linked recessive inheritance pattern. Affected males presented with mental retardation accompanied by postaxial polydactyly and retinitis pigmentosa. Brain MRIs showed the presence of a "molar tooth sign," which classifies this syndrome as classic JS with retinal involvement. Linkage analysis showed linkage to Xpter-Xp22.2 and a maximum LOD score of 2.06 for marker DXS8022. Mutation analysis revealed a frameshift mutation, p.K948NfsX8, in exon 21 of OFD1. In an isolated male with JS, a second frameshift mutation, p.E923KfsX3, in the same exon was identified. OFD1 has previously been associated with oral-facial-digital type 1 (OFD1) syndrome, a male-lethal X-linked dominant condition, and with X-linked recessive Simpson-Golabi-Behmel syndrome type 2 (SGBS2). In a yeast two-hybrid screen of a retinal cDNA library, we identified OFD1 as an interacting partner of the LCA5-encoded ciliary protein lebercilin. We show that X-linked recessive mutations in OFD1 reduce, but do not eliminate, the interaction with lebercilin, whereas X-linked dominant OFD1 mutations completely abolish binding to lebercilin. In addition, recessive mutations in OFD1 did not affect the pericentriolar localization of the recombinant protein in hTERT-RPE1 cells, whereas this localization was lost for dominant mutations. These findings offer a molecular explanation for the phenotypic spectrum observed for OFD1 mutations; this spectrum now includes OFD1 syndrome, SGBS2, and JS.
Previous research has shown that polygenic risk scores (PRSs) can be used to stratify women according to their risk of developing primary invasive breast cancer. This study aimed to evaluate the association between a recently validated PRS of 313 germline variants (PRS313) and contralateral breast cancer (CBC) risk. We included 56,068 women of European ancestry diagnosed with first invasive breast cancer from 1990 onward with follow-up from the Breast Cancer Association Consortium. Metachronous CBC risk (N = 1,027) according to the distribution of PRS313 was quantified using Cox regression analyses. We assessed PRS313 interaction with age at first diagnosis, family history, morphology, ER status, PR status, HER2 status, and (neo)adjuvant therapy. In studies of Asian women, with limited follow-up, CBC risk associated with PRS313 was assessed using logistic regression for 340 women with CBC compared with 12,133 women with unilateral breast cancer. Higher PRS313 was associated with increased CBC risk: hazard ratio (HR) per standard deviation (SD) = 1.25 (95% CI = 1.18-1.33) for Europeans, and odds ratio (OR) per SD = 1.15 (95% CI = 1.02-1.29) for Asians. The absolute lifetime risks of CBC, accounting for death as a competing risk, were 12.4% for European women at the 10th percentile and 20.5% at the 90th percentile of PRS313. We found no evidence of confounding by or interaction with individual characteristics, characteristics of the primary tumor, or treatment. The C-index for the PRS313 alone was 0.563 (95% CI = 0.547-0.586). In conclusion, PRS313 is an independent factor associated with CBC risk and can be incorporated into CBC risk prediction models to help improve stratification and optimize surveillance and treatment strategies.
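To make the per-SD hazard ratio in the preceding abstract more concrete, it can be translated into a relative hazard between two PRS313 percentiles. This is a rough sketch: it assumes that PRS313 is normally distributed and that the log hazard is linear in the score, neither of which is stated in the abstract, and the function name is hypothetical.

```python
import math
from statistics import NormalDist

HR_PER_SD = 1.25  # hazard ratio per SD of PRS313 for European women (from the abstract)


def relative_hazard(percentile_high, percentile_low, hr_per_sd=HR_PER_SD):
    """Relative hazard between two PRS percentiles under a normal, log-linear model."""
    standard_normal = NormalDist()
    z_high = standard_normal.inv_cdf(percentile_high)
    z_low = standard_normal.inv_cdf(percentile_low)
    # A log-linear effect scales the hazard by hr_per_sd for every SD of separation.
    return math.exp(math.log(hr_per_sd) * (z_high - z_low))


hazard_ratio_90_vs_10 = relative_hazard(0.90, 0.10)
print(f"Relative hazard, 90th vs 10th percentile: {hazard_ratio_90_vs_10:.2f}")
```

Under these assumptions the relative hazard between the 90th and 10th percentiles is roughly 1.8. This is not directly comparable to the ratio of the reported absolute lifetime risks (20.5% versus 12.4%), because those account for death as a competing risk and are cumulative risks rather than hazards.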
Whereas large-scale statistical analyses can robustly identify disease-gene relationships, they do not accurately capture genotype-phenotype correlations or disease mechanisms. We use multiple lines of independent evidence to show that different variant types in a single gene, SATB1, cause clinically overlapping but distinct neurodevelopmental disorders. Clinical evaluation of 42 individuals carrying SATB1 variants identified overt genotype-phenotype relationships, associated with different pathophysiological mechanisms, established by functional assays. Missense variants in the CUT1 and CUT2 DNA-binding domains result in stronger chromatin binding, increased transcriptional repression, and a severe phenotype. In contrast, variants predicted to result in haploinsufficiency are associated with a milder clinical presentation. A similarly mild phenotype is observed for individuals with premature protein truncating variants that escape nonsense-mediated decay, which are transcriptionally active but mislocalized in the cell. Our results suggest that in-depth mutation-specific genotype-phenotype studies are essential to capture full disease complexity and to explain phenotypic variability.
Progressive myoclonus epilepsies (PMEs) comprise a group of clinically and genetically heterogeneous rare diseases. Over 70% of PME cases can now be molecularly solved. Known PME genes encode a variety of proteins, many involved in lysosomal and endosomal function. We performed whole-exome sequencing (WES) in 84 (78 unrelated) unsolved PME-affected individuals, with or without additional family members, to discover novel causes. We identified likely disease-causing variants in 24 out of 78 (31%) unrelated individuals, despite previous genetic analyses. The diagnostic yield was significantly higher for individuals studied as trios or families (14/28) versus singletons (10/50) (OR = 3.9, p value = 0.01, Fisher's exact test). The 24 likely solved cases of PME involved 18 genes. First, we found and functionally validated five heterozygous variants in NUS1 and DHDDS and a homozygous variant in ALG10, with no previous disease associations. All three genes are involved in dolichol-dependent protein glycosylation, a pathway not previously implicated in PME. Second, we independently validate SEMA6B as a dominant PME gene in two unrelated individuals. Third, in five families, we identified variants in established PME genes; three with intronic or copy-number changes (CLN6, GBA, NEU1) and two very rare causes (ASAH1, CERS1). Fourth, we found a group of genes usually associated with developmental and epileptic encephalopathies, but here, remarkably, presenting as PME, with or without prior developmental delay. Our systematic analysis of these cases suggests that the small residuum of unsolved cases will most likely be a collection of very rare, genetically heterogeneous etiologies.
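The comparison of diagnostic yield in the preceding abstract (14/28 trios or families versus 10/50 singletons) can be reproduced with a two-sided Fisher's exact test. The sketch below implements the test with the standard library only. The function name is hypothetical, and the two-sided rule used here (summing every table no more probable than the observed one) is one common convention that the abstract does not specify.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def weight(k):
        # Number of tables with k in the top-left cell, all margins held fixed.
        return comb(row1, k) * comb(row2, col1 - k)

    observed = weight(a)
    k_min = max(0, col1 - row2)
    k_max = min(row1, col1)
    # Sum every table that is no more probable than the observed one.
    extreme = sum(weight(k) for k in range(k_min, k_max + 1) if weight(k) <= observed)
    return extreme / total


# Trios/families: 14 solved of 28; singletons: 10 solved of 50 (from the abstract).
solved_fam, unsolved_fam = 14, 28 - 14
solved_single, unsolved_single = 10, 50 - 10

# Simple cross-product (sample) odds ratio.
sample_odds_ratio = (solved_fam * unsolved_single) / (unsolved_fam * solved_single)
p_value = fisher_exact_two_sided(solved_fam, unsolved_fam, solved_single, unsolved_single)

print(f"sample odds ratio: {sample_odds_ratio:.1f}, two-sided p value: {p_value:.3f}")
```

The cross-product odds ratio is 4.0, slightly above the reported 3.9. The difference plausibly reflects a different estimator (for example, a conditional maximum-likelihood estimate), although the abstract does not say which was used.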
We describe four families with affected siblings showing unique clinical features: early-onset (before 1 year of age) progressive diffuse brain atrophy with regression, postnatal microcephaly, postnatal growth retardation, muscle weakness/atrophy, and respiratory failure. By whole-exome sequencing, we identified biallelic TBCD mutations in eight affected individuals from the four families. TBCD encodes TBCD (tubulin folding co-factor D), which is one of five tubulin-specific chaperones playing a pivotal role in microtubule assembly in all cells. A total of seven mutations were found: five missense mutations, one nonsense, and one splice site mutation resulting in a frameshift. In vitro cell experiments revealed the impaired binding between most mutant TBCD proteins and ARL2, TBCE, and β-tubulin. The in vivo experiments using olfactory projection neurons in Drosophila melanogaster indicated that the TBCD mutations caused loss of function. The wide range of clinical severity seen in this neurodegenerative encephalopathy may result from the residual function of mutant TBCD proteins. Furthermore, the autopsied brain from one deceased individual showed characteristic neurodegenerative findings: cactus and somatic sprout formations in the residual Purkinje cells in the cerebellum, which are also seen in some diseases associated with mitochondrial impairment. Defects of microtubule formation caused by TBCD mutations may underlie the pathomechanism of this neurodegenerative encephalopathy.
The ABO, MN, and Rh blood types, and the Hp, Tf, and Gm [Gm(a), Gm(x), Gm(b), and Gm-like] factors were determined for 128 unrelated Indians (parents of 65 families, 63 with two parents tested and two with one parent tested), and 90 unrelated Chinese (parents of 46 families, 44 with two parents tested and two with one parent tested), and for the offspring from these families. The frequencies of the several blood types are presented. The determinations were made primarily to aid in paternity testing. The frequencies compare favorably with the findings of previous studies. The allele Hp1 is rare in the Indian population (.09) and relatively infrequent in the Chinese (.29). Unfortunately, the data shed no light on the problem of the inheritance of the phenotype Hp O. Only Tf C was found among the Indians. About four per cent of the Chinese were heterozygous (Tf CD); all others were Tf CC. The Indians have a high frequency of Gm(a) and of Gm(x), and a low frequency of Gm(b). They appear to have alleles Gm^a, Gm^ax, and Gm^b in the following frequencies: .535, .234(5), and .230(5), respectively. Three families appear to have a Gm^xb allele, provided the offspring are not extra-marital. The Chinese appear to have the alleles Gm^ab, Gm^a, and Gm^ax in the following frequencies: .741, .231, and .028, respectively.
Genetic risk prediction has several potential applications in medical research and clinical practice and could be used, for example, to stratify a heterogeneous population of patients by their predicted genetic risk. However, for polygenic traits, such as psychiatric disorders, the accuracy of risk prediction is low. Here we use a multivariate linear mixed model and apply multi-trait genomic best linear unbiased prediction for genetic risk prediction. This method exploits correlations between disorders and simultaneously evaluates individual risk for each disorder. We show that the multivariate approach significantly increases the prediction accuracy for schizophrenia, bipolar disorder, and major depressive disorder in the discovery as well as in independent validation datasets. By grouping SNPs based on genome annotation and fitting multiple random effects, we show that the prediction accuracy could be further improved. The gain in prediction accuracy of the multivariate approach is equivalent to an increase in sample size of 34% for schizophrenia, 68% for bipolar disorder, and 76% for major depressive disorder using single-trait models. Because our approach can be readily applied to any number of GWAS datasets of correlated traits, it is a flexible and powerful tool to maximize prediction accuracy. With current sample sizes, risk predictors are not useful in a clinical setting but already are a valuable research tool, for example in experimental designs comparing cases with high and low polygenic risk.
RAC1 is a widely studied Rho GTPase, a class of molecules that modulate numerous cellular functions essential for normal development. RAC1 is highly conserved across species and is under strict mutational constraint. We report seven individuals with distinct de novo missense RAC1 mutations and varying degrees of developmental delay, brain malformations, and additional phenotypes. Four individuals, each harboring one of c.53G>A (p.Cys18Tyr), c.116A>G (p.Asn39Ser), c.218C>T (p.Pro73Leu), and c.470G>A (p.Cys157Tyr) variants, were microcephalic, with head circumferences between -2.5 and -5 SD. In contrast, two individuals with c.151G>A (p.Val51Met) and c.151G>C (p.Val51Leu) alleles were macrocephalic with head circumferences of +4.16 and +4.5 SD. One individual harboring a c.190T>G (p.Tyr64Asp) allele had head circumference in the normal range. Collectively, we observed an extraordinary spread of ∼10 SD of head circumferences orchestrated by distinct mutations in the same gene. In silico modeling, mouse fibroblast spreading assays, and in vivo overexpression assays using zebrafish as a surrogate model demonstrated that the p.Cys18Tyr and p.Asn39Ser RAC1 variants function as dominant-negative alleles and result in microcephaly, reduced neuronal proliferation, and cerebellar abnormalities in vivo. Conversely, the p.Tyr64Asp substitution is constitutively active. The remaining mutations are probably weakly dominant negative or their effects are context dependent. These findings highlight the importance of RAC1 in neuronal development. Along with TRIO and HACE1, a sub-category of rare developmental disorders is emerging with RAC1 as the central player. We show that ultra-rare disorders caused by private, non-recurrent missense mutations that result in varying phenotypes are challenging to dissect, but can be delineated through focused international collaboration.
Adaptor protein complex 1 (AP-1) is an evolutionarily conserved heterotetramer that promotes vesicular trafficking between the trans-Golgi network and the endosomes. The knockout of most murine AP-1 complex subunits is embryonically lethal, so the identification of human disease-associated alleles has the unique potential to deliver insights into gene function. Here, we report two founder mutations (c.11T>G [p.Phe4Cys] and c.97C>T [p.Arg33Trp]) in AP1S3, the gene encoding AP-1 complex subunit σ1C, in 15 unrelated individuals with a severe autoinflammatory skin disorder known as pustular psoriasis. Because the variants are predicted to destabilize the 3D structure of the AP-1 complex, we generated AP1S3-knockdown cell lines to investigate the consequences of AP-1 deficiency in skin keratinocytes. We found that AP1S3 silencing disrupted the endosomal translocation of the innate pattern-recognition receptor TLR-3 (Toll-like receptor 3) and resulted in a marked inhibition of downstream signaling. These findings identify pustular psoriasis as an autoinflammatory phenotype caused by defects in vesicular trafficking and demonstrate a requirement of AP-1 for Toll-like receptor homeostasis.
Leucine zipper-EF-hand containing transmembrane protein 1 (LETM1) encodes an inner mitochondrial membrane protein with an osmoregulatory function controlling mitochondrial volume and ion homeostasis. The putative association of LETM1 with a human disease was initially suggested in Wolf-Hirschhorn syndrome, a disorder that results from de novo monoallelic deletion of chromosome 4p16.3, a region encompassing LETM1. Utilizing exome sequencing and international gene-matching efforts, we have identified 18 affected individuals from 11 unrelated families harboring ultra-rare bi-allelic missense and loss-of-function LETM1 variants and clinical presentations highly suggestive of mitochondrial disease. These manifested as a spectrum of predominantly infantile-onset (14/18, 78%) and variably progressive neurological, metabolic, and dysmorphic symptoms, plus multiple organ dysfunction associated with neurodegeneration. The common features included respiratory chain complex deficiencies (100%), global developmental delay (94%), optic atrophy (83%), sensorineural hearing loss (78%), and cerebellar ataxia (78%) followed by epilepsy (67%), spasticity (53%), and myopathy (50%). Other features included bilateral cataracts (42%), cardiomyopathy (36%), and diabetes (27%). To better understand the pathogenic mechanism of the identified LETM1 variants, we performed biochemical and morphological studies on mitochondrial K+/H+ exchange activity, proteins, and shape in proband-derived fibroblasts and muscles and in Saccharomyces cerevisiae, which is an important model organism for mitochondrial osmotic regulation. Our results demonstrate that bi-allelic LETM1 variants are associated with defective mitochondrial K+ efflux, swollen mitochondrial matrix structures, and loss of important mitochondrial oxidative phosphorylation protein components, thus highlighting the implication of perturbed mitochondrial osmoregulation caused by LETM1 variants in neurological and mitochondrial pathologies.
A combination of genetic and functional approaches has identified three independent breast cancer risk loci at 2q35. A recent fine-scale mapping analysis to refine these associations resulted in 1 (signal 1), 5 (signal 2), and 42 (signal 3) credible causal variants at these loci. We used publicly available in silico DNase I and ChIP-seq data with in vitro reporter gene and CRISPR assays to annotate signals 2 and 3. We identified putative regulatory elements that enhanced cell-type-specific transcription from the IGFBP5 promoter at both signals (30- to 40-fold increased expression by the putative regulatory element at signal 2, 2- to 3-fold by the putative regulatory element at signal 3). We further identified one of the five credible causal variants at signal 2, a 1.4 kb deletion (esv3594306), as the likely causal variant; the deletion allele of this variant was associated with an average additional increase in IGFBP5 expression of 1.3-fold (MCF-7) and 2.2-fold (T-47D). We propose a model in which the deletion allele of esv3594306 juxtaposes two transcription factor binding regions (annotated by estrogen receptor alpha ChIP-seq peaks) to generate a single extended regulatory element. This regulatory element increases cell-type-specific expression of the tumor suppressor gene IGFBP5 and, thereby, reduces risk of estrogen receptor-positive breast cancer (odds ratio = 0.77, 95% CI 0.74-0.81, p = 3.1 × 10⁻³¹).
Genome-wide association studies (GWASs) have revealed SNP rs889312 on 5q11.2 to be associated with breast cancer risk in women of European ancestry. In an attempt to identify the biologically relevant variants, we analyzed 909 genetic variants across 5q11.2 in 103,991 breast cancer individuals and control individuals from 52 studies in the Breast Cancer Association Consortium. Multiple logistic regression analyses identified three independent risk signals: the strongest associations were with 15 correlated variants (iCHAV1), where the minor allele of the best candidate, rs62355902, associated with significantly increased risks of both estrogen-receptor-positive (ER+: odds ratio [OR] = 1.24, 95% confidence interval [CI] = 1.21-1.27, p_trend = 5.7 × 10⁻⁴⁴) and estrogen-receptor-negative (ER-: OR = 1.10, 95% CI = 1.05-1.15, p_trend = 3.0 × 10⁻⁴) tumors. After adjustment for rs62355902, we found evidence of association of a further 173 variants (iCHAV2) containing three subsets with a range of effects (the strongest was rs113317823 [p_cond = 1.61 × 10⁻⁵]) and five variants composing iCHAV3 (lead rs11949391; ER+: OR = 0.90, 95% CI = 0.87-0.93, p_cond = 1.4 × 10⁻⁴). Twenty-six percent of the prioritized candidate variants coincided with four putative regulatory elements that interact with the MAP3K1 promoter through chromatin looping and affect MAP3K1 promoter activity. Functional analysis indicated that the cancer risk alleles of four candidates (rs74345699 and rs62355900 [iCHAV1], rs16886397 [iCHAV2a], and rs17432750 [iCHAV3]) increased MAP3K1 transcriptional activity. Chromatin immunoprecipitation analysis revealed diminished GATA3 binding to the minor (cancer-protective) allele of rs17432750, indicating a mechanism for its action. We propose that the cancer risk alleles act to increase MAP3K1 expression in vivo and might promote breast cancer cell survival.
Genome-wide association studies have identified SNPs near ZNF365 at 10q21.2 that are associated with both breast cancer risk and mammographic density. To identify the most likely causal SNPs, we fine mapped the association signal by genotyping 428 SNPs across the region in 89,050 European and 12,893 Asian case and control subjects from the Breast Cancer Association Consortium. We identified four independent sets of correlated, highly trait-associated variants (iCHAVs), three of which were located within ZNF365. The most strongly risk-associated SNP, rs10995201 in iCHAV1, showed clear evidence of association with both estrogen receptor (ER)-positive (OR = 0.85 [0.82-0.88]) and ER-negative (OR = 0.87 [0.82-0.91]) disease, and was also the SNP most strongly associated with percent mammographic density. iCHAV2 (lead SNP, chr10: 64,258,684:D) and iCHAV3 (lead SNP, rs7922449) were also associated with ER-positive (OR = 0.93 [0.91-0.95] and OR = 1.06 [1.03-1.09]) and ER-negative (OR = 0.95 [0.91-0.98] and OR = 1.08 [1.04-1.13]) disease. There was weaker evidence for iCHAV4, located 5' of ADO, associated only with ER-positive breast cancer (OR = 0.93 [0.90-0.96]). We found 12, 17, 18, and 2 candidate causal SNPs for breast cancer in iCHAVs 1-4, respectively. Chromosome conformation capture analysis showed that iCHAV2 interacts with the ZNF365 and NRBF2 (more than 600 kb away) promoters in normal and cancerous breast epithelial cells. Luciferase assays did not identify SNPs that affect transactivation of ZNF365, but identified a protective haplotype in iCHAV2, associated with silencing of the NRBF2 promoter, implicating this gene in the etiology of breast cancer.