METHODS: We used a panel of 34 putative susceptibility genes to sequence samples from 60,466 women with breast cancer and 53,461 controls. In separate analyses of protein-truncating variants and of rare missense variants in these genes, we estimated odds ratios for breast cancer overall and for tumor subtypes. We evaluated associations of missense variants according to protein domain and pathogenicity classification.
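The odds ratios referred to above compare how often carriers of a variant class occur among cases versus controls. The sketch below shows the basic calculation on a 2×2 carrier table: a crude (unadjusted) odds ratio with a Woolf log-scale 95% confidence interval. This is an assumption chosen for illustration, not the study's estimation method, which the abstract does not describe. The function name `crude_odds_ratio` and the carrier counts are hypothetical; only the case and control totals match the abstract.

```python
import math
from typing import NamedTuple


class OddsRatio(NamedTuple):
    estimate: float
    lower: float
    upper: float


def crude_odds_ratio(case_carriers: int, case_noncarriers: int,
                     control_carriers: int, control_noncarriers: int,
                     z: float = 1.959964) -> OddsRatio:
    """Unadjusted odds ratio with a Woolf (log-scale) confidence interval.

    The default z gives a two-sided 95% interval. All four cells must be
    positive, because no continuity correction is applied.
    """
    a, b = case_carriers, case_noncarriers
    c, d = control_carriers, control_noncarriers
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive")
    log_or = math.log((a * d) / (b * c))
    # Standard error of log(OR) under Woolf's approximation.
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return OddsRatio(
        estimate=math.exp(log_or),
        lower=math.exp(log_or - z * se),
        upper=math.exp(log_or + z * se),
    )


# Hypothetical carrier counts for one gene (not data from the study).
result = crude_odds_ratio(case_carriers=120, case_noncarriers=60_346,
                          control_carriers=40, control_noncarriers=53_421)
print(f"OR={result.estimate:.2f} (95% CI {result.lower:.2f}-{result.upper:.2f})")
# The "upper limit below 2.0" screen reported in the results is this comparison:
print("upper CI limit < 2.0:", result.upper < 2.0)
```

Working on the log scale keeps the interval asymmetric around the odds ratio, which is why the upper limit, rather than the point estimate, is the natural quantity for ruling out large effects.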
RESULTS: Protein-truncating variants in 5 genes (ATM, BRCA1, BRCA2, CHEK2, and PALB2) were associated with a risk of breast cancer overall with a P value of less than 0.0001. Protein-truncating variants in 4 other genes (BARD1, RAD51C, RAD51D, and TP53) were associated with a risk of breast cancer overall with a P value of less than 0.05 and a Bayesian false-discovery probability of less than 0.05. For protein-truncating variants in 19 of the remaining 25 genes, the upper limit of the 95% confidence interval of the odds ratio for breast cancer overall was less than 2.0. For protein-truncating variants in ATM and CHEK2, odds ratios were higher for estrogen receptor (ER)-positive disease than for ER-negative disease; for protein-truncating variants in BARD1, BRCA1, BRCA2, PALB2, RAD51C, and RAD51D, odds ratios were higher for ER-negative disease than for ER-positive disease. Rare missense variants (in aggregate) in ATM, CHEK2, and TP53 were associated with a risk of breast cancer overall with a P value of less than 0.001. For BRCA1, BRCA2, and TP53, missense variants (in aggregate) that would be classified as pathogenic according to standard criteria were associated with a risk of breast cancer overall, with risks similar to those of protein-truncating variants.
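The Bayesian false-discovery probability (BFDP) used above is the posterior probability that an association is null, given its estimate, its standard error, and prior assumptions. The abstract does not state which formulation or priors were used; the sketch below assumes one common formulation, Wakefield's approximate Bayes factor, with a normal prior on the log odds ratio. The prior probability of association (5%) and the prior spread (97.5% of prior mass on OR < 4) are illustrative assumptions, not the study's settings, and the estimate fed in is hypothetical.

```python
import math


def approximate_bayes_factor(log_or: float, se: float, prior_sd: float) -> float:
    """Approximate Bayes factor for H0 (log-OR = 0) against H1 (log-OR ~ N(0, W)).

    Values above 1 favour the null hypothesis.
    """
    v = se ** 2          # sampling variance V of the estimated log-OR
    w = prior_sd ** 2    # prior variance W of the log-OR under H1
    z_sq = log_or ** 2 / v
    return math.sqrt((v + w) / v) * math.exp(-0.5 * z_sq * w / (v + w))


def bfdp(log_or: float, se: float, prior_sd: float, prior_prob_h1: float) -> float:
    """Bayesian false-discovery probability: posterior probability of H0."""
    prior_odds_h0 = (1.0 - prior_prob_h1) / prior_prob_h1
    posterior_odds_h0 = approximate_bayes_factor(log_or, se, prior_sd) * prior_odds_h0
    return posterior_odds_h0 / (1.0 + posterior_odds_h0)


# Illustrative priors (assumptions, not the study's settings): a 5% prior
# probability of association and a prior that puts 97.5% of its mass on OR < 4.
PRIOR_SD = math.log(4.0) / 1.959964
PRIOR_PROB_H1 = 0.05

# Hypothetical estimate: OR = 2.5 with a standard error of 0.3 on the log scale.
p = bfdp(math.log(2.5), 0.3, PRIOR_SD, PRIOR_PROB_H1)
print(f"BFDP = {p:.3g}; noteworthy at 0.05: {p < 0.05}")
```

Unlike a P value, the BFDP depends on the prior probability of association, so the same estimate can be noteworthy under one prior and not under another; pairing it with a P value threshold, as the results do, guards against relying on either alone.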
CONCLUSIONS: The results of this study define the genes that are most clinically useful for inclusion on panels for the prediction of breast cancer risk and provide estimates of the risks associated with protein-truncating variants to guide genetic counseling. (Funded by European Union Horizon 2020 programs and others.)
RESULTS: In this study, we generated Whole Exome Sequencing (WES), Reduced Representation Bisulfite Sequencing (RRBS), and RNA sequencing (RNA-seq) data from samples with known mixtures of mouse and human DNA or RNA and from a cohort of human breast cancers and the patient-derived tumour xenografts (PDTXs) established from them. We show that using an In silico Combined human-mouse Reference Genome (ICRG) for alignment discriminates between human and mouse reads with up to 99.9% accuracy and reduces the number of false-positive somatic mutations caused by misalignment by more than 99.9%. We also derived a model to estimate the human DNA content of independent PDTX samples. For RNA-seq and RRBS data, using the ICRG allows the transcriptome and methylome of human tumour cells to be separated computationally from those of the mouse stroma. In a direct comparison with previously reported approaches, our method showed similar or higher accuracy while requiring significantly less computing time.
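The key idea of a combined reference is that each read is aligned once against human and mouse sequence together, so the species is read off from where its best alignment lands. The sketch below illustrates that classification step under stated assumptions: mouse contigs are taken to carry a hypothetical `mm_` prefix, each read is represented by the reference name and MAPQ of its primary alignment, and the MAPQ cut-off of 20 is an illustrative choice. The abstract does not give the pipeline's naming scheme or filters, and `human_read_fraction` is a naive read-count proportion, not the authors' model for estimating human DNA content.

```python
import math
from collections import Counter
from typing import Iterable, Tuple

# Hypothetical convention: mouse contigs in the combined reference are renamed
# with this prefix (e.g. "mm_chr1"); human contigs keep their usual names.
MOUSE_PREFIX = "mm_"
# Illustrative threshold: a read that aligns about equally well to human and
# mouse sequence tends to get a low MAPQ and is left unassigned.
MIN_MAPQ = 20


def classify_read(contig: str, mapq: int) -> str:
    """Assign one primary alignment to 'human', 'mouse' or 'ambiguous'."""
    if contig == "*" or mapq < MIN_MAPQ:   # "*" marks an unmapped read in SAM
        return "ambiguous"
    return "mouse" if contig.startswith(MOUSE_PREFIX) else "human"


def species_counts(alignments: Iterable[Tuple[str, int]]) -> Counter:
    """Tally (contig, MAPQ) pairs from primary alignments by species."""
    return Counter(classify_read(contig, mapq) for contig, mapq in alignments)


def human_read_fraction(counts: Counter) -> float:
    """Share of confidently assigned reads that are human (NaN if none)."""
    assigned = counts["human"] + counts["mouse"]
    return counts["human"] / assigned if assigned else math.nan


# Toy input: (reference name, MAPQ) for five reads.
reads = [("chr1", 60), ("mm_chr1", 60), ("chr17", 42), ("mm_chrX", 3), ("*", 0)]
counts = species_counts(reads)
print(dict(counts), human_read_fraction(counts))
```

Keeping only reads classified as human before somatic variant calling is one way such a pipeline could avoid mouse-derived reads being misread as human mutations; discarding low-MAPQ reads trades some sensitivity for fewer cross-species misassignments.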
CONCLUSIONS: The computational pipeline we describe here is a valuable tool for the molecular analysis of PDTXs as well as any other mixture of DNA or RNA species.