Many algorithms for detecting copy number variations (CNVs) from exome sequencing (ES) data have been reported and evaluated for sensitivity, specificity, reproducibility, and precision. However, operational optimization of such algorithms for better performance has not been fully addressed. We performed ES on 1199 samples, including 763 patients with different disease profiles. ES data were analyzed for CNVs using both the eXome Hidden Markov Model (XHMM) and a modified Nord's method. To detect rare CNVs efficiently, we aimed to reduce sequencing biases by analyzing all unrelated samples sequenced in the same flow cell together as a batch, and to eliminate sex effects on X-linked CNVs by analyzing female and male sequences separately. We also applied several filtering steps for more efficient CNV selection. The average number of CNVs detected per sample was <5. This optimization, together with targeted CNV analysis by Nord's method, identified pathogenic/likely pathogenic CNVs in 34 patients (4.5%, 34/763). In particular, among 142 patients with epilepsy, the current protocol detected clinically relevant CNVs in 19 (13.4%) patients, whereas the previous protocol identified them in only 14 (9.9%) patients. Thus, this batch-based XHMM analysis efficiently selected rare pathogenic CNVs in genetic diseases.
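The batching strategy described above (one batch per flow cell, with female and male data kept apart for X-linked calls) can be illustrated with a minimal sketch. It is not the authors' pipeline: the `Sample` record, the `build_batches` helper, and the rule of keeping only the first member of each family per flow cell to obtain unrelated samples are all assumptions made for illustration, and the actual CNV calling by XHMM is not shown.

```python
from collections import defaultdict
from typing import NamedTuple


class Sample(NamedTuple):
    """Hypothetical sample record; field names are illustrative only."""
    sample_id: str
    flow_cell: str
    sex: str        # "F" or "M"
    family_id: str


def build_batches(samples):
    """Group unrelated samples into per-flow-cell batches.

    Returns two dicts:
      autosomal: flow_cell -> list of sample IDs
      x_linked:  (flow_cell, sex) -> list of sample IDs
    Assumption: "unrelated" is approximated by keeping only the first
    sample seen from each family within a flow cell.
    """
    autosomal = defaultdict(list)
    x_linked = defaultdict(list)
    seen_families = defaultdict(set)

    for s in samples:
        # Skip relatives of a sample already placed in this flow-cell batch.
        if s.family_id in seen_families[s.flow_cell]:
            continue
        seen_families[s.flow_cell].add(s.family_id)

        # Autosomal analysis: all unrelated samples of a flow cell together.
        autosomal[s.flow_cell].append(s.sample_id)
        # X-linked analysis: the same batch, split by sex.
        x_linked[(s.flow_cell, s.sex)].append(s.sample_id)

    return dict(autosomal), dict(x_linked)
```

Each resulting batch would then be passed to the CNV caller as one run, so that read-depth normalization is performed only among samples that share the same sequencing conditions and, for the X chromosome, the same expected copy number.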
Coffin-Siris syndrome (CSS, MIM#135900) is a congenital disorder characterized by coarse facial features, intellectual disability, and hypoplasia of the fifth digit and nails. Pathogenic variants for CSS have been found in genes encoding proteins of the BAF (BRG1-associated factor) chromatin-remodeling complex. To date, more than 150 CSS patients with pathogenic variants in nine BAF-related genes have been reported. We previously reported 71 patients, of whom 39 had pathogenic variants. Since then, we have recruited an additional 182 CSS-suspected patients. We performed comprehensive genetic analysis on these 182 patients and on the 32 previously unresolved patients, targeting pathogenic single nucleotide variants, short insertions/deletions, and copy number variations (CNVs). We confirmed 78 pathogenic variations in 78 patients. Pathogenic variations in ARID1B, SMARCB1, SMARCA4, ARID1A, SOX11, SMARCE1, and PHF6 were identified in 48, 8, 7, 6, 4, 1, and 1 patients, respectively. In addition, we found three CNVs including SMARCA2. Of particular note, we found a partial deletion of SMARCB1 in one CSS patient, and we thoroughly investigated the resulting abnormal transcripts.