We report heterozygous CELF2 (NM_006561.3) variants in five unrelated individuals: Individuals 1-4 exhibited developmental and epileptic encephalopathy (DEE), and Individual 5 had intellectual disability and autistic features. CELF2 encodes a nucleocytoplasmic shuttling RNA-binding protein that has multiple roles in RNA processing and is involved in the embryonic development of the central nervous system and heart. Whole-exome sequencing identified the following CELF2 variants: two missense variants [c.1558C>T:p.(Pro520Ser) in unrelated Individuals 1 and 2, and c.1516C>G:p.(Arg506Gly) in Individual 3]; one frameshift variant, c.1562dup:p.(Tyr521Ter), in Individual 4, which removes the last amino acid of CELF2 and may escape nonsense-mediated mRNA decay (NMD); and one canonical splice site variant, c.272-1G>C, in Individual 5, which probably leads to NMD. The variants in Individuals 1, 2, 4, and 5 were de novo, whereas the variant in Individual 3 was inherited from her mosaic mother. Notably, all identified variants except c.272-1G>C clustered within the last 20 amino acid residues of the C-terminus, a region that might act as a nuclear localization signal. We demonstrated extranuclear mislocalization of mutant CELF2 protein in cells transfected with mutant CELF2 complementary DNA plasmids. Our findings indicate that CELF2 variants that disrupt its nuclear localization are associated with DEE.
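The C-terminal clustering claim can be checked mechanically from the variant descriptions. The sketch below is illustrative only and is not the authors' method. It parses the coding-DNA and protein positions, confirms that each coding position falls in the stated codon (codon = ceil(n/3)), and tests whether the residue lies in the last 20 residues. It assumes a protein length of 521, inferred from the statement that p.(Tyr521Ter) removes the last amino acid; that value should be verified against the reference protein. The function names are hypothetical, and the intronic splice variant c.272-1G>C is deliberately excluded because it has no protein-level position.

```python
import re

# Assumption: Tyr521 is the final CELF2 residue (NM_006561.3), inferred from the
# statement that p.(Tyr521Ter) removes the last amino acid. Verify before reuse.
PROTEIN_LENGTH = 521
WINDOW = 20  # size of the C-terminal cluster window described in the text

# HGVS-style patterns, e.g. "c.1558C>T" / "c.1562dup" and "p.(Pro520Ser)".
CDNA_POS = re.compile(r"c\.(\d+)")
PROTEIN_CHANGE = re.compile(r"p\.\(([A-Z][a-z]{2})(\d+)([A-Z][a-z]{2})\)")


def cdna_position(hgvs_c: str) -> int:
    """Return the first coding-DNA position of an exonic c. description."""
    match = CDNA_POS.match(hgvs_c)
    if match is None:
        raise ValueError(f"unrecognized cDNA change: {hgvs_c}")
    return int(match.group(1))


def residue_position(hgvs_p: str) -> int:
    """Return the residue number from a p.(XxxNNNYyy) description."""
    match = PROTEIN_CHANGE.fullmatch(hgvs_p)
    if match is None:
        raise ValueError(f"unrecognized protein change: {hgvs_p}")
    return int(match.group(2))


def codon_of(coding_position: int) -> int:
    """Codon containing coding nucleotide n (1-based): ceil(n / 3)."""
    return (coding_position + 2) // 3


def in_c_terminal_window(residue: int, length: int = PROTEIN_LENGTH,
                         window: int = WINDOW) -> bool:
    """True if the residue lies within the last `window` residues."""
    return length - window < residue <= length


# Exonic variants from the text; c.272-1G>C is intronic and has no residue.
VARIANTS = {
    "c.1558C>T": "p.(Pro520Ser)",
    "c.1516C>G": "p.(Arg506Gly)",
    "c.1562dup": "p.(Tyr521Ter)",
}

for cdna, protein in VARIANTS.items():
    codon = codon_of(cdna_position(cdna))
    residue = residue_position(protein)
    # Columns: variant, protein change, cDNA/protein positions agree, in window
    print(cdna, protein, codon == residue, in_c_terminal_window(residue))
```

Under these assumptions, each of the three lines printed ends in `True True`. Every cDNA position maps to the reported codon (520, 506, and 521), and every affected residue falls within residues 502-521.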
Many algorithms for detecting copy number variations (CNVs) from exome sequencing (ES) data have been reported and evaluated for sensitivity, specificity, reproducibility, and precision. However, operational optimization of these algorithms for better performance has not been fully addressed. We performed ES on 1199 samples, including 763 patients with various disease profiles, and analyzed the data for CNVs using both the eXome Hidden Markov Model (XHMM) and a modified version of Nord's method. To detect rare CNVs efficiently, we reduced sequencing bias by jointly analyzing, as a single batch, the data from all unrelated samples sequenced on the same flow cell, and we eliminated sex effects on X-linked CNV calls by analyzing female and male samples separately. We also applied several filtering steps to select CNVs more efficiently. On average, fewer than five CNVs were detected per sample. This optimization, together with targeted CNV analysis by Nord's method, identified pathogenic/likely pathogenic CNVs in 34 of 763 patients (4.5%). In particular, among 142 patients with epilepsy, the current protocol detected clinically relevant CNVs in 19 (13.4%), whereas the previous protocol identified them in only 14 (9.9%). Thus, batch-based XHMM analysis efficiently selected rare pathogenic CNVs in patients with genetic diseases.
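The batching strategy can be illustrated with a small grouping step that would run before CNV calling. This is a hypothetical sketch, not the study's pipeline, and XHMM itself is not invoked. The `Sample` record and its field names are invented for illustration. The sketch also adopts one possible reading of the text: autosomes are batched by flow cell, and chromosome X is batched by flow cell and sex. Relatedness filtering and the downstream CNV filters are not modeled. A helper reproduces the diagnostic-yield percentages quoted above.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    # Hypothetical record; field names are illustrative, not from the study.
    sample_id: str
    flow_cell: str
    sex: str  # "F" or "M"


def make_batches(samples):
    """Group sample IDs for joint CNV calling (assumed scheme).

    Autosomal batches: all samples sharing a flow cell, so that
    flow-cell-specific coverage biases are normalized together.
    X-chromosome batches: same flow cell AND same sex, so that the
    expected male/female dosage difference is not called as a CNV.
    """
    autosomal = defaultdict(list)
    chr_x = defaultdict(list)
    for s in samples:
        autosomal[s.flow_cell].append(s.sample_id)
        chr_x[(s.flow_cell, s.sex)].append(s.sample_id)
    return dict(autosomal), dict(chr_x)


def yield_percent(positives: int, total: int) -> float:
    """Diagnostic yield as a percentage rounded to one decimal place."""
    return round(100 * positives / total, 1)


# Toy input; real XHMM batches would contain many more samples.
samples = [
    Sample("S1", "FC01", "F"),
    Sample("S2", "FC01", "M"),
    Sample("S3", "FC01", "F"),
    Sample("S4", "FC02", "M"),
]
autosomal, chr_x = make_batches(samples)
print(autosomal)  # {'FC01': ['S1', 'S2', 'S3'], 'FC02': ['S4']}
print(chr_x)      # {('FC01', 'F'): ['S1', 'S3'], ('FC01', 'M'): ['S2'], ('FC02', 'M'): ['S4']}
print(yield_percent(34, 763), yield_percent(19, 142), yield_percent(14, 142))  # 4.5 13.4 9.9
```

Keying the dictionaries on the flow cell, or on the flow-cell and sex tuple, keeps the batching rule explicit and easy to change. For example, adding a relatedness key would be a one-line change if unrelated-sample selection were also needed.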
Whereas large-scale statistical analyses can robustly identify disease-gene relationships, they do not accurately capture genotype-phenotype correlations or disease mechanisms. We use multiple lines of independent evidence to show that different variant types in a single gene, SATB1, cause clinically overlapping but distinct neurodevelopmental disorders. Clinical evaluation of 42 individuals carrying SATB1 variants revealed overt genotype-phenotype relationships, which functional assays linked to distinct pathophysiological mechanisms. Missense variants in the CUT1 and CUT2 DNA-binding domains result in stronger chromatin binding, increased transcriptional repression, and a severe phenotype. In contrast, variants predicted to cause haploinsufficiency are associated with a milder clinical presentation. A similarly mild phenotype is observed in individuals with premature protein-truncating variants that escape nonsense-mediated decay; the resulting truncated proteins are transcriptionally active but mislocalized in the cell. Our results suggest that in-depth, mutation-specific genotype-phenotype studies are essential to capture the full complexity of disease and to explain phenotypic variability.