The use of single nucleotide polymorphisms (SNPs) in forensic genetics has been limited to challenged samples with low template and/or degraded DNA. The recent introduction of massively parallel sequencing (MPS) technologies has expanded the potential applications of these markers and increased the discrimination power of well-established loci by considering variation in the flanking regions of target loci. The ForenSeq Signature Preparation Kit contains 165 SNP amplicons for ancestry- (aiSNPs), identity- (iiSNPs), and phenotype-inference (piSNPs). In this study, 714 individuals from four major populations (African American, AFA; East Asian, ASN; US Caucasian, CAU; and Southwest US Hispanic, HIS) previously reported by Churchill et al. [Forensic Sci Int Genet. 30 (2017) 81-92; DOI: https://doi.org/10.1016/j.fsigen.2017.06.004] were assessed using STRait Razor v2s to determine the level of diversity in the flanking regions of these amplicons. Nearly 70% of loci showed some level of flanking region variation, with 22 iiSNPs and 8 aiSNPs categorized as microhaplotypes in this study. The heterozygosities of these microhaplotypes approached, and in one instance surpassed, those of some core STR loci. The impact of the flanking region on other forensic parameters (e.g., power of exclusion and power of discrimination) was also examined. Sixteen of the 94 iiSNPs had an effective allele number greater than 2.00 across the four populations. To assess the effect of the flanking region information on ancestry inference, genotype probabilities and likelihood ratios were determined. Additionally, concordance with the ForenSeq UAS and Nextera Rapid Capture was evaluated, and patterns of heterozygote imbalance were identified. Pairwise comparison of the iiSNP diplotypes indicated that the probability of detecting a mixture (i.e., observing ≥ 3 haplotypes) using these loci alone was 0.9952.
The improvement in random match probabilities for the full regions over the target iiSNPs was found to be significant. When combining the iiSNPs with the autosomal STRs, the combined match probabilities ranged from 6.40 × 10⁻⁷³ (ASN) to 1.02 × 10⁻⁷⁹ (AFA).
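The parameters named in this abstract (heterozygosity, effective allele number, random match probability) can be illustrated with a minimal sketch. It assumes Hardy-Weinberg proportions and uses the standard textbook definitions, which may differ from the exact estimators used in the study; the function names and the allele frequencies are hypothetical and chosen only for illustration. It shows why a biallelic target SNP cannot exceed an effective allele number of 2.00, while a microhaplotype that includes flanking variation can.

```python
from itertools import combinations


def expected_heterozygosity(freqs):
    """Expected heterozygosity: 1 minus the sum of squared allele frequencies."""
    return 1.0 - sum(p * p for p in freqs)


def effective_allele_number(freqs):
    """Effective number of alleles: reciprocal of the sum of squared frequencies."""
    return 1.0 / sum(p * p for p in freqs)


def random_match_probability(freqs):
    """Single-locus match probability: sum of squared genotype frequencies,
    assuming Hardy-Weinberg proportions (an assumption of this sketch)."""
    homozygotes = sum(p ** 4 for p in freqs)
    heterozygotes = sum((2 * p * q) ** 2 for p, q in combinations(freqs, 2))
    return homozygotes + heterozygotes


# Hypothetical frequencies, not taken from the study.
snp_only = [0.5, 0.5]                 # target SNP alone: two alleles
microhaplotype = [0.4, 0.3, 0.2, 0.1]  # target SNP plus flanking variants

for label, freqs in (("SNP only", snp_only), ("microhaplotype", microhaplotype)):
    print(
        label,
        round(expected_heterozygosity(freqs), 4),
        round(effective_allele_number(freqs), 4),
        round(random_match_probability(freqs), 4),
    )
```

With a perfectly balanced biallelic SNP the effective allele number is exactly 2, so any value above 2.00 requires more than two haplotypes at the locus.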
Massively parallel sequencing (MPS) can identify sequence variation within short tandem repeat (STR) alleles as well as their nominal allele lengths, which traditionally have been obtained by capillary electrophoresis. Using the MiSeq FGx Forensic Genomics System (Illumina), STRait Razor, and in-house Excel workbooks, genetic variation was characterized within STR repeat and flanking regions of 27 autosomal, 7 X-chromosome, and 24 Y-chromosome STR markers in 777 unrelated individuals from four population groups. Seven hundred forty-six autosomal, 227 X-chromosome, and 324 Y-chromosome STR alleles were identified by sequence, compared with 357 autosomal, 107 X-chromosome, and 189 Y-chromosome STR alleles that were identified by length. Within the observed sequence variation, 227 autosomal, 156 X-chromosome, and 112 Y-chromosome novel alleles were identified and described. One hundred seventy-six autosomal, 123 X-chromosome, and 93 Y-chromosome sequence variants resided within STR repeat regions, and 86 autosomal, 39 X-chromosome, and 20 Y-chromosome variants were located in STR flanking regions. Three markers, D18S51, DXS10135, and DYS385a-b, had 1, 4, and 1 allele(s), respectively, that contained both a novel repeat region variant and a flanking sequence variant in the same nucleotide sequence. Fifty markers demonstrated a relative increase in diversity with the variant sequence alleles compared with the traditional nominal length alleles. These population data illustrate the genetic variation that exists in the commonly used STR markers in the selected population samples and provide allele frequencies for statistical calculations related to STR profiling with MPS data.
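The difference between length-based and sequence-based allele calls can be sketched as follows. This is an illustration only: the sequences are invented, base-pair length stands in for the nominal allele length, and the diversity measure is the simple uncorrected form (1 minus the sum of squared frequencies), which is an assumption and not necessarily the estimator used in the study.

```python
from collections import Counter


def allele_counts(observations):
    """Tally the same observed alleles two ways: by full sequence and by length."""
    by_sequence = Counter(observations)
    by_length = Counter(len(seq) for seq in observations)
    return by_sequence, by_length


def gene_diversity(counter):
    """Uncorrected gene diversity: 1 minus the sum of squared allele frequencies."""
    total = sum(counter.values())
    return 1.0 - sum((n / total) ** 2 for n in counter.values())


# Hypothetical observations at one locus (not real data).
# The first two sequences have the same length but differ by one base,
# so capillary electrophoresis would report them as a single allele.
observations = (
    ["TCTA" * 10] * 4
    + ["TCTA" * 9 + "TCTG"] * 2
    + ["TCTA" * 11] * 4
)

by_sequence, by_length = allele_counts(observations)
print(len(by_length), "alleles by length, diversity", round(gene_diversity(by_length), 2))
print(len(by_sequence), "alleles by sequence, diversity", round(gene_diversity(by_sequence), 2))
```

Alleles that share a length but differ in sequence are merged in the length-based tally, which is why the sequence-based tally can never show fewer alleles or lower diversity for the same observations.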
The MiSeq FGx Forensic Genomics System (Illumina) enables amplification and massively parallel sequencing of 59 STRs, 94 identity informative SNPs, 54 ancestry informative SNPs, and 24 phenotypic informative SNPs. Allele frequency and population statistics data were generated for the 172 SNP loci included in this panel in four major population groups (Chinese, African Americans, US Caucasians, and Southwest Hispanics). Single-locus and combined random match probability values were generated for the identity informative SNPs. The average combined STR and identity informative SNP random match probabilities (assuming independence) across all four populations were 1.75E-67 and 2.30E-71 with length-based and sequence-based STR alleles, respectively. Ancestry and phenotype predictions were obtained using the ForenSeq™ Universal Analysis System (UAS; Illumina) based on the ancestry informative and phenotype informative SNP profiles generated for each sample. Additionally, performance metrics, including profile completeness, read depth, relative locus performance, and allele coverage ratios, were evaluated and detailed for the 725 samples included in this study. While some genetic markers included in this panel performed notably better than others, performance across populations was generally consistent. The performance and population data included in this study indicate that accurate and reliable profiles were generated, and they provide valuable background information for laboratories considering internal validation studies and implementation.
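Combining single-locus random match probabilities under the independence assumption mentioned above amounts to multiplying them (the product rule). A minimal sketch follows; it sums log10 values so that very small products stay readable, and the per-locus values and function names are hypothetical, not figures or methods from the study.

```python
import math


def combined_rmp_log10(single_locus_rmps):
    """Product rule in log space: log10 of the product of per-locus
    match probabilities, assuming the loci are independent."""
    return sum(math.log10(p) for p in single_locus_rmps)


def format_sci(log10_value):
    """Render a log10 value in the E-notation style used for combined values."""
    exponent = math.floor(log10_value)
    mantissa = 10 ** (log10_value - exponent)
    if round(mantissa, 2) >= 10.0:  # guard against rounding up to 10.00
        mantissa /= 10.0
        exponent += 1
    return f"{mantissa:.2f}E{exponent:+03d}"


# Hypothetical per-locus values (not from the study).
str_rmps = [0.05, 0.08, 0.03, 0.10]
snp_rmps = [0.375, 0.40, 0.38]

total_log10 = combined_rmp_log10(str_rmps + snp_rmps)
print(format_sci(total_log10))
```

Because each per-locus probability is below 1, every additional independent locus lowers the combined value, which is why adding the identity informative SNPs to the STRs reduces the combined random match probability.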