Allele frequencies for the 13 CODIS (Combined DNA Index System, USA) STR loci included in the AmpFlSTR Profiler Plus and AmpFlSTR COfiler kits (Applied Biosystems, Foster City, USA) were determined in a sample of 197 unrelated Malays in Singapore.
The use of single nucleotide polymorphisms (SNPs) in forensic genetics has been limited to challenged samples with low template and/or degraded DNA. The recent introduction of massively parallel sequencing (MPS) technologies has expanded the potential applications of these markers and increased the discrimination power of well-established loci by considering variation in the flanking regions of target loci. The ForenSeq Signature Preparation Kit contains 165 SNP amplicons for ancestry- (aiSNPs), identity- (iiSNPs), and phenotype-inference (piSNPs). In this study, 714 individuals from four major populations (African American, AFA; East Asian, ASN; US Caucasian, CAU; and Southwest US Hispanic, HIS) previously reported by Churchill et al. [Forensic Sci Int Genet. 30 (2017) 81-92; DOI: https://doi.org/10.1016/j.fsigen.2017.06.004] were assessed using STRait Razor v2s to determine the level of diversity in the flanking regions of these amplicons. The results show that nearly 70% of loci showed some level of flanking region variation with 22 iiSNPs and 8 aiSNPs categorized as microhaplotypes in this study. The heterozygosities of these microhaplotypes approached, and in one instance surpassed, those of some core STR loci. Also, the impact of the flanking region on other forensic parameters (e.g., power of exclusion and power of discrimination) was examined. Sixteen of the 94 iiSNPs had an effective allele number greater than 2.00 across the four populations. To assess what effect the flanking region information had on the ancestry inference, genotype probabilities and likelihood ratios were determined. Additionally, concordance with the ForenSeq UAS and Nextera Rapid Capture was evaluated, and patterns of heterozygote imbalance were identified. Pairwise comparison of the iiSNP diplotypes determined the probability of detecting a mixture (i.e., observing ≥ 3 haplotypes) using these loci alone was 0.9952. 
The improvement in random match probabilities for the full regions over the target iiSNPs was found to be significant. When combining the iiSNPs with the autosomal STRs, the combined match probabilities ranged from 6.40 × 10⁻⁷³ (ASN) to 1.02 × 10⁻⁷⁹ (AFA).
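To illustrate why an effective allele number above 2.00 is notable for a SNP amplicon, the following minimal sketch computes expected heterozygosity and effective allele number (taken here as 1 / Σp², a common definition that the abstract does not spell out) for a biallelic SNP and for a microhaplotype. The frequencies are hypothetical, chosen only for illustration, and are not taken from the study.

```python
def expected_heterozygosity(freqs):
    """Expected heterozygosity under Hardy-Weinberg: 1 - sum(p_i^2)."""
    return 1.0 - sum(p * p for p in freqs)


def effective_allele_number(freqs):
    """Effective number of alleles: 1 / sum(p_i^2)."""
    return 1.0 / sum(p * p for p in freqs)


# Hypothetical frequencies (assumptions for illustration only).
target_snp = [0.5, 0.5]                # best case for a biallelic SNP
microhaplotype = [0.4, 0.3, 0.2, 0.1]  # same amplicon with flanking variants

for name, freqs in [("target SNP", target_snp), ("microhaplotype", microhaplotype)]:
    het = expected_heterozygosity(freqs)
    ae = effective_allele_number(freqs)
    print(f"{name}: heterozygosity = {het:.3f}, effective alleles = {ae:.2f}")
```

A biallelic SNP cannot exceed an effective allele number of 2.00, so any locus above that value must carry three or more haplotypes once flanking variation is counted.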
Massively parallel sequencing (MPS) can identify sequence variation within short tandem repeat (STR) alleles as well as their nominal allele lengths that traditionally have been obtained by capillary electrophoresis. Using the MiSeq FGx Forensic Genomics System (Illumina), STRait Razor, and in-house Excel workbooks, genetic variation was characterized within STR repeat and flanking regions of 27 autosomal, 7 X-chromosome and 24 Y-chromosome STR markers in 777 unrelated individuals from four population groups. Seven hundred and forty-six autosomal, 227 X-chromosome, and 324 Y-chromosome STR alleles were identified by sequence compared with 357 autosomal, 107 X-chromosome, and 189 Y-chromosome STR alleles that were identified by length. Within the observed sequence variation, 227 autosomal, 156 X-chromosome, and 112 Y-chromosome novel alleles were identified and described. One hundred and seventy-six autosomal, 123 X-chromosome, and 93 Y-chromosome sequence variants resided within STR repeat regions, and 86 autosomal, 39 X-chromosome, and 20 Y-chromosome variants were located in STR flanking regions. Three markers, D18S51, DXS10135, and DYS385a-b, had 1, 4, and 1 alleles, respectively, which contained both a novel repeat region variant and a flanking sequence variant in the same nucleotide sequence. There were 50 markers that demonstrated a relative increase in diversity with the variant sequence alleles compared with those of traditional nominal length alleles. These population data illustrate the genetic variation that exists in the commonly used STR markers in the selected population samples and provide allele frequencies for statistical calculations related to STR profiling with MPS data.
The MiSeq FGx Forensic Genomics System (Illumina) enables amplification and massively parallel sequencing of 59 STRs, 94 identity informative SNPs, 54 ancestry informative SNPs, and 24 phenotypic informative SNPs. Allele frequency and population statistics data were generated for the 172 SNP loci included in this panel on four major population groups (Chinese, African Americans, US Caucasians, and Southwest Hispanics). Single-locus and combined random match probability values were generated for the identity informative SNPs. The average combined STR and identity informative SNP random match probabilities (assuming independence) across all four populations were 1.75E-67 and 2.30E-71 with length-based and sequence-based STR alleles, respectively. Ancestry and phenotype predictions were obtained using the ForenSeq™ Universal Analysis System (UAS; Illumina) based on the ancestry informative and phenotype informative SNP profiles generated for each sample. Additionally, performance metrics, including profile completeness, read depth, relative locus performance, and allele coverage ratios, were evaluated and detailed for the 725 samples included in this study. While some genetic markers included in this panel performed notably better than others, performance across populations was generally consistent. The performance and population data included in this study support that accurate and reliable profiles were generated and provide valuable background information for laboratories considering internal validation studies and implementation.
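As a sketch of how a combined random match probability is obtained under the independence assumption mentioned above, per-locus values are multiplied (the product rule). The example below sums log10 values instead, which keeps very small numbers readable. The per-locus values and the helper names `combined_rmp_log10` and `format_sci` are hypothetical and are not taken from the study.

```python
import math


def combined_rmp_log10(locus_rmps):
    """log10 of the product of per-locus random match probabilities."""
    return sum(math.log10(p) for p in locus_rmps)


def format_sci(log10_value):
    """Render a log10 value in the 1.75E-67 style used in the text."""
    exponent = math.floor(log10_value)
    mantissa = 10 ** (log10_value - exponent)
    return f"{mantissa:.2f}E{exponent:+03d}"


# Hypothetical per-locus random match probabilities (illustration only).
str_rmps = [0.05, 0.08, 0.03, 0.10]
snp_rmps = [0.38, 0.40, 0.42]

# Under independence, STR and SNP panels combine by adding their log10 values.
combined = combined_rmp_log10(str_rmps) + combined_rmp_log10(snp_rmps)
print("combined RMP:", format_sci(combined))
```

The product rule holds only if the loci are independent, which is why the abstract states that assumption explicitly.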
Mitochondrial DNA sequences of the hypervariable regions HV1 and HV2 were analyzed in 205 unrelated ethnic Malays residing in Singapore as an initial effort to generate a database for forensic identification purposes. Sequence polymorphism was detected using PCR and direct sequencing analysis. A total of 152 haplotypes containing 152 polymorphisms were found. Of the 152 haplotypes, 115 were observed only once and 37 were seen in multiple individuals. The most common haplotype (16223T, 16295T, 16362C, 73G, 146C, 199C, 263G, and 315.1C) was shared by 7 (3.41%) individuals, two haplotypes were shared by 4 individuals, seven haplotypes were shared by 3 individuals, and 27 haplotypes by 2 individuals. Haplotype diversity and random match probability were estimated to be 0.9961 and 0.87%, respectively.
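The sharing pattern reported above is enough to recompute both summary statistics. The sketch below assumes the usual estimators, which the abstract does not state: random match probability as Σp², and haplotype diversity as n/(n − 1) × (1 − Σp²). The variable names are illustrative.

```python
# Sharing pattern from the abstract:
# {individuals sharing a haplotype: number of such haplotypes}
sharing = {7: 1, 4: 2, 3: 7, 2: 27, 1: 115}

n_individuals = sum(size * count for size, count in sharing.items())
n_haplotypes = sum(sharing.values())

# Random match probability: sum of squared haplotype frequencies.
rmp = sum(count * (size / n_individuals) ** 2 for size, count in sharing.items())

# Haplotype diversity with the n / (n - 1) small-sample correction.
diversity = n_individuals / (n_individuals - 1) * (1.0 - rmp)

print(f"individuals = {n_individuals}, haplotypes = {n_haplotypes}")
print(f"random match probability = {rmp:.2%}")
print(f"haplotype diversity = {diversity:.4f}")
```

With these inputs the counts sum to 205 individuals and 152 haplotypes, and the estimators give a random match probability of 0.87% and a haplotype diversity of 0.9961, the latter being a proportion and not a percentage.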
The emergence of massively parallel sequencing (MPS) technologies has enabled the analysis of full mitochondrial (mt)DNA sequences from forensically relevant samples that have, so far, only been typed in the control region or its hypervariable segments. In this study, we evaluated the performance of a commercially available multiplex-PCR-based assay, the Precision ID mtDNA Whole Genome Panel (Thermo Fisher Scientific), for the amplification and sequencing of the entire mitochondrial genome (mitogenome), even from degraded forensic specimens. For this purpose, more than 500 samples from 24 different populations were selected to cover the vast majority of established superhaplogroups. These are known to harbor different signature sequence motifs corresponding to their phylogenetic background that could have an effect on primer binding and, thus, could limit a broad application of this molecular genetic tool. The selected samples were derived from various forensically relevant tissue sources, and their DNA was extracted using different methods. We evaluated sequence concordance and heteroplasmy detection and compared the findings to conventional Sanger sequencing as well as an orthogonal MPS platform. We discuss advantages and limitations of this approach with respect to forensic genetic workflow and analytical requirements.