Seventeen Y-STRs (DYS456, DYS389I, DYS390, DYS389II, DYS458, DYS19, DYS385a/b, DYS393, DYS391, DYS439, DYS635 or Y-GATA C4, DYS392, Y-GATA H4, DYS437, DYS438 and DYS448) were analyzed in 320 male individuals from Sarawak, an eastern state of Malaysia on the island of Borneo, using the AmpFlSTR Y-filer (Applied Biosystems, Foster City, CA). These individuals belonged to three indigenous ethnic groups in Sarawak: 103 Ibans, 113 Bidayuhs and 104 Melanaus. The 17-locus haplotypes were determined and allele frequencies were estimated for each locus, and locus diversity, haplotype diversity and discrimination capacity were calculated for each of the three groups. Analysis of molecular variance (AMOVA) indicated that 87.6% of the haplotypic variation lay within populations and 12.4% between populations (fixation index F(ST) = 0.124, p < 0.001). This study revealed that the indigenous populations of Sarawak are distinctly different from one another and from the three major ethnic groups in Malaysia (Malays, Chinese and Indians), with the Melanaus sharing a strikingly high proportion of haplotypes within the group. Rare, unusual variants and microvariants that are absent from the Malaysian Malay, Chinese and Indian groups were also found. In addition, DYS385 duplications, previously noted only in the Chinese group, were observed in the Iban group, whilst null alleles were detected at several Y-loci (namely DYS19, DYS392, DYS389II and DYS448) in the Iban and Melanau groups.
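The diversity statistics named above can be illustrated with a short Python sketch. It assumes three things: Nei's unbiased haplotype diversity, discrimination capacity defined as the number of distinct haplotypes divided by sample size, and F(ST) read off the AMOVA output as the among-population share of variance. The abstract does not state its exact formulas or software settings, and the toy haplotypes are hypothetical.

```python
from collections import Counter

def haplotype_diversity(haplotypes):
    """Nei's unbiased haplotype diversity: n/(n-1) * (1 - sum of squared haplotype frequencies)."""
    n = len(haplotypes)
    counts = Counter(haplotypes)
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1 - sum_sq)

def discrimination_capacity(haplotypes):
    """Proportion of samples that can be told apart: distinct haplotypes / sample size."""
    return len(set(haplotypes)) / len(haplotypes)

def fst_from_amova(pct_among, pct_within):
    """F(ST) as the among-population share of total molecular variance."""
    return pct_among / (pct_among + pct_within)

# Hypothetical 3-locus haplotypes (repeat counts at, e.g., DYS19, DYS390, DYS391)
haps = [(15, 23, 10), (15, 23, 10), (16, 24, 10), (14, 23, 11)]
print(round(haplotype_diversity(haps), 4))     # 0.8333
print(discrimination_capacity(haps))           # 0.75
print(round(fst_from_amova(12.4, 87.6), 3))    # 0.124
```

With 87.6% of variation within and 12.4% between populations, this reproduces the reported F(ST) of 0.124.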
Illegal logging and smuggling of Gonystylus bancanus (Thymelaeaceae) pose a serious threat to this fragile and valuable peat swamp timber species. Using G. bancanus as a case study, DNA markers were used to develop identification databases at the species, population and individual levels. The species-level database for Gonystylus comprised one rDNA marker (ITS2) and two cpDNA markers (trnH-psbA and trnL), developed from a reference set of 20 Gonystylus species. When the three markers were concatenated, species were taxonomically resolved at 90% (18 of the 20 species). In addition, population and individual identification databases were developed from 17 natural populations of G. bancanus across West (Peninsular Malaysia) and East (Sabah and Sarawak) Malaysia, using cpDNA and STR markers, respectively. A haplotype distribution map for Malaysia was generated using six cpDNA markers, yielding 12 unique multilocus haplotypes from 24 informative intraspecifically variable sites. These unique haplotypes indicate clear genetic structuring between the West and East regions. A simulation procedure based on the composition of the samples was used to test whether a suspect sample conformed to a given regional origin. Overall, the observed type I and type II errors of the databases agreed well with the predicted 5% threshold, indicating that the databases are useful for revealing provenance and establishing the conformity of samples from West and East Malaysia. Sixteen STRs were used to develop the DNA profiling databases for individual identification. Bayesian clustering analyses divided the 17 populations into two main genetic clusters, corresponding to West and East Malaysia, and further population substructuring (K = 2) was observed within each region. After removal of bias resulting from sampling effects and population subdivision, conservativeness tests showed that the West and East Malaysia databases were conservative.
This suggests that both databases can be used independently for random match probability estimation within their respective regions. The reliability of the databases was further assessed with independent self-assignment tests based on the likelihood of each individual's multilocus genotype occurring in each identified population, genetic cluster and region; on average, 54.80%, 99.60% and 100% of individuals, respectively, were correctly assigned. Thus, after appropriate validation, the genetic identification databases developed for G. bancanus in this study could support forensic applications and help safeguard this valuable species in the future.
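The likelihood-based self-assignment test described above can be sketched in Python as follows. This is a minimal illustration with several assumptions, none taken from the study. Each reference population is assumed to be in Hardy-Weinberg and linkage equilibrium. The design is leave-one-out: the individual being assigned is excluded from the reference frequencies. Alleles absent from a reference population get an arbitrary floor frequency (MIN_FREQ). The genotypes are invented, and the same logic applies at the cluster or region level by changing the labels.

```python
import math
from collections import Counter, defaultdict

MIN_FREQ = 0.01  # assumed floor for alleles absent from a reference population

def allele_freqs(genotypes):
    """Per-locus allele frequencies from a list of {locus: (allele1, allele2)} genotypes."""
    counts = defaultdict(Counter)
    for genotype in genotypes:
        for locus, alleles in genotype.items():
            counts[locus].update(alleles)
    return {
        locus: {allele: n / sum(ctr.values()) for allele, n in ctr.items()}
        for locus, ctr in counts.items()
    }

def log10_likelihood(genotype, freqs):
    """log10 probability of a diploid multilocus genotype under HWE and linkage equilibrium."""
    total = 0.0
    for locus, (a, b) in genotype.items():
        p = freqs.get(locus, {}).get(a, MIN_FREQ)
        q = freqs.get(locus, {}).get(b, MIN_FREQ)
        total += math.log10(p * p if a == b else 2 * p * q)
    return total

def self_assignment_rate(samples):
    """samples: list of (population, genotype). Leave-one-out assignment to the most likely population."""
    populations = sorted({pop for pop, _ in samples})
    correct = 0
    for i, (true_pop, genotype) in enumerate(samples):
        scores = {}
        for pop in populations:
            reference = [g for j, (p, g) in enumerate(samples) if p == pop and j != i]
            scores[pop] = log10_likelihood(genotype, allele_freqs(reference))
        correct += max(scores, key=scores.get) == true_pop
    return correct / len(samples)

# Hypothetical STR genotypes (repeat counts) at two loci in two populations
samples = [
    ("North", {"L1": (10, 10), "L2": (5, 6)}),
    ("North", {"L1": (10, 11), "L2": (5, 5)}),
    ("North", {"L1": (10, 10), "L2": (6, 5)}),
    ("South", {"L1": (14, 15), "L2": (8, 8)}),
    ("South", {"L1": (15, 15), "L2": (8, 9)}),
    ("South", {"L1": (14, 14), "L2": (9, 8)}),
]
print(self_assignment_rate(samples))  # 1.0
```

Real data, with overlapping allele frequencies between populations, would give the intermediate rates reported for population-level assignment.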
The MiSeq FGx Forensic Genomics System (Illumina) enables amplification and massively parallel sequencing of 59 STRs, 94 identity-informative SNPs, 54 ancestry-informative SNPs, and 24 phenotype-informative SNPs. Allele frequencies and population statistics were generated for the 172 SNP loci in this panel in four major population groups (Chinese, African Americans, US Caucasians, and Southwest Hispanics). Single-locus and combined random match probabilities were calculated for the identity-informative SNPs. Averaged across all four populations, the combined STR and identity-informative SNP random match probabilities (assuming independence) were 1.75 × 10⁻⁶⁷ with length-based STR alleles and 2.30 × 10⁻⁷¹ with sequence-based STR alleles. Ancestry and phenotype predictions were obtained with the ForenSeq™ Universal Analysis System (UAS; Illumina) from the ancestry-informative and phenotype-informative SNP profiles of each sample. Performance metrics, including profile completeness, read depth, relative locus performance, and allele coverage ratios, were also evaluated and are detailed for the 725 samples in this study. Although some markers in the panel performed notably better than others, performance was generally consistent across populations. The performance and population data indicate that accurate and reliable profiles were generated, and they provide valuable background information for laboratories considering internal validation studies and implementation.
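Under the independence assumption stated above, a combined random match probability is simply the product of single-locus values. The Python sketch below makes two choices of its own. It assumes Hardy-Weinberg equilibrium for biallelic identity SNPs. It also sums log10 values rather than multiplying raw probabilities, which avoids floating-point underflow when many loci are combined. The allele frequencies and the STR figure are hypothetical placeholders.

```python
import math

def snp_rmp(p):
    """Single-locus match probability for a biallelic SNP under HWE:
    p^4 + (2pq)^2 + q^4, i.e. the sum of squared genotype frequencies."""
    q = 1.0 - p
    return p ** 4 + (2 * p * q) ** 2 + q ** 4

def combined_log10_rmp(single_locus_rmps):
    """Product rule across independent loci, computed as a sum of log10 values."""
    return sum(math.log10(r) for r in single_locus_rmps)

# Hypothetical allele frequencies for a handful of iiSNPs
snp_freqs = [0.5, 0.45, 0.38, 0.41, 0.5]
snp_log = combined_log10_rmp(snp_rmp(p) for p in snp_freqs)

# Hypothetical log10 RMP for an STR profile; independent sets combine by adding exponents
str_log = -28.0
total_log = snp_log + str_log
print(f"SNP RMP = 10^{snp_log:.2f}; STR+SNP RMP = 10^{total_log:.2f}")
```

A SNP at p = 0.5 contributes at most a factor of 0.375 per locus. This is why dozens of iiSNPs are needed to approach the discrimination of a core STR set.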
Malay, the main ethnic group in Peninsular Malaysia, is represented by various sub-ethnic groups such as Melayu Banjar, Melayu Bugis, Melayu Champa, Melayu Java, Melayu Kedah, Melayu Kelantan, Melayu Minang and Melayu Patani. Using data retrieved from the MyHVP (Malaysian Human Variome Project) database, we analyzed 135 individuals from these sub-ethnic groups who had been genotyped on the Affymetrix GeneChip Mapping Xba 50K single nucleotide polymorphism (SNP) array, in order to identify SNPs that serve as ancestry-informative markers (AIMs) for the Malays of Peninsular Malaysia. Before the AIMs were selected, the genetic structure of the Malays was explored with reference to 11 other populations from the Pan-Asian SNP Consortium database, using principal component analysis (PCA) and ADMIXTURE. Iterative pruning principal component analysis (ipPCA) was then used to identify sub-groups of Malays. Subsequently, we constructed an AIMs panel for Malays using the informativeness for assignment (In) of each marker, and used a K-nearest neighbor (KNN) classifier to train the classification models. A model of 250 SNPs ranked by In correctly classified Malay individuals with an accuracy of up to 90%. This SNP panel could be used as a set of AIMs to ascertain the specific ancestry of Malays, which may be useful in disease association studies, biomedical research or forensic investigations.
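The marker-ranking and classification steps can be illustrated briefly. The Python sketch below computes informativeness for assignment with the formula of Rosenberg et al. (2003), ranks SNPs by it, and classifies a query genotype with a plain k-nearest-neighbor majority vote. The abstract does not report these settings, so several choices are assumptions for illustration: the Manhattan distance on 0/1/2 genotype codes, k = 3, and all frequencies and genotypes.

```python
import math
from collections import Counter

def xlogx(x):
    """x * ln(x), with the convention 0 * ln(0) = 0."""
    return x * math.log(x) if x > 0 else 0.0

def informativeness(pop_freqs):
    """In for one locus. pop_freqs: one list of allele frequencies per population,
    with alleles in the same order in every list."""
    k = len(pop_freqs)
    total = 0.0
    for j in range(len(pop_freqs[0])):
        mean_p = sum(freqs[j] for freqs in pop_freqs) / k
        total += -xlogx(mean_p) + sum(xlogx(freqs[j]) for freqs in pop_freqs) / k
    return total

def top_snps(freq_table, n):
    """Return the n SNP IDs with the highest In."""
    return sorted(freq_table, key=lambda snp: informativeness(freq_table[snp]), reverse=True)[:n]

def knn_predict(train, query, k=3):
    """Majority label among the k training genotypes nearest to the query (Manhattan distance)."""
    nearest = sorted(train, key=lambda item: sum(abs(a - b) for a, b in zip(item[1], query)))[:k]
    return Counter(label for label, _ in nearest).most_common(1)[0][0]

# Hypothetical allele frequencies [ref, alt] in two populations
freq_table = {
    "rs_a": [[0.9, 0.1], [0.2, 0.8]],
    "rs_b": [[0.5, 0.5], [0.5, 0.5]],
    "rs_c": [[0.7, 0.3], [0.4, 0.6]],
}
print(top_snps(freq_table, 2))  # ['rs_a', 'rs_c']

# Hypothetical genotypes coded as alt-allele counts (0/1/2) at the selected SNPs
train = [
    ("Malay", [0, 0, 1]), ("Malay", [0, 1, 0]), ("Malay", [0, 0, 0]),
    ("Other", [2, 2, 2]), ("Other", [2, 1, 2]), ("Other", [1, 2, 2]),
]
print(knn_predict(train, [0, 0, 1]))  # Malay
```

A SNP with identical frequencies in every population (rs_b) has In = 0 and drops to the bottom of the ranking, which is the intended behavior when building a panel.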
Massively parallel sequencing (MPS) can identify sequence variation within short tandem repeat (STR) alleles as well as the nominal allele lengths traditionally obtained by capillary electrophoresis. Using the MiSeq FGx Forensic Genomics System (Illumina), STRait Razor, and in-house Excel workbooks, genetic variation was characterized within the repeat and flanking regions of 27 autosomal, 7 X-chromosome and 24 Y-chromosome STR markers in 777 unrelated individuals from four population groups. Sequence analysis identified 746 autosomal, 227 X-chromosome, and 324 Y-chromosome STR alleles, compared with 357 autosomal, 107 X-chromosome, and 189 Y-chromosome STR alleles identified by length. Among the observed sequence variants, 227 autosomal, 156 X-chromosome, and 112 Y-chromosome novel alleles were identified and described. Sequence variants within STR repeat regions numbered 176 autosomal, 123 X-chromosome, and 93 Y-chromosome, while 86 autosomal, 39 X-chromosome, and 20 Y-chromosome variants were located in STR flanking regions. Three markers (D18S51, DXS10135, and DYS385a-b) had 1, 4, and 1 alleles, respectively, that contained both a novel repeat-region variant and a flanking-region variant in the same sequence. Fifty markers showed a relative increase in diversity when alleles were defined by sequence rather than by traditional nominal length. These population data illustrate the genetic variation in commonly used STR markers in the selected population samples and provide allele frequencies for statistical calculations in STR profiling with MPS data.
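Why sequence-defined alleles raise diversity can be shown with a small Python sketch. It assumes each observed allele is represented by its repeat-region sequence alone. From that sequence it derives a simplified length designation: whole 4-bp repeats plus any leftover bases. This ignores the flanking-region offsets that real nomenclature must handle. The sketch then compares unbiased expected heterozygosity under the two allele definitions. The sequences are invented.

```python
from collections import Counter

def length_designation(repeat_seq, unit=4):
    """Simplified nominal allele: full repeat units, plus '.x' for leftover bases."""
    full, extra = divmod(len(repeat_seq), unit)
    return f"{full}.{extra}" if extra else str(full)

def expected_heterozygosity(alleles):
    """Unbiased gene diversity: n/(n-1) * (1 - sum of squared allele frequencies)."""
    n = len(alleles)
    counts = Counter(alleles)
    return n / (n - 1) * (1 - sum((c / n) ** 2 for c in counts.values()))

# Hypothetical repeat-region sequences observed at one tetranucleotide STR
pure_10 = "TCTA" * 10
variant_10 = "TCTA" * 4 + "TCTG" + "TCTA" * 5  # same length, internal TCTG unit
pure_11 = "TCTA" * 11
observed = [pure_10, variant_10, pure_11, pure_10, variant_10, pure_11]

by_length = [length_designation(s) for s in observed]
print(len(set(by_length)), round(expected_heterozygosity(by_length), 3))  # 2 0.533
print(len(set(observed)), round(expected_heterozygosity(observed), 3))    # 3 0.8
```

The two "10" alleles are indistinguishable by length but split into two alleles by sequence, which is the kind of increase in diversity reported above.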
The use of single nucleotide polymorphisms (SNPs) in forensic genetics has been limited to challenging samples with low-template and/or degraded DNA. The recent introduction of massively parallel sequencing (MPS) technologies has expanded the potential applications of these markers and increased the discrimination power of well-established loci by capturing variation in the flanking regions of the target loci. The ForenSeq Signature Preparation Kit contains 165 SNP amplicons for ancestry inference (aiSNPs), identity inference (iiSNPs), and phenotype inference (piSNPs). In this study, 714 individuals from four major populations (African American, AFA; East Asian, ASN; US Caucasian, CAU; and Southwest US Hispanic, HIS) previously reported by Churchill et al. [Forensic Sci Int Genet. 30 (2017) 81-92; DOI: https://doi.org/10.1016/j.fsigen.2017.06.004] were assessed using STRait Razor v2s to determine the level of diversity in the flanking regions of these amplicons. Nearly 70% of loci showed some level of flanking-region variation, and 22 iiSNPs and 8 aiSNPs were categorized as microhaplotypes. The heterozygosities of these microhaplotypes approached, and in one instance surpassed, those of some core STR loci. The impact of the flanking region on other forensic parameters (e.g., power of exclusion and power of discrimination) was also examined. Sixteen of the 94 iiSNPs had an effective number of alleles greater than 2.00 across the four populations. To assess the effect of flanking-region information on ancestry inference, genotype probabilities and likelihood ratios were determined. Additionally, concordance with the ForenSeq UAS and Nextera Rapid Capture was evaluated, and patterns of heterozygote imbalance were identified. Pairwise comparison of the iiSNP diplotypes showed that the probability of detecting a mixture (i.e., observing ≥ 3 haplotypes) using these loci alone was 0.9952.
Considering the full regions, rather than only the target iiSNPs, significantly improved random match probabilities. When the iiSNPs were combined with the autosomal STRs, the combined match probabilities ranged from 6.40 × 10⁻⁷³ (ASN) to 1.02 × 10⁻⁷⁹ (AFA).
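The microhaplotype study reports two quantities that can be sketched in Python: the effective number of alleles and the pairwise mixture-detection probability. The sketch assumes the usual definition of effective allele number, 1 / Σp². It counts a simulated two-person mixture as detected when any locus shows three or more distinct haplotypes, mirroring the ≥ 3 haplotype criterion in the text. The microhaplotype calls are hypothetical.

```python
from itertools import combinations

def effective_allele_number(freqs):
    """Effective number of alleles: 1 / sum of squared allele (haplotype) frequencies."""
    return 1.0 / sum(p * p for p in freqs)

def mixture_detected(ind_a, ind_b):
    """True if pooling two diplotypes yields >= 3 distinct haplotypes at any locus."""
    return any(len(set(ind_a[locus]) | set(ind_b[locus])) >= 3 for locus in ind_a)

def mixture_detection_probability(individuals):
    """Fraction of all pairs of individuals whose pooled profile reveals a mixture."""
    pairs = list(combinations(individuals, 2))
    return sum(mixture_detected(a, b) for a, b in pairs) / len(pairs)

# Hypothetical diplotypes (pairs of microhaplotypes) at two loci
individuals = [
    {"mh1": ("AG", "AT"), "mh2": ("CC", "CC")},
    {"mh1": ("AG", "AG"), "mh2": ("CC", "TC")},
    {"mh1": ("GT", "AT"), "mh2": ("CC", "CC")},
]
print(effective_allele_number([0.25, 0.25, 0.25, 0.25]))  # 4.0
print(mixture_detection_probability(individuals))          # 0.6666666666666666
```

A locus with an effective allele number above 2 behaves like more than a simple biallelic SNP. Such loci are the ones that make observing a third haplotype in a mixture likely.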
The emergence of massively parallel sequencing (MPS) technologies has enabled the analysis of full mitochondrial (mt)DNA sequences from forensically relevant samples that had so far been typed only in the control region or its hypervariable segments. In this study, we evaluated the performance of a commercially available multiplex-PCR-based assay, the Precision ID mtDNA Whole Genome Panel (Thermo Fisher Scientific), for amplifying and sequencing the entire mitochondrial genome (mitogenome), even from degraded forensic specimens. For this purpose, more than 500 samples from 24 different populations were selected to cover the vast majority of established superhaplogroups. These superhaplogroups are known to harbor distinct signature sequence motifs, reflecting their phylogenetic background, that could affect primer binding and thus limit the broad application of this molecular genetic tool. The selected samples were derived from various forensically relevant tissue sources, and their DNA was extracted using different methods. We evaluated sequence concordance and heteroplasmy detection and compared the findings with those from conventional Sanger sequencing and an orthogonal MPS platform. We discuss the advantages and limitations of this approach with respect to forensic genetic workflows and analytical requirements.
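Heteroplasmy detection and concordance checking can be sketched as a per-position comparison of read counts against a reference call. The Python sketch assumes a minor-allele fraction threshold of 10%, a hypothetical setting not taken from the study. It represents a point heteroplasmy as the two bases in alphabetical order (e.g. "CT"). The read counts and Sanger calls are invented.

```python
def call_position(base_counts, threshold=0.10):
    """Call the major base, or a two-base heteroplasmy if the second base
    reaches the (assumed) minor-allele fraction threshold."""
    depth = sum(base_counts.values())
    if depth == 0:
        return None  # no coverage at this position
    ranked = sorted(base_counts.items(), key=lambda kv: kv[1], reverse=True)
    major = ranked[0][0]
    if len(ranked) > 1 and ranked[1][1] / depth >= threshold:
        return "".join(sorted((major, ranked[1][0])))
    return major

def concordance(mps_counts, reference_calls, threshold=0.10):
    """Fraction of shared positions where the MPS call matches the reference call."""
    shared = [pos for pos in reference_calls if pos in mps_counts]
    matches = sum(call_position(mps_counts[pos], threshold) == reference_calls[pos] for pos in shared)
    return matches / len(shared) if shared else float("nan")

# Hypothetical read counts per mitogenome position and matching Sanger calls
mps_counts = {
    16189: {"T": 620, "C": 380},
    263: {"G": 1500, "A": 4},
    73: {"G": 980, "A": 40},
}
sanger_calls = {16189: "CT", 263: "G", 73: "G"}
print(concordance(mps_counts, sanger_calls))  # 1.0
```

Low-level minor bases below the threshold (here at positions 263 and 73) are treated as noise. Choosing that threshold is exactly where MPS and Sanger results can diverge.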
The illegal ivory trade continues to drive elephant poaching, and large ivory seizures in Africa and Asia are still commonplace. Wildlife forensics is recognized as a key enforcement tool to combat this trade. However, the time and resources required to test large ivory seizures effectively are often prohibitive. This limits or delays testing, which may impede investigations and/or prosecutions. Typically, DNA analysis of an ivory seizure involves pairing and sorting the tusks, sampling the tusks, powdering the sample, decalcifying it, and extracting DNA. Here, we optimize the most time-consuming components of this process: sampling and decalcification. First, using simulations, we demonstrate that tusks do not need to be paired to ensure that an adequate number of unique elephants is sampled in a large seizure. Second, we determined that powdering the ivory directly with a Dremel drill fitted with a high-speed cutter bit produces results comparable to cutting the ivory with a circular saw and then powdering the sample in liquid nitrogen with a freezer mill. Finally, we optimized a rapid 2-h decalcification protocol that produces results comparable to those of a standard 3-day protocol. We tested and optimized the protocols on 33 raw and worked ivory samples and demonstrated their utility in a case study, successfully identifying 94% of samples taken from 123 tusks. With these rapid protocols, the entire sampling and DNA extraction process takes less than one day and requires less expensive equipment. We expect that implementing these rapid protocols will promote more consistent and timely testing of ivory seizures in support of enforcement action.
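The pairing simulation can be approximated with a small Monte Carlo sketch. It assumes a hypothetical seizure model in which each elephant contributes one tusk, or both tusks with probability pair_prob. Tusks are then sampled at random without pairing. The parameter values are illustrative only and are not those used in the study.

```python
import random

def build_seizure(n_elephants, pair_prob, rng):
    """One entry per tusk, labelled with the (hypothetical) elephant it came from."""
    tusks = []
    for elephant in range(n_elephants):
        tusks.append(elephant)
        if rng.random() < pair_prob:
            tusks.append(elephant)  # the second tusk of the same animal
    return tusks

def mean_unique_elephants(n_elephants, pair_prob, n_sampled, n_sims=1000, seed=1):
    """Average number of distinct elephants among n_sampled randomly chosen, unpaired tusks."""
    rng = random.Random(seed)
    total = 0
    for _ in range(n_sims):
        tusks = build_seizure(n_elephants, pair_prob, rng)
        sample = rng.sample(tusks, min(n_sampled, len(tusks)))
        total += len(set(sample))
    return total / n_sims

# e.g. a seizure of roughly 200 tusks from 120 animals, of which 30 tusks are sampled
print(mean_unique_elephants(n_elephants=120, pair_prob=0.7, n_sampled=30))
```

A sampled tusk duplicates an animal only if its partner is also drawn. Duplicates therefore stay uncommon when a modest fraction of a large seizure is sampled, which is the intuition behind skipping the pairing step.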
To inform product users about the origin of timber, the forestry industry needs to implement a traceability system. In this study, we developed a comprehensive genetic database for the important tropical timber species Merbau, Intsia palembanica, to trace its geographic origin within peninsular Malaysia. A total of 1373 individual trees representing 39 geographically distinct populations of I. palembanica were sampled throughout peninsular Malaysia. We analyzed the samples using a combination of four chloroplast DNA (cpDNA) markers and 14 short tandem repeat (STR) markers to establish both cpDNA haplotype and STR allele frequency databases. A haplotype map was generated through cpDNA sequencing for population identification, yielding six unique haplotypes based on 10 informative intraspecifically variable sites. An STR allele frequency database was then developed from the 14 STRs to allow individual identification. Bayesian cluster analysis divided the individuals into two genetic clusters corresponding to the northern and southern regions of peninsular Malaysia. Tests of conservativeness showed that the databases were conservative after the θ values were adjusted to 0.2000 for the northern region (f = 0.0163) and 0.2900 for the southern region (f = 0.0285). In self-assignment tests, individuals were correctly assigned to populations at rates of 40.54-94.12% and to the identified regions at rates of 79.80-80.62%. Both the cpDNA and STR markers appear to be useful for tracking Merbau timber originating from peninsular Malaysia. Used alongside the existing paper-based timber tracking system, these forensic tools will help verify the legality of the origin of I. palembanica and combat illegal logging of the species.
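The θ adjustment enters the random match probability through a population-structure correction. The Python sketch below uses the widely used Balding-Nichols form recommended in NRC II (Recommendation 4.10). Whether the study used exactly this form is an assumption, and the allele frequencies are hypothetical. The value θ = 0.2 corresponds to the northern-region value reported above.

```python
def match_prob(p, q, theta):
    """Theta-corrected single-locus match probability (NRC II, Recommendation 4.10).
    Pass q=None for a homozygote with allele frequency p."""
    denom = (1 + theta) * (1 + 2 * theta)
    if q is None:
        return (2 * theta + (1 - theta) * p) * (3 * theta + (1 - theta) * p) / denom
    return 2 * (theta + (1 - theta) * p) * (theta + (1 - theta) * q) / denom

def profile_match_prob(genotype_freqs, theta):
    """Product over loci; genotype_freqs is a list of (p, q) tuples, q=None for homozygotes."""
    result = 1.0
    for p, q in genotype_freqs:
        result *= match_prob(p, q, theta)
    return result

# Hypothetical allele frequencies at three STR loci for one tree's genotype
genotype = [(0.10, 0.20), (0.15, None), (0.05, 0.30)]
print(profile_match_prob(genotype, theta=0.0))  # HWE product rule, no correction
print(profile_match_prob(genotype, theta=0.2))  # northern-region theta
```

With θ = 0 the function reduces to the Hardy-Weinberg values p² and 2pq. Larger θ values inflate the match probability, and this is the sense in which raising θ makes a database more conservative.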