The Singapore Genome Variation Project (SGVP) provides a publicly available resource of 1.6 million single nucleotide polymorphisms (SNPs) genotyped in 268 individuals from the Chinese, Malay, and Indian population groups in Southeast Asia. This online database catalogs information and summaries on genotype and phased haplotype data, including allele frequencies, assessment of linkage disequilibrium (LD), and recombination rates in a format similar to the International HapMap Project. Here, we introduce this resource and describe the analysis of human genomic variation after combining the SGVP data with data from the HapMap and the Human Genome Diversity Project, providing useful insights into the population structure of the three major population groups in Asia. In addition, we surveyed the genome for variation in regional patterns of LD between the HapMap and SGVP populations, and for signatures of positive natural selection using two well-established metrics: iHS and XP-EHH. The raw and processed genetic data, together with all population genetic summaries, are publicly available for download and for browsing through a web interface modeled on the Generic Genome Browser.
The Cancer Genetic Markers of Susceptibility genome-wide association study (GWAS) originally identified a single nucleotide polymorphism (SNP), rs11249433, at 1p11.2 associated with breast cancer risk. To fine-map this locus, we genotyped 92 SNPs in a 900 kb region (120,505,799-121,481,132) flanking rs11249433 in 45,276 breast cancer cases and 48,998 controls of European, Asian and African ancestry from 50 studies in the Breast Cancer Association Consortium. Genotyping was done using iCOGS, a custom-built array. Because the region chr1p11.2: 120,300,000-120,505,798 lies near the centromere and contains seven duplicated genomic segments, we restricted analyses to 429 SNPs outside the duplicated regions (42 genotyped and 387 imputed). Per-allele associations with breast cancer risk were estimated using logistic regression models adjusting for study and ancestry-specific principal components. The strongest association observed was with the originally identified index SNP rs11249433 (minor allele frequency (MAF) 0.402; per-allele odds ratio (OR) = 1.10, 95% confidence interval (CI) 1.08-1.13, P = 1.49 × 10(-21)). The association for rs11249433 was limited to ER-positive breast cancers (test for heterogeneity P ≤ 8.41 × 10(-5)). Additional analyses by other tumor characteristics showed stronger associations with moderately/well differentiated tumors and tumors of lobular histology. Although no significant eQTL associations were observed, in silico analyses showed that rs11249433 was located in a region that is likely a weak enhancer/promoter. Fine-mapping analysis of the 1p11.2 breast cancer susceptibility locus confirms that the risk association at this region is limited to ER-positive cancers.
Previous genome-wide association studies among women of European ancestry identified two independent breast cancer susceptibility loci represented by single nucleotide polymorphisms (SNPs) rs13281615 and rs11780156 at 8q24. A fine-mapping study across 2.06 Mb (chr8:127,561,724-129,624,067, hg19) in 55,540 breast cancer cases and 51,168 controls within the Breast Cancer Association Consortium was conducted. Three additional independent association signals in women of European ancestry, represented by rs35961416 (OR = 0.95, 95% CI = 0.93-0.97, conditional p = 5.8 × 10(-6) ), rs7815245 (OR = 0.94, 95% CI = 0.91-0.96, conditional p = 1.1 × 10(-6) ) and rs2033101 (OR = 1.05, 95% CI = 1.02-1.07, conditional p = 1.1 × 10(-4) ) were found. Integrative analysis using functional genomic data from the Roadmap Epigenomics, the Encyclopedia of DNA Elements project, the Cancer Genome Atlas and other public resources implied that SNPs rs7815245 in Signal 3, and rs1121948 in Signal 5 (in linkage disequilibrium with rs11780156, r(2) = 0.77), were putatively functional variants for two of the five independent association signals. The results highlighted multiple 8q24 variants associated with breast cancer susceptibility in women of European ancestry.
The allele frequencies for the apolipoprotein B (apo B) 5'-Ins/Del and 3'-VNTR polymorphisms varied significantly (p < 0.01) among Singaporeans of Chinese, Malay and Indian descent. We calculated the unbiased expected heterozygosities for the 5'-Ins/Del polymorphism as 0.3357, 0.1984 and 0.2418, and for the 3'-VNTR as 0.5980, 0.5260 and 0.6749, respectively, in the Chinese, Malays and Indians. Compared to heterozygosities reported for other populations, the Singaporeans differed from most Caucasians in having significantly lower values but were closely related to other non-Caucasians. Thirteen alleles, with a bimodal distribution, were observed at the 3'-VNTR polymorphic locus; the alleles occurring most frequently among the Chinese and Malays were of 35 or 53 repeats, and among the Indians, of 37 or 47 repeats. The Del allele was associated with elevated serum cholesterol (p = 0.023), LDL-cholesterol (LDL-C) (p = 0.001) in the Chinese, and apo B (p = 0.007) in the Indians. Likewise, the larger 3'-VNTR alleles (> 41 repeats) were associated with raised cholesterol (p = 0.018), LDL-C (p = 0.025), and triglyceride (p = 0.001) in the Chinese. The two polymorphisms were not in significant linkage disequilibrium (D = -0.0029, p = 0.494) in the three ethnic groups.
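The unbiased expected heterozygosity reported above can be illustrated with a short calculation. The following is a minimal sketch that assumes Nei's small-sample estimator, H = 2n(1 - Σp_i²)/(2n - 1), which the abstract does not name; the function name and the example frequencies are hypothetical and are not taken from the study.

```python
def unbiased_heterozygosity(allele_freqs, n_individuals):
    """Unbiased expected heterozygosity for a diploid sample.

    Assumption: Nei's estimator, H = 2n / (2n - 1) * (1 - sum(p_i^2)),
    where n is the number of diploid individuals sampled.
    """
    if abs(sum(allele_freqs) - 1.0) > 1e-6:
        raise ValueError("allele frequencies must sum to 1")
    if n_individuals < 1:
        raise ValueError("need at least one individual")
    n_chromosomes = 2 * n_individuals
    # Expected homozygosity under Hardy-Weinberg proportions.
    homozygosity = sum(p * p for p in allele_freqs)
    # Small-sample correction factor 2n / (2n - 1).
    return n_chromosomes / (n_chromosomes - 1) * (1.0 - homozygosity)


if __name__ == "__main__":
    # Hypothetical biallelic locus (e.g. Ins/Del) in 100 individuals.
    print(round(unbiased_heterozygosity([0.8, 0.2], 100), 4))
```

For a multi-allelic locus such as the 3'-VNTR, the same function takes one frequency per allele; heterozygosity rises as the frequencies become more even.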
Angiotensin II type 1 receptor (AGTR1) has been reported to play a fibrogenic role in non-alcoholic fatty liver disease (NAFLD). In this study, five variants of the AGTR1 gene (rs3772622, rs3772627, rs3772630, rs3772633, and rs2276736) were examined for their association with susceptibility to NAFLD. A total of 144 biopsy-proven NAFLD patients and 198 controls were genotyped using TaqMan assays. The liver biopsy specimens were histologically graded and scored according to the method of Brunt. Single locus analysis in pooled subjects revealed no association between any of the five variants and susceptibility to NAFLD. In the Indian ethnic group, rs2276736, rs3772630 and rs3772627 appear to be protective against NAFLD (p = 0.010, p = 0.016 and p = 0.026, respectively). Haplotype ACGCA is shown to be protective against NAFLD in the Indian ethnic subgroup (p = 0.03). Analysis of gene-gene interaction between the AGTR1 gene and the patatin-like phospholipase domain-containing 3 (PNPLA3) gene, which we previously reported as associated with NAFLD in this sample, showed a strong interaction between the AGTR1 (rs3772627), AGTR1 (rs3772630) and PNPLA3 (rs738409) polymorphisms on NAFLD susceptibility (p = 0.007). Further analysis of the NAFLD patients revealed that the G allele of AGTR1 rs3772622 is associated with increased fibrosis score (p = 0.003). This is the first study that replicates an association between AGTR1 polymorphism and NAFLD, with further details on the histological features of NAFLD. There is a lack of evidence to suggest an association between any of the five variants of the AGTR1 gene and NAFLD in the Malays and Chinese. In the Indians, rs2276736, rs3772630 and rs3772627 appear to protect against NAFLD. We report novel findings of an association between the G allele of rs3772622 and the occurrence of fibrosis, and of a gene-gene interaction between the AGTR1 gene and the much-studied PNPLA3 gene.
Tibetan high-altitude adaptation (HAA) has been studied extensively, and many candidate genes have been reported. Subsequent efforts targeting HAA functional variants, however, have not been that successful (e.g., no functional variant has been suggested for the top candidate HAA gene, EPAS1). With WinXPCNVer, a method developed in this study, we detected in microarray data a Tibetan-enriched deletion (TED) carried by 90% of Tibetans; 50% were homozygous for the deletion, whereas only 3% carried the TED and 0% carried the homozygous deletion in 2,792 worldwide samples (p < 10(-15)). We employed long PCR and Sanger sequencing technologies to determine the exact copy number and breakpoints of the TED in 70 additional Tibetan and 182 diverse samples. The TED had identical boundaries (chr2: 46,694,276-46,697,683; hg19) and was 80 kb downstream of EPAS1. Notably, the TED was in strong linkage disequilibrium (LD; r(2) = 0.8) with EPAS1 variants associated with reduced blood concentrations of hemoglobin. It was also in complete LD with the 5-SNP motif, which was suspected to be introgressed from Denisovans, but the deletion itself was absent from the Denisovan sequence. Correspondingly, we detected that footprints of positive selection for the TED occurred 12,803 (95% confidence interval = 12,075-14,725) years ago. We further whole-genome deep sequenced (>60×) seven Tibetans and verified the TED but failed to identify any other copy-number variations with comparable patterns, giving this TED top priority for further study. We speculate that the specific patterns of the TED resulted from its own functionality in HAA of Tibetans or LD with a functional variant of EPAS1.
Amelogenin paralogs on Chromosome X (AMELX) and Y (AMELY) are commonly used sexing markers. Interstitial deletion of Yp involving the AMELY locus has previously been reported. The combined frequency of the AMELY null allele in Singapore and Malaysia populations is 2.7% and 0.6% in the Indian and Malay ethnic groups, respectively. It is absent among 541 Chinese screened. The null allele in this study belongs to 3 Y haplogroups: J2e1 (85.7%), F* (9.5%) and D* (4.8%). Low and high-resolution STS mapping, followed by sequence analysis of the breakpoint junction, confirmed a large deletion of 3 to 3.7 Mb located at the Yp11.2 region. Both breakpoints were located in TSPY repeat arrays, suggesting a non-allelic homologous recombination (NAHR) mechanism of deletion. All regional null samples shared identical breakpoint sequences according to their haplogroup affiliation, providing molecular evidence of a common ancestral origin for each haplogroup and indicating that at least 3 independent deletion events have occurred. The estimated ages based on Y-SNP and STR analysis were approximately 13.5 +/- 3.1 kyears and approximately 0.9 +/- 0.9 kyears for the J2e1 and F* mutations, respectively. A novel polymorphism G > A at the Y-GATA-H4 locus in complete linkage disequilibrium with J2e1 null mutations is a more recent event. This work re-emphasizes the need to include other sexing markers for gender determination in certain regional populations. The frequency difference among global populations suggests it constitutes another structural variation locus of human chromosome Y. The breakpoint sequences provide further information toward a better understanding of the NAHR mechanism and DNA rearrangements due to higher order genomic architecture.
Nasopharyngeal carcinoma (NPC) is a multifactorial and polygenic disease with high incidence in Asian countries. Epstein-Barr virus infection, environmental and genetic factors are believed to be involved in the tumorigenesis of NPC. The association of single nucleotide polymorphisms (SNPs) in LPLUNC1 and SPLUNC1 genes with NPC was investigated by performing a two-stage case control association study in a Malaysian Chinese population. The initial screening consisted of 81 NPC patients and 147 healthy controls while the replication study consisted of 366 NPC patients and 340 healthy controls. The combined analysis showed that a SNP (rs2752903) of SPLUNC1 was significantly associated with the risk of NPC (combined P = 0.00032, odds ratio = 1.62, 95% confidence interval = 1.25-2.11). In the subsequent dense fine mapping of SPLUNC1 locus, 36 SNPs in strong linkage disequilibrium with rs2752903 (r(2) ≥ 0.85) were associated with NPC susceptibility. Screening of these variants by electrophoretic mobility shift and luciferase reporter assays showed that rs1407019 located in intron 3 (r(2) = 0.994 with rs2752903) caused allelic difference in the binding of specificity protein 1 (Sp1) transcription factor and affected luciferase activity. This SNP may consequently alter the expression of SPLUNC1 in the epithelial cells. In summary, our study suggested that rs1407019 in intronic enhancer of SPLUNC1 is associated with NPC susceptibility in which its A allele confers an increased risk of NPC in the Malaysian Chinese population.
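An odds ratio with its 95% confidence interval, as quoted above, can be computed from a 2 × 2 table of allele counts. The following is a minimal sketch using the Woolf (log-based) interval; it is not necessarily the method used in the study, and the function name and the counts are hypothetical.

```python
import math


def odds_ratio_ci(case_risk, case_other, ctrl_risk, ctrl_other, z=1.959964):
    """Allelic odds ratio with a Woolf (log-scale) confidence interval.

    Arguments are allele counts; z = 1.959964 gives a two-sided 95% interval.
    All four counts must be positive for the estimate to be defined.
    """
    counts = (case_risk, case_other, ctrl_risk, ctrl_other)
    if min(counts) <= 0:
        raise ValueError("all four counts must be positive")
    odds_ratio = (case_risk * ctrl_other) / (case_other * ctrl_risk)
    # Standard error of log(OR) is the root of the summed reciprocal counts.
    se_log_or = math.sqrt(sum(1.0 / c for c in counts))
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


if __name__ == "__main__":
    # Hypothetical allele counts, not data from the study.
    or_, lo, hi = odds_ratio_ci(20, 80, 10, 90)
    print(f"OR = {or_:.2f}, 95% CI = {lo:.2f}-{hi:.2f}")
```

An interval that excludes 1, like the reported 1.25-2.11, corresponds to a nominally significant association at the 5% level.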
This study investigated the association of hepatocyte nuclear factor 4 (HNF4) alpha single nucleotide polymorphisms (SNPs) with type 2 diabetes with or without metabolic syndrome in Malaysia. Nine HNF4 alpha SNPs were genotyped in 390 type 2 diabetic subjects with metabolic syndrome, 135 type 2 diabetic subjects without metabolic syndrome, and 160 control subjects. The SNPs rs4810424, rs1884613, and rs2144908 were associated with protection against type 2 diabetes without metabolic syndrome (recessive P = 0.018, OR 0.32; P = 0.004, OR 0.25; P = 0.005, OR 0.24, respectively). The 6-SNP haplotype2 CCCGTC containing the risk genotype of these SNPs was associated with higher risk for type 2 diabetes with or without metabolic syndrome (P = 0.002, OR 2.2; P = 0.004, OR 3.1). These data suggest that HNF4 alpha SNPs and haplotypes contributed to increased type 2 diabetes risk in the Malaysian population.
Tumour necrosis factor superfamily 4 (TNFSF4) gene has been reported to be associated with systemic lupus erythematosus (SLE) susceptibility because it encodes the OX40L protein, which can increase autoantibody production and cause an imbalance of T-cell proliferation. The purpose of this study was to investigate the association of TNFSF4 rs2205960, rs1234315, rs8446748 and rs704840 with SLE in the Malaysian population. A total of 476 patients with SLE and 509 healthy controls were recruited. Real-time polymerase chain reaction (PCR) was applied to genotype the selected single nucleotide polymorphisms (SNPs). Allelic and genotypic frequencies of each SNP were calculated for each ethnic group, and association tests were performed using logistic regression. The overall association of each SNP in Malaysian patients with SLE was determined by meta-analysis. The minor T allele of TNFSF4 rs2205960 was significantly associated with SLE in Chinese and Indian patients, with P values of 0.05 (OR = 1.27, 95% CI: 1.00-1.61) and 0.004 (OR = 3.16, 95% CI: 1.41-7.05), respectively. A significant association of the minor G allele of rs704840 with SLE was also observed in the Chinese (P = 0.03, OR = 1.26, 95% CI: 1.02-1.56). However, after Bonferroni correction, only the T allele of rs2205960 remained significantly associated with SLE, in the Indian cohort. Overall, the minor G allele of rs704840 showed a significant association with SLE in the Malaysian population, with a P value of 0.05 (OR = 1.20, 95% CI: 1.00-1.43). We suggest that TNFSF4 rs704840 could be a potential SLE risk factor in the Malaysian population.
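The effect of a Bonferroni correction on the P values quoted above can be shown in a few lines. The following is a minimal sketch that assumes four tests (one per SNP examined); the abstract does not state how many comparisons were corrected for, so `n_tests=4` and the function name are assumptions.

```python
def bonferroni_significant(p_values, alpha=0.05, n_tests=None):
    """Flag which P values pass a Bonferroni-adjusted threshold.

    p_values: mapping of label -> unadjusted P value.
    n_tests:  number of comparisons; defaults to len(p_values).
    Returns (flags, threshold), where threshold = alpha / n_tests.
    """
    m = n_tests if n_tests is not None else len(p_values)
    if m < 1:
        raise ValueError("n_tests must be at least 1")
    threshold = alpha / m
    flags = {label: p <= threshold for label, p in p_values.items()}
    return flags, threshold


if __name__ == "__main__":
    # Unadjusted P values as reported in the abstract.
    reported = {
        "rs2205960 T, Chinese": 0.05,
        "rs2205960 T, Indian": 0.004,
        "rs704840 G, Chinese": 0.03,
    }
    # Assumption: four SNPs tested, so the threshold is 0.05 / 4.
    flags, threshold = bonferroni_significant(reported, n_tests=4)
    print(threshold, flags)
```

Under this assumed threshold of 0.0125, only the Indian-cohort result for rs2205960 passes, which agrees with the abstract's statement.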
Analysis of haplotypes based on multiple single-nucleotide polymorphisms (SNP) is becoming common for both candidate gene and fine-mapping studies. Before embarking on studies of haplotypes from genetically distinct populations, however, it is important to consider variation both in linkage disequilibrium (LD) and in haplotype frequencies within and across populations, as both vary. Such diversity will influence the choice of "tagging" SNPs for candidate gene or whole-genome association studies because some markers will not be polymorphic in all samples and some haplotypes will be poorly represented or completely absent. Here we analyze 11 genes, originally chosen as candidate genes for oral clefts, where multiple markers were genotyped on individuals from four populations. Estimated haplotype frequencies, measures of pairwise LD, and genetic diversity were computed for 135 European-Americans, 57 Chinese-Singaporeans, 45 Malay-Singaporeans, and 46 Indian-Singaporeans. Patterns of pairwise LD were compared across these four populations and haplotype frequencies were used to assess genetic variation. Although these populations are fairly similar in allele frequencies and overall patterns of LD, both haplotype frequencies and genetic diversity varied significantly across populations. Such haplotype diversity has implications for designing studies of association involving samples from genetically distinct populations.
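The pairwise LD measures mentioned above are commonly summarized as D, |D'| and r(2). The following is a minimal sketch of those standard definitions for two biallelic markers; the abstract does not say which measures were computed, and the function name and example frequencies are hypothetical.

```python
def pairwise_ld(p_ab, p_a, p_b):
    """Return (D, |D'|, r^2) for two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A and allele B.
    p_a, p_b: frequencies of allele A and allele B.
    """
    q_a, q_b = 1.0 - p_a, 1.0 - p_b
    variance_product = p_a * q_a * p_b * q_b
    if variance_product <= 0:
        raise ValueError("both loci must be polymorphic")
    # D is the departure of the haplotype frequency from independence.
    d = p_ab - p_a * p_b
    r_squared = d * d / variance_product
    # D_max depends on the sign of D.
    if d >= 0:
        d_max = min(p_a * q_b, q_a * p_b)
    else:
        d_max = min(p_a * p_b, q_a * q_b)
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    return d, d_prime, r_squared


if __name__ == "__main__":
    # Hypothetical frequencies: the two alleles always occur together.
    print(pairwise_ld(0.5, 0.5, 0.5))
```

Because r(2) depends on allele frequencies while |D'| does not, two populations can show similar |D'| but different r(2) for the same marker pair, which matters when choosing tagging SNPs.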
Candidate gene and genome-wide association studies (GWAS) have identified 15 independent genomic regions associated with bladder cancer risk. In search of additional susceptibility variants, we followed up on four promising single-nucleotide polymorphisms (SNPs) that had not achieved genome-wide significance in 6911 cases and 11 814 controls (rs6104690, rs4510656, rs5003154 and rs4907479, P < 1 × 10(-6)), using additional data from existing GWAS datasets and targeted genotyping for studies that did not have GWAS data. In a combined analysis, which included data on up to 15 058 cases and 286 270 controls, two SNPs achieved genome-wide statistical significance: rs6104690 in a gene desert at 20p12.2 (P = 2.19 × 10(-11)) and rs4907479 within the MCF2L gene at 13q34 (P = 3.3 × 10(-10)). Imputation and fine-mapping analyses were performed in these two regions for a subset of 5551 bladder cancer cases and 10 242 controls. Analyses at the 13q34 region suggest a single signal marked by rs4907479. In contrast, we detected two signals in the 20p12.2 region: the first signal is marked by rs6104690, and the second signal is marked by two moderately correlated SNPs (r(2) = 0.53), rs6108803 and the previously reported rs62185668. The second 20p12.2 signal is more strongly associated with the risk of muscle-invasive (T2-T4 stage) than with non-muscle-invasive (Ta, T1 stage) bladder cancer (case-case P ≤ 0.02 for both rs62185668 and rs6108803). Functional analyses are needed to explore the biological mechanisms underlying these novel genetic associations with risk for bladder cancer.
Noncoding repeat expansions cause various neuromuscular diseases, including myotonic dystrophies, fragile X tremor/ataxia syndrome, some spinocerebellar ataxias, amyotrophic lateral sclerosis and benign adult familial myoclonic epilepsies. Inspired by the striking similarities in the clinical and neuroimaging findings between neuronal intranuclear inclusion disease (NIID) and fragile X tremor/ataxia syndrome caused by noncoding CGG repeat expansions in FMR1, we directly searched for repeat expansion mutations and identified noncoding CGG repeat expansions in NBPF19 (NOTCH2NLC) as the causative mutations for NIID. Further prompted by the similarities in the clinical and neuroimaging findings with NIID, we identified similar noncoding CGG repeat expansions in two other diseases: oculopharyngeal myopathy with leukoencephalopathy and oculopharyngodistal myopathy, in LOC642361/NUTM2B-AS1 and LRP12, respectively. These findings expand our knowledge of the clinical spectra of diseases caused by expansions of the same repeat motif, and further highlight how directly searching for expanded repeats can help identify mutations underlying diseases.
β-Thalassemia/HbE disease has a wide spectrum of clinical phenotypes, ranging from asymptomatic to dependent on regular blood transfusions. The ability to predict disease severity is helpful for clinical management and treatment decision making. A thalassemia severity score has been developed from Mediterranean β-thalassemia patients. However, different ethnic groups may have different allele frequencies and linkage disequilibrium structures. Here, Thai β0-thalassemia/HbE disease genome-wide association study (GWAS) data from 487 patients were analyzed with a SNP interaction prioritization algorithm, interacting Loci (iLoci), to find SNPs predictive of disease severity. Three SNPs from two SNP interaction pairs associated with disease severity were identified. The three-SNP disease severity risk score, composed of rs766432 in BCL11A, rs9399137 in HBS1L-MYB and rs72872548 in HBE1, showed more than 85% specificity and 75% accuracy. The three-SNP predictive score was then validated in two independent cohorts of Thai and Malaysian β0-thalassemia/HbE patients with comparable specificity and accuracy. The SNP risk score could be used for prediction of clinical severity in the Southeast Asian β0-thalassemia/HbE population.
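Specificity and accuracy, the two figures quoted for the risk score above, follow directly from a confusion matrix of predicted versus observed severity. The following is a minimal sketch with hypothetical labels (1 = severe, 0 = mild); it does not reproduce the study's scoring rule, which the abstract does not give.

```python
def specificity_and_accuracy(observed, predicted):
    """Return (specificity, accuracy) for binary labels (1 = severe, 0 = mild)."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    pairs = list(zip(observed, predicted))
    true_pos = sum(1 for o, p in pairs if o == 1 and p == 1)
    true_neg = sum(1 for o, p in pairs if o == 0 and p == 0)
    false_pos = sum(1 for o, p in pairs if o == 0 and p == 1)
    negatives = true_neg + false_pos
    if negatives == 0:
        raise ValueError("specificity is undefined without mild cases")
    # Specificity: share of truly mild cases predicted as mild.
    specificity = true_neg / negatives
    # Accuracy: share of all cases predicted correctly.
    accuracy = (true_pos + true_neg) / len(pairs)
    return specificity, accuracy


if __name__ == "__main__":
    # Hypothetical labels for five patients, not data from the study.
    print(specificity_and_accuracy([1, 1, 0, 0, 0], [1, 0, 0, 0, 1]))
```

High specificity with lower accuracy, as reported, is the pattern expected when a score rarely mislabels mild cases as severe but misses some severe cases.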
This study aimed to explore the influence of SLC22A1, PXR, ABCG2, ABCB1 and CYP3A5*3 genetic polymorphisms on imatinib mesylate (IM) pharmacokinetics in Asian patients with chronic myeloid leukemia (CML).