METHOD: Whole-genome sequencing (WGS) was performed in seven early-age-onset Malay CRC patients. Potential germline genetic variants, including single-nucleotide variations and insertions and deletions (indels), were prioritized using functional and predictive algorithms.
RESULTS: An average of 3.2 million single-nucleotide variations (SNVs) and over 800 indels were identified. Three potential candidate variants in three genes (IFNE, PTCH2 and SEMA3D), each predicted to affect protein function, were identified in three Malay CRC patients. In addition, 19 candidate genes harbouring nonsense variants were prioritised: ANKDD1B, CENPM, CLDN5, MAGEB16, MAP3K14, MOB3C, MS4A12, MUC19, OR2L8, OR51Q1, OR51AR1, PDE4DIP, PKD1L3, PRIM2, PRM3, SEC22B, TPTE, USP29 and ZNF117. These genes are suggested to play a role in cancer predisposition and to be associated with cancer risk. Pathway enrichment analysis indicated significant enrichment in the olfactory signalling pathway.
CONCLUSION: This study provides a new spectrum of insights into the potential genes, variants and pathways associated with CRC in Malay patients.
RESULTS: We analyzed 1451 extant genomes, 189 Austroasiatic speakers (AAs) from India and Malaysia, and 43 ancient genomes from South and Southeast Asia (S&SEA). Population structure analysis reveals that neither language nor geography correlates well with genetic diversity. The inconsistency between "language and genetics" or "geography and genetics" can largely be attributed to ancient admixture with East Asian populations. We estimated a pre-Neolithic origin of AA language speakers, with shared ancestry between Indian and Malaysian populations until about 470 generations ago, contesting the existing model of Neolithic expansion of the AA culture. We observed a spatio-temporal transition in the genetic ancestry of Southeast Asia (SEA), with the genetic contribution from East Asia increasing significantly in the post-Neolithic period.
CONCLUSION: Our study shows that, contrary to assumptions in many previous studies and despite their linguistic commonality, Indian AAs have a distinct genomic structure compared to Malaysian AAs. This linguistic-genetic discordance reflects the complex history of population migration and admixture that shaped the genomic landscape of S&SEA. We postulate that pre-Neolithic ancestors of today's AAs were widespread in S&SEA, and that the fragmentation and dissipation of the population largely resulted from multiple migrations of East Asian farmers during the Neolithic period. Our study also highlights the resilience of AAs in continuing to speak their language in spite of a checkered population distribution and possible dominance by other linguistic groups.
RESULTS: We identified a total of 644,225 SNPs in 131 neuropeptide genes in 6 worldwide population groups from a public database. Of these, 5163 SNPs with ΔDAF |(African - non-African)| ≥ 0.20 were identified and fully annotated. A total of 20 outlier SNPs, comprising 19 missense SNPs with moderate impact and one stop-lost SNP with high impact, were identified in 16 neuropeptide genes. Overall, we observed strong population differentiation, with the non-African populations having a higher derived allele frequency for 15 of the 20 SNPs. Highly differentiated SNPs in four genes were particularly striking: NPPA (rs5065) with a high-impact stop-lost variant; CHGB (rs6085324, rs236150, rs236152, rs742710 and rs742711) with multiple moderate-impact missense variants; and IGF2 (rs10770125) and INS (rs3842753) with moderate-impact missense variants that are in linkage disequilibrium. Phenotype and disease associations of these differentiated SNPs indicated associations with hypertension and diabetes, and highlighted the pleiotropic effects of these neuropeptides and their role in maintaining physiological homeostasis in humans.
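The ΔDAF filter described above can be sketched in a few lines. This is only an illustration of the thresholding step, assuming that per-SNP derived allele frequencies for the African and non-African groups are already available; the function names, the SNP identifiers and the frequencies below are hypothetical and are not taken from the study.

```python
def delta_daf(daf_african: float, daf_non_african: float) -> float:
    """Absolute difference in derived allele frequency between the two groups."""
    return abs(daf_african - daf_non_african)


def highly_differentiated(snps, threshold=0.20):
    """Return IDs of SNPs whose |ΔDAF| meets or exceeds the threshold.

    `snps` maps a SNP ID to a (DAF in Africans, DAF in non-Africans) pair.
    """
    return [
        snp_id
        for snp_id, (afr, non_afr) in snps.items()
        if delta_daf(afr, non_afr) >= threshold
    ]


# Made-up frequencies for illustration only; not values from the study.
example = {
    "snp_a": (0.10, 0.45),  # |ΔDAF| = 0.35, kept
    "snp_b": (0.30, 0.35),  # |ΔDAF| = 0.05, dropped
    "snp_c": (0.80, 0.55),  # |ΔDAF| = 0.25, kept
}

print(highly_differentiated(example))
```

Taking the absolute value keeps SNPs that are differentiated in either direction; the sign of the difference would then indicate which group carries the higher derived allele frequency.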
CONCLUSIONS: We compiled a list of 131 human neuropeptide genes from multiple databases and a literature survey. We detect significant population differentiation in the derived allele frequencies of variants in several neuropeptide genes between African and non-African populations. The results highlight SNPs in these genes that may also contribute to population disparities in the prevalence of diseases such as hypertension and diabetes.
RESULTS: We analyzed the whole-genome deep sequencing data (~30×) of five native trios from Peninsular Malaysia and North Borneo, and characterized the genomic variants, including single nucleotide variants (SNVs), small insertions and deletions (indels) and copy number variants (CNVs). We discovered approximately 6.9 million SNVs, 1.2 million indels, and 9000 CNVs in the 15 samples, of which 2.7% of SNVs, 2.3% of indels and 22% of CNVs were novel, implying insufficient coverage of population diversity in existing databases. We identified a higher proportion of novel variants in the Orang Asli (OA) samples, i.e., the indigenous people of Peninsular Malaysia, than in the North Bornean (NB) samples, likely due to the more complex demographic history and long-term isolation of the OA groups. We used the pedigree information to identify de novo variants and estimated the autosomal mutation rates to be 0.81 × 10⁻⁸ to 1.33 × 10⁻⁸, 1.0 × 10⁻⁹ to 2.9 × 10⁻⁹, and ~0.001 per site per generation for SNVs, indels, and CNVs, respectively. The trio genomes also allowed for haplotype phasing with high accuracy, which serves as a reference for future genomic studies of OA and NB populations. In addition, high-frequency inherited CNVs specific to OA or NB were identified. One example is a 50-kb duplication in DEFA1B detected only in the Negrito trios, implying plausible effects on host defense against exposure to diverse microbes in the tropical rainforest environment of these hunter-gatherers. The CNVs shared between the OA and NB groups were much fewer than those specific to each group. Nevertheless, we identified a 142-kb duplication in AMY1A in all 15 samples; this gene is associated with a high-starch diet. Moreover, novel insertions shared with archaic hominids were identified in our samples.
CONCLUSION: Our study presents a full catalogue of the genome variants of the native Malaysian populations, which complements the known genome diversity of Southeast Asians. It implies a specific population history of the native inhabitants and demonstrates the necessity of more genome sequencing efforts on the multi-ethnic native groups of Malaysia and Southeast Asia.
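The abstract does not state how the per-site, per-generation rates were computed. A common trio-based estimator, assumed here purely for illustration, divides the number of de novo variants found in the child by twice the number of callable sites, since the child carries two transmitted haploid genomes. The function name and the counts below are hypothetical.

```python
def mutation_rate(de_novo_count: int, callable_sites: int) -> float:
    """Per-site, per-generation mutation rate for one trio.

    Assumed estimator: de novo variants in the child divided by
    2 * callable sites (one maternal and one paternal haploid genome).
    """
    if callable_sites <= 0:
        raise ValueError("callable_sites must be positive")
    return de_novo_count / (2 * callable_sites)


# Made-up counts for illustration only; not values from the study.
rate = mutation_rate(de_novo_count=52, callable_sites=2_600_000_000)
print(f"{rate:.2e} per site per generation")
```

In practice the count would also be corrected for false-positive and false-negative de novo calls, which this sketch leaves out.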
RESULTS: We propose a succinct representation of the distance matrices that greatly reduces the space requirement. We give a complete solution, called SuperRec, for inferring chromosomal structures from Hi-C data by iteratively solving the large-scale weighted multidimensional scaling problem.
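To make the weighted multidimensional scaling objective concrete, the sketch below minimises the weighted stress, the sum over pairs of w_ij (||x_i - x_j|| - d_ij)^2, by plain gradient descent on a dense matrix. It is a generic small-scale illustration and not SuperRec's algorithm or its succinct matrix representation, neither of which is described here. The function names, step size and toy distances are assumptions; in a Hi-C setting the target distances would typically be derived from contact frequencies.

```python
import math
import random


def weighted_stress(coords, dist, weight):
    """Sum over pairs of weight * (embedded distance - target distance)^2."""
    n = len(coords)
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(coords[i], coords[j])
            total += weight[i][j] * (d - dist[i][j]) ** 2
    return total


def weighted_mds(dist, weight, dim=3, steps=2000, lr=0.01, seed=0):
    """Return (initial, final) coordinates after gradient descent on the stress."""
    rng = random.Random(seed)
    n = len(dist)
    coords = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(n)]
    initial = [row[:] for row in coords]
    for _ in range(steps):
        grads = [[0.0] * dim for _ in range(n)]
        for i in range(n):
            for j in range(n):
                if i == j:
                    continue
                # Guard against division by zero for coincident points.
                d = math.dist(coords[i], coords[j]) or 1e-12
                scale = 2.0 * weight[i][j] * (d - dist[i][j]) / d
                for k in range(dim):
                    grads[i][k] += scale * (coords[i][k] - coords[j][k])
        for i in range(n):
            for k in range(dim):
                coords[i][k] -= lr * grads[i][k]
    return initial, coords


# Toy target: pairwise distances of four known 3D points, all weights 1.
points = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1)]
dist = [[math.dist(p, q) for q in points] for p in points]
weight = [[0.0 if i == j else 1.0 for j in range(4)] for i in range(4)]

start, result = weighted_mds(dist, weight)
print("stress before:", weighted_stress(start, dist, weight))
print("stress after: ", weighted_stress(result, dist, weight))
```

The recovered coordinates are only determined up to rotation, translation and reflection, so the stress value, rather than the raw coordinates, is the quantity to compare across runs.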
CONCLUSIONS: SuperRec runs faster than earlier systems without compromising on result accuracy. The SuperRec package can be obtained from http://www.cs.cityu.edu.hk/~shuaicli/SuperRec .