MyMedR

Seqping: gene prediction pipeline for plant genomes using self-training gene models and transcriptomic data

Chan KL, Rosli R, Tatarinova TV, Hogan M, Firdaus-Raih M, Low EL

BMC Bioinformatics, 2017 Jan 27;18(Suppl 1):1426.
PMID: 28466793 DOI: 10.1186/s12859-016-1426-6

BACKGROUND: Gene prediction is one of the most important steps in the genome annotation process. A large number of software tools and pipelines developed by various computing techniques are available for gene prediction. However, these systems have yet to accurately predict all or even most of the protein-coding regions. Furthermore, none of the currently available gene-finders has a universal Hidden Markov Model (HMM) that can perform gene prediction for all organisms equally well in an automatic fashion.
RESULTS: We present an automated gene prediction pipeline, Seqping, which uses self-training HMM models and transcriptomic data. The pipeline processes the genome and transcriptome sequences of the target species using the GlimmerHMM, SNAP, and AUGUSTUS pipelines, followed by the MAKER2 program to combine predictions from the three tools in association with the transcriptomic evidence. Seqping generates species-specific HMMs that are able to offer unbiased gene predictions. The pipeline was evaluated using the Oryza sativa and Arabidopsis thaliana genomes. Benchmarking Universal Single-Copy Orthologs (BUSCO) analysis showed that the pipeline was able to identify at least 95% of BUSCO's plantae dataset. Our evaluation shows that Seqping was able to generate better gene predictions compared to three HMM-based programs (MAKER2, GlimmerHMM and AUGUSTUS) using their respective available HMMs. Seqping had the highest accuracy in rice (0.5648 for CDS, 0.4468 for exon, and 0.6695 for nucleotide structure) and A. thaliana (0.5808 for CDS, 0.5955 for exon, and 0.8839 for nucleotide structure).
CONCLUSIONS: Seqping provides researchers a seamless pipeline to train species-specific HMMs and predict genes in newly sequenced or less-studied genomes. We conclude that the Seqping pipeline predictions are more accurate than gene predictions using the other three approaches with the default or available HMMs.

Matched MeSH terms: Genomics/methods*

YersiniaBase: a genomic resource and analysis platform for comparative analysis of Yersinia

Tan SY, Dutta A, Jakubovics NS, Ang MY, Siow CC, Mutha NV, et al.

BMC Bioinformatics, 2015;16:9.
PMID: 25591325 DOI: 10.1186/s12859-014-0422-y

Yersinia is a genus of Gram-negative bacteria that includes serious pathogens such as Yersinia pestis, which causes plague, Yersinia pseudotuberculosis and Yersinia enterocolitica. The remaining species are generally considered non-pathogenic to humans, although there is evidence that at least some of these species can cause occasional infections using distinct mechanisms from the more pathogenic species. With the advances in sequencing technologies, many genomes of Yersinia have been sequenced. However, there is currently no specialized platform to hold the rapidly growing Yersinia genomic data and to provide analysis tools, particularly for comparative analyses, which are required to provide improved insights into their biology, evolution and pathogenicity.

Matched MeSH terms: Genomics/methods*

A phylogenomic approach to bacterial subspecies classification: proof of concept in Mycobacterium abscessus

Tan JL, Khang TF, Ngeow YF, Choo SW

BMC Genomics, 2013;14:879.
PMID: 24330254 DOI: 10.1186/1471-2164-14-879

Mycobacterium abscessus is a rapidly growing mycobacterium that is often associated with human infections. The taxonomy of this species has undergone several revisions and is still being debated. In this study, we sequenced the genomes of 12 M. abscessus strains and used phylogenomic analysis to perform subspecies classification.

Matched MeSH terms: Genomics/methods*

Computational approach to discriminate human and mouse sequences in patient-derived tumour xenografts

Callari M, Batra AS, Batra RN, Sammut SJ, Greenwood W, Clifford H, et al.

BMC Genomics, 2018 01 05;19(1):19.
PMID: 29304755 DOI: 10.1186/s12864-017-4414-y

BACKGROUND: Patient-Derived Tumour Xenografts (PDTXs) have emerged as the pre-clinical models that best represent clinical tumour diversity and intra-tumour heterogeneity. The molecular characterization of PDTXs using High-Throughput Sequencing (HTS) is essential; however, the presence of mouse stroma is challenging for HTS data analysis. Indeed, the high homology between the two genomes results in a proportion of mouse reads being mapped as human.
RESULTS: In this study we generated Whole Exome Sequencing (WES), Reduced Representation Bisulfite Sequencing (RRBS) and RNA sequencing (RNA-seq) data from samples with known mixtures of mouse and human DNA or RNA and from a cohort of human breast cancers and their derived PDTXs. We show that using an In silico Combined human-mouse Reference Genome (ICRG) for alignment discriminates between human and mouse reads with up to 99.9% accuracy and decreases the number of false positive somatic mutations caused by misalignment by >99.9%. We also derived a model to estimate the human DNA content in independent PDTX samples. For RNA-seq and RRBS data analysis, the use of the ICRG allows dissecting computationally the transcriptome and methylome of human tumour cells and mouse stroma. In a direct comparison with previously reported approaches, our method showed similar or higher accuracy while requiring significantly less computing time.
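
The idea of aligning against a combined human-mouse reference can be illustrated with a minimal sketch. It assumes, purely for illustration, that the contigs of the two genomes were renamed with the hypothetical prefixes `hs_` and `mm_` before the combined reference was built, and that each read has already been reduced to the name of the contig it aligned to. The naive read fraction computed here is not the model the authors derived to estimate human DNA content; it only shows how reads could be separated by species once they are aligned to a combined reference.

```python
from collections import Counter
from typing import Iterable


def classify_read(contig: str, human_prefix: str = "hs_", mouse_prefix: str = "mm_") -> str:
    """Assign a read to a species from the contig it aligned to.

    The prefixes are hypothetical naming conventions, not taken from the paper.
    """
    if contig.startswith(human_prefix):
        return "human"
    if contig.startswith(mouse_prefix):
        return "mouse"
    return "unassigned"  # e.g. unmapped reads, often reported as "*"


def human_fraction(contigs: Iterable[str]) -> float:
    """Fraction of species-assigned reads that were assigned to human."""
    counts = Counter(classify_read(c) for c in contigs)
    assigned = counts["human"] + counts["mouse"]
    if assigned == 0:
        raise ValueError("no reads were assigned to either species")
    return counts["human"] / assigned


# Toy input: two human reads, one mouse read, one unmapped read.
aligned_contigs = ["hs_chr1", "hs_chr17", "mm_chr11", "*"]
print(human_fraction(aligned_contigs))
```

In practice the contig names would come from an alignment file, and ambiguous or low-quality alignments would need their own handling; both are left out of this sketch.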
CONCLUSIONS: The computational pipeline we describe here is a valuable tool for the molecular analysis of PDTXs as well as any other mixture of DNA or RNA species.

Matched MeSH terms: Genomics/methods*

Genomic analysis of a riboflavin-overproducing Ashbya gossypii mutant isolated by disparity mutagenesis

Kato T, Azegami J, Yokomori A, Dohra H, El Enshasy HA, Park EY

BMC Genomics, 2020 Apr 23;21(1):319.
PMID: 32326906 DOI: 10.1186/s12864-020-6709-7

BACKGROUND: Ashbya gossypii naturally overproduces riboflavin and has been utilized for industrial riboflavin production. To improve riboflavin production, various approaches have been developed. In this study, to investigate the change in metabolism of a riboflavin-overproducing mutant, namely, the W122032 strain (MT strain) that was isolated by disparity mutagenesis, genomic analysis was carried out.
RESULTS: In the genomic analysis, 33 homozygous and 1377 heterozygous mutations in the coding sequences of the genome of MT strain were detected. Among these heterozygous mutations, the proportion of mutated reads in each gene was different, ranging from 21 to 75%. These results suggest that the MT strain may contain multiple nuclei containing different mutations. We tried to isolate haploid spores from the MT strain to prove its ploidy, but this strain did not sporulate under the conditions tested. Heterozygous mutations detected in genes which are important for sporulation likely contribute to the sporulation deficiency of the MT strain. Homozygous and heterozygous mutations were found in genes encoding enzymes involved in amino acid metabolism, the TCA cycle, purine and pyrimidine nucleotide metabolism and the DNA mismatch repair system. One homozygous mutation in AgILV2 gene encoding acetohydroxyacid synthase, which is also a flavoprotein in mitochondria, was found. Gene ontology (GO) enrichment analysis showed heterozygous mutations in all 22 DNA helicase genes and genes involved in oxidation-reduction process.
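
The "proportion of mutated reads" reported for heterozygous sites is commonly computed as the share of reads at a position that carry the variant allele. The following is a minimal sketch of that calculation; the function name and the read counts are hypothetical and chosen only to reproduce the two ends of the 21 to 75% range quoted in the abstract.

```python
def mutated_read_fraction(ref_reads: int, alt_reads: int) -> float:
    """Share of reads at a site that carry the variant (alternate) allele."""
    if ref_reads < 0 or alt_reads < 0:
        raise ValueError("read counts must be non-negative")
    total = ref_reads + alt_reads
    if total == 0:
        raise ValueError("site has no read coverage")
    return alt_reads / total


# Hypothetical sites with 100 reads each.
print(mutated_read_fraction(ref_reads=79, alt_reads=21))  # 0.21
print(mutated_read_fraction(ref_reads=25, alt_reads=75))  # 0.75
```

In a strictly diploid nucleus a heterozygous site would be expected near 0.5, which is why a spread of fractions across genes is read by the authors as a hint of multiple nuclei carrying different mutations.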
CONCLUSION: This study suggests that oxidative stress and the aging of cells were involved in the riboflavin over-production in A. gossypii riboflavin over-producing mutant and provides new insights into riboflavin production in A. gossypii and the usefulness of disparity mutagenesis for the creation of new types of mutants for metabolic engineering.

Matched MeSH terms: Genomics/methods*

A bioinformatics potpourri

Schönbach C, Li J, Ma L, Horton P, Sjaugi MF, Ranganathan S

BMC Genomics, 2018 01 19;19(Suppl 1):920.
PMID: 29363432 DOI: 10.1186/s12864-017-4326-x

The 16th International Conference on Bioinformatics (InCoB) was held at Tsinghua University, Shenzhen from September 20 to 22, 2017. The annual conference of the Asia-Pacific Bioinformatics Network featured six keynotes, two invited talks, a panel discussion on big data driven bioinformatics and precision medicine, and 66 oral presentations of accepted research articles or posters. Fifty-seven articles comprising a topic assortment of algorithms, biomolecular networks, cancer and disease informatics, drug-target interactions and drug efficacy, gene regulation and expression, imaging, immunoinformatics, metagenomics, next generation sequencing for genomics and transcriptomics, ontologies, post-translational modification, and structural bioinformatics are the subject of this editorial for the InCoB2017 supplement issues in BMC Genomics, BMC Bioinformatics, BMC Systems Biology and BMC Medical Genomics. New Delhi will be the location of InCoB2018, scheduled for September 26-28, 2018.

Matched MeSH terms: Genomics/methods*

Association of Genomic Domains in BRCA1 and BRCA2 with Prostate Cancer Risk and Aggressiveness

Patel VL, Busch EL, Friebel TM, Cronin A, Leslie G, McGuffog L, et al.

Cancer Res, 2020 Feb 01;80(3):624-638.
PMID: 31723001 DOI: 10.1158/0008-5472.CAN-19-1840

Pathogenic sequence variants (PSV) in BRCA1 or BRCA2 (BRCA1/2) are associated with increased risk and severity of prostate cancer. We evaluated whether PSVs in BRCA1/2 were associated with risk of overall prostate cancer or high grade (Gleason 8+) prostate cancer using an international sample of 65 BRCA1 and 171 BRCA2 male PSV carriers with prostate cancer, and 3,388 BRCA1 and 2,880 BRCA2 male PSV carriers without prostate cancer. PSVs in the 3' region of BRCA2 (c.7914+) were significantly associated with elevated risk of prostate cancer compared with reference bin c.1001-c.7913 [HR = 1.78; 95% confidence interval (CI), 1.25-2.52; P = 0.001], as well as elevated risk of Gleason 8+ prostate cancer (HR = 3.11; 95% CI, 1.63-5.95; P = 0.001). c.756-c.1000 was also associated with elevated prostate cancer risk (HR = 2.83; 95% CI, 1.71-4.68; P = 0.00004) and elevated risk of Gleason 8+ prostate cancer (HR = 4.95; 95% CI, 2.12-11.54; P = 0.0002). No genotype-phenotype associations were detected for PSVs in BRCA1. These results demonstrate that specific BRCA2 PSVs may be associated with elevated risk of developing aggressive prostate cancer. SIGNIFICANCE: Aggressive prostate cancer risk in BRCA2 mutation carriers may vary according to the specific BRCA2 mutation inherited by the at-risk individual.

Matched MeSH terms: Genomics/methods*

Utility of public knowledge bases for the interpretation of comprehensive tumor molecular profiling results

Lebedeva A, Timokhin G, Ignatova E, Kavun A, Veselovsky E, Sharova M, et al.

Clin Exp Med, 2023 Oct;23(6):2663-2674.
PMID: 36752890 DOI: 10.1007/s10238-023-01011-6

With the growing use of comprehensive tumor molecular profiling (CTMP), the therapeutic landscape of cancer is rapidly evolving. NGS produces large amounts of genomic data requiring complex analysis and subsequent interpretation. We sought to determine the utility of publicly available knowledge bases (KB) for the interpretation of the cancer mutational profile in clinical practice. Analysis was performed across patients who previously underwent CTMP. Independent interpretation of the CTMP was performed manually, and then the recommendations were compared to the ones present in KBs (OncoKB, CIViC, CGI, CGA, VICC, MolecularMatch). A total of 222 CTMP reports from 222 patients with 932 genomic alterations (GA) were identified. For 368 targetable GA identified in 171 (77%) of the patients, 1381 therapy recommendations were compiled. Except for CGA, ESCAT LOE I, II, IIIA and IIIB therapy options were equally represented in the majority of KBs. Personalized treatment options with ESCAT LOE I-II were provided for 35 patients (16%); MolecularMatch/CIViC allowed collection of ESCAT I-II treatment options for 34 of them (97%), and OncoKB/CGI for 33 of them (94%). With VICC and CGA, 6 (17%) and 20 (57%) of patients, respectively, were left without ESCAT I or II treatment options. Among 88 patients with ESCAT level III-B therapy recommendations, only 2 (2%), 3 (3%), 4 (5%) and 6 (7%) of patients were left without options with CIViC, MolecularMatch, CGI and OncoKB, respectively, and 12 (14%) with VICC. The highest overlap ratio was observed for IIIA (0.81) biomarkers, with comparable results for LOE I-II. Meanwhile, the overlap ratio for ESCAT LOE IV was 0.22. Public KBs provide substantial information on ESCAT-I/R1 biomarkers, but the information on ESCAT II-IV and resistance biomarkers is underrepresented. Manual curation should be considered the gold standard for CTMP interpretation.

Matched MeSH terms: Genomics/methods

Pre-extinction Demographic Stability and Genomic Signatures of Adaptation in the Woolly Rhinoceros

Lord E, Dussex N, Kierczak M, Díez-Del-Molino D, Ryder OA, Stanton DWG, et al.

Curr Biol, 2020 10 05;30(19):3871-3879.e7.
PMID: 32795436 DOI: 10.1016/j.cub.2020.07.046

Ancient DNA has significantly improved our understanding of the evolution and population history of extinct megafauna. However, few studies have used complete ancient genomes to examine species responses to climate change prior to extinction. The woolly rhinoceros (Coelodonta antiquitatis) was a cold-adapted megaherbivore widely distributed across northern Eurasia during the Late Pleistocene and became extinct approximately 14 thousand years before present (ka BP). While humans and climate change have been proposed as potential causes of extinction [1-3], knowledge is limited on how the woolly rhinoceros was impacted by human arrival and climatic fluctuations [2]. Here, we use one complete nuclear genome and 14 mitogenomes to investigate the demographic history of woolly rhinoceros leading up to its extinction. Unlike other northern megafauna, the effective population size of woolly rhinoceros likely increased at 29.7 ka BP and subsequently remained stable until close to the species' extinction. Analysis of the nuclear genome from a ∼18.5-ka-old specimen did not indicate any increased inbreeding or reduced genetic diversity, suggesting that the population size remained steady for more than 13 ka following the arrival of humans [4]. The population contraction leading to extinction of the woolly rhinoceros may have thus been sudden and mostly driven by rapid warming in the Bølling-Allerød interstadial. Furthermore, we identify woolly rhinoceros-specific adaptations to arctic climate, similar to those of the woolly mammoth. This study highlights how species respond differently to climatic fluctuations and further illustrates the potential of palaeogenomics to study the evolutionary history of extinct species.

Matched MeSH terms: Genomics/methods

Allele Mining Strategies: Principles and Utilisation for Blast Resistance Genes in Rice (Oryza sativa L.)

Ashkani S, Yusop MR, Shabanimofrad M, Azady A, Ghasemzadeh A, Azizi P, et al.

Curr Issues Mol Biol, 2015;17:57-73.
PMID: 25706446

Allele mining is a promising way to dissect naturally occurring allelic variants of candidate genes with essential agronomic qualities. With the identification, isolation and characterisation of blast resistance genes in rice, it is now possible to dissect the actual allelic variants of these genes within an array of rice cultivars via allele mining. Multiple alleles from the complex locus serve as a reservoir of variation to generate functional genes. The routine sequence exchange is one of the main mechanisms of R gene evolution and development. Allele mining for resistance genes can be an important method to identify additional resistance alleles and new haplotypes along with the development of allele-specific markers for use in marker-assisted selection. Allele mining can be visualised as a vital link between effective utilisation of genetic and genomic resources in genomics-driven modern plant breeding. This review studies the actual concepts and potential of mining approaches for the discovery of alleles and their utilisation for blast resistance genes in rice. The details provided here will be important to provide the rice breeder with a worthwhile introduction to allele mining and its methodology for breakthrough discovery of fresh alleles hidden in hereditary diversity, which is vital for crop improvement.

Matched MeSH terms: Genomics/methods*

FusoBase: an online Fusobacterium comparative genomic analysis platform

Ang MY, Heydari H, Jakubovics NS, Mahmud MI, Dutta A, Wee WY, et al.

Database (Oxford), 2014;2014.
PMID: 25149689 DOI: 10.1093/database/bau082

Fusobacterium are anaerobic gram-negative bacteria that have been associated with a wide spectrum of human infections and diseases. As the biology of Fusobacterium is still not well understood, comparative genomic analysis on members of this species will provide further insights on their taxonomy, phylogeny, pathogenicity and other information that may contribute to better management of infections and diseases. To facilitate the ongoing genomic research on Fusobacterium, a specialized database with easy-to-use analysis tools is necessary. Here we present FusoBase, an online database providing access to genome-wide annotated sequences of Fusobacterium strains as well as bioinformatics tools, to support the expanding scientific community. Using our custom-developed Pairwise Genome Comparison tool, we demonstrate how differences between two user-defined genomes and how insertion of putative prophages can be identified. In addition, Pathogenomics Profiling Tool is capable of clustering predicted genes across Fusobacterium strains and visualizing the results in the form of a heat map with dendrogram.

Matched MeSH terms: Genomics/methods*

StaphyloBase: a specialized genomic resource for the staphylococcal research community

Heydari H, Mutha NV, Mahmud MI, Siow CC, Wee WY, Wong GJ, et al.

Database (Oxford), 2014;2014:bau010.
PMID: 24578355 DOI: 10.1093/database/bau010

With the advent of high-throughput sequencing technologies, many staphylococcal genomes have been sequenced. Comparative analysis of these strains will provide better understanding of their biology, phylogeny, virulence and taxonomy, which may contribute to better management of diseases caused by staphylococcal pathogens. We developed StaphyloBase with the goal of having a one-stop genomic resource platform for the scientific community to access, retrieve, download, browse, search, visualize and analyse the staphylococcal genomic data and annotations. We anticipate this resource platform will facilitate the analysis of staphylococcal genomic data, particularly in comparative analyses. StaphyloBase currently has a collection of 754 032 protein-coding sequences (CDSs), 19 258 rRNAs and 15 965 tRNAs from 292 genomes of different staphylococcal species. Information about these features is also included, such as putative functions, subcellular localizations and gene/protein sequences. Our web implementation supports diverse query types and the exploration of CDS- and RNA-type information in detail using an AJAX-based real-time search system. JBrowse has also been incorporated to allow rapid and seamless browsing of staphylococcal genomes. The Pairwise Genome Comparison tool is designed for comparative genomic analysis, for example, to reveal the relationships between two user-defined staphylococcal genomes. A newly designed Pathogenomics Profiling Tool (PathoProT) is also included in this platform to facilitate comparative pathogenomics analysis of staphylococcal strains. In conclusion, StaphyloBase offers access to a range of staphylococcal genomic resources as well as analysis tools for comparative analyses. Database URL: http://staphylococcus.um.edu.my/.

Matched MeSH terms: Genomics/methods*

Evolutionary and genomic analysis of four SARS-CoV-2 isolates circulating in March 2020 in Sri Lanka; Additional evidence on multiple introduction and further transmission

Satharasinghe DA, Parakatawella PMSDK, Premarathne JMKJK, Jayasooriya LJPAP, Prathapasinghe GA, Yeap SK

Epidemiol Infect, 2021 03 16;149:e78.
PMID: 33722321 DOI: 10.1017/S0950268821000583

The molecular epidemiology of the virus and mapping helps understand the epidemics' evolution and apply quick control measures. This study provides genomic evidence of multiple severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) introductions into Sri Lanka and virus evolution during circulation. Whole-genome sequences of four SARS-CoV-2 strains obtained from coronavirus disease 2019 (COVID-19) positive patients reported in Sri Lanka during March 2020 were compared with sequences from Europe, Asia, Africa, Australia and North America. The phylogenetic analysis revealed that the sequence of the sample of the first local patient collected on 10 March, who contacted tourists from Italy, was clustered with SARS-CoV-2 strains collected from Italy, Germany, France and Mexico. Subsequently, the sequence of the isolate obtained on 19 March also clustered in the same group with the samples collected in March and April from Belgium, France, India and South Africa. The other two strains of SARS-CoV-2 were segregated from the main cluster, and the sample collected from 16 March clustered with England and the sample collected on 30 March showed the highest genetic divergence to the isolate of Wuhan, China. Here we report the first molecular epidemiological study conducted on circulating SARS-CoV-2 in Sri Lanka. The finding provides the robustness of molecular epidemiological tools and their application in tracing possible exposure in disease transmission during the pandemic.

Matched MeSH terms: Genomics/methods

Recombinant vaccines

Barnard RT

Expert Rev Vaccines, 2010 May;9(5):461-3.
PMID: 20450319 DOI: 10.1586/erv.10.48

The Recombinant Vaccines: Strategies for Candidate Discovery and Vaccine Delivery conference, organized by EuroSciCon, hosted a group of UK-based and international scientists from as far afield as Malaysia and Australia. Genomic analyses of pathogens and elucidation of mechanisms of pathogenesis has advanced candidate discovery and development of vaccines. Therefore, it was timely that this conference featured, in addition to detailed expositions of target selection and clinical trials, presentations on manufacturability, scale-up and delivery of vaccines. Ten talks were presented. This meeting report describes the key topics presented and the themes that emerged from this conference.

Matched MeSH terms: Genomics/methods

Feature Selection for High-Dimensional and Imbalanced Biomedical Data Based on Robust Correlation Based Redundancy and Binary Grasshopper Optimization Algorithm

Abdulrauf Sharifai G, Zainol Z

Genes (Basel), 2020 06 27;11(7).
PMID: 32605144 DOI: 10.3390/genes11070717

Training a machine learning algorithm on an imbalanced data set is an inherently challenging task. It becomes more demanding with limited samples but a massive number of features (high dimensionality). High dimensional and imbalanced data sets have posed severe challenges in many real-world applications, such as biomedical data sets. Numerous researchers have investigated either imbalanced class or high dimensional data sets and come up with various methods. Nonetheless, few approaches reported in the literature have addressed the intersection of the high dimensional and imbalanced class problems, due to their complicated interactions. Lately, feature selection has become a well-known technique used to overcome this problem by selecting discriminative features that represent the minority and majority classes. This paper proposes a new method called Robust Correlation Based Redundancy and Binary Grasshopper Optimization Algorithm (rCBR-BGOA); rCBR-BGOA employs an ensemble of multi-filters coupled with the Correlation-Based Redundancy method to select optimal feature subsets. A binary Grasshopper optimisation algorithm (BGOA) is used to construct the feature selection process as an optimisation problem to select the best (near-optimal) combination of features from the majority and minority classes. The obtained results, supported by the proper statistical analysis, indicate that rCBR-BGOA can improve the classification performance for high dimensional and imbalanced datasets in terms of G-mean and the Area Under the Curve (AUC) performance metrics.
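
The G-mean mentioned above is usually defined as the geometric mean of sensitivity and specificity, which is why it is favoured over plain accuracy on imbalanced data. The sketch below uses that standard definition; whether the paper computes it in exactly this form is an assumption, and the confusion-matrix counts are made up for illustration.

```python
import math


def g_mean(tp: int, fn: int, tn: int, fp: int) -> float:
    """Geometric mean of sensitivity (minority recall) and specificity."""
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in the evaluation set")
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return math.sqrt(sensitivity * specificity)


def accuracy(tp: int, fn: int, tn: int, fp: int) -> float:
    """Plain accuracy, shown only for contrast with G-mean."""
    return (tp + tn) / (tp + fn + tn + fp)


# Hypothetical data set: 5 minority samples, 95 majority samples.
# A classifier that always predicts the majority class looks accurate
# but has a G-mean of zero because it never finds a minority sample.
print(accuracy(tp=0, fn=5, tn=95, fp=0))  # 0.95
print(g_mean(tp=0, fn=5, tn=95, fp=0))    # 0.0
```

Because the two rates are multiplied, a classifier has to perform on both classes to score well, which is the property that makes the metric suitable for judging feature subsets on imbalanced data.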

Matched MeSH terms: Genomics/methods

Genome-Wide Novel Genic Microsatellite Marker Resource Development and Validation for Genetic Diversity and Population Structure Analysis of Banana

Biswas MK, Bagchi M, Biswas D, Harikrishna JA, Liu Y, Li C, et al.

Genes (Basel), 2020 12 09;11(12).
PMID: 33317074 DOI: 10.3390/genes11121479

Trait tagging through molecular markers is an important molecular breeding tool for crop improvement. SSR markers encoded by functionally relevant parts of a genome are well suited for this task because they may be directly related to traits. However, a limited number of these markers are known for Musa spp. Here, we report 35,136 novel functionally relevant SSR markers (FRSMs). Among these, 17,561, 15,373 and 16,286 FRSMs were mapped in silico to the genomes of Musa acuminata, M. balbisiana and M. schizocarpa, respectively. A set of 273 markers was validated using eight accessions of Musa spp., from which 259 markers (95%) produced a PCR product of the expected size and 203 (74%) were polymorphic. In silico comparative mapping of FRSMs onto Musa and related species indicated sequence-based orthology and synteny relationships among the chromosomes of Musa and other plant species. Fifteen FRSMs were used to estimate the phylogenetic relationships among 50 banana accessions, and the results revealed that all banana accessions group into two major clusters according to their genomic background. Here, we report the first large-scale development and characterization of functionally relevant Musa SSR markers. We demonstrate their utility for germplasm characterization, genetic diversity studies, and comparative mapping in Musa spp. and other monocot species. The sequences for these novel markers are freely available via a searchable web interface called Musa Marker Database.
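
The validation percentages quoted above follow directly from the marker counts. As a small worked check, assuming both percentages are taken relative to the 273 markers tested (the abstract does not state the denominator for the polymorphic markers explicitly), the rounding can be reproduced as follows; the helper name is hypothetical.

```python
def percent_of(part: int, whole: int) -> int:
    """Percentage of `part` in `whole`, rounded to the nearest integer."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return round(100 * part / whole)


tested = 273
print(percent_of(259, tested))  # 95  (markers amplifying the expected size)
print(percent_of(203, tested))  # 74  (polymorphic markers)
```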

Matched MeSH terms: Genomics/methods

Singapore Genome Variation Project: a haplotype map of three Southeast Asian populations

Teo YY, Sim X, Ong RT, Tan AK, Chen J, Tantoso E, et al.

Genome Res, 2009 Nov;19(11):2154-62.
PMID: 19700652 DOI: 10.1101/gr.095000.109

The Singapore Genome Variation Project (SGVP) provides a publicly available resource of 1.6 million single nucleotide polymorphisms (SNPs) genotyped in 268 individuals from the Chinese, Malay, and Indian population groups in Southeast Asia. This online database catalogs information and summaries on genotype and phased haplotype data, including allele frequencies, assessment of linkage disequilibrium (LD), and recombination rates in a format similar to the International HapMap Project. Here, we introduce this resource and describe the analysis of human genomic variation upon agglomerating data from the HapMap and the Human Genome Diversity Project, providing useful insights into the population structure of the three major population groups in Asia. In addition, this resource also surveyed across the genome for variation in regional patterns of LD between the HapMap and SGVP populations, and for signatures of positive natural selection using two well-established metrics: iHS and XP-EHH. The raw and processed genetic data, together with all population genetic summaries, are publicly available for download and browsing through a web browser modeled with the Generic Genome Browser.

Matched MeSH terms: Genomics/methods

Transcriptogenomics identification and characterization of RNA editing sites in human primary monocytes using high-depth next generation sequencing data

Leong WM, Ripen AM, Mirsafian H, Mohamad SB, Merican AF

Genomics, 2019 07;111(4):899-905.
PMID: 29885984 DOI: 10.1016/j.ygeno.2018.05.019

High-depth next generation sequencing data provide valuable insights into the number and distribution of RNA editing events. Here, we report the RNA editing events at the cellular level in human primary monocytes using high-depth whole genomic and transcriptomic sequencing data. We identified over ten thousand putative RNA editing sites, and 69% of the sites were A-to-I editing sites. The sites were enriched in repetitive sequences and intronic regions. High-depth sequencing datasets revealed that 90% of the canonical sites were edited at lower frequencies (<0.7). Single and multiple human monocyte and brain tissue samples were analyzed through a genome sequence independent approach. The latter approach was observed to identify more editing sites. Monocytes were observed to contain more C-to-U editing sites compared to brain tissues. Our results establish a comparable pipeline that can address current limitations as well as demonstrate the potential for highly sensitive detection of RNA editing events in a single cell type.

Matched MeSH terms: Genomics/methods*

An expanded mammal mitogenome dataset from Southeast Asia

Mohd Salleh F, Ramos-Madrigal J, Peñaloza F, Liu S, Mikkel-Holger SS, Riddhi PP, et al.

Gigascience, 2017 08 01;6(8):1-8.
PMID: 28873965 DOI: 10.1093/gigascience/gix053

Southeast (SE) Asia is one of the most biodiverse regions in the world, and it holds approximately 20% of all mammal species. Despite this, the majority of SE Asia's genetic diversity is still poorly characterized. The growing interest in using environmental DNA to assess and monitor SE Asian species, in particular threatened mammals, has created the urgent need to expand the available reference database of mitochondrial barcode and complete mitogenome sequences. We have partially addressed this need by generating 72 new mitogenome sequences reconstructed from DNA isolated from a range of historical and modern tissue samples. Approximately 55 gigabases of raw sequence were generated. From these data, we assembled 72 complete mitogenome sequences, with an average depth of coverage of ×102.9 and ×55.2 for modern samples and historical samples, respectively. This dataset represents 52 species, of which 30 species had no previous mitogenome data available. The mitogenomes were geotagged to their sampling location, where known, to display a detailed geographical distribution of the species. Our new database of 52 taxa will strongly enhance the utility of environmental DNA approaches for monitoring mammals in SE Asia, as it greatly increases the likelihood that metabarcoding sequencing reads can be assigned to reference sequences. This magnifies the confidence in species detections and thus allows more robust surveys and monitoring programmes of SE Asia's threatened mammal biodiversity. The extensive collections of historical samples from SE Asia in western and SE Asian museums should serve as additional valuable material to further enrich this reference database.

Matched MeSH terms: Genomics/methods

Evolutionary and functional analysis of RBMY1 gene copy number variation on the human Y chromosome

Shi W, Louzada S, Grigorova M, Massaia A, Arciero E, Kibena L, et al.

Hum Mol Genet, 2019 Aug 15;28(16):2785-2798.
PMID: 31108506 DOI: 10.1093/hmg/ddz101

Human RBMY1 genes are located in four variable-sized clusters on the Y chromosome, expressed in male germ cells and possibly associated with sperm motility. We have re-investigated the mutational background and evolutionary history of the RBMY1 copy number distribution in worldwide samples and its relevance to sperm parameters in an Estonian cohort of idiopathic male factor infertility subjects. We estimated approximate RBMY1 copy numbers in 1218 1000 Genomes Project phase 3 males from sequencing read-depth, then chose 14 for validation by multicolour fibre-FISH. These fibre-FISH samples provided accurate calibration standards for the entire panel and led to detailed insights into population variation and mutational mechanisms. RBMY1 copy number worldwide ranged from 3 to 13 with a mode of 8. The two larger proximal clusters were the most variable, and additional duplications, deletions and inversions were detected. Placing the copy number estimates onto the published Y-SNP-based phylogeny of the same samples suggested a minimum of 562 mutational changes, translating to a mutation rate of 2.20 × 10⁻³ (95% CI 1.94 × 10⁻³ to 2.48 × 10⁻³) per father-to-son Y-transmission, higher than many short tandem repeats (Y-STRs), and showed no evidence for selection for increased or decreased copy number, but possible copy number stabilizing selection. An analysis of RBMY1 copy numbers among 376 infertility subjects failed to replicate a previously reported association with sperm motility and showed no significant effect on sperm count and concentration, serum follicle stimulating hormone (FSH), luteinizing hormone (LH) and testosterone levels or testicular and semen volume. These results provide the first in-depth insights into the structural rearrangements underlying RBMY1 copy number variation across diverse human lineages.

Matched MeSH terms: Genomics/methods
