Affiliations 

  • 1 Research Institute for Systems Biology and Medicine, Nauchnyi proezd, 18, Moscow, 117246, Russia
  • 2 Federal Research and Clinical Center of Physical-Chemical Medicine, Federal Medical Biological Agency, Malaya Pirogovskaya st., 1a, Moscow, Russia
  • 3 Federal Research and Clinical Center of Physical-Chemical Medicine, Federal Medical Biological Agency, Pirogovskaya st., 1a, Moscow, Russia
Bioinformatics, 2023 Nov 20.
PMID: 37982752 DOI: 10.1093/bioinformatics/btad702

Abstract

MOTIVATION: Oxford Nanopore technology has great potential for the analysis of methylated motifs in genomes, including whole-genome methylome profiling. However, we found no methylation motif detection algorithm that is both sufficiently sensitive and deterministic in its results. For example, the MEME suite does not extract all H. pylori methylation sites de novo, even with the iterative, manually controlled approach implemented in Nanodisco, the most up-to-date methylation analysis tool.

RESULTS: We present Snapper, a new, highly sensitive approach for extracting methylation motif sequences, based on a greedy motif selection algorithm. Snapper requires no manual control during the enrichment process, and its enrichment sensitivity is higher than that of MEME coupled with Tombo or Nanodisco. This was demonstrated on H. pylori strain J99, previously studied with PacBio technology, and on four external datasets representing different bacterial species. We used Snapper to characterize the total methylome of a new H. pylori strain, A45, and revealed at least four methylation sites not previously described for H. pylori. We experimentally confirmed the presence of a new CCAG-specific methyltransferase and inferred a gene encoding a new CCAAK-specific methyltransferase.
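
The abstract does not spell out how the greedy motif selection works, so the following is only a generic illustration of the idea, not Snapper's implementation. It assumes the input is a list of short sequence contexts flagged as modified ("positives") and a list of unflagged contexts ("background"). The function names, the thresholds (min_ratio, min_support), and the toy sequences are all hypothetical. At each step the sketch picks the substring that covers the most still-unexplained positives while being enriched over background, removes the contexts it explains, and repeats. Ties are broken by a fixed ordering, so the output is deterministic.

```python
from collections import Counter


def candidate_motifs(seq, min_len=4, max_len=6):
    """Yield every substring of seq with length in [min_len, max_len]."""
    for k in range(min_len, max_len + 1):
        for i in range(len(seq) - k + 1):
            yield seq[i:i + k]


def greedy_motifs(positives, background, min_ratio=2.0, min_support=2,
                  max_motifs=10):
    """Greedily select motifs that explain the most unexplained positives.

    All thresholds are illustrative assumptions, not values from the paper.
    """
    # Background counts never change, so compute them once.
    bg_counts = Counter()
    for seq in background:
        bg_counts.update(set(candidate_motifs(seq)))

    remaining = list(positives)
    selected = []
    while remaining and len(selected) < max_motifs:
        # Count in how many remaining contexts each candidate occurs.
        counts = Counter()
        for seq in remaining:
            counts.update(set(candidate_motifs(seq)))

        best_key, best_motif = None, None
        for motif, n in counts.items():
            pos_frac = n / len(remaining)
            # Add-one smoothing avoids division by zero.
            bg_frac = (bg_counts[motif] + 1) / (len(background) + 1)
            if n < min_support or pos_frac / bg_frac < min_ratio:
                continue
            # Deterministic order: most support, then longest, then A-Z.
            key = (-n, -len(motif), motif)
            if best_key is None or key < best_key:
                best_key, best_motif = key, motif

        if best_motif is None:
            break  # nothing enriched is left to explain the remainder
        selected.append(best_motif)
        # Drop every context the chosen motif already explains.
        remaining = [s for s in remaining if best_motif not in s]
    return selected


# Invented toy contexts for illustration only; not data from the paper.
toy_positives = ["TTCCAGTT", "AACCAGAA", "GGCCAGTA",
                 "TTGATCAA", "ACGATCGT", "CAGATCTG"]
toy_background = ["TTTTAAAA", "ACGTACGT", "GGGGCCCC", "ATATATAT", "TGCATGCA"]
print(greedy_motifs(toy_positives, toy_background))  # ['CCAG', 'GATC']
```

Removing explained contexts after each pick is what lets a weaker motif surface once a dominant one no longer masks it. This sketch handles only exact substrings; degenerate motifs such as CCAAK would need IUPAC-aware matching, which is omitted here.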

AVAILABILITY: Snapper is implemented in Python and is freely available as a pip package named 'snapper-ont'. Snapper and the demo dataset are also available on Zenodo (DOI: 10.5281/zenodo.10117651).

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
