The race to discover enhancers at a genome-wide scale has been under way since the advent of next-generation sequencing, decades after the first enhancer was discovered in simian virus 40 (SV40). Several enhancer-predictive features, such as chromatin features, histone modifications and sequence features, have been used with varying success. However, there is still no consensus on a single marker that can reliably distinguish enhancers from the vast remainder of the genome. Many supervised, unsupervised and semi-supervised computational approaches have emerged to complement and facilitate experimental enhancer discovery. In this review, we focus on recently developed enhancer prediction tools that rely on general enhancer features, namely sequence, chromatin states and histone modifications, and enhancer RNA (eRNA), as well as on tools that combine multiple features. We compare the prediction methods and outcomes of each tool with those of its functionally similar counterparts, and we provide recommendations and insights for the future development of more comprehensive and robust tools.
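The sketch below illustrates one common meaning of a "sequence feature" in this context. It converts a DNA sequence into a normalised k-mer frequency vector, a representation that sequence-based classifiers often take as input. The function name `kmer_vector` and the choice of k are assumptions made for illustration. This is not the feature encoding of any specific tool discussed in this review.

```python
from itertools import product


def kmer_vector(seq, k=3):
    """Normalised frequencies of all 4**k DNA k-mers, in lexicographic order.

    K-mers containing any character other than A, C, G or T (e.g. N) are skipped.
    """
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    index = {kmer: i for i, kmer in enumerate(kmers)}
    counts = [0] * len(kmers)
    seq = seq.upper()
    # Slide a window of width k along the sequence and count each valid k-mer.
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i:i + k])
        if j is not None:
            counts[j] += 1
    total = sum(counts)
    # Return an all-zero vector if no valid k-mer was found.
    return [c / total for c in counts] if total else [0.0] * len(kmers)


# Example: 2-mer profile of a short sequence that contains ambiguous bases.
profile = kmer_vector("ACGTACGTNN", k=2)
```

In practice, such vectors would be computed for labelled enhancer and non-enhancer regions and passed to a supervised classifier. The classifier itself is outside the scope of this sketch.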
The Javan mahseer (Tor tambra) is one of the most valuable freshwater fish species in the genus Tor. To date, apart from mitogenomic data (BioProject: PRJNA422829), genomic and transcriptomic resources for this species are still lacking. Such resources are crucial for understanding the molecular mechanisms underlying important traits such as growth, immune response, reproduction and sex determination. Here, we report the first transcriptome of this species, sequenced from a whole juvenile fish as raw paired-end reads on the Illumina NovaSeq 6000 platform. De novo assembly yielded a draft transcriptome of 259,403 putative transcripts, with a total length of 333,881,215 bp and an N50 of 2,283 bp. Its BUSCO v5 completeness was 91.2% against the Actinopterygii_odb10 database. A total of 77,503 non-redundant protein-coding sequences were predicted from the transcripts and used for functional annotation. The predicted proteins mapped to 304 known KEGG pathways. Signal transduction was the most highly represented category, followed by the immune system and endocrine system categories. In addition, we identified transcripts with significant similarity to previously published growth- and immune-related genes, which will facilitate future molecular breeding of Tor tambra.
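N50 is the length L such that transcripts (or contigs) of length L or longer together contain at least half of all assembled bases. Below is a minimal Python sketch of that definition, together with a small FASTA length reader. Both function names are hypothetical. The sketch does not reproduce the 2,283 bp value reported here, which would require the actual assembly.

```python
def fasta_lengths(lines):
    """Sequence lengths from FASTA-formatted lines (headers start with '>')."""
    lengths, current = [], None
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if current is not None:
                lengths.append(current)
            current = 0
        elif current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def n50(lengths):
    """Length L such that sequences of length >= L hold at least half of all bases."""
    if not lengths:
        raise ValueError("no sequences")
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    # Accumulate from the longest sequence down until half the bases are covered.
    for length in ordered:
        running += length
        if running >= half:
            return length
```

Consider the lengths 2 to 10. They total 54 bases, so half is 27. Since 10 + 9 + 8 = 27, the N50 is 8. For a real assembly, `n50(fasta_lengths(open("transcripts.fasta")))` would apply the same calculation to a hypothetical file named `transcripts.fasta`.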
The true mahseer (Tor spp.) is one of the most highly valued fish in the world, owing to its high nutritional value and distinctive taste. Nevertheless, past morphological characterization and single-mitochondrial-gene phylogenies have not resolved the ambiguity in its taxonomic classification. In this study, we sequenced and assembled 11 complete mahseer mitogenomes. The samples were collected in Java (Indonesia), in Pahang and Terengganu (Peninsular Malaysia) and in Sarawak (East Malaysia). Evolutionary relationships among these closely related Tor samples were investigated using maximum likelihood phylogenetic trees built from the mitogenomes. Compared with the commonly used COX1 gene fragment, the complete COX1, Cytb, ND2, ND4 and ND5 genes appear to be better phylogenetic markers for genetic differentiation at the population level. In addition, six population-specific mitolineage haplotypes were identified among the samples analyzed, offering clues to the taxonomic landscape of the genus.
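One way to illustrate haplotypes, in the sense used here, is to collapse identical aligned sequences and record which populations carry each one. The sketch below makes two assumptions. First, the sequences are already aligned. Second, exact string identity (gaps included) counts as haplotype identity. The function names, input layout and toy sequences are hypothetical. Dedicated population-genetics software would also handle missing data and ambiguous bases.

```python
from collections import defaultdict


def group_haplotypes(samples):
    """Collapse identical aligned sequences into haplotypes.

    samples maps sample ID -> (population, aligned sequence). Returns a dict
    mapping each distinct sequence to the samples and populations carrying it.
    """
    groups = defaultdict(lambda: {"samples": [], "populations": set()})
    for sample_id, (population, seq) in samples.items():
        group = groups[seq.upper()]
        group["samples"].append(sample_id)
        group["populations"].add(population)
    return dict(groups)


def population_specific(groups):
    """Haplotypes observed in exactly one population."""
    return {seq: g for seq, g in groups.items() if len(g["populations"]) == 1}


# Invented toy alignment: only ACGA is confined to a single population.
toy = {
    "s1": ("Java", "ACGT"),
    "s2": ("Java", "ACGT"),
    "s3": ("Pahang", "ACGA"),
    "s4": ("Sarawak", "ACGT"),
}
specific = population_specific(group_haplotypes(toy))
```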
Trigonopoma pauciperforatum, the redstripe rasbora, is a cyprinid commonly found in tropical marshes and swamps with slightly acidic, tannin-stained water. In this study, the complete mitogenome of T. pauciperforatum was amplified in two overlapping fragments using two primer pairs and then sequenced. The mitogenome is 16,707 bp long and comprises 22 transfer RNA genes, 13 protein-coding genes, two ribosomal RNA genes and a putative control region. Its gene organisation is identical to that of other members of the family. The heavy strand encodes 28 genes, and the light strand encodes the remaining nine. Most protein-coding genes use ATG as the start codon; the exception is COI, which uses GTG. The control region retains the terminal associated sequence (TAS), the central conserved sequence blocks (CSB-F, CSB-D and CSB-E) and the variable sequence blocks (CSB-1, CSB-2 and CSB-3). In the maximum likelihood phylogenetic tree, T. pauciperforatum diverges near the base of the major clade. Its relationships with Boraras maculatus, Rasbora cephalotaenia and R. daniconius are poorly resolved, as indicated by low bootstrap values. This work enriches the genetic resources available for peat swamp conservation. It also supports in-depth comparisons with other phylogenetic studies of Rasbora and related genera.
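The gene counts reported above are internally consistent: 22 + 13 + 2 = 37 genes in total, and 28 + 9 = 37 genes across the two strands. The sketch below tallies a gene table by gene type and by strand. The table is the canonical vertebrate mitochondrial gene order, with ND6 and eight tRNAs on the light strand. It is assumed here for illustration and is not parsed from the T. pauciperforatum annotation.

```python
from collections import Counter

# Canonical vertebrate mitochondrial gene order (an assumption, not the authors'
# annotation). Each entry is (gene, type, strand); "H" = heavy, "L" = light.
GENE_ORDER = [
    ("trnF", "tRNA", "H"), ("rrnS", "rRNA", "H"), ("trnV", "tRNA", "H"),
    ("rrnL", "rRNA", "H"), ("trnL(UUR)", "tRNA", "H"), ("ND1", "PCG", "H"),
    ("trnI", "tRNA", "H"), ("trnQ", "tRNA", "L"), ("trnM", "tRNA", "H"),
    ("ND2", "PCG", "H"), ("trnW", "tRNA", "H"), ("trnA", "tRNA", "L"),
    ("trnN", "tRNA", "L"), ("trnC", "tRNA", "L"), ("trnY", "tRNA", "L"),
    ("COI", "PCG", "H"), ("trnS(UCN)", "tRNA", "L"), ("trnD", "tRNA", "H"),
    ("COII", "PCG", "H"), ("trnK", "tRNA", "H"), ("ATP8", "PCG", "H"),
    ("ATP6", "PCG", "H"), ("COIII", "PCG", "H"), ("trnG", "tRNA", "H"),
    ("ND3", "PCG", "H"), ("trnR", "tRNA", "H"), ("ND4L", "PCG", "H"),
    ("ND4", "PCG", "H"), ("trnH", "tRNA", "H"), ("trnS(AGY)", "tRNA", "H"),
    ("trnL(CUN)", "tRNA", "H"), ("ND5", "PCG", "H"), ("ND6", "PCG", "L"),
    ("trnE", "tRNA", "L"), ("Cytb", "PCG", "H"), ("trnT", "tRNA", "H"),
    ("trnP", "tRNA", "L"),
]


def tally(genes):
    """Count genes by type (PCG/tRNA/rRNA) and by coding strand (H/L)."""
    by_type = Counter(kind for _, kind, _ in genes)
    by_strand = Counter(strand for _, _, strand in genes)
    return by_type, by_strand


types, strands = tally(GENE_ORDER)
```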
The sago palm (Metroxylon sagu Rottboll) is an economically important, starch-producing, halophytic tropical palm grown mainly in Southeast Asia. A recent genome survey of this palm on the Illumina sequencing platform achieved a BUSCO genome completeness score of only 21.5%, with most BUSCOs (∼78%) either fragmented or missing. In this study, we therefore improved the completeness of the sago palm genome using the longer reads produced by the Nanopore sequencing platform. Hybrid genome assembly yielded a far more complete genome, with a BUSCO completeness of 97.9% and only ∼2% of BUSCOs fragmented or missing. The genome size of the sago palm was estimated at 509,812,790 bp. A total of 33,242 protein-coding genes were predicted, of which about 96.39% were functionally annotated. Analysis of the KEGG carbohydrate metabolism pathways further showed that starch synthesis is one of the major metabolic activities of the sago palm. The genome data obtained in this work are an essential resource for future molecular evolution and genome-wide association studies of this economically important palm.
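BUSCO classifies each benchmark gene as complete (single-copy or duplicated), fragmented or missing. The three percentages therefore sum to 100%, and the fragmented-or-missing share is 100% minus completeness. For the two assemblies, that gives 100 - 21.5 = 78.5% (the ∼78% above) and 100 - 97.9 = 2.1% (the ∼2% above). The sketch below computes these percentages from raw category counts and renders them in a one-line C[S,D],F,M form that mirrors BUSCO's summary notation. The function names are hypothetical. The counts in the example are invented and do not come from either sago palm assembly.

```python
def busco_summary(single, duplicated, fragmented, missing):
    """Percentages in C[S,D],F,M form from raw BUSCO category counts."""
    n = single + duplicated + fragmented + missing
    if n == 0:
        raise ValueError("no BUSCOs were searched")

    def pct(x):
        return round(100 * x / n, 1)

    # Complete (C) is the sum of single-copy (S) and duplicated (D) BUSCOs.
    return {"C": pct(single + duplicated), "S": pct(single), "D": pct(duplicated),
            "F": pct(fragmented), "M": pct(missing), "n": n}


def format_summary(s):
    """Render a summary dict as one line, e.g. C:95.0%[S:90.0%,D:5.0%],..."""
    return f"C:{s['C']}%[S:{s['S']}%,D:{s['D']}%],F:{s['F']}%,M:{s['M']}%,n:{s['n']}"


def fragmented_or_missing(s):
    """Share of BUSCOs that are not complete; equals 100 - C up to rounding."""
    return round(s["F"] + s["M"], 1)


# Invented counts for illustration only.
example = busco_summary(single=90, duplicated=5, fragmented=3, missing=2)
```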
The Blueline Rasbora (Rasbora sarawakensis) is a small ray-finned fish of the genus Rasbora in the family Cyprinidae. In this study, the complete mitogenome of R. sarawakensis was sequenced using four primers targeting overlapping regions. The mitogenome is 16,709 bp long and comprises 22 transfer RNA genes, 13 protein-coding genes, two ribosomal RNA genes and a putative control region. Its gene organisation is identical to that of other members of the genus. The heavy strand encodes 28 genes, and the light strand encodes the other nine. Most protein-coding genes use ATG as the start codon; the exception is COI, which uses GTG. The control region retains the central conserved sequence blocks (CSB-F, CSB-E and CSB-D), the variable sequence blocks (CSB-3, CSB-2 and CSB-1) and the terminal associated sequence (TAS). In the maximum likelihood phylogenetic tree, R. sarawakensis diverges near the base of the Rasbora clade. Its relationships with R. maculatus and R. pauciperforata are poorly resolved, as indicated by low bootstrap values. This work serves as a stepping stone for future molecular evolution and population genetics studies of the genus Rasbora.
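The start-codon observation can be checked programmatically by reading the first codon of each protein-coding sequence. The sketch below does two things. It flags genes that do not start with ATG. It also tests each first codon against the initiation codons of the vertebrate mitochondrial genetic code (NCBI translation table 2: ATT, ATC, ATA, ATG, GTG), which is assumed to apply here. The input dictionary, its sequences and the function names are hypothetical.

```python
# Initiation codons of NCBI translation table 2 (vertebrate mitochondrial code),
# assumed to apply to the genes being checked.
VERTEBRATE_MT_STARTS = {"ATT", "ATC", "ATA", "ATG", "GTG"}


def start_codon_report(cds):
    """For each gene, report its first codon and whether it is ATG / a valid start."""
    report = {}
    for gene, seq in cds.items():
        codon = seq[:3].upper()
        report[gene] = {
            "codon": codon,
            "canonical": codon == "ATG",
            "valid_start": codon in VERTEBRATE_MT_STARTS,
        }
    return report


def non_atg_genes(cds):
    """Sorted names of genes whose first codon is not ATG."""
    return sorted(g for g, r in start_codon_report(cds).items() if not r["canonical"])


# Hypothetical, truncated coding sequences for illustration.
example_cds = {"COI": "GTGGCACTT", "ND1": "ATGCTT", "Cytb": "atggca"}
```

On `example_cds`, `non_atg_genes` returns `["COI"]`. GTG is still accepted as a valid start under table 2.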
Shorea macrophylla is a woody tree of the genus Shorea (family Dipterocarpaceae) that grows in the rainforests of Southeast Asia. The complete chloroplast (cp) genome sequence of S. macrophylla is reported here. The circular genome is 150,778 bp long and has the conserved quadripartite structure. A large single-copy region (LSC, 83,681 bp) and a small single-copy region (SSC, 19,813 bp) are separated by a pair of inverted repeats of 23,642 bp each. The genome contains 112 unique genes: 78 protein-coding genes, 30 tRNA genes and four rRNA genes. Its GC content, gene order, structure and codon usage are similar to those of previously reported chloroplast genomes from other plant species. The chloroplast genome of S. macrophylla contains 262 simple sequence repeats (SSRs). A/T repeats are the most prevalent, followed by AAT/ATT repeats. The sequence also contains 43 long repeats, most of which are of the forward or palindromic type. Comparing the genome structure of S. macrophylla with those of closely related species from the same family revealed eight mutational hotspots. Phylogenetic analysis showed a close relationship between Shorea and Parashorea species, indicating that Shorea is not monophyletic. The complete chloroplast genome of S. macrophylla reported in this paper will contribute to further studies in molecular identification, genetic diversity and phylogenetics.
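Two of the figures above lend themselves to a quick computational check. First, the quadripartite lengths are consistent: 83,681 + 19,813 + 2 × 23,642 = 150,778 bp. Second, perfect SSRs like the A/T and AAT/ATT repeats counted above can be detected with regular expressions. The sketch below assumes minimum repeat numbers of 10, 5, 4, 3, 3 and 3 for mono- through hexanucleotide motifs. These thresholds were chosen for illustration and are not necessarily those used for S. macrophylla. The sketch also ignores compound SSRs and repeats that span the origin of the circular sequence. All function names are hypothetical.

```python
import re

# Minimum repeat number per motif length (1 = mono ... 6 = hexa); assumed thresholds.
MIN_REPEATS = {1: 10, 2: 5, 3: 4, 4: 3, 5: 3, 6: 3}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def quadripartite_total(lsc, ssc, ir):
    """Genome length implied by LSC + SSC + two inverted-repeat copies."""
    return lsc + ssc + 2 * ir


def is_primitive(motif):
    """True if the motif is not itself a repeat of a shorter unit (e.g. AA, ATAT)."""
    k = len(motif)
    return all(motif != motif[:d] * (k // d) for d in range(1, k) if k % d == 0)


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return (start, end, motif, repeats) for perfect SSRs in a linear sequence.

    Compound SSRs and repeats spanning the origin of a circular genome are not handled.
    """
    seq = seq.upper()
    hits = []
    for k, n in sorted(min_repeats.items()):
        # A k-mer followed by at least n - 1 further copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, n - 1))
        pos = 0
        while (m := pattern.search(seq, pos)) is not None:
            motif = m.group(1)
            if is_primitive(motif):
                hits.append((m.start(), m.end(), motif, (m.end() - m.start()) // k))
                pos = m.end()
            else:
                # e.g. "AA" inside a mononucleotide run: shift by one base and retry.
                pos = m.start() + 1
    return hits


def motif_class(motif):
    """Class that merges rotations and reverse complements, e.g. TAA -> AAT/ATT."""
    def min_rotation(m):
        return min(m[i:] + m[:i] for i in range(len(m)))

    rc = motif.translate(COMPLEMENT)[::-1]
    first, second = sorted((min_rotation(motif), min_rotation(rc)))
    return f"{first}/{second}"


# Toy sequence: a 12-bp poly-A run and five copies of AAT.
toy_hits = find_ssrs("G" + "A" * 12 + "C" + "AAT" * 5 + "G")
```

On the toy sequence, the sketch reports one A/T mononucleotide SSR of 12 repeats and one AAT/ATT trinucleotide SSR of 5 repeats. Grouping motifs by class is what merges, for example, TAA and ATT repeats into the single AAT/ATT category used when reporting SSR composition.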