Methods: In this study, comparative genome analysis was carried out on the G. boninense NJ3 genome to identify and characterize carbohydrate-active enzymes (CAZymes), including CWDEs, in the fungal genome. The Augustus pipeline was employed for gene identification in G. boninense NJ3, and the predicted protein sequences were analyzed with the dbCAN pipeline for CAZymes and annotated against the PhiBase 4.5 database for plant-host interaction (PHI) genes. CAZymes from G. boninense NJ3 were compared against those of G. lucidum, a well-studied model Ganoderma sp., and five selected pathogenic fungi. Functional annotation of PHI genes was carried out using Web Gene Ontology Annotation Plot (WEGO) and was used to select candidate PHI genes related to cell wall degradation in G. boninense NJ3.
Results: G. boninense was enriched with CAZymes and CWDEs in a similar fashion to G. lucidum, which corroborates the lignocellulolytic abilities of both closely related fungi. The role of polysaccharide- and cell wall-degrading enzymes in the hemibiotrophic mode of infection of G. boninense was investigated by comparing its CAZymes with those of the necrotrophs Armillaria solidipes and A. mellea, the biotrophs Ustilago maydis and Melampsora larici-populina, and the hemibiotroph Moniliophthora perniciosa. The CAZyme profiles of the selected pathogenic fungi showed that necrotizing pathogens, including G. boninense NJ3, carry an extensive set of CAZymes compared with the more CAZyme-limited biotrophic pathogens. Following PHI analysis, several candidate genes, including polygalacturonase, endo β-1,3-xylanase, β-glucanase and laccase, were identified as potential CWDEs that contribute to plant host interaction and pathogenesis.
Discussion: This study employed bioinformatics tools to provide a greater understanding of the biological mechanisms underlying the production of CAZymes in G. boninense NJ3. Identification and profiling of the fungal polysaccharide- and lignocellulose-degrading enzymes would further help elucidate how G. boninense infects its host through the production of CWDEs. The CAZymes and CWDE-related PHI genes identified in G. boninense would serve as the basis for functional studies of genes associated with fungal virulence and pathogenicity using systems biology and genetic engineering approaches.
RESULTS: Using two independent gene-prediction pipelines, Fgenesh++ and Seqping, 26,059 oil palm genes with transcriptome and RefSeq support were identified from the oil palm genome. These coding regions of the genome have a characteristic broad distribution of GC3 (fraction of cytosine and guanine in the third position of a codon) with over half the GC3-rich genes (GC3 ≥ 0.75286) being intronless. In comparison, only one-seventh of the oil palm genes identified are intronless. Using comparative genomics analysis, characterization of conserved domains and active sites, and expression analysis, 42 key genes involved in FA biosynthesis in oil palm were identified. For three of them, namely EgFABF, EgFABH and EgFAD3, segmental duplication events were detected. Our analysis also identified 210 candidate resistance genes in six classes, grouped by their protein domain structures.
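The GC3 measure defined above can be illustrated with a minimal sketch. It assumes the input is an in-frame coding sequence, that every complete codon (including any stop codon) is counted, and that ambiguous bases count as non-GC; these are illustrative choices, not necessarily those of the original analysis. The function names are hypothetical; only the 0.75286 threshold comes from the text.

```python
GC3_RICH_THRESHOLD = 0.75286  # GC3-rich cutoff quoted in the text


def gc3(cds: str) -> float:
    """Fraction of G or C at the third position of each complete codon."""
    seq = cds.upper()
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    if usable == 0:
        raise ValueError("sequence is shorter than one codon")
    third_positions = seq[2:usable:3]
    gc_count = sum(base in "GC" for base in third_positions)
    return gc_count / len(third_positions)


def is_gc3_rich(cds: str, threshold: float = GC3_RICH_THRESHOLD) -> bool:
    """True when the sequence meets the GC3-rich cutoff (assumed inclusive)."""
    return gc3(cds) >= threshold
```

For example, `gc3("ATGGCCAAA")` looks at the third bases G, C and A and returns 2/3, which falls below the GC3-rich cutoff.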
CONCLUSIONS: We present an accurate and comprehensive annotation of the oil palm genome, focusing on analysis of important categories of genes (GC3-rich and intronless), as well as those associated with important functions, such as FA biosynthesis and disease resistance. The study demonstrated the advantages of having an integrated approach to gene prediction and developed a computational framework for combining multiple genome annotations. These results, available in the oil palm annotation database (http://palmxplore.mpob.gov.my), will provide important resources for studies on the genomes of oil palm and related crops.
REVIEWERS: This article was reviewed by Alexander Kel, Igor Rogozin, and Vladimir A. Kuznetsov.
METHODS: Based on the EM transcriptomic datasets GSE7305 and GSE23339, as well as the IBD transcriptomic datasets GSE87466 and GSE126124, differential gene analysis was performed using the limma package in the R environment. Co-expressed differentially expressed genes were identified, and a protein-protein interaction (PPI) network for these genes was constructed using version 11.5 of the STRING database. The MCODE tool in Cytoscape was used to extract protein interaction subnetworks. Key genes in the PPI network were identified with two topological analysis algorithms (MCC and Degree) from the CytoHubba plugin, and an UpSet plot was used to visualize these key genes. The diagnostic value of the expression levels of these key genes was assessed using the Receiver Operating Characteristic (ROC) curve and the Area Under the Curve (AUC). The CIBERSORT algorithm was used to determine the infiltration status of 22 immune cell subtypes and to explore differences between the control and disease groups in EM and IBD patients. Finally, the differential expression trends shared by EM and IBD were input into CMap to identify small molecule compounds with potential therapeutic effects.
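As a rough illustration of the Degree criterion mentioned above, the sketch below ranks the nodes of an undirected interaction network by their number of distinct neighbours. It is a plain-Python stand-in, not the CytoHubba implementation, and it does not cover the MCC algorithm. The function names and the placeholder node labels (`g1`, `g2`, ...) are hypothetical.

```python
from collections import defaultdict


def degree_ranking(edges):
    """Return (node, degree) pairs sorted by decreasing degree, then by name.

    `edges` is an iterable of (node_a, node_b) pairs from an undirected
    network. Self-loops are skipped and duplicate edges are counted once.
    """
    neighbours = defaultdict(set)
    for a, b in edges:
        if a == b:
            continue  # a self-interaction adds no neighbour
        neighbours[a].add(b)
        neighbours[b].add(a)
    ranked = ((node, len(adjacent)) for node, adjacent in neighbours.items())
    return sorted(ranked, key=lambda item: (-item[1], item[0]))


def top_hubs(edges, k):
    """Names of the k highest-degree nodes."""
    return [node for node, _ in degree_ranking(edges)[:k]]


# Toy network with placeholder labels, not real interaction data.
toy_edges = [("g1", "g2"), ("g1", "g3"), ("g1", "g4"), ("g2", "g3")]
print(top_hubs(toy_edges, 2))
```

Ties are broken alphabetically here only to keep the output deterministic; a real analysis would more likely combine several topological scores, as the text describes.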
RESULTS: A total of 113 differentially expressed genes (DEGs) co-expressed in EM and IBD were identified, comprising 28 down-regulated genes and 86 up-regulated genes. In the functional enrichment analyses, the co-expressed DEGs of EM and IBD were concentrated in immune response activation, circulating immunoglobulin-mediated humoral immune response and humoral immune response. Five hub genes (SERPING1, VCAM1, CLU, C3, CD55) were identified through the protein-protein interaction network and MCODE. High Area Under the Curve (AUC) values of the Receiver Operating Characteristic (ROC) curves for the five hub genes indicate their predictive ability for disease occurrence. These hub genes could be used as potential biomarkers for the development of EM and IBD. Furthermore, the CMap database identified a total of 9 small molecule compounds (TTNPB, CAY-10577, PD-0325901, etc.) targeting therapeutic genes for EM and IBD.
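For readers unfamiliar with the AUC, the following minimal sketch computes it as the probability that a randomly chosen disease sample scores higher than a randomly chosen control, with ties counted as one half. This is equivalent to the area under the ROC curve. The function name and the 0/1 label encoding are assumptions for illustration, and the study itself does not state which software produced its ROC curves.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    `scores` are per-sample values (for example, expression of one gene);
    `labels` are 1 for disease and 0 for control (assumed encoding).
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(positives) * len(negatives))
```

An AUC of 0.5 corresponds to no discrimination between groups and 1.0 to perfect separation. The quadratic pairwise loop is chosen for clarity rather than speed.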
DISCUSSION: Our research revealed common pathogenic mechanisms between EM and IBD, particularly emphasizing immune regulation and cell signalling, indicating the significance of immune factors in the occurrence and progression of both diseases. By elucidating shared mechanisms, our study provides novel avenues for the prevention and treatment of EM and IBD.
FINDINGS: Our high-throughput workflow minimizes these risks via a 4-step strategy: (i) technical replication with 2 PCR replicates and 2 extraction replicates; (ii) use of multiple markers (12S, 16S, CytB); (iii) a "twin-tagging," 2-step PCR protocol; and (iv) use of the probabilistic taxonomic assignment method PROTAX, which can account for incomplete reference databases. Because annotation errors in the reference sequences can result in taxonomic misassignment, we supply a protocol for curating sequence datasets. For some taxonomic groups and some markers, curation resulted in >50% of sequences being deleted from public reference databases, owing to (i) limited overlap between our target amplicon and reference sequences, (ii) mislabelling of reference sequences, and (iii) redundancy. Finally, we provide a bioinformatic pipeline to process amplicons and conduct PROTAX assignment, and we tested it on an invertebrate-derived DNA dataset from 1,532 leeches from Sabah, Malaysia. Twin-tagging allowed us to detect and exclude sequences with non-matching tags. The smallest DNA fragment (16S) amplified most frequently for all samples but was less powerful for discriminating at species rank. Using stringent and lax acceptance criteria, we found 162 (stringent) and 190 (lax) vertebrate detections in 95 (stringent) and 109 (lax) leech samples.
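The twin-tag check described above can be sketched as follows, assuming each read has already been demultiplexed into a forward tag, a reverse tag and an amplicon sequence. The `Read` record, the function names and the tag-to-sample mapping are hypothetical and are not part of the published pipeline. The sketch shows only the filtering idea: a read is kept when both tags are identical and discarded otherwise.

```python
from collections import Counter
from typing import NamedTuple


class Read(NamedTuple):
    forward_tag: str
    reverse_tag: str
    sequence: str


def split_by_tag_match(reads):
    """Separate reads with matching tags from reads with non-matching tags."""
    kept, rejected = [], []
    for read in reads:
        if read.forward_tag == read.reverse_tag:
            kept.append(read)
        else:
            rejected.append(read)  # non-matching tags: excluded from analysis
    return kept, rejected


def reads_per_sample(kept_reads, tag_to_sample):
    """Count kept reads per sample; reads with an unknown tag are ignored."""
    counts = Counter()
    for read in kept_reads:
        sample = tag_to_sample.get(read.forward_tag)
        if sample is not None:
            counts[sample] += 1
    return counts
```

Reporting the size of the rejected list alongside the kept reads gives a simple per-run indicator of how often non-matching tag combinations occur.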
CONCLUSIONS: Our metabarcoding workflow should help research groups increase the robustness of their results and therefore facilitate wider use of environmental and invertebrate-derived DNA, which is turning into a valuable source of ecological and conservation information on tetrapods.