Gene3D (http://gene3d.biochem.ucl.ac.uk) is a database of globular domain annotations for millions of available protein sequences. Gene3D has previously featured in the Database issue of NAR and here we report a significant update to the Gene3D database. The current release, Gene3D v16, has significantly expanded its domain coverage over the previous version and now contains over 95 million domain assignments. We also report a new method for dealing with complex domain architectures that exist in Gene3D, arising from discontinuous domains. Amongst other updates, we have added visualization tools for exploring domain annotations in the context of other sequence features and in gene families. We also provide web-pages to visualize other domain families that co-occur with a given query domain family.
Owing to its thermostability and compatibility with high pH, subtilisin is best known as a detergent additive; the MEROPS database classifies it as a serine protease. Subtilisin is typically isolated from bacterial species of the genus Bacillus, such as Bacillus subtilis, B. amyloliquefaciens, and B. licheniformis, as well as from various other organisms. It is composed of 268-275 amino acid residues and is initially secreted in the precursor form, preprosubtilisin, which is composed of a 29-residue signal peptide, a 77-residue propeptide, and the 275-residue active subtilisin. Subtilisin is known for the presence of high- and low-affinity calcium binding sites in its structure. Native subtilisin is generally thermostable, tolerant of neutral to high pH, broadly specific, and dependent on calcium for stability, properties that contribute to its versatility. Through protein engineering and immobilization technologies, many variants of subtilisin have been generated, broadening its applicability in detergents, food processing and packaging, synthesis of inhibitory peptides, therapeutics, and waste management.
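The precursor layout described above can be made concrete with a short sketch. It assumes the three segments occur in the order signal peptide, propeptide, mature enzyme, and uses the segment lengths quoted in the text (29, 77, and 275 residues). The function name and the placeholder sequence are hypothetical and do not represent a real subtilisin sequence.

```python
# Segment lengths (in residues) quoted in the text for preprosubtilisin.
# The N- to C-terminal order used here is an assumption of this sketch.
SEGMENTS = [
    ("signal_peptide", 29),
    ("propeptide", 77),
    ("mature_subtilisin", 275),
]


def split_precursor(sequence):
    """Split a precursor sequence into its named segments by length."""
    expected = sum(length for _, length in SEGMENTS)
    if len(sequence) != expected:
        raise ValueError(f"expected {expected} residues, got {len(sequence)}")
    parts = {}
    start = 0
    for name, length in SEGMENTS:
        parts[name] = sequence[start:start + length]
        start += length
    return parts


# Hypothetical placeholder: poly-alanine with the total precursor length.
placeholder = "A" * sum(length for _, length in SEGMENTS)
parts = split_precursor(placeholder)
print({name: len(segment) for name, segment in parts.items()})
```

With these lengths the full precursor is 29 + 77 + 275 = 381 residues, of which only the last 275 form the active enzyme.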
Burkholderia Lethal Factor 1 (BLF1) is a deamidase first characterized in Burkholderia pseudomallei. This enzyme inhibits cellular protein synthesis by deamidating a glutamine residue to a glutamic acid in its target protein, the eukaryotic translation initiation factor 4A (eIF4A). In this work, we present the characterization of a hypothetical protein from Xanthomonas sp. Leaf131 as the first report of a BLF1 family ortholog outside of the Burkholderia genus. Although standard sequence similarity searches such as BLAST were not able to detect the homology between the Xanthomonas sp. Leaf131 hypothetical protein sequence and BLF1, our computed structure model for the Xanthomonas sp. hypothetical protein revealed structural similarity to BLF1, with an RMSD of 2.7 Å over 164 Cα atoms and a TM-score of 0.72 when superposed. Structural comparisons of the Xanthomonas model structure against BLF1 and Escherichia coli cytotoxic necrotizing factor 1 (CNF1) revealed that the conserved signature LXGC motif and putative catalytic residues are structurally aligned, suggesting a level of functional or mechanistic similarity. Protein-protein docking analysis and molecular dynamics simulations also demonstrated that eIF4A could still be a possible target substrate for deamidation by the Xanthomonas protein, as it is for BLF1. We therefore propose that this Xanthomonas hypothetical protein be renamed Xanthomonas Lethal Factor 1 (XLF1). Our work also provides further evidence of the utility of programs such as AlphaFold in bridging the computational function annotation transfer gap despite very low sequence identities of under 20%. Communicated by Ramaswamy H. Sarma.
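The two structural similarity measures quoted above can be illustrated with a minimal sketch. It assumes the two structures have already been superposed and that the distances between aligned Cα pairs are known; the distance values used are hypothetical. A real TM-score program searches over superpositions and reports the best score, so this evaluates the formula for one fixed superposition only.

```python
import math


def rmsd(distances):
    """Root-mean-square deviation over aligned atom-pair distances (in Å)."""
    if not distances:
        raise ValueError("no aligned pairs")
    return math.sqrt(sum(d * d for d in distances) / len(distances))


def tm_score(distances, target_length):
    """TM-score formula for one fixed superposition.

    distances: distances (Å) between aligned Cα pairs.
    target_length: length of the protein used for normalization.
    """
    if target_length < 19:
        # The length-dependent scale d0 is only positive from 19 residues up.
        raise ValueError("target_length must be at least 19")
    d0 = 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8
    total = sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances)
    return total / target_length


# Hypothetical input: 164 aligned pairs that are all 2.7 Å apart.
example = [2.7] * 164
print(round(rmsd(example), 2), round(tm_score(example, 164), 2))
```

Unlike RMSD, the TM-score down-weights large deviations and is normalized by length, which is why it is often reported alongside RMSD when comparing distantly related structures.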
Numerous protein subfamilies within families and superfamilies still lack conclusive experimental evidence for a specific function. These proteins persist in databases annotated with a 'putative' function inferred from discernible features of the protein sequence.
Dehydroquinase or 3-dehydroquinate dehydratase (DHQD) reversibly cleaves 3-dehydroquinate to form 3-dehydroshikimate. Here, we describe the functional and structural features of a cold-active type II 3-dehydroquinate dehydratase from the psychrophilic yeast Glaciozyma antarctica PI12 (GaDHQD). Functional studies showed that the enzyme was active at low temperatures (10-30 °C) but displayed maximal activity at 40 °C. The enzyme was nevertheless stable over a wide range of temperatures (10-70 °C) and between pH 6.0 and 10.0, with an optimum pH of 8.0. Interestingly, the enzyme was highly thermo-tolerant, denaturing only at approximately 84 °C. Three-dimensional structure analyses showed that GaDHQD possesses psychrophilic features in comparison with its mesophilic and thermophilic counterparts, such as higher numbers of non-polar residues on the surface, lower numbers of arginine residues, and higher numbers of glycine residues, together with lower numbers of hydrophobic interactions. On the other hand, GaDHQD shares some traits (i.e. total number of hydrogen bonds, number of proline residues and overall folding) with its mesophilic and thermophilic counterparts. Combined, these features contribute synergistically towards the enzyme's ability to function at both low and high temperatures.
Identification of phosphorylation sites is an important step in the functional study and drug design of proteins. In recent years, computational methods have been increasingly applied to the identification of phosphorylation sites because of their low cost and high speed. Most currently available methods use only local information around potential phosphorylation sites for prediction and do not take the global information of the protein sequence into consideration. Here, we demonstrate that the global information of protein sequences may also be critical for phosphorylation site prediction. In this paper, a new deep neural network model, called DeepPSP, is proposed for the prediction of protein phosphorylation sites. In the DeepPSP model, two parallel modules were introduced to extract both local and global features from protein sequences. Two squeeze-and-excitation blocks and one bidirectional long short-term memory block were introduced into each module to capture effective representations of the sequences. Comparative studies were carried out on public data sets to evaluate the performance of DeepPSP and four other prediction methods. The F1-score, area under receiver operating characteristic curves (AUROC), and area under precision-recall curves (AUPRC) of DeepPSP were found to be 0.4819, 0.82, and 0.50, respectively, for S/T general site prediction and 0.4206, 0.73, and 0.39, respectively, for Y general site prediction. Compared with the MusiteDeep method, the F1-score, AUROC, and AUPRC of DeepPSP were found to increase by 8.6, 2.5, and 8.7%, respectively, for S/T general site prediction and by 20.6, 5.8, and 18.2%, respectively, for Y general site prediction. Among the tested methods, DeepPSP was also found to produce the best results for different kinase-specific site predictions, including CDK, mitogen-activated protein kinase, CAMK, AGC, and CMGC.
Taken together, the developed DeepPSP method may offer a more accurate phosphorylation site prediction by including global information. It may serve as an alternative model with better performance and interpretability for protein phosphorylation site prediction.
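For readers less familiar with the F1-score reported above, the following minimal sketch shows how it is derived from prediction counts. The counts in the example are hypothetical and are not taken from the DeepPSP evaluation; the function name is also an assumption of this sketch.

```python
def precision_recall_f1(true_pos, false_pos, false_neg):
    """Return (precision, recall, F1) from binary classification counts."""
    predicted_pos = true_pos + false_pos
    actual_pos = true_pos + false_neg
    precision = true_pos / predicted_pos if predicted_pos else 0.0
    recall = true_pos / actual_pos if actual_pos else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical counts for a site predictor on an imbalanced data set.
precision, recall, f1 = precision_recall_f1(true_pos=40, false_pos=60, false_neg=40)
print(precision, recall, f1)
```

Because phosphorylation sites are rare relative to non-sites, F1 and AUPRC tend to be more informative than accuracy, which may explain why modest-looking F1 values can still reflect a useful predictor.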
In this article, nine complete genomes of viruses from the genera Alphanodavirus and Betanodavirus (family Nodaviridae) were comparatively analyzed, and data on their evolutionary origins and relatedness are reported. The nucleotide sequence alignment of the complete genomes from all species and their deduced evolutionary relationships are presented. Multiple sequence alignment of the Nodaviridae genomes revealed higher sequence similarity within the genus Betanodavirus than within the genus Alphanodavirus. The amino acid sequences of both the RNA1 and RNA2 ORFs are more conserved in Betanodavirus than in Alphanodavirus. The conserved and variable regions within the virus genome that were defined based on the multiple sequence alignments are presented in this dataset.
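Sequence similarity within an alignment is commonly summarized as pairwise percent identity. The sketch below shows one way to compute it for two rows of an existing alignment. The sequences are toy examples, not Nodaviridae data, and the choice to ignore gapped columns is an assumption; other denominators are also in common use.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity between two aligned sequences of equal length.

    Columns where either sequence has a gap are ignored.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


# Toy aligned RNA fragments (hypothetical).
row_1 = "ACGU-ACGU"
row_2 = "ACGUUACGA"
print(percent_identity(row_1, row_2))
```

Averaging this value over all pairs within a genus gives a simple within-genus similarity figure that can be compared between genera.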
Neuropeptides that possess the Arg-Phe-NH2 motif at their C-termini (i.e., RFamide peptides) have been characterized in the nervous systems of both invertebrates and vertebrates. In vertebrates, RFamide peptides form a family consisting of the gonadotropin-inhibitory hormone (GnIH), neuropeptide FF (NPFF), prolactin-releasing peptide (PrRP), kisspeptin (kiss1 and kiss2), and pyroglutamylated RFamide peptide/26RFamide peptide (QRFP/26RFa) groups. It now appears that these vertebrate RFamide peptides exert important neuroendocrine, behavioral, sensory, and autonomic functions. In 2000, GnIH was discovered as a novel hypothalamic RFamide peptide inhibiting gonadotropin release in quail. Subsequent studies have demonstrated that GnIH acts on the brain and pituitary to modulate reproductive physiology and behavior across vertebrates. To clarify the origin and evolution of GnIH, the existence of GnIH was investigated in agnathans, the most ancient lineage of vertebrates, and in basal chordates, such as tunicates and cephalochordates (represented by amphioxus). This review first summarizes the structure and function of GnIH and other RFamide peptides in vertebrates, in particular NPFF, which has a C-terminal structure similar to that of GnIH. It then describes the evolutionary origin of GnIH based on the studies in agnathans and basal chordates.
Antibody phage display is a key tool for the development of monoclonal antibodies against various targets. However, the development of anti-peptide antibodies is challenging because the small size of peptides makes binding difficult. This makes anchoring of peptides a preferred approach for panning experiments. A common approach is to use streptavidin as the anchor protein to present biotinylated peptides for panning. Here, we propose the recombinant expression of the target peptide fused to an immunogenic protein for panning. The peptide inhibitor of trans-endothelial migration (PEPITEM) peptide sequence was fused to the Mycobacterium tuberculosis (Mtb) α-crystallin (AC) as an anchor protein. The panning process was carried out by subtractive selection of the antibody library against the AC protein first, followed by binding of the library to PEPITEM-fused AC (PEPI-AC). Unique monoclonal scFv antibodies with good specificity were identified. In conclusion, the use of an alternative anchor protein to present the peptide sequence, coupled with subtractive panning, allows for the identification of unique monoclonal antibodies against a peptide target.
Studies on the TCP-1 ring complex (TRiC) chaperonin have shown its indispensable role in folding cytosolic proteins in eukaryotes. In a psychrophilic organism, extreme cold creates a low-energy environment that potentially causes protein denaturation with loss of activity. We hypothesized that TRiC may have undergone structural molecular adaptation in order to facilitate protein folding in a low-energy environment. To test this hypothesis, we isolated G. antarctica TRiC (GaTRiC) and found that GaTRiC mRNA was consistently expressed in G. antarctica at all temperatures, indicating its importance in cell regulation. Moreover, we showed that GaTRiC has chaperonin activity, whereby denatured luciferase can be folded to the functional state in its presence. Structurally, three categories of residue substitutions were found in the α, β, and δ subunits: (i) bulky/polar side chains to alanine or valine, (ii) charged residues to alanine, and (iii) isoleucine to valine, all of which would be expected to increase intramolecular flexibility within GaTRiC. The residue substitutions observed in the built structures possibly affect hydrophobic interactions, hydrogen bonds, and ionic and aromatic interactions, leading to an increase in structural flexibility. Our structural and functional analysis identifies some possible structural features that may contribute to cold adaptation of the psychrophilic TRiC folding chamber.
Here, we present a novel psychrophilic β-glucanase from the yeast Glaciozyma antarctica PI12 that has been structurally modeled and analyzed in detail. To our knowledge, this is the first attempt to model a psychrophilic laminarinase from yeast. Because of the low sequence identity (<40%), a threading method was applied to predict a 3D structure of the enzyme using the MODELLER9v12 program. The results of a comparative study using other mesophilic, thermophilic, and hyperthermophilic laminarinases indicated several amino acid substitutions on the surface of the psychrophilic laminarinase that collectively increased the flexibility of its structure for efficient catalytic reactions at low temperatures. Whereas several structural factors in the overall structure can explain the weak thermal stability, this research suggests that the psychrophilic adaptation and catalytic activity at low temperatures were achieved through the presence of longer loops and shorter or broken helices and strands, an increase in the number of aromatic and hydrophobic residues, a reduction in the number of hydrogen bonds and salt bridges, a higher total solvent accessible surface area, and an increase in the exposure of the hydrophobic side chains to the solvent. The results of comparative molecular dynamics simulation and principal component analysis confirmed the above strategies adopted by the psychrophilic laminarinase to increase its catalytic efficiency and structural flexibility so that it remains active at cold temperatures.
Carnivorous pitcher plants produce specialised pitcher organs containing secretory glands, which secrete acidic fluids with hydrolytic enzymes for prey digestion and nutrient absorption. The content of pitcher fluids has been the focus of many fluid protein profiling studies. These studies suggest an evolutionary convergence of a conserved group of similar enzymes in diverse families of pitcher plants. A recent study showed that endogenous proteins were replenished in the pitcher fluid, which indicates a feedback mechanism in protein secretion. This poses an interesting question on the physiological effect of plant protein loss. However, no study to date describes the pitcher response to endogenous protein depletion. To address this knowledge gap, we previously performed a comparative RNA-sequencing experiment of newly opened pitchers (D0) against pitchers after 3 days of opening (D3C) and pitchers with filtered endogenous proteins (>10 kDa) upon pitcher opening (D3L). Nepenthes ampullaria was chosen as a model study species due to its abundance and unique feeding behaviour on leaf litter. The analysis of unigenes with top 1% abundance found protein translation and stress response to be overrepresented in D0, compared to cell wall related, transport, and signalling functions for D3L. Differentially expressed gene (DEG) analysis identified DEGs with functional enrichment in protein regulation, secondary metabolism, intracellular trafficking, secretion, and vesicular transport. The transcriptomic landscape of the pitcher shifted dramatically towards intracellular transport and defence response at the expense of energy metabolism and photosynthesis upon endogenous protein depletion. This is supported by secretome, transportome, and transcription factor analysis with RT-qPCR validation based on independent samples.
This study provides the first glimpse into the molecular responses of pitchers to protein loss with implications to future cost/benefit analysis of carnivorous pitcher plant energetics and resource allocation for adaptation in stochastic environments.
The natural rubber latex extracted from the bark of Hevea brasiliensis plays various important roles in modern society. Post-translational modifications (PTMs) of the latex proteins are important for the stability and functionality of the proteins. In this study, latex proteins were acquired from the C-serum, lutoids, and rubber particle layers of latex without using prior enrichment steps; they were fragmented using collision-induced dissociation (CID), higher-energy collisional dissociation (HCD), and electron-transfer dissociation (ETD) activation methods. PEAKS 7 was used to search for unspecified PTMs, followed by analysis with PTM prediction tools to crosscheck the results. In total, 73 peptides in 47 proteins from H. brasiliensis protein sequences derived from UniProtKB were identified and predicted to be post-translationally modified. The PTMs identified include phosphorylation, lysine acetylation, N-terminal acetylation, hydroxylation, and ubiquitination. Most of the PTMs discovered have yet to be reported in UniProt; they could greatly assist research into the functional properties of H. brasiliensis latex proteins and may also be useful as biomarkers. The data are available via the MassIVE repository with identifier MSV000082419.
Kraits (Bungarus spp.) are highly venomous elapids that are only found in Asia. In the current study, 103 and 86 different proteins were identified from Bungarus candidus and Bungarus fasciatus venoms, respectively. These proteins were classified into 18 different venom protein families. Both venoms were found to contain a high percentage of three-finger toxins, phospholipase A2 enzymes and Kunitz-type inhibitors. Smaller numbers of high molecular weight enzymes such as L-amino acid oxidase, hyaluronidases, and acetylcholinesterase were also detected in the venoms. We also detected some unique proteins that were not known to be present in these venoms. The presence of natriuretic peptide, vespryn, and serine protease families was detected in B. candidus venom. We also detected subunits A and B of β-bungarotoxin, as well as α-bungarotoxin, which had not previously been found in B. fasciatus venom. Understanding the proteome composition of Malaysian krait species will provide useful information on unique toxins and proteins which are present in the venoms. This knowledge will assist in the management of krait envenoming. In addition, these proteins may have potential use as research tools or as drug-design templates.
Signal transducers and activators of transcription (STAT) proteins are key signalling molecules in metazoans, implicated in various cellular processes. Increased research in the field has resulted in the accumulation of STAT sequence and structure data, which are scattered across various public databases, missing extensive functional annotations, and prone to effort redundancy because of the dearth of community sharing. Therefore, there is a need to integrate the existing sequence, structure and functional data into a central repository, one that is enriched with annotations and provides a platform for community contributions. Herein, we present STATdb (publicly available at http://statdb.bic.nus.edu.sg/), the first integrated resource for STAT sequences comprising 1540 records representing the known STATome, enriched with existing structural and functional information from various databases and literature and including manual annotations. STATdb provides advanced features for data visualization, analysis and prediction, and community contributions. A key feature is a meta-predictor to characterise STAT sequences based on a novel classification that integrates STAT domain architecture, lineage and function. A curation policy workflow has been devised for regulated and structured community contributions, with an update policy for the seamless integration of new data and annotations.
Over the last 20 years in biotechnology, the production of recombinant proteins has been a crucial bioprocess in both the biopharmaceutical and research arenas in terms of human health, scientific impact and economic volume. Although logical strategies of genetic engineering have been established, protein overexpression is still an art. In particular, heterologous expression is often hindered by low production levels and frequent failure for unclear reasons. The problem is accentuated because no generic solution is available to enhance heterologous overexpression. For a given protein, the extent of its solubility can indicate the quality of its function. Over 30% of synthesized proteins are not soluble. Under given experimental conditions, including temperature and expression host, protein solubility is ultimately determined by the protein's sequence. To date, numerous machine learning methods have been proposed to predict the solubility of a protein from its amino acid sequence alone. Despite 20 years of research on the matter, no comprehensive review of the published methods is available.
A computational approach for identification and assessment of genomic sequence variability (GeneSV) is described. For a given nucleotide sequence, GeneSV collects information about the permissible nucleotide variability (changes that potentially preserve function) observed in corresponding regions in genomic sequences, and combines it with conservation/variability results from protein sequence and structure-based analyses of evaluated protein coding regions. GeneSV was used to predict effects (functional vs. non-functional) of 37 amino acid substitutions on the NS5 polymerase (RdRp) of dengue virus type 2 (DENV-2), 36 of which are not observed in any publicly available DENV-2 sequence. Thirty-two novel mutants with single amino acid substitutions in the RdRp were generated using a DENV-2 reverse genetics system. In 81% (26 of 32) of the predictions tested, GeneSV correctly predicted the viability of the introduced mutations. GeneSV was also correct in its predictions for 4 of 5 (80%) mutants with double amino acid substitutions proximal in structure to one another. The predictive capabilities of the system were illustrated on dengue virus, an RNA virus, but the general approach described in the manuscript for characterizing real or theoretically possible variations in genomic and protein sequences can be applied to any organism.
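The reported success rates follow directly from the counts given above, as this small check shows. The function name is hypothetical; only the counts come from the text.

```python
def accuracy_percent(correct, total):
    """Percentage of predictions that matched the experimental outcome."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * correct / total


# Counts taken from the text.
single_substitution = accuracy_percent(26, 32)  # single-substitution mutants
double_substitution = accuracy_percent(4, 5)    # double-substitution mutants
print(single_substitution, double_substitution)
```

With only 32 and 5 mutants in the two groups, each percentage carries considerable sampling uncertainty, so the two rates should not be read as meaningfully different from one another.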
Vanadium-dependent haloperoxidases belong to a class of vanadium enzymes that may have potential industrial and pharmaceutical applications due to their high stability. In this study, the 5'-flanking genomic sequence and complete open reading frame encoding a vanadium-dependent bromoperoxidase (GcVBPO1) were cloned from the red seaweed Gracilaria changii, and the recombinant protein was biochemically characterized. The open reading frame of GcVBPO1 is 1818 nucleotides in length, and its deduced amino acid sequence shares 49% identity with the vanadium-dependent bromoperoxidases from Corallina officinalis and Cor. pilulifera. The amino acid residues associated with the binding site of the vanadate cofactor were found to be conserved. The Km value of recombinant GcVBPO1 for Br(-) was 4.69 mM, while its Vmax was 10.61 μkat mg(-1) at pH 7. Substitution of Arg(379) with His(379) in the recombinant protein caused a lower affinity for Br(-), while substitution of Arg(379) with Phe(379) not only increased its affinity for Br(-) but also enabled the mutant enzyme to oxidize Cl(-). The mutant Arg(379)Phe was also found to have a lower affinity for I(-) than the wild-type GcVBPO1 and the mutant Arg(379)His. In addition, the Arg(379)Phe mutant has a slightly higher affinity for H2O2 than the wild-type GcVBPO1. Multiple cis-acting regulatory elements associated with light response, hormone signaling, and meristem expression were detected in the 5'-flanking genomic sequence of GcVBPO1. The transcript abundance of GcVBPO1 was relatively higher in seaweed samples treated with 50 parts per thousand (ppt) artificial seawater (ASW) than in those treated with 10 and 30 ppt ASW, in support of its role in the abiotic stress response of seaweed.
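To show what the reported kinetic constants imply, the sketch below evaluates the Michaelis-Menten equation with the Km and Vmax quoted above for Br(-). It assumes simple single-substrate kinetics with H2O2 held at a saturating concentration, which is a simplification of this sketch and not something stated in the abstract; the function and constant names are hypothetical.

```python
KM_BROMIDE_MM = 4.69       # Km for Br(-), in mM (from the text)
VMAX_UKAT_PER_MG = 10.61   # Vmax, in μkat per mg at pH 7 (from the text)


def michaelis_menten_rate(substrate_mm, km_mm=KM_BROMIDE_MM, vmax=VMAX_UKAT_PER_MG):
    """Initial rate v = Vmax * [S] / (Km + [S]) for a single substrate."""
    if substrate_mm < 0:
        raise ValueError("substrate concentration cannot be negative")
    return vmax * substrate_mm / (km_mm + substrate_mm)


# Rates at a few bromide concentrations (mM), in μkat per mg.
for concentration in (1.0, KM_BROMIDE_MM, 10.0, 50.0):
    print(concentration, michaelis_menten_rate(concentration))
```

At a substrate concentration equal to Km the rate is half of Vmax, so a mutant with lower affinity (higher Km) needs more bromide to reach the same rate.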
Bioactive peptides, as products of the hydrolysis of diverse food proteins, are the focus of current research. They exert various biological roles, one of the most crucial of which is antioxidant activity. An inverse relationship between antioxidant intake and disease has been demonstrated in many studies. The antioxidant activity of bioactive peptides can be attributed to their radical scavenging, inhibition of lipid peroxidation, and metal ion chelation properties. It has also been proposed that the structure of a peptide and its amino acid sequence can affect its antioxidative properties. This paper reviews bioactive peptides from food sources with respect to their antioxidant activities. Additionally, specific characteristics of antioxidative bioactive peptides, their enzymatic production, methods to evaluate antioxidant capacity, bioavailability, and safety concerns are reviewed.