The structural comparison of proteins is a vital step in structural biology that is used to predict and analyse the function of a new, unknown protein. Although a number of different techniques have been explored, the development of alternative methods remains an active research area. The present paper introduces a text modelling-based technique for the structural comparison of proteins. The method models the secondary and tertiary structure of proteins as two linear sequences and then applies them to the comparison of two structures. The technique used for pairwise comparison of the sequences has been adopted from computational linguistics and its well-known techniques for analysing and quantifying textual sequences. To this end, an n-gram modelling technique is used to capture regularities between sequences, and the cross-entropy concept is then employed to measure their similarity. Several experiments are conducted to evaluate the performance of the method and compare it with other commonly used programs. The information retrieval assessments demonstrate that the technique has a high running speed, similar to that of other linear encoding methods such as 3D-BLAST, SARST, and TS-AMIR, whereas its accuracy is comparable to that of CE and TM-align, which are high-accuracy comparison tools. Accordingly, the results demonstrate that the algorithm has high efficiency compared with other state-of-the-art methods.
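The n-gram and cross-entropy idea described above can be illustrated with a minimal Python sketch. It assumes a structure has already been encoded as a string over a small alphabet (here the hypothetical three-letter secondary-structure alphabet H/E/C) and uses add-one smoothing; the function names, the alphabet, the smoothing scheme, and the choice n = 3 are all illustrative assumptions, not the encoding or model used by the authors.

```python
import math
from collections import Counter

def ngram_counts(seq, n):
    """Count all overlapping n-grams in a string."""
    return Counter(seq[i:i + n] for i in range(len(seq) - n + 1))

def cross_entropy(train_seq, test_seq, n=3, alphabet_size=3):
    """Average negative log2 probability (bits per n-gram) of test_seq
    under an add-one-smoothed n-gram model estimated from train_seq.
    Lower values mean test_seq looks more like train_seq."""
    grams = ngram_counts(train_seq, n)
    # Context counts are derived from the n-grams so that the smoothed
    # probabilities for each context sum to one over the alphabet.
    contexts = Counter()
    for gram, c in grams.items():
        contexts[gram[:-1]] += c
    total, count = 0.0, 0
    for i in range(len(test_seq) - n + 1):
        gram = test_seq[i:i + n]
        # Add-one smoothing (an assumption) keeps unseen n-grams non-zero.
        p = (grams[gram] + 1) / (contexts[gram[:-1]] + alphabet_size)
        total -= math.log2(p)
        count += 1
    return total / count if count else float("inf")

# Hypothetical encoded structures, for illustration only.
a = "HHHHEEEECCCCHHHHEEEECCCC"
b = "CECECEHCHCHEHEHECECEHCHC"
print(cross_entropy(a, a), cross_entropy(a, b))
```

In this toy setting the sequence scored against its own model yields a lower cross-entropy than the unrelated sequence, which is the property a similarity measure of this kind relies on.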
Remote protein homology detection and fold recognition refer to the detection of structural homology in proteins where there is little or no sequence similarity. To detect protein structural classes from protein primary sequence information, homology-based methods have been developed, which can be divided into three types: discriminative classifiers, generative models for protein families and pairwise sequence comparisons. Support Vector Machines (SVM) and Neural Networks (NN) are two popular discriminative methods. Recent studies have shown that SVM trains quickly and is more accurate and efficient than NN. We present a comprehensive method based on two-layer classifiers. The first layer is used to detect up to the superfamily and family levels in the SCOP hierarchy using optimized binary SVM classification rules. It uses a kernel function known as the Bio-kernel, which incorporates biological information in the classification process. The second layer uses a discriminative SVM algorithm with a string kernel to detect up to the protein fold level in the SCOP hierarchy. The results were evaluated using mean ROC and mean MRFP, and their significance was tested with a pairwise t-test. Experimental results show that our approaches significantly improve the performance of remote protein homology detection and fold recognition for all three versions of the SCOP datasets (1.53, 1.67 and 1.73). We achieved improvements in mean ROC of 4.19% on SCOP 1.53, 4.75% on SCOP 1.67 and 4.03% on SCOP 1.73 when compared to the results produced by well-known methods. The combination of the first and second layers of BioSVM-2L performs well in remote homology detection and fold recognition across the three dataset versions.
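The ROC score used in the evaluation above can be sketched in Python as the area under the ROC curve for a single test set, computed as the fraction of (positive, negative) pairs that the classifier ranks correctly. The function name, the tie handling, and the toy scores are illustrative assumptions; this is not the authors' evaluation code, and a mean ROC would average this value over the individual test sets.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise ranking.
    scores: classifier outputs, higher means more likely positive.
    labels: 1 for positive examples, 0 for negative examples."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    correct = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                correct += 1.0   # positive ranked above negative
            elif p == q:
                correct += 0.5   # ties count as half (a common convention)
    return correct / (len(pos) * len(neg))

# Toy example with made-up scores: one positive is ranked below a negative.
print(roc_auc([0.9, 0.2, 0.8, 0.1], [1, 1, 0, 0]))
```

The quadratic pairwise loop is chosen for readability; a rank-based formulation gives the same value more efficiently on large test sets.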
This study concerns an attempt to establish a new method for predicting antimicrobial peptides (AMPs), which are important to the immune system. Researchers have recently become interested in designing alternative drugs based on AMPs because a large number of bacterial strains have become resistant to available antibiotics. However, researchers have encountered obstacles in the AMP design process, as experiments to extract AMPs from protein sequences are costly and require a long set-up time. Therefore, a computational tool for AMP prediction is needed to resolve this problem. In this study, an integrated algorithm is introduced to predict AMPs by integrating sequence alignment and a support vector machine- (SVM-) LZ complexity pairwise algorithm. When all sequences in the training set are used, the sensitivity of the proposed algorithm is 95.28% in the jackknife test and 87.59% in the independent test, while the sensitivity obtained for the jackknife test and the independent test is 88.74% and 78.70%, respectively, when only the sequences that have less than 70% similarity are used. Applying the proposed algorithm may allow researchers to effectively predict AMPs from unknown protein peptide sequences with higher sensitivity.
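The LZ complexity on which the pairwise algorithm above is based can be sketched in Python as follows. The first function counts the phrases in a Lempel-Ziv (1976) style parsing of a sequence. The second derives a normalized pairwise distance from the complexities of the two concatenations, which is one formulation of the kind used for LZ-based sequence comparison; the abstract does not state which measure the authors use, so the distance, the function names, and the toy sequences are assumptions.

```python
def lz_complexity(s):
    """Number of phrases in an LZ76-style parsing of s.
    Each phrase is the shortest substring starting at the current
    position that has not already occurred in the preceding text."""
    i, c, n = 0, 0, len(s)
    while i < n:
        k = 1
        # Extend the phrase while it can be copied from earlier text.
        while i + k <= n and s[i:i + k] in s[:i + k - 1]:
            k += 1
        c += 1
        i += k
    return c

def lz_distance(s, q):
    """Normalized LZ-based distance (an assumed formulation):
    how many new phrases each sequence needs when appended to the other."""
    cs, cq = lz_complexity(s), lz_complexity(q)
    extra = max(lz_complexity(s + q) - cs, lz_complexity(q + s) - cq)
    return extra / max(cs, cq)

# Toy sequences, for illustration only.
print(lz_complexity("0001101001000101"))
print(lz_distance("ACACACAC", "GTTGGTTG"))
```

In a pipeline like the one described, pairwise distances of this kind would serve as input features to the SVM; how the authors combine them with sequence alignment is not specified in the abstract.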
Background. Quaternary structures of proteins are closely related to gene regulation, signal transduction, and many other biological functions of proteins. In the current study, a new feature extraction method based on protein-conserved motif composition in block format, termed block composition, is proposed. Results. The protein quaternary assembly state prediction system, called QuaBingo, combines blocks with functional domain composition and is constructed from three layers of classifiers that can categorize the quaternary structural attributes of monomer, homooligomer, and heterooligomer. The first-layer classifier uses support vector machines (SVM) based on blocks and functional domains of proteins, and the second-layer SVM processes the outputs of the first layer. Finally, the result is determined by the Random Forest of the third layer. We compared the effectiveness of combining block composition, functional domain composition, and pseudoamino acid composition in the model. Across 11 kinds of functional protein families, QuaBingo achieves a Matthews Correlation Coefficient (MCC) 23% higher than that of the existing prediction system. The results also revealed the biological characterization of the top five block compositions. Conclusions. QuaBingo provides better predictive ability for the quaternary structural attributes of proteins.
Kraits (Bungarus spp.) are highly venomous elapids that are found only in Asia. In the current study, 103 and 86 different proteins were identified from Bungarus candidus and Bungarus fasciatus venoms, respectively. These proteins were classified into 18 different venom protein families. Both venoms were found to contain a high percentage of three-finger toxins, phospholipase A2 enzymes and Kunitz-type inhibitors. Smaller numbers of high molecular weight enzymes such as L-amino acid oxidase, hyaluronidases, and acetylcholinesterase were also detected in the venoms. We also detected some unique proteins that were not known to be present in these venoms. The presence of natriuretic peptide, vespryn, and serine protease families was detected in B. candidus venom. We also detected the presence of subunits A and B of β-bungarotoxin and of α-bungarotoxin, which had not previously been found in B. fasciatus venom. Understanding the proteome composition of Malaysian krait species will provide useful information on unique toxins and proteins which are present in the venoms. This knowledge will assist in the management of krait envenoming. In addition, these proteins may have potential use as research tools or as drug-design templates.
Cysteine proteases in pineapple (Ananas comosus) plants are phytotherapeutical agents that demonstrate anti-edematous, anti-inflammatory, anti-thrombotic and fibrinolytic activities. Bromelain has been identified as an active component and as a major protease of A. comosus. Bromelain has gained wide acceptance and compliance as a phytotherapeutical drug. The proteolytic fraction of pineapple stem is termed stem bromelain, while that present in the fruit is known as fruit bromelain. The amino acid sequence and domain analysis of the fruit and stem bromelains demonstrated several differences and similarities between these cysteine protease family members. In addition, analysis of the modelled fruit (BAA21848) and stem (CAA08861) bromelains revealed the presence of unique properties of the predicted structures. Sequence analysis and structural prediction of stem and fruit bromelains of A. comosus, along with the comparison of both structures, provide new insight into their distinct properties for industrial application.
Protein structure prediction from amino acid sequence has been one of the most challenging aspects of computational structural biology, despite significant progress in recent years shown by the critical assessment of protein structure prediction (CASP) experiments. When experimentally determined structures are unavailable, predicted structures may serve as starting points to study a protein. If the target protein contains a homologous region, a high-resolution (typically <1.5 Å) model can be built via comparative modelling. However, when the target protein has low sequence similarity (a so-called twilight-zone protein, whose sequence identity with available templates is less than 30%), the protein structure prediction has to be initiated from scratch. Traditionally, twilight-zone proteins can be predicted via threading or ab initio methods. Based on the current trend, a combination of different methods improves success in the prediction of twilight-zone proteins. In this mini review, the methods, progress and challenges in the prediction of twilight-zone proteins are discussed.
Peptides derived from envelope proteins have been shown to inhibit the protein-protein interactions in the virus membrane fusion process and thus have great potential to be developed into effective antiviral therapies. There are three types of envelope proteins, each exhibiting distinct structural folds. Although the exact fusion mechanism remains elusive, it has been suggested that the three classes of viral fusion proteins share a similar mechanism of membrane fusion. The common mechanism of action makes it possible to correlate the properties of self-derived peptide inhibitors with their activities. Here we developed a support vector machine model using sequence-based statistical scores of self-derived peptide inhibitors as input features to correlate with their activities. The model displayed 92% prediction accuracy with a Matthews correlation coefficient of 0.84, clearly superior to models using physicochemical properties and amino acid decomposition as input. The predictive support vector machine model for self-derived peptides of envelope proteins would be useful in the development of antiviral peptide inhibitors targeting the virus fusion process.
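The relationship between the two performance figures reported above (accuracy and the Matthews correlation coefficient) can be made concrete with a short Python sketch. The confusion-matrix counts below are hypothetical, chosen only to show that a balanced test set with 92% accuracy is consistent with a coefficient of 0.84; the abstract does not report the actual counts, and the function names are illustrative.

```python
import math

def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)

def matthews_cc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts.
    Returns 0.0 when a row or column of the matrix is empty
    (a common convention for the undefined case)."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom

# Hypothetical balanced confusion matrix (not the paper's data):
# 50 active and 50 inactive peptides, 4 errors in each class.
tp, tn, fp, fn = 46, 46, 4, 4
print(accuracy(tp, tn, fp, fn), matthews_cc(tp, tn, fp, fn))
```

Unlike accuracy, the coefficient drops sharply when one class dominates and the classifier simply predicts the majority class, which is why the two figures are often reported together.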
A genetic similarity algorithm is introduced in this study to find a group of semantically similar Gene Ontology terms. The genetic similarity algorithm combines a semantic similarity measure algorithm with a parallel genetic algorithm. The semantic similarity measure algorithm is used to compute the strength of similarity between Gene Ontology terms. Then, the parallel genetic algorithm is employed to perform batch retrieval and to accelerate the search in the large search space of the Gene Ontology graph. The genetic similarity algorithm is implemented in a Gene Ontology browser named basic UTMGO to overcome the weaknesses of existing Gene Ontology browsers, which use a conventional approach based on keyword matching. To show the applicability of the basic UTMGO, we extend its structure to develop a Gene Ontology-based protein sequence annotation tool named extended UTMGO. The objective of developing the extended UTMGO is to provide a simple and practical tool that is capable of producing better results and requires a reasonable amount of running time with low computing cost, specifically for offline usage. The computational results and a comparison with other related tools are presented to show the effectiveness of the proposed algorithm and tools.
CATH (https://www.cathdb.info) identifies domains in protein structures from wwPDB and classifies these into evolutionary superfamilies, thereby providing structural and functional annotations. There are two levels: CATH-B, a daily snapshot of the latest domain structures and superfamily assignments, and CATH+, with additional derived data, such as predicted sequence domains, and functionally coherent sequence subsets (Functional Families or FunFams). The latest CATH+ release, version 4.3, significantly increases coverage of structural and sequence data, with an addition of 65,351 fully classified domain structures (+15%), providing 500,238 structural domains, and 151 million predicted sequence domains (+59%) assigned to 5481 superfamilies. The FunFam generation pipeline has been re-engineered to cope with the increased influx of data. Three times more sequences are captured in FunFams, with a concomitant increase in functional purity, information content and structural coverage. FunFam expansion increases the structural annotations provided for experimental GO terms (+59%). We also present CATH-FunVar web pages displaying variations in protein sequences and their proximity to known or predicted functional sites. We present two case studies: (1) putative cancer drivers and (2) SARS-CoV-2 proteins. Finally, we have improved links to and from CATH, including SCOP, InterPro, Aquaria and 2DProt.
This study reports the first full-length gene of interferon-related developmental regulator-1 (designated as MrIRDR-1), identified from the transcriptome of Macrobrachium rosenbergii. The complete gene sequence of MrIRDR-1 is 2459 base pairs long, with an open reading frame of 1308 base pairs encoding a predicted protein of 436 amino acids with a calculated molecular mass of 48 kDa. The MrIRDR-1 protein contains a long interferon-related developmental regulator superfamily domain between 30 and 330. The mRNA expression of MrIRDR-1 in healthy and infectious hypodermal and hematopoietic necrosis virus (IHHNV)-infected M. rosenbergii was examined using qRT-PCR. MrIRDR-1 is highly expressed in hepatopancreas along with all other tissues (walking leg, gills, muscle, haemocyte, pleopods, brain, stomach, intestine and eye stalk). After IHHNV infection, the expression is highly upregulated in hepatopancreas. This result indicates an important role for MrIRDR-1 in the prawn defense system.
Thermostable and organic solvent-tolerant enzymes have significant potential in a wide range of synthetic reactions in industry due to their inherent stability at high temperatures and their ability to endure harsh organic solvents. In this study, a novel gene encoding a true lipase was isolated by construction of a genomic DNA library of thermophilic Aneurinibacillus thermoaerophilus strain HZ in an Escherichia coli plasmid vector. Sequence analysis revealed that HZ lipase had 62% identity to a putative lipase from Bacillus pseudomycoides. The closest characterized lipases to the HZ lipase are thermostable Bacillus and Geobacillus lipases belonging to subfamily I.5, with ≤ 57% identity. The amino acid sequence analysis of HZ lipase identified a conserved pentapeptide containing the active serine, GHSMG, and a Ca²⁺-binding motif, GCYGSD, in the enzyme. Protein structure modeling showed that HZ lipase consisted of an α/β hydrolase fold and a lid domain. Protein sequence alignment, conserved regions analysis, clustal distance matrix and amino acid composition illustrated differences between HZ lipase and other thermostable lipases. Phylogenetic analysis revealed that this lipase represented a new subfamily of family I of bacterial true lipases, classified as family I.9. The HZ lipase was expressed under promoter Plac using IPTG and was characterized. The recombinant enzyme showed optimal activity at 65 °C and retained ≥ 97% activity after incubation at 50 °C for 1 h. The HZ lipase was stable in various polar and non-polar organic solvents.
Ornithine decarboxylase (ODC) is an enzyme of one of the two pathways of putrescine biosynthesis in plants. The genes encoding ODC have previously been cloned from Datura stramonium and human. Using differential screening, we isolated an ODC cDNA clone from a cDNA library of ripening Capsicum annuum fruit. The cDNA clone, designated CUKM10, contains an insert of 1523 bp. The longest open reading frame potentially encodes a peptide of 345 amino acids with an estimated molecular mass of 47 kDa and exhibits striking similarity to other ODCs. Expression analysis showed that capODC hybridised to a single transcript with a size of 1.7 kb. The capODC transcript was first observed in early ripening and increased steadily until the fully ripe stage. These observations suggest that capODC is developmentally regulated, especially during the later stages of ripening.
Shigella flexneri serotype 2a is a major public health concern in developing and under-developed countries, where it contributes to endemic shigellosis and mortality. Thus, there is an urgent need for a rapid diagnostic test for effective therapy and disease management. A previous study showed that a ∼35 kDa antigenic protein from S. flexneri is a potential biomarker. We therefore modelled the three-dimensional structure of the antigen to probe its functionality, which could aid in the development of an antigen-based diagnostic. Results showed that the antigen is a transmembrane protein consisting of OmpA and OmpA-like domains. The OmpA domain is a beta-barrel embedded in the outer membrane with four surface-exposed extracellular loops. The OmpA-like domain is linked to the OmpA domain by a 17-amino-acid linker and is located in the periplasm. Docking of peptidoglycan into the groove of the OmpA-like domain might help in catalyzing bacterial cell wall formation. Both domains are expected to be involved in the virulence, structural stability, pathogenesis and survival of Shigella, thus making the 35 kDa protein a suitable shigellosis diagnostic biomarker. This structural elucidation will also enable better identification of the epitope regions for the development of specific binders to the 35 kDa antigen.
A thermophilic lipolytic bacterium identified as Bacillus sp. L2 via 16S rDNA was previously isolated from a hot spring in Perak, Malaysia. Bacillus sp. L2 was confirmed to be in Group 5 of bacterial classification, a phylogenetically and phenotypically coherent group of thermophilic bacilli displaying very high similarity among their 16S rRNA sequences (98.5-99.2%). Polymerase chain reaction (PCR) cloning of the L2 lipase gene was conducted using five different primers. Sequence analysis of the L2 lipase gene revealed an open reading frame (ORF) of 1251 bp that codes for 417 amino acids. The signal peptide consists of 28 amino acids. The mature protein is made of 388 amino acid residues. Recombinant lipase was successfully overexpressed with a 178-fold increase in activity compared to crude native L2 lipase. The recombinant L2 lipase (43.2 kDa) was purified to homogeneity in a single chromatography step. The purified lipase was found to be active over a temperature range of 55-80 °C and a pH range of 6-10. The L2 lipase had a melting temperature (Tm) of 59.04 °C when analyzed by circular dichroism (CD) spectroscopy. The optimum activity was found to be at 70 °C and pH 9. Lipase L2 was strongly inhibited by ethylenediaminetetraacetic acid (EDTA) (100%), whereas phenylmethylsulfonyl fluoride (PMSF), pepstatin-A, 2-mercaptoethanol and dithiothreitol (DTT) inhibited the enzyme by over 40%. Secondary structure analysis of the CD spectra showed that the L2 lipase structure contained 38.6% α-helices, 2.2% β-strands, 23.6% turns and 35.6% random conformations.
To isolate and identify the pathogen of Dengue fever from Shenzhen city in 2005 - 2006, and to analyze the molecular characteristics of the isolated Dengue virus strain as well as to explore its possible origin.
DARPP-32 (dopamine and adenosine 3', 5'-monophosphate-regulated phosphoprotein of 32 kDa), which belongs to the PPP1R1 gene family, is known to act as an important integrator in dopamine-mediated neurotransmission via the inhibition of protein phosphatase-1 (PP1). Besides its neuronal roles, this protein also behaves as a key player in pathological and pharmacological aspects. The use of bioinformatic and phylogenetic approaches to further characterize the molecular features of DARPP-32 can guide future work. Predicted phosphorylation sites on DARPP-32 show conservation across vertebrates. Phylogenetic analysis indicates evolutionary strata of phosphorylation site acquisition at the C-terminus, suggesting functional expansion of DARPP-32, where more diverse signalling cues may be involved in regulating DARPP-32 in inhibiting PP1 activity. Moreover, both phylogenetic and synteny analyses suggest de novo origination of the PPP1R1 gene family via chromosomal rearrangement and exonization.
We report a detailed structural analysis of the psychrophilic exo-β-1,3-glucanase (GaExg55) from Glaciozyma antarctica PI12. This study elucidates the structural basis of exo-β-1,3-glucanase from this psychrophilic yeast. The structural prediction of GaExg55 remains a challenge because of its low sequence identity (37%). A 3D model was constructed for GaExg55. A threading approach was employed to determine a suitable template and generate an optimal target-template alignment for establishing the model using MODELLER9v15. The primary sequence analysis of GaExg55 with other mesophilic exo-β-1,3-glucanases indicated that increased flexibility is conferred on the enzyme by a set of amino acid substitutions in the surface and loop regions of GaExg55, thereby facilitating cold adaptation of its structure. A comparison of GaExg55 with other mesophilic exo-β-1,3-glucanases suggested that the catalytic activity and structural flexibility in a cold environment were attained through a reduced number of hydrogen bonds and salt bridges, as well as an increased exposure of the hydrophobic side chains to the solvent. A molecular dynamics simulation was also performed using GROMACS software to evaluate the stability of the GaExg55 structure at varying low temperatures. The simulation result confirmed the above findings for cold adaptation of the psychrophilic GaExg55. Furthermore, the structural analysis of GaExg55, with its large catalytic cleft and wide active site pocket, confirmed the high activity of GaExg55 in hydrolyzing polysaccharide substrates.
Hepatic phosphoprotein levels are altered in mouse liver as a manifestation of bacterial, viral or parasitic infection. Identification of the signaling pathways mediated by these hepatic proteins contributes to the current understanding of the mechanism of pathogenesis in malarial infection. The present study was undertaken to evaluate the changes in hepatic phosphoprotein levels during Plasmodium berghei infection. Our study revealed changes in the levels of three hepatic phosphoproteins following P. berghei infection compared to non-infected controls. Peptide fragment sequence analysis using tandem mass spectrometry (MS/MS) showed these hepatic proteins to be homologs of the haemoglobin beta (HBB), class Pi glutathione S-transferase (GSTPi) and carbonic anhydrase III (CAIII) proteins of Mus musculus, respectively, from the NCBInr sequence database. Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis predicted the involvement of these proteins in specific pathways in Mus musculus: GSTPi in glutathione and drug metabolism and CAIII in nitrogen metabolism. This shows that P. berghei infection affects signaling pathways similar to those reported in other pathogenic infections, such as those related to GSTPi and CAIII in response to oxidative stress.
Malaria is caused by multiple different species of protozoan parasites, and interventions in the pre-elimination phase can lead to drastic changes in the proportion of each species causing malaria. In endemic areas, cross-reactivity may play an important role in protection and in blocking transmission. Thus, successful control of one species could lead to an increase in other parasite species. A few studies have reported cross-reactivity producing cross-immunity, but the extent of cross-reactivity, particularly between closely related species, is poorly understood. P. vivax and P. knowlesi are particularly closely related species causing malaria infections in SE Asia, and whilst P. vivax cases are in decline, zoonotic P. knowlesi infections are rising in some areas. In this study, the cross-species reactivity and growth inhibition activity of P. vivax blood-stage antigen-specific antibodies against P. knowlesi parasites were investigated. Bioinformatics analysis, immunofluorescence assay, western blotting, protein microarray, and growth inhibition assay were performed to investigate the cross-reactivity. P. vivax blood-stage antigen-specific antibodies recognized molecules located on the surface of, or released from apical organelles of, P. knowlesi merozoites. Recombinant P. vivax and P. knowlesi proteins were also recognized by P. knowlesi- and P. vivax-infected patient antibodies, respectively. Immunoglobulin G against P. vivax antigens from both immune animals and human malaria patients inhibited erythrocyte invasion by P. knowlesi. This study demonstrates that antibodies against P. vivax cross-react extensively with P. knowlesi in the blood stage and that these antibodies can potently inhibit in vitro invasion, highlighting the potential for cross-protective immunity in endemic areas.