Protein structure alignment and comparisons that are based on an alphabetical demonstration of protein structure are more simple to run with faster evaluation processes; thus, their accuracy is not as reliable as three-dimension (3D)-based tools. As a 1D method candidate, TS-AMIR used the alphabetic demonstration of secondary-structure elements (SSE) of proteins and compared the assigned letters to each SSE using the [Formula: see text]-gram method. Although the results were comparable to those obtained via geometrical methods, the SSE length and accuracy of adjacency between SSEs were not considered in the comparison process. Therefore, to obtain further information on accuracy of adjacency between SSE vectors, the new approach of assigning text to vectors was adopted according to the spherical coordinate system in the present study. Moreover, dynamic programming was applied in order to account for the length of SSE vectors. Five common datasets were selected for method evaluation. The first three datasets were small, but difficult to align, and the remaining two datasets were used to compare the capability of the proposed method with that of other methods on a large protein dataset. The results showed that the proposed method, as a text-based alignment approach, obtained results comparable to both 1D and 3D methods. It outperformed 1D methods in terms of accuracy and 3D methods in terms of runtime.
Task scheduling in parallel multiple sequence alignment (MSA) through improved dynamic programming optimization speeds up alignment processing. The increased importance of multiple matching sequences also needs the utilization of parallel processor systems. This dynamic algorithm proposes improved task scheduling in case of parallel MSA. Specifically, the alignment of several tertiary structured proteins is computationally complex than simple word-based MSA. Parallel task processing is computationally more efficient for protein-structured based superposition. The basic condition for the application of dynamic programming is also fulfilled, because the task scheduling problem has multiple possible solutions or options. Search space reduction for speedy processing of this algorithm is carried out through greedy strategy. Performance in terms of better results is ensured through computationally expensive recursive and iterative greedy approaches. Any optimal scheduling schemes show better performance in heterogeneous resources using CPU or GPU.
The structural comparison of proteins is a vital step in structural biology that is used to predict and analyse a new unknown protein function. Although a number of different techniques have been explored, the study to develop new alternative methods is still an active research area. The present paper introduces a text modelling-based technique for the structural comparison of proteins. The method models the secondary and tertiary structure of proteins in two linear sequences and then applies them to the comparison of two structures. The technique used for pairwise comparison of the sequences has been adopted from computational linguistics and its well-known techniques for analysing and quantifying textual sequences. To this end, an n-gram modelling technique is used to capture regularities between sequences, and then, the cross-entropy concept is employed to measure their similarities. Several experiments are conducted to evaluate the performance of the method and compare it with other commonly used programs. The assessments for information retrieval evaluation demonstrate that the technique has a high running speed, which is similar to other linear encoding methods, such as 3D-BLAST, SARST, and TS-AMIR, whereas its accuracy is comparable to CE and TM-align, which are high accuracy comparison tools. Accordingly, the results demonstrate that the algorithm has high efficiency compared with other state-of-the-art methods.
For a given DNA sequence, it is well known that pair wise alignment schemes are used to determine the similarity with the DNA sequences available in the databanks. The efficiency of the alignment decides the type of amino acids and its corresponding proteins. In order to evaluate the given DNA sequence for its proteomic identity, a pattern matching approach is proposed in this paper. A block based semi-global alignment scheme is introduced to determine the similarity between the DNA sequences (known and given). The two DNA sequences are divided into blocks of equal length and alignment is performed which minimizes the computational complexity. The efficiency of the alignment scheme is evaluated using the parameter, percentage of similarity (POS). Four essential DNA version of the amino acids that emphasize the importance of proteomic functionalities are chosen as patterns and matching is performed with the known and given DNA sequences to determine the similarity between them. The ratio of amino acid counts between the two sequences is estimated and the results are compared with that of the POS value. It is found from the experimental results that higher the POS value and the pattern matching higher are the similarity between the two DNA sequences. The optimal block is also identified based on the POS value and amino acids count.
Previously, direct-proportional length-based DNA computing (DPLB-DNAC) for solving weighted graph problems has been reported. The proposed DPLB-DNAC has been successfully applied to solve the shortest path problem, which is an instance of weighted graph problems. The design and development of DPLB-DNAC is important in order to extend the capability of DNA computing for solving numerical optimization problem. According to DPLB-DNAC, after the initial pool generation, the initial solution is subjected to amplification by polymerase chain reaction and, finally, the output of the computation is visualized by gel electrophoresis. In this paper, however, we give more attention to the initial pool generation of DPLB-DNAC. For this purpose, two kinds of initial pool generation methods, which are generally used for solving weighted graph problems, are evaluated. Those methods are hybridization-ligation and parallel overlap assembly (POA). It is found that for DPLB-DNAC, POA is better than that of the hybridization-ligation method, in terms of population size, generation time, material usage, and efficiency, as supported by the results of actual experiments.
This study concerns an attempt to establish a new method for predicting antimicrobial peptides (AMPs) which are important to the immune system. Recently, researchers are interested in designing alternative drugs based on AMPs because they have found that a large number of bacterial strains have become resistant to available antibiotics. However, researchers have encountered obstacles in the AMPs designing process as experiments to extract AMPs from protein sequences are costly and require a long set-up time. Therefore, a computational tool for AMPs prediction is needed to resolve this problem. In this study, an integrated algorithm is newly introduced to predict AMPs by integrating sequence alignment and support vector machine- (SVM-) LZ complexity pairwise algorithm. It was observed that, when all sequences in the training set are used, the sensitivity of the proposed algorithm is 95.28% in jackknife test and 87.59% in independent test, while the sensitivity obtained for jackknife test and independent test is 88.74% and 78.70%, respectively, when only the sequences that has less than 70% similarity are used. Applying the proposed algorithm may allow researchers to effectively predict AMPs from unknown protein peptide sequences with higher sensitivity.
A genetic similarity algorithm is introduced in this study to find a group of semantically similar Gene Ontology terms. The genetic similarity algorithm combines semantic similarity measure algorithm with parallel genetic algorithm. The semantic similarity measure algorithm is used to compute the similitude strength between the Gene Ontology terms. Then, the parallel genetic algorithm is employed to perform batch retrieval and to accelerate the search in large search space of the Gene Ontology graph. The genetic similarity algorithm is implemented in the Gene Ontology browser named basic UTMGO to overcome the weaknesses of the existing Gene Ontology browsers which use a conventional approach based on keyword matching. To show the applicability of the basic UTMGO, we extend its structure to develop a Gene Ontology -based protein sequence annotation tool named extended UTMGO. The objective of developing the extended UTMGO is to provide a simple and practical tool that is capable of producing better results and requires a reasonable amount of running time with low computing cost specifically for offline usage. The computational results and comparison with other related tools are presented to show the effectiveness of the proposed algorithm and tools.
Thermostable and organic solvent-tolerant enzymes have significant potential in a wide range of synthetic reactions in industry due to their inherent stability at high temperatures and their ability to endure harsh organic solvents. In this study, a novel gene encoding a true lipase was isolated by construction of a genomic DNA library of thermophilic Aneurinibacillus thermoaerophilus strain HZ into Escherichia coli plasmid vector. Sequence analysis revealed that HZ lipase had 62% identity to putative lipase from Bacillus pseudomycoides. The closely characterized lipases to the HZ lipase gene are from thermostable Bacillus and Geobacillus lipases belonging to the subfamily I.5 with ≤ 57% identity. The amino acid sequence analysis of HZ lipase determined a conserved pentapeptide containing the active serine, GHSMG and a Ca(2+)-binding motif, GCYGSD in the enzyme. Protein structure modeling showed that HZ lipase consisted of an α/β hydrolase fold and a lid domain. Protein sequence alignment, conserved regions analysis, clustal distance matrix and amino acid composition illustrated differences between HZ lipase and other thermostable lipases. Phylogenetic analysis revealed that this lipase represented a new subfamily of family I of bacterial true lipases, classified as family I.9. The HZ lipase was expressed under promoter Plac using IPTG and was characterized. The recombinant enzyme showed optimal activity at 65 °C and retained ≥ 97% activity after incubation at 50 °C for 1h. The HZ lipase was stable in various polar and non-polar organic solvents.
The Rab GTPase family plays a vital role in several plant physiological processes including fruit ripening. Fruit softening during ripening involves trafficking of cell wall polymers and enzymes between cellular compartments. Mango, an economically important fruit crop, is known for its delicious taste, exotic flavour and nutritional value. So far, there is a paucity of information on the mango Rab GTPase family. In this study, 23 genes encoding Rab proteins were identified in mango by a comprehensive in silico approach. Sequence alignment and similarity tree analysis with the model plant Arabidopsis as a reference enabled the bona fide assignment of the deduced mango proteins to classify into eight subfamilies. Expression analysis by RNA-Sequencing (RNA-Seq) showed that the Rab genes were differentially expressed in ripe and unripe mangoes suggesting the involvement of vesicle trafficking during ripening. Interaction analysis showed that the proteins involved in vesicle trafficking and cell wall softening were interconnected providing further evidence of the involvement of the Rab GTPases in fruit softening. Correlation analyses showed a significant relationship between the expression level of the RabA3 and RabA4 genes and fruit firmness at the unripe stage of the mango varieties suggesting that the differences in gene expression level might be associated with the contrasting firmness of these varieties. This study will not only provide new insights into the complexity of the ripening-regulated molecular mechanism but also facilitate the identification of potential Rab GTPases to address excessive fruit softening.