A nuclear export signal (NES) is a protein localization signal that mediates the binding of cargo proteins to nuclear export receptors and thereby helps regulate the localization of cellular proteins. Consensus sequences have been used to detect NESs in protein sequences, but they suffer from poor predictive power. Some recent works have proposed using biochemical properties of experimentally verified NESs to refine NES candidates. These methods can achieve high prediction rates, but their execution time becomes unacceptable for large-scale NES searching if too many properties are involved. In this work, we developed a novel computational approach, named NES-REBS, to search for NESs in protein sequences, in which biochemical properties of experimentally verified NESs, including secondary structure and surface accessibility, are used to refine NES candidates obtained by matching popular consensus sequences. We tested our method by searching for 262 experimentally verified NESs in 221 NES-containing protein sequences. NES-REBS runs in 2-3 min and performs well, achieving a precision of 47.2% and a sensitivity of 54.6%.
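The two-stage idea described above (match a consensus pattern, then refine the hits using structural properties) can be illustrated with a minimal Python sketch. This is not the NES-REBS implementation. The consensus used here (Phi-x(2,3)-Phi-x(2,3)-Phi-x-Phi with Phi in L, I, V, F, M) is one commonly cited NES pattern and is assumed for illustration. The function names, the per-residue annotation format (a secondary-structure string using `H` for helix and a list of relative accessibilities), the refinement rule, and the thresholds `min_helix` and `min_access` are all hypothetical placeholders.

```python
import re

# Assumed consensus: Phi-x(2,3)-Phi-x(2,3)-Phi-x-Phi, Phi in {L, I, V, F, M}.
# The lookahead lets overlapping candidates be reported.
NES_CONSENSUS = re.compile(r"(?=([LIVFM].{2,3}[LIVFM].{2,3}[LIVFM].[LIVFM]))")


def find_candidates(seq):
    """Return (start, motif) for every position where the consensus matches."""
    return [(m.start(1), m.group(1)) for m in NES_CONSENSUS.finditer(seq)]


def refine(candidates, sec_struct, accessibility, min_helix=0.5, min_access=0.2):
    """Keep candidates whose window passes two illustrative property filters.

    sec_struct: one letter per residue ('H' = helix); hypothetical format.
    accessibility: one relative accessibility value per residue.
    Both thresholds are placeholders, not values from the paper.
    """
    kept = []
    for start, motif in candidates:
        end = start + len(motif)
        helix_frac = sec_struct[start:end].count("H") / len(motif)
        mean_acc = sum(accessibility[start:end]) / len(motif)
        if helix_frac >= min_helix and mean_acc >= min_access:
            kept.append((start, motif))
    return kept


def precision_sensitivity(tp, fp, fn):
    """Precision = TP / (TP + FP); sensitivity = TP / (TP + FN)."""
    return tp / (tp + fp), tp / (tp + fn)


if __name__ == "__main__":
    toy_seq = "NSNELALKLAGLDINKTE"       # toy sequence for illustration
    toy_ss = "CCCCHHHHHHHHCCCCCC"        # made-up secondary structure
    toy_acc = [0.3] * len(toy_seq)       # made-up accessibility values
    hits = find_candidates(toy_seq)
    print(hits)
    print(refine(hits, toy_ss, toy_acc))
```

Keeping the cheap consensus scan separate from the property filter mirrors the trade-off mentioned above: the per-candidate cost of the refinement step grows with the number of properties evaluated, so the scan should narrow the search first.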
Protein structure alignment and comparison methods that are based on an alphabetical representation of protein structure are simpler to run and faster to evaluate; however, their accuracy is not as reliable as that of three-dimensional (3D) tools. As a 1D method, TS-AMIR used an alphabetic representation of the secondary-structure elements (SSEs) of proteins and compared the letters assigned to each SSE using the n-gram method. Although the results were comparable to those obtained via geometrical methods, the SSE length and the accuracy of adjacency between SSEs were not considered in the comparison process. Therefore, to capture more information on the adjacency between SSE vectors, the present study adopted a new approach that assigns text to the vectors according to the spherical coordinate system. Moreover, dynamic programming was applied to account for the length of the SSE vectors. Five common datasets were selected for method evaluation. The first three datasets were small but difficult to align, and the remaining two were used to compare the capability of the proposed method with that of other methods on a large protein dataset. The results showed that the proposed method, as a text-based alignment approach, obtained results comparable to those of both 1D and 3D methods. It outperformed 1D methods in terms of accuracy and 3D methods in terms of runtime.
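As a rough illustration of the text-based idea described above, the following Python sketch maps each SSE direction vector to a letter by binning its spherical angles and then compares two resulting strings by n-gram overlap. It is a sketch under stated assumptions, not the published method: the bin counts (`n_polar`, `n_azimuth`), the letter scheme, the function names, and the Dice-style n-gram score are all hypothetical choices, and the dynamic-programming step that accounts for SSE length is not shown.

```python
import math
from collections import Counter


def direction_letter(vec, n_polar=2, n_azimuth=4):
    """Assign a letter to a 3D vector from its spherical-angle bin.

    The polar angle in [0, pi] is split into n_polar bins and the azimuth
    in [0, 2*pi) into n_azimuth bins; both bin counts are assumptions.
    """
    x, y, z = vec
    r = math.sqrt(x * x + y * y + z * z)
    if r == 0:
        raise ValueError("zero-length SSE vector")
    theta = math.acos(max(-1.0, min(1.0, z / r)))   # polar angle
    phi = math.atan2(y, x) % (2 * math.pi)          # azimuth
    i = min(int(theta / (math.pi / n_polar)), n_polar - 1)
    j = min(int(phi / (2 * math.pi / n_azimuth)), n_azimuth - 1)
    return chr(ord("A") + i * n_azimuth + j)


def encode(vectors):
    """Turn a list of SSE direction vectors into a text string."""
    return "".join(direction_letter(v) for v in vectors)


def ngrams(text, n):
    """Multiset of all length-n substrings of text."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def ngram_similarity(a, b, n=2):
    """Dice-style overlap of n-gram multisets, in [0, 1] (assumed score)."""
    ga, gb = ngrams(a, n), ngrams(b, n)
    total = sum(ga.values()) + sum(gb.values())
    if total == 0:
        return 0.0
    shared = sum((ga & gb).values())   # multiset intersection
    return 2.0 * shared / total


if __name__ == "__main__":
    # Made-up direction vectors standing in for SSEs of two proteins.
    protein_a = [(1, 1, 1), (-1, 1, 1), (1, 1, -1), (-1, -1, -1)]
    protein_b = [(1, 1, 1), (-1, 1, 1), (1, 1, -1), (1, 1, 1)]
    text_a, text_b = encode(protein_a), encode(protein_b)
    print(text_a, text_b, ngram_similarity(text_a, text_b))
```

Because each letter encodes the orientation of a vector rather than only the SSE type, consecutive letters carry some information about how neighbouring SSEs are arranged, which is the kind of adjacency information the approach described above aims to capture.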