Over the last 20 years in biotechnology, the production of recombinant proteins has been a crucial bioprocess in both biopharmaceutical and research arena in terms of human health, scientific impact and economic volume. Although logical strategies of genetic engineering have been established, protein overexpression is still an art. In particular, heterologous expression is often hindered by low level of production and frequent fail due to opaque reasons. The problem is accentuated because there is no generic solution available to enhance heterologous overexpression. For a given protein, the extent of its solubility can indicate the quality of its function. Over 30% of synthesized proteins are not soluble. In certain experimental circumstances, including temperature, expression host, etc., protein solubility is a feature eventually defined by its sequence. Until now, numerous methods based on machine learning are proposed to predict the solubility of protein merely from its amino acid sequence. In spite of the 20 years of research on the matter, no comprehensive review is available on the published methods.
Recombinant protein overexpression, an important biotechnological process, is ruled by complex biological rules which are mostly unknown, is in need of an intelligent algorithm so as to avoid resource-intensive lab-based trial and error experiments in order to determine the expression level of the recombinant protein. The purpose of this study is to propose a predictive model to estimate the level of recombinant protein overexpression for the first time in the literature using a machine learning approach based on the sequence, expression vector, and expression host. The expression host was confined to Escherichia coli which is the most popular bacterial host to overexpress recombinant proteins. To provide a handle to the problem, the overexpression level was categorized as low, medium and high. A set of features which were likely to affect the overexpression level was generated based on the known facts (e.g. gene length) and knowledge gathered from related literature. Then, a representative sub-set of features generated in the previous objective was determined using feature selection techniques. Finally a predictive model was developed using random forest classifier which was able to adequately classify the multi-class imbalanced small dataset constructed. The result showed that the predictive model provided a promising accuracy of 80% on average, in estimating the overexpression level of a recombinant protein.
Accurate medical disease diagnosis is considered to be an important classification problem. The main goal of the classification process is to determine the class to which a certain pattern belongs. In this article, a new classification technique based on a combination of The Teaching Learning-Based Optimization (TLBO) algorithm and Fuzzy Wavelet Neural Network (FWNN) with Functional Link Neural Network (FLNN) is proposed. In addition, the TLBO algorithm is utilized for training the new hybrid Functional Fuzzy Wavelet Neural Network (FFWNN) and optimizing the learning parameters, which are weights, dilation and translation. To evaluate the performance of the proposed method, five standard medical datasets were used: Breast Cancer, Heart Disease, Hepatitis, Pima-Indian diabetes and Appendicitis. The efficiency of the proposed method is evaluated using 5-fold cross-validation and 10-fold cross-validation in terms of mean square error (MSE), classification accuracy, running time, sensitivity, specificity and kappa. The experimental results show that the efficiency of the proposed method for the medical classification problems is 98.309%, 91.1%, 91.39%, 88.67% and 93.51% for the Breast Cancer, Heart Disease, Hepatitis, Pima-Indian diabetes and Appendicitis datasets, respectively, in terms of accuracy after 30 runs for each dataset with low computational complexity. In addition, it has been observed that the proposed method has efficient performance compared with the performance of other methods found in the related previous studies.
Network lifetime and energy efficiency are crucial performance metrics used to evaluate wireless sensor networks (WSNs). Decreasing and balancing the energy consumption of nodes can be employed to increase network lifetime. In cluster-based WSNs, one objective of applying clustering is to decrease the energy consumption of the network. In fact, the clustering technique will be considered effective if the energy consumed by sensor nodes decreases after applying clustering, however, this aim will not be achieved if the cluster size is not properly chosen. Therefore, in this paper, the energy consumption of nodes, before clustering, is considered to determine the optimal cluster size. A two-stage Genetic Algorithm (GA) is employed to determine the optimal interval of cluster size and derive the exact value from the interval. Furthermore, the energy hole is an inherent problem which leads to a remarkable decrease in the network's lifespan. This problem stems from the asynchronous energy depletion of nodes located in different layers of the network. For this reason, we propose Circular Motion of Mobile-Sink with Varied Velocity Algorithm (CM2SV2) to balance the energy consumption ratio of cluster heads (CH). According to the results, these strategies could largely increase the network's lifetime by decreasing the energy consumption of sensors and balancing the energy consumption among CHs.
Mathematical modelling is fundamental to understand the dynamic behavior and regulation of the biochemical metabolisms and pathways that are found in biological systems. Pathways are used to describe complex processes that involve many parameters. It is important to have an accurate and complete set of parameters that describe the characteristics of a given model. However, measuring these parameters is typically difficult and even impossible in some cases. Furthermore, the experimental data are often incomplete and also suffer from experimental noise. These shortcomings make it challenging to identify the best-fit parameters that can represent the actual biological processes involved in biological systems. Computational approaches are required to estimate these parameters. The estimation is converted into multimodal optimization problems that require a global optimization algorithm that can avoid local solutions. These local solutions can lead to a bad fit when calibrating with a model. Although the model itself can potentially match a set of experimental data, a high-performance estimation algorithm is required to improve the quality of the solutions. This paper describes an improved hybrid of particle swarm optimization and the gravitational search algorithm (IPSOGSA) to improve the efficiency of a global optimum (the best set of kinetic parameter values) search. The findings suggest that the proposed algorithm is capable of narrowing down the search space by exploiting the feasible solution areas. Hence, the proposed algorithm is able to achieve a near-optimal set of parameters at a fast convergence speed. The proposed algorithm was tested and evaluated based on two aspartate pathways that were obtained from the BioModels Database. The results show that the proposed algorithm outperformed other standard optimization algorithms in terms of accuracy and near-optimal kinetic parameter estimation. Nevertheless, the proposed algorithm is only expected to work well in small scale systems. In addition, the results of this study can be used to estimate kinetic parameter values in the stage of model selection for different experimental conditions.