Optimized Signal Peptide for Secretory Expression of Human Recombinant Somatropin in E. coli

Purpose: Human somatropin is a single-chain polypeptide with a pivotal role in various biological processes. Although Escherichia coli is considered a preferred host for the production of human somatropin, high-level expression of this protein in E. coli results in its accumulation as inclusion bodies. Periplasmic expression using signal peptides can be used to overcome the formation of inclusion bodies; however, the efficiency of signal peptides in periplasmic transport varies and is often protein-specific. The present study aimed to use in silico analysis to identify an appropriate signal peptide for the periplasmic expression of human somatropin in E. coli. Methods: A library of 90 prokaryotic and eukaryotic signal peptides was collected from the signal peptide database, and the characteristics and efficiency of each signal peptide in connection with the target protein were analyzed with different software tools. The secretory pathway and the cleavage position were predicted by the SignalP 5.0 server. Physicochemical properties, including molecular weight, instability index, GRAVY, and aliphatic index, were investigated with ProtParam. Results: Among all the signal peptides studied, five (ynfB, sfaS, lolA, glnH, and malE) displayed high scores for periplasmic expression of human somatropin in E. coli. Conclusion: The results indicate that in silico analysis can be used to identify suitable signal peptides for the periplasmic expression of proteins. Further laboratory studies are needed to evaluate the accuracy of these in silico predictions.


Introduction
Human somatropin is a non-glycosylated single-chain polypeptide comprising 191 amino acids, with a molecular mass of 22.1 kDa. 1 Somatropin belongs to the somatotropin/prolactin family, which plays a significant role in growth control by stimulating various tissues, mainly the liver, to secrete insulin-like growth factor 1 (IGF-1). It is also responsible for the differentiation and proliferation of myoblasts, the uptake of amino acids, and protein production in muscles and other tissues. 2 Advantages such as easy genetic manipulation, low-cost media, and short culturing time have led to the use of Escherichia coli as the most suitable expression system for the production of many recombinant proteins. 3 However, high-level expression of recombinant proteins in E. coli often gives rise to aggregated protein molecules, known as inclusion bodies. 4 As a result, purification of the recombinant protein becomes challenging, because it involves isolation of the inclusion bodies from the cells, unfolding, refolding, and purification to obtain the bioactive protein. Strategies used to overcome this problem include secretory expression, in which an N-terminal signal peptide targets the protein to the periplasmic space. 5 Sec, SRP, and TAT are the major protein secretion pathways by which prokaryotes direct proteins into the periplasm or the extracellular space according to their signal peptides. 6 Selecting an appropriate signal peptide is therefore an essential parameter in the secretory expression of recombinant proteins. 7 Several studies have shown that the function of signal peptides is protein-specific, and that there is no single ideal signal peptide for secretory expression of all proteins. 8 The conventional method for selecting a signal peptide for a given protein is trial and error, which is labor-intensive and time-consuming.
Recently, various bioinformatics programs have been developed for analyzing the efficiency of different signal peptides, including SignalP 4.1, ProtParam, SOLpro, ProtCompB, and SignalP 5.0. The advantages of using a bioinformatics program before starting an experimental study are reduced costs and increased accuracy and validity of the experimental research. 9 Secretory expression of recombinant proteins, particularly pharmaceutical proteins, in E. coli has many advantages. Targeting a recombinant protein to the periplasmic space or the extracellular medium reduces costs and facilitates downstream processing compared with cytosolic production. 10 The purpose of the present study was to analyze in silico various signal peptides for secretory expression of somatropin using different bioinformatics programs.

Materials and Methods
Signal peptide sequences
In this research, the sequences of 90 different signal peptides were collected from the Signal Sequence database at http://www.signalpeptide.de/ (Table 1) and used for further analyses.

In silico prediction of signal peptide and prediction of h, c and n regions
SignalP version 4.1 (http://www.cbs.dtu.dk/services/SignalP-4.1/) was used to predict signal peptides and their cleavage sites based on a combination of different artificial neural networks. 11 SignalP version 3.0 (http://www.cbs.dtu.dk/services/SignalP-3.0/) was employed to predict the n, h, and c regions of the signal peptides. For this purpose, each signal peptide was added to the N-terminus of the somatropin sequence and the resulting fusion was analyzed by the program.
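The fusion step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the script used in the study: the helper names `build_fusions` and `to_fasta`, the FASTA header format, and the two example leader sequences are assumptions, and `MATURE_PLACEHOLDER` is a short stand-in that would have to be replaced with the full 191-residue somatropin sequence before submission to SignalP.

```python
# Short placeholder for the mature protein (assumption, for illustration only);
# replace with the full, verified somatropin sequence.
MATURE_PLACEHOLDER = "FPTIPLSRLFDNAMLRAHRLHQLAFDTYQEF"

# Two example leaders (assumed here); verify every sequence against the
# signal peptide database entry before use.
SIGNAL_PEPTIDES = {
    "pelB": "MKYLLPTAAAGLLLLAAQPAMA",
    "ompA": "MKKTAIAIAVALAGFATVAQA",
}


def build_fusions(signal_peptides, mature):
    """Prepend each signal peptide to the N-terminus of the mature protein."""
    return {name: sp + mature for name, sp in signal_peptides.items()}


def to_fasta(records, width=60):
    """Format {name: sequence} as FASTA text with wrapped sequence lines."""
    lines = []
    for name, seq in records.items():
        lines.append(f">{name}_somatropin")
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"


fusions = build_fusions(SIGNAL_PEPTIDES, MATURE_PLACEHOLDER)
fasta_text = to_fasta(fusions)
print(fasta_text)
```

The resulting FASTA text could then be pasted into the web servers, which accept protein sequences in this format.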

Analysis of physicochemical features of signal peptides
The ProtParam program was used to evaluate the physicochemical features of the signal peptides, including theoretical pI, amino acid composition, number of negatively and positively charged amino acids, grand average of hydropathicity (GRAVY), instability index, aliphatic index, and molecular weight.
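Some of these descriptors are simple functions of the amino acid composition and can be reproduced locally as a cross-check. The sketch below is an assumption-laden illustration rather than ProtParam itself: it computes GRAVY from the Kyte-Doolittle hydropathy scale, the aliphatic index from the standard formula (mole percent of Ala + 2.9 × Val + 3.9 × (Ile + Leu)), and a simple charge balance (Lys + Arg minus Asp + Glu). The function names are hypothetical, and the instability index and molecular weight are omitted.

```python
# Kyte-Doolittle hydropathy values, the scale on which GRAVY is based.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(seq):
    """Grand average of hydropathicity: mean hydropathy per residue."""
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


def aliphatic_index(seq):
    """Aliphatic index from mole percentages of Ala, Val, Ile and Leu."""
    def mole_percent(aa):
        return 100.0 * seq.count(aa) / len(seq)

    return (mole_percent("A")
            + 2.9 * mole_percent("V")
            + 3.9 * (mole_percent("I") + mole_percent("L")))


def charge_balance(seq):
    """Number of basic residues (K, R) minus acidic residues (D, E)."""
    basic = sum(seq.count(aa) for aa in "KR")
    acidic = sum(seq.count(aa) for aa in "DE")
    return basic - acidic


# Example leader sequence (assumed for illustration).
example = "MKYLLPTAAAGLLLLAAQPAMA"
print(round(gravy(example), 3), round(aliphatic_index(example), 2),
      charge_balance(example))
```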

Analysis of protein solubility
The SOLpro tool predicts the solubility of a protein upon expression in E. coli from features of its primary sequence. SOLpro (http://scratch.proteomics.ics.uci.edu/) was therefore used to predict the solubility of the protein in E. coli. SOLpro has a prediction accuracy above 74%.

Prediction of protein localization
The ProtCompB server from Softberry, Inc. (http://www.softberry.com) was used to predict the destination of somatropin in connection with the various signal peptides. It does so using a combination of sequence homology and neural networks. 12

Prediction of the type of signal peptides and cleavage probability
In prokaryotes, there are three types of signal peptides: Sec substrates cleaved by SPase I (Sec/SPI), Sec substrates cleaved by SPase II (Sec/SPII), and Tat substrates cleaved by SPase I (Tat/SPI). 13 The SignalP 5.0 server was used to discriminate between these three types. 14 SignalP 5.0 predicts the type of signal peptide using a deep convolutional and recurrent neural network architecture. 15 The cleavage probability was also determined with SignalP 5.0.

Results and Discussion
In silico prediction of signal peptide and determination of c, h, and n regions
SignalP 4.1 was used to predict the most suitable signal peptide for somatropin, one that enables its secretion into the periplasmic space of E. coli. SignalP 4.1 identifies a signal peptide based on a discriminating score, the D-score. The output is given in Table 2, which lists five scores (D, C, S, Y, and S-mean) together with the cleavage sites and the c, h, and n regions of the signal peptides.
Thirty-six signal peptides were excluded from further analysis because their D-scores were below the cut-off value of 0.570, indicating that they are not efficient for the secretion of somatropin.
Among the 90 signal peptides analyzed, four (pelB, flgI, nmpC, and gfcA) showed the highest D-scores of 0.910, 0.907, 0.902, and 0.902, respectively. Moreover, pelB and NPPC had the highest D-scores among the prokaryotic and eukaryotic signal peptides, respectively. The lowest scores belonged to HBP and LEAP2 (0.175 and 0.208) among the prokaryotic and eukaryotic signal peptides, respectively.
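The D-score screening step amounts to a threshold filter followed by a descending sort. A minimal sketch, assuming the scores have been transcribed from the SignalP 4.1 output into a dictionary (only the values quoted in the text are used here, and the function name `filter_by_d_score` is hypothetical):

```python
D_SCORE_CUTOFF = 0.570  # cut-off value stated in the text

# D-scores quoted in the text; the complete set is reported in Table 2.
d_scores = {
    "pelB": 0.910,
    "flgI": 0.907,
    "nmpC": 0.902,
    "gfcA": 0.902,
    "HBP": 0.175,
    "LEAP2": 0.208,
}


def filter_by_d_score(scores, cutoff=D_SCORE_CUTOFF):
    """Keep signal peptides at or above the cut-off, highest D-score first."""
    kept = {name: score for name, score in scores.items() if score >= cutoff}
    return sorted(kept.items(), key=lambda item: item[1], reverse=True)


ranked = filter_by_d_score(d_scores)
for name, score in ranked:
    print(f"{name}\t{score:.3f}")
```

Signal peptides scoring below the cut-off (here HBP and LEAP2) are dropped before any further analysis.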

Physicochemical features of signal peptides
Several physicochemical features of the 55 remaining signal peptides, including theoretical pI, length, molecular weight, net positive charge, grand average of hydropathicity (GRAVY), instability index, and aliphatic index, were evaluated with the ProtParam server (Table 3). The length of the signal peptides ranged from 18 to 28 residues. The highest molecular weights were those of ynfB, bcsB, lptA, and efeO (2948.71, 2853.53, 2849.47, and 2845.33 daltons, respectively).

Prediction of protein solubility
The results of somatropin solubility in fusion with

Prediction of the protein localization
Sub-cellular localization analysis with the ProtCompB server indicated that the final localization sites were the outer membrane, inner membrane, and periplasmic space for 13, 15, and 18 signal peptides, respectively. Furthermore, analysis of the final localization of somatropin with the signal peptides faeG, FimF41a, ompA, papK, prsK, lamB, nmpC, bcsB, and gfcA revealed that somatropin could be secreted by these signal peptides (Table 4).

Prediction of cleavage probability and the type of signal peptides
The remaining 55 signal peptides were examined for their secretory pathway(s) using SignalP 5.0. The results showed that, except for efeO (Tat pathway) and ampC (Sec/SPII), all of these signal peptides were specific for the Sec/SPI pathway (Table 4). The cleavage probability of each signal peptide is given in Table 2.

Selection of appropriate signal peptide
First, the signal peptides with final localization in the periplasmic space were selected and sorted according to the aliphatic index. Then, the stability and solubility of the target protein in connection with the selected signal peptides were examined. The signal peptides with which somatropin remained stable and soluble were selected as the appropriate signal peptides (Table 5). E. coli is an economical and straightforward host for the expression of recombinant proteins. 16 However, overexpression of recombinant proteins in the intracellular space of E. coli is usually associated with the formation of insoluble aggregates and inclusion bodies. To fold correctly, proteins should be kept out of the reducing environment of the cytoplasm. Hence, secretory expression has several advantages over cytosolic systems for the production of recombinant proteins.
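The selection procedure described at the start of this section (keep periplasmic candidates, sort by aliphatic index, then keep those that are stable and soluble) can be sketched as a small filter pipeline. Everything in the example is an assumption made for illustration: the record fields, the descending sort order, the use of the ProtParam convention that an instability index below 40 indicates a stable protein, and the toy candidates `sp_A` to `sp_D`, whose values are invented and do not come from the tables of this study.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    localization: str         # e.g. "periplasmic", "inner membrane"
    aliphatic_index: float
    instability_index: float  # of the fusion protein
    soluble: bool              # solubility call for the fusion protein


STABILITY_THRESHOLD = 40.0  # instability index below 40 is read as stable


def select_candidates(candidates):
    """Periplasmic candidates, ranked by aliphatic index, stable and soluble."""
    periplasmic = [c for c in candidates if c.localization == "periplasmic"]
    ranked = sorted(periplasmic, key=lambda c: c.aliphatic_index, reverse=True)
    return [c for c in ranked
            if c.instability_index < STABILITY_THRESHOLD and c.soluble]


# Invented toy records, for illustration only.
toy = [
    Candidate("sp_A", "periplasmic", 120.0, 35.0, True),
    Candidate("sp_B", "inner membrane", 150.0, 30.0, True),
    Candidate("sp_C", "periplasmic", 140.0, 38.0, True),
    Candidate("sp_D", "periplasmic", 160.0, 45.0, True),
]
selected = select_candidates(toy)
print([c.name for c in selected])
```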
Secretion of the target protein requires transport across the cytoplasmic membrane. In bacteria, Sec, SRP, and TAT are the three major protein secretion pathways that carry proteins across the plasma membrane. These protein transport systems depend on the presence of suitable signal peptides on the proteins. Signal peptides are short amino-terminal peptides that affect the biosynthesis, folding, and stability of the corresponding target proteins. 17 Although signal peptides differ in their sequences, they share conserved physicochemical properties, including aliphatic index, molecular weight, instability index, GRAVY, net positive charge, and theoretical pI. A signal peptide has three important regions: a positively charged amino-terminal region (n-region), a hydrophobic central region (h-region), and a polar carboxyl-terminal region (c-region) that contains the cleavage site (a conserved A-X-A motif). It has been demonstrated that the n-region has an essential role in the initial phase of protein secretion across membranes. 18 The n-region is also responsible for the net positive charge of the signal peptide, and the presence of basic residues in this region may be indispensable for an efficient signal peptide. 19 In addition to the charge of the n-region, the c-region has a strong effect on membrane transport by both the Tat and Sec pathways. The third region that can affect secretion output is the hydrophobic, helical h-region: its length and hydrophobic density increase the overall hydrophobicity and facilitate protein secretion. 19,20 In the present study, the physicochemical features of the 90 signal peptides were analyzed for secretory expression of somatropin in E. coli.
As shown in Table 3, flgI, OmpC, NPPC, mepA, and cysP showed the highest hydrophobicity levels (185.50, 171.90, 165.65, 164.74, and 164.00, respectively) among the studied signal peptides, whereas bcsB, DsbC, zraP, ansB, and araF showed the lowest (58.80, 78.50, 79.23, 93.64, and 93.91, respectively). Previous studies reported that OmpC has the highest aliphatic index, which is in agreement with our results. 21 Analysis of the secretory pathway revealed that all 55 signal peptides (except efeO) are specific for the Sec pathway, with reliability scores above 0.9 (Table 4). Our findings are therefore consistent with some previous reports. 9,22 Polypeptides of the Sec exportome carry cleavable, Sec-specific, N-terminal signal peptides that direct their translocation across the inner membrane (IM) in an unfolded state. 23,24 There are two methods for selecting a signal peptide for a given protein: the experimental, trial-and-error method and in silico analysis. The advantages of using a bioinformatics program before starting an experimental study are increased precision and validity and reduced experimental costs.
In this study, online bioinformatics tools were used to find suitable signal peptides for periplasmic expression of recombinant somatropin in E. coli. Ninety signal peptides, 17 eukaryotic and 73 prokaryotic, were evaluated. The D-score was used to identify appropriate signal peptides and to sort them in the first step. According to the D-scores (Table 2), 55 of the 90 selected signal peptides were identified as signal peptides for somatropin. Data were sorted by, in order of priority, D-score, final localization, h-region length, aliphatic index, GRAVY, and solubility (Table 5). According to this sorting, pelB, flgI, nmpC, GfcA, OmpW, PpiA, and OmpC showed the highest D-scores. In other bioinformatics studies, pelB and OmpC likewise showed the highest D-scores. 21 The analysis revealed that somatropin in connection with 34 signal peptides was stable and directed toward the Sec pathway, that 9 signal peptides mediated secretion, and that 15 signal peptides translocated somatropin into the periplasmic space. Zamani et al analyzed the secretion of somatropin with the L-asparaginase II signal sequence and reported that successful secretion of somatropin was not achieved with this signal sequence. 22 Expression of somatropin with the NPR, STII, and DsbA signal peptides, using RRI as the host cell, showed that DsbA was the most effective signal peptide for the somatropin gene, with an 80% higher expression level than the reference vector. 23 Previous studies 25 demonstrated high secretion of somatropin with the phoA signal peptide, but in our analysis phoA was not a suitable candidate because of its lower D-score (0.688) and its final localization in the inner membrane.
This study evaluated 90 different signal peptides to find the most suitable signal peptide for secreting recombinant somatropin in E. coli. The results showed that ynfB, sfaS, lolA, glnH, and malE have all the features needed to be selected as suitable signal peptides for somatropin.
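Two of the sequence features mentioned above, the A-X-A motif in front of the cleavage site and the basic residues of the n-region, are easy to check directly on a signal peptide sequence. The sketch below is a rough illustration under stated assumptions: it treats the last residue of the signal peptide as position -1, approximates the n-region as the first five residues (the actual boundary is predicted per sequence by tools such as SignalP 3.0), and uses hypothetical function names and assumed example sequences.

```python
def has_axa_motif(signal_peptide):
    """True if positions -3 and -1 before the cleavage site are alanine."""
    return (len(signal_peptide) >= 3
            and signal_peptide[-3] == "A"
            and signal_peptide[-1] == "A")


def basic_residues_in_n_region(signal_peptide, n_region_length=5):
    """Count Lys/Arg in the first residues (assumed n-region length)."""
    n_region = signal_peptide[:n_region_length]
    return sum(n_region.count(aa) for aa in "KR")


# Example leader sequences (assumed for illustration).
examples = {
    "pelB": "MKYLLPTAAAGLLLLAAQPAMA",
    "ompA": "MKKTAIAIAVALAGFATVAQA",
}
for name, seq in examples.items():
    print(name, has_axa_motif(seq), basic_residues_in_n_region(seq))
```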

Conclusion
In this research, various signal peptides were evaluated for the periplasmic expression of somatropin in E. coli. The selection was based on a combination of hydrophobicity, D-score, solubility, stability, and final localization.
The results indicated that specific signal peptides, including ynfB, sfaS, lolA, glnH, and malE, have the highest scores and could be used for soluble periplasmic expression of somatropin in E. coli. However, these results should be verified by experimental studies.