Advanced pharmaceutical bulletin. 13(2):339-349. doi: 10.34172/apb.2023.037

Research Article

Optimized Signal Peptide for Secretory Expression of Human Recombinant Somatropin in E. coli

Zeynab Ahmadi1, Safar Farajnia2,3,*, Davoud Farajzadeh1,4,*, Naser Pouladi1, Neda Pourvatan5, Mohammad Karbalaeimahdi3, Fahime Shayegh2, Maryam Arya3
1Department of Biology, Faculty of Basic Sciences, Azarbaijan Shahid Madani University, Tabriz, Iran.
2Drug Applied Research Center, Tabriz University of Medical Science, Tabriz, Iran.
3Biotechnology Research Center, Tabriz University of Medical Science, Tabriz, Iran.
4Department of Molecular Biology and Cancer Research, Azarbaijan Shahid Madani University, Tabriz, Iran.
5Immunology Research Center, Tabriz University of Medical Science, Tabriz, Iran.
*Corresponding Authors: Safar Farajnia, Email: farajnias@tbzmed.ac.ir and Davoud Farajzadeh, Email: farajzadeh@azaruniv.ac.ir

Abstract

Purpose: Human somatropin is a single-chain polypeptide with a pivotal role in various biological processes. Although Escherichia coli is considered a preferred host for the production of human somatropin, high-level expression of this protein in E. coli results in its accumulation as inclusion bodies. Periplasmic expression using signal peptides can be used to avoid inclusion body formation; however, the efficiency of signal peptides in periplasmic transport varies and is often protein-specific. The present study aimed to use in silico analysis to identify an appropriate signal peptide for the periplasmic expression of human somatropin in E. coli.

Methods: A library of 90 prokaryotic and eukaryotic signal peptides was collected from the Signal Peptide Database, and the characteristics and efficiency of each signal peptide in connection with the target protein were analyzed using different software. The secretory pathway and the cleavage position were predicted with the SignalP 5.0 server. Physicochemical properties, including molecular weight, instability index, grand average of hydropathicity (GRAVY), and aliphatic index, were investigated with ProtParam.

Results: Among all the signal peptides studied, five (ynfB, sfaS, lolA, glnH, and malE, in that order) displayed the highest scores for periplasmic expression of human somatropin in E. coli.

Conclusion: The results indicate that in silico analysis can be used to identify suitable signal peptides for the periplasmic expression of proteins. Further laboratory studies are needed to evaluate the accuracy of these in silico predictions.

Keywords: Human somatropin, Signal peptide, E. coli, Secretory expression

Copyright

©2023 The Authors.
This is an Open Access article distributed under the terms of the Creative Commons Attribution (CC BY), which permits unrestricted use, distribution, and reproduction in any medium, as long as the original authors and source are cited. No permission is required from the authors or the publishers.


Introduction

Human somatropin is a non-glycosylated single-chain polypeptide comprising 191 amino acids, with a molecular mass of 22.1 kDa.1 Somatropin, a member of the somatotropin/prolactin family, plays a significant role in growth control by stimulating various tissues, mainly the liver, to secrete insulin-like growth factor 1 (IGF-1). In addition, it is responsible for the differentiation and proliferation of myoblasts, amino acid uptake, and protein synthesis in muscles and other tissues.2

Advantages such as easy genetic manipulation, low-cost media, and short culture times have led to the use of Escherichia coli as the most suitable expression system for the production of many recombinant proteins.3 However, high-level expression of recombinant proteins in E. coli often gives rise to aggregated protein molecules known as inclusion bodies.4 Recovering bioactive protein from inclusion bodies is challenging, because it requires isolation from the cells, unfolding, refolding, and purification. One strategy used to overcome this problem is secretory expression, in which the protein is targeted to the periplasmic space by an N-terminal signal peptide.5

Sec, SRP, and TAT are the major protein secretion pathways in prokaryotes, through which proteins are directed into the periplasm or the extracellular space according to their signal peptides.6 Selecting an appropriate signal peptide is therefore an essential step in the secretory expression of recombinant proteins.7 Several studies have shown that signal peptide function is protein-specific and that no single signal peptide is ideal for the secretory expression of all proteins.8 The conventional way to select a signal peptide for a given protein is trial and error, which is labor-intensive and time-consuming. Recently, various bioinformatics programs have been developed that can be used to analyze the efficiency of different signal peptides, including SignalP 4.1, ProtParam, SOLpro, ProtComp B, and SignalP 5.0. Using such programs before starting an experimental study can reduce costs and increase the accuracy and validity of experimental research.9

Secretory expression of recombinant proteins, particularly pharmaceutical proteins, in E. coli has many advantages. Compared with cytosolic production, targeting a recombinant protein to the periplasmic space or the extracellular medium reduces costs and facilitates downstream processing.10

The purpose of the present study was to analyze, in silico, various signal peptides for the secretory expression of somatropin using different bioinformatics programs.


Materials and Methods

Signal peptide sequences

In this research, sequences of 90 different signal peptides were collected from the Signal Sequence database at http://www.signalpeptide.de/ (Table 1) and used for further analyses.

Table 1. List of the signal peptides evaluated in this study
| Full name | Signal peptide | Length (aa) | Source | Accession number | Amino acid sequence |
|---|---|---|---|---|---|
| Periplasmic appA protein | appA | 22 | Escherichia coli (strain K12) | P07102 | MKAILIPFLSLLIPLTPQSAFA |
| Cytochrome c-type biogenesis protein | ccmH | 18 | Escherichia coli (strain K12) | P0ABM9 | MRFLLGVLMLMISGSALA |
| Protein cexE | cexE | 19 | Escherichia coli | A2TJI4 | MKKYILGVILAMGSLSAIA |
| Thiosulfate-binding protein | cysP | 25 | Escherichia coli (strain K12) | P16700 | MAVNLLKKNSLALVASLLLAGHVQA |
| Dr hemagglutinin structural subunit | draA | 21 | Escherichia coli | P24093 | MKKLAIMAAASMVFAVSSAHA |
| Thiol:disulfide interchange protein dsbD | dsbD | 19 | Escherichia coli (strain K12) | P36655 | MAQRIFTLILLLCSTSVFA |
| Thiol:disulfide interchange protein dsbG | dsbG | 17 | Escherichia coli (strain K12) | P77202 | MLKKILLLALLPAIAFA |
| K88 fimbrial protein AD | faeG | 21 | Escherichia coli | P14191 | MKKTLIALAIAASAASGMAHA |
| Iron(III) dicitrate-binding periplasmic protein | fecB | 21 | Escherichia coli (strain K12) | P15028 | MLAFIRFLFAGLLLVISHAFA |
| F107 fimbrial protein | fedA | 21 | Escherichia coli | P25394 | MKRLVFISFVALSMTAGSAMA |
| F41 fimbrial protein | FimF41a | 22 | Escherichia coli | P11900 | MKKTLIALAVAASAAVSGSVMA |
| Flagellar P-ring protein | flgI | 20 | Escherichia coli O1:K1 / APEC | A1A9X5 | MVIKFLSALILLLVTTAAQA |
| Protein transport protein hofQ | hofQ | 18 | Escherichia coli (strain K12) | P34749 | MKQWIAALLLMLIPGVQA |
| Outer-membrane lipoprotein carrier protein | lolA | 21 | Escherichia coli (strain K12) | P61316 | MKKIAITCALLSSLVASSVWA |
| Lipopolysaccharide export system protein lptA | lptA | 27 | Escherichia coli (strain K12) | P0ADV1 | MKFKTNKLSLNLVLASSLLAASIPAFA |
| Maltose-binding periplasmic protein | malE | 26 | Escherichia coli (strain K12) | P0AEX9 | MKIKTGARILALSALTTMMFSASALA |
| Penicillin-insensitive murein endopeptidase | mepA | 19 | Escherichia coli O157:H7 | Q8XCQ5 | MNKTAIALLALLASSVSLA |
| Nickel-binding periplasmic protein | nikA | 22 | Escherichia coli (strain K12) | P33590 | MLSTLRRTLFALLACASFIVHA |
| Cytochrome c-552 | nrfA | 26 | Escherichia coli (strain K12) | P0ABK9 | MTRIKINARRIFSLLIPFFFFTSVHA |
| Outer membrane protein A | ompA | 21 | Escherichia coli (strain K12) | P0A910 | MKKTAIAIAVALAGFATVAQA |
| Outer membrane protease ompP | ompP | 23 | Escherichia coli (strain K12) | P34210 | MQTKLLAIMLAAPVVFSSQEASA |
| Outer membrane protein W | ompW | 21 | Escherichia coli (strain K12) | P0A915 | MKKLTVAALAVTTLLSGSAFA |
| Fimbrial adapter papK | papK | 21 | Escherichia coli | P62532 | MIKSTGALLLFAALSAGQAIA |
| D-alanyl-D-alanine endopeptidase | pbpG | 25 | Escherichia coli (strain K12) | P0AFI5 | MPKFRVSLFSLALMLAVPFAPQAVA |
| Pectate lyase B | PelB | 22 | Erwinia chrysanthemi | P04959 | MKYLLPTAAAGLLLLAAQPAMA |
| Alkaline phosphatase | phoA | 21 | Escherichia coli (strain K12) | P00634 | MKQSTIALALLPLLFTPVTKA |
| Outer membrane pore protein E | phoE | 21 | Escherichia coli (strain K12) | P02932 | MKKSTLALVVMGIVASASVQA |
| Protein prsK | prsK | 21 | Escherichia coli | P42191 | MIKSTGALLLFAALSAGQAMA |
| Phage shock protein E | pspE | 19 | Escherichia coli (strain K12) | P23857 | MFKKGLLALALVFSLPVFA |
| Protease 3 | ptrA | 23 | Escherichia coli (strain K12) | P05458 | MPRSTWFKALLLLVALWAPLSQA |
| S-fimbrial adhesin protein | sfaS | 22 | Escherichia coli O6:K15:H31 | P13430 | MKLKAIILATGLINCIAFSAQA |
| Taurine-binding periplasmic protein | tauA | 22 | Escherichia coli (strain K12) | Q47537 | MAISSRNTLLAALAFIAFQAQA |
| Thiamine-binding periplasmic protein | thiB | 18 | Escherichia coli (strain K12) | P31550 | MLKKCLPLLLLCTAPVFA |
| Periplasmic protein torT | torT | 18 | Escherichia coli (strain K12) | P38683 | MRVLLFLLLSLFMLPAFS |
| Trimethylamine-N-oxide reductase 1 | TorA | 39 | Escherichia coli (strain K12) | P33225 | MNNNDLFQASRRRFLAQLGGLTVAGMLGPSLLTPRRATA |
| sn-glycerol-3-phosphate-binding periplasmic protein ugpB | ugpB | 23 | Escherichia coli (strain K12) | P0AG80 | MKPLHYTASALALGLALMGNAQA |
| D-xylose-binding periplasmic protein | xylF | 23 | Escherichia coli (strain K12) | P37387 | MKIKNILLTLCTSLLLTNVAAHA |
| Uncharacterized protein yfeK | yfeK | 19 | Escherichia coli (strain K12) | Q47702 | MKKIICLVITLLMTLPVYA |
| UPF0379 protein yhcN | yhcN | 22 | Escherichia coli (strain K12) | P64614 | MKIKTTVAALSVLSVLSFGAFA |
| Uncharacterized protein yncJ | yncJ | 22 | Escherichia coli (strain K12) | P64459 | MFTKALSVVLLTCALFSGQLMA |
| UPF0482 protein ynfB | ynfB | 28 | Escherichia coli (strain K12) | P76170 | MKITLSKRIGLLAILLPCALALSTTVHA |
| Zinc resistance-associated protein | zraP | 26 | Escherichia coli (strain K12) | P0AAA9 | MKRNTKIALVMMALSAMAMGSTSAFA |
| Beta-lactamase | ampC | 19 | Escherichia coli (strain K12) | P00811 | MFKTTLCALLITASCSTFA |
| Heat-labile enterotoxin B chain | eltB | 21 | Escherichia coli | P13811 | MNKVKFYVLFTALLSSLCAHG |
| Type-1 fimbrial protein, C chain | pilC | 23 | Escherichia coli | P62605 | MKLKFISMAVFSALTLGVATNAS |
| Copper resistance protein B | pcoB | 23 | Escherichia coli | Q47453 | MKRNLKAIPVLVAGLFTSQLSIA |
| Serine protease eatA | eatA | 56 | Escherichia coli | Q84GK0 | MNKVFSLKYSFLAKGFIAVSELARRVSVKGKLKSASSIIISPITIAIVSYAPPSLA |
| Hemoglobin-binding protease hbp | HBP | 52 | Escherichia coli | O88093 | MNRIYSLRYSAVARGFIAVSEFARKVHKSVRRLCFPVLLLIPVLFSAGSLA |
| Thiol:disulfide interchange protein dsbA | DsbA | 19 | Escherichia coli (strain K12) | P0AEG4 | MKKIWLALAGLVLAFSASA |
| Human growth hormone | Hgh | 26 | Homo sapiens | P01241 | MATGSRTSLLLAFGLLCLPWLQEGSA |
| Outer membrane protein C | OmpC | 21 | Escherichia coli (strain K12) | P06996 | MKVKVLSLLVPALLVAGAANA |
| Heat-stable enterotoxin II | STII | 23 | Escherichia coli | P22542 | MKKNIAFLLASMFVFSIATNAYA |
| L-asparaginase 2 | ansB | 22 | Escherichia coli (strain K12) | P00805 | MEFFKKTALAALVMGFSGAALA |
| Chaperone protein sfmC | sfmC | 23 | Escherichia coli (strain K12) | P77249 | MMTKIKLLMLIIFYLIISASAHA |
| Outer membrane protein F | ompf | 22 | Escherichia coli (strain K12) | P02931 | MMKRNILAVIVPALLVAGTANA |
| Protease 7 | ompt | 20 | Escherichia coli (strain K12) | P09169 | MRAKLLGIVLTTPIAISSFA |
| Major outer membrane lipoprotein | LPP | 20 | Escherichia coli (strain K12) | P69776 | MKATKLVLGAVILGSTLLAG |
| Maltoporin | lamB | 25 | Escherichia coli (strain K12) | P02943 | MMITLRKLPLAVAVAAGVMSAQAMA |
| Beta-lactamase TEM | bla | 23 | Escherichia coli | P62593 | MSIQHFRVALIPFFAAFCLPVFA |
| D-galactose-binding periplasmic protein | mglB | 23 | Escherichia coli (strain K12) | P0AEE5 | MNKKVLTLSAVMASMLFGAAAHA |
| Heat-stable enterotoxin ST-IA/ST-P | Sta1 | 19 | Escherichia coli | P01559 | MKKLMLAIFISVLSFPSFS |
| L-arabinose-binding periplasmic protein | araF | 23 | Escherichia coli (strain K12) | P02924 | MHKFTKALAAIGLAAVMSQSAMA |
| Putative outer membrane porin protein | nmpc | 23 | Escherichia coli (strain K12) | P21420 | MKKLTVAISAVAASVLMAMSAQA |
| Peptidyl-prolyl cis-trans isomerase A | ppiA | 24 | Escherichia coli (strain K12) | P0AFL3 | MFKSTLAAMAAVFALSALSPAAMA |
| UPF0412 protein YaaI | yaaI | 23 | Escherichia coli (strain K12) | P28696 | MKSVFTISASLAISLMLCCTAQA |
| Uncharacterized protein YhcF | yhcF | 20 | Escherichia coli (strain K12) | P45422 | MNNVKLLIAGSAFFAMSAQA |
| Uncharacterized fimbrial-like protein YfcQ | yfcQ | 18 | Escherichia coli (strain K12) | P76500 | MRKTFLTLLCVSSAIAHA |
| Iron uptake system component EfeO | EfeO | 26 | Escherichia coli (strain K12) | P0AB24 | MTINFRRNALQLSVAALFSSAFMANA |
| Glutamine-binding periplasmic protein | glnH | 22 | Escherichia coli (strain K12) | P0AEQ3 | MKSVLKVSLAALTLAFAVSSHA |
| Ribonuclease I | rna | 23 | Escherichia coli (strain K12) | P21338 | MKAFWRNAALLAVSLLPFSSANA |
| Disulfide interchange protein DsbC | DsbC | 20 | Escherichia coli (strain K12) | P0AEG6 | MKKGFMLFTLLAAFSGFAQA |
| D-ribose-binding periplasmic protein | rbsB | 25 | Escherichia coli (strain K12) | P02925 | MNMKKLATLVSAVALSATVSANAMA |
| Cyclic di-GMP-binding protein | bcsB | 25 | Escherichia coli (strain K12) | P37652 | MKRKLFWICAVAMGMSAFPSFMTQA |
| Threonine-rich inner membrane protein GfcA | gfcA | 21 | Escherichia coli (strain K12) | P75885 | MKHKLSAILMAFMLTTPAAFA |
| Salivary acidic proline-rich phosphoprotein | PRH1 | 22 | Homo sapiens | P81277 | MKVLRAWLLCLLMLGLALRGAA |
| Liver-expressed antimicrobial peptide 2 | LEAP2 | 22 | Homo sapiens | Q969E1 | MWHLKLCAVLMIFLLLLGQIDG |
| Secreted protein C10orf99 | C10orf99 | 24 | Homo sapiens | Q6UWK7 | MRLLVLSSLLCILLLCFSIFSTEG |
| Prolactin-releasing peptide | PRLH | 22 | Homo sapiens | P81277 | MKVLRAWLLCLLMLGLALRGAA |
| Heparan sulfate proteoglycan core protein | HSPG2 | 21 | Homo sapiens | P98160 | MGWRAAGALLLALLLHGRLLA |
| Transforming growth factor beta-2 | TGFB2 | 19 | Homo sapiens | P61812 | MHYCVLSAFLILHLVTVAL |
| Serine protease inhibitor Kazal-type 4 | SPINK4 | 26 | Homo sapiens | O60575 | MAVRQWVIALALAALLVVDREVPVAA |
| C-type natriuretic peptide | NPPC | 23 | Homo sapiens | P23582 | MHLSQLLACALLLTLLSLRPSEA |
| Tuberoinfundibular peptide of 39 residues | PTH2 | 30 | Homo sapiens | Q96A98 | METRQVSRSPRVRLLLLLLLLLVVPWGVRT |
| Pro-neuropeptide Y | NPY | 28 | Homo sapiens | P01303 | MLGNKRLGLSGLTLALSLLVCLGALAEA |
| Interleukin-8 | CXCL8 | 20 | Homo sapiens | P10145 | MTSKLAVALLAAFLISAALC |
| Alpha-1-antitrypsin | SERPINA1 | 24 | Homo sapiens | P01009 | MPSSVSWGILLLAGLCCLVPVSLA |
| Gastrin-releasing peptide | GRP | 23 | Homo sapiens | P07492 | MRGSELPLVLLALVLCLAPRGRA |
| Plasminogen | PLG | 19 | Homo sapiens | P00747 | MEHKEVVLLLLLFLKSGQG |
| Transforming growth factor beta-3 | TGFB3 | 20 | Homo sapiens | P10600 | MKMHLQRALVVLALLNFATV |
| Guanylate cyclase activator 2B | GUCA2B | 26 | Homo sapiens | Q16661 | MGCRAASGLLPGVAVVLLLLLQSTQS |

In silico prediction of signal peptides and of their n-, h-, and c-regions

SignalP version 4.1 (http://www.cbs.dtu.dk/services/SignalP-4.1/), which combines several artificial neural networks, was used to predict signal peptides and their cleavage sites.11 SignalP version 3.0 (http://www.cbs.dtu.dk/services/SignalP-3.0/) was used to predict the n-, h-, and c-regions of the signal peptides. For these analyses, each signal peptide was fused to the N-terminus of the somatropin sequence, and the fusion was analyzed by the program.
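The fusion constructs submitted to the prediction servers can be generated programmatically. The Python sketch below is illustrative and is not the authors' script: it assumes that each signal peptide is joined directly to the mature protein with no linker, and it uses a hypothetical `MATURE_SOMATROPIN` placeholder holding only the first residues of mature somatropin (residue 27 onward of UniProt P01241), which should be replaced by the full 191-residue sequence before use. The helper names `build_fusion` and `to_fasta` are our own.

```python
# Illustrative sketch (not the authors' script): build signal peptide + somatropin
# fusions and write them as FASTA records for batch submission to a web server.

# Hypothetical, truncated placeholder: only the first 10 residues of mature
# somatropin (UniProt P01241, residue 27 onward). Replace with the full sequence.
MATURE_SOMATROPIN = "FPTIPLSRLF"

# A few signal peptide sequences copied from Table 1.
SIGNAL_PEPTIDES = {
    "PelB": "MKYLLPTAAAGLLLLAAQPAMA",
    "malE": "MKIKTGARILALSALTTMMFSASALA",
    "ynfB": "MKITLSKRIGLLAILLPCALALSTTVHA",
}


def build_fusion(signal: str, mature: str) -> str:
    """Join the signal peptide directly to the mature protein (assumes no linker)."""
    return signal.upper() + mature.upper()


def to_fasta(signals: dict, mature: str, width: int = 60) -> str:
    """Return one FASTA record per fusion, wrapping sequences at `width` residues."""
    lines = []
    for name, signal in signals.items():
        seq = build_fusion(signal, mature)
        lines.append(f">{name}_somatropin")
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(to_fasta(SIGNAL_PEPTIDES, MATURE_SOMATROPIN), end="")
```

Wrapping at 60 residues per line is only a common FASTA convention; most web servers also accept unwrapped sequences.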

Analysis of physicochemical features of signal peptides

The ProtParam program was used to evaluate the physicochemical features of the signal peptides, including theoretical pI, amino acid composition, numbers of negatively and positively charged residues, grand average of hydropathicity (GRAVY), instability index, aliphatic index, and molecular weight.
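Two of these ProtParam indices follow simple published formulas. GRAVY is the mean Kyte-Doolittle hydropathy value over all residues, and the aliphatic index is x(Ala) + 2.9 x(Val) + 3.9 (x(Ile) + x(Leu)), where x denotes mole percent. The minimal Python sketch below reimplements only these two indices for illustration; it is not ProtParam itself. For the NPPC signal peptide it gives an aliphatic index of about 165.65 and a GRAVY of about 1.070, and for flgI about 185.50 and 1.935, in line with the values reported in Table 3.

```python
# Minimal re-implementation of two ProtParam indices (illustrative, not ProtParam).

# Kyte-Doolittle hydropathy scale.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(seq: str) -> float:
    """Grand average of hydropathicity: mean Kyte-Doolittle value per residue."""
    seq = seq.upper()
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


def aliphatic_index(seq: str) -> float:
    """x(Ala) + 2.9 * x(Val) + 3.9 * (x(Ile) + x(Leu)), with x in mole percent."""
    seq = seq.upper()

    def mole_pct(residues: str) -> float:
        return 100.0 * sum(seq.count(r) for r in residues) / len(seq)

    return mole_pct("A") + 2.9 * mole_pct("V") + 3.9 * mole_pct("IL")


if __name__ == "__main__":
    # Sequences copied from Table 1.
    examples = {"NPPC": "MHLSQLLACALLLTLLSLRPSEA", "flgI": "MVIKFLSALILLLVTTAAQA"}
    for name, sp in examples.items():
        print(f"{name}: GRAVY={gravy(sp):.3f}, aliphatic index={aliphatic_index(sp):.2f}")
```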

Analysis of protein solubility

SOLpro predicts the solubility of a protein upon expression in E. coli from features of its primary sequence. SOLpro (http://scratch.proteomics.ics.uci.edu/) was therefore used to predict the solubility of somatropin fused to each signal peptide in E. coli; the tool has a reported prediction accuracy of over 74%.

Prediction of protein localization

The ProtComp B server from Softberry, Inc. (http://www.softberry.com) was used to predict the subcellular destination of somatropin fused to each signal peptide. ProtComp B combines sequence homology with neural networks.12

Prediction of the type of signal peptides and cleavage probability

In prokaryotes, there are three types of signal peptides: Sec substrates cleaved by signal peptidase I (Sec/SPI) or signal peptidase II (Sec/SPII), and Tat substrates cleaved by signal peptidase I (Tat/SPI).13 The SignalP 5.0 server was used to discriminate among these three types.14 SignalP 5.0 predicts the signal peptide type using a deep convolutional and recurrent neural network architecture.15 The cleavage probability was also determined with SignalP 5.0.


Results and Discussion

In silico prediction of signal peptide and determination of c, h, and n regions

SignalP 4.1 was used to predict the most suitable signal peptides for somatropin, that is, those enabling its secretion into the periplasmic space of E. coli. SignalP 4.1 identifies a signal peptide on the basis of a discrimination score (D-score). The output is summarized in Table 2, which lists the C-, Y-, S-, S-mean, and D-scores, together with the predicted cleavage sites and the n-, h-, and c-regions of the signal peptides.

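Table 2. Signal peptide prediction scores, cleavage sites, and n-, h-, and c-region lengths
| Signal peptide | n-region | h-region | c-region | Cleavage site | Cleavage probability | C-score | Y-score | S-score | S-mean | D-score |
|---|---|---|---|---|---|---|---|---|---|---|
| appA | 4 | 12 | 7 | AFA | 0.9807 | 0.801 | 0.786 | 0.938 | 0.808 | 0.797 |
| ccmH | 3 | 9 | 7 | ALA | 0.9806 | 0.773 | 0.568 | 0.655 | 0.472 | 0.532 |
| cexE | 4 | 8 | 7 | AIA | 0.995 | 0.691 | 0.551 | 0.665 | 0.504 | 0.534 |
| cysP | 9 | 9 | 6 | VQA | 0.999 | 0.757 | 0.770 | 0.896 | 0.821 | 0.794 |
| draA | 4 | 10 | 7 | AHA | 0.990 | 0.717 | 0.807 | 0.971 | 0.921 | 0.860 |
| dsbD | 4 | 9 | 7 | VFA | 0.9705 | 0.829 | 0.605 | 0.649 | 0.503 | 0.567 |
| dsbG | 4 | 9 | 7 | AFA | 0.9021 | 0.417 | 0.447 | 0.712 | 0.536 | 0.480 |
| faeG | 5 | 10 | 7 | AHA | 0.9921 | 0.762 | 0.814 | 0.970 | 0.891 | 0.851 |
| fecB | 6 | 9 | 7 | AFA | 0.9354 | 0.601 | 0.424 | 0.514 | 0.355 | 0.398 |
| fedA | 5 | 9 | 8 | AMA | 0.9844 | 0.739 | 0.815 | 0.972 | 0.911 | 0.860 |
| FimF41a | 5 | 11 | 7 | VMA | 0.9827 | 0.873 | 0.869 | 0.978 | 0.896 | 0.882 |
| flgI | 5 | 9 | 7 | AQA | 0.9692 | 0.824 | 0.880 | 0.981 | 0.937 | 0.907 |
| hofQ | 4 | 8 | 7 | VQA | 0.9938 | 0.643 | 0.474 | 0.436 | 0.357 | 0.430 |
| lolA | 5 | 10 | 7 | VWA | 0.9948 | 0.715 | 0.675 | 0.874 | 0.724 | 0.693 |
| lptA | 11 | 9 | 7 | AFA | 0.9840 | 0.801 | 0.711 | 0.905 | 0.753 | 0.726 |
| malE | 8 | 10 | 9 | ALA | 0.9270 | 0.718 | 0.810 | 0.988 | 0.924 | 0.863 |
| mepA | 4 | 9 | 7 | SLA | 0.9500 | 0.790 | 0.726 | 0.860 | 0.717 | 0.722 |
| nikA | 7 | 9 | 7 | VHA | 0.9155 | 0.740 | 0.604 | 0.710 | 0.563 | 0.589 |
| nrfA | 10 | 10 | 7 | VHA | 0.9611 | 0.549 | 0.408 | 0.514 | 0.369 | 0.394 |
| ompA | 4 | 10 | 7 | AQA | 0.9814 | 0.800 | 0.841 | 0.968 | 0.891 | 0.865 |
| ompP | 6 | 10 | 6 | ASA | 0.8765 | 0.618 | 0.649 | 0.870 | 0.740 | 0.692 |
| ompW | 5 | 10 | 7 | AFA | 0.9924 | 0.808 | 0.863 | 0.966 | 0.923 | 0.891 |
| papK | 5 | 10 | 7 | AIA | 0.9415 | 0.721 | 0.642 | 0.837 | 0.659 | 0.648 |
| pbpG | 6 | 12 | 7 | AVA | 0.9542 | 0.681 | 0.753 | 0.985 | 0.890 | 0.817 |
| PelB | 6 | 10 | 6 | AMA | 0.9905 | 0.792 | 0.875 | 0.981 | 0.949 | 0.910 |
| phoA | 5 | 9 | 7 | TKA | 0.9648 | 0.496 | 0.613 | 0.845 | 0.722 | 0.688 |
| phoE | 5 | 10 | 6 | VQA | 0.9875 | 0.761 | 0.807 | 0.948 | 0.855 | 0.829 |
| prsK | 5 | 10 | 7 | AMA | 0.9805 | 0.837 | 0.854 | 0.950 | 0.881 | 0.867 |
| pspE | 4 | 9 | 7 | VFA | 0.9743 | 0.811 | 0.593 | 0.687 | 0.514 | 0.564 |
| ptrA | 8 | 9 | 7 | SQA | 0.9750 | 0.699 | 0.579 | 0.582 | 0.504 | 0.522 |
| sfaS | 5 | 11 | 7 | AQA | 0.9551 | 0.695 | 0.763 | 0.961 | 0.841 | 0.800 |
| tauA | 7 | 9 | 7 | AQA | 0.9441 | 0.832 | 0.820 | 0.947 | 0.834 | 0.827 |
| thiB | 4 | 8 | 6 | VFA | 0.9667 | 0.611 | 0.757 | 0.962 | 0.927 | 0.837 |
| TorT | 3 | 9 | 6 | AFS | 0.8362 | 0.435 | 0.413 | 0.593 | 0.442 | 0.424 |
| TorA | 18 | 15 | 7 | ATA | 0.9628 | 0.259 | 0.211 | 0.286 | 0.202 | 0.208 |
| ugpB | 7 | 10 | 7 | AQA | 0.9861 | 0.826 | 0.821 | 0.924 | 0.830 | 0.825 |
| xylF | 6 | 11 | 7 | AHA | 0.9446 | 0.726 | 0.806 | 0.973 | 0.903 | 0.851 |
| yfeK | 4 | 10 | 6 | VYA | 0.9878 | 0.711 | 0.490 | 0.571 | 0.398 | 0.456 |
| yhcN | 6 | 10 | 7 | AFA | 0.9780 | 0.714 | 0.596 | 0.793 | 0.602 | 0.598 |
| yncJ | 5 | 11 | 7 | LMA | 0.8738 | 0.798 | 0.851 | 0.962 | 0.904 | 0.876 |
| ynfB | 10 | 12 | 7 | VHA | 0.9723 | 0.819 | 0.623 | 0.789 | 0.590 | 0.611 |
| zraP | 7 | 12 | 8 | AFA | 0.9535 | 0.786 | 0.838 | 0.994 | 0.929 | 0.881 |
| ampC | 4 | 10 | 6 | TAS-CS | 0.6246 | 0.788 | 0.848 | 0.942 | 0.910 | 0.877 |
| eltB | 6 | 9 | 7 | AHG | 0.6339 | 0.647 | 0.747 | 0.954 | 0.874 | 0.807 |
| pilC | 5 | 11 | 7 | TNA-SF | 0.8309 | 0.171 | 0.392 | 0.973 | 0.909 | 0.635 |
| pcoB | 7 | 10 | 7 | SIA | 0.9063 | 0.369 | 0.378 | 0.585 | 0.449 | 0.404 |
| eatA | 37 | 13 | 7 | - | - | 0.230 | 0.166 | 0.329 | 0.286 | 0.210 |
| HBP | 34 | 12 | 7 | SLA | 0.6063 | 0.243 | 0.179 | 0.262 | 0.168 | 0.175 |
| DsbA | 4 | 10 | 6 | ASA | 0.9419 | 0.572 | 0.616 | 0.837 | 0.717 | 0.654 |
| Hgh | 7 | 12 | 6 | GSA | 0.8990 | 0.200 | 0.237 | 0.539 | 0.318 | 0.267 |
| OmpC | 5 | 10 | 7 | ANA | 0.9648 | 0.827 | 0.863 | 0.973 | 0.918 | 0.889 |
| STII | 5 | 12 | 7 | AYA | 0.9604 | 0.856 | 0.856 | 0.971 | 0.892 | 0.873 |
| ansB | 7 | 9 | 7 | ALA | 0.9587 | 0.838 | 0.644 | 0.707 | 0.555 | 0.611 |
| sfmC | 7 | 10 | 7 | AHA | 0.9601 | 0.806 | 0.595 | 0.576 | 0.439 | 0.537 |
| ompf | 6 | 10 | 7 | ANA | 0.981 | 0.839 | 0.862 | 0.946 | 0.902 | 0.880 |
| ompt | 5 | 9 | 7 | SFA | 0.9250 | 0.293 | 0.335 | 0.538 | 0.414 | 0.364 |
| LPP | 6 | 9 | 5 | LLA-GF | 0.4598 | 0.145 | 0.214 | 0.581 | 0.472 | 0.309 |
| lamB | 9 | 10 | 7 | AMA | 0.8549 | 0.785 | 0.819 | 0.981 | 0.894 | 0.854 |
| bla | 7 | 10 | 7 | VFA | 0.9203 | 0.624 | 0.413 | 0.465 | 0.334 | 0.384 |
| mglB | 5 | 12 | 7 | AHA | 0.9717 | 0.767 | 0.834 | 0.986 | 0.923 | 0.876 |
| Sta1 | 4 | 9 | 7 | SFS | 0.8744 | 0.492 | 0.664 | 0.939 | 0.888 | 0.769 |
| araF | 6 | 11 | 7 | AMA | 0.987 | 0.804 | 0.844 | 0.958 | 0.874 | 0.858 |
| nmpc | 5 | 12 | 7 | AQA | 0.9833 | 0.835 | 0.876 | 0.981 | 0.930 | 0.902 |
| ppiA | 5 | 13 | 7 | AMA | 0.9564 | 0.785 | 0.846 | 0.989 | 0.939 | 0.890 |
| yaaI | 6 | 11 | 7 | AQA | 0.7641 | 0.721 | 0.806 | 0.957 | 0.913 | 0.856 |
| yhcF | 6 | 8 | 7 | AQA | 0.9636 | 0.737 | 0.748 | 0.897 | 0.777 | 0.761 |
| yfcQ | 4 | 8 | 7 | AHA | 0.9790 | 0.712 | 0.783 | 0.932 | 0.854 | 0.816 |
| EfeO | 9 | 11 | 7 | ANA | 0.9450 | 0.585 | 0.705 | 0.973 | 0.875 | 0.785 |
| glnH | 6 | 10 | 7 | SHA | 0.9779 | 0.740 | 0.814 | 0.965 | 0.910 | 0.859 |
| rna | 7 | 10 | 7 | ANA | 0.9760 | 0.784 | 0.835 | 0.975 | 0.912 | 0.871 |
| DsbC | 4 | 10 | 7 | AQA | 0.9809 | 0.764 | 0.825 | 0.971 | 0.898 | 0.859 |
| rbsB | 6 | 12 | 8 | AMA | 0.6795 | 0.798 | 0.818 | 0.979 | 0.893 | 0.854 |
| bcsB | 6 | 11 | 9 | TQA | 0.8993 | 0.455 | 0.615 | 0.985 | 0.889 | 0.744 |
| gfcA | 5 | 10 | 7 | AFA | 0.9834 | 0.8441 | 0.882 | 0.985 | 0.925 | 0.902 |
| PRH1 | 6 | 10 | 7 | RGA | 0.542 | 0.195 | 0.324 | 0.657 | 0.553 | 0.409 |
| LEAP2 | 6 | 10 | 7 | LLG | 0.324 | 0.139 | 0.165 | 0.359 | 0.281 | 0.208 |
| C10orf99 | 4 | 12 | 7 | IFS | 0.593 | 0.255 | 0.298 | 0.499 | 0.384 | 0.329 |
| PRLH | 6 | 10 | 7 | RGA | 0.542 | 0.195 | 0.324 | 0.657 | 0.553 | 0.409 |
| HSPG2 | 5 | 10 | 7 | LLA | 0.986 | 0.330 | 0.269 | 0.356 | 0.246 | 0.260 |
| TGFB2 | 3 | 10 | 7 | PLS | 0.049 | 0.129 | 0.186 | 0.443 | 0.355 | 0.248 |
| SPINK4 | 6 | 12 | 8 | - | 0.051 | 0.143 | 0.190 | 0.453 | 0.322 | 0.239 |
| NPPC | 6 | 11 | 7 | SEA | 0.9791 | 0.398 | 0.566 | 0.889 | 0.804 | 0.678 |
| PTH2 | 15 | 9 | 7 | VRT | 0.5430 | 0.156 | 0.172 | 0.354 | 0.310 | 0.233 |
| NPY | 7 | 13 | 7 | AEA | 0.6235 | 0.578 | 0.465 | 0.504 | 0.413 | 0.446 |
| CXCL8 | 5 | 16 | 7 | ALC | 0.5500 | 0.343 | 0.420 | 0.816 | 0.603 | 0.488 |
| SERPINA1 | 7 | 11 | 7 | SLA | 0.8489 | 0.402 | 0.289 | 0.395 | 0.260 | 0.278 |
| GRP | 6 | 11 | 7 | GRA | 0.8903 | 0.268 | 0.242 | 0.381 | 0.244 | 0.243 |
| PLG | - | - | - | GQG | 0.4277 | 0.207 | 0.239 | 0.444 | 0.246 | 0.242 |
| TGFB3 | 7 | 10 | 10 | - | 0.147 | 0.146 | 0.224 | 0.610 | 0.519 | 0.333 |
| GUCA2B | 5 | 15 | 7 | TQS | 0.7021 | 0.320 | 0.249 | 0.369 | 0.270 | 0.257 |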

Thirty-five signal peptides were excluded from further analysis because their D-scores were below the cutoff value of 0.570, indicating that they are unlikely to be efficient for the secretion of somatropin.

Among the 90 signal peptides analyzed, pelB, flgI, nmpc, and gfcA showed the highest D-scores (0.910, 0.907, 0.902, and 0.902, respectively). Among the prokaryotic and eukaryotic signal peptides, the highest D-scores belonged to pelB and NPPC, respectively, and the lowest to HBP (0.175) and LEAP2 (0.208), respectively.
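The cutoff step amounts to a filter over the SignalP 4.1 output. In the illustrative sketch below, the rows are a small hand-picked subset of Table 2 given as (name, D-score, origin), the origin labels are our own annotation, and dsbD (0.567) is included to show a value just below the cutoff. The D-scores themselves come from SignalP and are not recomputed here.

```python
# Illustrative filter/rank over SignalP 4.1 D-scores (values copied from Table 2).
D_CUTOFF = 0.570  # cutoff applied in this study

# (signal peptide, D-score, origin); origin labels are our own annotation.
ROWS = [
    ("PelB", 0.910, "prokaryotic"),
    ("flgI", 0.907, "prokaryotic"),
    ("nmpc", 0.902, "prokaryotic"),
    ("gfcA", 0.902, "prokaryotic"),
    ("dsbD", 0.567, "prokaryotic"),
    ("HBP", 0.175, "prokaryotic"),
    ("NPPC", 0.678, "eukaryotic"),
    ("LEAP2", 0.208, "eukaryotic"),
]


def passes_cutoff(rows, cutoff=D_CUTOFF):
    """Keep rows whose D-score is not below the cutoff."""
    return [row for row in rows if row[1] >= cutoff]


def rank_by_d(rows):
    """Sort by D-score, highest first (ties keep their input order)."""
    return sorted(rows, key=lambda row: row[1], reverse=True)


def extremes_by_origin(rows):
    """Return {origin: (highest-scoring name, lowest-scoring name)}."""
    result = {}
    for origin in sorted({row[2] for row in rows}):
        subset = [row for row in rows if row[2] == origin]
        best = max(subset, key=lambda row: row[1])[0]
        worst = min(subset, key=lambda row: row[1])[0]
        result[origin] = (best, worst)
    return result


if __name__ == "__main__":
    kept = rank_by_d(passes_cutoff(ROWS))
    print([name for name, _, _ in kept])
    print(extremes_by_origin(ROWS))
```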

Physico-chemical features of signal peptides

The physicochemical features of the 55 remaining signal peptides, including length, theoretical pI, molecular weight, net positive charge, grand average of hydropathicity (GRAVY), instability index, and aliphatic index, were evaluated with the ProtParam server (Table 3). The signal peptides ranged from 18 to 28 residues in length. The highest molecular weights belonged to ynfB, bcsB, lptA, and efeO (2948.71, 2853.53, 2849.47, and 2845.33 Da, respectively).

Table 3. The physicochemical characteristics of the signal peptides that were analyzed in the study.
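| Signal peptide | Length (aa) | M.W. (Da) | pI | Net positive charge | GRAVY | Aliphatic index | Instability index (alone) | Instability index (with hGH)* | Stability* | Solubility |
|---|---|---|---|---|---|---|---|---|---|---|
| appA | 22 | 2384.99 | 8.5 | 0.9 | 1.405 | 155.45 | 53.16 | 42.9 | U | 0.782 |
| cysP | 25 | 2575.15 | 10 | 2.1 | 1.064 | 164.00 | 11.14 | 37.38 | S | 0.765 |
| draA | 21 | 2135.63 | 10 | 2.1 | 1.162 | 98.10 | 16.49 | 38.41 | S | 0.885 |
| faeG | 21 | 2027.47 | 10 | 2.1 | 1.005 | 112.38 | 11.36 | 37.90 | S | 0.883 |
| fedA | 21 | 2231.76 | 11 | 1.9 | 1.290 | 102.38 | 29.55 | 39.70 | S | 0.869 |
| FimF41a | 22 | 2090.57 | 10 | 1.9 | 1.355 | 124.55 | 15.15 | 38.17 | S | 0.863 |
| flgI | 20 | 2116.67 | 8.5 | 0.9 | 1.935 | 185.50 | 10.64 | 37.96 | S | 0.806 |
| lolA | 21 | 2192.70 | 9.3 | 0.9 | 1.324 | 139.52 | 16.67 | 38.43 | S | 0.764 |
| lptA | 27 | 2849.47 | 10.3 | 2.9 | 0.881 | 130.37 | 17.32 | 37.91 | S | 0.831 |
| malE | 26 | 2698.34 | 11.1 | 2.9 | 1.012 | 113.08 | 2.85 | 36.27 | S | 0.879 |
| mepA | 19 | 1887.31 | 8.5 | 0.9 | 1.479 | 164.74 | 32.07 | 40.03 | U | 0.833 |
| nikA | 22 | 2434.99 | 10.3 | 0.9 | 1.350 | 137.73 | 60.45 | 42.85 | U | 0.790 |
| ompA | 21 | 2046.50 | 10 | 1.9 | 1.295 | 121.43 | 9.52 | 37.72 | S | 0.857 |
| ompP | 23 | 2406.88 | 5.7 | 1.9 | 0.904 | 114.78 | 44.47 | 41.21 | U | 0.798 |
| ompW | 21 | 2093.55 | 10 | 1.9 | 1.210 | 125.71 | 1.44 | 36.92 | S | 0.824 |
| papK | 21 | 2047.48 | 8.5 | 1.9 | 1.390 | 140.00 | -2.60 | 36.52 | S | 0.849 |
| pbpG | 25 | 2705.36 | 11 | 1.9 | 1.228 | 117.20 | 57.99 | 42.81 | U | 0.800 |
| PelB | 22 | 2228.78 | 8.3 | 0.9 | 1.191 | 138.18 | 41.42 | 40.88 | U | 0.802 |
| phoA | 21 | 2256.82 | 10 | 0.9 | 0.971 | 139.52 | 56.02 | 42.33 | U | 0.769 |
| phoE | 21 | 2104.59 | 10 | 0.9 | 1.195 | 130.00 | 1.44 | 36.92 | S | 0.834 |
| prsK | 21 | 2065.52 | 8.5 | 0.9 | 1.267 | 121.43 | 3.27 | 37.10 | S | 0.859 |
| sfaS | 22 | 2290.85 | 9.3 | 0.9 | 1.314 | 146.82 | 5.41 | 37.16 | S | 0.844 |
| tauA | 22 | 2308.72 | 9.5 | 0.9 | 1.055 | 120.45 | 34.41 | 40.16 | U | 0.824 |
| thiB | 18 | 1974.60 | 8.8 | 0.9 | 1.589 | 157.22 | 65.64 | 42.96 | U | 0.608 |
| ugpB | 23 | 2342.80 | 8.3 | 0.9 | 0.622 | 110.87 | 18.01 | 38.37 | S | 0.844 |
| xylF | 23 | 2482.08 | 9.3 | 0.9 | 1.083 | 161.30 | 33.61 | 40.04 | U | 0.781 |
| yhcN | 22 | 2254.76 | 10 | 0.9 | 1.418 | 128.64 | -2.03 | 36.39 | S | 0.764 |
| yncJ | 22 | 2344.91 | 7.9 | 0.9 | 1.541 | 128.64 | 15.15 | 38.17 | S | 0.795 |
| ynfB | 28 | 2948.71 | 10 | 0.9 | 1.239 | 163.93 | 29.32 | 39.35 | S | 0.774 |
| zraP | 26 | 2733.37 | 11.1 | 0.9 | 0.746 | 79.23 | 28.75 | 39.37 | S | 0.834 |
| ampC | 19 | 2022.46 | 7.8 | 0.9 | 1.342 | 97.89 | 25.22 | 39.41 | U | 0.783 |
| eltB | 21 | 2342.84 | 9.1 | 0.9 | 0.890 | 111.43 | 31.10 | 39.86 | S | 0.803 |
| pilC | 23 | 2400.92 | 10 | 0.9 | 1.104 | 110.43 | 1.01 | 36.54 | S | 0.794 |
| DsbA | 19 | 1990.48 | 10 | 0.9 | 1.416 | 144.21 | 11.50 | 38.17 | S | 0.842 |
| OmpC | 21 | 2078.63 | 10 | 0.9 | 1.552 | 171.90 | 14.37 | 38.20 | S | 0.797 |
| STII | 23 | 2552.09 | 9.7 | 1.9 | 1.026 | 102.17 | 32.43 | 39.92 | S | 0.861 |
| ansB | 22 | 2274.76 | 8.3 | 1.9 | 1.136 | 93.64 | -1.15 | 36.48 | S | 0.846 |
| ompF | 22 | 2266.83 | 11 | 1.9 | 1.259 | 150.91 | 67.18 | 43.54 | U | 0.876 |
| sta1 | 19 | 2159.72 | 10 | 1.9 | 1.368 | 123.16 | 25.28 | 39.41 | S | 0.841 |
| lamB | 25 | 2545.22 | 11 | 1.9 | 1.332 | 125.20 | 42.97 | 41.07 | U | 0.889 |
| mglB | 23 | 2362.89 | 10 | 1.9 | 0.952 | 102.17 | 14.15 | 37.95 | S | 0.865 |
| araF | 23 | 2348.87 | 10 | 1.9 | 0.878 | 93.91 | 96.71 | 46.83 | U | 0.876 |
| nmpc | 23 | 2292.84 | 10 | 1.9 | 1.243 | 119.13 | 30.34 | 39.69 | S | 0.883 |
| ppiA | 24 | 2371.90 | 8.5 | 1.9 | 1.438 | 98.33 | 39.94 | 40.72 | U | 0.841 |
| yaaI | 23 | 2389.93 | 7.8 | 1.9 | 1.365 | 114.78 | 23.74 | 38.98 | S | 0.842 |
| yhcF | 20 | 2084.48 | 8.5 | 1.9 | 0.915 | 98.00 | 25.79 | 39.39 | S | 0.860 |
| yfcQ | 18 | 1962.40 | 9.5 | 1.9 | 1.006 | 119.44 | 13.91 | 38.50 | S | 0.792 |
| efeO | 26 | 2845.33 | 12 | 1.9 | 0.654 | 94.23 | 54.20 | 42.42 | U | 0.865 |
| glnH | 22 | 2244.72 | 10 | 1.9 | 1.209 | 133.18 | 10.58 | 37.70 | S | 0.846 |
| rna | 23 | 2478.94 | 11 | 1.9 | 0.757 | 106.52 | 40.05 | 40.74 | U | 0.809 |
| DsbC | 20 | 2179.67 | 10 | 1.9 | 1.000 | 78.50 | 5.25 | 37.45 | S | 0.836 |
| rbsB | 25 | 2494.02 | 10 | 1.9 | 0.948 | 109.60 | 11.14 | 37.38 | S | 0.879 |
| bcsB | 25 | 2853.53 | 10 | 1.9 | 0.688 | 58.80 | 48.06 | 41.66 | U | 0.874 |
| gfcA | 21 | 2293.87 | 10 | 1.9 | 1.019 | 98.10 | 40.98 | 40.83 | U | 0.842 |
| NPPC | 23 | 2494.05 | 6.5 | 1.9 | 1.070 | 165.65 | 95.44 | 46.69 | U | 0.737 |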

*S = Stable, U = Unstable

*The proteins whose instability index was higher than 40 were predicted as unstable and the values under 40 might be stable.

The highest GRAVY values belonged to the signal peptides flgI, thiB, OmpC, and yncJ (1.935, 1.589, 1.552, and 1.541, respectively). The highest aliphatic index values belonged to flgI, OmpC, NPPC, mepA, and cysP (185.50, 171.90, 165.65, 164.74, and 164.00, respectively).

Another physicochemical feature evaluated was the instability index. On their own, papK, yhcN, ansB, and pilC were the most stable signal peptides (instability indices of -2.60, -2.03, -1.15, and 1.01, respectively), and their fusions with somatropin were also predicted to be stable. Proteins with an instability index higher than 40 were predicted to be unstable, and those with values below 40 were predicted to be stable.
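The stability call is a threshold rule on the instability index. A minimal sketch, assuming (as stated above) that values above 40 are labeled unstable and all other values stable; the example values are copied from Table 3, and the helper names are our own.

```python
# Illustrative stability call from the instability index (threshold of 40).
INSTABILITY_THRESHOLD = 40.0

# (signal peptide, instability index alone, instability index with hGH), from Table 3.
ROWS = [
    ("papK", -2.60, 36.52),
    ("yhcN", -2.03, 36.39),
    ("ansB", -1.15, 36.48),
    ("pilC", 1.01, 36.54),
    ("thiB", 65.64, 42.96),
]


def stability_label(index: float, threshold: float = INSTABILITY_THRESHOLD) -> str:
    """Return 'U' (unstable) if the index exceeds the threshold, otherwise 'S' (stable)."""
    return "U" if index > threshold else "S"


def most_stable_alone(rows, k=4):
    """Names of the k signal peptides with the lowest instability index on their own."""
    return [name for name, _, _ in sorted(rows, key=lambda r: r[1])[:k]]


if __name__ == "__main__":
    for name, alone, fused in ROWS:
        print(name, stability_label(alone), stability_label(fused))
```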

Prediction of protein solubility

The predicted solubility of somatropin fused to the various signal peptides is shown in Table 3. The highest solubility scores were obtained with the lamB, draA, faeG, nmpc, rbsB, and malE signal peptides (0.889, 0.885, 0.883, 0.883, 0.879, and 0.879, respectively).

Prediction of the protein localization

Subcellular localization analysis with the ProtComp B server predicted the outer membrane, inner membrane, and periplasmic space as the final localization sites for 13, 18, and 15 signal peptides, respectively. For the remaining nine signal peptides (faeG, FimF41a, ompA, papK, prsK, lamB, nmpc, bcsB, and gfcA), somatropin was predicted to be secreted into the extracellular medium (Table 4).
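The per-site counts can be tallied from the final prediction column of Table 4. In the sketch below, the one-letter codes are our own shorthand (I = inner membrane, O = outer membrane, P = periplasmic, X = extracellular or secreted, merging the two labels that Table 4 uses for secretion), and the data were transcribed by hand in table order.

```python
from collections import Counter

# Final ProtComp B site per signal peptide, in Table 4 order.
# Codes (our shorthand): I = inner membrane, O = outer membrane,
# P = periplasmic, X = extracellular or secreted.
TABLE4_SITES = """
appA I  cysP O  draA O  faeG X  fedA I  FimF41a X  flgI I  lolA P
lptA O  malE P  mepA O  nikA I  ompA X  ompP O  ompW O  papK X
pbpG P  PelB P  phoA I  phoE I  prsK X  sfaS P  tauA O  thiB P
ugpB P  xylF P  yhcN I  yncJ I  ynfB P  zraP P  ampC P  eltB O
pilC O  DsbA I  OmpC I  STII O  ansB I  ompF I  sta1 I  lamB X
mglB I  araF P  nmpc X  ppiA O  yaaI I  yhcF O  yfcQ I  efeO P
glnH P  rna O   DsbC I  rbsB P  bcsB X  gfcA X  NPPC I
"""


def parse_sites(text: str) -> dict:
    """Turn whitespace-separated 'name code' pairs into {name: code}."""
    tokens = text.split()
    return dict(zip(tokens[0::2], tokens[1::2]))


SITES = parse_sites(TABLE4_SITES)
COUNTS = Counter(SITES.values())

if __name__ == "__main__":
    print(len(SITES), dict(COUNTS))
    print(sorted(name for name, code in SITES.items() if code == "P"))
```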

Table 4. Analysis of secretion pathways and final localization of human somatropin mediated by different signal peptides
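| Signal peptide | Secretion pathway | Reliability score | Cytoplasmic | Membrane | Secreted | Periplasmic | Final prediction site |
|---|---|---|---|---|---|---|---|
| appA | Sec/SPI | 0.9925 | 1.68 | 4.70 | 0.00 | 3.62 | Inner membrane |
| cysP | Sec/SPI | 0.9795 | 1.42 | 6.26 | 0.00 | 2.33 | Outer membrane |
| draA | Sec/SPI | 0.9984 | 0.86 | 4.74 | 0.48 | 3.92 | Outer membrane |
| faeG | Sec/SPI | 0.9984 | 0.53 | 1.75 | 5.03 | 2.69 | Extracellular |
| fedA | Sec/SPI | 0.9963 | 0.32 | 7.13 | 2.55 | 0.00 | Inner membrane |
| FimF41a | Sec/SPI | 0.9963 | 0.00 | 2.40 | 6.31 | 1.29 | Extracellular |
| flgI | Sec/SPI | 0.9892 | 1.09 | 5.84 | 0.00 | 3.07 | Inner membrane |
| lolA | Sec/SPI | 0.9975 | 0.43 | 2.34 | 0.00 | 7.23 | Periplasmic |
| lptA | Sec/SPI | 0.9846 | 0.55 | 6.03 | 0.00 | 3.42 | Outer membrane |
| malE | Sec/SPI | 0.9909 | 0.71 | 3.44 | 0.00 | 5.85 | Periplasmic |
| mepA | Sec/SPI | 0.9925 | 0.58 | 7.14 | 0.00 | 2.29 | Outer membrane |
| nikA | Sec/SPI | 0.9001 | 0.8 | 5.47 | 0.00 | 3.73 | Inner membrane |
| ompA | Sec/SPI | 0.9977 | 0.13 | 1.07 | 5.21 | 3.58 | Extracellular |
| ompP | Sec/SPI | 0.9834 | 1.76 | 7.82 | 0.00 | 0.42 | Outer membrane |
| ompW | Sec/SPI | 0.9965 | 0.00 | 6.16 | 2.12 | 1.72 | Outer membrane |
| papK | Sec/SPI | 0.978 | 0.11 | 1.83 | 7.41 | 0.65 | Extracellular |
| pbpG | Sec/SPI | 0.9844 | 0.64 | 2.43 | 0.00 | 6.93 | Periplasmic |
| PelB | Sec/SPI | 0.9967 | 1.29 | 1.42 | 3.33 | 3.96 | Periplasmic |
| phoA | Sec/SPI | 0.9924 | 1.15 | 7.68 | 0.00 | 1.17 | Inner membrane |
| phoE | Sec/SPI | 0.9973 | 0.28 | 8.63 | 0.43 | 0.66 | Inner membrane |
| prsK | Sec/SPI | 0.9929 | 0.00 | 2.13 | 6.21 | 1.66 | Extracellular |
| sfaS | Sec/SPI | 0.9831 | 1.52 | 3.49 | 0.00 | 4.99 | Periplasmic |
| tauA | Sec/SPI | 0.9096 | 0.74 | 5.50 | 0.00 | 3.75 | Outer membrane |
| thiB | Sec/SPI | 0.9867 | 0.80 | 2.85 | 0.00 | 6.35 | Periplasmic |
| ugpB | Sec/SPI | 0.995 | 0.55 | 3.17 | 0.00 | 6.29 | Periplasmic |
| xylF | Sec/SPI | 0.9969 | 1.40 | 3.81 | 0.00 | 4.80 | Periplasmic |
| yhcN | Sec/SPI | 0.9896 | 0.26 | 8.20 | 1.54 | 0.00 | Inner membrane |
| yncJ | Sec/SPI | 0.9078 | 1.21 | 7.34 | 0.00 | 1.45 | Inner membrane |
| ynfB | Sec/SPI | 0.9881 | 0.00 | 2.65 | 0.98 | 6.37 | Periplasmic |
| zraP | Sec/SPI | 0.9931 | 0.57 | 2.46 | 0.00 | 6.97 | Periplasmic |
| ampC | Sec/SPII | 0.6243 | 0.93 | 2.63 | 0.00 | 6.39 | Periplasmic |
| eltB | Sec/SPI | 0.7337 | 0.97 | 7.60 | 0.00 | 1.43 | Outer membrane |
| pilC | Sec/SPI | 0.9545 | 0.99 | 8.63 | 0.29 | 0.10 | Outer membrane |
| DsbA | Sec/SPI | 0.9875 | 0.00 | 8.44 | 0.68 | 0.89 | Inner membrane |
| OmpC | Sec/SPI | 0.9874 | 0.33 | 6.55 | 1.58 | 1.54 | Inner membrane |
| STII | Sec/SPI | 0.9953 | 0.11 | 8.42 | 1.47 | 0.00 | Outer membrane |
| ansB | Sec/SPI | 0.9641 | 0.60 | 6.46 | 0.00 | 2.94 | Inner membrane |
| ompF | Sec/SPI | 0.9896 | 0.62 | 8.19 | 0.74 | 0.45 | Inner membrane |
| sta1 | Sec/SPI | 0.9672 | 0.08 | 9.51 | 0.41 | 0.00 | Inner membrane |
| lamB | Sec/SPI | 0.9865 | 0.32 | 3.71 | 3.88 | 2.09 | Secreted |
| mglB | Sec/SPI | 0.9971 | 0.80 | 5.63 | 0.00 | 3.57 | Inner membrane |
| araF | Sec/SPI | 0.9941 | 0.22 | 3.73 | 0.00 | 6.05 | Periplasmic |
| nmpc | Sec/SPI | 0.9964 | 0.00 | 0.96 | 7.84 | 1.20 | Secreted |
| ppiA | Sec/SPI | 0.9934 | 0.54 | 5.45 | 0.00 | 4.01 | Outer membrane |
| yaaI | Sec/SPI | 0.78 | 0.18 | 4.43 | 2.80 | 2.59 | Inner membrane |
| yhcF | Sec/SPI | 0.9801 | 0.86 | 8.13 | 0.00 | 1.01 | Outer membrane |
| yfcQ | Sec/SPI | 0.9956 | 1.58 | 7.04 | 0.37 | 1.01 | Inner membrane |
| efeO | TAT | 0.5377 | 0.25 | 0.49 | 0.00 | 9.26 | Periplasmic |
| glnH | Sec/SPI | 0.9959 | 0.18 | 3.97 | 0.00 | 5.85 | Periplasmic |
| rna | Sec/SPI | 0.9914 | 0.75 | 8.88 | 0.37 | 0.00 | Outer membrane |
| DsbC | Sec/SPI | 0.9955 | 0.46 | 5.80 | 0.00 | 3.75 | Inner membrane |
| rbsB | Sec/SPI | 0.9969 | 0.00 | 2.61 | 2.76 | 4.63 | Periplasmic |
| bcsB | Sec/SPI | 0.9793 | 0.02 | 2.28 | 7.17 | 0.53 | Secreted |
| gfcA | Sec/SPI | 0.9959 | 0.19 | 2.21 | 6.76 | 0.85 | Secreted |
| NPPC | Sec/SPI | 0.9877 | 1.33 | 7.54 | 0.00 | 1.12 | Inner membrane |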

Prediction of cleavage probability and the type of signal peptides

The 55 remaining signal peptides were examined for their secretory pathways with SignalP 5.0. Except for efeO (Tat pathway) and ampC (Sec/SPII), all of these signal peptides were predicted to be Sec/SPI substrates (Table 4). The cleavage probability of each signal peptide is given in Table 2.

Selection of appropriate signal peptide

First, the signal peptides with a predicted final localization in the periplasmic space were selected and sorted by aliphatic index. Then, the stability and solubility of the target protein fused to each selected signal peptide were examined. The signal peptides with which somatropin remained stable and soluble were selected as appropriate signal peptides (Table 5).
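This selection procedure can be sketched as a sort-and-filter over the periplasmic candidates listed in Table 5. The sketch is illustrative only: stability is judged from the instability index of the hGH fusion using the threshold of 40 from Table 3, and the solubility cutoff of 0.5 is our assumption, since the text does not state a numeric SOLpro threshold.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    name: str
    aliphatic_index: float
    instability_with_hgh: float
    solubility: float


# Periplasmic candidates, values copied from Table 5.
PERIPLASMIC = [
    Candidate("ynfB", 163.93, 39.35, 0.774),
    Candidate("xylF", 161.30, 40.04, 0.781),
    Candidate("thiB", 157.22, 42.96, 0.608),
    Candidate("sfaS", 146.82, 37.16, 0.844),
    Candidate("lolA", 139.52, 38.43, 0.764),
    Candidate("PelB", 138.18, 40.88, 0.802),
    Candidate("glnH", 133.18, 37.70, 0.846),
    Candidate("pbpG", 117.20, 42.81, 0.800),
    Candidate("malE", 113.08, 36.27, 0.879),
    Candidate("ugpB", 110.87, 38.37, 0.844),
    Candidate("rbsB", 109.60, 37.38, 0.879),
    Candidate("ampC", 97.89, 39.41, 0.783),
    Candidate("efeO", 94.23, 42.42, 0.865),
    Candidate("araF", 93.91, 46.83, 0.876),
    Candidate("zraP", 79.23, 39.37, 0.834),
]


def select(candidates, max_instability=40.0, min_solubility=0.5):
    """Sort by aliphatic index (descending), then keep stable and soluble fusions.

    min_solubility=0.5 is an assumed SOLpro cutoff, not a value stated in the study.
    """
    ranked = sorted(candidates, key=lambda c: c.aliphatic_index, reverse=True)
    return [
        c.name
        for c in ranked
        if c.instability_with_hgh < max_instability and c.solubility >= min_solubility
    ]


if __name__ == "__main__":
    print(select(PERIPLASMIC))
```

Under these assumptions, the first five names returned are ynfB, sfaS, lolA, glnH, and malE. A strict "below 40" rule also admits ampC (39.41), which Table 3 labels as unstable, so the exact stability criterion affects only the tail of the list.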

Table 5. Characteristics of most efficient signal peptides for periplasmic expression of human somatropin based on their determinant features
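| Signal peptide | Aliphatic index | GRAVY | D-score | Instability index (with hGH) | Solubility |
|---|---|---|---|---|---|
| ynfB | 163.93 | 1.239 | 0.611 | 39.35 | 0.774 |
| xylF | 161.30 | 1.083 | 0.851 | 40.04 | 0.781 |
| thiB | 157.22 | 1.589 | 0.837 | 42.96 | 0.608 |
| sfaS | 146.82 | 1.314 | 0.800 | 37.16 | 0.844 |
| lolA | 139.52 | 1.324 | 0.693 | 38.43 | 0.764 |
| PelB | 138.18 | 1.191 | 0.910 | 40.88 | 0.802 |
| glnH | 133.18 | 1.209 | 0.859 | 37.70 | 0.846 |
| pbpG | 117.20 | 1.228 | 0.817 | 42.81 | 0.800 |
| malE | 113.08 | 1.012 | 0.863 | 36.27 | 0.879 |
| ugpB | 110.87 | 0.622 | 0.825 | 38.37 | 0.844 |
| rbsB | 109.60 | 0.948 | 0.854 | 37.38 | 0.879 |
| ampC | 97.89 | 1.342 | 0.877 | 39.41 | 0.783 |
| efeO | 94.23 | 0.654 | 0.785 | 42.42 | 0.865 |
| araF | 93.91 | 0.878 | 0.858 | 46.83 | 0.876 |
| zraP | 79.23 | 0.746 | 0.881 | 39.37 | 0.834 |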

E. coli is an economical and straightforward host for the expression of recombinant proteins.16 However, overexpression of recombinant proteins in the E. coli cytoplasm is usually associated with insoluble aggregates and inclusion body formation. To fold correctly, such proteins often need to be kept away from the reducing environment of the cytoplasm. Hence, secretory expression has several advantages over cytosolic systems for the production of recombinant proteins.

Secretion of a target protein requires its transport across the cytoplasmic membrane. In bacteria, Sec, SRP, and TAT are the three major pathways for translocating proteins across this membrane, and all of them depend on the presence of suitable signal peptides. Signal peptides are short N-terminal peptides that affect the biosynthesis, folding, and stability of the corresponding target proteins.17 Although signal peptides differ in sequence, they share conserved physicochemical properties, which can be described by the aliphatic index, molecular weight, instability index, GRAVY, net positive charge, and theoretical pI. Signal peptides have three regions: a positively charged N-terminal region (n-region), a hydrophobic central region (h-region), and a polar C-terminal region (c-region) that contains the cleavage site (a conserved A-X-A motif). The n-region has been shown to play an essential role in the initial phase of protein secretion across membranes,18 and it is responsible for the net positive charge of the signal peptide. In addition, basic residues in this region may be indispensable for an efficient signal peptide.19
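Two of the sequence-level features described here can be checked with a few lines of code. This sketch is illustrative: it tests the A-X-A motif only at the last three residues of the signal peptide (assumed to end at the cleavage site), counts only Lys and Arg as basic residues, and takes the n-region length as an input, for example from the SignalP 3.0 values in Table 2. The function names are our own.

```python
# Illustrative checks of two signal peptide features discussed above.
BASIC_RESIDUES = frozenset("KR")  # Lys and Arg only; His is ignored here


def has_axa_motif(signal: str) -> bool:
    """True if the last three residues before the cleavage site read A-X-A."""
    signal = signal.upper()
    return len(signal) >= 3 and signal[-3] == "A" and signal[-1] == "A"


def n_region_basic_count(signal: str, n_region_length: int) -> int:
    """Count Lys/Arg residues within the first n_region_length residues."""
    return sum(aa in BASIC_RESIDUES for aa in signal.upper()[:n_region_length])


if __name__ == "__main__":
    # Sequences from Table 1; n-region lengths from Table 2.
    examples = {
        "PelB": ("MKYLLPTAAAGLLLLAAQPAMA", 6),
        "ynfB": ("MKITLSKRIGLLAILLPCALALSTTVHA", 10),
        "torT": ("MRVLLFLLLSLFMLPAFS", 3),
    }
    for name, (seq, n_len) in examples.items():
        print(name, has_axa_motif(seq), n_region_basic_count(seq, n_len))
```

For example, PelB ends in AMA (an A-X-A motif), whereas torT ends in AFS and does not match.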

In addition to the charge of the n-region, the c-region strongly affects the efficiency of membrane transport by both the Tat and Sec pathways. The third region that can affect secretion output is the central hydrophobic, helical h-region: a greater length and hydrophobic density of the h-region increase its hydrophobicity and facilitate protein secretion.19,20
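As a rough illustration of h-region length, the sketch below takes the longest contiguous stretch of hydrophobic residues as a proxy. Both the residue set (`HYDROPHOBIC`) and the longest-run rule are assumptions made for illustration, not the definition used by SignalP or in this study. Real h-regions often include Gly, Pro, or Thr, so this heuristic tends to underestimate their length; for the pelB leader it returns only the LLLLAA stretch.

```python
# Illustrative residue set; an assumption, not a published h-region definition.
HYDROPHOBIC = frozenset("AVLIMFWC")


def longest_hydrophobic_run(signal_peptide: str) -> tuple[int, int]:
    """Return (start index, length) of the longest contiguous hydrophobic stretch."""
    best_start, best_len = 0, 0
    run_start, run_len = 0, 0
    for i, aa in enumerate(signal_peptide.upper()):
        if aa in HYDROPHOBIC:
            if run_len == 0:
                run_start = i  # a new run begins here
            run_len += 1
            if run_len > best_len:
                best_start, best_len = run_start, run_len
        else:
            run_len = 0  # a non-hydrophobic residue ends the current run
    return best_start, best_len


if __name__ == "__main__":
    pelb = "MKYLLPTAAAGLLLLAAQPAMA"
    start, length = longest_hydrophobic_run(pelb)
    print(start, length, pelb[start:start + length])  # 11 6 LLLLAA
```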

In the present study, the physicochemical features of the 90 signal peptides were analyzed for secretory expression of somatropin in E. coli.

As shown in Table 3, flgI, OmpC, NPPC, mepA, and cysP showed the highest hydrophobicity levels among the studied signal peptides (185.50, 171.90, 165.65, 164.74, and 164.00, respectively), whereas bcsB, DsbC, zraP, ansB, and araF showed the lowest (58.80, 78.50, 79.23, 93.64, and 93.91, respectively). A previous study reported that OmpC had the highest aliphatic index, which is in agreement with our results.21
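In ProtParam-style analyses, hydrophobicity is commonly summarized by the aliphatic index (Ikai's formula: X(Ala) + 2.9·X(Val) + 3.9·(X(Ile) + X(Leu)), where X is the mole percent of each residue) and by GRAVY, the mean Kyte-Doolittle hydropathy over all residues. The Python sketch below reimplements these two formulas for a signal peptide sequence. It is not the ProtParam code itself and assumes that the sequence contains only the 20 standard one-letter amino acid codes; the pelB leader is again used as an example input.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def aliphatic_index(seq: str) -> float:
    """Ikai aliphatic index: X(Ala) + 2.9*X(Val) + 3.9*(X(Ile) + X(Leu)), X in mole percent."""
    seq = seq.upper()
    n = len(seq)

    def mole_percent(aa: str) -> float:
        return 100.0 * seq.count(aa) / n

    return (mole_percent("A") + 2.9 * mole_percent("V")
            + 3.9 * (mole_percent("I") + mole_percent("L")))


def gravy(seq: str) -> float:
    """Grand average of hydropathy: mean Kyte-Doolittle value over all residues."""
    seq = seq.upper()
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


if __name__ == "__main__":
    pelb = "MKYLLPTAAAGLLLLAAQPAMA"
    print(round(aliphatic_index(pelb), 2), round(gravy(pelb), 3))  # 138.18 1.191
```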

Secretory pathway analysis revealed that all 55 predicted signal peptides except efeO are specific for the Sec pathway, with reliability scores above 0.9 (Table 4). These findings are consistent with previous reports.9,22 Proteins of the Sec exportome carry a cleavable, Sec-specific N-terminal signal peptide that directs their translocation across the inner membrane (IM) in an unfolded state.23,24
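The pathway filter applied above (Sec pathway, reliability above 0.9) amounts to a simple selection over prediction records. In the sketch below, the `PathwayPrediction` record, its "Sec"/"Tat" label strings, and the example entries are hypothetical placeholders, not the output format of any particular predictor and not values from Table 4.

```python
from dataclasses import dataclass


@dataclass
class PathwayPrediction:
    name: str           # signal peptide name
    pathway: str        # predicted pathway, e.g. "Sec" or "Tat" (hypothetical labels)
    reliability: float  # reliability score reported by the predictor


def sec_specific(predictions: list[PathwayPrediction], threshold: float = 0.9) -> list[str]:
    """Names of signal peptides predicted for the Sec pathway above the threshold."""
    return [p.name for p in predictions
            if p.pathway == "Sec" and p.reliability > threshold]


if __name__ == "__main__":
    # Hypothetical entries, not data from Table 4.
    demo = [
        PathwayPrediction("sp_1", "Sec", 0.95),
        PathwayPrediction("sp_2", "Tat", 0.97),
        PathwayPrediction("sp_3", "Sec", 0.85),
    ]
    print(sec_specific(demo))  # ['sp_1']
```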

There are two approaches to selecting a signal peptide for a given protein: experimental (trial-and-error) screening and in silico analysis. Performing a bioinformatic analysis before starting experimental work can increase the precision and validity of the selection and reduce experimental costs.

In this study, online bioinformatic tools were used to find suitable signal peptides for periplasmic expression of recombinant somatropin in E. coli. A total of 90 signal peptides, 17 eukaryotic and 73 prokaryotic, were evaluated. The D-score was used both to determine which candidates qualify as signal peptides and, in the first step, to rank them. Based on the D-scores (Table 2), 55 of the 90 candidates were identified as signal peptides for somatropin. The data were then sorted by D-score, final localization, h-region length, aliphatic index, GRAVY, and solubility, in that order of priority (Table 5). According to this sorting, pelB, flgI, nmpC, GfcA, OmpW, PpiA, and OmpC had the highest D-scores. Similarly, pelB and OmpC had the highest D-scores in another bioinformatic study.21 The analysis also revealed that somatropin fused to 34 of the signal peptides was stable and directed to the Sec pathway, that 9 signal peptides mediated its secretion, and that 15 translocated somatropin into the periplasmic space.
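The two-step procedure described above, a D-score filter followed by a multi-criteria sort, can be expressed as a lexicographic sort with a tuple key. In this sketch, the `Candidate` record, the cutoff passed as `d_cutoff`, the localization preference order in `LOCALIZATION_RANK`, and the choice to prefer higher values for every numeric criterion are all assumptions for illustration; the study does not state these directions, and the example entries are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    d_score: float
    localization: str       # predicted final localization
    h_region_len: int
    aliphatic_index: float
    gravy: float
    solubility: float


# Hypothetical preference order: lower rank = more desirable localization.
LOCALIZATION_RANK = {"periplasm": 0, "extracellular": 1, "outer membrane": 2, "inner membrane": 3}


def rank_candidates(cands: list[Candidate], d_cutoff: float) -> list[Candidate]:
    """Keep candidates with a D-score above d_cutoff, then sort by the priority list.

    Assumption: higher is better for every numeric criterion.
    """
    passing = [c for c in cands if c.d_score > d_cutoff]
    worst = len(LOCALIZATION_RANK)  # unknown localizations sort last
    return sorted(
        passing,
        key=lambda c: (
            -c.d_score,
            LOCALIZATION_RANK.get(c.localization, worst),
            -c.h_region_len,
            -c.aliphatic_index,
            -c.gravy,
            -c.solubility,
        ),
    )


if __name__ == "__main__":
    # Hypothetical candidates and cutoff, not values from Tables 2 or 5.
    demo = [
        Candidate("A", 0.80, "periplasm", 12, 120.0, 0.9, 0.6),
        Candidate("B", 0.80, "inner membrane", 14, 130.0, 1.0, 0.7),
        Candidate("C", 0.90, "inner membrane", 10, 100.0, 0.5, 0.5),
        Candidate("D", 0.30, "periplasm", 15, 150.0, 1.2, 0.9),
    ]
    print([c.name for c in rank_candidates(demo, d_cutoff=0.5)])  # ['C', 'A', 'B']
```

Because a tuple key compares fields strictly in order, each later criterion only matters when all earlier ones are tied. With continuous D-scores such ties may be rare, so a weighted score could be worth considering if the later criteria are meant to carry real weight.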

Zamani et al analyzed the secretion of somatropin directed by the L-asparaginase II signal sequence and reported that this signal sequence did not achieve successful secretion of somatropin.22

Expression of somatropin with the NPR, STII, and DsbA signal peptides in the RRI host strain showed that DsbA was the most effective signal peptide for the somatropin gene, giving an 80% higher expression level than the reference vector.23

A previous study25 demonstrated high secretion of somatropin with the phoA signal peptide; in our analysis, however, phoA was not a suitable candidate because of its lower D-score (0.688) and its predicted final localization in the inner membrane.

This study evaluated 90 different signal peptides to find the most suitable one for secreting recombinant somatropin in E. coli. The results showed that ynfB, sfaS, lolA, glnH, and malE have all the features needed to be selected as suitable signal peptides for somatropin.


Conclusion

In this research, various signal peptides were evaluated for the periplasmic expression of somatropin in E. coli. The selection was based on a combination of hydrophobicity, D-score, solubility, stability, and final localization.

The results indicated that the signal peptides ynfB, sfaS, lolA, glnH, and malE had the highest scores and could be used for soluble periplasmic expression of somatropin in E. coli. However, these predictions need to be verified experimentally.


Acknowledgments

This study was supported by the National Institute for Medical Research Development (NIMAD), grant no. 958751.


Competing Interests

The authors have no conflict of interest to declare.


Ethical Approval

This research was approved by the Iran National Committee for Ethics in Biomedical Research (958751).


References

  1. Ghasemi R, Hashemzadeh H, Razavi H, Yakhchali B. Production of recombinant human growth hormone and future challenges. Modares J Biotechnol 2018;9(1):79-92. [Persian].
  2. Biological effects of growth hormone and its antagonist. Trends Mol Med 2001;7(3):126-32. doi: 10.1016/s1471-4914(01)01933-5
  3. Overview of bacterial expression systems for heterologous protein production: from molecular and biochemical fundamentals to commercial systems. Appl Microbiol Biotechnol 2006;72(2):211-22. doi: 10.1007/s00253-006-0465-8
  4. Wanted: more monitoring and control during inclusion body processing. World J Microbiol Biotechnol 2018;34(11):158. doi: 10.1007/s11274-018-2541-5
  5. Bacterial inclusion bodies: discovering their better half. Trends Biochem Sci 2017;42(9):726-37. doi: 10.1016/j.tibs.2017.01.005
  6. Protein export through the bacterial Sec pathway. Nat Rev Microbiol 2017;15(1):21-36. doi: 10.1038/nrmicro.2016.161
  7. A comprehensive review of signal peptides: structure, roles, and applications. Eur J Cell Biol 2018;97(6):422-41. doi: 10.1016/j.ejcb.2018.06.003
  8. Overcoming the refractory expression of secreted recombinant proteins in mammalian cells through modification of the signal peptide and adjacent amino acids. PLoS One 2016;11(5):e0155340. doi: 10.1371/journal.pone.0155340
  9. In silico analysis of suitable signal peptides for secretion of a recombinant alcohol dehydrogenase with a key role in atorvastatin enzymatic synthesis. Mol Biol Res Commun 2019;8(1):17-26. doi: 10.22099/mbrc.2019.31801.1372
  10. Recombinant protein secretion in Escherichia coli. Biotechnol Adv 2005;23(3):177-202. doi: 10.1016/j.biotechadv.2004.11.003
  11. Predicting secretory proteins with SignalP. Methods Mol Biol 2017;1611:59-73. doi: 10.1007/978-1-4939-7015-5_6
  12. Evaluating eukaryotic secreted protein prediction. BMC Bioinformatics 2005;6:256. doi: 10.1186/1471-2105-6-256
  13. Membrane proteases in the bacterial protein secretion and quality control pathway. Microbiol Mol Biol Rev 2012;76(2):311-30. doi: 10.1128/mmbr.05019-11
  14. SignalP 5.0 improves signal peptide predictions using deep neural networks. Nat Biotechnol 2019;37(4):420-3. doi: 10.1038/s41587-019-0036-z
  15. A brief history of protein sorting prediction. Protein J 2019;38(3):200-16. doi: 10.1007/s10930-019-09838-3
  16. Recombinant expression systems in the pharmaceutical industry. Appl Microbiol Biotechnol 2004;65(4):363-72. doi: 10.1007/s00253-004-1656-9
  17. Signal peptides for recombinant protein secretion in bacterial expression systems. Microb Cell Fact 2018;17(1):52. doi: 10.1186/s12934-018-0901-3
  18. Role of positive charge on the amino-terminal region of the signal peptide in protein secretion across the membrane. Proc Natl Acad Sci U S A 1982;79(11):3438-41. doi: 10.1073/pnas.79.11.3438
  19. Optimisation of signal peptide for recombinant protein secretion in bacterial hosts. Appl Microbiol Biotechnol 2013;97(9):3811-26. doi: 10.1007/s00253-013-4831-z
  20. Competition between functional signal peptides demonstrates variation in affinity for the secretion pathway. J Bacteriol 1996;178(23):6658-64. doi: 10.1128/jb.178.23.6658-6664.1996
  21. In silico evaluation of different signal peptides for the secretory production of human growth hormone in E. coli. Int J Pept Res Ther 2015;21(3):261-8. doi: 10.1007/s10989-015-9454-z
  22. Evaluation of recombinant human growth hormone secretion in E. coli using the L-asparaginase II signal peptide. Avicenna J Med Biotechnol 2016;8(4):182-7.
  23. Periplasmic expression of human growth hormone via plasmid vectors containing the lambdaPL promoter: use of HPLC for product quantification. Protein Eng 2003;16(12):1131-8. doi: 10.1093/protein/gzg114
  24. Protein export through the bacterial Sec pathway. Nat Rev Microbiol 2017;15(1):21-36. doi: 10.1038/nrmicro.2016.161
  25. Periplasmic production of correctly processed human growth hormone in Escherichia coli: natural and bacterial signal sequences are interchangeable. Gene 1985;39(2-3):247-54. doi: 10.1016/0378-1119(85)90319-1
Submitted: 15 Nov 2021
Revised: 20 Jan 2022
Accepted: 31 Mar 2022
First published online: 04 Apr 2022