An effective allatom potential for proteins
 Anders Irbäck^{1}Email author,
 Simon Mitternacht^{1} and
 Sandipan Mohanty^{2}
DOI: 10.1186/1757503622
© Irbäck et al 2009
Received: 27 January 2009
Accepted: 08 April 2009
Published: 08 April 2009
Abstract
We describe and test an implicit solvent allatom potential for simulations of protein folding and aggregation. The potential is developed through studies of structural and thermodynamic properties of 17 peptides with diverse secondary structure. Results obtained using the final form of the potential are presented for all these peptides. The same model, with unchanged parameters, is furthermore applied to a heterodimeric coiledcoil system, a mixed α/β protein and a threehelixbundle protein, with very good results. The computational efficiency of the potential makes it possible to investigate the freeenergy landscape of these 49–67residue systems with high statistical accuracy, using only modest computational resources by today's standards.
PACS Codes: 87.14.E, 87.15.A, 87.15.Cc
1 Introduction
A molecular understanding of living systems requires modeling of the dynamics and interactions of proteins. The relevant dynamics of a protein may amount to small fluctuations about its native structure, or reorientations of its ordered parts relative to each other. In either case, a tiny fraction of the conformational space is explored. For flexible proteins, perhaps with large intrinsically disordered parts [1, 2], the situation is different. When studying such proteins or conformational conversion processes like folding or amyloid aggregation, the competition between different minima on the freeenergy landscape inevitably comes into focus. Studying these systems by computer simulation is a challenge, because proper sampling of all relevant freeenergy minima must be ensured. This goal is very hard to achieve if explicit solvent molecules are included in the simulations. The use of coarsegrained models can alleviate this problem, but makes important geometric properties like secondary structure formation more difficult to describe.
Here we present an implicit solvent allatom protein model especially aimed at problems requiring exploration of the global freeenergy landscape. It is based on a computationally convenient effective potential, with parameters determined through fullscale thermodynamic simulations of a set of experimentally well characterized peptides. Central to the approach is the use of a single set of model parameters, independent of the protein studied. This constraint is a simple but efficient way to avoid unphysical biases, for example, toward either αhelical or βsheet structure [3, 4]. Imposing this constraint is also a way to enable systematic refinement of the potential.
Amino acid sequences
System  PDB code  Sequence 

Trpcage  1L2Y  NLYIQ WLKDG GPSSG RPPPS 
E6apn1  1RIJ  AcALQEL LGQWL KDGGP SSGRP PPSNH_{2} 
C  AcKETAA AKFER AHANH_{2}  
EK  AcYAEAA KAAEA AKAFNH_{2}  
F_{s}  SucAAAAA AAARA AAARA AAARA ANH_{2}  
GCN4tp  2OVN  NYHLE NEVAR LKKLV GE 
HPLC6  1WFA  DTASD AAAAA ALTAA NAKAA AELTA ANAAA AAAAT ARNH_{2} 
Chignolin  1UAO  GYDPE TGTWG 
MBH12  1J4M  RGKWT YNGIT YEGR 
GB1p  GEWTY DDATK TFTVT E  
GB1m2  GEWTY NPATG KFTVT E  
GB1m3  KKWTY NPATG KFTVQ E  
trpzip1  1LE0  SWTWE GNKWT WKNH_{2} 
trpzip2  1LE1  SWTWE NGKWT WKNH_{2} 
betanova  RGWSV QNGKY TNNGK TTEGR  
LLM  RGWSL QNGKY TLNGK TMEGR  
beta3s  TWIQN GSTKW YQNGS TKIYT  
AB zipper  1U2U  AcEVAQL EKEVA QLEAE NYQLE QEVAQ LEHEGNH_{2} 
AcEVQAL KKRVQ ALKAR NYALK QKVQA LRHKGNH_{2}  
Top7CFR  2GJH  ERVRI SITAR TKKEA EKFAA ILIKV FAELG YNDIN VTWDG DTVTV EGQL 
GSα_{3} W  1LQ7  GSRVK ALEEK VKALE EKVKA LGGGG RIEEL KKKWE ELKKK IEELG GGGEV KKVEE EVKKL EEEIK KL 
Whether or not this potential, calibrated using data on peptides with typically ~20 residues, will be useful for larger systems is not obvious. Therefore, we also apply our potential, with unchanged parameters, to three larger systems with different geometries. These systems are the mixed α/β protein Top7CFr, a threehelixbundle protein with 67 residues, and a heterodimeric leucine zipper composed of two 30residue chains.
Protein folding simulations are by necessity based on potentials whose terms are interdependent and dependent on the choice of geometric representation. Therefore, we choose to calibrate our potential directly against folding properties of whole chains. To make this feasible, we deliberately omit many details included in force fields like Amber, CHARMM and OPLS (for a review, see [15]). With this approach, we might lose details of a given freeenergy minimum, but, by construction, we optimize the balance between competing minima.
Two potentials somewhat similar in form to ours are the μpotential of the Shakhnovich group [16] and the PFF potential of the Wenzel group [17]. These groups also consider properties of entire chains for calibration, but use folded PDB structures or sets of decoys rather than fullscale thermodynamic simulations. Our admittedly timeconsuming procedure implies that our model is trained on completely general structures, which might be an advantage when studying the dynamics of folding. Another potential with similarities to ours is that developed by the Dokholyan group for discrete molecular dynamics simulations [18].
2 Methods
Our model belongs to the class of implicit solvent allatom models with torsional degrees of freedom. All geometrical parameters, like bond lengths and bond angles, are as described earlier [5].
The first term, E_{loc}, contains local interactions between atoms separated by only a few covalent bonds. The other three terms are nonlocal in character: E_{ev} represents excludedvolume effects, E_{hb} is a hydrogenbond potential, and E_{sc} contains residuespecific interactions between pairs of sidechains. Next we describe the precise form of these four terms. Energy parameters are given in a unit called eu. The factor for conversion from eu to kcal/mol will be determined in the next section, by calibration against the experimental melting temperature for one of the peptides studied, the Trpcage.
2.1 Local potential
The local potential can be divided into two backbone terms, and , and one sidechain term, . In describing the potential, the concept of a peptide unit is useful. A peptide unit consists of the backbone C'O group of one residue and the backbone NH group of the next residue.

The potential represents interactions between partial charges of neighboring peptide units along the chain. It is given by(2)
where the outer sum runs over all pairs of nearestneighbor peptide units and each of the two inner sums runs over atoms in one peptide unit (if the N side of the peptide unit is proline the sum runs over only C' and O). The partial charge q_{ i }is taken as ± 0.42 for C' and O atoms and ± 0.20 for H and N atoms. The parameter is set to 6 eu, corresponding to a dielectric constant of ϵ_{r} ≈ 41. Two peptide units that are not nearest neighbors along the chain interact through hydrogen bonding (see below) rather than through the potential .

The term provides an additional OO and HH repulsion for neighboring peptide units, unless the residue flanked by the two peptide units is a glycine. This repulsion is added to make doubling of hydrogen bonds less likely. Glycine has markedly different backbone energetics compared to other residues. The lack of C_{ β }atom makes glycine more flexible. However, the observed distribution of Ramachandran ϕ, ψ angles for glycine in PDB structures [19] is not as broad as simple steric considerations would suggest. provides an energy penalty for glycine ψ values around ± 120^{°}, which are sterically allowed but relatively rare in PDB structures.
The full expression for is(3)where = 1.2 eu, = 0.15 eu, I is a residue index, and(4)(5)(6)The function f(u_{ I }) is positive if the H_{ I }H_{I+1 }distance, d(H_{ I }, H_{I+1}), is smaller than both of the H_{ I }N_{I+1 }and N_{ I }H_{I+1 }distances, and zero otherwise. This term thus provides an energy penalty when H_{ I }and H_{I+1 }are exposed to each other (it is omitted if residue I or I + 1 is a proline). Similarly, f(v_{ I }) is positive when O_{ I }and O_{I+1 }are exposed to each other.

is an explicit torsion angle potential for sidechain angles, χ_{ i }. Many sidechain angles display distributions resembling what one would expect based on simple steric considerations. The use of the torsion potential is particularly relevant for χ_{2} in asparagine and aspartic acid and χ_{3} in glutamine and glutamic acid. The torsion potential is defined as(7)
where and n_{ i }are constants. Each sidechain angle χ_{ i }belongs to one of four classes associated with different values of and n_{ i }(see Table 2).
Classification of sidechain angles, χ_{ i }
Residue  χ _{1}  χ _{2}  χ _{3}  χ _{4} 

Ser, Cys, Thr, Val  I  
Ile, Leu  I  I  
Asp, Asn  I  IV  
His, Phe, Tyr, Trp  I  III  
Met  I  I  II  
Glu, Gln  I  I  IV  
Lys  I  I  I  I 
Arg  I  I  I  III 
2.2 Excluded volume
where the summation is over all pairs of atoms with a nonconstant separation, κ_{ev} = 0.10 eu, and σ_{ i }= 1.77, 1.75, 1.53, 1.42 and 1.00 Å for S, C, N, O and H atoms, respectively. The parameter λ_{ ij }is unity for pairs connected by three covalent bonds and λ_{ ij }= 0.75 for all other pairs. To speed up the calculations, E_{ev} is evaluated using a cutoff of 4.3 λ_{ ij }Å.
2.3 Hydrogen bonding
Our potential contains an explicit hydrogenbond term, E_{hb}. All hydrogen bonds in the model are between NH and CO groups. They connect either two backbone groups or a charged sidechain (aspartic acid, glutamic acid, lysine, arginine) with a backbone group. Two neighboring peptide units, which interact through the local potential (see above), are not allowed to hydrogen bond with each other.
where σ_{hb} = 2.0 Å. A 4.5 Å cutoff is used for u(r).
2.4 Sidechain potential
Here the sums run over residue pairs IJ, and are contact measures that take values between 0 and 1, and and are energy parameters.
The parameter m_{ I }of the hydrophobicity potential E_{hp}
Residue  m_{ I }(eu) 

Arg  0.3 
Met, Lys  0.4 
Val  0.6 
Ile, Leu, Pro  0.8 
Tyr  1.1 
Phe, Trp  1.6 
The residues taken as charged are aspartic acid, glutamic acid, lysine and arginine. The chargecharge contact energy is – = 1.5s_{ I }s_{ J }eu, where s_{ I }and s_{ J }are the signs of the charges (± 1).
Atoms used in the calculation of the contact measure
Residue  Set of atoms (A_{ I }) 

Pro  C_{ β }, C_{ γ }, C_{ δ } 
Tyr  C_{ γ }, C_{δ1}, C_{δ2}, C_{ϵ1}, C_{ϵ2}, C_{ ζ } 
Val  C_{ β }, C_{γ1}, C_{γ2} 
Ile  C_{ β }, C_{γ1}, C_{γ2}, C_{ δ } 
Leu  C_{ β }, C_{ γ }, C_{δ1}, C_{δ2} 
Met  C_{ β }, C_{ γ }, S_{ δ }, C_{ ϵ } 
Phe  C_{ γ }, C_{δ1}, C_{δ2}, C_{ϵ1}, C_{ϵ2}, C_{ ζ } 
Trp  C_{ γ }, C_{δ1}, C_{δ2}, C_{ϵ3}, C_{ζ3}, C_{η2} 
Arg  C_{ β }, C_{ γ } 
Lys  C_{ β }, C_{ γ }, C_{ δ } 
where γ_{ IJ }is either 1 or 0.75. For γ_{ IJ }= 1, is, roughly speaking, the fraction of atoms in and that are in contact with some atom from the other of the two sets. A reduction to γ_{ IJ }= 0.75 makes it easier to achieve a full contact ( = 1). The value γ_{ IJ }= 0.75 is used for interactions within the group proline, phenylalanine, tyrosine and tryptophan, to make facetoface stacking of these sidechains less likely. It is also used within the group isoleucine, leucine and valine, because a full contact is otherwise hard to achieve for these pairs. In all other cases, γ_{ IJ }is unity.
Atoms used in the calculation of the contact measure
Residue  Set of atoms (A_{ I }) 

Arg  N_{ ϵ }, C_{ ζ }, N_{η1}, N_{η2} 
Lys  ^{1}H_{ ζ }, ^{2}H_{ ζ }, ^{3}H_{ ζ } 
Asp  O_{δ1}, O_{δ2} 
Glu  O_{ϵ1}, O_{ϵ2} 
2.5 Chain ends
Some of the sequences we study have extra groups attached at one or both ends of the chain. The groups occurring are Nterminal acetyl and succinylic acid, and Cterminal NH_{2}. When such a unit is present, the model assumes polar NH and CO groups beyond the last C_{ α }atom to hydrogen bond like backbone NH/CO groups but with the strength reduced by a factor 2 (multiplicatively). The charged group of succinylic acid interacts like a charged sidechain.
In the absence of end groups, the model assumes the N and C termini to be positively and negatively charged, respectively, and to interact like charged sidechains.
2.6 Monte Carlo details
Algorithm used and total number of elementary MC steps for all systems studied
System  Method  MC steps 

Trpcage, E6apn1  ST  10 × 1.0 × 10^{9} 
C, EK, F_{s}, GCN4tp  ST  10 × 1.0 × 10^{9} 
HPLC6  ST  10 × 3.0 × 10^{9} 
Chignolin  ST  10 × 0.5 × 10^{9} 
MBH12  ST  10 × 1.0 × 10^{9} 
GB1p  ST  10 × 2.0 × 10^{9} 
GB1m2, GB1m3  ST  10 × 1.0 × 10^{9} 
Trpzip1, trpzip2  ST  10 × 1.0 × 10^{9} 
betanova, LLM  ST  10 × 1.0 × 10^{9} 
beta3s  ST  10 × 2.0 × 10^{9} 
AB zipper  PT  64 × 3.0 × 10^{9} 
Top7CFR  PT  64 × 2.4 × 10^{9} 
GSα_{3} W  PT  64 × 3.5 × 10^{9} 
Three different conformational updates are used in the simulations: single variable updates of sidechain and backbone angles, respectively, and Biased Gaussian Steps (BGS) [27]. The BGS move is semilocal and updates up to eight consecutive backbone degrees of freedom in a manner that keeps the ends of the segment approximately fixed. The ratio of sidechain to backbone updates is the same at all temperatures, whereas the relative frequency of the two backbone updates depends on the temperature. At high temperatures the single variable update is the only backbone update used, and at low temperatures only BGS is used. At intermediate temperatures both updates are used.
The AB zipper, a twochain system, is studied using a periodic box of size (158 Å)^{3}. In addition to the conformational updates described above, the simulations of this system used rigid body translations and rotations of individual chains.
Our simulations are performed using the open source C++package PROFASI [28]http://cbbp.thep.lu.se/activities/profasi/. Future public releases of PROFASI will include an implementation of the force field described here. While this force field has been implemented in PROFASI in an optimized manner, this optimization does not involve a parallel evaluation of the potential on many processors. Therefore, in our simulations the number of processors used is the same as the number of MC trajectories generated. For a typical small peptide, a trajectory of the length as given in Table 6 takes ~18 hours to generate on an AMD Opteron processor with ~2.0 GHz clock rate. For the largest system studied, GSα_{3} W, the simulations, with a proportionately larger number of MC updates, take ~10 days to complete.
2.7 Analysis
 1.
αhelix content, h. A residue is defined as helical if its Ramachandran angle pair is in the region 90° <ϕ < 30°, 77° <ψ < 17°. Following [29], a stretch of n > 2 helical residues is said to form a helical segment of length n  2. For an end residue that is not followed by an extra end group, the (ϕ, ψ) pair is poorly defined. Thus, for a chain with N residues, the maximum length of a helical segment is N  4, N  3 or N  2, depending on whether there are zero, one or two end groups. The αhelix content h is defined as the total length of all helical segments divided by this maximum length.
 2.
Rootmeansquare deviation from a folded reference structure, bRMSD/RMSD/pRMSD. bRMSD is calculated over backbone atoms, whereas RMSD is calculated over all heavy atoms. All residues except the two end residues are included in the calculation, unless otherwise stated. For the case of the dimeric AB zipper, the periodic box used for the simulations has to be taken into account. The two chains in the simulation might superficially appear to be far away when they are in fact close, because of periodicity. For this case we evaluate backbone RMSD over atoms taken from both chains in the dimer, and minimize this value with respect to periodic translations. We denote this as pRMSD.
 3.
Nativeness measure based on hydrogen bonds, q_{hb}. This observable has the value 1 if at most two native backbonebackbone hydrogen bonds are missing, and is 0 otherwise. A hydrogen bond is considered formed if its energy is less than 1.03 eu.
where X(T) is the quantity studied, X_{1} and X_{2} are the values of X in the two states, and K(T) is the effective equilibrium constant (R is the gas constant). In this firstorder form, K(T) contains two parameters: the melting temperature T_{m} and the energy difference ΔE. The parameters T_{m}, ΔE, X_{1} and X_{2} are determined by fitting to data.
Thermal averages and their statistical errors are calculated by using the jackknife method [30], after discarding the first 20% of each MC trajectory for thermalization.
Figures of 3D structures were prepared using PyMOL [31].
3 Results
We study a total of 20 peptide/protein systems, listed in Table 1 (amino acid sequences can be found in this table). Among these, there are 17 smaller systems with 10–37 residues and 3 larger ones with ≥ 49 residues. Many of the smaller systems have been simulated by other groups, in some cases with explicit water (for a review, see [32]). Two of the three larger systems, as far as we know, have not been studied using other force fields. A study of the 67residue threehelixbundle protein GSα_{3} W using the ECEPP/3 force field was recently reported [33]. The simulations presented here use the same geometric representation and find about a hundred times the number of independent folding events, while consuming much smaller computing resources.
3.1 Trpcage and E6apn1
The Trpcage is a designed 20residue miniprotein with a compact helical structure [34]. Its NMRderived native structure (see Fig. 1) contains an αhelix and a single turn of 3_{10}helix [34]. The E6apn1 peptide was designed using the Trpcage motif as a scaffold, to inhibit the E6 protein of papillomavirus [35]. E6apn1 is three residues larger than the Trpcage but has a similar structure, except that the αhelix is slightly longer [35].
As indicated earlier, we use melting data for the Trpcage to set the energy scale of the model. For this peptide, several experiments found a similar melting temperature, T_{m} ~315 K [34, 36, 37]. In our model, the heat capacity of the Trpcage displays a maximum at RT = 0.4722 ± 0.0008 eu. Our energy unit eu is converted to kcal/mol by setting this temperature equal to the experimental melting temperature (315 K). Having done that, there is no free parameter left in the model. Other systems are thus studied without tuning any model parameter. For E6apn1, the experimental melting temperature is T_{m} ~305 K [35].
Fig. 2b shows the free energy calculated as a function of bRMSD for the Trpcage and E6apn1 at two different temperatures. The first temperature, 279 K, is well below T_{m}. Here nativelike conformations dominate and the global freeenergy minima are at 2.4 Å and 2.0 Å for the Trpcage and E6apn1, respectively. At the second temperature, 306 K, the minima are shifted to higher bRMSD. Note that these freeenergy profiles, taken near T_{m}, show no sign of a doublewell structure. Hence, these peptides do not show a genuine twostate behavior in our simulations, even though the melting curves (Fig. 2a) are well described by a twostate model, as are many experimentally observed melting curves.
3.2 The αhelices C, EK, F_{s}, GCN4tp and HPLC6
Our next five sequences form αhelices. Among these, there are large differences in helix stability, according to CD studies. The least stable are the C [38] and EK [39] peptides, which are only partially stable at T ~273 K. The original C peptide is a 13residue fragment of ribonuclease A, but the C peptide here is an analogue with two alanine substitutions and a slightly increased helix stability [40]. The EK peptide is a designed alaninebased peptide with 14 residues.
Our third αhelix peptide is the 21residue F_{s} [41], which is also alaninebased. F_{s} is more stable than C and EK [41, 42], with estimated T_{m} values of 308 K [42] and 303 K [43] from CD studies and 334 K from an IR study [44]. Even more stable is HPLC6, a winter flounder antifreeze peptide with 37 residues. CD data suggest that the helix content of HPLC6 remains nonnegligible, ~0.10, at temperatures as high as ~343 K [45]. Our fifth helixforming sequence, which we call GCN4tp, has 17 residues and is taken from a study of GCN4 coiledcoil formation [46]. Its melting behavior has not been studied, as far as we know, but its structure was characterized by NMR [46].
For the four peptides whose melting behavior has been studied experimentally, these results are in good agreement with experimental data. In particular, we find that HPLC6 indeed is more stable than F_{s} in the model, which in turn is more stable than both C and EK. The model thus captures the stability order among these peptides.
3.3 The βhairpins chignolin and MBH12
We now turn to βsheet peptides and begin with the βhairpins chignolin [47] and MBH12 [48] with 10 and 14 residues, respectively. Both are designed and have been characterized by NMR. For chignolin, T_{m} values in the range 311–315 K were reported [47], based on CD and NMR. We are not aware of any melting data for MBH12.
Because these peptides have only four native hydrogen bonds each, one may question our definition of q_{hb} (see Methods), which takes a conformation as nativelike (q_{hb} = 1) even if two hydrogen bonds are missing. Therefore, we repeated the analysis using the stricter criterion that nativelike conformations (q_{hb} = 1) may lack at most one hydrogen bond. The resulting decrease in native population, as measured by the average q_{hb}, was ~0.1 or smaller at all temperatures. Even with this stricter definition, we find native populations well above 0.5 at low temperatures for both peptides.
3.4 The βhairpins GB1p, GB1m2 and GB1m3
GB1p is the second βhairpin of the B1 domain of protein G (residues 41–56). Its folded population has been estimated by CD/NMR to be 0.42 at 278 K [49] and ~0.30 at 298 K [50], whereas a Trp fluorescence study found a T_{m} of 297 K [51], corresponding to a somewhat higher folded population. GB1m2 and GB1m3 are two mutants of GB1p with significantly enhanced stability [50]. At 298 K, the folded population was found to be 0.74 ± 0.05 for GB1m2 and 0.86 ± 0.03 for GB1m3, based on CD and NMR measurements [50]. It was further estimated that T_{m} = 320 ± 2 K for GB1m2 and T_{m} = 333 ± 2 K for GB1m3 [50].
All these three peptides are believed to adopt a structure similar to that GB1p has as part of the protein G B1 domain (PDB code 1GB1). This part of the full protein contains seven backbonebackbone hydrogen bonds. These hydrogen bonds are the ones we consider when evaluating q_{hb} for these peptides.
These results show that, in the model, the apparent folded populations of these peptides depend quite strongly on the observable studied. Our E_{hp}based results agree quite well with experimental data, especially for GB1m2 and GB1m3, whereas our q_{hb} results consistently give lower folded populations for all peptides. The stability order is the same independent of which of the two observables we study, namely GB1p < GB1m2 < GB1m3, which is the experimentally observed order.
The stability difference between GB1m2 and GB1m3 is mainly due to chargecharge interactions. In our previous model [6], these interactions were ignored, and both peptides had similar stabilities. The present model splits this degeneracy. Moreover, the magnitude of the splitting, which sensitively depends on the strength of the chargecharge interactions, is consistent with experimental data.
3.5 The βhairpins trpzip1 and trpzip2
The 12residue trpzip1 and trpzip2 are designed βhairpins, each containing two tryptophans per βstrand [52]. The only difference between the two sequences is a transposition of an aspargine and a glycine in the hairpin turn. CD measurements suggest that trpzip1 and trpzip2 are remarkably stable for their size, with T_{m} values of 323 K and 345 K, respectively [52]. A complementary trpzip2 study, using both experimental and computational methods, found T_{m} values to be strongly probedependent [53].
Like for the other βhairpins discussed earlier, our q_{hb}based folded populations are low compared to estimates based on CD data, whereas those based on E_{hp} are much closer to experimental data. For trpzip2, the agreement is not perfect but acceptable, given that T_{m} has been found to be strongly probedependent for this peptide [53].
3.6 Threestranded βsheets: betanova, LLM and beta3s
Betanova [54], the betanova triple mutant LLM [55] and beta3s [56] are designed 20residue peptides forming threestranded βsheets. All the three peptides are marginally stable. NMR studies suggest that the folded population at 283 K is 0.09 for betanova [55], 0.36 for LLM [55], and 0.13–0.31 for beta3s [56].
These E_{hp}based T_{m} values are high compared to the experimentally determined folded populations, especially for betanova. Note that betanova has a very low hydrophobicity. The correlation between E_{hp} and folding status is therefore likely to be weak for this peptide.
In contrast to the E_{hp}based folded populations, those based on q_{hb} agree quite well with experimental data. In this respect, the situation is the opposite to what we found for the βhairpins studied above. A possible reason for this difference is discussed below.
3.7 AB zipper
The AB zipper is a designed heterodimeric leucine zipper, composed of an acidic A chain and a basic B chain, each with 30 residues [57]. The dimer structure has been characterized by NMR, and a melting temperature of ~340 K was estimated by CD measurements (at neutral pH) [57].
3.8 Top7CFr
Top7CFr, the Cterminal fragment of the designed 93residue α/βprotein Top7 [58], is the most complex of all molecules studied here. It has both αhelix and βstrand secondary structure elements, and highly nonlocal hydrogen bonds between the N and Cterminal strands. CFr is known to form extremely stable homodimers, which retain their secondary structure till very high temperatures like 371 K and high concentrations of denaturants [59].
In [13, 14], an earlier version of our model was used to study the folding of CFr. The simulations pointed to an unexpected folding mechanism. The Nterminal strand initially folds as a nonnative continuation of the adjoining αhelix. After the other secondary structure elements form and diffuse to an approximately correct tertiary organization, the nonnative extension of the helix unfolds and frees the Nterminal residues. These residues then attach to an existing βhairpin to complete the threestranded βsheet of the native structure. Premature fastening of the chain ends in βsheet contacts puts the molecule in a deep local energy minimum, in which the folding and proper arrangement of the other secondary structure elements is hampered by large steric barriers. The above "caching" mechanism, spontaneously emerging in the simulations, accelerates folding by helping the molecule avoid such local minima.
The folding properties of CFr, including the above mentioned caching mechanism, are preserved under the current modifications of the interaction potential. The centre of the native freeenergy minimum shifts from bRMSD (all residues) of 1.7 Å as reported in [13] to about 2.2 Å. This state remains the minimum energy state, although the new energy function changes the energy ordering of the other low energy states. The runs made for this study (see Table 6) found 22 independent folding events. The freeenergy landscape observed in the simulations is rather complex with a plethora of deep local minima sharing one or more secondary structure elements with the native structure. They differ in the registry and ordering of strands and the length of the helix. Longer runs are required for the MC simulations to correctly weight these different minima. Temperature dependence of the properties of CFr can therefore not be reliably obtained from these runs.
We note that the simulations ran on twice as many processors but were only about one sixth the length of those used for [13], in which 15 independent folding events were found. The improved efficiency is partly due to the changes in the energy function presented here, and partly due to the optimization of the parallel tempering described in [26].
3.9 GSα_{3}W
GSα_{3} W is a designed threehelixbundle protein with 67 residues [60], whose structure was characterized by NMR [61]. The stability was estimated to be 4.6 kcal/mol in aqueous solution at 298 K, based on CD data [60].
In Fig. 9a, we show how the probabilities for structures with different bRMSD vary with temperature in the simulations. Clearly, the protein makes a transition from a rather continuous distribution of bRMSD at high temperatures to a distribution dominated by three well separated clusters. Analysis of the structures at the lower temperatures shows that all three freeenergy minima consist almost exclusively of structures with all three helices of GSα_{3} W formed. The plot of the ratio of the observed helix content and the helix content of the native state, shown in Fig. 9b, further supports this idea. The average value of this ratio approaches 1 as the temperature decreases below 300 K. The specific heat curve, also shown in Fig. 9b, indicates that the formation of these structures correlates with the steepest change in energy.
The cluster with a center at bRMSD ~3 Å dominates at the lowest temperatures. The structures contributing to the cluster with ~8–9 Å bRMSD superficially look like well folded threehelix bundles. But as illustrated in the figure, the arrangement of the helices is topologically distinct from the native arrangement. The cluster seen at larger bRMSD values is broader and consists of a host of structures in which two of the helices make a helical hairpin, but the third helix is not bound to it. The unbound helix could be at either side of the chain.
According to our model therefore, the population at the lowest temperatures consists of ~80% genuinely native structures, ~10% threehelix bundles with wrong topology, and ~10% other structures with as much helix content as the native state. In order to experimentally determine the true folded population of the protein, the experimental probe must be able to distinguish the native fold from the other helix rich structures described here.
4 Discussion
The model presented here is intrinsically fast compared to many other allatom models, because all interactions are short range. By exploiting this property and using efficient MC techniques, it is possible to achieve a high sampling efficiency. We could, for example, generate more than 800 independent folding events for the 67residue GSα_{3} W. The speed of the simulations thus permits statistically accurate studies of the global freeenergy landscape of peptides and small proteins.
In developing this potential, a set of 17 peptides with 10–37 residues was studied. The peptides were added to this set one at a time. To fold a new sequence sometimes required finetuning of the potential, sometimes not. A change was accepted only after testing the new potential on all previous sequences in the set. In its final form, the model folds all 17 sequences to structures similar to their experimental structures, for one and the same choice of potential parameters.
Also important is the stability of the peptides. A small polypeptide chain is unlikely to be a clear twostate folder, and therefore its apparent folded population will generally depend on the observable studied. For βsheet peptides, we used the hydrophobicity energy E_{hp} and the hydrogen bondbased nativeness measure q_{hb} to monitor the melting behavior. The extracted T_{m} values indeed showed a clear probe dependence; the E_{hp}based value was always larger than that based on q_{hb}. For the βhairpins studied, we found a good overall agreement between our E_{hp}based results and experimental data. For the threestranded βsheets, instead, the q_{hb} results agreed best with experimental data. The reason for this difference is unclear. One contributing factor could be that interactions between aromatic residues play a more important role for the βhairpins studied here than for the threestranded βsheets. These interactions may influence spectroscopic signals and are part of E_{hp}. Probedependent T_{m} values have also been obtained experimentally, for example, for trpzip2 [53].
The probe dependence makes the comparison with experimental data less straightforward. Nevertheless, the results presented clearly show that the model captures many experimentally observed stability differences. In particular, among related peptides, the calculated order of increasing thermal stability generally agrees with the experimental order, independent of which of our observables we use.
It is encouraging that the model is able to fold these 17 sequences. However, there is no existing model that will fold all peptides, and our model is no exception. Two sequences that we unsuccessfully tried to fold are the βhairpins trpzip4 and U_{16}, both with 16 residues. Trpzip4 is a triple mutant of GB1p with four tryptophans [52]. For trpzip4, our minimum energy state actually corresponded to the NMRderived native state [52], but the population of this state remained low at the lowest temperature studied (~14% at 279 K, as opposed to an estimated T_{m} of 343 K in experiments [52]). U_{16} is derived from the Nterminal βhairpin of ubiquitin [62]. It has a shortened turn and has been found to form a βhairpin with nonnative registry [62]. In our simulations, this state was only weakly populated (~8% at 279 K, as opposed to an estimated ~80% at 288 K [62]). Instead, the main freeenergy minima corresponded to the two βhairpin states with the registry of native ubiquitin, one with native hydrogen bonds and the other with the complementary set of hydrogen bonds.
Our calibration of the potential relies on experimental data with nonnegligible uncertainties, on a limited number of peptides. It is not evident that this potential will be useful for larger polypeptide chains. Therefore, as a proofofprinciple test, we also studied three larger systems, with very good results. Our simulations showed that, without having to adjust any parameter, the model folds these sequences to structures consistent with experimental data. Having verified this, it would be interesting to use the model to investigate the mechanisms by which these systems selfassemble, but such an analysis is beyond the scope of this article. The main purpose of our present study of these systems was to demonstrate the viability of our calibration approach.
The potential can be further constrained by confronting it with more accurate experimental data and data on new sequences. The challenge in this process is to ensure backward compatibility – new constraints should be met without sacrificing properties already achieved.
5 Conclusion
We have described and tested an implicit solvent allatom model for protein simulations. The model is computationally fast and yet able to capture structural and thermodynamic properties of a diverse set of sequences. Its computational efficiency greatly facilitates the study of folding and aggregation problems that require exploration of the full freeenergy landscape. A program package, called PROFASI [28], for single and multichain simulations with this model is freely available to academic users.
Declarations
Acknowledgements
We thank Stefan Wallin for suggestions on the manuscript. This work was in part supported by the Swedish Research Council. The simulations of the larger systems were performed at the John von Neumann Institute for Computing (NIC), Research Centre Jülich, Germany.
Authors’ Affiliations
References
 Uversky VN: Protein Sci. 2002, 11: 739756.View ArticleGoogle Scholar
 Dyson HJ, Wright PE: Nat Rev Mol Cell Biol. 2005, 6: 197208.View ArticleGoogle Scholar
 Yoda T, Sugita Y, Okamoto Y: Chem Phys. 2004, 307: 269283.ADSView ArticleGoogle Scholar
 Shell MS, Ritterson R, Dill KA: J Phys Chem. 2008, B 112: 68786886.View ArticleGoogle Scholar
 Irbäck A, Samuelsson B, Sjunnesson F, Wallin S: Biophys J. 2003, 85: 14661473.View ArticleGoogle Scholar
 Irbäck A, Mohanty S: Biophys J. 2005, 88: 15601569.View ArticleGoogle Scholar
 Cheon M, Chang I, Mohanty S, Luheshi LM, Dobson CM, Vendruscolo M, Favrin G: PLoS Comput Biol. 2007, 3: e173ADSView ArticleGoogle Scholar
 Irbäck A, Mitternacht S: Proteins. 2008, 71: 207214.View ArticleGoogle Scholar
 Li D, Mohanty S, Irbäck A, Huo S: PLoS Comput Biol. 2008, 4: e1000238ADSView ArticleGoogle Scholar
 Irbäck A, Mitternacht S, Mohanty S: Proc Natl Acad Sci USA. 2005, 102: 1342713432.ADSView ArticleGoogle Scholar
 Mitternacht S, Luccioli S, Torcini A, Imparato A, Irbäck A: Biophys J. 2009, 96: 429441.View ArticleGoogle Scholar
 Mohanty S, Hansmann UHE: Biophys J. 2006, 91: 35733578.View ArticleGoogle Scholar
 Mohanty S, Meinke JH, Zimmermann O, Hansmann UHE: Proc Natl Acad Sci USA. 2008, 105: 80048007.ADSView ArticleGoogle Scholar
 Mohanty S, Hansmann UHE: J Phys Chem. 2008, B 112: 1513415139.View ArticleGoogle Scholar
 Ponder JW, Case DA: Adv Protein Chem. 2003, 66: 2785.View ArticleGoogle Scholar
 Hubner IA, Deeds EJ, Shakhnovich EI: Proc Natl Acad Sci USA. 2005, 102: 1891418919.ADSView ArticleGoogle Scholar
 Herges T, Wenzel W: Phys Rev Lett. 2005, 94: 018101ADSView ArticleGoogle Scholar
 Ding F, Tsao D, Nie H, Dokholyan NV: Structure. 2008, 16: 10101018.View ArticleGoogle Scholar
 Hovmöller S, Zhou T, Ohlsson T: Acta Cryst. 2002, D 58: 768776.Google Scholar
 Miyazawa S, Jernigan RL: J Mol Biol. 1996, 256: 623644.View ArticleGoogle Scholar
 Li H, Tang C, Wingreen NS: Phys Rev Lett. 1997, 79: 765768.ADSView ArticleGoogle Scholar
 Lyubartsev AP, Martsinovski AA, Shevkunov SV, VorontsovVelyaminov PN: J Chem Phys. 1992, 96: 17761783.ADSView ArticleGoogle Scholar
 Marinari E, Parisi G: Europhys Lett. 1992, 19: 451458.ADSView ArticleGoogle Scholar
 Swendsen RH, Wang JS: Phys Rev Lett. 1986, 57: 26072609.ADSMathSciNetView ArticleGoogle Scholar
 Hukushima K, Nemoto K: J Phys Soc (Jap). 1996, 65: 16041608.ADSView ArticleGoogle Scholar
 Meinke J, Mohanty S, Nadler W: Manuscript in preparation. 2009Google Scholar
 Favrin G, Irbäck A, Sjunnesson F: J Chem Phys. 2001, 114: 81548158.ADSView ArticleGoogle Scholar
 Irbäck A, Mohanty S: J Comput Chem. 2006, 27: 15481555.View ArticleGoogle Scholar
 García AE, Sanbonmatsu KY: Proc Natl Acad Sci USA. 2002, 99: 27822787.ADSView ArticleGoogle Scholar
 Miller RG: Biometrika. 1974, 61: 115.MATHMathSciNetGoogle Scholar
 DeLano WL: 2002, San Carlos, CA: DeLano Scientific
 Gnanakaran S, Nymeyer H, Portman J, Sanbonmatsu KY, García AE: Curr Opin Struct Biol. 2003, 13: 168174.View ArticleGoogle Scholar
 Meinke J, Hansmann UHE: 2009
 Neidigh JW, Fesinmeyer RM, Andersen NH: Nat Struct Biol. 2002, 9: 425430.View ArticleGoogle Scholar
 Liu Y, Liu Z, Androphy E, Chen J, Baleja JD: Biochemistry. 2004, 43: 74217431.View ArticleGoogle Scholar
 Qiu L, Pabit SA, Roitberg AE, Hagen SJ: J Am Chem Soc. 2002, 124: 1295212953.View ArticleGoogle Scholar
 Streicher WW, Makhatadze GI: Biochemistry. 2007, 46: 28762880.View ArticleGoogle Scholar
 Bierzynski A, Kim PS, Baldwin RL: Proc Natl Acad Sci USA. 1982, 79: 24702474.ADSView ArticleGoogle Scholar
 Scholtz J, Barrick D, York E, Stewart J, Baldwin R: Proc Natl Acad Sci USA. 1995, 92: 185189.ADSView ArticleGoogle Scholar
 Shoemaker KR, Kim PS, York EJ, Stewart JM, Baldwin RL: Nature. 1987, 326: 563567.ADSView ArticleGoogle Scholar
 Lockhart DJ, Kim PS: Science. 1992, 257: 947951.ADSView ArticleGoogle Scholar
 Lockhart DJ, Kim PS: Science. 1993, 260: 198202.ADSView ArticleGoogle Scholar
 Thompson PA, Eaton WA, Hofrichter J: Biochemistry. 1997, 36: 92009210.View ArticleGoogle Scholar
 Williams S, Causgrove TP, Gilmanshin R, Fang KS, Callender RH, Woodruff WH, Dyer RB: Biochemistry. 1996, 35: 691697.View ArticleGoogle Scholar
 Chakrabartty A, Ananthanarayanan VS, Hew CL: J Biol Chem. 1989, 264: 1130711312.Google Scholar
 Steinmetz MO, Jelesarov I, Matousek WM, Honnappa S, Jahnke WA, Missimer JH, Frank S, Alexandrescu AT, Kammerer RA: Proc Natl Acad Sci USA. 2007, 104: 70627067.ADSView ArticleGoogle Scholar
 Honda S, Yamasaki K, Sawada Y, Morii H: Structure. 2004, 12: 15071518.View ArticleGoogle Scholar
 Pastor MT, López de la Paz M, Lacroix E, Serrano L, PérezPayá E: Proc Natl Acad Sci USA. 2002, 99: 614619.ADSView ArticleGoogle Scholar
 Blanco F, Rivas G, Serrano L: Nat Struct Biol. 1994, 1: 584590.View ArticleGoogle Scholar
 Fesinmeyer RM, Hudson FM, Andersen NH: J Am Chem Soc. 2004, 126: 72387243.View ArticleGoogle Scholar
 Muñoz V, Thompson PA, Hofrichter J, Eaton WA: Nature. 1997, 390: 196199.ADSView ArticleGoogle Scholar
 Cochran AG, Skelton NJ, Starovasnik MA: Proc Natl Acad Sci USA. 2001, 98: 55785583.ADSView ArticleGoogle Scholar
 Yang WY, Pitera JW, Swope WC, Gruebele M: J Mol Biol. 2004, 336: 241251.View ArticleGoogle Scholar
 Kortemme T, RamírezAlvarado M, Serrano L: Science. 1998, 281: 253256.ADSView ArticleGoogle Scholar
 López de la Paz M, Lacroix E, RamírezAlvarado M, Serrano L: J Mol Biol. 2001, 312: 229246.View ArticleGoogle Scholar
 de Alba E, Santorio J, Rico M, Jimenez MA: Protein Sci. 1999, 8: 854865.View ArticleGoogle Scholar
 Marti DN, Bosshard HR: Biochemistry. 2004, 43: 1243612447.View ArticleGoogle Scholar
 Kuhlman B, Dantas G, Ireton GC, Varani G, Stoddard BL, Baker D: Science. 2003, 302: 1364ADSView ArticleGoogle Scholar
 Dantas G, Watters AL, Lunde BM, Eletr ZM, Isern NG, Roseman T, Lipfert J, Doniach S, Tompa M, Kuhlman B, Stoddard BL, Varani G, Baker D: J Mol Biol. 2006, 362: 10041024.View ArticleGoogle Scholar
 Johansson JS, Gibney BR, Skalicky JJ, Wand AJ, Dutton PL: J Am Chem Soc. 1998, 120: 38813886.View ArticleGoogle Scholar
 Dai QH, Thomas C, Fuentes EJ, Blomberg MRA, Dutton PL, Wand AJ: J Am Chem Soc. 2002, 124: 1095210953.View ArticleGoogle Scholar
 Jourdan M, GriffithsJones SR, Searle MS: Eur Biophys J. 2000, 267: 35393548.Google Scholar
Copyright
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.