Stochastic reconstruction of protein structures from effective connectivity profiles
© Wolff et al. 2008
Received: 24 September 2008
Accepted: 26 November 2008
Published: 26 November 2008
We discuss a stochastic approach for reconstructing the native structures of proteins from the knowledge of the "effective connectivity", which is a one-dimensional structural profile constructed as a linear combination of the eigenvectors of the contact map of the target structure. The structural profile is used to bias a search of the conformational space towards the target structure in a Monte Carlo scheme operating on a Cα-chain of uniform, finite thickness. Structure information thus enters the folding dynamics via the effective connectivity, but the interaction is not restricted to pairs of amino acids that form native contacts, resulting in a free energy landscape which does not rely on the assumption of minimal frustration. Moreover, effective connectivity vectors can be predicted more readily from the amino acid sequence of proteins than the corresponding contact maps, thus suggesting that the stochastic protocol presented here could be effectively combined with other current methods for predicting native structures.
PACS codes: 87.14.Ee.
The challenges presented by the protein folding problem have been approached from a wide range of theoretical angles. Computationally convenient representations of protein structures such as lattice models [1, 2] or Gō-models [3, 4] have the advantage of simplicity and of capturing some of the more universal properties of the protein folding process. However, lattice models severely restrict the conformation space, and Gō-models consider primarily interactions between pairs of amino acids that are in contact in the native structure. This latter approach is inspired by the "minimal frustration" view, in which the assumption is made that non-native interactions play a relatively minor role in shaping the free energy landscape of proteins. As an alternative, sophisticated force fields that employ all-atom representations of protein structures provide results in good agreement with a range of experimental observations, but are computationally demanding and thus tend to be restricted to the study of the faster among the dynamical processes experienced by proteins.
To investigate whether an alternative trade-off between simplicity and accuracy can be found, we adopt an approach based on the use of structural profiles [7–11]. We begin by considering reconstruction from exact structural profiles, a prerequisite to the investigation of folding energy landscapes.
Reconstruction as discussed in this study always operates on an explicit structure description, and thus a successful "reconstruction" is equivalent to a folding process reaching the native structure. In this sense, our approach is similar to Gō-models in that it uses information derived from the folded structure, but it differs in the important respect that interactions are non-specific, i.e. there are interactions between all residues, not only between those that form native contacts. As is the case for Gō-models, an agreement between the dynamics of folding found experimentally and computationally would imply that the native structure of a protein largely dictates the path of folding [5, 12].
Another important reason for exploring the use of structural profiles is their correlation with (optimal) sequence hydrophobicity, which allows the profile to be predicted from sequence alone (without prior prediction of contact maps) with relatively high accuracy [13, 14]. These results open the possibility of using predicted structural profiles to aid, in turn, the prediction of the three-dimensional coordinates of the native states of proteins. One of the major problems in this type of approach, however, is that structural profiles can be predicted with only limited accuracy. It is therefore key to establish procedures that are in principle able to determine three-dimensional conformations from approximate, or noisy, structural profiles. Stochastic methods are the most promising ones for solving this problem. In this work we take a step towards establishing this approach by studying the reconstruction of three-dimensional structures from structural profiles using a Monte Carlo procedure for a set of representative small proteins. A major advantage over our previous work is achieved by using a more general structural profile and by restricting the profile to amino acids that form cooperative contacts and, in this sense, show protein-like behaviour, resulting in a more reliable reconstruction.
Even though our method is not meant to perform structure predictions on its own, the results presented here suggest that it might be convenient to incorporate predictions of the effective connectivity vectors as additional information into methods for predicting the native structures of proteins, in particular those based on the molecular fragment replacement procedure [16–18], which are currently the most effective ones for achieving this goal.
Contact maps and effective connectivity
The largest contribution to the effective connectivity (EC) comes from the principal eigenvector (PE), the eigenvector v(1) belonging to the largest eigenvalue λ(1), and for single-domain proteins the correlation between PE and EC is very high. Since single-domain proteins populate non-compact conformations during the folding process, the use of the more general EC is preferable to that of the PE. For the small proteins investigated in this work, this distinction is of minor importance as folding is likely to happen as a single cooperative event. When progressing to larger proteins in the future, folding may start independently at different sites and we expect the more general EC to fare better. Both structural profiles (PE and EC) contain information about the connectivity of each amino acid. Well-connected residues tend to have larger entries in the structural profile than those connected to fewer residues. This fact also explains the correlation with hydrophobicity, as residues with many contacts are buried inside the protein fold (see discussion in ).
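As an illustration of the quantities discussed above, the following minimal sketch builds a binary contact map from Cα coordinates and extracts its principal eigenvector. The 8.5 Å threshold, the minimum sequence separation of three residues, and the function names `contact_map` and `principal_eigenvector` are assumptions made for this example and are not taken from the text. The weights that combine the remaining eigenvectors into the EC are not given in this section, so only the PE is computed.

```python
import numpy as np


def contact_map(coords, threshold=8.5, min_separation=3):
    """Binary contact map from C-alpha coordinates (array of shape n x 3).

    Both the distance threshold (in Angstrom) and the minimum sequence
    separation are illustrative assumptions, not values from the paper.
    """
    coords = np.asarray(coords, dtype=float)
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    idx = np.arange(len(coords))
    far_in_sequence = np.abs(idx[:, None] - idx[None, :]) >= min_separation
    return ((dist < threshold) & far_in_sequence).astype(float)


def principal_eigenvector(cmap):
    """Return (largest eigenvalue, corresponding unit eigenvector).

    The contact map is symmetric, so numpy.linalg.eigh applies; it returns
    eigenvalues in ascending order, so the last column is the PE.
    """
    eigenvalues, eigenvectors = np.linalg.eigh(cmap)
    v = eigenvectors[:, -1]
    # Fix the arbitrary overall sign so that the entries are non-negative.
    if v.sum() < 0:
        v = -v
    return eigenvalues[-1], v
```

With this convention, residues that take part in many contacts receive larger PE entries, in line with the connectivity interpretation given above.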
Restriction to cooperative contacts
Protein structures can be identified by their respective structural profile. For the PE, this matching has been shown to be unique for a set of representative proteins up to 120 residues in length, and we expect the EC, too, to uniquely determine the structure. In fact, the EC has been successfully used to perform structural alignments of proteins. In a stochastic reconstruction approach, however, the ruggedness of the landscape corresponding to a cost function based on the difference of structural profiles may pose severe difficulties. In previous work we found that compact structures without discernible secondary structure often resulted in dead ends in Monte Carlo simulations. Since the EC is more suitable for describing folded structures, we restrict the computation of the EC at any time step in the simulation to those parts of the structure that exhibit cooperative contacts (defined below) characteristic of secondary structure elements. The target EC is computed with the same restrictions but with respect to the target structure.
The typical contact pattern of α-helices consists of successive contacts between amino acids i and i + A with A = 3 or 4. To assign an α-helix we require at least four such successive contacts with a contact threshold of 8.5 Å. Additionally, positive chirality is required, $\mathbf{r}_{i-1,i} \cdot (\mathbf{r}_{i,i+1} \times \mathbf{r}_{i+1,i+2}) > 0$. β-sheets are identified for individual residues by detecting contact patterns of i and i + A (A ≥ 5, parallel β-sheets) or i and B - i (B ≥ 7, anti-parallel β-sheets) with at least four consecutive contacts and fixed A, B. To make up for the additional chirality condition imposed on helices, the contact threshold for β-sheets is set to 7 Å.
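The helix criterion described above can be sketched as follows. This is one possible reading rather than the authors' implementation: the text does not say at which residues the chirality condition is evaluated or which residues of a run are finally labelled helical, so the sketch assumes that the triple product must be positive for every position inside the run and that all residues spanned by the run are labelled. The names `helix_residues` and `ideal_helix` are hypothetical.

```python
import numpy as np


def helix_residues(coords, threshold=8.5, run=4):
    """Flag residues that belong to a run of cooperative helical contacts.

    A run consists of `run` successive contacts (i+k, i+k+A), k = 0..run-1,
    with fixed A = 3 or 4 and distances below `threshold` (Angstrom).
    Assumption: chirality must be positive at every position inside the run.
    """
    coords = np.asarray(coords, dtype=float)
    n = len(coords)
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    bonds = coords[1:] - coords[:-1]  # bonds[k] is the vector from k to k+1
    # chir[j] = r_{j-1,j} . (r_{j,j+1} x r_{j+1,j+2}), defined for 1 <= j <= n-3
    chir = np.full(n, np.nan)
    for j in range(1, n - 2):
        chir[j] = np.dot(bonds[j - 1], np.cross(bonds[j], bonds[j + 1]))

    is_helix = np.zeros(n, dtype=bool)
    for sep in (3, 4):
        for i in range(n - sep - run + 1):
            end = i + run - 1 + sep  # last residue touched by the run
            contacts_ok = all(
                dist[i + k, i + k + sep] < threshold for k in range(run)
            )
            chir_ok = all(chir[j] > 0 for j in range(i + 1, end - 1))
            if contacts_ok and chir_ok:
                is_helix[i:end + 1] = True
    return is_helix


def ideal_helix(n, radius=2.3, rise=1.5, turn_deg=100.0):
    """Idealised right-handed C-alpha helix, used only as a test input."""
    t = np.arange(n)
    angle = np.deg2rad(turn_deg) * t
    return np.column_stack(
        [radius * np.cos(angle), radius * np.sin(angle), rise * t]
    )
```

Applied to `ideal_helix(12)`, every residue is flagged, whereas the mirror image of the same chain is rejected by the chirality condition. An analogous scan with the β-sheet patterns and the 7 Å threshold would follow the same structure.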
Energy function for Monte Carlo simulations
where $r_{i,i+1,i+2}$ is the radius of the circle defined by residues i, i + 1 and i + 2.
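The radius of the circle through three consecutive residues can be computed with the standard circumradius formula R = abc / (4 × area). The sketch below shows only this geometric step; how the radius enters the energy function is not recoverable from this section, and the function names `circumradius` and `local_radii` are assumptions made for illustration.

```python
import numpy as np


def circumradius(p1, p2, p3):
    """Radius of the circle through three points in 3D.

    Uses R = a*b*c / (4 * area); collinear points give an infinite radius.
    """
    p1, p2, p3 = (np.asarray(p, dtype=float) for p in (p1, p2, p3))
    a = np.linalg.norm(p2 - p3)
    b = np.linalg.norm(p1 - p3)
    c = np.linalg.norm(p1 - p2)
    # The norm of the cross product equals twice the triangle area.
    twice_area = np.linalg.norm(np.cross(p2 - p1, p3 - p1))
    if twice_area == 0.0:
        return float("inf")
    return a * b * c / (2.0 * twice_area)


def local_radii(coords):
    """Circumradius for every consecutive triplet (i, i+1, i+2) of a chain."""
    coords = np.asarray(coords, dtype=float)
    return np.array(
        [
            circumradius(coords[i], coords[i + 1], coords[i + 2])
            for i in range(len(coords) - 2)
        ]
    )
```

A small local radius indicates a sharp bend of the chain at that triplet, while a straight segment yields an infinite radius.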
Here, NH is the number of helix residues and NH2 the number of helix residues in close contact. By doing so, helices on average receive neither an energy penalty nor a reward, and their ratio to β-sheets is not shifted.
The temperature was set between 0.7 and 0.9, in units of the above energy scale, and kept fixed during folding simulations.
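For orientation, a fixed-temperature Monte Carlo run can be sketched as below. The acceptance rule is assumed to be the standard Metropolis criterion, which this section does not state explicitly, and the `energy` and `propose` callables are placeholders supplied by the caller; they are not the energy function or the move set of this work.

```python
import math
import random


def metropolis_accept(delta_e, temperature, rng=random):
    """Accept a move with probability min(1, exp(-delta_e / temperature))."""
    if delta_e <= 0.0:
        return True
    return rng.random() < math.exp(-delta_e / temperature)


def run_fixed_temperature(state, energy, propose, temperature, n_steps,
                          rng=random):
    """Run n_steps Monte Carlo steps at constant temperature.

    `energy(state)` returns a float; `propose(state, rng)` returns a new
    candidate state without modifying the current one. Both are
    user-supplied placeholders.
    """
    current_e = energy(state)
    for _ in range(n_steps):
        candidate = propose(state, rng)
        candidate_e = energy(candidate)
        if metropolis_accept(candidate_e - current_e, temperature, rng):
            state, current_e = candidate, candidate_e
    return state, current_e
```

For example, `run_fixed_temperature(10.0, lambda x: x * x, lambda x, rng: x + rng.uniform(-0.5, 0.5), 0.8, 5000)` relaxes a toy one-dimensional variable towards the minimum of a quadratic energy at a temperature inside the quoted range.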
Results and discussion
Analysis of successful reconstructions
We have presented a stochastic scheme to reconstruct the three-dimensional structures of proteins from the knowledge of their effective connectivity vector. We have demonstrated that in its current implementation this method is rather effective for proteins in the all-α fold class but shows limitations for more complex proteins with non-local cooperative contacts. Since the stochastic method employed here was based on Monte Carlo simulations at fixed temperature, more advanced sampling techniques are expected to provide improved results, particularly for longer proteins. In addition, further improvements might be achieved by enhancing the sampling of long-range cooperative contacts.
The possibility of using effective connectivities to bias the sampling towards native states opens the way for investigating folding dynamics using the description of non-specific interactions that we have discussed here. In addition, the structural profile can discriminate the correct fold from very similar structures, as is necessary in folding, and can be predicted with fairly good accuracy. This suggests incorporating it into existing powerful tools for protein structure prediction to exploit the information encoded in the profile. In future studies we will address the problem of reconstruction when the effective connectivity vectors are not known exactly but are predicted from the amino acid sequences.
We gratefully acknowledge financial support from the Deutscher Akademischer Austauschdienst, grant number D/08/08872, and the British Council, grant number ARC 1319.
- Sali A, Shakhnovich EI, Karplus M: Nature. 1994, 369: 248-251. 10.1038/369248a0.
- Dinner AR, Sali A, Smith LJ, Dobson CM, Karplus M: Trends Biochem Sci. 2000, 25: 331-339. 10.1016/S0968-0004(00)01610-8.
- Gō N, Taketomi H: Proc Natl Acad Sci USA. 1978, 75: 559-563. 10.1073/pnas.75.2.559.
- Clementi C, Nymeyer H, Onuchic JN: J Mol Biol. 2000, 298: 937-953. 10.1006/jmbi.2000.3693.
- Bryngelson JD, Wolynes PG: Proc Natl Acad Sci USA. 1987, 84: 7524-7528. 10.1073/pnas.84.21.7524.
- Karplus M, Kuriyan J: Proc Natl Acad Sci USA. 2005, 102: 6679-6685. 10.1073/pnas.0408930102.
- Bowie JU, Luthy R, Eisenberg D: Science. 1991, 253: 164-170. 10.1126/science.1853201.
- Jones DT, Taylor WR, Thornton JM: Nature. 1992, 358: 86-89. 10.1038/358086a0.
- Porto M, Bastolla U, Roman HE, Vendruscolo M: Phys Rev Lett. 2004, 92: 218101. 10.1103/PhysRevLett.92.218101.
- Bastolla U, Porto M, Roman HE, Vendruscolo M: Proteins. 2005, 58: 22-30. 10.1002/prot.20240.
- Bastolla U, Ortiz AR, Porto M, Teichert F: Proteins. 2008, 73: 872-888. 10.1002/prot.22113.
- Baker D: Nature. 2000, 405: 39-42. 10.1038/35011000.
- Vullo A, Walsh I, Pollastri G: BMC Bioinformatics. 2006, 7: 180. 10.1186/1471-2105-7-180.
- Kinjo AR, Nishikawa K: BMC Bioinformatics. 2006, 7: 401. 10.1186/1471-2105-7-401.
- Wolff K, Vendruscolo M, Porto M: Gene. 2008, 422: 47-51. 10.1016/j.gene.2008.06.004.
- Simons KT, Kooperberg C, Huang E, Baker D: J Mol Biol. 1997, 268: 209-225. 10.1006/jmbi.1997.0959.
- Bradley P, Misura KMS, Baker D: Science. 2005, 309: 1868-1871. 10.1126/science.1113801.
- Schueler-Furman O, Wang C, Bradley P, Misura K, Baker D: Science. 2005, 310: 638-642. 10.1126/science.1112160.
- Moult J, Fidelis K, Kryshtafovych A, Rost B, Hubbard T, Tramontano A: Proteins. 2007, 69 (Suppl 8): 3-9. 10.1002/prot.21767.
- Teichert F, Bastolla U, Porto M: BMC Bioinformatics. 2007, 8: 425. 10.1186/1471-2105-8-425.
- Kabsch W, Sander C: Biopolymers. 1983, 22: 2577-2637. 10.1002/bip.360221211.
- Frishman D, Argos P: Proteins. 1995, 23: 566-579. 10.1002/prot.340230412.
- Humphrey W, Dalke A, Schulten K: J Molec Graphics. 1996, 14: 33-38. 10.1016/0263-7855(96)00018-5.
- Hoang TX, Trovato A, Seno F, Banavar JR, Maritan A: Proc Natl Acad Sci USA. 2004, 101: 7960-7964. 10.1073/pnas.0402525101.
- Murzin AG, Brenner SE, Hubbard T, Chothia C: J Mol Biol. 1995, 247: 536-540.
- Chen Y, Ding F, Dokholyan N: J Phys Chem B. 2007, 111: 7432-7438. 10.1021/jp068963t.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.