On the electrostatic component of protein-protein binding free energy

Calculations of electrostatic properties of protein-protein complexes are usually done within framework of a model with a certain set of parameters. In this paper we present a comprehensive statistical analysis of the sensitivity of the electrostatic component of binding free energy (ΔΔGel) with respect with different force fields (Charmm, Amber, and OPLS), different values of the internal dielectric constant, and different presentations of molecular surface (different values of the probe radius). The study was done using the largest so far set of entries comprising 260 hetero and 2148 homo protein-protein complexes extracted from a previously developed database of protein complexes (ProtCom). To test the sensitivity of the energy calculations with respect to the structural details, all structures were energy minimized with corresponding force field, and the energies were recalculated. The results indicate that the absolute value of the electrostatic component of the binding free energy (ΔΔGel) is very sensitive to the force field parameters, the minimization procedure, the values of the internal dielectric constant, and the probe radius. Nevertheless our results indicate that certain trends in ΔΔGel behavior are much less sensitive to the calculation parameters. For instance, the fraction of the homo-complexes, for which the electrostatics was found to oppose binding, is 80% regardless of the force fields and parameters used. For the hetero-complexes, however, the percentage of the cases for which electrostatics opposed binding varied from 43% to 85%, depending on the protocol and parameters employed. A significant correlation was found between the effects caused by raising the internal dielectric constant and decreasing the probe radius. Correlations were also found among the results obtained with different force fields. However, despite of the correlations found, the absolute ΔΔGel calculated with different force field parameters could differ more than tens of kcal/mol in some cases. Set of rules of obtaining confident predictions of absolute ΔΔGel and ΔΔGel sign are provided in the conclusion section. PACS codes: 87.15.A-, 87.15. km


Introduction
Proteins are essential components of the living cell [1]. They function by interacting with other biological macromolecules, including other proteins. Therefore, understanding protein-protein interactions is a very important step in learning about cell function, which in turn requires understanding the forces and effects that influence these interactions [2][3][4][5][6][7][8][9][10]. Protein-protein binding is a complex process mostly driven by the hydrophobic effect and van der Waals interactions, with significant contribution from entropy and electrostatics. While these energies can not be easy experimentally separated, the electrostatic energy is the energy mostly affected by pH and salt concentration. The interest of modeling pH and salt dependent phenomena inspired many researchers to model the electrostatic component of the binding free energy alone [11]. Nowadays, the 3D structures of a significant number of protein-protein complexes are available, which allows researchers to model the contribution of electrostatic energy to the binding [12][13][14][15][16][17][18][19]. Experimental [20][21][22][23] and computational [24] mutagenesis studies have found that most of the ionizable residues at protein-protein interfaces contribute significantly to the binding free energy (i.e., replacement of those residues with the alanine residues critically affects the protein binding affinity). In many cases, the formation of a complex creates favorable pair-wise electrostatic interactions across the interface, as was demonstrated in a series of work on the barnase-barstar complex [17,[25][26][27][28] and on other complexes [19,[29][30][31][32][33]. Authors of several studies [7,34,35] have concluded that electrostatic interactions have a dominant contribution into total binding free energy for the complexes with small interfaces. The contribution of electrostatic energy to the binding affinity of the particular, Rap/Raf, complex was the subject of a series of investigations [36,37]. Despite the fact that all of the abovementioned studies agreed that there are many specific pair-wise electrostatic interactions across the interface, the conclusions about the role of electrostatics on binding affinity remain controversial. It was found that, in some cases, electrostatic energy favors binding, while in other cases, it opposes it. Since the electrostatic component of the binding free energy is the difference between two large and similar numbers (energy of pair-wise interactions and the desolvation penalty), the calculations outcome inevitably depends on parameters of simulations such as the force field parameters, the choice of the internal dielectric constant of proteins, and the molecular surface representation. In addition, structural refinement also affects the outcome of electrostatic calculations.
The internal dielectric constant is an important characteristic of protein's polarizability [38][39][40][41]. However, it was repeatedly shown in the literature that the dielectric constant is neither a constant nor is there an optimal value that works for all cases. Molecular dynamic simulations with implicit solvent models use a dielectric constant of either 1.0 or 2.0 [42], the molecular mechanics Poisson-Boltzmann (MMPB) method usually employs a dielectric constant of 2.0 [43,44], and pKa's calculations are usually carried out with a dielectric constant larger or equal to 4.0 [45,46]. In our study the dielectric constant will be considered a parameter, and we will vary its value to reveal the sensitivity of the calculations of the electrostatic component of binding. The other parameter that will be varied is the probe radius. As pointed out by Zhou and coworkers [17,47,48], the electrostatic component of the binding free energy is very sensitive to the method used to build the molecular surface. Recently, the effect of different presentations of the molecular surface was tested on a set of 64 mutants [47], which demonstrated that calculations done with van der Waals molecular surface give better agreement with experimental data delivered from a double mutant cycle. The charge distribution and atomic radii greatly affect numerical calculations as well [49]. Currently, there are many force fields, each having different partial charge distributions and sets of atomic radii. Three of the most popular force fields, Charmm, Amber, and OPLS will be used in this study to reveal the sensitivity of the calculations with respect to the force field parameters. Previous work has shown that optimal hydrogen bonding is crucial for correct predictions of pKa's of ionizable groups [50,51]. However, different force fields generate different hydrogen networks [52]. In addition, the calculated energies strongly depend on structural details, and many studies have been done on refined structures [53], i.e., on structures that were subjected to energy minimization. Therefore, it is desirable to know what will be the effect of minimization on the calculations of the electrostatic component of binding, if minimization is done with different force fields.
In this study we test the sensitivity of the electrostatic component of the binding free energy with respect to different force field parameters, structural relaxation, values of the internal dielectric constant, and probe radii. The study will be done on a large set of protein-protein complexes comprised of 260 hetero-and 2148 homo-complexes. Such a large dataset has never been investigated before, especially with energy minimized structures.

Protein-protein complexes used in the study and their parameters
Protein-protein hetero-complexes subjected to the study were extracted from the ProtCom [54] database (as of November 2006) http://www.ces.clemson.edu/compbio/protcom, which contains 1771 entries at 95% sequence identity level. To avoid the bias toward overrepresented complexes, the entries were purged with CD-hit [55] at 40% sequence identity level for all components of the hetero-complexes, including monomers that belong to the same protein-protein complex. This resulted in 299 structures out of which 39 structures were excluded from further consideration due to large defects in the PDB files (like large segments of missing polypeptide chains, for example), which constitute untreatable obstacles for the correct protonation of the structures. The structures of homo-complexes were taken from 40% sequence identity of the ProtCom database. Some structures were excluded because of large structural defects, which reduced our homocomplex set to 2617 structures at 40% sequence identity level.
This resulted in an initial set of 260 hetero-and 2617 homo-protein complexes, which after removing structures for which some of the computational procedures could not produce the desired accuracy was reduced to 260 hetero-and 2148 homo-protein complexes. The interfacial area was calculated with the surfv program [56] by subtracting the accessible surface area of the complex from the accessible surface area of the free monomers. The net charge of the complex and the free monomers were calculated assuming that all titratable amino acids were in their charged state.

Hydrogen placement and energy minimization
Some of the structures in our initial data set had structural defects, and thus all of the structures were subjected to the profix program from the Jackal package developed in Honig's lab (http://wiki.c2b2.columbia.edu/honiglab_public/index.php/Software: Jackal) in order to add missing atoms and/or sequence fragments. All of the complexes that had a missing segment chain longer than fifteen amino acids were removed from the set because building such long sequence stretches could introduce significant structural errors. The rest of the structures were subjected to the TINKER [57] software to add missing hydrogens using the package's pdbxyz.x and the xyzpdb.z modules with three different force field parameters: Charmm27 [58], AMBER [59], and OPLS [60]. We will refer to this set of structures as the non-minimized set.
A common approach in computing biophysical quantities using the 3D structures of biological macromolecules is to refine these structures by performing energy minimization with a particular force field. Following this strategy, we created another set of protonated structures (hereafter referred to as the minimized set) for all of the PDB files described above by running the minimize.x module of the TINKER software between running the pdbxyz.x and the xyzpdb.z modules. The minimize.x module performs energy minimization using the Limited Memory BFGS Quasi-Newton Optimization algorithm [57]. The implicit solvent was modeled using the Still Generalized Born model [61], and the internal dielectric constant was set to 1.0 to be consistent with the corresponding force field parameters [62] used in the calculations. To test the sensitivity of the results, the minimizations were done with Charmm27 [58], AMBER [59], and OPLS [60] parameters. A weak convergence criteria (RMS gradient per atom = 0.1) was applied to make computation tractable. Even in such a case, minimizing 2408 protein-protein complexes, some of which were larger than 50,000 atoms, was quite a challenge from a computational point of view (monomers were not minimized separately and their structures were taken from minimized complexes). Therefore, for these calculations we utilized a high throughput distributed computing resource, CONDOR, originally developed at the University of Wisconsin-Madison http://www.cs.wisc.edu/condor, which is available at Clemson University with more than 2,000 single CPUs of computational power.

Calculation of the electrostatic component of the binding free energy
The electrostatic component of the binding free energy (G el ) was calculated as the difference of the electrostatic free energies of the complex and of the free molecules [63]: where G el (X)is the electrostatic component of the folding energy of the complex, monomer A and monomer B, respectively (X = AB, A or B). The 3D structures of the monomers were taken from the corresponding complex, i.e. no conformational changes associated with the binding were modeled. Since the 3D structures of the bound and unbound monomers are the same, their internal mechanical energies are the same as well and do not affect the calculations. These energies were calculated as a sum of Coulombic interactions and reaction field energies, as discussed in detail in Ref. [64]. The Coulombic energy was calculated in homogeneous media with a dielectric constant equal to the internal dielectric constant of the protein. The reaction field energy was calculated as the energy of the interaction of permanent charges with induced charges at the surface of the corresponding entity -protein complex or a monomer. These energy terms were calculated with the Delphi [64,65] program using a grid size of 129. The calculations were performed as zero ionic strength. The external dielectric constant was kept at 80. The convergence criteria was rmsc = 0.0001 kT/e (see the Delphi manual for details at http://wiki.c2b2.colum bia.edu/honiglab_public/index.php/Software: Delphi).
The calculations were performed assuming that all of the Arg, Asp, Glu, and Lys residues were ionized in both free and bound states, while the His residues were considered neutral. In order to reduce the complexity of the problem, all ionizable residues were kept in their default charge state. Thus, ionization changes induced by complex formation (suggested either by experimental data [66,67] or by theoretical simulations [68][69][70][71]) were not considered.

Parameters that will be varied
Force field parameters Three force field parameters were used in this study: Charmm27 [58], AMBER [59], and OPLS [60]. These force fields are all atom force fields but were parameterized differently.

Internal dielectric constant
What the value of the internal dielectric constant ((in)) should be for a given protein is the subject of much debate in the scientific community [41]. In this study we adopt a pragmatic approach and test the sensitivity of the results at (in) = 1, 2, 4, 8, 20.

Probe radius
The molecular surface in FDPB algorithms is determined by rolling a water probe with a particular radius to find the molecular surface according to Richard's algorithm [72]. A typical probe radius is 1.4A. However, Zhou and co-workers [73] have introduced the concept of a zero probe radius, and here we will explore the effect of different probe radii on the electrostatic component of the binding free energy. The probe radius will vary taking values of R = 0.0, 0.5, 1.0, and 1.4A.

Results
Two types of complexes, hetero-and homo-complexes are the subjects of our study. We will explore how similar and how different are they with respect to their macroscopic characteristics.
Since the physical process of binding is governed by the same forces, it may be expected that their physico-chemical properties should be quite similar. Table 1 summarizes the global and interfacial properties of the complexes used in our study. It can be seen that the interfacial areas of the hetero-and homo-complexes are quite similar in terms of their maximal and minimal sizes. The mean of the distribution of the interfacial area of the hetero-complexes is about 200A larger than that of the homo-complexes. The same is observed for the number of interfacial residues; the mean of the distribution is slightly larger for the hetero-complexes. Minimum and maximum of both the net and interfacial charges are quite similar for the hetero-and homo-complexes. The main difference is observed in terms of electrostatic pairing. By the virtue that homo-complexes are made of identical monomers, all homo-complexes are made of monomers carrying the same polarity charge. In contrast, 40% of the hetero-complexes are made of monomers having an opposite net charge. The same analysis done for the interfacial charges confirms the electrostatic difference between the hetero-and homo-complexes. Only 8% of the homo-complexes form interfaces carrying opposite charges, while the charges of the interfaces are opposite in 58% of the hetero-complexes. The global and interfacial differences of the charges among the hetero-and homo-complexes suggest that electrostatics could play different roles in hetero-and homo-complexes association. Because of that, the results for the hetero-and homo-complexes will be reported separately.

Effects of the internal dielectric constant
In this study, the value of the dielectric constant was varied from one, which is the value usually used in molecular dynamics simulations with explicit water model, to twenty, which was introduced by Gilson and co-workers [46] as the best value for rigid-body pKa calculations. Further discussion of the role and optimal value of the internal dielectric constant is provided in series papers of Warshel and co-workers [41,[74][75][76]. It is well known that electrostatic energy terms, even in case of irregularly shaped objects, are roughly inversely proportional to the value of the dielectric constant. Since the electrostatic component of the binding free energy is made of two major components, Coulombic and de-solvation energies, their difference should also be The files are taken from 40% sequence identity the ProtCom database.
roughly inversely proportional to the value of the dielectric constant. The distribution of the electrostatic component of the binding free energy is shown in Fig. 1 for different values of the internal dielectric constant. These calculations were done with the Charmm27 force field parameters. It can be seen that using a low dielectric constant ((in) = 1, 2 and 4), the mean of the distribution for both the hetero-( Fig. 1a and 1b) and homo-complexes ( Fig. 1c and 1d) is shifted toward positive G el . For (in) = 1 and (in) = 2 in 94% of the non-minimized complex cases, the electrostatics is calculated to oppose binding. Minimization of the structures slightly lowers this percentage to 87%; however, in the vast majority of the cases, the calculated electrostatic component of the binding free energy still opposes binding. The corresponding numbers for homocomplexes are 91% and 85% for non-minimized and minimized complexes, respectively. These percentages are quite similar and show no significant difference between the electrostatic components of the binding free energy for the hetero-and homo-complexes. Further increasing the value of the internal dielectric constant makes G el smaller and smaller, which results in a narrower distribution, but the majority of the energies remain positive. At the largest value used in this study ((in) = 20), in 62% of the non-minimized hetero-complex cases, the electrostatic component of the binding free energy was calculated to oppose binding. This percentage drops to 43% in case of minimized hetero-complexes. Quite different percentages were calculated in case of the homo-complexes. Raising the internal dielectric constant to twenty only minimally changed the percentage of the G el calculated to favor binding; however, in at least 91% of the cases, G el still opposes binding. Energy minimization had a significant effect, lowering the percentage of cases in which the electrostatic component of the binding free energy was calculated to be positive number to 77%. However, electrostatics was still calculated to oppose binding in the vast majority of the homo-complexes.
Why were the calculated distributions of G el using the low internal dielectric constant quite similar for the hetero-and homo-complexes but significantly different using high values of the internal dielectric constant? A possible explanation is offered by the numbers reported in the last two rows of Table 1. From the results in Table 1, it can be concluded that in the vast majority of the cases, the homo-complexes form interfaces that carry the same polarity charge, while in the hetero-complex cases, more than 58% of the complexes have oppositely charged interfaces. Thus, in the vast majority of the cases, the Coulombic energy has a large positive magnitude for homocomplexes (interactions between the same polarity charges across interfaces), while this percentage is much smaller for hetero-complexes. This results in a larger amplitude of the calculated G el for homo-complexes compared with the G el calculated for the hetero-complexes.
Increasing the internal dielectric constant combined with minimizing the energy of the structures could turn some of the small positive G el calculated for the hetero-complexes into small negative G el . However, G el are large positive numbers in the homo-complexes set, and an increase of the internal dielectric constant combined with the refinement of the structures cannot dramatically change the percentage of the cases in which electrostatics was calculated to oppose binding.
The observation made for the hetero-complexes, that their structural refinement through the energy minimization of corresponding complexes results in a change of the sign of the calculated G el , deserves attention. The electrostatics is one of the components of the total energy and the total energy is minimized in the minimization protocol. There is no reason to expect that minimizing the total energy should result in the minimization of the energy components. However, apparently the electrostatic component was actually optimized in the energy minimization protocol. Perhaps this indicates that if a given energy component favors a particular energy state, the minimization of the total energy will most likely result in the enhancement of this energy component as well.
The results of this section are summarized in Fig. 2, where the means of the corresponding G el distributions are plotted against the internal dielectric constant. The calculations were done with the probe radius R = 1.4A. It can be seen that variations of the internal dielectric constant within the range 1.0-8.0 cause dramatic changes in the mean of the energy distributions for all types of complexes. In contrast, an increasing the magnitude of the internal dielectric constant above 8.0 does not cause much change in the calculated G el . Despite the changes of the distributions' mean magnitudes as the internal dielectric constant varies, the mean is always above zero for the homo-complexes and for the non-minimized hetero-complexes. The mean drops below zero only in minimized hetero-complex set when the internal dielectric constant is above 8.0.
The mean of the G el distributions calculated with at probe radius R = 1.4 A plotted as a function of the internal dielectric constant (in) Figure 2 The mean of the G el distributions calculated with at probe radius R = 1.4 A plotted as a function of the internal dielectric constant (in).

Effects of the probe radius
The probe radius in finite-difference algorithm plays an important role in determining both the molecular surface and the internal dielectric space of molecules [17,77]. A small probe radius results in a rough molecular surface and many artificial high dielectric cavities in the protein interior [17,77]. However, getting away from the physical interpretation of the probe radius value, we would like to look at it as a parameter and to investigate how it affects the electrostatic component of the binding free energy. The distribution of the electrostatic component of the binding free energy using an internal dielectric constant of 2.0 and different probe radii are shown in Fig.  3 for both hetero-and homo-complexes. It can be seen that decreasing the probe radius causes similar effects as does increasing the internal dielectric constant (compare Figs. 1 and 3). At probe radii of 1.4A and 1.0A, the distributions of both the hetero-and homo-complexes G el , with and without minimized structures are offset towards positive energies, .i.e., in the vast majority (over 85%) of cases, the calculated energies oppose binding. As the probe radius magnitude decreases and reaches 0.5A and 0.0A, the distributions move to the left, towards negative G el .
However, the distributions of G el for non-minimized hetero-and homo-complexes and for minimized homo-complexes remain mostly at positive energies. At a probe radius R = 0.0A, for 81% of the non-minimized hereto-complexes, the electrostatics was calculated to oppose binding. The percentage of the homo-complexes with positive G el was 94% and 73%, for non-minimized and minimized complexes, respectively. The only exceptions are the distributions of G el for minimized hetero-complexes at probe radii of 0.5A and 0.0A. At a probe radius of 0.0A, for 40% of the minimized hetero-complexes, the electrostatics was calculated to oppose binding. It is interesting to note that similar percentage were calculated for minimized hetero-complexes using a "standard" probe radius of 1.4A, but assigning a high internal dielectric constant of 20, resulted in 43% of the calculated energies being positive.
The observed similarities of the effects caused by increasing the internal dielectric constant and decreasing the magnitude of the probe radius inspired us to investigate a possible correlation between the G el calculated with a "standard" probe radius value but with a high internal dielectric constant and the G el calculated with a small probe radius and a "standard" internal dielectric constant. Here by "standard" we mean values that are most frequently reported in the literature. To address such a possibility, the corresponding G el for each type of dataset, heteroand homo-complexes, non-minimized and minimized, were subjected to the following procedure: G el calculated with probe radius 0.0A and 'standard" dielectric constant of 2.0 were plotted against G el calculated with probe radius 1.4A and (in) = 1, 2, 4, 8 and 20. The plot resulting to largest correlation coefficient and a slope close to 1.0 was considered as the best fit. In case of non-minimized hetero-complexes (Fig. 4a) the best fit was obtained at (in) = 8.0 (Table 2, first row). In case of non-minimized homo-complexes, the best correlation between the calculations with probe radius of 0.0A and 1.4A was obtained at (in) = 4.0 ( Fig. 4b and Table 2  The results are presented for the best correlations among G el calculated with "standard" probe radius of 1.4A versus small probe radius of 0.0 and 0.5A. The best correlations are those resulting in the best correlation coefficient and slope close to 1.0. In obtaining the best fits, the internal dielectric constant e(in) was varied ((in) = 1,2,4, 8,20) in the G el calculations with "standard" probe radius of 1.4A.
third row). The same procedure was applied to minimized structures and the results in terms of the slope of the fitting line and the correlation coefficient are shown in Table 2 for hetero-and homo-complexes. Then the procedure was repeated to find the best fit between G el calculated with slightly larger probe radius of 0.5A and 'standard" dielectric constant of 2.0 versus G el calculated with probe radius 1.4A and (in) = 1, 2, 4, 8 and 20. The best fits in terms of correlation coefficient and slope are reported in Table 2, the last four rows. It can be seen that in some cases, the correlation coefficient reaches 0.87, and in other cases, the slope of the fitting line is close to cases: for the hetero-and homo-complexes and for the non-minimized and minimized. However, the mean of the distribution of homo-complexes at R = 1.4A is much more positive than that of the hetero-complexes, and the decrease caused by lowering the radius is not enough to make it a negative number. In contrast, the change is enough to turn the sign of the mean of the minimized hetero-complexes and to make it negative.

The effects of the force field
The above results were calculated using Charmm27 force filed parameters for both calculations of the electrostatic energies and of the minimization protocol. To test the sensitivity of the results with respect to the force fields, we repeated the calculations with Amber98 and OPLS force fields. This required energy minimization of both the hetero-and homo-complexes with Amber and OPLS force fields, respectively. The results are reported in the Supplementary Materials section, where the G el calculated with Charmm27 are plotted against the corresponding G el calculated with Amber98 and OPLS, respectively. The data points were fitted with straight lines and The mean of the G el distributions calculated with internal dielectric constant (in) = 2.0 as a function of the probe radius Figure 5 The mean of the G el distributions calculated with internal dielectric constant (in) = 2.0 as a function of the probe radius.
the scopes and correlation coefficients recorded. These slopes and correlation coefficients are provided in Tables 3, 4. In addition, we calculated the difference of the G el calculated with different force fields as where X, Y stand for either the Charmm27, Amber98 or OPLS force fields. These differences were calculated for all types of complexes, both non-minimized and minimized. The parameters of the resulting distributions are reported in Tables 3, 4. The corresponding graphs are shown in Additional file 1.
It can be seen that both the correlation coefficients and the slope of the fitting lines are close to 1.0, indicating that G el s calculated with different force fields are quite similar. However, a closer look at Tables 3, 4 shows significant differences as well. The mean of G el calculated with Charmm27 is 5.8 and is 10.3 kcal/mol less positive than that of the corresponding means calculated with Amber98 in the case of non-minimized and minimized hereto-complexes, respectively. The same offset is even more pronounced when comparing the results obtained with the The numbers without parentheses are for non-minimized and the numbers in parentheses are for minimized complexes. The last two rows report the correlation coefficient and the slope of the fitting line of the graphs provided in Additional file 1. The calculations were done with probe radius 1.4A and internal dielectric constant of 2.0. The numbers without parentheses are for non-minimized and the numbers in parentheses are for minimized complexes. The last two rows report the correlation coefficient and the slope of the fitting line of the graphs provided in Additional file 1. The calculations were done with probe radius 1.4A and internal dielectric constant of 2.0.
Amber and OPLS force fields. In the hetero-complex cases, the means differ by 13.0 and 19.0 kcal/mol for non-minimized and minimized complexes, respectively. Lastly, comparing the results obtained with the Charmm27 and OPLS force fields, we observed that the mean of OPLS is less positive as compared to the mean of the energy distribution calculated with Charmm27. Thus, these results suggest that the OPLS force field resulted in less positive G el (less unfavorable energies), and the Charmm27 and Amber98 force fields resulted in the most unfavorable G el in the hetero-complex cases.
In addition to the above observations, in some exceptional cases the G el calculated with different force fields differed by more than 100 kcal/mol. Particular example is tail-associated lysozyme bound to baseplate structural protein GP27 (PDB ID 1k28) for which G el calculated with Charmm27, e(in) = 1.0 and probe radius 1.4A was 261.7 kcal/mol while it was calculated to be 391.1 kcal/mol with Amber98 force field. We have not investigated these exceptions in detail; however, the differences reported in Tables 3, 4 as "Min" and "Max" indicate such cases. In many cases, even a difference of several kcal/mol could be crucial for getting the correct biophysics of the binding.

Discussion
We have done extensive testing of the sensitivity of the electrostatic component of the binding free energy with respect to internal dielectric constant values, probe radius, and several commonly used force fields. The goal was also to reveal the general trends and to elaborate on the role of electrostatics on the binding for hetero-and homo-complexes. The study was done on a very large set of protein-protein complexes: 260 hetero-complexes and 2148 homo-complexes. Energy minimization and full-scale electrostatic calculations have never been done on such a large dataset before. This ensures that the obtained results and conclusions are statistically significant.
A common practice is to validate the results of numerical simulations against experimental data. However, it is impossible to experimentally determine the electrostatic component of the binding free energy. The binding free energy is made of a variety of energy terms whose interplay results in the observed affinity. Even the pH-dependence of the binding is not a pure electrostatic event [71]. The changes of the protonation states of ionizable groups induced by either pH changes or the binding in turn cause conformational changes and thus alter the contribution of non-electrostatic energy terms to the binding affinity as demonstrated in a recent study [71]. The most successful attempt so far to compare electrostatic calculations to experimental data was done by Dong and Zhou [47]. They carried out Poisson-Boltzmann calculations on a set of 64 mutations over six protein-protein complexes and compared their predictions with experimental data. The mutants were selected to involve charge-charge interactions across the interface of the corresponding complex and thus suggested the putative dominance of electrostatic interactions over the rest of the energy terms contributing to the binding. It was shown that the calculations done with a zero probe radius and an internal dielectric constant of 4.0 provided the closest agreement with experimental data. However, as pointed out by the authors, other contributions, such as van der Waals interactions and hydrophobic effect, may play important role [47]. Modeling the non-electrostatic contributions on such a large set of data as ours could introduce significant noise and obscure the results. That is our reasoning for not attempting to compare our calculations to cases in which the total binding free energy was experimentally measured. Proper modeling of the absolute binding free energy requires that the entropy be also accounted for, a task that is not easily achieved, especially on a large set of data.
Despite the fact that electrostatics is only one of the many forces contributing to the binding, in many cases, its involvement could be related to biologically important effects. For this reason, many researchers employ electrostatic calculations to find out the electrostatic component of the binding free energy. However, the value of the dielectric constant, the method of determining the molecular surface (probe radius), and the force field parameters are still a personal preference. Our study demonstrated that the absolute value of the calculated G el is very sensitive to all of the above-mentioned parameters. While this is expected to be the case for different internal dielectric constants and to some extent for different probe radii, the observation that the results are quite sensitive to the force field parameters deserves attention. The average difference of the G el calculated with different force fields can be as large as 20 kcal/mol and more. It seems to us that the energy minimization with the corresponding force field does not make much difference. Since the minimization protocol minimizes the total energy and not just the electrostatic component, the outcome for electrostatic energy will vary case by case. All of these observations indicate that the absolute values of the G el should be considered with precaution.
This study revealed a major difference between hetero-and homo-complexes with respect to the calculated G el . The calculated electrostatic component of the binding free energy opposes binding for approximately 80% of the homo-complex cases, despite the internal dielectric constant, probe radius, and force field. In contrast, the role of electrostatics on the binding of heterocomplexes depends on all of the factors above. Using a low internal dielectric constant and a probe radius of 1.4A, despite the force field used, most of the calculated G el opposes binding.
However, increasing the internal dielectric constant to 20 or decreasing the probe radius to R = 0.0A results in 60% of the cases having a G el favoring the binding. These findings offer a pragmatic prescription for assessing the confidence of the predicted role of G el on the binding affinity. It seems that despite the differences between the homo-and hetero-complexes, if the binding free energy is calculated to be a large positive quantity, then the conclusions could be considered independent from the choice of the parameters. However, cases in which the G el are calculated with a particular set of parameters and corresponding magnitude is a small number close to zero are more complicated. Our study indicates that the calculated G el could easily change sign upon changing the parameters of the protocol or upon switching the force field. Therefore, the conclusions made for the role of electrostatics on the binding in case of a small G el should be carefully investigated by varying the force fields and the parameters of the computational protocol.
Our study, being applied on 2408 protein-protein complexes, offers the possibility to statistically address the question of what role electrostatics play in protein-protein binding. A similar question of what role salt bridges play on protein stability was extensively studied by Tidor and co-workers [78][79][80]. It was concluded that in most cases, salt bridges destabilize proteins. They also investigated the effect of electrostatic interactions on protein binding since salt bridges are also formed across protein-protein interfaces [81]. It was found that charge interactions in barnase-barstar complexes are optimized to favor the binding [25,26]. Schreiber and co-workers [22,82,83] employed experimental and computational methods to investigate the role of electrostatic interactions on barnase-barstar association. It was shown that charge-charge interactions can be re-designed to result in tighter binding [30]. The association rate was also studied on a set 68 transient hetero-complexes, demonstrating that electrostatics plays a marginal role [84]. Recently the association rates of four protein complexes and twenty three mutants were modeled by Zhou and co-workers [85]. It was shown that the calculations with vdW surface presentation (zero probe radius) and non-linear PB equation can successfully reproduce experimental data, while the models with molecular surface presentation (probe radius 1.4A) gave nonphysical results. The role of electrostatic interactions on the stability of monomeric protein was also systematically investigated. Makhatadze and co-workers have shown that so called "complex" salt bridges, in which the anchor residue forms hydrogen bonds with two or more opposite charged residues simultaneously, contribute significantly to protein stability [86]. This observation is valid not only for buried but for surface exposed groups as well [87][88][89][90]. It terms of different types of hydrophilic group, Pace and co-workers demonstrated that Asp and Glu amino acids contribute the most to protein stability [91], however the individual contributions are strongly contextdependent [92]. These observations and the successes in re-engineering of more stable proteins and tighter complexes than the wild type, demonstrate that the current computational methods can adequately model electrostatic interactions in biological macromolecules. However, it should be pointed out that the effect of re-designing is calculated and measured in respect to a reference state, the wild type protein or protein complex. Thus, an increased stability or tighter binding due to electrostatic optimization does not necessary mean that electrostatics plays favorable role. The only conclusion that can be made is that the re-engineering makes electrostatic contribution either more favorable or less unfavorable.
The results of our study indicate that the role electrostatics have on protein-protein binding in hetero-complexes cases is the same as the role salt bridges have on protein stability; i.e., in some cases, it will favor the binding, but in other cases it will oppose the binding. In our previous study [93] we have shown that electrostatic component of the binding free energy tends to be optimized in respect to random shuffling of the amino acid sequence of the corresponding partners. It was also demonstrated that the salt dependence of the binding is not correlated with macroscopic parameters of the monomers [63,94]. Taking together these observations, the findings of above mentioned studies and the results presented in this work, it could be stated that in majority of the cases electrostatics favor hetero-complexes formation. However, the role of electrostatics for each individual case depends on the fine details at the interface as formation of "complex" salt bridges, for example. The situation is quite different for homo-complexes. As was shown, all of the computational protocols predicted that, in majority of the cases, the electrostatics opposes binding. Since it was stated that macroscopic parameters do not matter much for protein-protein binding, this is a surprising observation. However, Table 1 provides an explanation. The homocomplexes differ from hetero-complexes not only by their macroscopic parameters as being made of two monomer carrying the same charge, but their interfaces are also different. Only 8% of homo-complexes have opposite charged interfaces, while this percentage is 58% for hetero-complexes. In remaining 92% of the cases, homo-complexes are formed between interfaces carrying either the same net charge or not having net charge at all. Perhaps, formation of "complex" salt bridges [86] and favorable polar interaction is much more difficult in this case as compared with hetero-complexes.
In this work we have not considered possible protonation changes induced by the binding. It was done in order to reduce computational demand of the numerical protocol. However, protonation changes may be common phenomena in protein binding as demonstrated recently by Jensen and co-workers [95]. Our recent investigation [96] also indicates that from statistical point of view there is 70% chance that binding will cause proton uptake/release of more than 0.5 proton units at pH = 7.0. Adjusting the ionization states according to the predicted pKa's of ionizable groups will definitely make the calculated G el more favorable (or less unfavorable) compared with calculations done with default charge states. However, most of the predicted ionization changes are fractional numbers and modeling of non-integer charge changes requires ensemble approach in computing G el , a task which is quite difficult to accomplish on a set of 2408 protein complexes.
The study revealed that for a significant fraction of hetero-complexes and in vast majority of homo-complexes the electrostatics opposes binding. Since most of the protein-protein complexes in the cell are homo-complexes, the overall role of electrostatics is predicted to oppose binding. Instead, electrostatics provides the necessary specificity and steering, processes equally important for protein-protein association.

Conclusion
The analysis of the sensitivity of the G el , calculated with different force fields, internal dielectric constants, probe radius value and minimization protocols, gives us the opportunity to suggest a set of rules of calculating (a) the absolute value of G el and (b) the sign of G el : (a1) if there is no prior knowledge what the effective value of both internal dielectric constant and probe radius are for a given protein complex and a particular protocol of calculating the G el , then the absolute value of calculated G el is meaningless; (a2) provided that the effective value of both internal dielectric constant and probe radius are known for a given protein complex and a particular protocol of calculating the G el , then the absolute value of G el should be obtained as an average of G el calculated with different force fields; (a3) energy minimization does not help in obtaining consistent G el 's, since it minimizes the total energy, not just the electrostatic component. Therefore, the absolute value of G el should be obtained as an average of G el calculated with different force fields, provided that other parameters are known; (b1) if there is no prior knowledge what the effective value of both internal dielectric constant and probe radius are for a given protein complex and a particular protocol of calculating the G el , then the sign of G el (determining the role of electrostatics on binding) is meaningful only if the absolute G el calculated with either high internal dielectric constant or zero probe radius is larger than 1 kcal/ mol. In case of hetero-complexes, if G el is calculated to be positive and larger than 1 kcal/mol with (in) = 20 and probe radius 1.4A with given force field, then the probability of remaining positive in the calculations performed with different force fields is 0.99 (or 0.97 with (in) = 2). In case of homo-complexes, this probability is 0.95 (or 0.94 with (in) = 2); (b2) if there is a prior knowledge of the effective value of internal dielectric constant and probe radius, then calculations with a particular force field with and without minimization provide a good estimate of G el sign in about 90% of the cases. In case of hetero-complexes, if G el is calculated to be positive and larger than 1 kcal/mol with (in) = 20 and probe radius 1.4A with given force field, then the probability of remaining positive in the calculations performed with different dielectric constant is 0.98. Probability of change of the sign G el using different probe radii is about 0.1. In case of homo-complexes, these probabilities are 0.96 and 0.87.