Computational Study of Molecular Interactions and Design
Molecular interactions provide the essential language of biochemistry. Gene regulation, signaling, and other physiological communication processes are controlled by networks of such interactions. It is natural to examine these networks in terms of the behavior of individual interactions, which is largely dictated by structural features and random fluctuations. Because kinetic rates can be altered by environmental conditions such as temperature, and the basis for this sensitivity is largely structural, system-wide behavior is intimately tied to structural detail. Our goal is to use mathematical models to explore both structural and systems-level characterizations of protein association.
Predicting Protein-Protein and Protein-DNA Interactions
A fundamental question in biochemistry is how the structure of molecules determines their function. Predicting or determining a protein's three-dimensional fold is one piece of the puzzle. Given structures for a pair of interacting proteins, the next question is how they bind one another. This problem is just as complex as predicting a protein fold, but far fewer structures of complexes have been determined experimentally, and we have yet to find an equivalent of the fold classifications and homology-based approaches that have proven successful in folding.
Our approach to predicting the structure of molecular complexes is grounded in mathematics and physics. We have developed an energy function that can distinguish native binding configurations from non-native ones, and we couple it with numerical methods for finding its global minimum. We have successfully employed underestimation and basin-finding schemes toward this objective. Starting with a sample of local minima, or stable states, from the energy landscape, we fit a function closely beneath them. For basin finding, a multi-funnel Gaussian function seems to work well, whereas for our later optimizations a simpler quadratic function suffices. Applied iteratively, this type of fitting, along with subsequent domain trimming, has successfully identified near-native bound states for many examples.
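The fit-and-trim loop described above can be illustrated with a minimal one-dimensional sketch. Everything in it is an assumption made for illustration: the toy energy values, the function names fit_underestimator and trim_domain, the single-funnel quadratic form u(x) = c + a(x - x0)^2, and the brute-force grid search, which stands in here for whatever optimization the actual docking codes use.

```python
import math

def fit_underestimator(points, centers, curvatures):
    """Fit u(x) = c + a*(x - x0)**2 beneath sampled local minima.

    points is a list of (x, energy) pairs. For each candidate (a, x0) the
    offset c is set to the largest value that keeps u(x_i) <= E_i for all
    samples; the candidate with the smallest total gap sum(E_i - u(x_i))
    is returned as (gap, a, x0, c).
    """
    best = None
    for x0 in centers:
        for a in curvatures:
            c = min(e - a * (x - x0) ** 2 for x, e in points)
            gap = sum(e - (c + a * (x - x0) ** 2) for x, e in points)
            if best is None or gap < best[0]:
                best = (gap, a, x0, c)
    return best

def trim_domain(points, x0, lo, hi, shrink=0.5):
    """Shrink the interval [lo, hi] around x0 and keep the minima inside it."""
    half = 0.5 * (hi - lo) * shrink
    new_lo, new_hi = max(lo, x0 - half), min(hi, x0 + half)
    kept = [(x, e) for x, e in points if new_lo <= x <= new_hi]
    return kept, new_lo, new_hi

# Hypothetical "local minima": a funnel centered at x = 1 plus small roughness.
minima = [(x, 0.5 * (x - 1.0) ** 2 + 0.1 * abs(math.sin(7.0 * x)))
          for x in (-3.0 + 0.5 * i for i in range(13))]
centers = [-3.0 + 0.25 * i for i in range(25)]   # candidate funnel centers
curvatures = [0.1 * k for k in range(11)]        # candidate curvatures, a >= 0

gap, a, x0, c = fit_underestimator(minima, centers, curvatures)
kept, lo, hi = trim_domain(minima, x0, -3.0, 3.0)
print("underestimator minimum near x =", x0, "; trimmed domain:", (lo, hi))
```

In an iterative scheme, new local minima would be sampled inside the trimmed domain and the fit repeated, so that the search concentrates on the funnel that the underestimator points to.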
Interface Hot Spots and Binding Energetics
Another of our research objectives is to characterize the effects of mutation on binding affinity and specificity. Some amino acids make a large energetic contribution to binding, and these residues can be more sensitive to mutation. Our approach combines biochemical and geometric considerations, such as hydrophobicity or precise shape match, to predict these "hot spots." We analyze a range of features to determine which of them can distinguish hot spots from other interface residues.
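One simple way to ask whether a single feature separates hot spots from other interface residues is the area under the ROC curve (AUC): the probability that a randomly chosen hot spot scores higher on that feature than a randomly chosen non-hot-spot. The sketch below is only an illustration of that idea. The feature values are invented toy numbers, the function name feature_auc is hypothetical, and nothing here is claimed to be the feature analysis actually used in this work.

```python
def feature_auc(hot_values, other_values):
    """Fraction of (hot, other) pairs in which the hot spot has the higher
    feature value; ties count as half. 0.5 means no discriminating power."""
    wins = 0.0
    for h in hot_values:
        for o in other_values:
            if h > o:
                wins += 1.0
            elif h == o:
                wins += 0.5
    return wins / (len(hot_values) * len(other_values))

# Made-up interface residues: (hydrophobicity score, shape-match score, is_hot_spot)
toy_residues = [
    (0.9, 0.8, True), (0.7, 0.9, True), (0.6, 0.4, True),
    (0.5, 0.3, False), (0.8, 0.2, False), (0.2, 0.5, False),
]

for name, column in (("hydrophobicity", 0), ("shape_match", 1)):
    hot = [r[column] for r in toy_residues if r[2]]
    other = [r[column] for r in toy_residues if not r[2]]
    print(name, "AUC =", feature_auc(hot, other))
```

Features whose AUC is far from 0.5 on real data would be the natural candidates to pass on to a learning method.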
To maximize the predictive capabilities of the model, we applied automated statistical learning methods to a data set of binding energetics measurements for alanine substitutions within protein interfaces of known structure. The result is a model able to predict when substitution to alanine is likely to cause a binding free energy increase of at least 2 kcal/mol. We have also used the model to predict when other types of sidechain substitutions will enhance or inhibit binding. Since rate constants reflect a statistical average of system states and energies, such predictions allow one to reengineer rate constants in nontrivial ways.
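To give a sense of scale for the 2 kcal/mol cutoff, the textbook relation between a change in binding free energy and the dissociation constant, ddG = RT ln(Kd_mutant / Kd_wildtype), can be evaluated directly. The short sketch below uses that standard thermodynamic relation, not the learning model described above; the function names and the choice of 298.15 K are assumptions for illustration.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)

def kd_fold_change(ddg_kcal, temperature_k=298.15):
    """Ratio Kd_mutant / Kd_wildtype implied by a binding free energy
    change ddG (kcal/mol), from ddG = R*T*ln(Kd_mutant / Kd_wildtype)."""
    return math.exp(ddg_kcal / (R_KCAL * temperature_k))

def is_hot_spot(ddg_kcal, cutoff_kcal=2.0):
    """Label a residue a hot spot when alanine substitution raises the
    binding free energy by at least the cutoff."""
    return ddg_kcal >= cutoff_kcal

print(kd_fold_change(2.0))  # roughly a 29-fold weaker affinity at 25 C
```

Because Kd is the ratio of the dissociation and association rate constants, a shift of this size has to appear as a change in one or both of those rates.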
To make our computational tools most useful to bench scientists, we have created a server to help researchers analyze hot spots within protein systems of interest. Using Jmol, we created an interactive display with which hot spot residues can be explored. Our server is currently available online (http://kfc.mitchell-lab.org) after much testing and development. Several demonstration jobs can be viewed without an account.
High Performance Computing for Biology
The computing costs encountered in structural biology can be considerable. Our optimization codes in particular can require extensive computing resources. In the past, we have addressed these needs using large supercomputers and clusters of machines. However, we are presently working on an exciting alternative. Modern graphics cards can perform many of the types of calculations required by our codes, and these graphics processing units (GPUs) can perform up to hundreds of calculations at once. By combining a number of cards within a single machine, a low-cost and space-efficient supercomputer can be created.
We are currently adapting our docking optimization codes to run on GPUs. Our goal is to design the codes to be as simple as possible, so that they can serve as a starting point for others who wish to use this emerging computer architecture. Our first endeavor has been to adapt our desolvation energy model for this hardware. Using only basic design principles, we are able to achieve a speedup of more than 300-fold when using two graphics cards. Thus, we are already seeing the benefits of developing for this emerging computing framework. In the future, we will work to develop our own GPU-based codes and to collaborate with other groups toward converting existing packages for this type of computing environment.
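The reason pairwise energy sums map well onto GPUs is that the work splits into many independent pieces whose results are simply added. The sketch below shows that decomposition in plain Python: each call to partial_energy plays the role of one GPU thread, and split_indices divides the atoms between two cards. The Gaussian pair term, the sigma value, and all function names are placeholders chosen for illustration; this is not the desolvation model or the GPU code described above.

```python
import math

def partial_energy(i, receptor, ligand, sigma=3.5):
    """Independent unit of work: receptor atom i against every ligand atom.
    The Gaussian pair term is a placeholder, not a real desolvation model."""
    xi, yi, zi = receptor[i]
    total = 0.0
    for xj, yj, zj in ligand:
        r2 = (xi - xj) ** 2 + (yi - yj) ** 2 + (zi - zj) ** 2
        total += math.exp(-r2 / (2.0 * sigma ** 2))
    return total

def split_indices(n, n_devices):
    """Contiguous, near-equal index ranges, one per device."""
    base, extra = divmod(n, n_devices)
    ranges, start = [], 0
    for d in range(n_devices):
        stop = start + base + (1 if d < extra else 0)
        ranges.append(range(start, stop))
        start = stop
    return ranges

def total_energy(receptor, ligand, n_devices=2):
    """Sum per-device partial results; on real hardware each inner sum
    would run concurrently on its own card."""
    device_sums = [sum(partial_energy(i, receptor, ligand) for i in idx)
                   for idx in split_indices(len(receptor), n_devices)]
    return sum(device_sums)

# Hypothetical coordinates in angstroms, for illustration only.
receptor = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (0.0, 1.5, 0.0),
            (0.0, 0.0, 1.5), (1.5, 1.5, 0.0)]
ligand = [(4.0, 0.0, 0.0), (4.0, 1.5, 0.0), (5.5, 0.0, 0.0)]
print(total_energy(receptor, ligand, n_devices=2))
```

Since no unit of work depends on another, the same structure carries over to a GPU kernel with one thread per atom and a final reduction across threads and cards.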