What is Folding@Home?
Folding@Home is a project run by the Pande Group in Stanford University's Chemistry Department. Stanford is a non-profit academic institution dedicated to scientific research and education.

PROJECT GOALS: Solving the protein folding problem

Understanding how proteins self-assemble ("protein folding") is a holy grail of modern molecular biophysics. What makes it such a great challenge is its complexity, which renders simulations of folding extremely computationally demanding and difficult to understand. (See Scientific Background for more details about what proteins are, why they fold, why this is so difficult, and why we care.)

Our group has developed a new way to simulate protein folding ("distributed dynamics") which should remove the previous barriers to simulating protein folding. However, this method is extremely computationally demanding, and we need your help (see below). We have already demonstrated that our distributed dynamics technique can fold small protein fragments and protein-like synthetic polymers. The next step is to apply these methods to larger, considerably more important and complicated proteins. Unfortunately, larger proteins fold more slowly, and thus we need more computers to simulate their folding. While the alpha helix folds in 100 nanoseconds, proteins just a little larger fold 100x slower (10 microseconds). Thus, while 10-100 processors were enough to simulate the helix, we will need many more to simulate these larger, more interesting proteins.

To achieve a significant speedup, we need many processors in a given run. Also, since a single run does not tell us much, we need to simulate several runs per protein (10 runs would be a good start), which multiplies the number of processors required. By running our client, which uses the Mithral CS-SDK, you can lend us your machine for as little or as long as you like. Even a single day's worth of running is helpful to us.
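
The figures quoted above can be turned into a rough processor estimate. The following is only a back-of-envelope sketch: the folding times, the 10-100 helix processors, and the 10 runs come from the text, while the rule that processor count grows linearly with folding time is an assumption made here for illustration, not a statement of how distributed dynamics actually scales. The function names are hypothetical.

```python
# Back-of-envelope sketch. Figures marked "from the text" come from the
# paragraphs above; the linear-scaling rule is an assumption for illustration.

HELIX_FOLDING_TIME_NS = 100.0   # from the text: the alpha helix folds in ~100 ns
SLOWDOWN_FACTOR = 100.0         # from the text: slightly larger proteins fold ~100x slower
HELIX_PROCESSORS = (10, 100)    # from the text: 10-100 processors sufficed for the helix
RUNS_PER_PROTEIN = 10           # from the text: "10 runs would be a good start"


def larger_protein_folding_time_us() -> float:
    """Folding time of the larger proteins, converted to microseconds."""
    return HELIX_FOLDING_TIME_NS * SLOWDOWN_FACTOR / 1000.0


def processors_per_run(helix_processors: int) -> int:
    """ASSUMPTION: processors per run grow linearly with the folding time."""
    return int(helix_processors * SLOWDOWN_FACTOR)


def total_processors(helix_processors: int, runs: int = RUNS_PER_PROTEIN) -> int:
    """Processors needed if all runs for one protein execute concurrently."""
    return processors_per_run(helix_processors) * runs


if __name__ == "__main__":
    print(f"Folding time: {larger_protein_folding_time_us():g} microseconds")
    for n in HELIX_PROCESSORS:
        print(f"{n} helix processors -> {processors_per_run(n)} per run, "
              f"{total_processors(n)} for {RUNS_PER_PROTEIN} concurrent runs")
```

Under this assumed linear scaling, a single run would need on the order of 1,000 to 10,000 processors, and ten concurrent runs ten times that.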

1. Protein Folding, Misfolding, and Aggregation:

How proteins self-assemble into their native state (responsible for biological function) has been a much-studied problem for over a decade. Progress has been made in understanding how simple models of proteins fold, as well as in designing protein sequences de novo. However, these models ignore much protein detail which is likely crucial for understanding how real proteins fold. Thus, the current challenge lies in understanding how particular chemical detail in proteins (such as hydrogen bonding and hydrophobic interactions) leads to particular protein folding mechanisms.

We have developed techniques which allow us to make fundamental advances in simulations of protein folding by speeding up atomistic simulations 100 to 1,000 times. This speedup allows us to simulate tens of microseconds and thus simulate the folding of the fastest-folding proteins in all-atom detail. However, these methods are extremely computationally demanding and require thousands to tens of thousands of computers. To solve this problem, we have released our software as a screen saver and have gathered over 10,000 collaborators who run our software. This project, called Folding@home, has already led to great initial results (the folding of proteins in atomistic detail on the microsecond timescale), and we are now continuing to use this technique on other systems, including the folding of RNA and non-biological polymers as well as the aggregation of proteins associated with diseases such as Alzheimer's and Mad Cow disease (see below).

2. Protein design and structure prediction:

We have also started another distributed computing project to use protein design to generate new "virtual genomes." Our project, Genome@home, studies real genomes and proteins directly, by designing new sequences for existing 3-D protein structures, which come from real genomes. The protein structure files that are sent out as work contain the Cartesian atomic coordinates of a protein. This data was obtained experimentally through X-ray crystallography or NMR techniques. Note that this was not done by us; thousands of scientists have spent decades compiling this data, which is generously made freely available to the public. By designing new sequences that could form these specific protein structures, we're setting the stage to attack a number of significant contemporary issues in structural biology, genetics, and medicine. For example, the Genome@home data will be used to:

  • Try to unravel a fundamental issue in the "protein folding problem" (which itself lies at the heart of a huge amount of modern biomedical research): the fact that thousands of different sequences can all form the same three-dimensional structure.
  • Predict the functions of newly discovered genes and protein structures. Modern approaches to structural biology, known as "proteomics" or "structural genomics", often solve protein structures without knowing what the proteins do. Because techniques for function prediction tend to work best with large amounts of sequence data, a virtual library of sequences for a new protein structure will be an invaluable resource.
  • Potentially design and make new versions of existing proteins for use in medical therapy.

3. RNA Folding:

While protein folding has garnered much attention over the last decade, RNA folding has received far less. From a theoretical point of view, one reason for this is the large molecular weight of RNA chains and the role of electrostatics and counter ions in RNA folding. However, with recent techniques developed in protein folding, we have started to tackle the RNA problem.

We are currently collaborating with several experimental groups at Stanford (Herschlag, Doniach, and Chu) to combine and compare our simulation results with experiment. This allows us to validate our simulations and to refine the experimental data to yield more information about the structure and nature of folding.

4. Folding of biomimetic heteropolymers:

Can we apply the understanding gleaned from our study of proteins and RNA to design protein-like heteropolymers -- heteropolymers which can fold into particular structures? If so, how do these polymers fold as compared with proteins? Finally, can we take advantage of new polymer architectures, such as branched chains, to design synthetic polymers with novel folding and material properties?

5. Lipid vesicle fusion:

Lipid membranes also play a fundamental role in biochemistry, serving as the structural units which encapsulate cells, organelles, viruses, etc. In particular, lipid membranes must fuse in order for such systems to combine (endocytosis) or detach (exocytosis). This physical process is also a first-order phase transition, but is heavily mediated by proteins in biological systems. We are currently studying how lipid vesicles fuse with and without the effect of biological machinery (fusion peptides and proteins; see below).

(B) FUNCTION

1. Ligand binding and drug design:

One of the biggest challenges in computational drug design is the accurate calculation of the free energy of binding of small ligands. Currently, typical errors in these calculations make them unusable for distinguishing between strong binders (which would potentially make good drugs) and non-specific binders (which wouldn't). We are using distributed computing methods to greatly increase the accuracy of such calculations.
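
To see why accuracy matters so much here, it helps to relate a binding free energy to a binding affinity. The sketch below uses the standard thermodynamic relation between the free energy of binding and the dissociation constant Kd (free energy = RT ln(Kd / 1 M)); this relation is textbook thermodynamics and is not taken from the text above, and it says nothing about how the free energies themselves are computed in the project. The function names and the choice of room temperature are assumptions made for illustration.

```python
import math

GAS_CONSTANT_KCAL = 1.987204e-3   # gas constant R in kcal/(mol*K)
ROOM_TEMPERATURE_K = 298.15       # assumed temperature (25 degrees C)


def binding_free_energy(kd_molar: float,
                        temperature_k: float = ROOM_TEMPERATURE_K) -> float:
    """Standard binding free energy in kcal/mol for a dissociation constant
    given in mol/L, relative to a 1 M standard state: RT * ln(Kd / 1 M)."""
    if kd_molar <= 0:
        raise ValueError("Kd must be positive")
    return GAS_CONSTANT_KCAL * temperature_k * math.log(kd_molar)


def affinity_fold_error(free_energy_error_kcal: float,
                        temperature_k: float = ROOM_TEMPERATURE_K) -> float:
    """Multiplicative error in Kd implied by an error in the free energy."""
    rt = GAS_CONSTANT_KCAL * temperature_k
    return math.exp(abs(free_energy_error_kcal) / rt)


if __name__ == "__main__":
    # Hypothetical ligands: a nanomolar binder and a millimolar binder.
    for label, kd in (("1 nM binder", 1e-9), ("1 mM binder", 1e-3)):
        print(f"{label}: {binding_free_energy(kd):.2f} kcal/mol")
    for err in (0.5, 1.4, 3.0):
        print(f"{err} kcal/mol error -> "
              f"{affinity_fold_error(err):.1f}x error in Kd")
```

Because the affinity depends exponentially on the free energy, an error of about 1.4 kcal/mol already corresponds to roughly a tenfold error in predicted affinity at room temperature, which is enough to blur the line between a promising binder and a weak one.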

2. Fusion peptides:

Fusion peptides catalyze lipid vesicle fusion. How do they speed up fusion? We are addressing this question using coarse-grained (see left) and atomistic simulations.