Proteins take a holiday

One would think that now, with the unfathomably infinite expanse of free time that has opened up with the close of another semester, my folding project would switch into high gear. Had I known, about one year ago, that I would even be working on this project and that it would look this promising, then yes, that is what would be happening right now. As it turns out, my summer is solidly booked with prior commitments:

  1. NSF workshop on “Physical, Engineering and Biological Limits to Brain Measurement”.
  2. Five lectures at the Park City Mathematics Institute on “Model building in statistical mechanics”.
  3. Four weeks of consulting at CCR La Jolla.
  4. Lecturer at a Swedish summer school on “Imaging with X-rays and Neutrons in Life- and Material-Sciences”.

I may also be a speaker at the Congress and General Assembly of the International Union of Crystallography in Montreal. While I find the idea of being in Congress with Union members intriguing, my participation (for a fraction of one day) is pending the waiver of their exorbitant registration fee.


As you’ve noticed by now, none of these activities have anything remotely in common with protein folding. And so, to minimize their combined effect on the continuity of my project, I have planned ahead.

The plan, which is nothing short of brilliant, is to press pause precisely when all the infrastructure is in place and working perfectly! I can’t imagine better circumstances for picking up the thread, after those lazy days in the Swedish sun.

In case you need reminding: 

  • The folding program is completely debugged and runs blazingly fast and successfully on “rigged” input files, that is, litemotifs derived from just the target structure.
  • Thanks to Alex Alemi and Hyung Joo Park, we have an easily searchable library of PDB-harvested litemotifs poised to create input files for any primary sequence one might care to fold.

Yes, it’s very hard to resist making that first “unrigged” run. But that would void my brilliant plan, because of the very real possibility of things not working out as hoped. So in lieu of a season finale, I present the three ways that the folding project can meet disaster, or at least a hiccup, when it gets underway again in the fall:

  1. incomplete library
  2. constraint weakness
  3. combinatorial complexity

I’ve ranked these according to likelihood, based on no hard facts whatsoever. All three scenarios have a particular signature in the RMSD time series we saw for the first time in last week’s post.

An incomplete litemotif library means that a good fraction of the local structures in the target have never appeared in any other protein. The RMSD will fluctuate forever, never becoming small, because no combination of litemotifs is consistent with the target primary sequence. And in contrast to scenario 3, the solver will not be satisfied (and slightly embarrassed) when initialized with the known structure, but instead will tear it apart because of inconsistency with the available litemotifs.

Scenario 3 plays out almost exactly like scenario 1 when running the algorithm, except that the known structure will be recognized as an approximate fixed point. Here the problem is that there are too many possibilities for the algorithm to explore.

Finally, in scenario 2 the RMSD time series will rather quickly settle down to a small value, signaling that a fold satisfying all the constraints has been found, except that this fold, and pretty much every other fold obtained from random initial conditions, is not the correct one! This means litemotifs are simply too weak as constraints. The resulting folds might “look” like reasonable proteins, and yet lack important characteristics not encoded in litemotifs.
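
For the bookkeeping-minded, the three signatures boil down to a small decision procedure. Below is a minimal sketch in Python, not part of the folding program: the function names, the `small` threshold, and the "average over the last 20% of the run" test for settling are all hypothetical choices made for illustration. It assumes two RMSD time series (one from a random start, one from a run initialized with the known structure) plus a yes/no answer on whether a converged fold agrees with the target.

```python
from statistics import mean


def settled(series, small=1.0, tail_fraction=0.2):
    """True if the tail of an RMSD time series has become small.

    `small` and `tail_fraction` are illustrative placeholders, not tuned values.
    """
    n_tail = max(1, int(len(series) * tail_fraction))
    return mean(series[-n_tail:]) < small


def diagnose(random_start, native_start, fold_matches_target, small=1.0):
    """Map the RMSD signatures described above onto the three scenarios.

    random_start        -- RMSD series from random initial conditions
    native_start        -- RMSD series from a run initialized with the known structure
    fold_matches_target -- whether a converged fold agrees with the known structure
    """
    if settled(random_start, small):
        # Constraints were satisfied; the only question is whether the fold is right.
        return "success" if fold_matches_target else "constraint weakness"
    if settled(native_start, small):
        # The known structure is an approximate fixed point, but the search never finds it.
        return "combinatorial complexity"
    # Even the known structure gets torn apart.
    return "incomplete library"


# Made-up numbers, purely to exercise the logic.
wandering = [5.0, 4.0, 6.0, 5.0, 5.0, 6.0]
torn_apart = [0.1, 3.0, 5.0, 6.0, 5.0, 6.0]
held = [0.1, 0.2, 0.1, 0.1, 0.2, 0.1]
print(diagnose(wandering, torn_apart, False))
print(diagnose(wandering, held, False))
```

Note that scenarios 1 and 3 cannot be told apart from the random-start run alone; the run started from the known structure is what separates them.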

I’m as excited as you are to find out which of these comes to pass. In the meantime, have a great summer!