In the fast crowd that I am increasingly identifying with — protein folding physicists — success is measured by boredom. What seemed a minor miracle only nine months ago is now routine. A case in point: my week with 1YU5 (headpiece domain of chicken villin).
1YU5 is a 67-residue protein for which the PDB coughs up 337 litemotifs per constraint (on average), about 43% of which are solvent contacts. It’s still too small for the folders in the big leagues to take notice, but the right size to study the effects of the numerous parameters I have at my disposal. Moving forward, with each new protein I will increase the residue count with the goal of reaching 100 residues by the end of the year.
One way that these folding experiments are boring — in a good way — is seen in the behavior of the ADMM constraint discrepancy with iteration count:
The fluctuating value, in blue, is never too far from the current best value, in red. In other words, the journey to zero discrepancy is roughly monotonic. Contrast that with the behavior of the discrepancy in NP-hard problems, where the time series fluctuates in a quasi-steady state for a long time before the algorithm finds a solution. Quasi-monotonicity of the free energy with physical time is the “folding-funnel” hypothesis. But let’s not get carried away: the dynamics of the ADMM algorithm and the platform on which it operates are highly non-physical.
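To make the blue and red curves concrete, here is a minimal Python sketch. It is not the code behind the plots. It assumes the discrepancy is available as a plain list of positive numbers, one per iteration. `running_best` produces the red curve as the running minimum of the blue one, `worst_excursion` reports how far the blue curve ever strays above it, and `estimate_iterations_to_target` extrapolates a straight-line fit of the log of the running best down to a chosen target. All three names are hypothetical, and the geometric-decay model behind the extrapolation is an assumption for illustration, not something ADMM guarantees.

```python
import math


def running_best(values):
    """Running minimum of the series (the 'current best' curve)."""
    best, current = [], math.inf
    for v in values:
        current = min(current, v)
        best.append(current)
    return best


def worst_excursion(values):
    """Largest ratio of the fluctuating value to the current best.

    A value close to 1 means the series is nearly monotonic.
    """
    return max(v / b for v, b in zip(values, running_best(values)))


def estimate_iterations_to_target(values, target):
    """Extrapolate the iteration at which the running best reaches `target`.

    ASSUMPTION: the running best decays roughly geometrically, so
    log(best) is fitted with a least-squares line in the iteration count.
    Needs at least two positive values. Returns None if the fitted slope
    is not negative (no descent to extrapolate).
    """
    best = running_best(values)
    n = len(best)
    ts = list(range(n))
    ys = [math.log(b) for b in best]
    t_mean = sum(ts) / n
    y_mean = sum(ys) / n
    sxx = sum((t - t_mean) ** 2 for t in ts)
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(ts, ys))
    slope = sxy / sxx
    if slope >= 0:
        return None
    intercept = y_mean - slope * t_mean
    return (math.log(target) - intercept) / slope


if __name__ == "__main__":
    # Synthetic series for illustration only, not data from the 1YU5 runs.
    series = [1.0, 0.5, 0.8, 0.25, 0.3, 0.12]
    print(running_best(series))
    print(worst_excursion(series))
    print(estimate_iterations_to_target(series, 1e-3))
```

The estimate is only as good as the straight-line fit. A run that stalls and then recovers would throw it off until the descent resumes.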
Monotonicity of the discrepancy, when it eventually finds its way near zero, is a good thing because it adds a degree of predictability. From the rate of descent I can estimate how long the algorithm will run to produce a solution. There are exceptions, of course. The most extreme case I saw in my sample of 12 random starts shows the algorithm apparently recovering from two bad choices:
I will summarize the results of my experiments with 1YU5 with the same pair of plots I introduced last week. First the scatter between the RMSD values in the final litemotif constraints (horizontal) and between the folded and native backbones (vertical):
The green point is what I get when the algorithm starts with the native fold. I’m not too concerned about the poor correlation between the two RMSD values because, except for three, the backbone RMSD values are very good. This is confirmed by the CASP-style percentage-below-distance-cutoff plot:
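For readers who have not met this kind of plot, the following is a minimal sketch of how a backbone RMSD and a percentage-below-cutoff curve can be computed. It assumes NumPy and one representative atom per residue (for example Cα) stored as N×3 arrays. It uses a single Kabsch superposition, whereas CASP's own analysis searches over many superpositions, so treat it as a simplification. The function names are hypothetical and the demo coordinates are synthetic, not 1YU5.

```python
import numpy as np


def kabsch_superpose(mobile, target):
    """Rigidly superpose `mobile` onto `target` (Kabsch algorithm)."""
    mobile = np.asarray(mobile, dtype=float)
    target = np.asarray(target, dtype=float)
    mob_c = mobile - mobile.mean(axis=0)
    tgt_c = target - target.mean(axis=0)
    # SVD of the 3x3 covariance matrix between the centered point sets
    u, _, vt = np.linalg.svd(mob_c.T @ tgt_c)
    # Flip one axis if needed so the result is a proper rotation
    d = 1.0 if np.linalg.det(vt.T @ u.T) >= 0 else -1.0
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    return mob_c @ rot.T + target.mean(axis=0)


def residue_distances(a, b):
    """Per-residue distance between two already superposed structures."""
    return np.linalg.norm(np.asarray(a) - np.asarray(b), axis=1)


def backbone_rmsd(a, b):
    """Root-mean-square deviation over all residues."""
    return float(np.sqrt(np.mean(residue_distances(a, b) ** 2)))


def percent_below_cutoffs(distances, cutoffs):
    """Percentage of residues within each distance cutoff."""
    distances = np.asarray(distances, dtype=float)
    return [float(100.0 * np.mean(distances <= c)) for c in cutoffs]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    native = rng.normal(scale=5.0, size=(67, 3))   # synthetic, NOT 1YU5
    folded = native + rng.normal(scale=0.5, size=native.shape)
    folded[-8:] += 4.0                              # mimic a fold off at one end
    fitted = kabsch_superpose(folded, native)
    print("RMSD:", backbone_rmsd(fitted, native))
    cutoffs = [0.5, 1.0, 2.0, 4.0, 8.0]
    print(percent_below_cutoffs(residue_distances(fitted, native), cutoffs))
```

Plotting the cutoff against the percentage gives one curve per fold. With that orientation a lower curve means more residues sit within a given distance of the native structure.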
The best fold (lowest yellow curve, green is the native-fold start) had the more monotonic of the two discrepancy time series shown above. Several of the folds deviated from the native fold only at one end of the backbone. This was the case with the best fold too:
Here is a list of the four parameters that seem to make the most difference, along with the values I used for 1YU5:
For stronger constraints we want parameters 1 and 2 to be small and parameter 3 to be large. I’ve argued previously why parameter 4 should be small. But 0.002 is ridiculous!
My life with proteins may not be as boring as I had originally thought.