Cracks in the litemotif fortress

Longtime followers of this blog know that this project has had to confront two very different challenges. In the first two months, starting in February of this year, I considered schemes for defining local structure constraints, or litemotifs, using the information in the PDB. This was followed by about three months of developing a constraint satisfaction algorithm that tries to fold a protein according to the litemotifs derived from its residue sequence. As recently as last week it looked like I was in the home stretch: the algorithm was finding correctly folded proteins from litemotifs, and my work had settled into tweaking parameters and smoothing a few wrinkles.

Now I’m convinced that the wrinkles are actually deep fissures, and without a major intervention I’m afraid the project could fragment to its very core.

I will describe the problem as it materialized while I was trying to fold the 87-residue protein 1ULR. Unlike in previous experiments, I had to significantly “stretch” the litemotifs in two ways just to stabilize the native fold. The first form of stretch is letting the algorithm ignore some constraints, to safeguard against never-seen-before motifs. I had never before needed to set the number of ignorable constraints as high as 20 (out of the 84 positions along the backbone). Second, I increased the cutoff distance on side-chain contacts from 5Å to 6Å in a desperate effort to inflate my store of litemotifs. Although these actions stabilized the native fold, the constraints were now so weak and ignorable that, starting from scratch, 1ULR was folding into many different shapes, none of them even close to native.

Folding algorithm 1, constraints 0

I could abandon the project right here and claim that the PDB at this point in time, through no fault of mine, is just too small to support my wonderful litemotif scheme. Or a respectable fallback position could be that right now success is hit-or-miss, depending on the spotty sampling of litemotifs in the PDB. I will keep these options in mind while exploring another strategy for extending the reach of litemotifs.

I had considered this idea much earlier, when the precise definition of litemotifs was still fluid. Worried that the PDB would yield too few hits on a typical 4-residue sequence, I had thought about allowing imperfect residue matches. I revisited that idea and spent the better part of an afternoon this week staring at side-chain contact geometries, such as this one here:

The red, orange, yellow and green groups of atoms (rendered with the methane radius) are the side-chains of valine, alanine, leucine and valine along a 4-residue piece of backbone (white cylinder); the side-chain of the contacting residue, also valine, is shown in white. Two side-chains make contact when there is a pair of atoms, one from each, within the 5Å cutoff distance, and a litemotif must have two such contacts. Here the white valine makes contacts with the side-chains at positions 2 (orange) and 4 (green) of the backbone. What struck me about this example, of VALV making contact with V, is how irrelevant the identities of the other two residues (red and yellow) appear to be. It seems that one could attach pretty much any side-chains at positions 1 and 3 without affecting the contacts made by 2 and 4. This was the case in essentially every one of the many dozens of contact geometries I examined.
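The contact rule is simple enough to write down. What follows is only a minimal sketch, not the code used for the experiments: the function names, the representation of a side-chain as a list of (x, y, z) atom positions in Å, and the choice of "less than or equal to" at the cutoff are all assumptions made for illustration.

```python
from itertools import product
from math import dist

CUTOFF = 5.0  # Å; the side-chain contact cutoff quoted in the text


def in_contact(side_chain_a, side_chain_b, cutoff=CUTOFF):
    """True if some atom pair, one atom from each side-chain, lies within cutoff.

    Each side-chain is a list of (x, y, z) tuples (hypothetical representation).
    Treating a distance exactly equal to the cutoff as a contact is an assumption.
    """
    return any(dist(p, q) <= cutoff for p, q in product(side_chain_a, side_chain_b))


def contacted_positions(window, partner, cutoff=CUTOFF):
    """1-based positions in a 4-residue window whose side-chains touch the partner."""
    return [i for i, side_chain in enumerate(window, start=1)
            if in_contact(side_chain, partner, cutoff)]


# Toy geometry, one atom per backbone side-chain (not real PDB coordinates).
window = [[(0.0, 0.0, 0.0)], [(10.0, 0.0, 0.0)], [(20.0, 0.0, 0.0)], [(30.0, 0.0, 0.0)]]
partner = [(10.0, 4.0, 0.0), (30.0, 4.0, 0.0)]
print(contacted_positions(window, partner))
```

With these made-up coordinates the partner touches positions 2 and 4, the same pattern as in the valine example above.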

Reversing my conclusion in an earlier post, I now believe it is incorrect to record the geometry above as an example of VALV making contact with V; it should really be *A*V making contact with V. When half of the residue symbols in a litemotif are the wildcard *, many more litemotifs can be applied to any 4-residue sequence. If the sequence we are folding contained VALV, for example, then the litemotifs for V*L*, V**V, *AL* and *A*V would all serve as constraints. This increases the number of litemotifs per constraint by roughly a factor of 1600 over what I had available without the wildcard option.
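To make the wildcard bookkeeping concrete, here is a small sketch that generates the four patterns listed above for any 4-residue window and checks whether a stored pattern applies to a sequence. The function names are hypothetical, and the set of position pairs is simply read off the four patterns in the VALV example. The arithmetic in `expansion_factor` is one reading of where 1600 could come from (four patterns, each with two free positions over a 20-letter amino acid alphabet); treat it as an assumption rather than the definitive derivation.

```python
# Position pairs (1-based) kept as real residues; taken from the four
# patterns listed in the text: V*L*, V**V, *AL*, *A*V.
CONTACT_PAIRS = ((1, 3), (1, 4), (2, 3), (2, 4))


def wildcard_patterns(window, pairs=CONTACT_PAIRS):
    """Return the wildcard litemotif patterns that apply to a 4-residue window."""
    return ["".join(res if pos in pair else "*"
                    for pos, res in enumerate(window, start=1))
            for pair in pairs]


def matches(pattern, window):
    """True if pattern (with '*' wildcards) is compatible with the window."""
    return len(pattern) == len(window) and all(
        p == "*" or p == r for p, r in zip(pattern, window))


def expansion_factor(n_patterns=len(CONTACT_PAIRS), n_wildcards=2, alphabet=20):
    """Assumed arithmetic: patterns per window times sequences each one covers."""
    return n_patterns * alphabet ** n_wildcards


print(wildcard_patterns("VALV"))
print(matches("*A*V", "GAKV"))
print(expansion_factor())
```

Under this reading, a geometry recorded as *A*V can constrain any window with A at position 2 and V at position 4, which is where most of the gain comes from.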

The integrity of the fortress hangs in the balance, but I’m feeling confident about that factor of 1600. We should know by next week, when I revisit the stability of 1ULR.