Divide and Concur

Given the very promising developments of last week, it is not surprising that I have become the target of a conspiracy. And while it’s true that productivity always takes a big hit on the week leading up to a midterm exam, or when setting up a new laptop, why did these have to coincide?

My activities have mostly been curtailed by the Machiavellian licensing tactics of Wolfram & Co., whose Mathematica licenses, in my case, are machine specific. Without the password code to activate my license on the new laptop, I was unable to access any of the data and tools I’ve developed over the last several weeks. Those solvent-atom derived local structures I was excited about will just have to wait.


To make good on my promise of posting at least once per week, here is a kind of primer on the algorithm that will be used to fold proteins. Everything in this blog up to now has dealt with the constraints that define a valid fold. These constraints get passed to a very general type of algorithm, best known by the name coined by my former student Simon Gravel: Divide and Concur.

Let’s start with “divide”. Suppose we partition the protein primary sequence into all the overlapping subsequences of length four: 1234, 2345, 3456, etc. Our structure library puts constraints on the α-carbon geometry of the corresponding residues, as well as the α-carbons they make contact with elsewhere on the chain. The α-carbon of residue 4, for example, makes an appearance in each of four partitions of the backbone, as well as any constraint where it plays the role of the contact atom. Rather than try to have the α-carbon of residue 4 satisfy all those constraints, the idea behind “divide” is to have multiple versions of this atom, divided-up one per constraint. Satisfying any of the “divided” constraints is easy because they involve at most 5 atoms.

As you guessed, “concur” is not just a casualty of Simon’s French-Canadian English stumbling over “conquer”. In the solution we want all the versions of the α-carbon of residue 4 to be at the same position, that is, to concur. Both operations, divide and concur, are designed to minimize distance in the space of configurations. In the case of concur this is especially simple: replace each atom-version-position by the mean position of all the versions.

One would think that the divide and concur algorithm, with its distance-based operations, is only suited for problems with continuous variables. So it started out as a great mystery (and continues to this day), the observation that this algorithm is remarkably good at solving even the classical hard problems of computer science, all of which are expressed in discrete variables.

The constraint satisfaction problem best known to the public is sudoku. Lance Fortnow, in The Golden Ticket: P, NP, and the Search for the Impossible, gave a 25x25 puzzle to get across the concept that it is the growth in difficulty with size that (in part) defines a complexity class. Ironically, Fortnow’s puzzle turns out to be very easy (after fixing the obvious typo in the first row). Here is a much harder puzzle I created:

 0  6  0  2  0 21 16  0  0  0  0  7 17  0 11 10  0 13  0 25  1 15  0 18  8
14 13  3  9 25  0  0  0  0  0 10  0  0  0  0  0  0  0  0  0  0  0  0 16  4
 0 18  0  0 20 15  0  5 10 23  4  0  0  8  0  0  0  0  0  2  0  0 21  6  0
 0  7  0  0 10 25 18  8 12 11 21  0  0  0  0 14  0 15 19  0 17  0 13 22 23
 1  0 21 15 22  0  0  0 14 24  0 12  0  0 16  0 11 23 17 20  0  3  0  0  0
23  0  0 14  0  0  0  0  0  0  0  0  1  0 22 20 15  7  0  4  8  0  0  0  0
17  0  8  0  0  0  6  0 20  0  0  0 24  7  0 21  9  0 13 16  0  0  2 11  0
 0 10  9  4  0 13  0  0  0  0  0 25 21 16  0  5 18  0 23  0  0 14  0  0  0
16 20 22  0  0  7  0 12  0  0 15  0  0 10  0  0  0 24  0  0  5  0  0  0 25
 0 19  6 18  0 23 21  0  0  0  5  0  0 20  8  0 22 25  0  0  0  4  0 24 13
 0  1  0  8  0  5  0  0  0 22  0  9 16 15  0 12  0  0 11 13  0  0  4  0  3
 0  9 18 13 24 20 14  4 16 15  0 23  2 11  3  0  0  0  1  5 19  0 12  8  0
 0  3  0  0  0  0  0  0 11 17  0  0  0  1 14  0 20  2 24  0  0 16  7  0  0
 0  0  0 20  0  0 10  0  0 13 17  0  0  0 21 16  0  9 22  8  0  1  0 15  0
 0  5  7  0  0 19 12  6  0  3 13  0  0 24  0  0  0  0  0  0  0 11 17  9  0
21  0 16  0 13  0  2  0  0  0  0  8  0 22  0  0 19  0  0  0  0 24  0  0  0
 0 23  0 12  7 24  0  0  0  0  0 13  0  4  0  0  5  0  0 14 18  0 22  0 21
 8  0  0 10  0  0  0  0  0  0  7 17  9 14  2  0  0  0 20 24  0  5 16 19  0
 0  0 24 19 15 11  0  0  0  0  0 21  0  0  0  0  0  0  0 18 13  8  6 20 14
 0 14 25  6  0  3 15 22  7 21 24 16 18  0 20  0  0 17  8  0  0  0  9  0  2
 0  0  0 23  0  9  0 24  0  5  8  0  0  0  1  0  0  3  0 17 10  7 15 21 16
 0  0  0  5  8  0 22  0 13 12  6  0  0  3  0  0 16  0  4  0  0 23  0 14 11
 0  0  0  0  9  0  3 23 19 18  0  0  0  0 15  0 24  0  0  0  0  0  0  5  0
 0  0  0  0  0  0  0  0  0  2  0  0 23 21  9 13  0 10  0 15  0 19 18  0 17
 0  4  0 24  0  1  0  0  0  0  0  5  0 17  0 18 23  0  0 19  0 12  0  0  9

The 0’s get replaced by numbers 1-25 so each row, column, and 5x5 block is a permutation of 1-25. How can divide and concur be applied to this puzzle?

The first step is coming up with a smart way to represent the puzzle with numbers. For 25x25 sudoku the trick is to use a 25x25x25 cube of numbers. Think of the third axis of the cube as a vertical axis, as in a building with 25 floors. In each of the 25x25 cells, on each of the 25 floors, we have either a 0 or a 1. There is just a single 1 in any vertical column of cells, and the floor of that 1 is the number that appears in the 25x25 sudoku puzzle. For example, in the puzzle above, we see that there will be a 1 on floor 9 above the right-bottom corner cell.

How many versions of the 25^3 numbers in our sudoku “building” will we need, in order to “divide” the constraints so they are all easy to solve? And what does it mean when not all the numbers are 0 or 1? I’ll leave that for you to ponder.