When I first heard about lattice QCD, I found the idea instantly appealing. Other approaches to particle physics require mastery of some very challenging mathematics, but the lattice methods looked like something I could get a grip on—something discrete and finite, where computing the state of a quantum system would be a matter of filling in columns and rows of numbers.

Those early hopes ended in disappointment. I soon learned that lattice QCD does not bring all of quantum field theory down to the level of spreadsheet arithmetic. There is still heavy-duty mathematics to be done, along with a great deal of heavy-duty computing. Nevertheless, I continue to believe that the lattice version of the weird quantum world is easier to grasp than any other. My conviction has been reinforced by the discovery of an article, "Lattice QCD for Novices," published 10 years ago by G. Peter Lepage of Cornell University. Lepage doesn't offer lattice QCD in an Excel spreadsheet, but he does present an implementation written in the Python programming language. The entire program fits in a page or two.

Lepage's lattice model for novices has just one space dimension as well as a time dimension; in other words, it describes particles moving back and forth along a line segment. And what the program simulates isn't really a quantum field theory; there are no operators for the creation and annihilation of particles. All the same, reading the source code for the program gives an inside view of how a lattice model works, even if the model is only a toy.

At the lowest level is a routine to generate thousands of random paths, or configurations, in the lattice, weighted according to their likelihood under the particular rule that governs the physical evolution of the system. Then the program computes averages for a subset of the configurations, as well as quantities that correspond to experimentally observable properties, such as energy levels. Finally, more than half the program is given over to evaluating the statistical reliability of the results.
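The path-generation step can be sketched in a few lines of Python. The code below is my own minimal illustration of the idea, not Lepage's program: it applies the Metropolis algorithm to a one-dimensional quantum path with a harmonic-oscillator action (mass and frequency set to 1), accepting each trial move with probability exp(−ΔS). All function and variable names are invented for this sketch.

```python
import math
import random

def site_action(path, i, a):
    """Contribution of site i to the discretized Euclidean action:
    kinetic terms linking site i to its two neighbors, plus the
    harmonic potential a*x^2/2 at site i (m = omega = 1)."""
    n = len(path)
    x = path[i]
    x_prev = path[(i - 1) % n]   # periodic boundary conditions
    x_next = path[(i + 1) % n]
    kinetic = ((x - x_prev) ** 2 + (x_next - x) ** 2) / (2 * a)
    potential = a * x ** 2 / 2
    return kinetic + potential

def metropolis_sweep(path, a=0.5, eps=1.4):
    """One sweep: propose a random shift at each site, accept it
    with probability exp(-dS), otherwise restore the old value."""
    for i in range(len(path)):
        old_x = path[i]
        old_s = site_action(path, i, a)
        path[i] += random.uniform(-eps, eps)     # trial move
        d_s = site_action(path, i, a) - old_s
        if d_s > 0 and math.exp(-d_s) < random.random():
            path[i] = old_x                      # reject

# Generate an ensemble: thermalize first, then record configurations.
random.seed(1)
path = [0.0] * 20
for _ in range(100):          # thermalization sweeps
    metropolis_sweep(path)
configs = []
for _ in range(500):
    metropolis_sweep(path)
    configs.append(list(path))
```

Averages over the recorded configurations then stand in for quantum expectation values; for the harmonic oscillator, the mean of x² over the ensemble approaches the known ground-state value, up to lattice and statistical errors.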

QCD on a Chip

Going beyond toy programs to research models is clearly a big step. Lepage writes of the lattice method:

Early enthusiasm for such an approach to QCD, back when QCD was first invented, quickly gave way to the grim realization that very large computers would be needed....

It's not hard to see where the computational demand comes from. A lattice for a typical experiment might have 32 nodes along each of the three spatial dimensions and 128 nodes along the time dimension. That's roughly 4 million nodes altogether, and 16 million links between nodes. Gathering a statistically valid sample of random configurations from such a lattice is an arduous process.
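The bookkeeping behind those numbers is simple enough to check: a four-dimensional lattice has one link per dimension emanating from each node.

```python
spatial, temporal = 32, 128
nodes = spatial ** 3 * temporal   # sites in a 32x32x32x128 lattice
links = 4 * nodes                 # one link per dimension per node

print(nodes)   # 4194304 -- roughly 4 million
print(links)   # 16777216 -- roughly 16 million
```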

Some lattice QCD simulations are run on "commodity clusters"—machines assembled out of hundreds or thousands of off-the-shelf computers. But there is also a long tradition of building computers designed explicitly for lattice computations. The task is one that lends itself to highly parallel architectures; indeed, one obvious approach is to build a network of processors that mirrors the structure of the lattice itself.

One series of dedicated machines is known as QCDOC, for QCD on a chip. The chip in question is a customized version of the IBM PowerPC microprocessor, with specialized hardware for interprocessor communication. Some 12,288 processors are organized in a six-dimensional mesh, so that each processor communicates directly with 12 nearest neighbors. Three such machines have been built, two at Brookhaven National Laboratory and the third at the University of Edinburgh.

The QCDOC machines were completed in 2005, and attention is now turning to a new generation of special-purpose processors. Ideas under study include chips with multiple "cores," or subprocessors, and harnessing graphics chips for lattice calculations.

Meanwhile, algorithmic improvements may be just as important as faster hardware. The computational cost of a lattice QCD simulation depends critically on the lattice spacing a; specifically, the cost scales as 1/a^6. For a long time the conventional wisdom held that a must be less than about 0.1 fermi for accurate results. Algorithmic refinements that allow a to be increased to just 0.3 or 0.4 fermi have a tremendous payoff in efficiency. If a simulation at a = 0.1 fermi has a cost of 1,000,000 (in some arbitrary units), the same simulation at a = 0.4 fermi costs less than 250.
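The payoff from the 1/a^6 scaling rule is easy to verify with a one-line calculation (the function name here is my own):

```python
def relative_cost(a, a_ref=0.1, cost_ref=1_000_000):
    """Simulation cost at lattice spacing a, given a reference cost
    at spacing a_ref, under the 1/a**6 scaling rule."""
    return cost_ref * (a_ref / a) ** 6

print(relative_cost(0.4))   # about 244 -- under 250, as claimed
```

Quadrupling the lattice spacing cuts the cost by a factor of 4^6 = 4,096, which is why coarser lattices are worth the algorithmic effort.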