Craig has been experimenting with OpenMP, which is included in gcc v4.2 and above. Since OpenMP uses a shared-memory model, the trick is to find places where multiple threads can run without trashing each other's memory. One valid case is the loop over responses in Model::fold. Parallelizing that loop speeds up the code when there are multiple datasets in the same datagroup. The change involves Model.cxx as well as MultiResponse.cxx and RealResponse.cxx.
Testing was done on a dual-processor machine running Solaris. Note that the -xopenmp compiler flag required either no optimization or -O3.
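
As a rough illustration of why the loop over responses is a safe place for threads, here is a minimal sketch. It is not the actual code in Model.cxx: ToyResponse, foldOne and foldAll are hypothetical names invented for this example, and a response is assumed to be a plain dense matrix. The point is only that each iteration reads shared data and writes to its own output slot, so threads do not touch each other's memory.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for one response; NOT the real RealResponse class.
struct ToyResponse {
    std::vector<double> matrix;  // dense, row-major, nOut rows by nIn columns
    std::size_t nIn;             // number of model (input) bins
    std::size_t nOut;            // number of folded (output) channels
};

// Fold the model through a single response. Only reads its arguments and
// writes to a local vector, so concurrent calls do not interfere.
std::vector<double> foldOne(const ToyResponse& r, const std::vector<double>& model)
{
    assert(model.size() == r.nIn);
    assert(r.matrix.size() == r.nIn * r.nOut);
    std::vector<double> out(r.nOut, 0.0);
    for (std::size_t i = 0; i < r.nOut; ++i)
        for (std::size_t j = 0; j < r.nIn; ++j)
            out[i] += r.matrix[i * r.nIn + j] * model[j];
    return out;
}

// Loop over responses, one independent iteration per response.
std::vector<std::vector<double> > foldAll(const std::vector<ToyResponse>& responses,
                                          const std::vector<double>& model)
{
    // Pre-size the result so each thread assigns only to its own element.
    std::vector<std::vector<double> > folded(responses.size());
    const long n = static_cast<long>(responses.size());

    // Signed loop index, as required by older OpenMP versions.
    // Without OpenMP enabled the pragma is ignored and the loop runs serially.
    #pragma omp parallel for
    for (long k = 0; k < n; ++k)
        folded[k] = foldOne(responses[k], model);

    return folded;
}
```

With gcc the pragma takes effect when compiling with -fopenmp. The speedup only appears when there is more than one response to loop over, which matches the multiple-datasets-per-datagroup case described above.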