Sept. 15th, 2015
So, after some more bug fixes, it appears the tuning method is working. I set it to re-evaluate the PVs every 16 iterations and set the step size back to 1 ("grid-adjacent updates," as the paper terms it) for all parameters, then let it run for 50 iterations overall. The hyper-bullet rating before tuning was 2349 (on my arbitrary scale); after re-running with the tuned parameter set, the rating increased to 2360, which is well outside the standard deviation of about 4 Elo. Note: I am not currently using regularization. Bonanza uses L1 regularization for most parameters and L2 regularization for piece values.
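For the record, the way I would expect to apply regularization is to add the penalty term to each parameter's gradient before doing the grid-adjacent update: an L1 term for most parameters and an L2 term for piece values, following Bonanza. A minimal sketch of what I mean (the struct, the lambdaL1/lambdaL2 names, and the piece-value flag are made up for illustration, not actual tuner code):

    #include <vector>

    // Hypothetical parameter record: current value, the accumulated MMTO
    // gradient for this iteration, and a flag marking piece values.
    struct TunableParam {
        double value;
        double grad;
        bool   isPieceValue;
    };

    // Add the regularization penalty to each parameter's gradient before
    // the grid-adjacent update is applied.
    void addRegularization(std::vector<TunableParam> &params,
                           double lambdaL1, double lambdaL2) {
        for (auto &p : params) {
            if (p.isPieceValue)
                p.grad += 2.0 * lambdaL2 * p.value;   // d/dw of lambda*w^2
            else if (p.value > 0)
                p.grad += lambdaL1;                   // d/dw of lambda*|w|
            else if (p.value < 0)
                p.grad -= lambdaL1;
        }
    }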
I am planning to merge the branch that has the tuning code back into master and push it to GitHub. The next step would be to make more parameters tunable and maybe try a longer tuning run. Also TBD: applying regularization. And it is possible that some of the pruning Arasan does in its search (especially late move pruning) affects this tuning method, so it might be worth experimenting with disabling some of it while learning.
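If I do try that experiment, the simplest approach is probably a flag the tuner sets that the pruning code checks, something like the sketch below (the flag name and the pruning condition are placeholders, not Arasan's actual code):

    // Set by the tuner while it is generating PVs for training positions.
    // (Placeholder; not how Arasan actually flags this.)
    bool tuningMode = false;

    // Late move pruning guard: skip the pruning entirely while tuning so the
    // stored PVs reflect a fuller search. The depth/move-count condition is a
    // generic placeholder, not Arasan's actual pruning rule.
    bool canPruneLateMove(int depth, int moveCount) {
        if (tuningMode) return false;
        return depth <= 3 && moveCount > 3 + depth * depth;
    }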
Sept. 9th, 2015
I now have the MMTO (Minimax Tree Optimization) tuning method working with Arasan. It is not tuning all parameters yet, just piece-square tables, mobility, and a couple of other things.
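For anyone reading this later: the core of MMTO is an objective that penalizes the evaluation whenever some move other than the one played in the training game scores as well as or better than the game move, with scores taken at the leaves of the PVs. A rough sketch of the loss, with my own names and an assumed sigmoid scale (not Bonanza's actual constants):

    #include <cmath>
    #include <vector>

    // Sigmoid used to turn a score difference into a smooth 0..1 penalty.
    // The scale constant (assumed here) controls how sharply near-equal
    // scores are penalized.
    static double sigmoid(double x, double scale = 1.0 / 256.0) {
        return 1.0 / (1.0 + std::exp(-scale * x));
    }

    // One training position: the PV-leaf score of the move actually played
    // in the game, plus the PV-leaf scores of the alternative root moves.
    struct TrainingPos {
        double gameMoveScore;
        std::vector<double> otherMoveScores;
    };

    // MMTO objective: sum over positions and over non-game moves of the
    // sigmoid of (alternative score - game move score). It is small when
    // the game move clearly outscores every alternative.
    double mmtoObjective(const std::vector<TrainingPos> &data) {
        double total = 0.0;
        for (const auto &pos : data)
            for (double s : pos.otherMoveScores)
                total += sigmoid(s - pos.gameMoveScore);
        return total;
    }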
One difference from the implementation in Bonanza is that I am not using a temporary file for the PVs but am storing everything in memory. PVs are stored as packed 16-bit arrays to save space. When running, the tuner uses about 7GB of memory, which is fine, since my big server box has 24GB.
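The packing is nothing exotic: a move fits in 16 bits (6 bits each for the from and to squares, plus a few for promotion), so each PV is just an array of uint16_t. A sketch of the kind of encoding I mean (the exact field layout here is illustrative):

    #include <cstdint>

    // Pack a move into 16 bits: 6 bits from-square, 6 bits to-square,
    // 4 bits for the promotion piece (0 = none).
    inline uint16_t packMove(unsigned from, unsigned to, unsigned promotion) {
        return static_cast<uint16_t>((from & 0x3f) | ((to & 0x3f) << 6) |
                                     ((promotion & 0xf) << 12));
    }

    inline unsigned moveFrom(uint16_t m)  { return m & 0x3f; }
    inline unsigned moveTo(uint16_t m)    { return (m >> 6) & 0x3f; }
    inline unsigned movePromo(uint16_t m) { return (m >> 12) & 0xf; }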
I am currently recalculating the PVs every 10 iterations. In between, I update the parameter values assuming the PVs do not change. I am using a step size of 1 for most parameters, but if the parameter range is large I use range/100 and decrease it gradually as the iteration count goes up. TBD: trying some other optimization methods such as AdaGrad.
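In outline, the loop looks roughly like the sketch below (the decay rate, the helper callbacks, and the Param struct are placeholders for the actual tuner code):

    #include <algorithm>
    #include <functional>
    #include <vector>

    // Hypothetical parameter record: current value, allowed range, and the
    // MMTO gradient filled in by the gradient pass.
    struct Param {
        double value;
        double range;
        double grad;
    };

    const int PV_RECALC_INTERVAL = 10;

    // Outer tuning loop. recomputePVs re-searches the training positions and
    // stores fresh PVs; computeGradient accumulates the MMTO gradient over the
    // stored PVs into each Param::grad. Both stand in for the real code.
    void tune(std::vector<Param> &params, int iterations,
              const std::function<void()> &recomputePVs,
              const std::function<void(std::vector<Param>&)> &computeGradient) {
        for (int it = 0; it < iterations; ++it) {
            // PVs are expensive to produce, so refresh them only periodically;
            // in between, parameters are updated as if the PVs were fixed.
            if (it % PV_RECALC_INTERVAL == 0)
                recomputePVs();
            computeGradient(params);
            for (auto &p : params) {
                // Step size 1 for small-range parameters; for wide ranges start
                // near range/100 and shrink the step as the iterations go up.
                // (The 0.1 decay rate is a guess, not the value I am using.)
                double step = std::max(1.0, p.range / 100.0 / (1.0 + 0.1 * it));
                if (p.grad > 0)      p.value -= step;
                else if (p.grad < 0) p.value += step;
            }
        }
    }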
For a training set, I have a collection of about 40,000 games, which are a mix of strong correspondence games, long time-control computer games (mostly from Playchess), and GM-level human games.
It remains to be seen whether tuning parameters this way actually improves gameplay, but the method looks promising. I should know soon: I plan to run a comparison test between the pre- and post-tuning parameter values.