Feb. 27th, 2015

I am continuing to work on some automated tuning, currently focusing on the evaluation function.

I have usually used a hacked version of Arasan for tuning experiments but have recently made some efforts towards integrating this into the main codebase. The scoring module has been modified so that most parameter values are read from a class. This class has two versions: for tuning, the parameters are variables, so that they can be modified; the tuning code then emits, as one of its results, a "const" version of the class in which all parameter values are fixed. For runtime use in the engine the const version is used because it can be better optimized (in many cases the const values can be loaded directly into registers instead of being read from memory). Currently this lives in a feature branch, but I plan to eventually merge it into the master branch.
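To illustrate the idea, here is a minimal sketch of the two-version scheme (the class and member names below are made up for this example, not the actual Arasan identifiers):

```cpp
// Tuning build: values are ordinary variables, so the optimizer can
// modify them between iterations.
struct TunableParams {
    int KNIGHT_VALUE = 3000;   // pawn = 1000 scale, illustrative values
    int BISHOP_VALUE = 3250;
    int ROOK_VALUE   = 5000;
    // ... remaining evaluation parameters
};

// Generated for the release build: values are compile-time constants,
// so the compiler can fold them into the scoring code (often as
// immediates) instead of reading them from memory.
struct ConstParams {
    static const int KNIGHT_VALUE = 3000;
    static const int BISHOP_VALUE = 3250;
    static const int ROOK_VALUE   = 5000;
    // ... remaining evaluation parameters
};
```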

Scoring parameters such as PSQ tables are currently generated from a smaller set of tuning parameters, to keep the overall tuning parameter set small (it is about 240 variables at present, which would already be considered a large-scale optimization problem).
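As a rough illustration of deriving a full table from a few parameters, here is one possible parameterization (the specific formula below is only an example, not necessarily the one Arasan uses):

```cpp
#include <algorithm>
#include <cstdlib>

// Build a 64-entry knight piece-square table from three tuned parameters:
// a base value, a per-rank bonus, and a centralization bonus.
void buildKnightPSQ(int psq[64], int baseValue, int rankBonus, int centerBonus) {
    for (int sq = 0; sq < 64; ++sq) {
        const int rank = sq / 8;
        const int file = sq % 8;
        // Distance from the board center: 0 (central) .. 3 (corner).
        const int centerDist =
            std::max(std::abs(2 * file - 7), std::abs(2 * rank - 7)) / 2;
        psq[sq] = baseValue + rank * rankBonus + (3 - centerDist) * centerBonus;
    }
}
```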

I have also rescaled the scoring values so that a pawn has value 1000 instead of 100. For tuning I am experimenting with some algorithms that use real values instead of integers: the tuning code scales all values into the range 0..1, and when the values are applied to the scoring parameters they are scaled back up and rounded.
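Something along these lines handles the conversion between the optimizer's real-valued parameters and the integer scoring values (the bounds structure here is hypothetical, for illustration only):

```cpp
#include <cmath>

struct ParamBounds { int minVal, maxVal; };

// Map an optimizer value in [0,1] back to an integer score (pawn = 1000 scale).
int denormalize(double x, const ParamBounds &b) {
    return static_cast<int>(std::round(b.minVal + x * (b.maxVal - b.minVal)));
}

// Map an integer score into [0,1] for the optimizer.
double normalize(int value, const ParamBounds &b) {
    return static_cast<double>(value - b.minVal) / (b.maxVal - b.minVal);
}
```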

My current tuning code is based on libcmaes, which implements the CMA-ES algorithm. This has a pretty good record of solving high-dimensional optimization problems. The library includes some variants such as BIPOP-CMA-ES (BIPOP is a restart strategy that tries to avoid getting stuck at a local minimum). I have also looked at some other algorithms; in particular, see this very useful page.
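Setting up an optimization run with libcmaes follows the pattern of the library's basic example; here with a placeholder objective standing in for the real tuning criterion:

```cpp
#include "cmaes.h"
#include <iostream>
#include <vector>

using namespace libcmaes;

// Placeholder objective: the real one would evaluate the tuning criterion
// over the training set. libcmaes calls this with a candidate vector x.
FitFunc objective = [](const double *x, const int N) {
    double val = 0.0;
    for (int i = 0; i < N; i++)
        val += x[i] * x[i];
    return val;
};

int main() {
    const int dim = 240;               // number of tuning parameters
    std::vector<double> x0(dim, 0.5);  // start in the middle of the [0,1] range
    const double sigma = 0.1;          // initial step size
    CMAParameters<> cmaparams(x0, sigma);
    cmaparams.set_algo(BIPOP_CMAES);   // use the BIPOP restart strategy
    CMASolutions cmasols = cmaes<>(objective, cmaparams);
    std::cout << "best solution: " << cmasols << std::endl;
    return cmasols.run_status();
}
```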

I am also still experimenting with surrogate measures for actual game play. Currently I am using a measure of the variance between evaluation results from a shallow search and actual game results, over a large training set (about 2 million positions).
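A sketch of what such an objective might look like is below; the sigmoid mapping from search score to expected result, and the scaling constant, are assumptions made for the sake of the example rather than the exact measure I am using:

```cpp
#include <cmath>
#include <vector>

struct TrainingPosition {
    double searchScore;  // shallow-search score, pawn = 1000 scale
    double gameResult;   // 1.0 = win, 0.5 = draw, 0.0 = loss (side to move)
};

// Mean squared difference between the predicted result (a sigmoid of the
// shallow-search score) and the actual game result over the training set.
double objective(const std::vector<TrainingPosition> &data,
                 double K = 1.0 / 4000.0) {
    double sum = 0.0;
    for (const auto &p : data) {
        const double expected = 1.0 / (1.0 + std::exp(-K * p.searchScore));
        const double diff = expected - p.gameResult;
        sum += diff * diff;
    }
    return sum / data.size();
}
```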

It is too early to tell whether this is working. The optimizer does steadily, if slowly, reduce the objective value, so that part is functioning. But whether it results in better game play is TBD. Also, the modifications to the scoring module are considerable, and while I have tried to ensure I have not broken anything, I probably need to go back and verify further that the engine still performs reasonably with the default values before resuming tuning.