March 1st, 2015
The CMA-ES tuning code is now pushed to GitHub, but it is not currently performing as well as the hand-tuned code in the master branch.
March 14th, 2015
As I wrote in a recent post on talkchess, my first experiments with automated parameter tuning using CMA-ES were not successful (they did not pass validation in my standard fast time-control matches against other programs).
I have done some more work to bring the tunable version closer to the eval in the master branch. I have also temporarily taken some large tunable structures, such as piece-square tables, out of the tuning set, since I don't think the functions that generated them were very accurate.
I am now trying another tuning run, this time using a set of positions from the last big 36,000-game set of matches I ran. These are lower-quality games because of the fast time control, but they likely have more varied positions than those drawn from higher-quality games.
I am still using CMA-ES for tuning, although I have looked at a few other algorithms. A recently published one that looks interesting is RockStar.
March 16th, 2015
OK, that last experiment was bad too. I gave it about 1,100 function evaluations, which took about a day and a half. Rating before tuning: 2321; rating after tuning: 2294.
March 17th, 2015
The RockStar code is integrated now and I am experimenting with it. Unlike CMA-ES, it does not evaluate a batch of points in one iteration before selecting another batch; instead, it does one evaluation at a time. The objective decreases steadily but not monotonically: sometimes the next evaluation is a little higher than the previous one.
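Schematically, the difference in evaluation pattern is something like the toy comparison below (stand-in random-search loops on a dummy objective, just to show batch-per-iteration vs. one-evaluation-at-a-time; this is not the actual optimizer code):

```cpp
// Toy illustration of the two evaluation patterns (not the real optimizers):
// a batch scheme like CMA-ES scores a whole population before updating,
// while a RockStar-style loop scores a single candidate per step.
#include <cstdio>
#include <random>
#include <vector>

static double toyObjective(const std::vector<double> &x) {
    double s = 0.0;
    for (double v : x) s += (v - 1.0) * (v - 1.0);  // minimum at all-ones
    return s;
}

static std::vector<double> perturb(std::vector<double> x, std::mt19937 &rng) {
    std::normal_distribution<double> step(0.0, 0.1);
    for (double &v : x) v += step(rng);
    return x;
}

int main() {
    std::mt19937 rng(42);
    const int dim = 4;

    // Batch style: evaluate a population of 12 candidates per iteration,
    // then move the search center only after the whole batch is scored.
    std::vector<double> center(dim, 0.0);
    for (int iter = 0; iter < 100; ++iter) {
        std::vector<double> batchBest = center;
        double batchBestScore = toyObjective(center);
        for (int k = 0; k < 12; ++k) {
            std::vector<double> cand = perturb(center, rng);
            double s = toyObjective(cand);
            if (s < batchBestScore) { batchBestScore = s; batchBest = cand; }
        }
        center = batchBest;
    }
    std::printf("batch-style result: %.6f\n", toyObjective(center));

    // Sequential style: one candidate, one evaluation, one update per step.
    // Individual evaluations bounce around even as the trend goes down.
    std::vector<double> current(dim, 0.0);
    double currentScore = toyObjective(current);
    for (int i = 0; i < 1200; ++i) {
        std::vector<double> cand = perturb(current, rng);
        double s = toyObjective(cand);
        if (s < currentScore) { currentScore = s; current = cand; }
    }
    std::printf("one-at-a-time result: %.6f\n", currentScore);
    return 0;
}
```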
I have also recently made some changes to my standard validation test, which runs a large number of games against varied opponents. I tried a faster time control (game in 5 seconds plus a 0.1-second increment) and that worked fine, except that, looking at the games, it appeared some were adjudicated too early. I am not sure why, but I disabled win/loss adjudication in cutechess (it will still adjudicate draws).
After these changes I did another test run and fed FEN positions from that run into the tuner. I also disabled the hash table in the searcher when doing tuning. With these changes the starting error was 0.505, lower than before. I am running another tuning session now and will then try to validate the results. If this does not work, I am going to try actually using the game match results in the tuning objective, which I was doing a while ago with NOMAD.
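For reference, the kind of prediction-error objective I mean is along these lines. This is only a minimal sketch of a common formulation: the labeled-position struct, the search callback, and the 400-point sigmoid scaling are assumptions for illustration, not the tuner's actual code, and the exact error formula used may differ.

```cpp
// Sketch: mean squared error between a sigmoid of a low-depth search score
// and the actual game result, over a set of labeled FEN positions.
// The struct, callback, and 400-point scaling are placeholders.
#include <cmath>
#include <functional>
#include <string>
#include <vector>

struct LabeledPosition {
    std::string fen;   // position drawn from a game in the match set
    double result;     // game result from the side to move's view: 1, 0.5, 0
};

// Map a centipawn score to an expected game result in [0,1].
// The 400 scaling is a conventional placeholder constant.
static double winProbability(double scoreCp) {
    return 1.0 / (1.0 + std::pow(10.0, -scoreCp / 400.0));
}

// Objective handed to the optimizer: quickSearchScore is a caller-supplied
// shallow search of the position with the candidate parameters applied.
// Lower error means the eval predicts game outcomes better on this set.
double predictionError(const std::vector<LabeledPosition> &positions,
                       const std::function<double(const std::string &)> &quickSearchScore) {
    double sum = 0.0;
    for (const auto &p : positions) {
        double diff = winProbability(quickSearchScore(p.fen)) - p.result;
        sum += diff * diff;
    }
    return positions.empty() ? 0.0 : sum / positions.size();
}
```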
March 19th, 2015
The problem with RockStar is that it is an unconstrained optimizer. When I looked at the early-stage tuning results, they were obviously wrong: for example, some passed-pawn scores were negative. I didn't try to verify these parameters; I am pretty sure they would be worse than the somewhat optimized starting point. But the objective function was lower with these!
I could fix the constraint problem by adding constraint penalties to the objective. However, based on my previous experiments with CMA-ES, I think I now have sufficient evidence that minimizing game-prediction error from a low-depth search is an unreliable method, having seen that lowering this objective does not translate into greater success in game matches.
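A minimal sketch of what such penalties could look like, assuming simple per-parameter box bounds (the Bound struct and the penalty weight are hypothetical, not the tuner's code):

```cpp
// Box-constraint penalty added to an unconstrained objective: any parameter
// outside [lo, hi] contributes a quadratic penalty, so obviously wrong values
// (e.g. negative passed-pawn bonuses) look expensive to the optimizer.
// Bounds and the weight are hypothetical.
#include <cstddef>
#include <vector>

struct Bound {
    double lo;
    double hi;
};

double constraintPenalty(const std::vector<double> &params,
                         const std::vector<Bound> &bounds,
                         double weight = 1000.0) {
    double penalty = 0.0;
    for (std::size_t i = 0; i < params.size(); ++i) {
        if (params[i] < bounds[i].lo) {
            double d = bounds[i].lo - params[i];
            penalty += d * d;
        } else if (params[i] > bounds[i].hi) {
            double d = params[i] - bounds[i].hi;
            penalty += d * d;
        }
    }
    return weight * penalty;
}
```

The penalized objective would then be the prediction error plus this term, so an unconstrained optimizer sees infeasible points as costly rather than attractive.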
It is possible that more optimization time might eventually give a good result: from the literature I have read, fewer than 100 * dimension function evaluations is considered quite a low evaluation budget for nonlinear optimization. Still, I do not trust this tuning method, with its surrogate objective.