June 18th, 2015

Some notes on my testing setup: I have a combination of shell and Python scripts to run tests on multiple machines. Although Arasan is multi-platform, I don't plan to make these scripts portable to environments other than Linux, since that is what all my test machines run.

The command "scatter" propagates a new version of the program to all the testing machines. The command "match" runs a standard set of test matches, for example: match 450 0:04+0.1 runs 450 games per core at a time control of 0:04+0.1.

Once the games are complete, the "gather" command collects game files from all the machines, runs BayesELO over the game set and produces a rating list.
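
The BayesELO part of "gather" is just a matter of feeding the tool its usual command sequence over the collected games. A rough sketch, assuming the per-machine game files have already been concatenated into one local PGN (the file name is a placeholder, and the command sequence shown is the commonly used one, not necessarily exactly what my script does):

    #!/usr/bin/env python3
    # Sketch of the rating step of a "gather"-style script: run bayeselo over
    # the collected games and print the rating list. The PGN file name is a
    # placeholder; the command sequence is the usual readpgn/elo/mm/ratings one.

    import subprocess

    PGN_FILE = "gathered.pgn"   # all collected games, concatenated

    # bayeselo takes its commands on standard input
    commands = "readpgn %s\nelo\nmm\nexactdist\nratings\nx\nx\n" % PGN_FILE

    result = subprocess.run(["bayeselo"], input=commands, text=True,
                            capture_output=True)
    print(result.stdout)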

This is very simple to run, but it depends on all the machines having cutechess and other tools installed, and the machine list and opponents are hard-coded in the scripts, so they are not very flexible. Some day I may clean this up so that other people with a different setup could use the scripts more easily.

For automated tuning, the current GitHub code has a program called "tuner," which accepts a list of parameters to tune and uses a specified optimization algorithm to find their optimal values. Currently it is set up to call out to an external script (a variant of "match") to obtain a rating number, but it could instead use some kind of internal code for this. It depends on a specially compiled version of Arasan called arasanx-64-tune, which has variable rather than fixed parameters and accepts Winboard options that can change parameter values (this could be used with CLOP too, but I don't run that anymore).
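
To give an idea of the shape of the objective function involved: each evaluation sets the candidate parameter values via engine options and runs a match to get a rating back. A hedged sketch follows; the script name, argument format, and parameter names below are made up for illustration, not the actual tuner code:

    #!/usr/bin/env python3
    # Sketch of a tuner objective: run an external match script with candidate
    # parameter values and read back a rating. "match_tune.sh", its argument
    # format, and the parameter names are hypothetical.

    import subprocess

    def rating_for(params):
        # pass each parameter as name=value; the script is assumed to turn these
        # into option settings for the tuning build (arasanx-64-tune), run the
        # matches, and print the resulting rating on its last output line
        args = ["%s=%d" % (name, value) for name, value in params.items()]
        out = subprocess.run(["./match_tune.sh"] + args,
                             capture_output=True, text=True, check=True)
        return float(out.stdout.strip().splitlines()[-1])

    if __name__ == "__main__":
        print(rating_for({"KING_ATTACK_BOOST": 30, "KING_ATTACK_SCALE_MAX": 500}))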

I have also done some experimenting with the Stochastic RBF optimizer mentioned last month. That is a Python program designed to interface with a module that evaluates some function; for this I have yet another modified version of "match" that runs matches on multiple machines and returns a rating. The module has to be edited to select the parameters to tune.
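
The module it calls is basically a problem definition: a dimension, box bounds for each parameter, and an objective function that runs the matches and returns a value to minimize (the negated rating, since more Elo is better). A sketch of that shape follows, with made-up attribute names, bounds, and parameters rather than the real module:

    #!/usr/bin/env python3
    # Sketch of an objective-function module for an RBF-style optimizer: a
    # dimension, bounds, and a function to minimize. Attribute names, bounds,
    # and the parameters tuned here are illustrative placeholders.

    import subprocess

    class TuningProblem:
        def __init__(self):
            self.names = ["KING_ATTACK_BOOST", "KING_ATTACK_SCALE_MAX"]
            self.dim = len(self.names)   # number of tuned parameters
            self.xlow = [0, 100]         # lower bounds (hypothetical)
            self.xup = [100, 750]        # upper bounds (hypothetical)

        def objfunction(self, x):
            # run the multi-machine match script with these values and negate
            # the rating so that minimizing the objective maximizes strength
            args = ["%s=%d" % (n, round(v)) for n, v in zip(self.names, x)]
            out = subprocess.run(["./match_tune.sh"] + args,
                                 capture_output=True, text=True, check=True)
            return -float(out.stdout.strip().splitlines()[-1])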

I have to say, though, that most of the progress I have made recently has come from trying things manually rather than from running the optimization code.

June 4th, 2015

I made an initial attempt at tuning king safety using a fairly complex model with a lot of parameters, but more recently I simplified it and had some modest success tuning that simpler model. Currently I am tuning the scaling table, which is more or less a sigmoid function: it initially gives a low value to king attacks, then gives progressively greater scores as more and larger pieces attack, and eventually tapers off so that additional attacking pressure increases the score only modestly. I will commit these changes soon.
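
For concreteness, the kind of curve I mean can be generated from a logistic function: small attack indices get near-zero scores, the middle of the range rises steeply, and the top flattens out. The constants below are made-up illustration values, not the tuned table:

    #!/usr/bin/env python3
    # Illustration of a sigmoid-shaped king attack scaling table. Table size,
    # maximum score, midpoint, and slope are made-up values for illustration.

    import math

    def king_attack_table(size=64, max_score=500, midpoint=24, slope=0.18):
        # score(i) = max_score / (1 + e^(-slope * (i - midpoint)))
        return [round(max_score / (1.0 + math.exp(-slope * (i - midpoint))))
                for i in range(size)]

    if __name__ == "__main__":
        table = king_attack_table()
        print(table[:6], "...", table[-4:])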

Next up is probably some material trade scoring. I have recently been examining this position:

[Diagram: trade-down position]

Arasan 17.5 scores this as -2.24 pawns for White. This is almost certainly wrong: without the pawns it would be a draw, and with the pawns it is probably still a draw (they are doubled and not far advanced, so White will find it hard to advance them). Arasan "understands" this is a near-draw situation, but the scoring code that handles that case is doing the wrong thing. I have some experimental mods that change the eval to -1.12 for this position, but it probably needs further tuning.
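
The kind of fix involved is to scale the raw score toward zero when the material-recognition code flags a near-draw, rather than reporting the full material deficit. A toy sketch of the idea; the scaling factor and the flag are placeholders, not the actual experimental code:

    # Toy sketch of draw-scaling a near-draw trade-down score. The factor and
    # the near_draw flag are placeholders, not Arasan's actual logic.

    def adjusted_eval(raw_score_cp, near_draw, drawish_factor=0.25):
        # raw_score_cp: evaluation in centipawns before draw recognition
        # near_draw:    set by material/endgame recognizers for this position type
        if near_draw:
            return int(raw_score_cp * drawish_factor)
        return raw_score_cp

    # e.g. adjusted_eval(-224, True) returns -56 instead of -224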

Arasan will play (remotely) in the 48th CSVN Programmer's Tournament the weekend of June 13/14.