Arasan Programmer's Guide - version 20.2

Arasan Programmer's Guide - version 20.2

by Jon Dart


Arasan is a chess program.

Usage of the source code is governed by the MIT license: see license.txt for details.

Arasan includes a console-based chess engine that be used with Winboard, a separately available interface for chess programs, or with UCI-compatible interface programs such as Arena and Shredder. In addition, a custom Windows-only user interface program for Arasan is available; it communicates with the chess engine using a pipe.

Arasan is written in C++. Versions before 6.0 ran only on Windows. In version 6.0 and higher, the Arasan chess engine supports both Windows (32- or 64-bit versions) and other platforms such as Linux, but the Arasan GUI is still Windows-only. If you are not on Windows, you need to use a program such as xboard to view the chessboard and play games with Arasan. You can get xboard from http://savannah.gnu.org/projects/xboard and many other locations.

Arasan has mostly been tested on Intel & AMD processors but the code is designed to be portable to other processors (there is some assembly code in bitboard.h, but that is not used unless you compile with -DUSE_ASM).

The remainder of this file contains information for use by programmers reading or working on Arasan source code. I assume that you have a working familiarity with C++. Also, if you have no background in computer chess, you should probably start by reading some of the reference material mentioned at the end of this document.

Building Arasan

The source distribution for Arasan includes sources and makefiles for building the chess program (arasanx), and also source for the following other console programs:

makebook - builds binary opening book files from text (PGN) files

makeeco - builds ecodata.cpp from from "eco" text file

ecococder - adds ECO codes to a PGN file

pgnfilter - samples PGN files, writes EPD records to stdout

playchess - filters PGN games, removing those where end eval differs from result (and short games)

tuner - automatically tunes scoring parameters

Following is a sketch of the Arasan source directory tree:


arasan
  20.2
    src     - source code for the chess engine and related tools
    gui     - source code for the Arasan user interface (Windows only)
    book    - text files for opening book and ECO recognizer
    doc     - documentation
    tests   - test files and documentation
    tools   - misc. utility scripts
    prj     - Visual C++ project file and build directories (Windows)

Note: the source codes for Windows and Linux are identical, except that there are different format Makefiles, one for Windows nmake and one for GNU make.

Building Under Windows

The release binaries for Arasan were built with Visual Studio 2015 Community Edition. The engine binaries were built from the command-line using NMAKE, and the Arasan GUI was built with the Visual Studio project file.

Recommended build method

I recommend building the engine with the supplied makefile. This is the most reliable and tested method. This assumes you have Visual Studio or at least the command-line tools (Microsoft SDK). You need a recent version with C++ 11 support. Visual Studio 2012 and later should work.

If you want tablebase support, you should download and install the tablebase support code (available separately) and edit the makefile to select the type(s) of tablebases used and to point to the tablebase source. Note: as of version 19.0, Arasan includes Syzygy tablebase support by default, but to get Gaviota or Nalimov tablebase support you need to configure that.

To build the chess engine with VC++ and PGO, the easiest way is to use the command line and execute:

nmake clean profiled

Release binaries from a build are placed in the "<target>\Release" directory, where <target> is "win32" or "win64", while the debug objects and binaries are put into "<target>\Debug".

The default target is 64-bit Windows. For a 32-bit build, edit the Makefile to select TARGET=win32 and run the same build commands.

If you have a processor that has the POPCNT instruction (introduced with SSE 4.2) then you can use "nmake popcnt" to produce a build that uses that instruction (but it will only run on SSE4.2 capable processors). This only works for a 64-bit target. The executable will be named "arasanx-64-popcnt.exe" and will be placed in the release or debug directory (depending on build type). To build with both POPCNT support and PGO, use "nmake popcnt-profiled".

To make a XP-compatible build, edit the Makefile to set PLATFORM=XP and be sure to also use the 7.1A SDK includes and libraries, as detailed here. Visual C++ users should read this.

You can also use the Intel C++ compiler but a recent version is required, at least version 13.0. Use the alternate Makefile if your compiler is Intel C++, like this:

nmake -f Makefile.icl popcnt-profiled

Note: Intel C++ does not support Windows Xp.

The makefiles have a separate "tuning" target that will build utilities used for automated parameter tuning (see the Parameter Tuning section for details).

To build the Arasan native Windows GUI, use the Visual C++ solution file in the "gui" subdirectory. (The command-line Makefile does not build the GUI). You cannot build the GUI with the Express Edition of Visual C++, because it lacks the required MFC libraries. The Community Edition is ok, though.

Other build methods

  1. The Windows source distribution includes Visual C++ 2015 solution files for the chess engine and tools and GUI in directories "prj" and "gui", respectively. But I use the "prj" solution file mostly for debugging, not for building the final release.

  2. Arasan can also be built using the Cygwin development tools, which simulate a Linux environment under Windows. Use the Linux source package for building with Cygwin. I do not recommend or support MingW.

Building Under Linux or Mac OS

See the INSTALL file in the doc subdirectory of the Linux/Mac distribution for current build instructions.

Opening book

Arasan stores the opening book in a binary file (book.bin).

Since book.bin is supplied with the Arasan program distribution, you do not need to build this file in order to build Arasan. However, if you want to modify the contents of the opening book, you will need to edit the ASCII source for the openings and rebuild book.bin, and for this you will need to build the "makebook" program that is part of the source distribution. Following is further information on creating a book file.

The book for Arasan is converted from text files to the binary book by the console program makebook.exe.

The input files used to generate the opening book are in Portable Game Notation (PGN) format. More than one PGN file can be specified as input.

When multiple files are used, makebook treats the first one specially. The first file is used for manual tuning of the opening book weights. This file can be annotated using standard PGN comments and Numeric Annotation Glyphs (NAGs). A file with such annotations is supplied with the Arasan source and is named "basic.pgn".

In the annotated PGN file, an explict weight can be specified by putting a comment with the string "weight: " followed by a numeric value after a move. Weights are between 0 and 100. A weight of 0 means that the computer will never play the move, but will respond to it if it is played by its opponent. The higher the weight, the more often the computer will choose the move compared to the other alternatives in the book. By default, if no weight is specified, moves get a weight computed from the overall frequency and win/loss ratio for each move.

PGN files can also have numeric annotation glyphs (best to use an editing program such as ChessBase to insert these). If present these have the following meaning in the context of "makebook:"

!! - a very good move, give it the maximum possible weight.
! - a good move, weight it 50% higher than normal.
!? - a worthwhile move, weight it 25% higher than normal
? - a dubious move, do not play it
?? - a blunder, do not play it.

The "normal" weighting is computed from the move's frequency plus win/loss percentage, except that if a move is in the first annotated file but rarely occurs elsewhere it is given a neutral default weight that can be further modified by annotations.

Glyphs can also be used to assign an evaluation to the end of an opening line and if so this evaluation modifies the weighting of moves along that line.

Arasan comes with a modest-sized book that was built from a combination of a small hand-tuned book file (basic.pgn) and a fairly large collection of high-quality human and computer games. There are over 500,000 moves in the book.

You can make your own book file using the makebook utility. Typical usage would be like this:

makebook -n 100 -o book.bin basic.pgn big.pgn

The -n parameter to makebook specifies how many "index pages" are in the binary book file. You may have to experiment with this parameter. Too small a value will cause an error building the book. Too large a value will waste space by creating a bigger book.bin than is required.

You can also specific the "-m" parameter to makebook with a number, to set a minimum number of times that a move must be played in a game collection to be included in the book. The default book shipped with arasan was built with "-m 4". Other options include:

  • -p <number> - sets maximum ply depth for moves extracted from a PGN file
  • -o <filename> - sets output file name (default book.bin)
  • -v - show more verbose output.

See bookdefs.h for some documentation about the data layout within the book.bin file.

Learning

Arasan has positional learning (a.k.a "permanent brain"). It is basically a persisent hashtable. If a search returns an unexpectedly high or low score, the position and its score are stored in a text file called arasan.lrn, which is located in the same directory as the Arasan executable. When the next game is started, stored positions from this file are read into memory and stored in the hash table, enabling the program to detect danger or opportunity sooner than it did previously.

Arasan learning does not work in UCI mode at present, for several reasons.

ECO Recognizer

Arasan can produce an ECO (Encylopedia of Chess Openings) code for a given game. As with the opening book, the mapping of chess positions to ECO codes is contained in a text file. The file is called "eco" and is in the "misc" directory. It contains a series of lines, each starting with an ECO code, followed by a series of chess moves, and ending with a quoted string that contains the English name of the opening.

The "makeeco" program reads the "eco" data file and outputs to stdout a C++ file that is then compiled into Arasan. The generated file is called "ecodata.cpp" and contains a single large C data structure.

Because Visual C++ makefiles don't handle generated source files like this one, you need to run "makeeco" manually whenever you change the "eco" data file.

makeeco takes a single argument: the location of the eco file. It writes the generated ecodata.cpp file to stdout.

Note: the ECO recognizer is pretty crude right now, because Arasan does not contain all the possible ECO lines and sublines in its data file. Therefore transpositions that wind up in an ECO subline like C18/3 but skip the position that is stored for C18 may not be recognized correctly. Transpositions across opening systems may also be missed. There is no general solution for this problem, since the ECO classification scheme is ambiguous in some cases, but recognition could be improved by expanding the number of positions in the eco file. If the number becomes large enough, however, the current method of building a data structure and compiling it into Arasan might have to be changed.

Testing support

Arasan includes several features to aid debugging. If you compile the source for debugging (with -D_DEBUG), checks are inserted for accessing arrays past their boundaries, as well as some other sanity checks. If any of these checks fail, an error box will be displayed with the assertion that failed.

At runtime, adding the "-t" (trace) flag to the arasanx executable will cause it to output trace messages (beginning with '#') detailing its internal operations. This is mostly for Winboard/xboard. These messages will show up in the Winboard debug log if Winboard is started with the "-debugMode true" flag.

If the source is compiled with /DUCI_LOG, then a similar log file named "ucilog" is produced on disk when running in UCI mode. This however doesn't have exactly the same information shown by "-t".

The chess engine supports a couple of commands to aid with testing. The "eval" command followed by a filename will read the file (assumed to be in FEN format) and output the static evaluation of the position. If the EVAL_DEBUG and/or PAWN_DEBUG defines are uncommented in scoring.cpp, then you can see detailed output of the scoring components.

The "test" command followed by the name of a test suite file in EPD format will do a search on each position in the file and output the results to stdout. Additional switches can be added after the filename:

  • -v - prints more verbose output
  • -t <seconds> searches for the specified number of seconds per position
  • -x <count> can terminate the search early if the correct move is found and held for "count" plies.
  • -o <file> stores test output in "file".

If the search module is compiled with -D_TRACE, arasanx will print out copious information about the search process when it is run (if running multi-threaded, only the main thread is traced). See search.cpp to see what information is output and where it comes from.

Test Suites

The following test suites are provided with the source code (see the "tests" directory):

  1. wacnew.epd contains the 300 problems from Reinfeld's "Win At Chess" book. These are mostly easy tactical problems (a few are hard). This test suite is widely used by computer chess programmers. "wacnew" is a revised version with a number of corrections and additions to Reinfeld's oroginal solutions.
  2. ecmgcp.epd contains a subset of test positions from the Encyclopedia of Chess Middlegames, selected and corrected by Gian-Carlo Pascutto.
  3. bt2630.epd is a set of 30 chess problems, with a range of difficulty. This is used to determine an approximate rating for the program. Standard procedure is to allow 15 minutes for each problem. Add the time needed to find the correct answer (900 for problems that are not solved), divide by 30, and subtract from 2630. Note: some corrections from Dann Corbit have been applied to this file.
  4. lapuce2.epd is a set of 35 tests from the French chess magazine La Puce Echiquenne. This is another test that purports to estimate a program's rating. Standard procedure is to allow 10 minutes for each problem. See lapuce2.doc for the scoring procedure.
  5. arasan20.epd is a set of test positions from Arasan games, plus some from other sources. They range in difficulty, but most are non-trivial and some are quite hard. This is the latest version of the test suite. Some positions in this file are "avoid move" positions where there is a bad but superficially tempting move.
  6. iq4.epd is a set of positions from the book "Test Your Chess IQ". Jim Monaghan selected and corrected some tests from this book. I have made some further modifications to the test and this version, which I use, is in the file "iq4.epd". These tests are run at 10 seconds per position.
  7. pet.epd is a set of endgame tests from Peter MacKenzie, the author of the freeware chess program "Lampchop". I have applied a few corrections to Peter's original test suite.
  8. eet.epd is a set of endgame tests from Walter Eigenmann. See http://www.beepworld.de/members38/eigenmann/e_e_t.htm.

The results file in the tests subdirectory summarizes Arasan's performance on these test suites. There is a Perl script (tests.pl) in the tests directory that generates a summary of test results given a log file produced by the Arasan "test" command.

Perft

A built-in command in the engine can be used to run the "perft" command for testing. The command "perft" should be followed by a number indicating the ply depth for the computation.

Unit tests

If compiled with -DUNIT_TESTS, Arasan will run a set of tests on program startup (this flag is off by default). These tests verify that a set of critical functions operate correctly, verify that evaluation results are symmetrical between White and Black sides, and also run and verify the perft test over a set of positions.

Automated parameter tuning

Arasan since version 19.0 has some support for automated parameter tuning.

Tuner program

The Makefile has a "tuning" target that builds a "tuner" executable that when run will generate the parameter file (params.cpp) containing evaluation weights used by the Arasan scoring module.

The tuner implements a variant of the Texel tuning method, based on logistic regression. (Formerly Arasan also supported a tuning method called MMTO described by Hoki and Kaneko (see references), but the current version no longer does).

The input file should be a EPD (Extended Position Description) file with a diverse set of positions. These do not have to be quiescent (i.e. can have possible captures, checks and/or promotions). The tuner program expects each position to have two EPD tags:

  • The "c1" tag should hold the Arasan castling status (see board.h) as two integers, the first for the White side and the second for Black. For example c1 "0 0" means both sides can castle either side. (This tag is needed because Arasan scores positions based not just on whether castling is possible, but whether it has occurred).
  • The "c2" tag holds the game result as one of the following strings: "0.0" (Black win), "0.5" (draw), or "1.0" (White win).

The "tuner" program expects an EPD file to be specified on the command line. For each position it will conduct a shallow-depth search. The principal variation (PV) from this search will end in a quiescent position. These positions are stored in memory and and their static evaluations are used as a proxy for the initial position's eval. The tuner has the option of periodically re-computing the PVs, since as the parameters change so potentially do the PVs (this is similar to what MMTO does).

The tuner tries to minimize a measure of difference between the predicted game results (which are computed using a sigmoid function of the evaluation scores) and the actual game results. It does this using one of several gradient descent methods. The code explicitly computes the gradient of the evaluation function, rather than approximating it as some programs do.

The tuner will output two files: by default the first one is named "x0" and contains the tuned parameter values; the second one is a compilable version of the parameters named "params.cpp."

The following command-line options are supported:

  • -c <cores> run multithreaded with the specifed number of cores.
  • -d just write out default parameter values to the output file.
  • -f <output .cpp file>
  • -i <input parameter file> specify the name of a file containing starting values for the parameters (can be the x0 file from a previous run). Without this some reasonable starting defaults will be used.
  • -x <output parameter file> specify the name of the output file (instead of "x0")
  • -n <iterations> how many iterations to use in tuning
  • -o adagrad|adam|adaptive select optimization method
  • -O ordinal|msq select objective function
  • -R <interval> periodically recalulate PVs
  • -V validate gradients (not compatible with multithreading)

Some further notes on the options: The default objective is "msq", the mean squared error between the predicted result value (sigmoid function of the evaluation) and the actual result value. The tuner also supports Ordinal Logistic Regresssion based on the three possible discrete game results, but this is not really tested.

Three iterative optimization methods are supported. The first is "adaptive," which is a simple momentum based approach described by Geoffrey Hinton. The tuner also supports the AdaGrad method (see Duchi et. al.) and the ADAM method (see Kingma and Ba). ADAM is currently the default method.

Generating the training file

The Arasan source distribution contains some tools used for generating the labeled EPD file used for parameter tuning.

One program is called "pgnselect," located in the "util" subdirectory. This program reads a PGN input file and produces EPD output to stdout. The EPD output is based on a randomly selected subset of the positions encountered during the game. If the -r flag is used, then for each sampled position, a randomly selected legal move will be made (this trick was used in the neural-network based program "Giraffe" (see references). It produces more varied and unbalanced positions that just game sampling would). The EPD output will contain the "c1" tag with the castling status, but no result tag.

Other switches supported by the "pgnselect" program:

  • -max <int> - max ply to sample (default: 150)
  • -min <int> - min ply to sample (default: 30)
  • -d <int> - min ply distance between samples (default: 3)
  • -r - insert random moves
  • -s <int> - max ply distance between samples

There is also a Python 3 script in the "tools" subdirectory called "label_positions.py". This will take as input the EPD produced from "pgnselect" and then will label each position with a game result, based on a game in which both sides are played by a selected chess engine at a short time control.

This script relies on cutechess-cli being installed and the path to that executable must be set in the script. Note: it has only been tested on Linux, although it should work on Windows, too.

The script takes two command-line switches:

  • -e <name> - name of the chess program in the "engines.json" file that cutechess-cli uses.
  • -c <cores> - number of concurrent matches to run

Following the switches the EPD file path should be specified. Output from the script goes to stdout; errors are written to stderr.

Algorithms and data structures

The chess board

Following is some information about the algorithms and data structures used by Arasan. If you are new to computer chess programming, I suggest first reading a general work on the subject such as Frey (1983) or Marsland and Schaeffer (1990).

The chess board in Arasan is represented by an array of 64 squares, laid out so that square a1 has the value 0 and square h8 has the value 63 (Note: versions before 11.0 had a different layout with a8=0).

Each square contains 0 if it is empty, or a piece identifier if it is occupied. Black pieces have identifier values between 1 and 6, while White pieces have values between 9 and 15. A special value (127) is used to represent a square that is uninitialized or invalid.

The Board class also maintains several "bit boards" or quantities that that hold 64 bits. The Bitboard class in the source encapsulates a bit board. For example, the occupied bit board has one bit set for every piece that is on the board (there are actually two such bit boards, one for Black and one for White).

Each type of piece has its own bit board that has one bit set for each piece of that type (for example, there is a rook_bits Bitboard to hold rook locations). Since there is only one king location, though, this is kept in an integer variable.

Besides the bit boards, there is some other information in the Board structure. The enPassantSq variable holds the square position at which an en passant capture is possible (if none is possible, it has the value IllegalSquare). The castleStatus array holds an enum for each side indicating whether castling has occurred. Also, if the king or a rook has been moved, making castling on one side or another impossible, castleStatus is set to an appropriate value.

Each board position also has a hash code associated with it. The hash code is 64 bits and is computed by fetching, for each piece and square combination, a unique 64-bit code from a table of random numbers, and computing the exclusive or of these codes. (This hashing mechanism for chess was invented by Zobrist - see references). The low-order bit of the hash code is then set to identify whether White or Black is to move. Castling status and en passant status are also folded into the hash code, because positions with the same piece layout but different castling rights or possible en-passant captures must be kept distinct.

Moves

Arasan uses a 64-bit word to store move information. Each move contains a start square, destination square, promotion value, the type of piece being moved, the type of piece being captured (if any), and the type of move (normal, castling, en passant, etc).

Attack Generation

Earlier versions of Arasan used rotated bitboards for computation of attack information. The rotated bitboards allow computing which pieces attack a square without using any loops in the code, just shift and mask operations. Starting with version 11.0, a different approach is used: it produces much the same benefits but is faster. This approach uses a technique called "magic bitboards", which basically computes a hash code that looks up attack information, in a way that ensures there are no invalid hash collisions.

There are several ways to do this, but the algorithms and data structures used in Arasan follow an approach initially published by Pradyumna Kannan. The constants for the 64-bit version were constructed by generation code posted by Tord Romstad.

The magic bitboard code is primarily in file attacks.cpp. There are two versions of the code: one uses 64-bit multiplication and is best suited to running on a 64-bit architecture. The other uses only 32-bit multiplications. One or the other is automatically selected at compile time based on the target processor architecture.

Move Generator

The move generation logic is mostly contained in the MoveGenerator class and uses the magic bitboard attack functions.

The MoveGenerator class has separate routines to find all moves and to find just capture moves - the latter is all that is required in the quiescence search. Move generation is done incrementally, because if a move is found that causes cutoff, there is then no need to generate the rest of the moves for that position. Specifically, Arasan uses the "Fancy Magic Bitboards" technique described in the Chess Programming Wiki (see References), which was invented by Pradu Kannan.

Move generation occurs in this order (this is for positions where the side to move is not in check, and in the regular search, not the quiescence search):

  1. The principal variation move if one is available (see description of Search module).
  2. Capture moves and pawn promotions are generated, and sorted. In this phase only apparently winning captures and promotions are included. The move generator initially sorts captures by MVV/LVA (most valuable victim/least valuable aggressor). Then, when moves are being actually selected for search, a static exchange evaluation is done on those moves with non-positive MVV/LVA score. Moves that have a negative SEE score are deferred to the losing capture phase. This approach minimizes calls to SEE, which is expensive.
  3. "Killer moves" are returned if available. A killer move is a non-capture, non-promoting move that is valid in the current board context and previously caused beta cutoff at the current ply. Killer moves are indexed by piece type and dest square.
  4. At this point all non-capturing moves are generated. The previous move is used to probe the "refutation table". This is a simple replace-always table that holds moves that have refuted another move (by causing the search to fail high). Refutations are indexed by the previous move that was made by the opponent. If the move list includes a move found in the refutation table, then that move is returned first before any of the history moves and is flagged as belonging to the refutation phase.
  5. All other non-capturing moves are returned, in sorted order. The ordering is based on a score derived from the move history and countermove history.
  6. Losing captures are searched.

Normally, the move generation process includes moves that are illegal because they place the side to move into check - these are weeded out in the search routine.

If the side to move is in check, a special function (generateEvasions) is called that strictly checks moves for legality. It is very important to know whether any legal moves are possible when in check: if there are none, the side to move is checkmated. Also, some search extensions depend on the existence of a forced move (one single legal move). Currently evasions are generated in two phases: first the hash move is tried, if there is one, and then only if that fails to produce cutoff is the full set of evasion moves generated.

Move ordering at the root node is done somewhat differently. The first time moves are generated at ply 0, a rough sorting by score is done. The next few plies are searched with a wide window, partly to facilitate "easy move" detection (a move that appears to be much better than alternatives may be selected after a shorter than usual search, so for example the program does not waste time on an obvious recapture). But another side-effect of the wide window is that we obtain scores for all moves and Arasan will sort the moves for the next iteration based on these scores.

After the "wide window" part of the search is over, the next iteration is searched putting the best move from the last iteration first. If another move then becomes best, it replaces the previous best move and all other moves are shifted down, so that their ordering from the previous iteration is retained in the next iteration.

Searching

Arasan uses an alpha-beta search algorithm with a variety of search extensions. The search class is the largest single module in the program, and is necessarily rather complicated, but I have tried to structure it and comment it so that it is understandable. I will assume that the reader knows the basics of the alpha-beta algorithm, and will concentrate on describing this implementation of it.

In general, the search routine tries to terminate a search tree, or some portion of one, as soon as possible, and will defer as much work as possible until it is certain that no earlier and quicker termination can be done. The techniques for doing this are mostly well-known and there is nothing very original about the search algorithms used by Arasan. However, as with most chess programs, there is a fine balance between terminating a search too soon and extending it into unprofitable and very unlikely lines of play. The precise nature of this balance depends not only on the search algorithms used, but also the relative efficiency of operations such as move generation, position evaluation and move ordering. Each program therefore strikes this balance in a somewhat different way.

The entry point for a search is a routine called findBestMove. This function does some initialization, and then calls ply0_search, which implements the alpha-beta search algorithm. The search proceeds one ply (half move, i.e. move by one side) at a time. That is, first a one-ply search is done, then a two-ply search, then three, etc. until either the maximum ply limit has been reached or the time control has been exceeded. Each search uses the results of the preceeding search. The variable "iterationDepth" holds the current nominal ply depth for the search. However, the presence of search extensions means that some nodes may be searched to a greater depth than this.

In the first few iterations of the ply0 search, Arasan now uses wider than usual search bounds, so that each move gets a preliminary score. The scores are used to order the moves for deeper searching, and for "easy move" detection: if one move is found to have a significantly higher score than all others, and if subsequent searching still selects this move, then the search may be terminated early.

ply0_search does some other special processing and bookkeeping because it is at the top of the search tree. This function then calls search() to recursively process lower-depth nodes.

The first step in search() is to check if the current board position is drawn, due to insufficient material, a 3-fold repetition of moves, or the 50-move rule.

If the position is drawn, move_search calls the function drawScore. Usually a draw is given a score of 0, but when playing on a chess server, the relative rating of the opponent is also factored in, so that draws against a lower-rated opponent are penalized.

Arasan will also terminate the search immediately if the absolute maximum ply depth is reached. This is quite unlikely.

If no draw is present and the maximum depth hasn't been reached, the next step is to look in the hash table (further described in the next section), in order to see if an identical position has been visited before. This may happen due to a transposition of moves that lead to the same position, or because a previous search to a shallower depth visited the same node. If a hash table entry is found and if it contains a valid value (i.e. one that did not cause cutoff), then that value is returned immediately and no further searching from that node occurs. In other cases, the hash table may not contain an exact value, but may hold an upper or lower bound that can be used to narrow the alpha-beta window.

If the hash table lookup didn't produce an exact value or narrow the bounds enough to cause cutoff, then we may also try the endgame tablebases, if they are installed and enabled, and if material is reduced enough that they can give an exact score.

In non-PV nodes where the side to move is not in check, Arasan first tries a couple of techniques to decide if the entire node can be pruned away, before doing any searching. The first of these is "static null pruning", which cuts off search for nodes that have very high scores.

Arasan may then try "razoring". Razoring takes place when the static score is very low. In this case a quiesence search is done and the search terminates if that also produces a low score (alpha or below).

If still no cutoff has occured, we then try a further trick to get a fast termination of the search. The side to move is changed without altering the board position and the opposing side is then allowed to move. Of course, this could not occur in a game - a player is not allowed to "pass," but must move. However, the theory is that if the null move causes cutoff, then the side to move must have a good position, since in effect giving the opponent a free move still produces a high value for the side to move. In this case, beta cutoff is allowed to occur and no more searching is done from this node.

Starting in Arasan 2.0, null move pruning is applied in subtrees that are themselves part of a null move search, provided that two null moves are not tried in a row. This is known as the "deep null" algorithm. See Donninger's article in ICCA for more information on this algorithm.

A null move is searched to a depth less than that used for regular moves. Arasan uses at least a so-called R=3 depth reduction for the null move search: in other words, a search is done with the regular 1-ply reduction in depth plus three extra plies. (At high depths, a depth-dependent amount is added to the reduction factor.)

After a null move produces a beta cutoff, Arasan does an extra reduced-depth verification search to ensure that the null move cutoff is actually valid. The verification search uses reduced depth, but doesn't insert the null move first.

After null move search, but before the regular search, Arasan now applies a pruning mechanism called ProbCut (see Buro).

If neither the null move search nor ProbCut cause cutoff, then we must actually do some searching from the current node.

The first move searched is called the "principal variation" move. In the case of an initial search (e.g. a one-ply search), the principal variation move is just the first move returned by the move generator. Otherwise, at ply 0, it is the highest-scoring move from the previous search iteration. At deeper plies, the hash table is queried and if a best move has been stored for the position, that move is tried first and is considered the principal variation.

In cases where there is no hash move, we do a shallow search to obtain a suitable move to try first. This is called "internal iterative deepening" and has been used in Hitech (see Ebeling's book) and also Bob Hyatt's program Crafty.

Now (in most cases) we should have an initial move to try (if not we generate all moves and take the first one). We make the move, then query the attack info for the board to see if the side to move is in check (remember, the move generator typically does not exclude moves into check). If a move into check is found, the special value Illegal is returned and the next move is tried. If the move passes the legality check, then move_search is called again (i.e., it is recursive).

Normally each move searched reduces the "depth" variable by a constant (DEPTH_INCREMENT). When depth reaches zero, the quiescence search will be entered (see below).

However, some moves are searched to a greater depth than normal. There are several cases in which this occurs:

  1. If the move checks the opponent, the search is extended, provided that the checking move does not lose material (according to SEE) or is a discovered check.
  2. Pushing a pawn to the 7th rank causes the search to be extended, on the theory that this pawn may soon promote.
  3. If a safe capture is done and the last opponent piece has been captured, the search is extended.

Note that since Arasan 5.0, most extensions can be a fraction of a full ply. Search "depth" is normally decremented by the constant DEPTH_INCREMENT every time a new ply is begun; however, extensions can decrement it faster, typically an additional amount between 1/2 of a full ply and a full ply.

Individual extensions can be combined, but at any given ply, the total depth reduction from extensions cannot exceed DEPTH_INCREMENT, the equivalent of one extra ply.

The principal variation phase of the search is over when we have found a legal move and searched its descendants (including quiesence nodes) so that we have a value for it. It is possible that this value will be greater than beta, which means that we have set the initial search window wrong and must repeat the search with a different window.

Assuming the principal variation move does not cause cutoff or mate, then the search function proceeds to search the remainder of the moves. These moves are searched with a zero-width alpha-beta window (i.e. beta is set to alpha+1). All such searches will cause a cutoff. If the value returned by the search is between alpha and the original beta, then the search is repeated with a wider search window to determine the correct score. It is generally faster to get a fast cutoff and then re-search with a wider (but not infinite) window than to do a single search with unlimited bounds.

Note that since the principal variation move is usually obtained from the hash table or the root move array, it may be the case that the move generator has never had to be called during the principal variation search. If so, we call it before doing the non-p.v. moves.

Arasan now optimizes things further by only generating part of the moves at a time. That way, if cutoff occurs, the remainder of the move generation can be skipped.

Arasan also performs several types of pruning and depth reduction. These are the opposite of extensions: instead of searching deeper, Arasan either cuts off the search prematurely (pruning) or reduces the search depth.

Arasan uses "late move reductions." After the first few moves have been searched, including any killer moves, if the move is not extended, and if it is not considered potentially dangerous or advantageous, then the search depth is reduced. This can occur even at high depths in the search tree. If the reduced depth move returns a score above alpha, it is re-searched with normal depth. This technique became popular due to its use in Fabien Letouzey's program "Fruit".

Arasan also does several different kinds of pruning. Several tests are made on each move and if the move was extended for any reason, or if it is a capture, special move (like castling) or advance of a passed pawn, then it is not pruned.

Note that it is necessary to search at least one legal move before any pruning is done, because we need to distinguish nodes at which no moves are searched due to stalemate from those at which all nodes would cause pruning. Also no pruning is done at the root node.

Futility pruning in Arasan is similar to that described by Ernst Heinz for the program Dark Thought. It allows some moves to not be searched, if it determined that the side to move is behind in material and the move to be made is not likely to gain enough material to be worth considering.

Futility pruning is applied at relatively low search depths before the quiescence search and during the quiescence search. First the program computes an optimistic score for the move and adds a margin that varies based on search depth; the sum is then compared against alpha and the move is pruned if the optimistic score + margin is still below alpha.

Late move reduction is done before futility pruning and the futility margin is based on the reduced depth (if a reduction was done).

Arasan also does "late move pruning." In non-PV nodes, quiet moves that are in the history phase of move generation (after the pv move, winning captures, and killer moves) and are relatively late in the move order can be pruned away. The theory is that if we have not already achieved a cutoff searching more moves is not likely to produce one.

In addition at depths just before the quiescence search, moves that appear to lose material based on a static exchange analysis are pruned. This is done with more liberal pruning conditions than other types of pruning.

The final part of search() checks to see if checkmate or stalemate occurred, updates the hash table, and maintains the best variation. It also updates history scores, but only if the best move is not a promotion or capture. The best move from the search gets a history bonus. The remaining moves are given a penalty. Scores are maintained in a table (per thread) that is indexed by piece type and destination square. In addition, Arasan now maintains a countermove history table (idea from Stockfish). This is indexed by the piece type from the previous move, previous dest square, current move's piece type, and current move's dest square. Both history and countermove history scores are used in move ordering.

When the search terminates at ply 0, it updates the "Statistics" structure with the time and other information about the search (key parts of this structure are also updated during the search when the p.v. changes, so the UI and test suite code can monitor it).

Quiescence search

As noted above, each recursive call to move_search decrements the "depth" parameter by a constant (DEPTH_INCREMENT). When "depth" drops below zero, Arasan enters the quiescence search by calling function quiesce(). Like search(), this is also a recursive search routine.

As the name implies, the goal of the quiescence search is to reach a relatively "quiet" position that can be more or less accurately evaluated. Generally, "quiesce" will only generate and search capture moves that appear to gain material, promotions of pawns, and moves that escape from a check.

If in check, all evasion moves are generated although some "quiet" evasions may be pruned and not searched.

The quiescence search for positions not in check also does forward pruning on moves, dropping any that appear to lose material based on the static exchange evaluator and those that fail a futility test.

Generally, forward pruning costs time (especially if the static exchange evaluator, "see", needs to be called), and involves some risk of dropping valuable capture moves. But, on the plus side, it significantly trims the size of the search tree.

The quiescence search terminates when no more moves of an appropriate type are available. The search may also terminate early if the side to move is not in check, and the current evaluation of the position is enough to cause cutoff. The theory here is that the side to move can choose to not capture any further. If the current evaluation is good enough to cause cutoff, there is no need to try captures and promotions to get a better score.

In version 10.0 and later, for nodes where it is not in check, if no cutoff has occurred so far, the quiescence search will generate and search moves that give check, at least in the first couple of quiescence search plies. But checks that appear to lose material are pruned.

The hash table

The search routine uses a hash table for storing the results of evaluating previously visited positions. This table is implemented in several static functions defined in hash.cpp. The hash table is basically an array of lists. Each list contains a series of nodes, each of which contains some data (in the case of the search engine, a class of type Position_Info) and a pointer to the next node. Each list holds entries that hash, modulo the hash table size, to the same value. Each node contains the whole hash code, so that finding a given node to match a given hash code consists of indexing into the hash table, then following the list until the full 64-bit hash codes match.

Besides the hash code, each hash entry also contains the score for the node, a set of flags indicating whether the value is exact, an upper bound or a lower bound, the depth of search used to evaluate the node, a word holding the castling status and en passant square, and the best move for the position.

The hash table is limited in size and may fill up during a long search. In this case, we have a choice: when a new position is encountered, we can overwite an existing entry in the hashtable with the new position, or we can discard the information for the new position and not put it into the hashtable.

Arasan will generally only replace entries that have greater depth than existing entries, or entries that came from an earlier search (i.e. whose "age" field does not match the current search).

The hash table is not cleared after each search: instead, it is kept full, but old entries (from the previous search) are considered candidates for replacement.

The size of the main hashtable defaults to 64 Megabytes, but can be overridden with -H switch, followed by a size (such as '256M'), or by modifying the arasan.rc parameter file. Standard Winboard or UCI option commands can also be used to alter the hash table size at runtime.

Because multiple threads can be reading and writing the hash table, a "lockless hashing" technique is used to prevent conflicts (as done in Crafty). When a hash key is stored it is xored with the data value. When retrieving, the current data contents for a hash table entry are xored again with the stored hash key and this is used to match the hash key for the position. If another thread has overwritten or is overwriting the data field, then the comparison will fail and a match will not be returned.

Besides the main hash table, some smaller hash tables are used by the Scoring module to store the results of pawn structure scoring and king cover calculation. These are relatively small in size and are allocated on a per-thread basis (each thread has its own tables).

Position Scoring

There are roughly five main components to the positional score used by Arasan:

  1. Pawn structure
  2. Development
  3. Castling
  4. King safety
  5. Endgame

The positional score is typically within the range of plus or minus the value of a pawn (1000), but can be greater in some circumstances.

Pawn structure scoring is done in two stages. First, the pawn hash table is probed to get the score for the position. If the position is not found in the hash table, then the pawn_structure_score routine is called. This routine only computes scoring parameters that depend only on the position of pawns. It also calculates several items that are useful in the rest of the evaluation.

The second-stage pawn scoring takes into account the relation between pawns and pieces.

The pawn structure score penalizes isolated, backward, weak, and doubled pawns. An increased penalty is given if there are multiple such defects in the pawn structure.

A bonus is given for passed pawns. If a passed pawn exists, its value depends on its rank and also on whether squares ahead of it are occupied or controlled by the opposing side. A bonus is given for a rook behind a passed pawn. Blocked passed pawns and passed pawns on the rook file are given a penalty.

In the opening and middlegame, the pawn structure score includes a center control component that gives bonuses for occupation or control of the center by pawns.

Arasan uses a pawn hash table, maintained in the Scoring class, which holds information on pawn structure and a score that evaluates how good or bad that structure is. This table is accessed via a hash key that depends only on the position of pawns. There are separate tables for the white and black sides. Entries in the pawn tables are always allowed to be replaced. This technique has been used in Cray Blitz and Crafty. It significantly speeds up the scoring routine by avoiding computation of the pawn structure score in most cases.

The development score encourages the program to move its pieces from the back rank, but discourages premature development of the Queen. A measure of piece mobility is also calculated for Bishops, Knights, and Queens. Bonuses are awarded for a Rook on the 7th rank, and also for a rook on an open or half-open file, and for doubled Rooks.

The castling score penalizes the program until it has castled.

Arasan calculates king safety using a "cover" score that measures how protected a king is by pawns, combined with a "proximity" score that measures the closeness of enemy pieces to the king. The "cover" score is calculated and hashed for fast lookup.

King "cover" is considered worse if pawns are missing near the king or if the king is near an open or half-open file.

"Proximity" includes several measures based on the proximity of enemy pieces, advancing pawns (indicating a "pawn storm"), Rooks on open files near the king, and Queens or Bishops bearing on diagonals near the king.

If both proximity and cover scores are high (indicating that the king is both exposed and attacked) an additional "bonus" penalty is given.

The king safety score is scaled according to the amount of material the opponent has. The more material the opponent has, the more dangerous an uncovered and/or attacked king position is.

In the endgame, passed pawns are evaluated to see if they can safely queen without being intercepted by the opposing king. Arasan versions 5.0 and higher also have code that gives a bonus for an "outside" passed pawn, and that helps to evaluate pawn "races," where both sides are likely to promote.

In the endgame, the king is encouraged to move towards the center or opposite side of the board, and to stay near pawns. This code is not very effective in producing good play, but it is better than nothing.

Prior to version 6.0, endgame and king saftey scoring were never applied together, but now they can be: if the position is in the process of transition to an endgame, it will be evaluated for both king safety and endgame positional factors.

A lookup table is used to speed up endgame calculations by storing information based on the position of king and pawns (for both sides). Once computed, this information is put into a hash table for later retrieval and use by any position that shares the same king + pawn configuration.

Arasan is aided in the endgame by the use of "bitbases" for the KPK endgame. Bitbases record for each position of the two kings and pawn whether the position is won for the stronger side, or only a draw. Unlike tablebases, bitbases do not record the number of moves to mate. Bitbases can be employed throughout the search tree since they are in memory. Arasan uses the KPK bitbases not only when there are only kings and one pawn left, but also as an aid to computing endgame scoring for more complex endgames.

Arasan also has some special-case code for other endgames including KNBK, KRK and KQK, which enables the program to play these fairly well, even without tablebases.

Multi-threading

Arasan version 9 and above have support for using more than one thread during searching, enabling it to make use of multi-processor machines and multi-core CPUs. Version 10.0 was the first version to have this fully working.

A pre-requisite for implementing multi-threading is that global data structures that are accessed for read & write be made safe for parallel access. This is done by using critical sections (under Windows) and spinlocks (under Linux).

However, locking has a performance overhead and is generally infeasible to do on a per-node basis. Arasan now does not lock the main hash table, to avoid this penalty, but uses a lockless hashing mechanism as described earlier. Other less frequently used data structures are locked, however.

My first attempt at implementing multi-threading used the ABDADA algorithm (see Weill's article). This is simple to implement since it uses the hash table as a single point of synchronization and control, but it did not perform well, because this usage of the hash table interferes with its normal usage as storage for scores and best moves from previously visited nodes. Consequently, the hash hit rate was reduced and multi processors did not achieve much of a speedup.

The current implementation in Arasan uses what is called the Young Brothers Wait concept (YBWC). This algorithm was described in Feldmann's Ph.D. thesis (see References) and several related publications in the early 1990s.

Arasan's implementation is based on a thread pool that is created at program start. This pool controls n threads, where n is the number of processors or cores available. These threads run continuously, saving the overhead of having to create and destroy threads during the search. However, a thread may or may not be assigned to a node in the search tree. If it is not assigned, its state is set to "Idle" and it is blocked on a synchronization object (an Event object under Windows; a semaphore under Linux).

There are three primitive operations available on the thread pool: "checkOut" obtains a thread if possible and sets its state to "Working". "start" then assigns work to a thread - this work is to search a given node of the search tree. "checkIn" makes the thread "Idle" again and de-associates it from the node, making it available to do another search.

Multiple threads are used only after the principal variation has been searched, since the search bounds are not well known until the p.v. has been completed.

After the p.v. is done, the search function checks if one or more threads are available. If there is an available thread, the board state and other essential information are cloned and the cloned state is given to function Search::searchSMP, which will perform the search in parallel.

A NodeInfo structure in the search routine keeps track of which threads are actively searching nodes below the current node. In addition, there is an array of SplitPoint structures in each Search class instance - this has all the points in the search history where a split has occurred from the current thread.

After conclusion of a parallel search, searchSMP will return, and will free up the Search class instance and associated thread to search another node.

It is possible that the main search completes before its dependent searches in other threads. In that case, the main search thread will make itself available to slave threads that it has started, assisting their work until all slave threads complete (this is known as the "helpful master" concept).

It is also possible that a cutoff will occur during the search, in which case all dependent threads need to be terminated - their values do not matter to the search result. In this case, we tell the threads to stop first and then wait for them to actually stop. Thus in any case, when the search routine exits, no threads that it starts are active anymore.

Windows user interface

The Windows user interface was re-written for version 6.0, although some code from earlier versions is still in use. The main difference from earlier versions is that the chess engine is now run as a separate process that communicates with the user interface through a socket, exactly like Winboard does. This is done to eliminate some problems that occured when using threads to manage the engine/UI communication in earlier versions.

Compared to Winboard, the Arasan UI lacks some features: for example, it cannot be used to communicate with a chess server, and has no facility for editing positions.

The Arasan user interface is a pretty standard MFC program. It uses the single document interface (SDI) model, so there is only one document class instance and one view instance active.

Earlier versions of the Arasan UI used bitmaps for display. The new version uses TrueType fonts. Several chess fonts are included in the program distribution. The original font archives, which include copyright and usage information, can be found in the fonts subdirectory of the Arasan source distribution.

Support

While no formal support is offered for this software, if you do find bugs in it, or discover a way to improve it, I would like to hear from you.

Contact information and additional information about Arasan can be found at arasanchess.org

References

Buro, M. ProbCut: An Effective Selective Extension of the Alpha-Beta Algorithm. ICCA Journal Vol. 18, No. 2, pp. 71-76 (1995).

Chess Programming Wiki, topic Magic Bitboards.

Crafty source code.

Duchi, John, Hazan, Elad and Singer, Yoram. "Adaptive Subgradient Methods for Online Learning and Stochastic Optimization" Journal of Machine Learning Research, Volume 12, 2/1/2011, pp. 2121-2159.

Donninger, Ch. (1993). "Null Move and Deep Search" ICCA Journal, v. 16 no. 3.

Ebeling, Carl. (1987). All The Right Moves: A VLSI Architecture for Chess. MIT Press.

Feldmann, Rainer. Game Tree Search on Massively Parallel Systems (1993). Ph.D. Thesis, University of Paderborn. (see also related publications.)

Frey, Peter W. (ed.) (1983). Chess Skill in Man and Machine. New York: Springer-Verlag.

Heinz, Ernst A. (1999). Scaleable Search in Computer Chess. Vieweg.

Hoki, Kunihuto and Kaneko, Tomoyuki "Large-Scale Optimization for Evaluation Functions with Minimax Search," Journal of Artificial Intelligence Research 49 (2014) 527-568.

Lai, Matthew (2015) Giraffe: Using Deep Reinforcement Learning to Play Chess. MSc Dissertation, Imperial College, London.

Kingman, Diederik P. and Ba, Jimmy Lei ADAM: A Method For Stochastic Optimization, ICLR 2015.

Marsand, T. Anthony and Schaeffer, Jonathan (1990). Computers, Chess and Cognition. New York: Springer-Verlag.

Weill, Jean-Christophe (1996). "The ABDADA Distributed Search Algorithm" Proceedings of the 1996 ACM 24th annual conference on Computer science, pp. 131-138.

Zobrist, A. L. (1970). "A new hashing method with applications for game playing," Technical report 88, Computer Science Department, University of Wisconsin.