Home Previous Bottom Next

Writing a chess program in 99 steps


14  Search

Chess programs determine the best move by searching a tree of moves, up to a certain depth, and then evaluate the resulting positions at the leafs (tree tips, see figure below) to see which move will ultimately lead to the best result, assuming that both sides will always play the best moves. 

So what are the best moves, for both sides, how is this determined? 

Minimax

Before I explain the alpha-beta algorithm, you need to understand the minimax strategy. In game theory, chess is a two-player zero-sum game.  Both players know where all pieces are and they are free to make any legal move. What's good for white is equally bad for black, so white's gain is equal to black's loss (hence: zero-sum).  Let's define the 'score' as being some heuristic value of a chess position. Positive scores means that white is winning, and negative scores means that black is winning.   With this definition, white will try to maximize the score, and black will try to minimize the score.  Having these opposite objectives is the origin of the name minimax (the theoretical background of minimax was established by John von Neumann).

There are a number of principles you need to remember.  They always work, irrespective of where you are in the tree, how big the tree is or how deep the tree is, and also irrespective of the algorithm (minimax, alphabeta, pvs, etcetera):

1) Every node (chess position) in the tree can be assigned a score
2) A node's score results from its child nodes.  If there are no child nodes (at the leafs), then the score is the heuristic evaluation value of that position.
3) Although the search starts at the root (node #0 in the figure below), scores originate from leaf node evaluations (calls to the evaluation function) and passed up to the respective parent nodes (see the arrow directions below)
4) Every node represents a call to the search function.  The search will only get information from the caller (its direct parent) and gets results passed back only from its direct child nodes.

Let's look at a numerical example. I am using very small trees to explain the concept. Once you understand how it works in a small tree, it's easy to understand how it works for large trees (because size does not matter!). The order in which nodes are visited can be seen in the figure, just look at the node numbers.  The search starts at node #0,  then visits node#1, then node #2, etc.   Nodes #1, #5, and #9 are visited in one search loop.  Similarly, nodes #2, #3, and #4 are also visited in one search loop (but at a different ply level).  Node #0 is the root node.  Evaluations are done at the leaf nodes only.  The minimax algorithm is a recursive function (recursive means: its making calls to itself, very much similar to perft), which traverses down the tree, up to a specified depth, calculates the static evaluation score of all leaf nodes, and passes the maximum, or minimum, score of all child nodes back to the parent node. If it's white's turn to move, then the score is the maximum of the child node scores. If it's black's turn to move, then the score is the minimum of the child node scores.

You would think that we need two calculations: one for white (to maximize), one for black (to minimize). Actually, it turns out that this can be coded in a single recursive function that is doing the min/max calculations for white AND for black.  This is accomplished by introducing an elegant way of changing perspective in the recursive calls.

The example below shows the minimax algorithm (this specific implementation is called negamax).  What's special about it is that return values are negated to reflect the changes in perspective that result from changing the side to move.  Remember: what's good for white is equally bad for black.  In this implementation, black will also maximize the score, and that is supposed to happen because we negated the score that is passed to black (so black now maximizes negated values, which is actually the same as minimizing the original values).  Hence  the same code can be used for white's and for black's perspective: 

int minimax(int depth)
{
       if (depth == 0) return eval();                // eval returns the score in negamax fashion, i.e. from the perspective of the side to move !
       best = -LARGE_NUMBER;
       movegen();
       for (i = firstmove; i < lastmove; i++)
       {
           makeMove();
           val = -minimax(depth-1);                  // note the minus sign to reflect the change in perspective!
           unmakeMove();
           if (val > best) best = val;               // both sides want to maximize!
       }
       return best;
}

These couple of lines form the core of 99% of all chess search functions (but with many improvements and refinements added). Once you understand why negating of scores is an efficient way of coding, it will also become clear that it's really not required to talk about  'black' or 'white'.  In the context of negamax, both sides want to maximize (from their perspective) and color has become irrelevant.  We only have to distinguish between 'side to move' (even plies) and 'side not to move' (odd plies).  And of course this only works if the evaluation function follows the same philosophy and returns the score from the perspective of the side to move.

Also note that minimax returns the best score, but not the best move.  In a real search algorithm we also want to remember the best line of play (the principal variation, or PV).  So, we are going to have to remember the last move that satisfied the condition 'val > best', at every ply, because that was the best move.

Alpha-beta

Minimax visits all leaf nodes, which turns out to be a rather naïve strategy involving a lot of unnecessary work!   We can improve minimax significantly without any loss of accuracy:

Have a look at the above figure again, and think about what happens after having visited node #10.   The evaluation of node #10 results in a score of -3 at node #9 (without having visited #11 and #12 yet).  At this point, a 'more intelligent' strategy would decide to skip nodes #11 and #12, because whatever the scores of #11 and #12 will be, node #9 is always going to be worse than node #5 (from the perspective of the side to move).  So we can just forget about evaluating #11 and #12.  This is how alpha-beta cut-offs work: branches that cannot possibly influence the final score and clearly don't represent best play for both sides are pruned away.

The alpha-beta algorithm returns the same score as minimax, but with much less nodes visited.  It is an important improvement that appears to have been reinvented a number of times in the late 1950's.

The improvement is achieved by keeping track of two scores during the search: alpha and beta.  Alpha represents the best score so far for the side to move (alpha starts off as a large negative number and can only become larger - i.e. improve from the perspective of the side to move). Beta represents the best score so far for the side not to move (it starts off as a large positive number and can only become smaller - i.e. improve from the perspective of the side not to move).  When beta becomes less than alpha during the search, it means that the current position cannot be the result of best play by both players, and hence this position does not need to be explored any further. 

The algorithm negates the return values,  just like minimax,  but alpha and beta are also negated AND they swap places to reflect the changes in perspective that result from changing the side to move:  

int Board::alphabeta(int depth, int alpha, int beta)
{
       // Negascout
 
       if (depth == 0) return eval();
       movegen();
       for (i = firstmove; i < lastmove; i++)
       {
              makeMove();
              val = -alphabeta(depth-1, -beta, -alpha);  // note the minus signs & switching of places of alpha and beta
              unmakeMove();
              if (val >= beta) return beta;
              if (val > alpha) alpha = val;                // both sides want to maximize
       }
       return alpha;
}

Normally, tree diagrams show values from the perspective of the side to move at the root (in this case White), and don't take all this swapping & negating of the scores and alpha/beta into account:

As programmers however, we want to know and understand what happens at the level of the search function.  Due to the constant swapping and negating, the actual values of alpha and beta in the search code are always seen from the perspective of the side to move at the current node (irrespective who's turn it is at the root).  The diagram below should explain the alpha-beta mechanism in more detail, following the search tree above at the level of the search function (in negascout fashion), step-by-step.  Alpha values are shown in the upper right corner of every node, beta values are shown in the lower right corner of every node.  Note that every node will receive an alpha and beta value from its parent, but the node will only pass back one score to its parent. The parent node then checks if that score causes a beta cut-off, or if it improves alpha.

 

One key property of the alpha-beta algorithm is that the order in which moves are searched is very important. The minimum amount of work is done if we always start with the best move (perfect move ordering).  This is not possible of course (if we already knew what the best move is, then we wouldn't have to search in the first place), but there are some elegant ways to guess which moves are good, and we can order moves based on that heuristic.  The worst case scenario for alpha-beta is that the total amount of work is equal to minimax (this happens if the best move is always searched last) .  If we search the best move first, then it can be shown that the search can go twice as deep in the same amount of time (same amount of node visited). 

Note that if the first move is indeed the best move, then the order of the remaining moves does not matter anymore. Theoretically, we only have to place the best move first to get the most cut-offs. This also means that a move ordering algorithm probably needs to order the best 3 to 5 moves only, and not worry too much about the other moves.

 Even without any move ordering, there will be sufficient cut-offs that make alpha-beta orders of magnitude faster than plain minimax. This is the same tree as before, but now with all moves perfectly sorted:

And here is how the cut-offs work with perfect sorting:

One last example, this time using a chess position and search like an engine. It's a very basic example. Once you understand how alpha-beta works for a simple position, it's easy to understand how it works for complex positions (because complexity does not matter!). The position below has white to move next, and we will emulate a 2 ply engine search (every side does one move, similar to the tree that we've been looking at, there are 242 leaf nodes in this tree). Note that chess engines don't search like human chess players, human players tend to consider only a very limited amount of moves at the root.
The root node looks even, because every side has a king with one rook:

White has 18 moves to choose from, but 1. Kxg4 looks to be promising.
So we will start our 'search' with Kxg4, and then check possible replies from black (actually there's only three replies: 1. ... Kg6, 1. ... Kg7, or 1. ... Kf7).
After having considered all of black's replies, we conclude that 1. Kxg4 is going to leave us with a significant material advantage, whatever black's response will be.
So let's keep this move in mind and start looking for an even better move (improve alpha). 
Maybe 1. Kf2. ?
Black now has 22 moves to choose from. 
We already have a good move, so now we have to check if black's best reply would leave us in a better place than 1. Kxg4
Our first reply move to consider will be 1. ... Kg6,  we don't know if this is black's best move, but this turns out to be worse (for us) than our best option so far. 
Wait a minute: If  1. Kf2 Kg6 does not improve on 1. Kxg4 ....,  then surely black's best reply on 1. Kf2 will not improve our position after 1. Kxg4
So why continue searching more black moves after 1. Kf2?   We already know that 1. Kf2 cannot possibly be better than 1. Kxg4 !
We just saved ourselves 21 black reply moves to consider (this was a beta cut-off)
 

Principal Variation search

As you can see in above tree diagrams, there are going to be a lot of beta-cutoffs with perfect move ordering. Actually, if the first leaf node is the main PV, then all remaining searches are basically to prove that we already have a PV.  Principal Variation Search (PVS) is a small performance improvement compared to alpha-beta, by making the assumption that move ordering is good, and that the first node does return a score between alpha and beta, which would make it a PV node.  Once we've found a move with a score between alpha and beta, the rest of the moves are going to be searched with the goal of proving that they are all bad.  This can be done a bit faster because we narrow down the alpha-beta window (resulting in more cut-offs), because we don't worry that one of the remaining moves might be good.  If the algorithm finds out that this assumption was wrong, it will have to re-search with the normal alpha-beta window. And if move ordering is good, this should not happen too often:

int alphabetapvs(int ply, int depth, int alpha, int beta)
{
       // PV search
       pvfound = false;
       if (depth == 0) return eval();
       movegen();
       for (i = firstmove; i < lastmove; i++)
       {
              makeMove();
              {
                     if (pvfound)
                     {
                            val = -alphabetapvs(ply+1, depth-1, -alpha-1, -alpha);
                            if ((val > alpha) && (val < beta)) val = -alphabetapvs(ply+1, depth-1, -beta, -alpha);    // In case of failure, proceed with normal alphabeta
                     }
                     else val = -alphabetapvs(ply+1, depth-1, -beta, -alpha);           // Normal alphabeta
                     unmakeMove();
                     if (val >= beta) return beta;
                     if (val > alpha)
                     {
                            alpha = val;
                            pvfound = true;
                     }
              }
       }
       return alpha;
}

Collecting the principal variation

None of the above algorithms return the PV (principal variation).  It's not very difficult to add some code lines to remember the PV.  All we need is the last best move (where 'val > alpha')  for ply=0, and then before that particular move was found,  the last best move at ply=1, and so on.  The only problem is that we don't know beforehand if a move is going to be the last best move,  there might be nodes upcoming with scores > alpha.  And if a move turns out to be the last best move, we also need to copy the last best line of play for all deeper plies. A simple and elegant way of collecting the PV is by introducing a 'triangular array'. This idea is  demonstrated in Tom Kerrigan's TSCP, see the code below for an example implementation in minimax:

int minimax(int ply, int depth)
{
       best = -LARGE_NUMBER;
       triangularLength[ply] = ply;                                          // current PV will start here
       if (depth == 0) return eval();
       movegen();
       for (i = firstmove; i < lastmove; i++)
       {
              makeMove(moveBuffer[i]);
              val = -minimax(ply+1, depth-1);                               
              unmakeMove(moveBuffer[i]);
              if (val > best)                                                
              {
                  best = val;
                  triangularArray[ply][ply] = moveBuffer[i];                 // save this move
                  for (j = ply + 1; j < triangularLength[ply + 1]; j++)
                  {
                      triangularArray[ply][j] = triangularArray[ply+1][j];   // and append the latest best PV from deeper plies
                  }
                  triangularLength[ply] = triangularLength[ply + 1];
              }
       }
       return best;
}

This code will collect the PV in  triangularArray[0][0..triangularLength[0]].


Home Previous Top Next

last update: Saturday 02 July 2011