Again, we've assembled to hear me talk about performance in Reversi AI. This time I'm going to talk about larger code changes that bring about significant gains, which I'm calling "large scale optimizations".
Calculating quickly, a little bit at a time
One simple performance tweak that really pays off is iteratively tracking state rather than recalculating it. Generally doing this enables you to take a calculation that's probably linear and make it constant time. Much of the time I think programmers do this intuitively, as it's a very simple technique, but it pays to be aware of it in case it doesn't subconsciously flow into your code.
In Reversi, iterative calculation can be used in a number of different places, like tracking the number of pieces each player has on the board. Counting the pieces on the board seems pretty simple, but, if you were to re-calculate this whenever it was needed (which is relatively often), it'd really add up, particularly since it falls within board value estimation equation.
One of the places I use this is tracking the stables pieces on the game board. By definition a stable piece doesn't become un-stable through the course of the game, so there's little point in continually recalculating that a given piece is stable. Calculating stable pieces is, itself, an iterative process, though... a piece becomes stable when the correct pieces next to it are stable. This calculation can be made much faster by caching the stable pieces from the last time it was calculated. For example, corner pieces are inherently stable (which is one of the reasons why they are so desirable). A 'C' square (squares orthogonally adjacent to the corner) is stable if the corner next to it is occupied and the same color. This is true along the edge... if all the squares from a given square to a corner on that edge are occupied by the same color piece, all of those pieces are stable. By keeping track of the stable pieces, I skipped having to start all the way from the corners each time.
In the chess domain iterative calculation is even more common and useful, because chess programs make use of a technique called Zobrist hashing. The idea here is to have an algorithm that hashes the board state, accounting for every unique board configuration, with a very low chance of collision. This hashing calculation would be somewhat expensive (in that it'd be based on the number of pieces on the board: O(n)) if it were completely recalculated for every board change. Instead, with Zobrist hashes, the calculation is doable in constant time. To enable this, each possible board element (say a white pawn in location B2) has its own unique value. To calculate the initial hash for a board, these unique values are XORed together. Since XOR is reversible ((a XOR b) XOR b == a), you can update the hash value very quickly in 2 simple steps:
- Remove the piece from the old location - XORing the original board's hash with the value corresponding to the piece's old location, removing it from the board
- Add the piece at the new location - XORing the result from step 1 with the value corresponding to the new position
Opening books are databases of good ways to play the beginning of a game. Most strategy games with non-random setups have these, often where the common openings have specific names attached. When playing these games, people may not even realize they're using opening books, but instead probably just think of them as a favorite way to start the game or a way that worked well for them in the past. This is much more formalized with game AI.
Having an opening book is a common optimization for game AI. From a high level, it's a manifestation of pre-calculation. As long as you can store the data in an efficient way, it can lead to huge savings. For a book to be useful, it should be calculated to many more ply than the game would normally be able to do. This means that it can think much further ahead at a critical time (the beginning of the game). Also, since the book is pre-calculated, much more time can be devoted to creating it than would be feasible during regular gameplay.
To create an opening book, you do the same sort of calculations you would do during the game, but store the paths that lead to the best outcomes and the resulting board values. At the beginning of the game, instead of having to spend time calculating board values, you can just do a lookup in your book. When you get to the end of the book's move tree, you start the normal calculation.
The idea here is to repack the board into a data structure that can be manipulated quickly in useful ways. In the Reversi case, I represented the board using 2 separate bitboards: one for white and one for black. Since the board is 8x8 and each square can be represented by 1 bit (is there a black/white piece there?), the board for each player packs nicely into 1 unsigned 64-bit number (QWORD). Each row is represented by a byte of the QWORD. On some architectures, this can be stored in a single register, making it even faster.
Here is an illustration of a way to order it:
The number in each location is the power of two it represents. If the board is empty except for the top-left piece, the value of the board will be 2^63. If the only piece is the top-right corner, the board's value will be 2^56.
These bitboards can be used extensively for things like finding all white pieces, looking for valid moves and stable pieces, determining if a given player has any pieces left, etc.. The advantage may not be obvious, since dealing with individual bits can be a hassle, but it allows some interesting calculations to be done basically in parallel. By using 64-bit operations on these boards, an calculation can be done simultaneously on all positions. I'll talk about some examples of this in detail shortly.
Using this structure has a number of advantages. At it's simplest, copying the board is as simple as copying 2 64-bit numbers, making cloning boards for further moves quick and easy. Another advantage is that manipulation of the board can by done quickly using simple binary operations. Adding a piece to the board is a logical-or and removing a piece is a logical-and.
Now let's look at an example of how this works for something more complicated. Say you're trying to find all the valid moves for black and the board looks like this
The idea is to find a line of white pieces sandwiched between a black piece and an empty square. The simple way to do this would be to find a line that starts at a black piece, goes through only white pieces, and ends in an empty space. To begin, break the normal board into separate bitboards for the white pieces, black pieces, and empty spaces, like this:
Before looking for valid moves in the downward direction, where the black piece is at the top, the white pieces are vertical, and the empty piece is at the bottom, first I'm going to define a way to quickly adjust all the pieces on the board downward simultaneously. Since a bitboard is a 64-bit number with each row represented by a contiguous byte (see the illustration above), I can move all the pieces down one by shifting it right 8 bits. Since each row wraps down to the row below, a single shift right moves all the pieces right one column, but 8 shifts right shifts the whole board down one row. The problem is that, since the bits DO wrap lines (and some shift operators may carry-extend or even wrap from the other end or the carry bit), I use a mask (with a logical-AND) to clear out the row/column that has moved in from off-board. Each direction will need its own mask. The downward mask looks like this:
Now I'm ready to start looking for moves in the downward direction. The black bitboard has all of the black pieces in it. If I move all of these pieces down one and logical-AND it with the white bitboard, the resulting board (which I'll call the "potential moves" board, since these are used to find valid moves further down) shows all the locations that have a black piece with a white piece right below it. Here's an illustration of these steps:
From here, I continue looking downward, but now I'm looking for empty spaces (indicating a valid move) as well as white spaces (indicating future potential moves). If you shift the potential move board downward again, you can determine these two things with logical-ANDs. Valid moves are found by logical-ANDing with the empty bitboard, and the new potential moves bitboard by logical-ANDing with the white bitboard again.
I continue shifting downward and doing these two evaluations until the potential moves bitboard is empty. The logical-OR of all the valid moves results in a bitboard that contains every valid move for black in the downward direction. If this is done in all 8 directions, the bitboards from those can be logical-ORed together to find all valid moves for black.
Next time I'll continue the presentation of large-scale optimizations with talk about move tree searching and a couple of methods for how it can be done.