If you've done Unix shell scripting of any kind, you're likely to recognize the `grep` command. `grep` will search through a source input (a file, multiple files, directories recursively, or piped output) for lines that contain a specific string or match a regular expression.

What I often find myself doing is piping the output of `ps` to `grep`. `ps` gets a list of running processes. By passing the output of `ps` to `grep`, I can check for the existence of a running process. For example, if I want to know if Apache is running, I can execute `ps auwx | grep "apache"` and the output will show apache processes.

The trick with this is that it's not useful for automated scripts. The reason is that the process list includes the `grep` command itself, whose command line contains the search string. So even if Apache isn't running, there is always one line at the end of the output: the `grep "apache"` I just ran as part of the pipeline.

The solution is to use `pgrep`, which is a grep for running processes. This avoids finding the command that executed the search in the first place, and only returns the process ids of the matching processes, one per line. As you can imagine, this is very useful if you want to write an automation script, because you don't have to deal with the extra process found on the end, and you can use the process ids as an argument to other tools, like `nice` or `kill`.
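
For example, here's a minimal sketch of how this might look in a script. The process name `apache2` and the `apachectl` start command are assumptions; substitute whatever your system uses:

```sh
#!/bin/sh
# Start Apache if no matching process is running.
# "apache2" and "apachectl" are assumed names; adjust for your system.
if ! pgrep -x apache2 > /dev/null; then
    echo "Apache not running; starting it"
    apachectl start
fi
```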

This is the final post in my multi-part series on performance in Reversi AI. I'm going to wrap up the performance bits and then stray a little from performance talk before wrapping up with a set of some resources.

Related performance topics

I want to touch briefly on a number of different topics before I wrap up the performance section. I'm not going to spend a lot of time covering them because either I didn't use them or they didn't yield good results for me (which doesn't mean they couldn't have in different circumstances).

If you think back to when I talked about alpha-beta pruning, you'll remember that it uses properties of the minimax algorithm to skip evaluating large portions of the tree. A way to exploit this is to re-order the moves so that the ones likely to cause this cutoff behavior are evaluated sooner. This is called "move ordering", and there are many different ways to do it. The simplest method is to apply the knowledge that certain positions, like corners, can cause pruning, and order them first. Another method is to keep track of the moves at each level that cause cutoffs and try them first; these are called "killer moves". A more complicated version (which I'm not sure is necessarily better) is to keep track of the list of best moves from previous evaluation attempts and order them first. There are many possibilities here, and it's another place where I'd encourage measurement.
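
As an illustration of the simplest approach, here's a sketch that just orders corner moves first. The `Move`-as-board-index representation and the helper names are mine, not from the original program:

```c
#include <stdlib.h>

/* Hypothetical move representation: a board index 0-63, with the
 * corners at indices 0, 7, 56, and 63. */
typedef int Move;

static int is_corner(Move m) {
    return m == 0 || m == 7 || m == 56 || m == 63;
}

/* Sort corners ahead of everything else (the order among the rest is
 * unspecified); killer moves could be promoted the same way. */
static int order_cmp(const void *a, const void *b) {
    return is_corner(*(const Move *)b) - is_corner(*(const Move *)a);
}

void order_moves(Move *moves, int count) {
    qsort(moves, count, sizeof(Move), order_cmp);
}
```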

Transposition tables allow you to cache previously calculated boards so you don't need to recalculate their values. This is useful in cases where making moves A then B results in the same board configuration as B then A. Since the value of a board doesn't depend on the order of the moves that produced it, the work to calculate the value in the second case can be skipped. This is where Zobrist hashing comes into play... you need a fast, low-collision hash that uniquely identifies each board so the lookup is cheap. For every board you'll be checking whether you've seen that particular configuration before, so both the hash and the lookup have to be fast. Also, the "hit" rate on these lookups must be high enough to offset the cost of all the "miss" lookups.
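
A minimal sketch of what such a table might look like, keyed by a Zobrist hash (the size, layout, and names here are illustrative, not from my program):

```c
#include <stdint.h>

#define TT_SIZE (1 << 20)   /* power of two, so masking replaces modulo */

typedef struct {
    uint64_t key;    /* full Zobrist hash, to detect index collisions */
    int      depth;  /* search depth the cached value was computed at */
    int      value;  /* cached evaluation */
    int      valid;
} TTEntry;

static TTEntry table[TT_SIZE];

/* Returns 1 and fills *value on a usable hit, 0 on a miss. */
int tt_probe(uint64_t hash, int depth, int *value) {
    TTEntry *e = &table[hash & (TT_SIZE - 1)];
    if (e->valid && e->key == hash && e->depth >= depth) {
        *value = e->value;
        return 1;
    }
    return 0;
}

void tt_store(uint64_t hash, int depth, int value) {
    TTEntry *e = &table[hash & (TT_SIZE - 1)];
    e->key = hash; e->depth = depth; e->value = value; e->valid = 1;
}
```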

Non-Performance-related concepts

I'd like to take a final moment to reflect on some interesting concepts that came up.

A common tactic used by AI in these sorts of games is called the "quiescence search". This is directly related to the "horizon effect". If you think about how the AI searches for a move, it generally evaluates the game out to a certain max depth. This is generally fine, except that this max depth is a horizon beyond which the AI doesn't see... what if, just beyond that horizon, what looks like the best move leads to inevitable loss? There are certain conditions in a game that are indicative of dangerous positions, such as being in check in chess, or having only one move or all bad moves in Reversi. In these cases, continuing to evaluate a little deeper may produce a significantly different view of that move tree. This is the quiescence search. One nice attribute of these conditions is that the extra searching is often fast, because there are few possibilities. Unfortunately, it may still take a number of ply to resolve the uncertainty.
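
Here's a toy sketch of the idea. The `Node` type stands in for a real board plus move generator, and the "unquiet" test (only one legal reply) is just one plausible condition for Reversi:

```c
#include <limits.h>

/* Toy game-tree node standing in for a real board + move generator. */
typedef struct Node {
    int value;              /* static evaluation of this position */
    int child_count;
    struct Node *children;
} Node;

int qsearch(const Node *n, int depth, int maximizing) {
    if (n->child_count == 0)
        return n->value;                  /* game over: exact value      */
    if (depth <= 0 && n->child_count > 1)
        return n->value;                  /* quiet horizon: just evaluate */

    /* Either we're above the horizon, or the position is unquiet
     * (a forced line), so we keep searching past the nominal depth. */
    int best = maximizing ? INT_MIN : INT_MAX;
    for (int i = 0; i < n->child_count; i++) {
        int v = qsearch(&n->children[i], depth - 1, !maximizing);
        if (maximizing ? (v > best) : (v < best))
            best = v;
    }
    return best;
}
```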

When looking for a good move, there is typically a max depth to which the AI searches, and it's hard to say exactly how long evaluating a tree to a certain depth will take. If you're building an AI to play against a human, you must also take into account that a person probably doesn't want to wait a minute for the AI to figure out its move. Combine these two things with the fact that cutting the evaluation off early can mean missing good moves, and there is a problem. To rectify this, programmers use a technique called "iterative deepening". This is just a fancy term for "do the full evaluation to depth X, then do the full evaluation to depth X+1", and so on. If you hit the maximum time limit while doing depth X+1, you just stop calculating and use the result from depth X. Since the move tree grows exponentially, and you can feed past evaluations into move ordering, the benefits typically outweigh the cost of the repeated work. Typically you start with a depth that you know can be calculated quickly under all circumstances and then deepen until a set time limit or some max depth is hit.
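
In sketch form, reusing the toy `Node` type from the quiescence sketch above (`search_to_depth()` stands in for whatever full evaluation you use; a real engine would also abort the in-flight search when time runs out, rather than only checking between depths):

```c
#include <time.h>

/* Stand-in for a full evaluation (minimax, alpha-beta, ...) that
 * returns the best move found when searching to the given depth. */
extern int search_to_depth(const Node *root, int depth);

int iterative_deepening(const Node *root, double seconds, int max_depth) {
    clock_t deadline = clock() + (clock_t)(seconds * CLOCKS_PER_SEC);

    /* Start at a depth that's always affordable... */
    int best = search_to_depth(root, 1);

    /* ...then keep deepening until the clock or the depth cap stops us. */
    for (int depth = 2; depth <= max_depth; depth++) {
        if (clock() >= deadline)
            break;          /* out of time: keep the last completed result */
        best = search_to_depth(root, depth);
    }
    return best;
}
```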

General Resources

Reversi strategy:

Programming game AI for two player zero-sum games:

Excellent Reversi programs:

Thor database - This is a collection of many (100K+) games played by good players over many years. It's very useful for training purposes. The file format is described at the link on the bottom of the page. It's in French, though, so you may need to get a translation.

... and we're back just in time for me to continue talking about performance-related ideas in Reversi AI. In this case, I'm going to keep it pretty short and talk about a couple of simple things that made a surprising difference.

Small-scale optimizations

Caching results

Values like the number of pieces each player has and the positions of their valid moves are used pretty often. Instead of calculating them every time they're needed, just keep the data around until the board changes. This is basically the first step toward iteratively calculating values.
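
A sketch of what this can look like with a dirty flag; the structure and names are hypothetical:

```c
#include <stdint.h>

/* Board wrapper that caches derived values until the board changes. */
typedef struct {
    uint64_t white, black;     /* one 64-bit bitboard per player */
    int counts_valid;          /* dirty flag for the cached counts */
    int white_count, black_count;
} Board;

static int popcount64(uint64_t x) {
    int n = 0;
    while (x) { x &= x - 1; n++; }   /* clear the lowest set bit */
    return n;
}

int white_pieces(Board *b) {
    if (!b->counts_valid) {          /* recompute only when stale */
        b->white_count = popcount64(b->white);
        b->black_count = popcount64(b->black);
        b->counts_valid = 1;
    }
    return b->white_count;
}

/* Anything that mutates the board clears the flag. */
void place_white(Board *b, uint64_t square) {
    b->white |= square;
    b->counts_valid = 0;
}
```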

Another manifestation of this technique is pre-calculating values. Quite often the same values are calculated again and again. Instead of recalculating them, calculate them once at the beginning of the program and then use these pre-calculated values. Depending on how static these values are, you can often just supply them as constants or a pre-defined static table in the application, completely obviating the need for runtime calculation. Opening and closing books are more complicated examples of where this may fit in for a game.
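
A common Reversi example is a static table of positional weights for the evaluation function. The numbers below are purely illustrative (corners good, squares next to corners risky), not values from my program:

```c
/* Illustrative positional weights; real values should come from
 * tuning and measurement. */
static const int POSITION_WEIGHT[8][8] = {
    { 100, -20, 10,  5,  5, 10, -20, 100 },
    { -20, -40, -5, -5, -5, -5, -40, -20 },
    {  10,  -5,  1,  1,  1,  1,  -5,  10 },
    {   5,  -5,  1,  1,  1,  1,  -5,   5 },
    {   5,  -5,  1,  1,  1,  1,  -5,   5 },
    {  10,  -5,  1,  1,  1,  1,  -5,  10 },
    { -20, -40, -5, -5, -5, -5, -40, -20 },
    { 100, -20, 10,  5,  5, 10, -20, 100 },
};
```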

There are cases where caching data can be bad. For example, one of the experiments I tried was to keep around the evaluated move trees such that I wouldn't have to recalculate them all. One of the (many) problems with this is that it is a lot of data... so much that it may get pushed to virtual memory on the hard drive. Hard drives are so slow that it would be pretty disastrous for performance if you had to use it... computers are so fast now that you can probably recalculate the data in the time it'd take to be read from the disk. Example: A fast hard drive (non-SSD) will have an average latency greater than 2ms. Let's say a computer can process 1 billion instructions a second (not unreasonable for a 3 GHz processor), which means 2 million instructions run in the time it took to read the data from the disk. You can do a lot in 2 million instructions.

Use compatible data structures everywhere

Rather than dealing with (x,y) sets for positions or moves, a position can be one 64-bit number with a single bit set that represents the location: a bitboard. This makes it very easy to use with the rest of the bitboards, especially since the "valid moves" calculation results in a bitboard. As long as you don't have to convert back and forth often, it is very convenient and efficient.
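
When a conversion is occasionally needed, it's a one-liner. This sketch assumes the bit layout from my bitboard post, with the top-left square at bit 63 and each row stored as a consecutive byte:

```c
#include <stdint.h>

/* Map (x, y), with (0,0) the top-left square, to a one-bit bitboard. */
static inline uint64_t position_bit(int x, int y) {
    return 1ULL << (63 - (y * 8 + x));
}
```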

After that short post, we're heading into the home stretch. Next time: Wrapping it up.

We're back to continue talking about Reversi performance ideas, in this post's case: Move tree search algorithms.

Finding the Best Moves

The idea of all this thinking ahead is for the AI to determine what it thinks is the best move to make. All of the algorithms mentioned in this section are equivalent in that they will return the same move, but some are generally more efficient than others. If you're willing to (potentially) sacrifice accuracy for a significant speed gain, take a look at Multi-ProbCut... just make sure your evaluation function is good, or you will be skipping good moves for potentially bad ones, since that algorithm doesn't do an exhaustive evaluation.

There are many algorithms for searching for the best move, but I am only going to cover the more basic ones. If you're interested in more and better algorithms, look up things like NegaScout, MTD(f), and/or SSS*, all of which should work at least as well as the best listed here.

Before we start, there is a basic concept to understand in order for this section to make sense: Reversi is a zero-sum game - basically, what's good for one player is equally bad for the other and vice-versa. This fact can be exploited for significant gain.

Minimax

This is the basic search algorithm used by AI in a two-player zero-sum game like Reversi.

If it's thinking more than 1 ply ahead, the AI will need to consider what the opponent will do. Since it's not really possible to know what move the opponent will make, the safest (most conservative) idea is to assume the opponent will make the best possible move (as far as the AI is able to determine it). Because Reversi is a zero-sum game, a simplification of this is to alternate between looking for the move with the maximum score (when it's the AI's turn) and the minimum score (when it's the opponent's turn). When this is done properly, it guarantees that the best possible move is chosen (assuming the game is evaluated to the end by the AI - otherwise it's just the best possible move as the AI sees it). By assuming the best possible opponent, it won't be caught by a surprise move.
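
In sketch form it looks like this, with a toy `Node` type standing in for a real board and move generator:

```c
#include <limits.h>

/* Toy game-tree node standing in for a real board + move generator. */
typedef struct Node {
    int value;              /* static evaluation, used at the leaves */
    int child_count;
    struct Node *children;
} Node;

/* Plain minimax: maximize on our turns, minimize on the opponent's. */
int minimax(const Node *n, int depth, int maximizing) {
    if (depth == 0 || n->child_count == 0)
        return n->value;

    int best = maximizing ? INT_MIN : INT_MAX;
    for (int i = 0; i < n->child_count; i++) {
        int v = minimax(&n->children[i], depth - 1, !maximizing);
        if (maximizing ? (v > best) : (v < best))
            best = v;
    }
    return best;
}
```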

Alpha-Beta Pruning

With minimax, every move position is evaluated... the algorithm itself is not necessarily any faster than any other exhaustive algorithm you might choose. Alpha-Beta (A/B) pruning changes that by eliminating (pruning) certain parts of the move tree. Understanding how this works requires a solid grasp of minimax, so make sure that makes sense to you before going forward.

The general idea behind A/B pruning is to take advantage of the knowledge that you're going to alternate between minimizing and maximizing the move scores used. Wikipedia has a good article explaining A/B pruning, including an animated illustration that is very helpful. I recommend studying this if you're having a hard time with understanding A/B pruning.

I'll try an example of how it works: Let's say you're at a point where the opponent is moving at depth 2, so you're looking for the move with the lowest score. You check the entire subtree of the first move at depth 2 and get the value 3, then start looking at the 2nd move at depth 2. As you look at its available responses at depth 3, you evaluate the first one and get the score 4. Since you're maximizing at depth 3, the final result from all of the moves in this subtree must be at least 4. Now look back at depth 2, where you're minimizing: the first move there scored 3, and this 2nd move is guaranteed to score at least 4, so the first move would always be taken instead. The rules of A/B pruning capture this bit of logic and allow the evaluation to skip the rest of the 2nd move's subtree. This can cut out a significant amount of evaluation (more the deeper you'd be going) without missing any moves.
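
Here's a sketch using the same toy `Node` type as the minimax sketch above. The `break` statements are exactly the skip described in the example:

```c
int alphabeta(const Node *n, int depth, int alpha, int beta, int maximizing) {
    if (depth == 0 || n->child_count == 0)
        return n->value;

    if (maximizing) {
        int best = INT_MIN;
        for (int i = 0; i < n->child_count; i++) {
            int v = alphabeta(&n->children[i], depth - 1, alpha, beta, 0);
            if (v > best) best = v;
            if (best > alpha) alpha = best;
            if (alpha >= beta) break;    /* beta cutoff: prune the rest */
        }
        return best;
    } else {
        int best = INT_MAX;
        for (int i = 0; i < n->child_count; i++) {
            int v = alphabeta(&n->children[i], depth - 1, alpha, beta, 1);
            if (v < best) best = v;
            if (best < beta) beta = best;
            if (alpha >= beta) break;    /* alpha cutoff: prune the rest */
        }
        return best;
    }
}
```

The initial call uses the widest possible window, e.g. `alphabeta(root, depth, INT_MIN, INT_MAX, 1)`, and returns the same value plain minimax would.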

Now that the big performance changes are out of the way, it's time for another break. Next time I'll give examples of small-scale optimizations.

Again, we've assembled to hear me talk about performance in Reversi AI. This time I'm going to talk about larger code changes that bring about significant gains, which I'm calling "large scale optimizations".

Large-scale Optimizations

Calculating quickly, a little bit at a time

One simple performance tweak that really pays off is iteratively tracking state rather than recalculating it. Doing this generally lets you take a calculation that's probably linear and make it constant time. Much of the time programmers do this intuitively, as it's a very simple technique, but it pays to be aware of it in case it doesn't subconsciously flow into your code.

In Reversi, iterative calculation can be used in a number of different places, like tracking the number of pieces each player has on the board. Counting the pieces on the board seems pretty simple, but, if you were to re-calculate this whenever it was needed (which is relatively often), it'd really add up, particularly since it feeds into the board value estimation function.
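
For example, here's a sketch of updating the counts as part of making a move instead of recounting the board. The bitboard representation is assumed, and `flipped` is the set of opponent pieces the move flips:

```c
#include <stdint.h>

static int popcount64(uint64_t x) {
    int n = 0;
    while (x) { x &= x - 1; n++; }
    return n;
}

/* Make a move for black, adjusting the cached counts incrementally:
 * black gains the new piece plus everything flipped; white loses the
 * flipped pieces. No 64-square recount needed. */
void apply_move_black(uint64_t *black, uint64_t *white,
                      int *black_count, int *white_count,
                      uint64_t move_bit, uint64_t flipped) {
    *black |= move_bit | flipped;
    *white &= ~flipped;
    int f = popcount64(flipped);
    *black_count += f + 1;
    *white_count -= f;
}
```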

One of the places I use this is tracking the stable pieces on the game board. By definition, a stable piece doesn't become unstable over the course of the game, so there's little point in continually recalculating that a given piece is stable. Calculating stable pieces is itself an iterative process, though... a piece becomes stable when the correct pieces next to it are stable. This calculation can be made much faster by caching the stable pieces from the last time it was done. For example, corner pieces are inherently stable (which is one of the reasons they're so desirable). A 'C' square (a square on the edge adjacent to a corner) is stable if the corner next to it is occupied by a piece of the same color. This extends along the edge... if all the squares from a given square to a corner on that edge are occupied by pieces of the same color, all of those pieces are stable. By keeping track of the stable pieces, I skip having to start all the way from the corners each time.

In the chess domain, iterative calculation is even more common and useful, because chess programs make use of a technique called Zobrist hashing. The idea here is to have an algorithm that hashes the board state, accounting for every unique board configuration, with a very low chance of collision. This hash would be somewhat expensive (in that it'd be based on the number of pieces on the board: O(n)) if it were completely recalculated for every board change. Instead, with Zobrist hashes, the update is doable in constant time. To enable this, each possible board element (say, a white pawn in location B2) has its own unique value. To calculate the initial hash for a board, these unique values are XORed together. Since XOR is reversible ((a XOR b) XOR b == a), you can update the hash value very quickly in 2 simple steps (sketched in code after the list):

  1. Remove the piece from the old location - XORing the original board's hash with the value corresponding to the piece's old location, removing it from the board
  2. Add the piece at the new location - XORing the result from step 1 with the value corresponding to the new position
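
Here's a minimal sketch of those two steps for the chess case described above. The table setup and RNG are illustrative (12 covers 6 piece types × 2 colors):

```c
#include <stdint.h>
#include <stdlib.h>

/* One random value per (piece kind, square). */
static uint64_t zobrist[12][64];

static uint64_t rand64(void) {
    /* Illustrative RNG; a real program wants a better source of bits. */
    uint64_t r = 0;
    for (int i = 0; i < 4; i++)
        r = (r << 16) ^ (uint64_t)(rand() & 0xFFFF);
    return r;
}

void zobrist_init(void) {
    for (int p = 0; p < 12; p++)
        for (int sq = 0; sq < 64; sq++)
            zobrist[p][sq] = rand64();
}

/* Constant-time hash update for moving one piece, per the steps above. */
uint64_t zobrist_move(uint64_t hash, int piece, int from, int to) {
    hash ^= zobrist[piece][from];   /* step 1: remove from old location */
    hash ^= zobrist[piece][to];     /* step 2: add at the new location  */
    return hash;
}
```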

Opening Book

Opening books are databases of good ways to play the beginning of a game. Most strategy games with non-random setups have these, and the common openings often have specific names attached. When playing these games, people may not even realize they're using an opening book; they probably just think of it as a favorite way to start the game or a way that worked well for them in the past. This is much more formalized with game AI.

Having an opening book is a common optimization for game AI. From a high level, it's a manifestation of pre-calculation. As long as you can store the data in an efficient way, it can lead to huge savings. For a book to be useful, it should be calculated to many more ply than the game would normally be able to do. This means that it can think much further ahead at a critical time (the beginning of the game). Also, since the book is pre-calculated, much more time can be devoted to creating it than would be feasible during regular gameplay.

To create an opening book, you do the same sort of calculations you would do during the game, but store the paths that lead to the best outcomes and the resulting board values. At the beginning of the game, instead of having to spend time calculating board values, you can just do a lookup in your book. When you get to the end of the book's move tree, you start the normal calculation.

Bitboards

The idea here is to repack the board into a data structure that can be manipulated quickly in useful ways. In the Reversi case, I represented the board using 2 separate bitboards: one for white and one for black. Since the board is 8x8 and each square can be represented by 1 bit (is there a black/white piece there?), the board for each player packs nicely into 1 unsigned 64-bit number (QWORD). Each row is represented by a byte of the QWORD. On some architectures, this can be stored in a single register, making it even faster.

Here is an illustration of a way to order it:

Bit layout for a reversi bitboard

The number in each location is the power of two it represents. If the board is empty except for the top-left piece, the value of the board will be 2^63. If the only piece is the top-right corner, the board's value will be 2^56.

These bitboards can be used extensively for things like finding all white pieces, looking for valid moves and stable pieces, determining if a given player has any pieces left, etc. The advantage may not be obvious, since dealing with individual bits can be a hassle, but it allows some interesting calculations to be done basically in parallel. By using 64-bit operations on these boards, a calculation can be done simultaneously on all positions. I'll talk about some examples of this in detail shortly.

Using this structure has a number of advantages. At its simplest, copying the board is as simple as copying 2 64-bit numbers, making cloning boards for further moves quick and easy. Another advantage is that manipulation of the board can be done quickly using simple binary operations: adding a piece to the board is a logical-OR, and removing a piece is a logical-AND with the complement of that piece's bit, as shown below.
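
In code, those operations are one-liners (`pos` being a single-bit bitboard for the square in question):

```c
#include <stdint.h>

/* Cloning a board is just copying two 64-bit numbers. */
typedef struct { uint64_t white, black; } Board;

static inline uint64_t add_piece(uint64_t board, uint64_t pos) {
    return board | pos;       /* logical-OR sets the bit */
}

static inline uint64_t remove_piece(uint64_t board, uint64_t pos) {
    return board & ~pos;      /* logical-AND with the complement clears it */
}
```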

Now let's look at an example of how this works for something more complicated. Say you're trying to find all the valid moves for black and the board looks like this:

Sample Reversi board

The idea is to find a line of white pieces sandwiched between a black piece and an empty square. The simple way to do this would be to find a line that starts at a black piece, goes through only white pieces, and ends in an empty space. To begin, break the normal board into separate bitboards for the white pieces, black pieces, and empty spaces, like this:

Sample Reversi board split into white, black, and empty bitboards

Before looking for valid moves in the downward direction (where the black piece is at the top, the white pieces run vertically below it, and the empty square is at the bottom), I'm first going to define a way to quickly shift all the pieces on the board downward simultaneously. Since a bitboard is a 64-bit number with each row stored in a contiguous byte (see the illustration above), shifting it right by 1 moves all the pieces one column to the right, and shifting right by 8 moves the whole board down one row. The problem is that bits DO wrap across row boundaries (and some shift operators may sign-extend or even wrap bits in from the other end or the carry bit), so I use a mask (with a logical-AND) to clear out the row or column that has moved in from off the board. Each direction will need its own mask. The downward mask looks like this:

Bitboard mask for downward move checking
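
In code, the downward shift-plus-mask might look like this (with an unsigned 64-bit value a right shift already fills the top row with zeros, but keeping the mask makes the pattern safe and uniform across all 8 directions, where the wrapping is real):

```c
#include <stdint.h>

/* Clears the top row, removing anything that "moved in" from off the
 * board; the horizontal directions need column masks instead. */
static const uint64_t DOWN_MASK = 0x00FFFFFFFFFFFFFFULL;

static inline uint64_t shift_down(uint64_t board) {
    return (board >> 8) & DOWN_MASK;
}
```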

Now I'm ready to start looking for moves in the downward direction. The black bitboard has all of the black pieces in it. If I move all of these pieces down one and logical-AND it with the white bitboard, the resulting board (which I'll call the "potential moves" board, since these are used to find valid moves further down) shows all the locations that have a black piece with a white piece right below it. Here's an illustration of these steps:

Bitboard downward move check step 1

From here, I continue looking downward, but now I'm looking for empty squares (indicating a valid move) as well as white squares (indicating future potential moves). If you shift the potential moves board downward again, you can determine these two things with logical-ANDs: valid moves are found by logical-ANDing with the empty bitboard, and the new potential moves bitboard is found by logical-ANDing with the white bitboard again.

Bitboard downward move check step 2

I continue shifting downward and doing these two evaluations until the potential moves bitboard is empty. The logical-OR of all the valid moves results in a bitboard that contains every valid move for black in the downward direction. If this is done in all 8 directions, the bitboards from those can be logical-ORed together to find all valid moves for black.
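
Putting the whole downward pass together might look like this sketch (the `shift_down` helper from above is repeated so the sketch stands alone; `empty` can be computed as `~(black | white)`):

```c
#include <stdint.h>

static const uint64_t DOWN_MASK = 0x00FFFFFFFFFFFFFFULL;

static inline uint64_t shift_down(uint64_t b) {
    return (b >> 8) & DOWN_MASK;
}

/* Valid moves for black in the downward direction only: start from
 * squares where a white piece sits directly below a black piece, then
 * keep walking down through white runs, collecting any empty square
 * that ends a run. OR the results of all 8 directions for the full set. */
uint64_t valid_moves_down(uint64_t black, uint64_t white, uint64_t empty) {
    uint64_t moves = 0;
    uint64_t potential = shift_down(black) & white;  /* step 1 */
    while (potential) {
        uint64_t shifted = shift_down(potential);
        moves     |= shifted & empty;   /* run ends on an empty square  */
        potential  = shifted & white;   /* run continues through white  */
    }
    return moves;
}
```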

Next time I'll continue the presentation of large-scale optimizations with talk about move tree searching and a couple of methods for how it can be done.