This is the final post in my multi-part series on performance in Reversi AI. I'm going to finish up the performance bits, stray a little from performance talk, and then close with a set of resources.

Related performance topics

I want to touch briefly on a number of different topics before I wrap up the performance section. I'm not going to spend a lot of time covering them because either I didn't use them or they didn't yield good results for me (which doesn't mean they couldn't have in different circumstances).

If you think back to when I talked about alpha-beta pruning, you'll remember that it uses properties of the minimax algorithm to skip evaluating large portions of the tree. One way to exploit this is to reorder the moves so that the ones likely to cause a cutoff are tried sooner. This is called "move ordering", and there are many ways to do it. The simplest is to apply domain knowledge that certain positions, like corners, tend to cause pruning, and order them first. Another is to keep track of the moves at each depth that caused a cutoff and try those first; these are called "killer moves". A more complicated version of this (which I'm not sure is necessarily better) might be to keep the list of best moves from previous evaluation attempts and order them first. There are many possibilities here, and it's another place where I'd encourage measurement.
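Here's a rough sketch of the first two ideas in Python. Nothing here comes from a particular engine; `CORNERS` and the `killer_moves` dictionary are names I've made up for illustration.

```python
# Illustrative move ordering for an 8x8 Reversi board: corners first,
# then recent killer moves at this depth, then everything else.

CORNERS = {(0, 0), (0, 7), (7, 0), (7, 7)}

def order_moves(moves, depth, killer_moves):
    """Sort moves so that likely-cutoff moves are searched first.

    killer_moves maps a search depth to the set of moves that recently
    caused beta cutoffs at that depth.
    """
    killers = killer_moves.get(depth, set())

    def priority(move):
        if move in CORNERS:
            return 0  # corners are strong in Reversi; try them first
        if move in killers:
            return 1  # moves that caused cutoffs before often do so again
        return 2      # everything else keeps its original order

    return sorted(moves, key=priority)
```

The other half of the bookkeeping is recording killers as you search: when a move causes a beta cutoff at some depth, add it with something like `killer_moves.setdefault(depth, set()).add(move)`.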

Transposition tables let you cache the values of previously calculated boards so you don't have to recalculate them. This is useful because making moves A then B can produce the same board configuration as B then A. Since a board's value doesn't depend on the order of the moves used to reach it, the work for the second case can be skipped entirely. This is where Zobrist hashing comes into play: you need a fast, low-collision hash that uniquely identifies each board, because you'll be checking every board you visit against the table. Also, the "hit" rate on these lookups must be high enough to offset the cost of all the "miss" lookups.
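A minimal sketch of the idea, assuming an 8x8 board stored as a flat list where each square is 0 (empty), 1, or 2 (the two players). The helper names are mine, not from any particular implementation.

```python
import random

# One random 64-bit value per (square, piece) pair; a board's hash is
# the XOR of the values for its occupied squares.
random.seed(0)  # fixed seed so hashes are stable across runs
ZOBRIST = [[random.getrandbits(64) for _ in range(2)] for _ in range(64)]

def zobrist_hash(board):
    h = 0
    for square, piece in enumerate(board):
        if piece:  # skip empty squares
            h ^= ZOBRIST[square][piece - 1]
    return h

# The transposition table itself can just be a dict keyed by the hash.
transposition = {}

def tt_lookup(board, depth):
    entry = transposition.get(zobrist_hash(board))
    # Only reuse a value that was computed at least as deep as we need now.
    if entry is not None and entry[0] >= depth:
        return entry[1]
    return None

def tt_store(board, depth, value):
    transposition[zobrist_hash(board)] = (depth, value)
```

In a real engine you wouldn't rehash the whole board on every lookup; because the hash is an XOR, you can update it incrementally as each move flips pieces, XORing values in and out as squares change.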

Non-performance-related concepts

I'd like to take a final moment to reflect on some interesting concepts that came up.

A common tactic in AI for these sorts of games is the "quiescence search", which is directly related to the "horizon effect". When the AI searches for a move, it generally evaluates the game out to some maximum depth. That's usually fine, except that the max depth is a horizon beyond which the AI can't see: what if, just past that horizon, what looks like the best move leads to an inevitable loss? Certain conditions in a game are indicative of dangerous positions, such as being in check in chess, or having only one move, or all bad moves, in Reversi. In those cases, continuing to evaluate a little deeper may produce a significantly different view of that subtree. This extra evaluation is the quiescence search. One nice attribute of these conditions is that the extended searches are often fast, because there are few possibilities, though it may still take several ply to resolve the uncertainty.
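Here's a hedged sketch of what that extension might look like in a negamax-style search, with `player` represented as +1/-1. The helpers `evaluate`, `is_unstable`, `legal_moves`, and `apply_move` are placeholders for whatever your engine provides; `is_unstable` would encode your "dangerous position" test, such as the side to move having only one legal move.

```python
def quiescence(board, alpha, beta, player, extension_left):
    """Called at the normal search horizon instead of returning the
    static evaluation directly. (Pass moves and game-over handling
    are omitted for brevity.)"""
    # Quiet position, or extension budget exhausted: trust the evaluation.
    if extension_left == 0 or not is_unstable(board, player):
        return evaluate(board, player)
    # Unstable: keep searching a little deeper before trusting a value.
    for move in legal_moves(board, player):
        score = -quiescence(apply_move(board, move, player),
                            -beta, -alpha, -player, extension_left - 1)
        if score >= beta:
            return beta          # cutoff, just as in normal alpha-beta
        alpha = max(alpha, score)
    return alpha
```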

When looking for a good move, there is typically a max depth to which the AI searches, and it's hard to predict how long evaluating a tree to a given depth will take. If you're building an AI to play against a human, you have to account for the fact that a person probably doesn't want to wait a minute for the AI to figure out its move. Combine those two things with the fact that cutting the evaluation off early can mean missing good moves, and you have a problem. To rectify this, programmers use a technique called "iterative deepening". That's just a fancy term for "do the full evaluation to depth X, then do the full evaluation to depth X+1". If you hit your time limit while searching depth X+1, you stop and use the result from depth X. Since the move tree grows exponentially, the shallower iterations are cheap relative to the deepest one, and their results can seed move ordering, so the benefits typically outweigh the cost of the repeated work. Typically you start with a depth you know can be calculated quickly under all circumstances and then keep deepening until a time limit or some max depth is hit.
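A small driver for this might look like the following, where `alpha_beta_root` is an assumed helper that searches to a fixed depth and raises `TimeoutError` if it notices the deadline has passed.

```python
import time

def choose_move(board, player, time_limit_seconds, max_depth=20):
    """Iterative deepening: search depth 1, 2, 3, ... and keep the best
    move from the last depth that finished before the deadline."""
    deadline = time.monotonic() + time_limit_seconds
    best_move = None
    for depth in range(1, max_depth + 1):
        try:
            move = alpha_beta_root(board, player, depth, deadline)
        except TimeoutError:
            break               # out of time; fall back to the last result
        best_move = move        # this depth completed; adopt its answer
    return best_move
```

Depth 1 should always complete well within any reasonable limit, so `best_move` is never left unset in practice.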

General Resources

Reversi strategy:

Programming game AI for two-player zero-sum games:

Excellent Reversi programs:

Thor database - This is a collection of many (100K+) games played by strong players over many years. It's very useful for training purposes. The file format is described at the link at the bottom of the page. The description is in French, though, so you may need a translation.
