Quite a while ago, I started work on a Reversi game, mostly to see if I could challenge my friend who is a pretty strong Reversi player. I don't have much experience with this sort of task, as it's not the sort of thing on which I generally work, but it was a nice change of pace. While it was new to me, this IS a pretty well-trod area... I never expected any revelations or unique developments. Certainly, if you expect to find something unavailable elsewhere here, you are mistaken, but if you want a quick look into some interesting challenges, mostly performance-related challenges, hopefully this entry (and related entries) will be interesting. I'll be breaking it into a number of posts:
- Basic introduction to Reversi AI
- High-level performance considerations
- Large-scale optimizations
- Search tree algorithms
- Small-scale optimizations
- Wrap up and Resources
I suppose first I'll give an introduction to Reversi. You can find plenty of descriptions online, but for the lazy among us, here's a breakdown: Reversi is a 2 player game played on an 8 by 8 grid of equal-sized squares. Each player has their own unique color (traditionally black and white). The pieces are flat and cylindrical, with top and bottom sides solidly colored one for each player. Pieces are placed centered in the squares on the board and the top color of the piece indicates which player owns that square (initially this must match the player that played the piece). The starting configuration for the board has the 4 middle squares filled by pieces, alternating white-black on the upper middle row, then black-white on the lower middle row. Traditionally, play starts with the black player. Players take turns placing a piece into a valid position until neither player has a valid move. If a player doesn't have a valid move, they pass, but otherwise they must place a piece (players cannot opt to pass). A move is valid if it results in at least one of the opponent's pieces flipping. This happens when, horizontally, vertically, or diagonally, a line can be drawn from the move's position through 1 or more of the opponent's pieces and then to another of the current player's pieces without going through an empty square or piece owned by the current player. This sort of sandwiching of the opponent's pieces with the current player's pieces is called "flanking". When a piece is played, all of the flanked opponent's pieces (in every direction) are flipped to be owned by the player who just placed the piece. The winner is the player who has the most pieces at the end of the game.
One area that was particularly new to me was the development of artificial intelligence for the computer players. Like many pieces of technology, it seemed as if it would be wondrous: Neural nets and/or complex logical rules. Indeed, if you look at traditional human strategic elements, you will see mention of things like parity, stable pieces, and balanced edges. Somehow, these ideas must be combined together to produce an incredibly intelligent player.
Sadly, like many magic tricks, the wonder dissolved once the secret was revealed: Try to think as far ahead as possible. By simulating every possible move from both sides, the AI is able to look ahead a number of moves. Each move it looks ahead is called a "ply". The further ahead the computer can calculate (i.e. the more ply), the better a move it is likely to make. If the computer is able to calculate to the end of the game, then it will know if it will win or lose and the sequence of moves to get there.
What's important to understand at this point is that Reversi is a relatively deep game, in that there are many turns per game (usually 30 per side). Also, to know who won the game, you need to get to the end of the game and then it's down to a simple question: Who has more pieces on the board? In the ideal case, the computer could solve the game all the way to the end, trying every possible move, and then it could know which move would be the best move. Unfortunately, that's not really feasible with the current technology; the number of possible games is extremely large as there are usually many potential moves per board position... on average about 7.488, according to my estimation. This grows exponentially with each move... it doesn't take too many ply before it takes more than a day to compute all the possible board outcomes. From this, we can conclude a few things:
- We need a way to deal with the fact that we're not going to be able to solve to the end of the game
- Performance will be pretty important
Dealing with uncertainty
Given the computational needs of completely solving Reversi, it is pretty obvious that the AI has to deal with the fact that completely solving Reversi is not going to happen. Generally the method used is to play the game as deeply as we have time for, and then come up with an accurate estimate of how good the resulting board is. In an ideal world, this estimate would indicate how likely that board is to lead to a win for the current player. This depth will eventually reach the end states of the game and, when that happens, the estimation can be replaced by certainty.
This estimation may take in any number of different calculations and combine them in an unlimited number of ways. Generally, it seems that the most common way is to use linear combination to combine the different metrics together into the final estimate. This solution is quick, easy, and provides good results. Linear combination involves combining a number of features, like "available moves" or "pieces on board", applying a weight to them, and then adding them together. There is no limit on the number of terms that can be combined in this way or what the weights may be. A simple (arbitrary and poor) example of this sort of algorithm: board_strength = 0.3 * stable_pieces - 0.1 * number_of_valid_moves. In this example, the algorithm is trying to maximize the number of stable pieces and reduce the number of valid moves. Because of the weighting, an increase of 1 stable piece offsets the increase of 3 valid moves.
Possibilities range from choosing a single metric, to using machine learning to discover a superior weighting system. If you're interested in the latter, I found this lecture on Autonomous Derivation interesting, if a bit much to dive into without preparation. You could probably also just search for the ideas covered there, like linear regression and gradient descent, and find something useful.
I'm not really going to discuss the specifics of what metrics should be used or how they should be weighted, but I WILL say that this evaluation function must be fast, since it is one of the most common calculations the AI will make.
Next post, I'll talk about some of the philosophical concepts of performance and look at difference kinds of optimizations.