Welcome back to my multi-part discussion on Reversi AI, and particularly the related performance ideas. This time around, I'm going to wax philosophical and set up the foundation for future posts.
Philosophical discussion of performance
One of the key tenets, if not THE key tenet, of performance improvement is measurement. Optimizing without taking measurements is like shooting in the dark. Finding out where the program is spending its time is key to good optimization. The biggest bang for the buck comes from tweaking the code where the most time is spent. Programs often follow the 80/20 rule, and that "20" is often the "inner loop" of the program. It's often easy to tell where this is, but it's just as easy to be wrong and waste time tweaking the wrong code. This is where profilers come in. I'm not going to go into their usage or provide recommendations, but a good profiler is an excellent tool to have available.
Tweaking the performance of an inner loop can be critical to the performance of a program. Let's say you want to calculate moves to a depth where you need to evaluate 1 million different boards (maybe 7 moves ahead). If you can remove 1000 clock cycles from that evaluation function, that's 1 billion cycles saved, or ~1 second on a 1 GHz processor.
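That arithmetic is worth making concrete. A quick back-of-the-envelope check, using the round numbers above (the clock speed is just a convenient figure, not a real measurement):

```python
# Back-of-the-envelope savings from trimming an inner loop,
# using the round numbers from the text above.
boards_evaluated = 1_000_000      # ~7 plies of look-ahead
cycles_saved_per_board = 1_000    # hypothetical tweak to the evaluator
clock_hz = 1_000_000_000          # 1 GHz processor

total_cycles_saved = boards_evaluated * cycles_saved_per_board
seconds_saved = total_cycles_saved / clock_hz
print(seconds_saved)  # 1.0
```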
There is an interesting trade-off between the fidelity of the information gleaned and the depth of the search. Let's say that there's a calculation that will provide a pretty good estimate of how often a certain board will result in a player winning. Calculations take time, and this one would need to be done for every board that is evaluated. When considering performance, that time must be weighed against the increase in fidelity the value provides. Let's say that, during the time lost to calculate that value, the program could search 1 move deeper. At this point, you have to look at the gains of the new metric versus knowing what the game will look like after 1 more ply. Almost always, 1 ply deeper is better.
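To put rough numbers on that trade-off, here's a toy model. Every figure below is an illustrative assumption, not a measurement from my program: a fixed time budget, an assumed branching factor, and two hypothetical evaluation costs.

```python
import math

# Toy model of the fidelity-vs-depth trade-off. All numbers are
# illustrative assumptions, not measurements.
budget_s = 1.0        # time allowed for one move decision
branching = 10        # assumed average number of legal moves per board

def reachable_depth(eval_seconds):
    """Plies reachable if each leaf evaluation costs eval_seconds."""
    leaves = budget_s / eval_seconds      # leaf boards we can afford
    return math.log(leaves, branching)    # depth with branching^d leaves

fast = reachable_depth(1e-6)   # cheap evaluation
slow = reachable_depth(1e-5)   # same evaluation plus a 10x-costlier metric
print(round(fast), round(slow))  # 6 5
```

A 10x slower evaluation costs a full ply here, which is exactly the price the fancier metric has to beat.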
In fact, you can turn this concept right around... instead of trying to develop a new algorithm that provides higher fidelity, look at what you have and see if you can sacrifice some fidelity but gain extra depth. As an example of this, it is very common for strong Reversi AI to evaluate boards based on matching against patterns rather than looking at something like stable pieces. The patterns (assuming they're reasonably chosen) provide some of the same value as things like the number of stable pieces or edge evaluations (but not all), but can be done MUCH faster. This extra time easily adds up to another move when you do it enough.
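Here's a hedged sketch of what pattern-based evaluation can look like. The pattern set (just the four edges) and the weights are invented for illustration; real engines use many more patterns with tuned or learned tables. The point is that all the expensive reasoning is baked into a table up front, so scoring a pattern at search time is a single lookup.

```python
from itertools import product

# Sketch of pattern-based evaluation: score each edge of the board by
# looking up its exact contents in a precomputed table. Cells are
# 1 (my disc), -1 (opponent), 0 (empty). Weights are toy values.
EDGE_VALUES = {}
for cells in product((-1, 0, 1), repeat=8):
    # Toy heuristic standing in for a tuned/learned value:
    # corners worth 10, other edge squares worth 1.
    EDGE_VALUES[cells] = sum(
        c * (10 if i in (0, 7) else 1) for i, c in enumerate(cells)
    )

def evaluate_edges(board):
    """Score an 8x8 board for the +1 player using only edge lookups."""
    top = tuple(board[0])
    bottom = tuple(board[7])
    left = tuple(row[0] for row in board)
    right = tuple(row[7] for row in board)
    return sum(EDGE_VALUES[edge] for edge in (top, bottom, left, right))
```

Nothing like stable-disc counting happens during the search itself; the 3^8-entry table absorbs that cost once, before play begins.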
A lot may be gained from tweaking code, and code can be tweaked in a variety of ways. I like to categorize these as "large-scale optimizations" and "small-scale optimizations" (this is my own thinking and not any sort of standard terminology). Bigger gains can be found by employing large-scale optimizations, but they also result in significantly more code churn. The possible benefit must be weighed against the amount of time spent to make the change. Calling something "small-scale" doesn't necessarily mean that the gains to be had aren't significant, but that the amount of change to your program is smaller. Sometimes huge benefits can be had by making small changes. This is where a profiler would help.
Large-scale optimizations require significant changes to be made to the behavior of the code. Typically these are done by changing the algorithm used: changing the runtime complexity to something more efficient. Doing this is often easier than small-scale optimization, because it's usually pretty straightforward to substitute one algorithm for another. It's common for areas of a program to have linear runtime performance that can easily be made sub-linear with a simple data structure change, such as moving from a list to a tree or even an array. Since the gains are greater, I recommend large-scale optimizations before small-scale ones (unless you only need very minor performance improvements). Also, tweaking code with small-scale optimizations will be wasted if you change to a new algorithm.
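As a minimal example of this kind of swap (the scenario is hypothetical, not from my program): suppose the search keeps a collection of already-seen board keys and tests membership constantly. Changing the container from a list to a set leaves the surrounding code almost untouched but turns each linear scan into an average constant-time hash lookup.

```python
# Hypothetical scenario: tracking board keys already seen during search.
seen_list = []      # membership test scans every element: O(n)
seen_set = set()    # membership test is a hash lookup: O(1) average

for board_key in range(10_000):
    seen_list.append(board_key)
    seen_set.add(board_key)

# Both containers answer the same question...
assert (1234 in seen_list) == (1234 in seen_set)
# ...but the set answers it in roughly constant time as the collection
# grows, while the list slows down linearly with every insertion.
```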
There are numerous places I employed these large-scale optimizations, particularly around the game board and the "think-ahead" methodology of the AI. I don't have the performance impact numbers anymore, but it is MANY, MANY times faster than it used to be when I was first putting things together. I'll talk about these in more detail later.
Small-scale optimizations generally mean taking the existing code and making tweaks that reduce the number of CPU cycles required. The runtime complexity stays basically the same, but the code runs faster. There are pretty standard ways to start doing this, like making better use of pointers, but also more advanced methods, like reorganizing code/data to improve cache utilization, improve pipelining, or reduce memory paging. Small-scale optimizations are where performance measuring is really important.
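A tiny, hedged illustration of the idea (the function names and weights are hypothetical): the algorithm and its complexity are unchanged, but loop-invariant work is hoisted out of the inner loop and a redundant indexing step is removed.

```python
# Hedged sketch of a small-scale tweak: same algorithm and complexity,
# fewer operations per iteration. Names and weights are made up.
SQUARE_WEIGHTS = [4, -3, 2, 2, 2, 2, -3, 4]

def score_before(board):
    total = 0
    for row in board:
        for col in range(len(row)):
            # recomputes len(SQUARE_WEIGHTS) and re-indexes row each pass
            total += row[col] * SQUARE_WEIGHTS[col % len(SQUARE_WEIGHTS)]
    return total

def score_after(board):
    total = 0
    n = len(SQUARE_WEIGHTS)               # hoisted out of both loops
    for row in board:
        for col, cell in enumerate(row):  # no repeated row[col] lookup
            total += cell * SQUARE_WEIGHTS[col % n]
    return total
```

Both versions return identical results; whether the tweak is worth keeping is exactly the kind of question you answer with a profiler, not a guess.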
Running a profiler on my Reversi game was quite revealing. As it turns out, there were two key places where the program was spending its time:
- Calculating possible moves / which positions are flipped when a move is made
- Estimating the value of the board
Before it's possible to determine the best move, you need to be able to determine which moves are even valid, so this calculation must be done for every board considered, at every ply. Determining all the valid moves on a board is not trivial, as you may have noticed from the earlier description. Likewise, estimating the value of the board happens very often: every board at the final search depth needs this calculation. These calculations took a significant amount of time and were called often (over 90% of the time was spent there), so they were very good candidates for optimization.
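For concreteness, here's what a straightforward, deliberately unoptimized version of that first hot spot can look like. This is a sketch, not my actual code: valid-move detection has to walk outward in all eight directions from each candidate square, which is why it shows up so prominently in the profile.

```python
# Naive valid-move detection for Reversi. Board: 8x8 list of lists with
# 0 = empty, 1 = current player, -1 = opponent.
DIRECTIONS = [(-1, -1), (-1, 0), (-1, 1),
              (0, -1),           (0, 1),
              (1, -1),  (1, 0),  (1, 1)]

def is_valid_move(board, row, col, player):
    """A move is valid if it brackets at least one opponent run."""
    if board[row][col] != 0:
        return False
    for dr, dc in DIRECTIONS:
        r, c = row + dr, col + dc
        seen_opponent = False
        # Walk over a run of opponent discs...
        while 0 <= r < 8 and 0 <= c < 8 and board[r][c] == -player:
            seen_opponent = True
            r, c = r + dr, c + dc
        # ...that must end on one of our own discs to flip anything.
        if seen_opponent and 0 <= r < 8 and 0 <= c < 8 and board[r][c] == player:
            return True
    return False

def valid_moves(board, player):
    return [(r, c) for r in range(8) for c in range(8)
            if is_valid_move(board, r, c, player)]

# Standard starting position: the +1 player has exactly four options.
start = [[0] * 8 for _ in range(8)]
start[3][3] = start[4][4] = -1
start[3][4] = start[4][3] = 1
print(sorted(valid_moves(start, 1)))  # [(2, 3), (3, 2), (4, 5), (5, 4)]
```

Every call scans up to 64 squares and walks up to 8 rays per square, which is exactly the kind of cost the large-scale optimizations in the next posts go after.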
Now that I've set up the discussion, next time I'll start giving specifics about what I'm calling large-scale optimizations.