TicTacToe: A subtle difference between Minimax and Negamax
Yesterday afternoon I decided to change the core algorithm of my TicTacToe program from using Minimax to use Negamax. The algorithms look identical, Negamax is a similar but more concise algorithm that is ideal of a two-person turn taking game like TicTacToe.
My Minimax algorithm worked just fine, and I had several unit tests ensuring that the computer player will pick the right moves, i.e.
1. Win when possible
2. Block when it has to
3. Fork if possible.
4. Start in the centre.
However, my initial attempt at Negamax just wouldn't work and it consistently picked the wrong moves. This kind of recursive algorithm is difficult to break down into smaller parts without obfuscating the code just to make it easier to write tests for. My only option to debug this with print line statements and trying to dissect the code at various stages.
After several hours of this, I was getting nowhere and gave up for the day. I decided to read the Negamax algorithm again, and there is a difference in how Minimax and Negamax work.
With Minimax, you calculate the heuristic value of the node based on the maximizing player. In other words, when the game is over you calculate the score based on the computer player (lets call this player A) you are trying to calculate the next move for. If player A wins, return +10, if player A loses, return -10, else return 0. We are only interested in calculating the score based on player's A perspective.
function minimax(node, depth, maximizingPlayer)
if depth = 0 or node is a terminal node
return the heuristic value of node //from perspective of player A
if maximizingPlayer
bestValue := -∞
for each child of node
val := minimax(child, depth - 1, FALSE)
bestValue := max(bestValue, val)
return bestValue
else
bestValue := +∞
for each child of node
val := minimax(child, depth - 1, TRUE)
bestValue := min(bestValue, val)
return bestValue
My initial Negamax algorithm did the same thing, it calculated the score only based on Player A's perspective. However on Wikipedia the following is stated;
"The negamax node's return value is a heuristic score from the point of view of the node's current player"
This was the issue, I should've read it thoroughly the first time round without making an assumption about the scoring mechanism. I don't think the Wikipedia pseudocode makes this very clear, so here is the algorithm which I have now followed which works.
function negamax(node, current_player)
if node is a terminal node
return the heuristic value of node based on current player
bestValue := -∞
foreach child of node
val := -negamax(child, other_player)
bestValue := max( bestValue, val )
return bestValue
Therefore if you are calculating the score for Player A, and the current player being called in Negamax is Player B, you calculate the score based on Player B's perspective.