Game Playing: Adversarial Search

Open-Source Internship opportunity by OpenGenus for programmers. Apply now.

What kind of games we would like to explore?

Sequence of moves
Has got some defined rules
Rules also gives idea about reward for a move
Need to maximize our reward

The goal of adversarial search is to create an intelligent agent that can make decisions to maximize its chances of winning the game. The agent needs to consider the possible moves of both players and anticipate the outcomes of these moves to make informed decisions.

Table of contents:

Two player Games
Some Definitions
Mini-max algorithm
Alpha-Beta pruning
Some diversion
- Iterative Deepening Search
ML in Game Playing
- Samuel's Checker Player
- Quiescence Search
Some interesting development
- AlphaGo

Two player Games

The game usually proceeds as follows -

step 1 - Player 1 makes a move
step 2 - Game completed ??
If not -
step 3 - Player 2 makes a move
step 4 - Game completed ??
else repeat from step 1

Some Definitions

State - one of the configurations the game board can take
Initial state - Board state before any player makes a move
Successor function - Gives pair of moves and state telling what move can lead to what state
Terminal State - The game state where game ends.
Utility function - Associates a terminal state with a numerical value

Mini-max algorithm

Lets say you are player 1 and your opponent is player 2 and both of you play optimally that is every move you take can give you best possible future outcome, when you make a move you are trying to win the game not only gain some extra points in particular state and lose the game. Let 'f' be a function that gives points gained by player 1, so player 1 tries to maximize 'f' and player 2 tries to minimize.

This is nothing bit mini-max algorithm.

                                4 __ max
             |------------------|-------------------|
             2                  3                   4 ___ min
      |------|------|    |------|------|      |-----|-----|
      9      4      2    7      8      3      12    8     4
    |---|  |---|  |---| |---| |---|  |---|   |---| |--|  |---|
    9   4  4   1  0   2 7   1 8   7  3   0   0  12 8  2  4   3

This shows how player 1 can get score of 4 if both are playing optimally.

The above is example of a Game Tree in which branches denotes the possible moves at a step and all terminal nodes gives final states possible thus from here we get what reward we can get from the particular move.

def nextState(move, state):
    '''
        gives next state based on current state
        (modify accordingly)
    '''
    pass

moves = [all possible moves e.g left, right ...]
terminalState = function that checks for terminal state 

def f(player1, state):
    if terminalState(state):
        return reward(state) 

    if player1:
        score = 0
        for move in moves:
            newState = nextState(move, currentState)
            score = max(score, f(not player1, newState))
        return score
    else:
        score = float('inf')
        for move in moves:
            newState = nextState(move, currentState)
            score = min(score, f(not player1, newState))
        return score

This shows a example of any general game.

Time complexity - O(b^m)
Space complexity - O(b*m)
b - Branching factor
m - length of game

This seems to be a good strategy but not good when it comes to complecated games like chess for which typical branching factor is 35 for game length of lets say 100 it's going to take forever to make a move so need to find some strategy to improve the algorithm.

Alpha-Beta pruning

If you observe the tree creted while playing using the above strategy there occurs some places where we are also calculating the scores from the subtrees which are not necessary. Here Alpha-Beta pruning helps as it makes some smart decisions by cutting off some subtrees thus helps in reducing the time complexity further.
The improvement is as follows:

alpha - maximum value that a node trying to maximize the outcome can take
beta - minimum value that a node trying to minimize the outcome can take

When for any node alpha >= beta we can ignore the entire subtree rooted at that node.

def f(player1, state, alpha, beta):
    if state in terminalState:
        return reward(state)

    if player1:
        score = 0
        for move in moves:
            newState = nextState(move, currentState)
            score = max(score, \
                    f(not player1, newState, alpha, beta))
            alpha = max(alpha, score)
            if alpha > beta:
                return score
        return score
    else:
        score = float('inf')
        for move in moves:
            newState = nextState(move, currentState)
            score = min(score, \
                    f(not player1, newState, alpha, beta))
            beta = min(beta, score)
            if alpha > beta:
                return score
        return score

This code shows improvement required to utilize the alpha beta pruning. Initilly the value of alpha is set usually to -infinity and beta to +infinity.

                                4 __ max
             |------------------|-------------------|
             2                  3                   4 ___ min
      |------|------|    |------|------|      |-----|-----|
      9      4      2    7      8      3      12    8     4
    |---|  |---|  |---| |---| |---|  |---|  |---|  |--|  |---|
    4   9  1   4  0   2 1   7 7   8  0   3  1  12  2  8  3   4

This is worst tree for our strategy as we have to check of all

                                4 __ max
             |------------------|-------------------|
             4                  3                   2 __ min
      |------|------|    |------|------|      |-----|-----|
      4      8-X   12-X  8-X    7-X    3-X    2     4-X   8-X
    |---|                                    |--| 
    4   3                                    2  0

This is best as we can ignore all subtree denoted by X

Time complexity - O(b^(m/2))
Space Complexity - O(b*m)

Some diversion

Iterative Deepening Search

This is some modification of depth first search. It first tries to search till mentioned depth next increases its bound for depth and searches again, if results are found then it ends its search.

It helps in not going forever in wrong direction and blowing up memory.

def IDS(depth, currstate, max_depth):
        if depth > max_depth:
            return None
        if currstate == goal_state:
            return reward(currstate)
        for next_state in action(currstate):
            score = IDS(depth+1, next_state)
            if score:
                return score
for max_depth in range(5, N):
    reward = IDS(0, initial_state, max_depth)
    if reward:
        break

We can use this idea to improve our previous strategy.

ML in Game Playing

Earlier we tried going till all depth i.e. we discovered all possible ways in which game can end and considered the best move, now we can try going till a particular depth but don't have any scoring function that can tell us if we are going in right direction. Lets define a function also known as Evaluation function e.g. if we are playing chess the function for a player can be linear cobination of all chess pieces (asigned some weight) and the opponent pieces they attack. This really depends on designer of the strategy.

Eval(s) = w1f1(s) + w2f2(s) + ........ + wn*fn(s)

where f1(s) can be queen attacking opponent pieces. w1 is weight that can be learned.

Samuel's Checker Player

In this computer acts as 2 player. Lets say X, Y in which X adjusts its weight after every move and Y keeps same for entire game, If X wins then its weight are assigned to Y and repeat the process again.

Weight update -
Eval(s), Eval(s') be initial and backed up value of a node.

If Eval(s') > Eval(s) then the weights that are positive are increased further and negative are decreased. Opposite happens in other case.

The problem with fixed depth search - Unachievable goals appear achievable, it can give short term gain but inevtable losses are postponed.

Quiescence Search

This focuses on searching deeper into positions that are considered "quiet" or "quiescent." A quiescent position is one where the board is relatively stable, with fewer tactical possibilities, captures, or other dynamic events. By exploring quiescent positions, the algorithm can more accurately evaluate the static aspects of the position and avoid being misled by dynamic tactical variations.

Some interesting development

AlphaGo

AlphaGo is a computer program developed by DeepMind, a subsidiary of Alphabet Inc. (Google's parent company), that gained international fame for its success in playing the ancient board game Go. The development of AlphaGo represents a significant milestone in artificial intelligence and machine learning, particularly in the domain of deep learning and reinforcement learning.

Other games like Tic Tac Toe(Tied with best player in world), Othello(Computer is better than any human), Scrabble (Maven beat world champions Joel Sherman and Matt Graham), Backgammon(1992, Tesauro combines 3-ply search & neural networks (with 160 hidden units) yielding top-3 player), Chess(1997, Deep Blue beat human champion Gary Kasparov in six game match, Deep Blue searches 200M positions/second, up to 40 ply).