Adversarial search
Professor Devin Balkcom
devin@cs.dartmouth.edu
office: Sudikoff 206
My happiness, your expense: I win (+1), you lose (-1).
Basic case: zero-sum, deterministic, perfect-information, 2-player games.
(Comic: xkcd.)
Max will play x's, and Min will play o's.
Some terminal states:
With optimal play, every state has a fixed utility.
Min gets last move.
Max gets the move prior to the last move.
Optimal values, assuming two optimal players.
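As a sketch of how these backed-up values can be computed, here is a minimal minimax in Java. The `Game` interface is a hypothetical placeholder, not the course's actual code.

```java
import java.util.List;

// Hypothetical game interface: terminal test, utility for Max, successors, whose turn it is.
interface Game<S> {
    boolean isTerminal(S state);
    double utility(S state);       // value for Max at a terminal state
    List<S> successors(S state);
    boolean maxToMove(S state);
}

class Minimax {
    // Value of a state assuming both players play optimally from here on.
    static <S> double value(Game<S> game, S state) {
        if (game.isTerminal(state)) return game.utility(state);
        boolean maxTurn = game.maxToMove(state);
        double best = maxTurn ? Double.NEGATIVE_INFINITY : Double.POSITIVE_INFINITY;
        for (S child : game.successors(state)) {
            double v = value(game, child);
            best = maxTurn ? Math.max(best, v) : Math.min(best, v);
        }
        return best;
    }
}
```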
Can Max do better if Min is not optimal?
What if we'd gotten a 2 before the 14? Order matters!
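One way to exploit that observation is alpha-beta pruning. The sketch below reuses the hypothetical `Game` interface from the minimax sketch above; the slide's point is that if the 2 is examined before the 14, the cutoff fires earlier, which is why good move ordering matters so much in practice.

```java
class AlphaBeta {
    // Same value as minimax, but prunes branches that cannot affect the decision.
    static <S> double value(Game<S> game, S state, double alpha, double beta) {
        if (game.isTerminal(state)) return game.utility(state);
        if (game.maxToMove(state)) {
            double best = Double.NEGATIVE_INFINITY;
            for (S child : game.successors(state)) {
                best = Math.max(best, value(game, child, alpha, beta));
                alpha = Math.max(alpha, best);
                if (alpha >= beta) break;   // Min would never let play reach this branch
            }
            return best;
        } else {
            double best = Double.POSITIVE_INFINITY;
            for (S child : game.successors(state)) {
                best = Math.min(best, value(game, child, alpha, beta));
                beta = Math.min(beta, best);
                if (alpha >= beta) break;   // Max would never let play reach this branch
            }
            return best;
        }
    }

    static <S> double value(Game<S> game, S state) {
        return value(game, state, Double.NEGATIVE_INFINITY, Double.POSITIVE_INFINITY);
    }
}
```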
(Photo by Cburnett, Wikipedia.)
material: pawn 1, bishop 3, rook 4, queen 8, king ???
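A depth-limited search needs an evaluation function at the cutoff. Below is a minimal sketch using the material weights above; the board encoding (an array of piece codes, positive for Max's pieces, negative for Min's) is a made-up example, not the course's representation.

```java
class MaterialEval {
    // Piece codes: 1 pawn, 2 bishop, 3 rook, 4 queen (weights from the slide).
    // Kings are not scored: losing one ends the game, so no finite weight fits.
    static final double[] WEIGHT = {0, 1, 3, 4, 8};

    // board[i] holds a piece code, positive for Max, negative for Min, 0 for empty.
    static double evaluate(int[] board) {
        double score = 0;
        for (int piece : board) {
            if (piece != 0) score += Integer.signum(piece) * WEIGHT[Math.abs(piece)];
        }
        return score;   // positive favors Max, negative favors Min
    }
}
```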
Horizon effects are a problem: stalling tactics can push bad states beyond the depth searched.
Quiescent states are good candidates for cutoff.
Perhaps search slightly deeper along capture moves to determine quiescence? Or extend along a single best move each time.
Assume the current player forfeits a turn. If the position is still quite strong, it is a very strong position indeed.
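A hedged sketch of a quiescence search at the cutoff depth: rather than evaluating a noisy position directly, keep extending along capture moves until the position is quiet. The `TacticalGame` interface here is a hypothetical placeholder.

```java
import java.util.List;

class Quiescence {
    // Hypothetical interface for a game that can enumerate capture moves separately.
    interface TacticalGame<S> {
        boolean maxToMove(S state);
        double evaluate(S state);            // static evaluation, e.g. material count
        List<S> captureSuccessors(S state);  // successors reachable by captures only
    }

    // At the depth cutoff, search capture moves only, until the position is quiet.
    // Starting from the static evaluation ("standing pat") reflects the idea that
    // the player to move could simply decline every capture.
    static <S> double value(TacticalGame<S> game, S state) {
        double best = game.evaluate(state);
        for (S child : game.captureSuccessors(state)) {
            double v = value(game, child);
            best = game.maxToMove(state) ? Math.max(best, v) : Math.min(best, v);
        }
        return best;
    }
}
```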
Transposition table: avoid re-visiting states in search.
Problems with letting Java just hash that object:
So we need a fast hash function, and we won't store the state objects themselves -- just store values keyed by the hash and hope for limited collisions.
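One common fast hash for board positions is Zobrist hashing; the slide doesn't name a specific scheme, so treat this as one reasonable choice. XOR together a fixed random 64-bit number for each (square, piece) pair, and key the table by the hash alone, accepting a small risk of collisions.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Random;

class Zobrist {
    static final int SQUARES = 64, PIECE_TYPES = 12;   // chess-sized example
    static final long[][] RAND = new long[SQUARES][PIECE_TYPES];
    static {
        Random rng = new Random(42);                   // fixed seed: reproducible keys
        for (int s = 0; s < SQUARES; s++)
            for (int p = 0; p < PIECE_TYPES; p++)
                RAND[s][p] = rng.nextLong();
    }

    // board[s] holds a piece code 0..11, or -1 for an empty square (made-up encoding).
    static long hash(int[] board) {
        long h = 0L;
        for (int s = 0; s < board.length; s++)
            if (board[s] >= 0) h ^= RAND[s][board[s]];
        return h;
    }

    // Transposition table: store only hash -> value, never the state object itself.
    // Two states that collide on the hash will silently share an entry.
    static final Map<Long, Double> TABLE = new HashMap<>();
}
```

A nice side effect: a move changes only a few (square, piece) pairs, so the hash can be updated incrementally with a couple of XORs instead of being recomputed from scratch.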
Some sources:
Max moves, then chance, then Min moves, then chance. In Backgammon, dice are rolled to give available moves.
Expectiminimax can compute the expected value of any node: at a chance node, take the probability-weighted sum of the values of the children.
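Written out, the standard expectiminimax recurrence is:

$$
\mathrm{EXPECTIMINIMAX}(s) =
\begin{cases}
\mathrm{UTILITY}(s) & \text{if } s \text{ is terminal} \\
\max_a \mathrm{EXPECTIMINIMAX}(\mathrm{RESULT}(s,a)) & \text{if Max moves in } s \\
\min_a \mathrm{EXPECTIMINIMAX}(\mathrm{RESULT}(s,a)) & \text{if Min moves in } s \\
\sum_r P(r)\,\mathrm{EXPECTIMINIMAX}(\mathrm{RESULT}(s,r)) & \text{if } s \text{ is a chance node}
\end{cases}
$$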
Assumes values are actual utilities and that humans maximize expected value -- if that were true, Sheldon Adelson would not be so rich.
Technical problems: computation and modeling.
In deterministic games, only the order of the evaluation values matters. But now, with chance nodes, magnitudes matter too: an order-preserving transformation of the evaluation function can change the best action.
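A small made-up example of why: Max must choose between two chance nodes, each with two equally likely outcomes.

- Node X has leaves {1, 4}: expected value $0.5 \cdot 1 + 0.5 \cdot 4 = 2.5$. Node Y has leaves {2, 2}: expected value $2.0$. Max prefers X.
- Relabel the leaves with the order-preserving map $1 \to 1$, $2 \to 10$, $4 \to 12$. Now X has expected value $0.5 \cdot 1 + 0.5 \cdot 12 = 6.5$, while Y has $10$. Max prefers Y.

The ranking of the leaves never changed, but the best action did, so with chance nodes the evaluation must be on a meaningful (expected-utility) scale, not just any ranking.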
Cards dealt randomly at the start of the match.
Bad approach:
Problems:
Mondays:
Tuesdays:
I can't remember what day it is. Choose B?
Oderint, dum metuant -- "Let them hate, so long as they fear." (Lucius Accius)
Prisoner's dilemma:
|          | B Co-op  | B Defect |
|---|---|---|
| A Co-op  | (-1, -1) | (-9, 0)  |
| A Defect | (0, -9)  | (-6, -6) |
payoff(A) + payoff(B) differs across outcomes (-2, -9, -9, -12), so this is not a zero-sum game.
Is it possible to co-operate to raise both payoffs?
Each of $n$ players has: a set of possible actions to choose from, and a payoff (utility) for each joint outcome.
Example:
Some strategies are better for me than others, no matter what you do.
|          | B Co-op  | B Defect |
|---|---|---|
| A Co-op  | (-1, -1) | (-9, 0)  |
| A Defect | (0, -9)  | (-6, -6) |
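A tiny self-contained check of that claim in Java, with the payoff matrix above hard-coded (index 0 = Co-op, 1 = Defect):

```java
class PrisonersDilemma {
    // PAYOFF[a][b] = {payoff to A, payoff to B} when A plays a and B plays b.
    static final int[][][] PAYOFF = {
        { {-1, -1}, {-9,  0} },   // A co-ops
        { { 0, -9}, {-6, -6} }    // A defects
    };

    public static void main(String[] args) {
        // Does Defect strictly dominate Co-op for A (better no matter what B does)?
        boolean aDefectDominates = true;
        for (int b = 0; b < 2; b++)
            if (PAYOFF[1][b][0] <= PAYOFF[0][b][0]) aDefectDominates = false;
        System.out.println("Defect strictly dominates Co-op for A: " + aDefectDominates);

        // The game is symmetric, so the same holds for B. The payoff sums also
        // confirm the game is not zero-sum (constant-sum).
        for (int a = 0; a < 2; a++)
            for (int b = 0; b < 2; b++)
                System.out.println("outcome (" + a + "," + b + ") payoff sum = "
                        + (PAYOFF[a][b][0] + PAYOFF[a][b][1]));
    }
}
```

Running it reports that Defect strictly dominates Co-op for A (and by symmetry for B), with payoff sums -2, -9, -9, -12.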
Assume we intend to play 20 games. Assume both players agree to play the following strategy:
Punishment: I will cooperate, until you do not. If you defect on one round, I will take revenge for one turn by defecting, and then go back to cooperating.
Is this strategy believable? (Do I believe your threat? Do we cooperate?)
Assume we intend to play 20 games. Assume both players agree to play the following strategy:
Grim trigger: I will cooperate, until you do not. If you defect on one round, I will take revenge for all remaining rounds, and defect each time.
Is this strategy believable? (Do I believe your threat? Do we cooperate?)
Grim trigger strategy: I'll cooperate until you defect. Then I will hate you and defect until the game ends.
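One hedged way to make these repeated-game strategies concrete is to simulate them. The encodings below are my reading of the slides (the "punishment" strategy is encoded as tit-for-tat: copy the opponent's previous move), not the course's code.

```java
import java.util.Arrays;

class IteratedPD {
    static final int COOP = 0, DEFECT = 1;
    // PAYOFF[a][b] = {payoff to A, payoff to B}, matching the table above.
    static final int[][][] PAYOFF = {
        { {-1, -1}, {-9,  0} },
        { { 0, -9}, {-6, -6} }
    };

    interface Strategy {
        int move(int[] myMoves, int[] oppMoves, int round);
    }

    // "Punishment", encoded as tit-for-tat: defect right after a defection, then forgive.
    static final Strategy PUNISH = (mine, opp, t) ->
            (t > 0 && opp[t - 1] == DEFECT) ? DEFECT : COOP;

    // "Grim trigger": cooperate until the opponent's first defection, then defect forever.
    static final Strategy GRIM = (mine, opp, t) -> {
        for (int i = 0; i < t; i++) if (opp[i] == DEFECT) return DEFECT;
        return COOP;
    };

    // Always defect, for comparison.
    static final Strategy ALWAYS_DEFECT = (mine, opp, t) -> DEFECT;

    // Play a fixed number of rounds and return the two total payoffs.
    static int[] play(Strategy a, Strategy b, int rounds) {
        int[] aMoves = new int[rounds], bMoves = new int[rounds];
        int[] totals = new int[2];
        for (int t = 0; t < rounds; t++) {
            int ma = a.move(aMoves, bMoves, t);   // strategies only look at past rounds
            int mb = b.move(bMoves, aMoves, t);
            aMoves[t] = ma; bMoves[t] = mb;
            totals[0] += PAYOFF[ma][mb][0];
            totals[1] += PAYOFF[ma][mb][1];
        }
        return totals;
    }

    public static void main(String[] args) {
        System.out.println("grim vs grim, 20 rounds:  " + Arrays.toString(play(GRIM, GRIM, 20)));
        System.out.println("grim vs always-defect:    " + Arrays.toString(play(GRIM, ALWAYS_DEFECT, 20)));
        System.out.println("punish vs always-defect:  " + Arrays.toString(play(PUNISH, ALWAYS_DEFECT, 20)));
    }
}
```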
Is your cooperation ever credible? (Horizon effects propagate: there is no punishment after the final round, so defect there; the same reasoning then applies to the round before, and so on.)
Is payoff infinite? How do we compare infinities?
A discount factor models the possibility that the game will end at some point, but we don't know when.
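With a discount factor $\gamma < 1$, read as the probability that the game continues for another round, a payoff of $r$ per round has a finite total expected value:

$$
\sum_{t=0}^{\infty} \gamma^{t} r = \frac{r}{1-\gamma}
$$

So infinite-horizon payoffs become comparable, and for large enough $\gamma$ the threat of future punishment can outweigh the one-round gain from defecting.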
Cooperation is possible!