Tuesday, January 31, 2017

GTO and Exploitative Play

Today I’m going to expand on a dichotomy that has been largely subtextual but would be more useful made explicit.

GTO
In poker there is a concept known as GTO, or Game Theory Optimal. In a GTO model every decision is optimal in the sense that it cannot be exploited by an opposing player, even one with full knowledge of your gameplan, because the risk and reward of every option are fully accounted for mathematically.

The best illustration of GTO is the Prisoner’s Dilemma. The one-shot Prisoner’s Dilemma is easily solvable. A payoff matrix reveals that without knowledge of the opponent’s decision you should always betray them. This is the best decision precisely because, whichever choice the opponent makes, betraying leaves you better off than cooperating would have, so the opponent cannot punish you for choosing it. Interestingly, humans have a cognitive bias toward cooperative behavior even though cooperation is in this case a mathematically losing strategy. This demonstrates the importance of actually making the matrix to determine the GTO in even this simple scenario.
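To make the matrix concrete, here is a minimal sketch in Python. The prison sentences are illustrative numbers I picked for the example (any values with the same ordering work), and the function names are hypothetical.

```python
# Years in prison for (my_move, their_move); lower is better.
# These specific numbers are assumptions chosen for illustration.
YEARS = {
    ("cooperate", "cooperate"): 1,
    ("cooperate", "betray"): 3,
    ("betray", "cooperate"): 0,
    ("betray", "betray"): 2,
}
MOVES = ("cooperate", "betray")


def best_reply(their_move):
    """Return the move that minimizes my sentence against a fixed opponent move."""
    return min(MOVES, key=lambda mine: YEARS[(mine, their_move)])


def dominant_move():
    """Return the move that is the best reply to every opponent move, or None."""
    replies = {best_reply(theirs) for theirs in MOVES}
    return replies.pop() if len(replies) == 1 else None


print(dominant_move())
```

With these numbers, betraying is the best reply whether the opponent cooperates or betrays, which is what makes it unpunishable.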

Now, what if we turn to Rock Paper Scissors?
GTO for RPS is to throw rock 1/3 of the time, paper 1/3 of the time, and scissors 1/3 of the time, in a random order. This has the highest reward attached to an outcome that cannot be punished. But if your opponent deviates from GTO then this is a little bit problematic. Let’s say that you face an opponent who abandons randomization and always throws rock. GTO demands that you ignore him and continue to randomize your throws. When you have some knowledge of the opponent, GTO is in practice suboptimal, depending on how you define optimal. This is of course what makes a theoretical GTO so interesting in poker, a game in which results are measured in profit over time. Game Theory Optimal carries the highest guaranteed profit with the least amount of risk, but this does not necessarily equal Most Profitable. In fact, in RPS GTO only breaks even. Thus, if maximum profit is the goal then GTO is suboptimal in any case in which the opponent is not also playing GTO! By refusing to open yourself up to exploitation, you give up the chance to exploit an opponent.
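The break-even claim is easy to check with a quick expected-value calculation. This is a rough sketch, assuming a win scores +1, a loss -1, and a tie 0; the names are my own.

```python
from fractions import Fraction

MOVES = ("rock", "paper", "scissors")
BEATS = {"rock": "scissors", "paper": "rock", "scissors": "paper"}


def payoff(mine, theirs):
    """+1 if my throw wins, -1 if it loses, 0 on a tie (assumed scoring)."""
    if mine == theirs:
        return 0
    return 1 if BEATS[mine] == theirs else -1


def expected_value(my_mix, their_mix):
    """Average payoff per throw when both players randomize by the given mixes."""
    return sum(
        my_mix[m] * their_mix[t] * payoff(m, t) for m in MOVES for t in MOVES
    )


GTO = {move: Fraction(1, 3) for move in MOVES}
ROCK_ONLY = {"rock": Fraction(1), "paper": Fraction(0), "scissors": Fraction(0)}

print(expected_value(GTO, ROCK_ONLY))  # 0
print(expected_value(GTO, GTO))        # 0
```

The even mix scores zero against rock-only and against itself: it can't be beaten, but it doesn't win anything either.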

Exploitative Play
Exploitative play is, in a nutshell, recognizing risk in an opponent’s gameplan and capitalizing on it. Let’s say I recognize that my opponent throws rock every hand. Even though rock-only is exploitable, GTO cannot exploit it. In order to exploit rock-only I have to abandon GTO and adopt a more paper-heavy strategy. Once I do, provided that my opponent does not deviate from rock-only, paper-heavy shows a profit that rises in step with how heavily it favors paper. HOWEVER, in abandoning GTO to exploit my opponent’s strategy, I have adopted a new strategy that is itself exploitable. It is entirely possible for my opponent to counter-adjust and switch to scissors-only. That is the risk attached to abandoning GTO in search of profit. Your opponent may punish you at least as severely as you sought to punish them.
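Here is a small sketch of that tradeoff, again assuming +1 for a win, -1 for a loss, and 0 for a tie. The `paper_heavy` helper is a hypothetical way of building the mix: it puts a chosen share on paper and splits the rest evenly between rock and scissors.

```python
from fractions import Fraction


def ev(mine, theirs):
    """Expected payoff per throw; strategies are (rock, paper, scissors) odds."""
    total = Fraction(0)
    for i, p in enumerate(mine):
        for j, q in enumerate(theirs):
            diff = (i - j) % 3  # 1: my throw beats theirs, 2: theirs beats mine
            total += p * q * (1 if diff == 1 else -1 if diff == 2 else 0)
    return total


def paper_heavy(paper_share):
    """Hypothetical mix: paper_share on paper, the rest split evenly."""
    rest = (1 - paper_share) / 2
    return (rest, paper_share, rest)


ROCK_ONLY = (Fraction(1), Fraction(0), Fraction(0))
SCISSORS_ONLY = (Fraction(0), Fraction(0), Fraction(1))

for share in (Fraction(1, 3), Fraction(1, 2), Fraction(3, 4), Fraction(1)):
    mix = paper_heavy(share)
    print(share, ev(mix, ROCK_ONLY), ev(mix, SCISSORS_ONLY))
```

Under these assumptions, whatever a given paper share wins against rock-only is exactly what it loses if the opponent switches to scissors-only. At a 1/3 share (plain GTO) both numbers are zero.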

In summary:
GTO is maximizing profit by eliminating risk.
Exploitative play is further maximizing profit while inviting risk.


So what does this mean for Melee?


Potentially a lot. As I’ve repeatedly discussed, mixups are closely related to RPS. There is an inherent GTO. Adhering to or abandoning GTO for a more exploitative strategy is a judgement call that we always make deliberately, intuitively, or out of ignorance. It might be appropriate, it might not be. It’s a matter for individual assessment.

Here is what we should remember:

* GTO goes even unless you gain an unfair advantage, at which point GTO will always win over time precisely because it eliminates risk. It's specifically designed not to lose.

* Similarly, an optimized GTO model is more profitable than an underdeveloped GTO model.
If you are playing with rock (1 pt), paper (1 pt), and scissors (1 pt) but your opponent is playing with rock (1 pt), paper (1 pt), and nail-clippers (0.25 pts), then you win over the long term even without any exploitative play because you're using better options.

* Exploitative play requires that you understand your opponent’s strategy. You might consider it Attacking your Opponent’s Understanding. Maybe your opponent’s brain honestly believes that rock-only is optimal. Or maybe he’s just leading with rock to try and bait a paper switch. In a fighting game in which prepared reactions can trump a mixup scenario altogether, there's a huge difference. In order to be successful, exploitative play requires 1) information and 2) acumen. Otherwise it is not strategy, just blind hope and high-risk variance.
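The nail-clippers point from the list above can be put into numbers. This is only a sketch under my own assumptions: nail-clippers take the place of scissors (beating paper, losing to rock), a win scores the points of the winning option, and both players throw each option 1/3 of the time without adjusting to each other.

```python
from fractions import Fraction

BEATS = {"rock": "scissors", "paper": "rock", "scissors": "paper"}

# Points a win with each option is worth.
# Assumption: the opponent's "scissors" slot is the 0.25-point nail-clippers.
MY_POINTS = {"rock": Fraction(1), "paper": Fraction(1), "scissors": Fraction(1)}
THEIR_POINTS = {"rock": Fraction(1), "paper": Fraction(1), "scissors": Fraction(1, 4)}


def margin_per_round(my_points, their_points):
    """Average (my points - their points) per round with uniform random throws."""
    total = Fraction(0)
    for mine in BEATS:
        for theirs in BEATS:
            if BEATS[mine] == theirs:
                total += my_points[mine]       # I win this matchup
            elif BEATS[theirs] == mine:
                total -= their_points[theirs]  # they win this matchup
    return total / 9  # nine equally likely matchups


print(margin_per_round(MY_POINTS, THEIR_POINTS))  # 1/12
print(margin_per_round(MY_POINTS, MY_POINTS))     # 0
```

Neither side is reading the other here, yet the player with the full-value option set comes out ahead by 1/12 of a point per round on average.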



Further reading:
https://arxiv.org/pdf/1404.5199v1.pdf and
http://poker.cs.ualberta.ca/publications/IJCAI03.pdf
