Adapting to game trees in zero-sum imperfect information games

Côme Fiegel · Pierre Menard · Tadashi Kozuno · Remi Munos · Vianney Perchet · Michal Valko

**Concept Graph & Summary using Claude 3.5 Sonnet | ChatGPT-4o | Llama 3:**

```mermaid
graph LR
classDef game fill:#f9d4d4, font-weight:bold, font-size:14px
classDef strategies fill:#d4f9d4, font-weight:bold, font-size:14px
classDef exploration fill:#d4d4f9, font-weight:bold, font-size:14px
classDef learning fill:#f9f9d4, font-weight:bold, font-size:14px
A[Adapting to game trees in zero-sum imperfect information games] --> B[Game Information]
A --> C[Strategies]
A --> D[Exploration]
A --> E[Learning and Complexity]
B --> B1[Partial information games. 1]
B --> B2[Players remember previous moves. 2]
B --> B3[Indistinguishable game states. 5]
B --> B4[Probability of action pairs. 6]
B --> B5[Entropies: Tsallis, Shannon, Dilated. 11, 12, 13]
B --> B6[Balanced exploration. 14]
C --> C1[Near-optimal strategies. 3]
C --> C2[Minimize regret. 4]
C --> C3[Regret on information sets. 7]
C --> C4[Proportional action sampling. 9]
C --> C5[FTRL: Minimize regret. 10]
C --> C6[Regularizer in FTRL. 11]
D --> D1[Reduce loss estimate variance. 8]
D --> D2[Adaptive learning rates. 15]
D --> D3[Bounds with high probability. 16]
D --> D4[Theoretical regret limits. 17]
D --> D5[Realizations for ε-optimal strategy. 18]
D --> D6[Learn from game trajectories. 19]
E --> E1[Optimal rates with known structure. 20]
E --> E2[Near-optimal with unknown structure. 21]
E --> E3[Estimation bias components. 22]
E --> E4[Regularization component. 23]
E --> E5[Estimate variance component. 24]
E --> E6[Algorithm computational cost. 27]
class A,B,B1,B2,B3,B4,B5,B6 game
class C,C1,C2,C3,C4,C5,C6 strategies
class D,D1,D2,D3,D4,D5,D6 exploration
class E,E1,E2,E3,E4,E5,E6 learning
```


**Summary:**

**1.-** Imperfect information games: Games where players have partial information about the current game state.

**2.-** Perfect recall: Players never forget their own past moves and observations, so each player's information sets form a tree structure.

**3.-** ε-optimal strategies: Strategies that are within ε of the optimal strategy in terms of expected value.
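
In symbols, writing $V(\mu, \nu)$ for the expected value of a strategy profile $(\mu, \nu)$ in the zero-sum game, this is commonly stated as:

```latex
% \epsilon-optimality for the max-player in a zero-sum game:
V^{\star} = \max_{\mu} \min_{\nu} V(\mu, \nu), \qquad
\hat{\mu} \text{ is } \epsilon\text{-optimal} \iff
\min_{\nu} V(\hat{\mu}, \nu) \ge V^{\star} - \epsilon .
```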

**4.-** Regret minimization: Minimizing the gap between the cumulative gain of the best fixed strategy in hindsight and the player's own cumulative gain.
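
Written out, with $g_t$ the gain function at episode $t$ and $\mu_t$ the strategy played:

```latex
% Regret after T episodes: gap to the best fixed strategy in hindsight.
R_T = \max_{\mu} \sum_{t=1}^{T} g_t(\mu) \;-\; \sum_{t=1}^{T} g_t(\mu_t).
```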

**5.-** Information sets: Sets of game states indistinguishable to a player.

**6.-** Realization plan: The probability that a player's own strategy contributes to reaching a given information-set/action pair.
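
Under perfect recall this is simply the product of the player's own action probabilities along the unique path to the pair; a minimal Python sketch with hypothetical data structures:

```python
# Minimal sketch (hypothetical encoding): the realization plan of an
# information-set/action pair is the product of the player's own action
# probabilities along the unique root-to-pair path (unique by perfect recall).

def realization_plan(path, policy):
    """path: list of (infoset, action) pairs from the root to the target
    pair; policy: dict mapping infoset -> {action: probability}."""
    prob = 1.0
    for infoset, action in path:
        prob *= policy[infoset][action]
    return prob

# Example: two decisions deep, each taken with probability 0.5 -> plan 0.25.
policy = {"I0": {"a": 0.5, "b": 0.5}, "I1": {"c": 0.5, "d": 0.5}}
print(realization_plan([("I0", "a"), ("I1", "c")], policy))  # 0.25
```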

**7.-** Counterfactual regret: Regret defined on information sets rather than full game states.
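
In the standard CFR notation, with $v(\sigma, I)$ the counterfactual value of information set $I$ (each history in $I$ weighted by the probability that the opponents and chance reach it):

```latex
% Counterfactual regret at information set I after T iterations, where
% \sigma^{t}_{I \to a} plays action a at I and follows \sigma^{t} elsewhere:
R^{T}(I) = \max_{a \in A(I)} \sum_{t=1}^{T}
  \big( v(\sigma^{t}_{I \to a}, I) - v(\sigma^{t}, I) \big).
```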

**8.-** Implicit exploration (IX): Technique to reduce variance in loss estimates.
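
A standard form of the IX estimator caps the importance weight with a bias parameter $\gamma > 0$, trading a small downward bias for much lower variance (the key to high-probability bounds):

```latex
% IX loss estimate for action a at round t, with p_t the sampling
% distribution; \gamma = 0 recovers the unbiased importance-weighted estimate.
\widehat{\ell}_t(a) = \frac{\ell_t(a)\, \mathbb{1}\{A_t = a\}}{p_t(a) + \gamma}.
```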

**9.-** Balanced policy: Sampling actions proportionally to the size of the sub-trees they lead to.
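
A minimal sketch (with a hypothetical dictionary tree encoding) of how such a policy can be computed from sub-tree sizes:

```python
# A balanced policy picks each action with probability proportional to the
# number of leaves in the sub-tree it leads to, so exploration effort is
# spread evenly across the game tree.

def subtree_size(tree, node):
    """tree: dict node -> {action: child}; leaves map to empty dicts."""
    children = tree[node]
    if not children:
        return 1
    return sum(subtree_size(tree, child) for child in children.values())

def balanced_policy(tree, node):
    sizes = {a: subtree_size(tree, c) for a, c in tree[node].items()}
    total = sum(sizes.values())
    return {a: s / total for a, s in sizes.items()}

# Example: action "a" leads to 3 leaves, "b" to 1 -> probabilities 0.75/0.25.
tree = {"root": {"a": "x", "b": "y"}, "x": {"l": "x1", "m": "x2", "r": "x3"},
        "y": {}, "x1": {}, "x2": {}, "x3": {}}
print(balanced_policy(tree, "root"))  # {'a': 0.75, 'b': 0.25}
```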

**10.-** Follow the Regularized Leader (FTRL): Algorithm that minimizes regret by playing, at each step, the strategy minimizing the regularized cumulative (estimated) loss.
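
With a Shannon-entropy regularizer, the FTRL update over the simplex has the familiar closed form of a softmax over negated cumulative losses (exponential weights); a minimal sketch:

```python
import numpy as np

# Minimal FTRL sketch: minimizing <p, L_t> - (1/eta) * H(p) over the simplex
# with Shannon entropy H yields a softmax over negated cumulative losses.

def ftrl_step(cum_losses, eta):
    """Return the FTRL strategy for cumulative (estimated) losses."""
    logits = -eta * cum_losses
    logits -= logits.max()          # for numerical stability
    weights = np.exp(logits)
    return weights / weights.sum()

# Toy run over two rounds of losses for three actions.
cum_losses = np.zeros(3)
for loss in ([0.2, 0.9, 0.5], [0.1, 0.8, 0.6]):
    strategy = ftrl_step(cum_losses, eta=1.0)   # play, then observe loss
    cum_losses += np.array(loss)
print(ftrl_step(cum_losses, eta=1.0))  # mass concentrates on action 0
```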

**11.-** Tsallis entropy: Regularizer used in FTRL algorithms.

**12.-** Shannon entropy: Alternative regularizer used in FTRL algorithms.

**13.-** Dilated entropy: Entropy regularizer applied across the game tree.
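
Items 11-13 can be written schematically as follows (the exact weightings used in the paper may differ):

```latex
% Tsallis entropy with parameter \alpha \in (0,1) and Shannon entropy,
% for a distribution p over actions:
H_{\alpha}(p) = \frac{1}{1-\alpha} \Big( \sum_{a} p(a)^{\alpha} - 1 \Big),
\qquad
H(p) = -\sum_{a} p(a) \log p(a).
% One common "dilated" construction: weight each information set's local
% entropy by the realization plan \mu(\sigma_I) of its parent sequence,
\Psi(\mu) = \sum_{I} \mu(\sigma_I)\,
  H\!\Big( \frac{\mu(I, \cdot)}{\mu(\sigma_I)} \Big).
```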

**14.-** Balanced transitions: Transition kernel defined to balance exploration across the game tree.

**15.-** Adaptive learning rates: Learning rates that adapt based on observed game structure.
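
A generic example of such a schedule (not necessarily the paper's exact one): shrink the rate as observed loss estimates accumulate, so neither the horizon nor the tree size is needed in advance:

```python
import numpy as np

# Generic adaptive learning-rate sketch: eta shrinks as the squared loss
# estimates accumulate, removing the need to tune against T or the tree size.

def adaptive_eta(sq_loss_sum, num_actions):
    return np.sqrt(np.log(num_actions) / (1.0 + sq_loss_sum))

sq_loss_sum = 0.0
for est_loss in ([0.0, 2.0, 0.5], [1.0, 0.0, 3.0]):  # toy loss estimates
    eta = adaptive_eta(sq_loss_sum, num_actions=3)   # rate for this round
    sq_loss_sum += float(np.max(np.square(est_loss)))
print(adaptive_eta(sq_loss_sum, num_actions=3))      # smaller than round 1
```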

**16.-** High probability bounds: Bounds that hold with high probability rather than just in expectation.

**17.-** Lower bounds: Theoretical lower limits on regret or sample complexity.

**18.-** Sample complexity: Number of game realizations needed to learn an ε-optimal strategy.
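
Concretely, with the ε-optimality criterion from item 3:

```latex
% Sample complexity: the smallest number of episodes T such that the
% strategy \hat{\mu}_T output after T realizations satisfies
\Pr\Big[ \min_{\nu} V(\hat{\mu}_T, \nu) \ge V^{\star} - \epsilon \Big]
  \ge 1 - \delta .
```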

**19.-** Trajectory feedback: Learning from observed game trajectories rather than full game information.

**20.-** Balanced FTRL: Algorithm using balanced transitions to achieve optimal rates with known structure.

**21.-** Adaptive FTRL: Algorithm adapting to unknown structure while maintaining near-optimal rates.

**22.-** BIAS terms: Components of regret related to estimation bias.

**23.-** REG term: Component of regret related to regularization.

**24.-** VAR term: Component of regret related to variance of estimates.
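
Items 22-24 correspond to a schematic three-way split of the regret of FTRL run on estimated losses (the exact terms and constants are in the paper):

```latex
% Schematic regret decomposition behind items 22-24:
R_T \;\lesssim\;
  \underbrace{\mathrm{REG}_T}_{\text{regularizer / learning rates}}
  + \underbrace{\mathrm{VAR}_T}_{\text{variance of } \widehat{\ell}_t}
  + \underbrace{\mathrm{BIAS}_T}_{\text{bias of } \widehat{\ell}_t
      \text{ (e.g. from IX)}} .
```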

**25.-** Azuma-Hoeffding inequality: Concentration inequality for bounded martingale difference sequences.
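
Stated precisely:

```latex
% Azuma-Hoeffding: if (X_i) is a martingale difference sequence with
% |X_i| \le c_i almost surely, then for every t > 0,
\Pr\Big[ \sum_{i=1}^{n} X_i \ge t \Big]
  \le \exp\!\Big( -\frac{t^2}{2 \sum_{i=1}^{n} c_i^2} \Big).
```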

**26.-** Freedman's inequality: Concentration inequality for martingales with bounded conditional variance.
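
One standard form:

```latex
% Freedman: if |X_i| \le c almost surely and the conditional variances
% satisfy \sum_i \mathbb{E}[X_i^2 \mid \mathcal{F}_{i-1}] \le V, then
\Pr\Big[ \sum_{i=1}^{n} X_i \ge t \Big]
  \le \exp\!\Big( -\frac{t^2 / 2}{V + c\, t / 3} \Big).
```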

**27.-** Time complexity: Computational cost of algorithm updates.

**28.-** Kuhn poker: Simple poker game used as a benchmark.

**29.-** Leduc poker: More complex poker variant used as a benchmark.

**30.-** Liar's dice: Dice game used as a benchmark for imperfect-information algorithms.

Knowledge Vault built by David Vivancos 2024