Adapting to game trees in zero-sum imperfect information games

Côme Fiegel · Pierre Menard · Tadashi Kozuno · Remi Munos · Vianney Perchet · Michal Valko

**Concept Graph & Summary using Claude 3.5 Sonnet | ChatGPT-4o | Llama 3:**

```mermaid
graph LR
classDef game fill:#f9d4d4, font-weight:bold, font-size:14px
classDef strategies fill:#d4f9d4, font-weight:bold, font-size:14px
classDef exploration fill:#d4d4f9, font-weight:bold, font-size:14px
classDef learning fill:#f9f9d4, font-weight:bold, font-size:14px
A[Adapting to game trees in zero-sum imperfect information games] --> B[Game Information]
A --> C[Strategies]
A --> D[Exploration]
A --> E[Learning and Complexity]
B --> B1[Partial information games. 1]
B --> B2[Players remember previous moves. 2]
B --> B3[Indistinguishable game states. 5]
B --> B4[Probability of action pairs. 6]
B --> B5[Entropies: Tsallis, Shannon, Dilated. 11, 12, 13]
B --> B6[Balanced exploration. 14]
C --> C1[Near-optimal strategies. 3]
C --> C2[Minimize regret. 4]
C --> C3[Regret on information sets. 7]
C --> C4[Proportional action sampling. 9]
C --> C5[FTRL: Minimize regret. 10]
C --> C6[Regularizer in FTRL. 11]
D --> D1[Reduce loss estimate variance. 8]
D --> D2[Adaptive learning rates. 15]
D --> D3[Bounds with high probability. 16]
D --> D4[Theoretical regret limits. 17]
D --> D5[Realizations for ε-optimal strategy. 18]
D --> D6[Learn from game trajectories. 19]
E --> E1[Optimal rates with known structure. 20]
E --> E2[Near-optimal with unknown structure. 21]
E --> E3[Estimation bias components. 22]
E --> E4[Regularization component. 23]
E --> E5[Estimate variance component. 24]
E --> E6[Algorithm computational cost. 27]
class A,B,B1,B2,B3,B4,B5,B6 game
class C,C1,C2,C3,C4,C5,C6 strategies
class D,D1,D2,D3,D4,D5,D6 exploration
class E,E1,E2,E3,E4,E5,E6 learning
```


**Summary:**

**1.-** Imperfect information games: Games where players have partial information about the current game state.

**2.-** Perfect recall: Players never forget their own past moves and observations, so each player's information sets form a tree structure.

**3.-** ε-optimal strategies: Strategies that are within ε of the optimal strategy in terms of expected value.
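
In symbols, writing $V(\mu, \nu)$ for the expected value of a strategy profile $(\mu, \nu)$ in the zero-sum game, this is commonly stated as:

```latex
% \epsilon-optimality for the max-player in a zero-sum game:
V^{\star} = \max_{\mu} \min_{\nu} V(\mu, \nu), \qquad
\hat{\mu} \text{ is } \epsilon\text{-optimal} \iff
\min_{\nu} V(\hat{\mu}, \nu) \ge V^{\star} - \epsilon .
```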

**4.-** Regret minimization: Minimizing the gap between the cumulative gain of the best fixed strategy in hindsight and the player's own cumulative gain.
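
Written out, with $g_t$ the gain function at episode $t$ and $\mu_t$ the strategy played:

```latex
% Regret after T episodes: gap to the best fixed strategy in hindsight.
R_T = \max_{\mu} \sum_{t=1}^{T} g_t(\mu) \;-\; \sum_{t=1}^{T} g_t(\mu_t).
```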

**5.-** Information sets: Sets of game states indistinguishable to a player.

**6.-** Realization plan: The probability that a player's own strategy contributes to reaching a given information-set/action pair.
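
Under perfect recall this is simply the product of the player's own action probabilities along the unique path to the pair; a minimal Python sketch with hypothetical data structures:

```python
# Minimal sketch (hypothetical encoding): the realization plan of an
# information-set/action pair is the product of the player's own action
# probabilities along the unique root-to-pair path (unique by perfect recall).

def realization_plan(path, policy):
    """path: list of (infoset, action) pairs from the root to the target
    pair; policy: dict mapping infoset -> {action: probability}."""
    prob = 1.0
    for infoset, action in path:
        prob *= policy[infoset][action]
    return prob

# Example: two decisions deep, each taken with probability 0.5 -> plan 0.25.
policy = {"I0": {"a": 0.5, "b": 0.5}, "I1": {"c": 0.5, "d": 0.5}}
print(realization_plan([("I0", "a"), ("I1", "c")], policy))  # 0.25
```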

**7.-** Counterfactual regret: Regret defined on information sets rather than full game states.
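
In the standard CFR notation, with $v(\sigma, I)$ the counterfactual value of information set $I$ (each history in $I$ weighted by the probability that the opponents and chance reach it):

```latex
% Counterfactual regret at information set I after T iterations, where
% \sigma^{t}_{I \to a} plays action a at I and follows \sigma^{t} elsewhere:
R^{T}(I) = \max_{a \in A(I)} \sum_{t=1}^{T}
  \big( v(\sigma^{t}_{I \to a}, I) - v(\sigma^{t}, I) \big).
```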

**8.-** Implicit exploration (IX): Technique to reduce variance in loss estimates.
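
A standard form of the IX estimator caps the importance weight with a bias parameter $\gamma > 0$, trading a small downward bias for much lower variance (the key to high-probability bounds):

```latex
% IX loss estimate for action a at round t, with p_t the sampling
% distribution; \gamma = 0 recovers the unbiased importance-weighted estimate.
\widehat{\ell}_t(a) = \frac{\ell_t(a)\, \mathbb{1}\{A_t = a\}}{p_t(a) + \gamma}.
```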

**9.-** Balanced policy: Sampling actions proportionally to the size of the sub-trees they lead to.
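
A minimal sketch (with a hypothetical dictionary tree encoding) of how such a policy can be computed from sub-tree sizes:

```python
# A balanced policy picks each action with probability proportional to the
# number of leaves in the sub-tree it leads to, so exploration effort is
# spread evenly across the game tree.

def subtree_size(tree, node):
    """tree: dict node -> {action: child}; leaves map to empty dicts."""
    children = tree[node]
    if not children:
        return 1
    return sum(subtree_size(tree, child) for child in children.values())

def balanced_policy(tree, node):
    sizes = {a: subtree_size(tree, c) for a, c in tree[node].items()}
    total = sum(sizes.values())
    return {a: s / total for a, s in sizes.items()}

# Example: action "a" leads to 3 leaves, "b" to 1 -> probabilities 0.75/0.25.
tree = {"root": {"a": "x", "b": "y"}, "x": {"l": "x1", "m": "x2", "r": "x3"},
        "y": {}, "x1": {}, "x2": {}, "x3": {}}
print(balanced_policy(tree, "root"))  # {'a': 0.75, 'b': 0.25}
```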

**10.-** Follow the Regularized Leader (FTRL): Algorithm that minimizes regret by playing, at each step, the strategy minimizing the regularized cumulative (estimated) loss.
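
With a Shannon-entropy regularizer, the FTRL update over the simplex has the familiar closed form of a softmax over negated cumulative losses (exponential weights); a minimal sketch:

```python
import numpy as np

# Minimal FTRL sketch: minimizing <p, L_t> - (1/eta) * H(p) over the simplex
# with Shannon entropy H yields a softmax over negated cumulative losses.

def ftrl_step(cum_losses, eta):
    """Return the FTRL strategy for cumulative (estimated) losses."""
    logits = -eta * cum_losses
    logits -= logits.max()          # for numerical stability
    weights = np.exp(logits)
    return weights / weights.sum()

# Toy run over two rounds of losses for three actions.
cum_losses = np.zeros(3)
for loss in ([0.2, 0.9, 0.5], [0.1, 0.8, 0.6]):
    strategy = ftrl_step(cum_losses, eta=1.0)   # play, then observe loss
    cum_losses += np.array(loss)
print(ftrl_step(cum_losses, eta=1.0))  # mass concentrates on action 0
```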

**11.-** Tsallis entropy: Regularizer used in FTRL algorithms.

**12.-** Shannon entropy: Alternative regularizer used in FTRL algorithms.

**13.-** Dilated entropy: Entropy regularizer applied across the game tree.
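
Items 11-13 can be written schematically as follows (the exact weightings used in the paper may differ):

```latex
% Tsallis entropy with parameter \alpha \in (0,1) and Shannon entropy,
% for a distribution p over actions:
H_{\alpha}(p) = \frac{1}{1-\alpha} \Big( \sum_{a} p(a)^{\alpha} - 1 \Big),
\qquad
H(p) = -\sum_{a} p(a) \log p(a).
% One common "dilated" construction: weight each information set's local
% entropy by the realization plan \mu(\sigma_I) of its parent sequence,
\Psi(\mu) = \sum_{I} \mu(\sigma_I)\,
  H\!\Big( \frac{\mu(I, \cdot)}{\mu(\sigma_I)} \Big).
```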

**14.-** Balanced transitions: Transition kernel defined to balance exploration across the game tree.

**15.-** Adaptive learning rates: Learning rates that adapt based on observed game structure.
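
A generic example of such a schedule (not necessarily the paper's exact one): shrink the rate as observed loss estimates accumulate, so neither the horizon nor the tree size is needed in advance:

```python
import numpy as np

# Generic adaptive learning-rate sketch: eta shrinks as the squared loss
# estimates accumulate, removing the need to tune against T or the tree size.

def adaptive_eta(sq_loss_sum, num_actions):
    return np.sqrt(np.log(num_actions) / (1.0 + sq_loss_sum))

sq_loss_sum = 0.0
for est_loss in ([0.0, 2.0, 0.5], [1.0, 0.0, 3.0]):  # toy loss estimates
    eta = adaptive_eta(sq_loss_sum, num_actions=3)   # rate for this round
    sq_loss_sum += float(np.max(np.square(est_loss)))
print(adaptive_eta(sq_loss_sum, num_actions=3))      # smaller than round 1
```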

**16.-** High probability bounds: Bounds that hold with high probability rather than just in expectation.

**17.-** Lower bounds: Theoretical lower limits on regret or sample complexity.

**18.-** Sample complexity: Number of game realizations needed to learn an ε-optimal strategy.
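
Concretely, with the ε-optimality criterion from item 3:

```latex
% Sample complexity: the smallest number of episodes T such that the
% strategy \hat{\mu}_T output after T realizations satisfies
\Pr\Big[ \min_{\nu} V(\hat{\mu}_T, \nu) \ge V^{\star} - \epsilon \Big]
  \ge 1 - \delta .
```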

**19.-** Trajectory feedback: Learning from observed game trajectories rather than full game information.

**20.-** Balanced FTRL: Algorithm using balanced transitions to achieve optimal rates with known structure.

**21.-** Adaptive FTRL: Algorithm adapting to unknown structure while maintaining near-optimal rates.

**22.-** BIAS terms: Components of regret related to estimation bias.

**23.-** REG term: Component of regret related to regularization.

**24.-** VAR term: Component of regret related to variance of estimates.
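
Items 22-24 correspond to a schematic three-way split of the regret of FTRL run on estimated losses (the exact terms and constants are in the paper):

```latex
% Schematic regret decomposition behind items 22-24:
R_T \;\lesssim\;
  \underbrace{\mathrm{REG}_T}_{\text{regularizer / learning rates}}
  + \underbrace{\mathrm{VAR}_T}_{\text{variance of } \widehat{\ell}_t}
  + \underbrace{\mathrm{BIAS}_T}_{\text{bias of } \widehat{\ell}_t
      \text{ (e.g. from IX)}} .
```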

**25.-** Azuma-Hoeffding inequality: Concentration inequality for bounded martingale difference sequences.
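
Stated precisely:

```latex
% Azuma-Hoeffding: if (X_i) is a martingale difference sequence with
% |X_i| \le c_i almost surely, then for every t > 0,
\Pr\Big[ \sum_{i=1}^{n} X_i \ge t \Big]
  \le \exp\!\Big( -\frac{t^2}{2 \sum_{i=1}^{n} c_i^2} \Big).
```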

**26.-** Freedman's inequality: Concentration inequality for martingales with bounded conditional variance.
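
One standard form:

```latex
% Freedman: if |X_i| \le c almost surely and the conditional variances
% satisfy \sum_i \mathbb{E}[X_i^2 \mid \mathcal{F}_{i-1}] \le V, then
\Pr\Big[ \sum_{i=1}^{n} X_i \ge t \Big]
  \le \exp\!\Big( -\frac{t^2 / 2}{V + c\, t / 3} \Big).
```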

**27.-** Time complexity: Computational cost of algorithm updates.

**28.-** Kuhn poker: Simple poker game used as a benchmark.

**29.-** Leduc poker: More complex poker variant used as a benchmark.

**30.-** Liar's dice: Dice game used as a benchmark for imperfect-information algorithms.

Knowledge Vault built by David Vivancos 2024