Concept Graph & Resume using Claude 3.5 Sonnet | ChatGPT-4o | Llama 3:
Resume:
1.- Value-focused models: Models that focus exclusively on predicting value functions, which suffice for optimal planning while ignoring details irrelevant to the task.
2.- Real vs. hypothetical Bellman equations: Consistency between real-world value functions and model-based value predictions is key for effective planning.
3.- Telescoping Bellman equations: Extending the consistency requirement over multiple time steps, in both the real and the hypothetical dimension (see the n-step target sketch after this list).
4.- Many value-focused models: Learning models consistent with multiple value functions for improved data efficiency and generalization.
5.- Predictron: A multi-step, many-value-focused model that unrolls predictions to estimate value functions consistently.
6.- Lambda Predictron: Combines the different n-step predictions using TD(λ)-style weighting for more robust value estimation (see the preturn sketch after this list).
7.- Model-free training of model-based systems: Training value-focused models end-to-end as if they were model-free value function approximators.
8.- Consistency updates: Improving model predictions by enforcing agreement between different n-step value estimates, even without new experience (see the consistency sketch after this list).
9.- Reward grounding: Aligning intermediate model predictions with actual environment rewards for improved temporal consistency.
10.- Action Predictron: Extends the Predictron to handle actions and policies, allowing for control tasks.
11.- Hypothetical vs. real policies: Allowing different policies in imagination vs. reality for more flexible planning.
12.- Grounded Predictron: Constrains imagined actions to match real-world policies for improved consistency.
13.- Tree Predictron: Uses a tree structure to maximize over sequences of actions, mirroring the max backup of Q-learning (see the tree-backup sketch after this list).
14.- Value Prediction Network: A specific implementation of the grounded Predictron concept.
15.- TreeQN: An implementation of the Tree Predictron concept, showing improved performance on Atari games.
16.- Monte Carlo Tree Search Networks: Efficiently scaling tree search by using incremental simulations instead of brute-force expansion.
17.- Value Iteration Network: Applies value iteration over an entire implicit state space using convolutional neural networks (see the value-iteration sketch after this list).
18.- Algorithmic function approximation: Approximating planning algorithms directly with neural networks instead of just value functions.
19.- Universal algorithmic function approximators: Using powerful recurrent neural networks to learn planning algorithms from scratch.
20.- Implicit planning in AlphaGo: Demonstrating that deep neural networks can implicitly capture complex planning-like behaviors.
21.- Combining learning and planning: Exploring methods that leverage both learning and planning for improved performance.
22.- Implicit models and planning: Using neural networks to represent both world models and planning algorithms implicitly.
23.- Data efficiency in model-based RL: Learning about multiple value functions can improve sample efficiency compared to model-free methods.
24.- Exploration limitations: Value-focused models can't magically explore unseen states but can use data more efficiently.
25.- State representation learning: Value-focused models can learn state representations that focus on relevant features for the task.
26.- Policy iteration vs. value iteration: Different routes to an optimal policy, alternating evaluation and improvement of an explicit policy versus backing up optimal values directly.
27.- Stochastic vs. deterministic policies: Challenges in handling stochastic policies in model-based planning.
28.- Scalability of tree search: Addressing computational challenges when planning over long horizons or with large action spaces.
29.- Inductive biases for planning: Exploring appropriate neural network structures for capturing planning-like behaviors.
30.- Trade-offs between structured and universal approximators: Balancing specialized planning architectures with more general neural network approaches.
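
The telescoping Bellman consistency of points 2-3 can be made concrete: a correct value function must agree with the discounted sum of n real rewards plus the value of the state reached after n steps. Below is a minimal sketch of that check on a toy deterministic chain; the chain, rewards, discount, and candidate values are illustrative assumptions, not an example from the talk.

```python
# Toy deterministic chain s0 -> s1 -> s2 (terminal), reward 1.0 on each transition.
rewards = [1.0, 1.0]            # r_{t+1}, r_{t+2} observed along the real trajectory
gamma = 0.9                     # discount factor (illustrative)
v = [1.9, 1.0, 0.0]             # candidate value function v(s0), v(s1), v(s2)

def n_step_target(rewards, bootstrap_value, gamma):
    """Telescoped Bellman target: r_1 + g*r_2 + ... + g^(n-1)*r_n + g^n * v(s_{t+n})."""
    target = bootstrap_value
    for r in reversed(rewards):
        target = r + gamma * target
    return target

# Consistency over 1 and 2 real steps: v(s0) should match every n-step target.
for n in (1, 2):
    target = n_step_target(rewards[:n], v[n], gamma)
    print(f"{n}-step target for s0: {target:.3f}   v(s0) = {v[0]:.3f}")
```

The same telescoping applies in the hypothetical dimension, where the rewards and the bootstrap value come from the model's own unroll rather than from real experience, as in the next sketch.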
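
Points 5-6 describe the Predictron's hypothetical unroll: at every imagined step the model emits a predicted reward, discount, and value; the k-step "preturns" accumulate them, and the Lambda Predictron blends all depths with TD(λ)-style weights. Below is a minimal sketch of that accumulation, assuming hand-written numbers and a single fixed λ in place of the learned network outputs (the real architecture produces all of these quantities, including per-step λ weights, from a neural network).

```python
import numpy as np

def preturns(rewards, discounts, values):
    """k-step preturns g_k = r_1 + d_1*(r_2 + d_2*(... + d_{k-1}*(r_k + d_k*v_k))),
    for k = 1..K, from the model's imagined per-step rewards, discounts, and values."""
    g = []
    for k in range(1, len(rewards) + 1):
        acc = values[k - 1]                   # bootstrap with the k-th imagined value
        for r, d in zip(reversed(rewards[:k]), reversed(discounts[:k])):
            acc = r + d * acc
        g.append(acc)
    return np.array(g)

def lambda_preturn(g, lam):
    """TD(lambda)-style mixture: weight (1-lam)*lam^(k-1) on the k-step preturn,
    with the remaining mass on the deepest preturn (weights sum to 1)."""
    K = len(g)
    w = np.array([(1 - lam) * lam ** (k - 1) for k in range(1, K)] + [lam ** (K - 1)])
    return float(w @ g), w

# A three-step imagined unroll (illustrative numbers standing in for network outputs).
r = [0.5, 0.2, 1.0]      # predicted rewards
d = [0.9, 0.9, 0.9]      # predicted per-step discounts
v = [2.0, 1.8, 1.5]      # predicted values of each imagined state

g = preturns(r, d, v)
g_lam, w = lambda_preturn(g, lam=0.8)
print("k-step preturns:", np.round(g, 3))
print("lambda-preturn :", round(g_lam, 3), " weights:", np.round(w, 3))
```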
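
Point 8's consistency update uses no new environment data: because every k-step preturn estimates the same value, the model can be adjusted so that they agree, for example by pulling each preturn toward their λ-weighted mixture held fixed as a target. Below is a minimal self-contained sketch of that loss; treating the λ-preturn as a stop-gradient target is an assumption about one reasonable implementation, and the numbers are illustrative.

```python
import numpy as np

# Preturns from three imagined unroll depths that should all estimate the same value
# (illustrative numbers; in practice these are the model's own outputs).
g = np.array([2.300, 2.138, 2.584])
g_lam = 2.456                           # lambda-weighted mixture, held fixed as the target

# Consistency loss: mean squared deviation of each k-step preturn from the shared target.
loss = np.mean((g - g_lam) ** 2)
# Gradient with respect to each preturn; backpropagating it through the model would
# nudge all unroll depths toward agreement without any new experience.
grad = 2.0 * (g - g_lam) / len(g)

print("consistency loss    :", round(float(loss), 4))
print("per-preturn gradient:", np.round(grad, 4))
```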
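
Points 13 and 15 turn the unroll into a tree: a latent transition model expands each action, and Q-values are backed up with a max over children, which is exactly the shape of a Q-learning backup. Below is a minimal sketch of that recursive backup, with small random linear maps standing in for TreeQN's learned transition, reward, and value networks (an assumption for illustration only).

```python
import numpy as np

rng = np.random.default_rng(0)
STATE_DIM, N_ACTIONS, GAMMA = 4, 2, 0.95

# Stand-ins for the learned latent-space model (illustrative random parameters).
W_trans = rng.normal(size=(N_ACTIONS, STATE_DIM, STATE_DIM)) * 0.3
w_reward = rng.normal(size=(N_ACTIONS, STATE_DIM)) * 0.1
w_value = rng.normal(size=STATE_DIM) * 0.1

def transition(z, a):
    """Imagined next latent state after taking action a."""
    return np.tanh(W_trans[a] @ z)

def reward(z, a):
    return float(w_reward[a] @ z)

def value(z):
    return float(w_value @ z)

def tree_q(z, depth):
    """Q(z, a) for every action, backing up a depth-limited tree with a max over
    children, as in Q-learning: Q(z, a) = r(z, a) + gamma * max_a' Q(z', a')."""
    qs = np.zeros(N_ACTIONS)
    for a in range(N_ACTIONS):
        z_next = transition(z, a)
        if depth == 1:
            backup = value(z_next)            # leaf: bootstrap with the value head
        else:
            backup = tree_q(z_next, depth - 1).max()
        qs[a] = reward(z, a) + GAMMA * backup
    return qs

z0 = rng.normal(size=STATE_DIM)
print("depth-1 Q-values:", np.round(tree_q(z0, 1), 3))
print("depth-3 Q-values:", np.round(tree_q(z0, 3), 3))
```

At depth 1 this reduces to an ordinary value bootstrap; deeper trees trade extra computation for more explicit look-ahead.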
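
Point 17's Value Iteration Network bakes the classical backup V <- max_a [R_a + gamma * P_a V] into a convolutional network, so the sweep runs over every state of an implicit grid at once. Below is a minimal tabular sketch of that repeated backup on a toy gridworld; the grid size, goal placement, and deterministic moves are illustrative assumptions, and VIN itself learns the reward and transition filters, realizing each sweep as a convolution followed by a max over channels.

```python
import numpy as np

H, W, GAMMA, SWEEPS = 4, 4, 0.9, 50
R = np.zeros((H, W))
R[0, 3] = 1.0                                  # single goal reward in a corner
MOVES = [(-1, 0), (1, 0), (0, -1), (0, 1)]     # up, down, left, right

def step(i, j, di, dj):
    """Deterministic move, staying in place when bumping into the border."""
    ni, nj = i + di, j + dj
    if 0 <= ni < H and 0 <= nj < W:
        return ni, nj
    return i, j

V = np.zeros((H, W))
for _ in range(SWEEPS):
    # One value-iteration sweep: for every cell, back up the best action.
    new_V = np.empty_like(V)
    for i in range(H):
        for j in range(W):
            new_V[i, j] = max(R[i, j] + GAMMA * V[step(i, j, di, dj)]
                              for di, dj in MOVES)
    V = new_V

print(np.round(V, 2))
```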
Knowledge Vault built by David Vivancos 2024