The End Of Knowledge - Vault 6/100 - CVPR - 2024 - Learning actions, policies, rewards, and environments from videos alone

graph LR
classDef video fill:#f9d4d4, font-weight:bold, font-size:14px
classDef robotics fill:#d4f9d4, font-weight:bold, font-size:14px
classDef learning fill:#d4d4f9, font-weight:bold, font-size:14px
classDef future fill:#f9f9d4, font-weight:bold, font-size:14px
classDef patterns fill:#f9d4f9, font-weight:bold, font-size:14px
Main[Learning actions, policies, rewards, and environments from videos alone] --> V[Video Processing]
Main --> R[Robotics & Movement]
Main --> L[Learning Methods]
Main --> F[Future States]
Main --> P[Pattern Analysis]
V --> V1[Videos teach actions without direct supervision 1]
V --> V2[Gaming speedruns reveal behavior trends 6]
V --> V3[Model detects animals beyond games 8]
V --> V4[VQ-VAE tokens divide video parts 21]
V --> V5[Video values train learning agents 19]
R --> R1[Robots learn sign language basics 2]
R --> R2[Robots acquire facial expression skills 3]
R --> R3[Map human moves to robot features 4]
R --> R4[Movement patterns across varied subjects 5]
R --> R5[New environments create motion guides 7]
L --> L1[Future states need action models 22]
L --> L2[Videos teach without reward systems 17]
L --> L3[End rewards guide value learning 18]
L --> L4[Learning from suboptimal video data 15]
L --> L5[Direct transitions replace action pairs 16]
L --> L6[Data thrives without action needs 29]
F --> F1[Predicting numerous features simultaneously 9]
F --> F2[Shared actions through future clustering 10]
F --> F3[Video transitions reveal latent actions 11]
F --> F4[Model predicts possible future states 12]
F --> F5[Future states weight policy choices 13]
F --> F6[Fast real-world policy adaptation 14]
P --> P1[Interactive spaces from video content 20]
P --> P2[Models enhance world interaction 23]
P --> P3[Images transform into living spaces 24]
P --> P4[Testing through platform games 25]
P --> P5[Training future artificial minds 26]
P --> P6[Real videos need bigger structures 27]
P2 --> P7[Managing rewards across virtual worlds 28]
P3 --> P8[Structure creates basic patterns 30]
class Main,V,V1,V2,V3,V4,V5 video
class R,R1,R2,R3,R4,R5 robotics
class L,L1,L2,L3,L4,L5,L6 learning
class F,F1,F2,F3,F4,F5,F6 future
class P,P1,P2,P3,P4,P5,P6,P7,P8 patterns

Summary:

1.- Learning actions, policies, rewards from videos without explicit supervision

2.- Initial research on teaching robots sign language gestures

3.- Pivot to teaching facial expressions to robots

4.- Motion templates for mapping human expressions to robot features

5.- Domain-agnostic representation of movement patterns across different subjects

6.- Analysis of video game speedruns to infer behavioral patterns

7.- Motion template generation from unseen gaming environments

8.- Model's unexpected success in segmenting animals from non-gaming content

9.- Multiple feature prediction instead of single-mode predictions

10.- Clustering future predictions to identify shared action representations

11.- ILPO: Learning latent actions from video transitions

12.- Generative modeling to predict possible next states

13.- Policy learning through weighting different possible future states

14.- Quick adaptation of learned policies to real environments

15.- Learning optimal value functions from suboptimal video demonstrations

16.- State-to-state transitions versus traditional state-action pairs

17.- Learning without reward functions using video sequence ordering

18.- Value function derivation from end-of-video rewards

19.- Training reinforcement learning agents using learned video values

20.- Genie: Creating interactive environments from video data

21.- Video tokenization using discretized VQ-VAE model

22.- Latent action modeling for future state prediction

23.- Dynamic modeling for environment interaction

24.- Text-generated images becoming interactive environments

25.- Application to platformer game environments

26.- Potential for training future AI agents

27.- Scaling to real-world videos through architecture expansion

28.- Handling reward hacking across multiple generated environments

29.- Benefits of action-free learning for diverse datasets

30.- Low-level representation emergence in environment structure
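
Items 17 and 18 describe deriving values from the ordering of video frames, with a reward placed only at the end of the video. A minimal sketch of how such value targets could be computed is given here; the function name `value_targets`, the discount `gamma`, and the terminal reward of 1.0 are illustrative assumptions, not details taken from the talk.

```python
def value_targets(num_frames, gamma=0.9, terminal_reward=1.0):
    """Discounted value target per frame.

    Assumption (hypothetical setup): the only reward is terminal_reward,
    received at the last frame of the video, so frames closer to the end
    get higher targets.
    """
    if num_frames <= 0:
        raise ValueError("num_frames must be positive")
    last = num_frames - 1
    # Frame t is (last - t) steps away from the rewarded final frame.
    return [terminal_reward * gamma ** (last - t) for t in range(num_frames)]


# Usage: a 4-frame video with gamma = 0.5
targets = value_targets(4, gamma=0.5)
print(targets)  # [0.125, 0.25, 0.5, 1.0]
```

Under these assumptions no action labels or per-step rewards are needed: the frame index alone determines the target, and a value network could then be regressed onto these targets.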
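
Item 21 mentions tokenizing video with a discretized VQ-VAE. The core discretization step, replacing each continuous vector with the index of its nearest codebook entry, can be sketched as follows. The tiny 2-D `codebook` and `patches` are made-up illustrative values, and the sketch covers only the lookup step, not a trained encoder or decoder.

```python
def quantize(vectors, codebook):
    """Return, for each vector, the index of the nearest codebook entry
    under squared Euclidean distance (the vector-quantization lookup)."""

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return [
        min(range(len(codebook)), key=lambda k: sq_dist(v, codebook[k]))
        for v in vectors
    ]


# Hypothetical codebook with three entries and three encoded patches.
codebook = [[0.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
patches = [[0.1, -0.1], [0.9, 1.2], [0.2, 0.8]]

tokens = quantize(patches, codebook)
print(tokens)  # [0, 1, 2]
```

The resulting integer tokens are what a downstream sequence model would consume in place of raw pixels.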

Knowledge Vault built by David Vivancos 2024