Knowledge Vault 6/21 - ICML 2017
Towards Reinforcement Learning in the Real World
Raia Hadsell
< Summary Image >

Concept Graph & Summary using Claude 3.5 Sonnet | ChatGPT-4o | Llama 3:

graph LR
  classDef main fill:#f9d9c9, font-weight:bold, font-size:14px
  classDef foundations fill:#d4f9d4, font-weight:bold, font-size:14px
  classDef techniques fill:#d4d4f9, font-weight:bold, font-size:14px
  classDef environments fill:#f9f9d4, font-weight:bold, font-size:14px
  classDef challenges fill:#f9d4f9, font-weight:bold, font-size:14px
  classDef applications fill:#d4f9f9, font-weight:bold, font-size:14px
  Main[Towards Reinforcement Learning in the Real World]
  Main --> A[AI Foundations]
  Main --> B[RL Techniques]
  Main --> C[Training Environments]
  Main --> D[Challenges and Future Directions]
  Main --> E[Applications and Implementations]
  A --> A1[Intelligent creatures evolved through increasing complexity 1]
  A --> A2[AI requires complex and simple algorithms 2]
  A --> A3[Rich environments elicit diverse AI behaviors 3]
  A --> A4[Human games repurposed for AI training 4]
  A --> A5[Montezumas Revenge: challenging for AI 5]
  A --> A6[Feudal RL uses hierarchy of policies 6]
  B --> B1[Feudal Networks adapt principles using neural networks 7]
  B --> B2[Elastic Weight Consolidation retains previous task performance 8]
  B --> B3[Progressive Neural Networks avoid catastrophic forgetting 9]
  B --> B4[DISTRAL trains task-specific policies near shared policy 10]
  B --> B5[Auxiliary tasks scaffold reinforcement learning 11]
  B --> B6[UNREAL agents learn policy and pixel changes 12]
  C --> C1[3D mazes test AI navigation abilities 13]
  C --> C2[Navigation agents use memory, auxiliary tasks 14]
  C --> C3[StreetLearn: Google Street View as RL environment 15]
  C --> C4[StreetLearn agent has specialized neural pathways 16]
  C --> C5[Agent navigates NYC using small RGB images 17]
  C --> C6[Parkour environments test physical agent capabilities 18]
  D --> D1[Separating inputs facilitates robust locomotion learning 19]
  D --> D2[Progressive curricula improve learning and transfer 20]
  D --> D3[Humanoids develop idiosyncratic but effective locomotion 21]
  D --> D4[Complex simulations platform for real-world applications 22]
  D --> D5[GPU clusters enable computationally-intensive RL algorithms 23]
  D --> D6[Simplified representations make RL more accessible 24]
  E --> E1[Sample efficiency challenge in dialogue robotics 25]
  E --> E2[Hierarchical RL promising for useful subgoals 26]
  E --> E3[AI achievements are steps towards AGI 27]
  E --> E4[Adaptive computation memory attention key AI ingredients 28]
  E --> E5[Generalizing from simulation to reality important 29]
  E --> E6[Research aims for efficient, flexible, robust AI 30]
  class Main main
  class A,A1,A2,A3,A4,A5,A6 foundations
  class B,B1,B2,B3,B4,B5,B6 techniques
  class C,C1,C2,C3,C4,C5,C6 environments
  class D,D1,D2,D3,D4,D5,D6 challenges
  class E,E1,E2,E3,E4,E5,E6 applications

Summary:

1.- Intelligent creatures evolved through increasing complexity in organisms and their environments.

2.- Artificial intelligence requires both complex algorithms for power and simple ones for generality.

3.- Environments for AI training should have rich variability, grounding in physics, and task complexity to elicit diverse behaviors.

4.- Human games are often repurposed as AI training environments due to their challenges, diversity, and built-in reward functions.

5.- Montezuma's Revenge is a challenging Atari game for AI due to sparse, delayed rewards and the need for human-like concepts.

6.- Feudal reinforcement learning uses a hierarchy of policies, with greater abstraction at higher levels and finer temporal resolution at lower levels.

7.- FeUdal Networks for Hierarchical RL adapt feudal learning principles using neural networks, with a manager network setting goals and a worker network achieving them.
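
A minimal PyTorch sketch of that manager/worker split; the class names, layer sizes, and goal normalization below are illustrative assumptions, not the exact FeUdal Networks architecture:

```python
# Illustrative manager/worker decomposition (hypothetical sizes and layers).
import torch
import torch.nn as nn

class Manager(nn.Module):
    """Emits an abstract goal direction at a coarse timescale."""
    def __init__(self, state_dim=64, goal_dim=16):
        super().__init__()
        self.net = nn.Linear(state_dim, goal_dim)

    def forward(self, state):
        g = self.net(state)
        return g / (g.norm(dim=-1, keepdim=True) + 1e-8)  # unit-norm goal

class Worker(nn.Module):
    """Selects primitive actions conditioned on the manager's goal."""
    def __init__(self, state_dim=64, goal_dim=16, n_actions=4):
        super().__init__()
        self.net = nn.Linear(state_dim + goal_dim, n_actions)

    def forward(self, state, goal):
        logits = self.net(torch.cat([state, goal], dim=-1))
        return torch.distributions.Categorical(logits=logits)

state = torch.randn(1, 64)
goal = Manager()(state)                   # manager sets a direction
action = Worker()(state, goal).sample()   # worker acts toward it
```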

8.- Elastic Weight Consolidation allows neural networks to learn new tasks while retaining performance on previous tasks by penalizing changes to weights important for those tasks.
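
The EWC penalty is L(theta) = L_B(theta) + sum_i (lambda/2) * F_i * (theta_i - theta*_A,i)^2, where F is the diagonal Fisher information and theta*_A the weights learned on the previous task. A short sketch, assuming precomputed `fisher` and `old_params` dictionaries (the helper name is hypothetical):

```python
# Sketch of the EWC quadratic penalty on important weights.
import torch

def ewc_penalty(model, fisher, old_params, lam=100.0):
    """0.5 * lam * sum_i F_i * (theta_i - theta_A_i)^2."""
    penalty = torch.zeros(())
    for name, p in model.named_parameters():
        penalty = penalty + (fisher[name] * (p - old_params[name]) ** 2).sum()
    return 0.5 * lam * penalty

# total_loss = task_B_loss + ewc_penalty(model, fisher, old_params)
```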

9.- Progressive Neural Networks avoid catastrophic forgetting by adding columns for new tasks with lateral connections to previous frozen columns.
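
A two-column sketch of this idea; the single hidden layer and adapter shape are simplifying assumptions (the published architecture uses deeper columns with per-layer lateral connections):

```python
# Column 1 (task A) is frozen; column 2 (task B) gets a lateral connection
# from column 1's hidden activations, so old features are reused, not overwritten.
import torch
import torch.nn as nn

class ProgressiveNet(nn.Module):
    def __init__(self, in_dim=32, hid=64, out_dim=4):
        super().__init__()
        self.col1_h = nn.Linear(in_dim, hid)     # previous task's column
        self.col2_h = nn.Linear(in_dim, hid)     # new task's column
        self.lateral = nn.Linear(hid, hid)       # adapter from column 1
        self.col2_out = nn.Linear(hid, out_dim)
        for p in self.col1_h.parameters():
            p.requires_grad = False              # frozen: no forgetting

    def forward(self, x):
        h1 = torch.relu(self.col1_h(x))
        h2 = torch.relu(self.col2_h(x) + self.lateral(h1))
        return self.col2_out(h2)
```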

10.- DISTRAL trains task-specific policies while keeping them close to a shared policy, allowing transfer while avoiding divergence or collapse.
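
The coupling can be written as a KL term added to each task's policy loss. A minimal sketch, assuming `task_logits` and `shared_logits` come from the task-specific and shared policy networks (the weight `beta` is illustrative):

```python
# KL(pi_task || pi_shared): penalizes task policies for drifting from the
# shared "distilled" policy, which is itself trained to imitate all tasks.
import torch
import torch.nn.functional as F

def distral_kl(task_logits, shared_logits):
    log_p_task = F.log_softmax(task_logits, dim=-1)
    log_p_shared = F.log_softmax(shared_logits, dim=-1)
    return (log_p_task.exp() * (log_p_task - log_p_shared)).sum(-1).mean()

# total_loss = policy_gradient_loss + beta * distral_kl(task_logits, shared_logits)
```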

11.- Auxiliary tasks like depth prediction and loop closure classification provide stable gradients that scaffold and speed up reinforcement learning.
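
A sketch of how such an auxiliary head shares gradients with the RL encoder; the linear encoder and coarse depth target here stand in for the real convolutional network and simulator-provided depth:

```python
# Auxiliary depth prediction: a dense supervised loss that backpropagates
# through the same encoder the policy uses, stabilizing representation learning.
import torch
import torch.nn as nn

encoder = nn.Linear(84 * 84, 256)     # stand-in for the conv encoder
policy_head = nn.Linear(256, 4)       # RL head (trained with the RL loss)
depth_head = nn.Linear(256, 16)       # auxiliary head: coarse depth map

obs = torch.randn(8, 84 * 84)
feat = torch.relu(encoder(obs))
depth_target = torch.randn(8, 16)     # would come from the simulator

aux_loss = nn.functional.mse_loss(depth_head(feat), depth_target)
# total_loss = rl_loss + aux_weight * aux_loss   # both losses shape `encoder`
```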

12.- UNREAL agents augment the standard policy with auxiliary control tasks such as maximizing pixel changes, with experience replay improving data efficiency.
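
The pixel-control pseudo-reward can be computed directly from consecutive frames. A sketch, assuming (H, W, C) image arrays and a 4-pixel grid:

```python
# UNREAL-style pixel-control reward: mean absolute pixel change per cell
# of a coarse grid; an auxiliary Q-head learns to maximize it from replay.
import numpy as np

def pixel_control_rewards(frame_t, frame_t1, cell=4):
    """Per-cell mean |pixel change|; frames are (H, W, C) arrays."""
    diff = np.abs(frame_t1.astype(np.float32) - frame_t.astype(np.float32))
    diff = diff.mean(axis=-1)                  # average over color channels
    h, w = diff.shape
    return diff.reshape(h // cell, cell, w // cell, cell).mean(axis=(1, 3))
```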

13.- Navigation tasks in 3D mazes test AI's ability to explore, memorize, and locate goals using only visual inputs.

14.- Navigation agents benefit from memory, auxiliary tasks like depth prediction, and structured architectures separating representation learning and locomotion.

15.- StreetLearn turns Google Street View of New York City into an interactive RL environment for training navigation at scale.

16.- The StreetLearn navigation agent has specialized pathways for visual processing, goal representation, and locomotion, enabling both task-specific and general navigation.

17.- The StreetLearn navigation agent can localize itself and navigate to goals in NYC using only 84x84 pixel RGB images.
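
A compact sketch of such a multi-pathway agent; the layer sizes, goal encoding, and action count are illustrative guesses rather than the published architecture:

```python
# Separate pathways: a conv stack for 84x84 RGB frames, an embedding for
# the goal, and a recurrent core that fuses both to drive locomotion.
import torch
import torch.nn as nn

class NavAgent(nn.Module):
    def __init__(self, goal_dim=32, n_actions=5):
        super().__init__()
        self.vision = nn.Sequential(
            nn.Conv2d(3, 16, 8, stride=4), nn.ReLU(),
            nn.Conv2d(16, 32, 4, stride=2), nn.ReLU(),
            nn.Flatten(), nn.Linear(32 * 9 * 9, 256), nn.ReLU())
        self.goal = nn.Linear(goal_dim, 64)
        self.core = nn.LSTMCell(256 + 64, 256)   # memory for localization
        self.policy = nn.Linear(256, n_actions)

    def forward(self, frame, goal, hc):
        z = torch.cat([self.vision(frame), torch.relu(self.goal(goal))], -1)
        h, c = self.core(z, hc)
        return self.policy(h), (h, c)

agent = NavAgent()
hc = (torch.zeros(1, 256), torch.zeros(1, 256))
logits, hc = agent(torch.randn(1, 3, 84, 84), torch.randn(1, 32), hc)
```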

18.- Continuous control tasks in diverse parkour environments test the physical capabilities of simulated agents.

19.- Separating proprioceptive and exteroceptive inputs facilitates learning robust and transferable locomotion skills in simulated agents.
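
A sketch of that input split as a two-stream policy network; the dimensions and Gaussian-mean head are assumptions for illustration:

```python
# Proprioception (joint angles/velocities) and exteroception (terrain ahead)
# are encoded separately, so body-specific skills transfer across terrains.
import torch
import torch.nn as nn

class TwoStreamPolicy(nn.Module):
    def __init__(self, proprio_dim=24, extero_dim=50, n_actions=8):
        super().__init__()
        self.proprio = nn.Sequential(nn.Linear(proprio_dim, 64), nn.Tanh())
        self.extero = nn.Sequential(nn.Linear(extero_dim, 64), nn.Tanh())
        self.head = nn.Linear(128, n_actions)    # mean of a Gaussian policy

    def forward(self, proprio_obs, extero_obs):
        z = torch.cat([self.proprio(proprio_obs), self.extero(extero_obs)], -1)
        return torch.tanh(self.head(z))
```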

20.- Curricula that progress from easier to harder terrain during an episode lead to better overall learning and transfer.
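
One way to realize such an in-episode curriculum is to ramp obstacle difficulty with distance along the track; the ramp length and obstacle parameters below are made up for illustration:

```python
# In-episode terrain curriculum: every episode starts easy and ends hard,
# so the agent always trains at the frontier of its ability.
import random

def terrain_difficulty(distance_m, ramp_m=100.0):
    """Difficulty in [0, 1], rising linearly along the track."""
    return min(distance_m / ramp_m, 1.0)

def sample_obstacle(distance_m):
    d = terrain_difficulty(distance_m)
    return {
        "gap_width": 0.2 + 0.8 * d * random.random(),      # meters
        "hurdle_height": 0.1 + 0.5 * d * random.random(),  # meters
    }
```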

21.- Humanoid agents develop idiosyncratic but effective and robust locomotion strategies when trained with simple rewards like forward progress.

22.- Complex simulated environments provide a platform for developing AI systems with potential for real-world applications.

23.- Large-scale GPU clusters enable experiments with computationally intensive RL algorithms using RGB visual inputs.

24.- Simplified representations and transfer learning can make challenging RL domains more accessible for research with limited compute.

25.- Sample efficiency remains a significant challenge for applying deep RL to domains like dialogue and robotics where data is expensive.

26.- Hierarchical RL with multiple levels of temporal abstraction is a promising approach for devising useful and achievable subgoals.

27.- Current AI achievements are steps towards AGI, but debate continues over how AGI should be defined and which challenges remain.

28.- Adaptive computation, memory, and attention within neural networks are key ingredients for progress towards more capable and general AI systems.

29.- Generalizing from simulation to the real world is an important frontier for AI systems aiming to tackle practical applications.

30.- Ongoing research aims to make AI systems more sample-efficient, flexible, and robust in complex environments through architectural and algorithmic innovations.

Knowledge Vault built by David Vivancos 2024