Knowledge Vault 2/62 - ICLR 2014-2023
Leslie Kaelbling ICLR 2020 - Invited Speaker - Doing for Our Robots What Nature Did For Us

Concept Graph & Resume using Claude 3 Opus | Chat GPT4 | Gemini Adv | Llama 3:

```mermaid
graph LR
    classDef computational fill:#f9d4d4, font-weight:bold, font-size:14px;
    classDef policies fill:#d4f9d4, font-weight:bold, font-size:14px;
    classDef domains fill:#d4d4f9, font-weight:bold, font-size:14px;
    classDef representations fill:#f9f9d4, font-weight:bold, font-size:14px;
    classDef engineering fill:#f9d4f9, font-weight:bold, font-size:14px;
    classDef planning fill:#d4f9f9, font-weight:bold, font-size:14px;
    classDef components fill:#f9d4d4, font-weight:bold, font-size:14px;
    classDef learning fill:#d4f9d4, font-weight:bold, font-size:14px;
    classDef skills fill:#d4d4f9, font-weight:bold, font-size:14px;
    classDef partial fill:#f9f9d4, font-weight:bold, font-size:14px;
    classDef search fill:#f9d4f9, font-weight:bold, font-size:14px;
    classDef generalization fill:#d4f9f9, font-weight:bold, font-size:14px;
    classDef insight fill:#f9d4d4, font-weight:bold, font-size:14px;
    classDef biases fill:#d4f9d4, font-weight:bold, font-size:14px;
    classDef progress fill:#d4d4f9, font-weight:bold, font-size:14px;
    A["Leslie Kaelbling ICLR 2020"] --> B["Understand computational mechanisms for intelligent robots. 1"]
    A --> C["Robot policies: programs mapping history to action. 2"]
    C --> D["Simple domains: simple policies. Complex: adaptive policies. 3"]
    C --> E["Policy representations: raw, value functions, planners, hierarchies. 4"]
    A --> F["Robots: engineered for narrow tasks or learning/adaptation. 5"]
    F --> G["Classical engineering for known tasks. RL for complex tasks. 6"]
    A --> H["Online planning: long horizons. Hierarchical: complex tasks. 7"]
    A --> I["Components: perception, planning, execution, control. HPN demo. 8"]
    A --> J["Experience costly. Priors/biases needed for efficient learning. 9"]
    J --> K["Approach: learn skills, perception, models to expand HPN. 10"]
    K --> L["Pre-image models integrate new skills in planner. 11"]
    L --> M["Gaussian process regression learns skill success mode. 12"]
    L --> N["Full level-set learning enables flexibility, e.g. varied grasps. 13"]
    K --> O["Learned operators compiled to balance planning and skills. 14"]
    O --> P["Lifted partial policies generalize, e.g. put object in box. 15"]
    J --> Q["Graph neural nets, relations guide combinatorial planning. 16"]
    A --> R["Right abstraction enables generalization across different tasks. 17"]
    A --> S["Human insight provides useful biases for robot learning. 18"]
    S --> T["Biases: planning algorithms, hierarchies, objects, convolutions. 19"]
    A --> U["Despite ML/RL progress, robots need structure for real learning. 20"]
    class A,B computational;
    class C policies;
    class D domains;
    class E representations;
    class F,G engineering;
    class H planning;
    class I components;
    class J,K learning;
    class L,M,N skills;
    class O,P partial;
    class Q search;
    class R generalization;
    class S insight;
    class T biases;
    class U progress;
```

Resume:

1.-The goal is to understand computational mechanisms needed for general-purpose intelligent robots that can handle variability in environments and tasks.

2.-Robot policies can be represented as a program π that maps the action/observation history to the next action, optimized over a distribution of domains.
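
A minimal sketch of this view in Python (the `Policy` interface and `ReactivePolicy` are illustrative names, not from the talk):

```python
from abc import ABC, abstractmethod
from typing import Any, Sequence, Tuple

# A history is the sequence of (action, observation) pairs seen so far.
History = Sequence[Tuple[Any, Any]]

class Policy(ABC):
    """Pi: a program mapping action/observation history to the next action."""

    @abstractmethod
    def act(self, history: History) -> Any:
        ...

class ReactivePolicy(Policy):
    """A memoryless special case: acts on the latest observation only."""

    def __init__(self, rule):
        self.rule = rule  # function: observation -> action

    def act(self, history: History) -> Any:
        _, last_obs = history[-1]
        return self.rule(last_obs)
```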

3.-Simple domains allow for simple policies, while complex/uncertain domains require more general/adaptive policies. Finding the optimal policy is technically challenging.
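
In symbols, one plausible reading of this objective (notation assumed here: $\mathcal{D}$ is the distribution over domains, $U(\pi, d)$ the expected utility of running policy $\pi$ in domain $d$):

$$\pi^{*} \in \operatorname*{argmax}_{\pi \in \Pi} \; \mathbb{E}_{d \sim \mathcal{D}}\left[\, U(\pi, d) \,\right]$$

The difficulty grows with the size of the policy class $\Pi$ and the breadth of $\mathcal{D}$.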

4.-Policies can be represented in various ways, e.g. as raw policies, value functions, planners with transition models, or hierarchical abstractions.

5.-Robots can be engineered for narrow known tasks or require learning/adaptation for broader uncertain task distributions.

6.-Classical engineering works for known narrow tasks. RL in simulation can compile simulators into policies for moderately complex tasks.

7.-Online planning allows handling longer horizons by re-planning, e.g. AlphaZero. Hierarchical planning enables very complex tasks.
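
A minimal sketch of the receding-horizon pattern (`plan`, `env.observe`, `env.step`, and `env.goal_reached` are hypothetical placeholders, not a real planner or environment):

```python
def replan_loop(env, plan, horizon=10, max_steps=100):
    """Online planning: plan a short horizon, execute one step, repeat."""
    for _ in range(max_steps):
        state = env.observe()
        if env.goal_reached(state):
            return state
        actions = plan(state, horizon)  # search forward from the current state
        env.step(actions[0])            # execute only the first planned action
    return env.observe()
```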

8.-Various components can enable complex robot behavior: perception, planning, hierarchical execution, low-level control. The HPN (Hierarchical Planning in the Now) system demonstrates this without learning.

9.-Experience is very costly when learning online in the real world. Priors/biases are needed to learn efficiently from limited samples.

10.-One approach is to learn new skills, perceptual capabilities, transition models to expand a system like HPN through learning.

11.-Learning pre-image models allows integrating a new primitive skill (e.g. pouring) as an operator in a task and motion planner.
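
A sketch of how a learned skill could be exposed to the planner as an operator with a pre-image (the `pour` parameters and thresholds are made-up illustrations, not HPN's actual representation):

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet

@dataclass(frozen=True)
class Operator:
    """A primitive skill wrapped for a task-and-motion planner."""
    name: str
    pre_image: Callable[[dict], bool]  # learned set of states where the skill succeeds
    effects: FrozenSet[str]            # symbolic facts the skill makes true

pour = Operator(
    name="pour",
    pre_image=lambda s: s["tilt_angle"] > 0.6 and s["cup_height"] < 0.2,
    effects=frozenset({"contents_transferred"}),
)
```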

12.-Gaussian process regression enables learning the "mode" in which a skill like pouring will succeed from few samples.
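
A minimal sketch using scikit-learn's GaussianProcessRegressor; the skill parameters (tilt angle, pour height) and the sample values are invented for illustration:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

# A few trial parameterizations of the pouring skill and the
# success score observed for each trial.
X = np.array([[0.4, 0.10], [0.7, 0.15], [0.9, 0.05], [0.5, 0.30]])
y = np.array([0.2, 0.9, 0.6, 0.1])

gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.2), alpha=1e-2)
gp.fit(X, y)

# Predicted success (and uncertainty) for a candidate parameter setting.
mean, std = gp.predict(np.array([[0.8, 0.12]]), return_std=True)
print(f"predicted success {mean[0]:.2f} +/- {std[0]:.2f}")
```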

13.-Learning the full level-set of successful parameters enables flexibility, e.g. pouring from different grasps if nominal is infeasible.
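
A sketch of extracting that level set by grid evaluation (`success_model` is assumed to be a fitted regressor such as the GP above; the 0.7 threshold and parameter ranges are arbitrary):

```python
import numpy as np

def success_level_set(success_model, threshold=0.7, resolution=50):
    """Return every grid point whose predicted success clears the threshold.

    Keeping the whole set, rather than just the best point, lets the
    planner fall back to an alternative parameterization (e.g. another
    grasp) when the nominal one is infeasible.
    """
    angles = np.linspace(0.0, 1.0, resolution)
    heights = np.linspace(0.0, 0.5, resolution)
    grid = np.array([[a, h] for a in angles for h in heights])
    scores = success_model.predict(grid)
    return grid[scores >= threshold]
```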

14.-Compiling learned operators into partial policies can provide a balance between flexibility of planning and efficiency of reactive skills.

15.-Lifted partial policies enable generalization, e.g. learning a policy for putting any object into any box at an abstract level.
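
A sketch of the lifting idea: the policy is written over free variables and bound to concrete objects at execution time (predicate and skill names are illustrative):

```python
def lifted_put_in_box(obj, box, state, skills):
    """A partial policy schema with free variables `obj` and `box`.

    Because it refers only to abstract relations (in, holding, clear),
    the same program applies to any (object, box) binding.
    """
    if state.holds("in", obj, box):
        return None                     # goal already achieved
    if not state.holds("holding", obj):
        return skills["pick"](obj)      # grab the object first
    if not state.holds("clear", box):
        return skills["clear"](box)     # make room in the box
    return skills["place"](obj, box)
```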

16.-Learning search control knowledge, e.g. with graph neural nets and relational features, can guide planning in large combinatorial spaces.
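
A toy sketch of the idea in NumPy: one round of message passing over a relational graph yields per-object scores that could rank candidate actions or successor states during search (the weights here are random placeholders, not a trained model):

```python
import numpy as np

def gnn_node_scores(features, adjacency, rng=np.random.default_rng(0)):
    """One message-passing layer plus a linear scoring head.

    features : (n_nodes, d) array of per-object features
    adjacency: (n_nodes, n_nodes) 0/1 matrix of pairwise relations
    """
    d = features.shape[1]
    w_self = rng.normal(size=(d, d))
    w_neigh = rng.normal(size=(d, d))
    w_out = rng.normal(size=(d,))

    messages = adjacency @ features @ w_neigh  # aggregate neighbor features
    hidden = np.tanh(features @ w_self + messages)
    return hidden @ w_out                      # one scalar score per node

# Three objects in a chain relation; the scores could prioritize
# which object the planner manipulates next.
f = np.eye(3)
a = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]])
print(gnn_node_scores(f, a))
```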

17.-Generalization across meaningfully different tasks is possible with the right state abstraction, e.g. clearing access to an object.
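
A sketch of such an abstraction (the `occludes` test and the tuple encoding are assumptions for illustration):

```python
def abstract_state(raw_state, target):
    """Map a raw geometric state to an abstract one.

    Two scenes with different objects and poses collapse to the same
    abstract state whenever the same pattern of obstacles blocks access
    to the target, so a policy learned here transfers across tasks.
    """
    blockers = frozenset(
        obj.category for obj in raw_state.objects
        if obj is not target and raw_state.occludes(obj, target)
    )
    return ("clear-access", target.category, blockers)
```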

18.-Human insight is still needed to provide useful algorithmic and structural biases for robot learning systems, especially in complex domains.

19.-Key biases include: algorithms like planning, architectures like hierarchies, state abstractions like objects, learning structures like convolutions.

20.-Over the past decades there has been major progress in ML/RL, but autonomous robots still require additional structure to learn efficiently in the real world, where experience is costly.

Knowledge Vault built by David Vivancos 2024