Concept Graph & Resume using Claude 3 Opus | Chat GPT4 | Gemini Adv | Llama 3:
Resume:
1.-Richard Sutton is a famous contributor to machine learning, especially reinforcement learning. He co-authored the well-known textbook "Reinforcement Learning: An Introduction".
2.-Sutton believes representation learning is a key problem in AI/ML that is finally getting proper attention and hard work.
3.-Sutton wants to convince the audience that the key benefit of representation learning is enabling faster subsequent learning.
4.-Other potential benefits of representation learning include greater expressive power, better generalization, and producing intuitively pleasing representations.
5.-A show of hands reveals mixed opinions on the key benefit - some agree it's faster learning, others favor expressive power or generalization.
6.-Sutton argues representation learning requires a slow initial learning period in order to subsequently enable fast learning on new problems.
7.-This implies representation learning requires non-stationary, continual learning rather than one-time batch learning in order to demonstrate fast later learning.
8.-Sutton proposes a challenge problem called "JEFF" (generic online feature finding) to directly test ability to learn representations that enable fast learning.
9.-JEFF is an online regression problem with a two-layer target network where the goal is to find the hidden unit features.
10.-The hidden unit features are randomly generated when each JEFF instance is created. Finding them enables fast learning of the changing output.
11.-JEFF avoids test set leakage, has no role for unsupervised learning, is simple to implement, and directly tests fast learning ability.
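The setup summarized in items 8-11 can be sketched as a toy environment. Everything below (class name, sizes, the 1% change probability, the use of linear-threshold hidden units) is an illustrative assumption for a minimal sketch, not Sutton's exact specification:

```python
import numpy as np

class JeffLikeTask:
    """Toy online-regression task in the spirit of items 8-11: a random
    hidden layer fixed at creation defines the target features, while the
    output weights drift, making the target function non-stationary."""

    def __init__(self, n_inputs=20, n_hidden=5, seed=0):
        self.rng = np.random.default_rng(seed)
        # Hidden features are randomly generated when the instance is created.
        self.W = self.rng.choice([-1.0, 1.0], size=(n_hidden, n_inputs))
        self.v = self.rng.normal(size=n_hidden)  # drifting output weights
        self.n_inputs = n_inputs

    def step(self):
        """Return one (input, target) example; occasionally change the output."""
        x = self.rng.choice([0.0, 1.0], size=self.n_inputs)
        h = (self.W @ x > 0).astype(float)   # hidden linear-threshold units
        y = float(self.v @ h)
        if self.rng.random() < 0.01:         # assumed rate of non-stationarity
            i = self.rng.integers(len(self.v))
            self.v[i] = self.rng.normal()    # re-draw one output weight
        return x, y
```

A learner that recovers the rows of `W` can track the drifting `v` quickly, which is exactly the "fast learning" the testbed is meant to measure.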
12.-Sutton presents results on JEFF demonstrating the benefits of searching for good features vs a fixed random feature baseline.
13.-Combining feature search with gradient descent performs better than either alone, showing they both contribute to efficient feature finding.
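Item 13's combination can be sketched in a generate-and-test style: gradient descent tunes the output weights while a tester occasionally culls the weakest hidden feature. The utility trace, replacement probability, and all names here are my illustrative assumptions, not the exact algorithm from the talk:

```python
import numpy as np

def search_plus_gd_step(W, v, util, x, y, rng, alpha=0.1, decay=0.99,
                        replace_prob=0.02):
    """One online update combining gradient descent on the output weights
    with feature search: the lowest-utility hidden feature is occasionally
    replaced by a fresh random candidate."""
    h = (W @ x > 0).astype(float)        # current feature outputs
    err = y - v @ h
    v += alpha * err * h                 # gradient step on output weights
    util *= decay                        # exponentially decayed utility trace
    util += (1 - decay) * np.abs(v) * h  # credit active, heavily used features
    if rng.random() < replace_prob:      # tester: cull the weakest feature
        worst = np.argmin(util)
        W[worst] = rng.choice([-1.0, 1.0], size=W.shape[1])
        v[worst] = 0.0
        util[worst] = util.mean()        # grace period for the newcomer
    return err
```

The two mechanisms play different roles: gradient descent exploits the current features, while random replacement explores the feature space, matching the summary's point that each contributes to efficient feature finding.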
14.-On non-stationary problems like a variant of MNIST with rotating labels, algorithms like backprop tend to do poorly and suffer catastrophic interference.
15.-A key to enabling fast learning and avoiding catastrophic interference appears to be adaptive per-feature learning rates that can preserve useful features.
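One concrete scheme for adaptive per-feature step sizes is Sutton's IDBD algorithm (1992) for a linear learner; whether this is the exact method behind the results in item 15 is my assumption. A minimal sketch:

```python
import numpy as np

def idbd_update(w, beta, h, x, target, theta=0.01):
    """One step of IDBD: each weight keeps its own log step-size beta[i],
    adapted by a meta step-size theta, so reliably useful inputs earn
    larger step-sizes and noisy or irrelevant ones earn smaller ones."""
    delta = target - w @ x
    beta += theta * delta * x * h          # meta-gradient on log step-sizes
    alpha = np.exp(beta)                   # per-weight step-sizes
    w += alpha * delta * x                 # ordinary LMS step, per-weight rate
    h = h * np.clip(1 - alpha * x * x, 0.0, None) + alpha * delta * x
    return w, beta, h, delta
```

The trace `h` tracks each weight's recent correlated updates; step sizes shrink where updates cancel out, which is one way useful, settled features get protected from interference.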
16.-Sutton argues the field of representation learning has strayed from the original goal of enabling fast learning, but this should be the focus.
17.-Achieving this requires moving to online continual learning settings. JEFF provides a well-controlled way to study this without methodological issues.
18.-Sutton has preliminary results on parts of JEFF but not yet on the full non-stationary feature-finding problem. He recommends pursuing this.
19.-An audience member notes their own work found sequential learning can lead to faster learning, with the speed-up emerging from consolidation during simulated sleep.
20.-Sutton agrees the rate of non-stationarity in JEFF could be varied, such as more slowly drifting changes rather than sudden shifts.
21.-Sutton likes the crude directness of sudden changes requiring fast adaptation, but agrees gradual changes are also worth considering.
22.-When asked about non-synthetic tasks with the desired properties, Sutton argues real life is full of repetitive but changing learning problems.
23.-Sutton resists the idea of explicitly signaling task changes to the learner, preferring the elegance of uninterrupted continual change.
24.-The shallow, two-layer formulation of JEFF is a necessary first step before considering deeper, hierarchical versions with features built from features.
25.-Sutton acknowledges JEFF as initially proposed doesn't involve hierarchical feature learning, but sees that as an important future direction to pursue.
26.-When features change at different rates, the representation learner should devote more learning resources to features that change more often.
27.-Sutton views finding features that enable fast learning as a new and unstudied problem of key importance that the field should pursue.
28.-Sutton apologizes for proposing JEFF without yet having complete results, but believes it is an important new research direction.
29.-An audience member suggests JEFF could be extended with a range of rates of change in different features.
30.-Sutton agrees, noting this occurs in the step-size adaptation results, and that learning resources should be allocated according to each feature's rate of change.
Knowledge Vault built by David Vivancos 2024