The End Of Knowledge - Vault 2 - ICLR (2014-2023)

graph LR
classDef selfSupervised fill:#f9d4d4, font-weight:bold, font-size:14px;
classDef curiosity fill:#d4f9d4, font-weight:bold, font-size:14px;
classDef evolution fill:#d4d4f9, font-weight:bold, font-size:14px;
classDef challenges fill:#f9f9d4, font-weight:bold, font-size:14px;
classDef future fill:#f9d4f9, font-weight:bold, font-size:14px;
A[Alexei Efros<br>ICLR 2021] --> B[Self-supervised learning: no categories,<br>datasets, objectives 1]
B --> C[Big companies solve tasks using labels 2]
B --> D[Self-supervision: bottom-up associations,<br>instance similarities 3]
D --> E[Humans categorize via associations<br>and prototypes 4]
D --> F[Early work: distances separate<br>similar, dissimilar instances 5]
F --> G['One against all' performed<br>like category-based 6]
D --> H[SimCLR: augmentations create<br>'pseudo-class' of variations 7]
H --> I[Augmentations choice: human supervision<br>affects performance 8]
D --> J[Video provides automatic augmentation<br>through correspondences 9]
J --> K[Contrastive walks learn<br>through cycle consistency 10]
J --> L[Dense pixel-centered walks:<br>promising optical flow-related 11]
B --> M[Biological agents: samples are<br>tests, then training 12]
B --> N[Machine learning repeats samples,<br>encouraging memorization 13]
B --> O[Self-supervision: no repeated epochs,<br>biological-like 14]
B --> P[Test-time training adapts models<br>to new samples 15]
P --> Q[Online test-time training adapts<br>to changing distributions 16]
A --> R[Genetic algorithms optimize fixed objectives 17]
R --> S[Evolutionary objectives emerge<br>through 'arms races' 18]
S --> T[Self-play, GANs: symmetric,<br>asymmetric 'arms races' 19]
A --> U[Prediction: emergent meta-objective<br>in complex worlds 20]
U --> V[Curiosity-driven exploration: prediction failure<br>as objective 21]
V --> W[Curious agents exhibit emergent<br>video game behaviors 22]
W --> X[Curious pong agents prefer<br>rallies over points 23]
V --> Y[Challenge: scaling curious exploration<br>to real robots 24]
Y --> Z[Real world: larger action spaces<br>require attention 25]
A --> AA[Multi-modal self-supervision: vision+sound,<br>vision+touch 26]
A --> AB[Curiosity, adversarial losses:<br>adaptive meta-objectives 27]
A --> AC[Real-world data reveals self-supervised<br>learning challenges 28]
A --> AD[Evolution doesn't optimize fitness<br>it emerges 29]
A --> AE[Adversarial setups may prevent<br>emergent learning shortcuts 30]
class B,C,D,H,I,J,K,L,M,N,O,P,Q,AA,AC selfSupervised;
class U,V,W,X,Y,Z,AB curiosity;
class R,S,T,AD,AE evolution;
class E,F,G challenges;

Summary:

1.-Self-supervised learning is exciting because it allows getting away from semantic categories, fixed datasets, and fixed objectives.

2.-Labels are expensive, but big companies can solve clearly defined tasks by hiring enough people to provide labels.

3.-Self-supervision enables moving from semantic categories based on shared properties to bottom-up associations and similarities between instances.

4.-Humans categorize based on bottom-up associations and prototypes (Rosch), not based on shared properties defining category membership (classical view).

5.-Early work tried to operationalize bottom-up visual categories by learning distances to separate similar and dissimilar instances.

6.-An ensemble of "one against all" classifiers performed as well as a category-based classifier.

7.-SimCLR uses image augmentations to create a "pseudo-class" of an instance's variations, contrasted against other instances.

8.-The choice of data augmentations is a form of human supervision that has a big effect on self-supervised learning performance.

9.-Video can provide automatic data augmentation through temporal correspondences across frames, similar to how infants learn.

10.-Contrastive random walk learns features by walking through video frames, using cycle consistency to get back to the starting patch.

11.-Running dense contrastive random walks on patches centered at each pixel is a promising direction related to optical flow.

12.-Biological agents never see the same data twice - each sample is first a test, then becomes training for the future.

13.-Machine learning usually sees the same sample repeatedly, encouraging memorization. Data augmentation helps get away from this a bit.

14.-With self-supervision, data is free, so there's no reason to do multiple epochs - treat each sample once like biological agents.

15.-Test-time training adapts a model to a new test sample using a self-supervised loss, to handle distribution shift.

16.-Online test-time training allows continuously adapting to a smoothly changing data distribution.

17.-Genetic algorithms just optimize a fixed objective - the magic of evolution is that it doesn't optimize any objective.

18.-Evolutionary objectives emerge through "arms races" - e.g. pressure to miniaturize calculators created the emergent objective of fitting in a pocket.

19.-Self-play is a symmetric "arms race" of an agent vs itself, but still has a specified objective. GANs are an asymmetric "arms race".

20.-Prediction can be an emergent meta-objective - in a complex world, one can always try to predict further. The world is the "adversary".

21.-Curiosity-driven exploration uses failure to predict as an emergent objective. The agent tries to predict the consequences of its actions and gets "curious" when it is wrong.

22.-With no external reward, just curiosity, emergent behaviors arise in video games, like Mario exploring and killing enemies.

23.-For curious agents playing Pong, keeping the rally going emerges as more "interesting" than scoring points.

24.-The challenge is getting curious exploration to work for real-world robots. Curiosity works in video games because the action space is small.

25.-The real world has much larger action spaces. Attention is needed to prioritize what to be curious about. Babies have a "curriculum" of curiosity.

26.-Combining multiple modalities like vision+sound or vision+touch is a good way to study multi-modal self-supervised learning from the bottom up.

27.-Curiosity and adversarial losses are "meta-objectives" that can adjust to the world and are hard to overfit, unlike fixed losses.

28.-We need to run self-supervised learning on real-world data to uncover the real challenges. Theories and formalisms will follow from well-posed problems.

29.-Evolution doesn't optimize for fitness - fitness emerges from evolution. Explicitly encoding an objective leads to shortcuts.

30.-Adversarial setups may help push the objective back and avoid shortcuts in emergent learning, but the fundamental "loss" remains an open question.
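
Item 7 above can be made concrete with a small sketch. It is a simplified, single-anchor contrastive loss in plain Python, not the batched loss SimCLR actually trains with: two augmented views of one image should score as more similar to each other than to views of other images. The embeddings, the `contrastive_loss` name and the temperature value are all assumptions made for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def contrastive_loss(anchor, positive, negatives, temperature=0.5):
    """Negative log-softmax of the positive pair's similarity (hypothetical helper)."""
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    m = max(logits)  # subtract the max for numerical stability
    log_sum = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_sum - logits[0]

# Made-up 2-D embeddings: two augmented views of one image, plus two other images.
view_a = [1.0, 0.1]
view_b = [0.9, 0.2]
others = [[-1.0, 0.3], [0.0, -1.0]]

# Low loss when the "pseudo-class" (both views) agrees ...
aligned = contrastive_loss(view_a, view_b, others)
# ... and a higher loss when a different image is wrongly treated as the positive.
confused = contrastive_loss(view_a, others[0], [view_b, others[1]])
print(aligned < confused)
```

Which augmentations generate `view_a` and `view_b` is exactly the human choice that item 8 points to.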
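
The cycle-consistency idea in item 10 can be sketched in the same spirit. The version below is a toy in plain Python, assuming each frame is a list of patch feature vectors: transition probabilities between adjacent frames come from a softmax over feature similarities, the walk goes forward through the frames and back, and the loss penalizes a walker that does not return to its starting patch. The function names, the temperature and the feature values are illustrative assumptions, not the published implementation.

```python
import math

def softmax(row, temperature):
    m = max(x / temperature for x in row)
    exps = [math.exp(x / temperature - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def transition(feats_src, feats_dst, temperature=0.1):
    """Row-stochastic matrix: P[i][j] is the chance of stepping from patch i to patch j."""
    return [softmax([sum(a * b for a, b in zip(f, g)) for g in feats_dst], temperature)
            for f in feats_src]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def cycle_loss(frames, temperature=0.1):
    """Walk forward and back (needs at least two frames); penalize not returning home."""
    path = frames + frames[-2::-1]  # palindrome: t0, t1, ..., tk, ..., t1, t0
    walk = transition(path[0], path[1], temperature)
    for src, dst in zip(path[1:], path[2:]):
        walk = matmul(walk, transition(src, dst, temperature))
    n = len(walk)
    # Cross-entropy against the identity: patch i should come back to patch i.
    return -sum(math.log(walk[i][i]) for i in range(n)) / n

# Two patches per frame. In `consistent` the patches stay distinguishable over time;
# in `ambiguous` the second frame's patches look identical, so the walker gets lost.
consistent = [[[1.0, 0.0], [0.0, 1.0]], [[0.9, 0.1], [0.1, 0.9]]]
ambiguous = [[[1.0, 0.0], [0.0, 1.0]], [[0.5, 0.5], [0.5, 0.5]]]
print(cycle_loss(consistent) < cycle_loss(ambiguous))
```

No labels appear anywhere: the only supervision is that the start of the walk is known, which is what makes video a source of "free" training signal.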
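
Items 15 and 16 describe adapting a model on each test sample before predicting. The following is a deliberately tiny stand-in, not the method from the talk: a single scalar `center` plays the role of the shared feature extractor, the squared distance `(x - center)^2` plays the role of the self-supervised loss, and the input stream drifts smoothly over time. All names, the learning rate and the synthetic stream are assumptions.

```python
def ttt_step(center, x, lr=0.05):
    """One gradient step on the stand-in self-supervised loss (x - center)**2."""
    grad = -2.0 * (x - center)
    return center - lr * grad

def predict(center, x):
    """Fixed classification head: the sign of the centered input."""
    return 1 if x - center > 0 else 0

def run_stream(adapt, steps=200, drift_rate=0.05):
    """Accuracy on a stream whose inputs drift upward a little at every step."""
    center, correct = 0.0, 0
    for t in range(steps):
        label = t % 2
        x = drift_rate * t + (1.0 if label else -1.0)
        if adapt:
            # The test sample is first used for self-supervised adaptation ...
            center = ttt_step(center, x)
        # ... and only then classified.
        correct += predict(center, x) == label
    return correct / steps

frozen_acc = run_stream(adapt=False)
adapted_acc = run_stream(adapt=True)
print(frozen_acc, adapted_acc)
```

On this synthetic stream the frozen model drifts toward chance while the adapted one keeps up, which mirrors the claim that online adaptation suits smoothly changing distributions. A real setup would update network weights on a richer self-supervised task.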
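
Item 21's "failure to predict as objective" can be illustrated with a minimal sketch, assuming a tabular forward model rather than the learned neural model a real agent would use. The intrinsic reward is the squared error of the model's guess about the next state, and the model learns from every transition, so repeated experiences become boring while novel ones stay rewarding. `ForwardModel`, its learning rate and the toy states are hypothetical.

```python
class ForwardModel:
    """Hypothetical tabular forward model: predicts the next state from (state, action)."""

    def __init__(self, lr=0.5):
        self.pred = {}
        self.lr = lr

    def curiosity_reward(self, state, action, next_state):
        """Return the squared prediction error, then learn from the observed transition."""
        guess = self.pred.get((state, action), 0.0)
        error = next_state - guess
        self.pred[(state, action)] = guess + self.lr * error
        return error ** 2

model = ForwardModel()
# Repeating the same transition: the reward shrinks as the model learns to predict it.
familiar = [model.curiosity_reward(0, "right", 1.0) for _ in range(5)]
# A transition the agent has never tried is surprising, so it is rewarded.
novel = model.curiosity_reward(0, "jump", 3.0)
print(familiar, novel)
```

Because the reward depends on what the model has already learned, the objective keeps moving as the agent improves, which is the sense in which item 27 calls curiosity a "meta-objective".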

Knowledge Vault built by David Vivancos 2024