Alexei Efros ICLR 2021 - Invited Talk - Self-Supervision for Learning from the Bottom Up
1.-Self-supervised learning is exciting because it lets us move away from semantic categories, fixed datasets, and fixed objectives.

2.-Labels are expensive, but big companies can solve clearly defined tasks by hiring enough people to provide labels.

3.-Self-supervision enables moving from semantic categories based on shared properties to bottom-up associations and similarities between instances.

4.-Humans categorize based on bottom-up associations and prototypes (Rosch), not based on shared properties defining category membership (classical view).

5.-Early work tried to operationalize bottom-up visual categories by learning distances to separate similar and dissimilar instances.

6.-An ensemble of "one against all" classifiers, one per instance, performed as well as a category-based classifier.
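The instance-level "one against all" ensemble described above can be sketched as follows. Each positive instance gets its own linear classifier that separates it from all negatives, and a query is scored by its best-matching instance. This is a minimal sketch under stated assumptions: plain weighted logistic regression on toy 2-D points stands in for whatever classifiers and features the original work used, and the names `train_exemplar` and `ensemble_score` are hypothetical.

```python
import numpy as np


def train_exemplar(x_pos, X_neg, steps=500, lr=0.5):
    """Logistic regression that separates one positive instance from all negatives."""
    X = np.vstack([x_pos[None, :], X_neg])
    y = np.concatenate([[1.0], np.zeros(len(X_neg))])
    # Up-weight the single positive so it counts as much as all negatives together.
    weights = np.concatenate([[float(len(X_neg))], np.ones(len(X_neg))])
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted probability of "is this instance"
        g = weights * (p - y)                    # weighted gradient of the logistic loss
        w -= lr * (X.T @ g) / weights.sum()
        b -= lr * g.sum() / weights.sum()
    return w, b


def ensemble_score(models, x):
    """Score a query by its best-matching per-instance classifier (max over the ensemble)."""
    return max(float(x @ w + b) for w, b in models)


rng = np.random.default_rng(0)
positives = rng.normal(loc=2.0, scale=0.3, size=(5, 2))    # toy instances of one category
negatives = rng.normal(loc=-2.0, scale=0.3, size=(50, 2))  # toy background instances
models = [train_exemplar(p, negatives) for p in positives]  # one classifier per instance
score_pos = ensemble_score(models, np.array([2.0, 2.0]))
score_neg = ensemble_score(models, np.array([-2.0, -2.0]))
print(score_pos, score_neg)
```

No category label is ever used: the "category" is only implicit in which instances fire on a query.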

7.-SimCLR uses image augmentations to create a "pseudo-class" of an instance's variations, contrasted against other instances.
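A SimCLR-style objective can be sketched with the NT-Xent (normalized temperature-scaled cross-entropy) loss: the two augmented views of each instance form the positive pair, and every other view in the batch acts as a negative. This sketch assumes the embeddings are already computed; the encoder, the augmentations (simulated here with small Gaussian noise), and the temperature `tau=0.5` are placeholders rather than the paper's settings.

```python
import numpy as np


def nt_xent(z1, z2, tau=0.5):
    """NT-Xent loss: row i of z1 and row i of z2 are two augmented views of instance i."""
    z = np.vstack([z1, z2])
    z = z / np.linalg.norm(z, axis=1, keepdims=True)  # unit vectors, so dot product = cosine similarity
    sim = z @ z.T / tau
    np.fill_diagonal(sim, -np.inf)                     # a view is never compared with itself
    n = len(z1)
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(n)])  # index of each view's partner
    # Numerically stable log-softmax over each row, read off at the positive partner.
    row_max = sim.max(axis=1, keepdims=True)
    log_prob = sim - row_max - np.log(np.exp(sim - row_max).sum(axis=1, keepdims=True))
    return float(-log_prob[np.arange(2 * n), pos].mean())


rng = np.random.default_rng(0)
base = rng.normal(size=(8, 16))                    # 8 instances, 16-d embeddings (toy)
view1 = base + 0.05 * rng.normal(size=base.shape)  # stand-ins for two augmentations
view2 = base + 0.05 * rng.normal(size=base.shape)
aligned = nt_xent(view1, view2)           # correct pairing: each view matched to its own instance
mismatched = nt_xent(view1, view2[::-1])  # wrong pairing: the "positives" are other instances
print(aligned, mismatched)
```

Each instance effectively becomes its own pseudo-class: the loss is low only when a view is closer to its partner than to every other instance in the batch.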

8.-The choice of data augmentations is a form of human supervision and has a big effect on self-supervised learning performance.

9.-Video can provide automatic data augmentation through temporal correspondences across frames, similar to how infants learn.

10.-The contrastive random walk learns features by walking through video frames, using cycle consistency to return to the starting patch.
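The cycle-consistency objective can be sketched as follows, assuming patch embeddings for each frame are already given (the encoder and the temperature `tau=0.1` are assumptions). Transitions between adjacent frames are softmax-normalized affinities, the walk goes forward through the clip and then back, and the loss is the negative log-probability of ending on the starting patch. In training, this loss would be backpropagated into the encoder; here it is only evaluated on two toy feature sets.

```python
import numpy as np


def transition(a, b, tau=0.1):
    """Row-stochastic matrix for stepping from the patches of frame a to those of frame b."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    logits = a @ b.T / tau
    logits -= logits.max(axis=1, keepdims=True)  # stabilize the softmax
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)


def cycle_loss(frames, tau=0.1):
    """Walk forward through the clip and back; each walker should end on its start patch."""
    path = list(frames) + list(frames[-2::-1])  # palindrome: t0, t1, ..., tk, ..., t1, t0
    walk = np.eye(len(frames[0]))
    for a, b in zip(path[:-1], path[1:]):
        walk = walk @ transition(a, b, tau)     # chain the per-step transition matrices
    # Cross-entropy against the identity: -log P(return to the start patch).
    return float(-np.log(np.diag(walk) + 1e-12).mean())


rng = np.random.default_rng(0)
base = rng.normal(size=(6, 32))  # 6 patches with 32-d features (toy)
consistent = [base + 0.05 * rng.normal(size=base.shape) for _ in range(4)]  # features that track patches
unrelated = [rng.normal(size=base.shape) for _ in range(4)]                # features with no correspondence
print(cycle_loss(consistent), cycle_loss(unrelated))
```

No correspondence labels are needed: the only target is "come back to where you started", and features that track patches over time are what make that target reachable.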

11.-Dense contrastive random walks, on patches centered at each pixel, are a promising direction related to optical flow.

12.-Biological agents never see the same data twice - each sample is first a test, then becomes training for the future.

13.-Machine learning models usually see the same samples repeatedly, which encourages memorization. Data augmentation helps get away from this a bit.

14.-With self-supervision, data is free, so there's no reason to do multiple epochs: see each sample only once, as biological agents do.

15.-Test-time training adapts a model to a new test sample using self-supervised loss, to handle distribution shift.

16.-Online test-time training allows continuously adapting to a smoothly changing data distribution.

17.-Genetic algorithms just optimize a fixed objective - the magic of evolution is that it doesn't optimize any objective.

18.-Evolutionary objectives emerge through "arms races" - e.g. pressure to miniaturize calculators created the emergent objective of fitting in a pocket.

19.-Self-play is a symmetric "arms race" of an agent vs itself, but still has a specified objective. GANs are an asymmetric "arms race".

20.-Prediction can be an emergent meta-objective - in a complex world, one can always try to predict further. The world is the "adversary".

21.-Curiosity-driven exploration uses failure to predict as an emergent objective. Agent tries to predict consequences of actions and gets "curious" when wrong.
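The curiosity signal can be sketched as the prediction error of a forward model that is trained online: transitions the model cannot yet predict yield a high intrinsic reward, and that reward fades as they become predictable. This is a simplified sketch under assumptions: a linear forward model on raw toy states, hypothetical dynamics `A`, and random actions in place of a policy. Published curiosity methods typically predict in a learned feature space and feed the reward to a reinforcement learning algorithm; both are omitted here.

```python
import numpy as np


class ForwardModel:
    """Linear model that predicts the next state from the current state and a one-hot action."""

    def __init__(self, state_dim, n_actions, lr=0.1):
        self.W = np.zeros((state_dim, state_dim + n_actions))
        self.n_actions = n_actions
        self.lr = lr

    def _inputs(self, state, action):
        onehot = np.zeros(self.n_actions)
        onehot[action] = 1.0
        return np.concatenate([state, onehot])

    def curiosity_reward(self, state, action, next_state):
        """Intrinsic reward = squared prediction error; then one SGD step to reduce it."""
        x = self._inputs(state, action)
        err = self.W @ x - next_state
        self.W -= self.lr * np.outer(err, x)  # gradient of 0.5 * ||W x - next_state||^2
        return float(err @ err)


rng = np.random.default_rng(0)
A = np.array([[0.9, -0.2], [0.2, 0.9]])  # hypothetical toy dynamics
action_effect = np.eye(2)                 # hypothetical effect of each of the 2 actions
model = ForwardModel(state_dim=2, n_actions=2)
rewards = []
for _ in range(300):
    s = rng.normal(size=2)
    a = int(rng.integers(2))              # random actions stand in for a policy
    s_next = A @ s + action_effect[a]
    rewards.append(model.curiosity_reward(s, a, s_next))
print(np.mean(rewards[:20]), np.mean(rewards[-20:]))
```

Once the dynamics are learned the reward collapses, so an agent maximizing it is pushed elsewhere, toward transitions it still fails to predict.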

22.-With no external reward, just curiosity, emergent behaviors arise in video games, like Mario exploring and killing enemies.

23.-For curious agents playing pong, keeping the rally going emerges as more "interesting" than scoring points.

24.-The challenge is getting curious exploration to work for real-world robots. Curiosity works in video games because the action space is small.

25.-The real world has much larger action spaces. Attention is needed to prioritize what to be curious about. Babies have a "curriculum" of curiosity.

26.-Combining multiple modalities like vision+sound or vision+touch is a good way to study multi-modal self-supervised learning from the bottom up.

27.-Curiosity and adversarial losses are "meta-objectives" that can adjust to the world and are hard to overfit, unlike fixed losses.

28.-We need to run self-supervised learning on real-world data to uncover the real challenges. Theories and formalisms will follow from well-posed problems.

29.-Evolution doesn't optimize for fitness - fitness emerges from evolution. Explicitly encoding an objective leads to shortcuts.

30.-Adversarial setups may help push the objective back and avoid shortcuts in emergent learning, but the fundamental "loss" remains an open question.

