The End Of Knowledge - Vault 2 - ICLR (2014-2023) - Yoshua Bengio & Yann LeCun ICLR 2020

graph LR
classDef ML fill:#f9d4d4, font-weight:bold, font-size:14px;
classDef SSL fill:#d4f9d4, font-weight:bold, font-size:14px;
classDef challenges fill:#d4d4f9, font-weight:bold, font-size:14px;
classDef energy fill:#f9f9d4, font-weight:bold, font-size:14px;
classDef contrastive fill:#f9d4f9, font-weight:bold, font-size:14px;
classDef regularized fill:#d4f9f9, font-weight:bold, font-size:14px;
classDef S1S2 fill:#f9d4d4, font-weight:bold, font-size:14px;
classDef causality fill:#d4f9d4, font-weight:bold, font-size:14px;
A[Yoshua Bengio & Yann LeCun ICLR 2020] --> B[Future ML/AI: self-supervised learning, dependencies, blanks. 1]
A --> C[SSL: quick learning, little supervision, like babies. 2]
A --> D[AI challenges: supervision, reasoning, planning. 3]
B --> E[SSL: predicting missing/future info, multiple possibilities. 4]
B --> F[Energy models: compatibility, no probabilities needed. 5]
F --> G[Train energy models with contrastive methods. 6]
F --> H[Estimating densities problematic, creates narrow canyons. 7]
G --> I[Contrastive functions: push down data, up contrast. 8]
B --> J[SSL successful in NLP, not images. 9]
J --> K[Contrastive embedding for images computationally expensive. 10]
G --> L[GANs as contrastive energy-based methods. 11]
B --> M[Regularized latent variables limit info capacity. 12]
M --> N[VAEs: regularized latent energy models, add noise. 13]
M --> O[Graph/temporal regularization for good representations. 14]
M --> P[Conditional regularized models predict multi-modal futures. 15]
B --> Q[SSL best for AI common sense learning. 16]
A --> R[System 1: fast, intuitive. System 2: slow. 17]
R --> S[Extend deep learning to System 2 tasks. 18]
A --> T[Semantic variables have sparse graphical structure. 19]
T --> U[Simple semantics-language relationship. Reusable knowledge. 20]
T --> V[Local changes in semantic variable distribution. 21]
S --> W[Systematic generalization by recombining concepts. 22]
S --> X[Combine deep learning and symbolic AI advantages. 23]
R --> Y[Conscious processing focuses attention, broadcasts, stores. 24]
A --> Z[Language: perceptual and semantic knowledge. 25]
Z --> AA['Consciousness prior': sparse dependencies, strong predictions. 26]
V --> AB[Localized changes enable faster adaptation, meta-learning. 27]
T --> AC[Learning speed uncovers causal graph structure. 28]
W --> AD[Recurrent independent mechanisms improve generalization. 29]
A --> AE[Core ideas: recombinable knowledge, local changes. 30]
class A,B,C,E,F,G,H,I,J,K,L,M,N,O,P,Q SSL;
class D challenges;
class R,S,W,X,Y,Z,AA S1S2;
class T,U,V,AB,AC,AD,AE causality;

Summary:

1.-The future of machine learning and AI is self-supervised learning, which involves learning dependencies between variables and filling in blanks.

2.-Self-supervised learning may enable machines to learn quickly with little supervision or interaction, similar to how babies learn basic concepts.

3.-The main challenges in AI are reducing supervision requirements, learning to reason beyond fixed steps, and learning to plan complex actions.

4.-Self-supervised learning involves predicting missing or future information from known information. Predictions must allow for multiple possibilities.

5.-Energy-based models can handle uncertainty by measuring compatibility between observed and predicted variables without requiring probabilities.

6.-Energy-based models can be trained using contrastive methods that push energy down on data points and up elsewhere.

7.-Probabilistic methods that estimate densities are problematic because they create narrow canyons in the energy function that aren't useful for inference.

8.-Contrastive objective functions push down energy of data points and up on contrasting points with some margin.

9.-Self-supervised learning methods like BERT have been very successful in NLP but not as much for images.

10.-Contrastive embedding methods for images are computationally expensive as there are many ways for images to be different.

11.-GANs can be interpreted as contrastive energy-based methods that shape the energy function.

12.-Regularized latent variable methods limit the information capacity of the latent variable to regularize the volume of low-energy space, as in sparse coding.

13.-Variational autoencoders are regularized latent variable energy-based models that add noise to the latent code to limit information.

14.-Graph-based and temporal continuity regularization can yield good representations by exploiting similarity structure or temporal predictability.

15.-Conditional versions of regularized latent variable models enable learning to predict multi-modal futures, as in vehicle trajectory prediction.

16.-Self-supervised learning is the best current approach for common-sense learning in AI. Scaling supervised or reinforcement learning is insufficient.

17.-System 1 tasks are fast, intuitive, and implicit, and are where current deep learning excels. System 2 tasks are slow, sequential, and explicit.

18.-Extending deep learning to System 2 tasks can enable reasoning, planning, and systematic generalization through recombining semantic concepts.

19.-The joint distribution of semantic variables has a sparse graphical-model structure. The variables often relate to causality, agents, intentions, actions, and objects.

20.-A simple relationship exists between semantic variables and language. Pieces of knowledge, expressed as rules, can be reused across instances.

21.-Changes in the distribution of semantic variables are local, e.g. due to causal interventions, with the rest of the model unchanged.

22.-Systematic generalization involves dynamically recombining concepts to explain novel observations, addressing current deep learning's lack of robustness to distribution shift.

23.-The goal is to combine the advantages of deep learning (grounded representations, distributed symbols, uncertainty handling) with symbolic AI's systematic generalization.

24.-Sequential conscious processing focuses attention on subsets of information which are broadcast and stored to condition subsequent processing.

25.-Language understanding requires combining System 1 perceptual knowledge with System 2 semantic knowledge in a grounded way.

26.-The "consciousness prior" posits sparse dependencies between semantic variables, enabling strong predictions from few variables, as in language.

27.-Under the localized distributional change hypothesis, changes in abstract semantic space are localized, enabling faster adaptation and meta-learning.

28.-Empirical results show learning speed can uncover causal graph structure. Parametrizing graphs by edges enables causal discovery.

29.-The recurrent independent mechanisms architecture, with attention between modules, improves out-of-distribution generalization by dynamically recombining stable modules.

30.-The core ideas are decomposing knowledge into recombinable pieces with sparse dependencies, and local changes in distribution that enable fast learning and inference.
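
Points 6 and 8 describe contrastive objectives that push energy down on data points and up on contrasting points with some margin. A minimal sketch of one margin-based form (a hinge loss) follows. The linear toy energy and the names `energy`, `hinge_contrastive_loss`, `w`, and `margin` are assumptions for illustration, not taken from the talk.

```python
def energy(w, x, y):
    """Toy energy (assumed): squared error between y and the prediction w * x.
    Low energy means x and y are compatible."""
    return (y - w * x) ** 2


def hinge_contrastive_loss(w, x, y_data, y_contrast, margin=1.0):
    """Zero once the contrastive point's energy exceeds the data point's
    energy by at least `margin`; positive otherwise."""
    e_data = energy(w, x, y_data)
    e_contrast = energy(w, x, y_contrast)
    return max(0.0, margin + e_data - e_contrast)


# A contrastive point far from the data already satisfies the margin ...
far = hinge_contrastive_loss(w=2.0, x=1.0, y_data=2.0, y_contrast=4.0)
# ... while a nearby one still incurs a loss and would be pushed up.
near = hinge_contrastive_loss(w=2.0, x=1.0, y_data=2.0, y_contrast=2.5)
print(far, near)
```

Minimizing this loss with respect to `w` lowers the energy of the observed pair and raises the energy of the contrastive pair only while the margin is violated.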
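
Points 4, 5, and 15 say that predictions must allow multiple possibilities and that an energy function can score compatibility without probabilities. The sketch below illustrates this with a latent variable restricted to two values, so two different outputs are equally compatible with the same input. The energy form and the names `latent_energy`, `free_energy`, and `predict_modes` are hypothetical and chosen only for this example.

```python
def latent_energy(x, y, z):
    """Toy energy (assumed): y is compatible with x when y == z * x."""
    return (y - z * x) ** 2


def free_energy(x, y, latents=(-1.0, 1.0)):
    """Energy of (x, y) after minimizing over the latent variable z.
    Restricting z to a small set limits its information capacity."""
    return min(latent_energy(x, y, z) for z in latents)


def predict_modes(x, candidates, latents=(-1.0, 1.0), tol=1e-9):
    """Return every candidate y whose minimized energy is (near) zero."""
    return [y for y in candidates if free_energy(x, y, latents) <= tol]


# Both y = -2 and y = 2 are compatible with x = 2; y = 0 is not.
print(predict_modes(2.0, [-2.0, 0.0, 2.0]))
```

No normalization over `y` is needed: inference only compares energies.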
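
Point 13 notes that variational autoencoders add noise to the latent code to limit its information. A minimal sketch of that single step is given below, assuming independent Gaussian noise with a fixed standard deviation. The function `noisy_code` and its parameters are illustrative names, and the encoder, decoder, and training loss are omitted.

```python
import random


def noisy_code(mean, sigma, rng):
    """Return the latent code `mean` with independent Gaussian noise of
    standard deviation `sigma` added to each coordinate."""
    return [m + sigma * rng.gauss(0.0, 1.0) for m in mean]


rng = random.Random(0)  # seeded only to make the example repeatable
code = [1.0, -2.0]

clean = noisy_code(code, 0.0, rng)  # sigma = 0: the code passes through unchanged
noisy = noisy_code(code, 0.5, rng)  # sigma > 0: nearby codes become harder to tell apart
print(clean, noisy)
```

The larger `sigma` is relative to the spread of the codes, the less information the decoder can recover from them.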
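
Points 24 and 29 describe attention that focuses on a subset of information or modules at each step. One way to sketch this is a softmax over relevance scores followed by a hard top-k selection. The hard top-k choice, the scores, and the names `softmax` and `select_top_k` are assumptions for illustration; the summary states only that attention selects subsets.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def select_top_k(scores, k):
    """Return the indices of the k highest-weighted items (in index order)
    together with the full attention weights."""
    weights = softmax(scores)
    ranked = sorted(range(len(scores)), key=lambda i: weights[i], reverse=True)
    return sorted(ranked[:k]), weights


# Four hypothetical modules; only the two most relevant are selected.
active, weights = select_top_k([0.1, 2.0, -1.0, 1.5], k=2)
print(active)
```

Only the selected items would be updated or broadcast at this step; the rest keep their previous state.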

Knowledge Vault built by David Vivancos 2024