Knowledge Vault 2/52 - ICLR 2014-2023
Leon Bottou ICLR 2019 - Invited Talk - Learning Representations Using Causal Invariance

Concept Graph & Summary using Claude 3 Opus | ChatGPT-4 | Gemini Advanced | Llama 3:

```mermaid
graph LR
  classDef learning fill:#f9d4d4, font-weight:bold, font-size:14px;
  classDef statistical fill:#d4f9d4, font-weight:bold, font-size:14px;
  classDef environments fill:#d4d4f9, font-weight:bold, font-size:14px;
  classDef invariance fill:#f9f9d4, font-weight:bold, font-size:14px;
  classDef methods fill:#f9d4f9, font-weight:bold, font-size:14px;
  A[Leon Bottou<br>ICLR 2019] --> B[Learning systems outperform<br>heuristics with data 1]
  A --> C[Statistical algorithms optimize,<br>may not generalize 2]
  A --> D[Nature's data from different<br>biased environments 3]
  D --> E[Robust learning minimizes<br>error across environments 4]
  D --> F[Extrapolation to new<br>environments needed 5]
  A --> G[Invariance related to causation 6]
  G --> H[Learn environment-independent<br>representation 7]
  G --> I[Invariant predictor recovers<br>target's direct causes 8]
  G --> J[Adversarial domain adaptation<br>learns invariant representation 9]
  A --> K[Multiple environments define<br>domain for extrapolation 10]
  K --> L[Linear regression: S matrix<br>for error minimization 11]
  K --> M[High-rank invariant solutions<br>via cosine direction 12]
  K --> N[Frozen dummy layer<br>penalizes gradient 13]
  K --> O['Colored MNIST' overcomes<br>unstable color reliance 14]
  K --> P[Invariance regularizer non-convex,<br>challenging to scale 15]
  A --> Q[Realizable problems: invariance<br>over training supports 16]
  A --> R[Non-realizable: find invariant<br>representation and predictor 17]
  A --> S[Statistical proxy, environment<br>info improves stability 18]
  A --> T[Invariance enables extrapolation,<br>not just interpolation 19]
  A --> U[Invariance informs causal<br>inference with interventions 20]
  A --> V[Learn invariant representation<br>to enforce invariance 21]
  A --> W[Realizable problems: efficiently<br>find perfect predictor 22]
  A --> X[Meta-learning learns transferable<br>representations 23]
  A --> Y[Large models may exhibit<br>invariance with data, compute 24]
  A --> Z[Learn stable properties across<br>environments to extrapolate 25]
  class B,Q,R,W learning;
  class C,S statistical;
  class D,E,F,K,T environments;
  class G,H,I,J,U,V,Z invariance;
  class L,M,N,O,P,X,Y methods;
```


1.-Machine learning is useful when formal problem specifications are lacking. With enough data, learning systems can outperform heuristic programs.

2.-Statistical algorithms optimize for the training data, but may miss the point and not generalize well due to spurious correlations.

3.-Nature doesn't shuffle data like we do in machine learning. Data comes from different environments with different biases.

4.-Robust learning aims to minimize the maximum error across environments. This interpolates but does not extrapolate beyond convex combinations of environments.
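A minimal numeric sketch of why minimax robustness only interpolates (the per-environment risks here are made up for illustration): whatever weighting the max picks out, the minimizer of the worst-case risk lies between the environment optima, never outside them.

```python
import numpy as np

# Hypothetical 1-D problem: each environment e prefers a different optimum w_e.
env_optima = [0.0, 2.0]

def risk(w, w_e):
    """Squared-error risk of parameter w in an environment with optimum w_e."""
    return (w - w_e) ** 2

ws = np.linspace(-3.0, 5.0, 8001)
# Robust learning: minimize the worst-case (max) risk over environments.
max_risk = np.max([risk(ws, w_e) for w_e in env_optima], axis=0)
w_robust = ws[np.argmin(max_risk)]

print(w_robust)  # lands between 0 and 2: interpolation, not extrapolation
```

An environment whose optimum sits outside [0, 2] would be badly served by this solution, which is the point of the argument.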

5.-In some applications, extrapolation to new environments is needed, not just interpolation between training environments. Search engines are one example.

6.-Invariance is related to causation. To predict interventions, you need the intervention properties and what remains invariant.

7.-The goal is to learn a representation in which an invariant predictor exists across environments, ignoring spurious correlations.
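This goal has a compact formulation in the companion paper "Invariant Risk Minimization" (Arjovsky, Bottou, Gulrajani, Lopez-Paz, 2019), which grew out of this line of work. Sketching it here with $R^e$ the risk in environment $e$, $\Phi$ the representation, and $w$ the predictor on top:

```latex
\min_{\Phi,\,w}\ \sum_{e \in \mathcal{E}_{tr}} R^e(w \circ \Phi)
\quad \text{subject to} \quad
w \in \arg\min_{\bar{w}}\, R^e(\bar{w} \circ \Phi)
\ \ \text{for all } e \in \mathcal{E}_{tr}
```

The constraint says the same predictor $w$ must be simultaneously optimal in every training environment, which is what excludes spurious, environment-dependent correlations from $\Phi$.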

8.-Peters et al. 2016 considered interventions on known variables in a causal graph. The invariant predictor recovers the target's direct causes.
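A toy sketch of the invariance-testing idea (not Peters et al.'s actual statistical procedure, which uses formal hypothesis tests; the data generator is invented for illustration): accept a feature subset if its per-environment regression coefficients agree. Only the direct cause should pass.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_env(n, noise_scale):
    """Toy SCM: x1 -> y -> x2; only the y -> x2 mechanism changes across envs."""
    x1 = rng.normal(size=n)
    y = x1 + 0.1 * rng.normal(size=n)
    x2 = y + noise_scale * rng.normal(size=n)  # child of y, environment-dependent
    return np.column_stack([x1, x2]), y

envs = [make_env(20000, 0.1), make_env(20000, 2.0)]

def env_coefs(subset):
    """Per-environment least-squares coefficients on the given feature subset."""
    return [np.linalg.lstsq(X[:, subset], y, rcond=None)[0] for X, y in envs]

# Crude invariance check: do the coefficients agree across environments?
results = {}
for subset in [(0,), (1,), (0, 1)]:
    b1, b2 = env_coefs(list(subset))
    results[subset] = bool(np.max(np.abs(b1 - b2)) < 0.05)

print(results)  # only {x1}, the direct cause of y, should be invariant
```

Subsets containing the effect x2 fit well within one environment but their coefficients shift when the y -> x2 mechanism changes, so they fail the check.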

9.-Adversarial domain adaptation learns an environment-independent representation, but the fairness and invariance perspectives have key differences regarding dependence on the target.

10.-The robust approach defines an a priori family of environments. Using multiple environments to define the domain enables extrapolation via invariance.

11.-For linear regression, one seeks a representation matrix S such that a single vector v minimizes the error simultaneously in all environments. Such solutions exist when the per-environment risk gradients are linearly dependent.
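A small sketch of what "simultaneously minimizes error in all environments" means in gradient terms (data generator invented for illustration): in an invariant representation, each environment's risk gradient vanishes individually at the shared solution; in a non-invariant one, the pooled optimum merely balances nonzero gradients against each other.

```python
import numpy as np

rng = np.random.default_rng(1)

def make_env(n, noise_scale):
    """Toy data: x1 causes y; x2 is a child of y with env-dependent noise."""
    x1 = rng.normal(size=n)
    y = x1 + 0.1 * rng.normal(size=n)
    x2 = y + noise_scale * rng.normal(size=n)
    return np.column_stack([x1, x2]), y

envs = [make_env(20000, 0.1), make_env(20000, 2.0)]

def env_grads(subset):
    """Per-environment squared-error risk gradients, evaluated at the pooled
    least-squares solution restricted to the columns in `subset`."""
    Xp = np.vstack([X[:, subset] for X, _ in envs])
    yp = np.concatenate([y for _, y in envs])
    v, *_ = np.linalg.lstsq(Xp, yp, rcond=None)
    return [2.0 / len(y) * X[:, subset].T @ (X[:, subset] @ v - y)
            for X, y in envs]

g_causal = env_grads([0])    # representation keeping only the stable feature
g_full = env_grads([0, 1])   # representation keeping everything

print([np.linalg.norm(g) for g in g_causal])  # both near zero: invariant
print([np.linalg.norm(g) for g in g_full])    # nonzero, mutually cancelling
```

Note that the full representation's gradients sum to (nearly) zero, since the pooled fit is stationary for the average risk; invariance demands the stronger condition that each one vanishes.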

12.-High-rank invariant solutions can be found by solving along the cosine direction between weight vector w and the space spanned by cost gradients.

13.-Inserting a frozen dummy layer and penalizing its gradient achieves invariance without linear assumptions. This extends to neural networks.

14.-A toy "Colored MNIST" example shows how relying on unstable features like color can be overcome by penalizing cross-environment variance.
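A regression-flavored stand-in for the Colored MNIST experiment (the real one is a classification task; the feature names and data generator here are illustrative), using the frozen dummy-classifier penalty from point 13: the penalty is near zero for a predictor built on the stable "shape" feature and large for one built on the unstable "color" feature.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_env(n, color_sign):
    """Toy 'colored' data: shape causes y; color correlates with y, but the
    sign of that correlation flips across environments (spurious feature)."""
    shape = rng.normal(size=n)
    y = shape + 0.1 * rng.normal(size=n)
    color = color_sign * y + 0.1 * rng.normal(size=n)
    return np.column_stack([shape, color]), y

envs = [make_env(20000, +1.0), make_env(20000, -1.0)]

def invariance_penalty(w):
    """IRMv1-style penalty: squared gradient of each environment's risk with
    respect to a frozen scalar dummy classifier s, evaluated at s = 1.
    For squared loss, d/ds mean((s*f - y)^2) at s = 1 is 2*mean((f - y)*f)."""
    total = 0.0
    for X, y in envs:
        f = X @ w
        total += (2.0 * np.mean((f - y) * f)) ** 2
    return total

p_shape = invariance_penalty(np.array([1.0, 0.0]))  # stable feature only
p_color = invariance_penalty(np.array([0.0, 1.0]))  # unstable feature only
print(p_shape, p_color)  # penalty singles out the color-based predictor
```

Adding this penalty to the training objective therefore pushes the learner off the color shortcut, which is the mechanism the Colored MNIST experiment demonstrates.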

15.-The invariance regularizer is highly non-convex. Tractability and scaling remain challenging. Realizable problems (where a perfect invariant predictor exists) differ from non-realizable ones.

16.-In realizable supervised learning, asymptotic invariance holds over the union of supports of the training environments. Large datasets are needed.

17.-In non-realizable settings, the challenge is finding an invariant representation and predictor to enable extrapolation. In realizable settings, it's about data efficiency.

18.-Machine learning uses a statistical proxy and doesn't shuffle data like nature does. Utilizing environment information could improve stability.

19.-Invariance across environments provides extrapolation, not just interpolation. This challenges the notion that extrapolation fails in high dimensions.

20.-Invariance is related to causation. Stable properties inform causal inference when combined with knowledge of interventions.

21.-Where invariance doesn't naturally hold, learning an invariant representation can enforce it, with interesting mathematical properties.

22.-Realizable supervised problems, where a perfect invariant predictor exists, pose different challenges around efficiently finding the predictor, rather than its existence.

23.-Meta-learning aims to learn transferable representations, while invariance focuses on mathematically characterizing stable properties to enable extrapolation and causal inference.

24.-With enough data and compute, large models may exhibit invariance, but an explicit invariance approach provides clearer understanding and guarantees.

25.-The key ideas are: learn stable properties across environments to enable extrapolation, relate invariance to causation, and tailor methods to realizable vs non-realizable regimes.

Knowledge Vault built by David Vivancos 2024