Knowledge Vault 2/42 - ICLR 2014-2023
Bernhard Schölkopf ICLR 2018 - Invited Talk - Learning Causal Mechanisms

Concept Graph & Resume using Claude 3 Opus | ChatGPT-4 | Gemini Advanced | Llama 3:

```mermaid
graph LR
    classDef scholkopf fill:#f9d4d4, font-weight:bold, font-size:14px;
    classDef causation fill:#d4f9d4, font-weight:bold, font-size:14px;
    classDef statistics fill:#d4d4f9, font-weight:bold, font-size:14px;
    classDef machine fill:#f9f9d4, font-weight:bold, font-size:14px;
    classDef physics fill:#f9d4f9, font-weight:bold, font-size:14px;
    A[Bernhard Schölkopf ICLR 2018] --> B[Schölkopf: SVMs, kernel revolution. 1]
    A --> C[Dependence vs. causation: historical issue. 2]
    C --> D[Storks, births correlation ≠ causation. 3]
    C --> E[Dependence implies common cause. 4]
    C --> F[Observational data can't distinguish cause, effect. 5]
    C --> G[Causal models > statistical models. 6]
    C --> H[Causal graph: arrows represent direct causation. 7]
    H --> I[Distribution changes from mechanisms or noise. 8]
    H --> J[Wrong factorization requires changing factors. 9]
    H --> K[Causal decomposition aids learning across tasks. 10]
    H --> L[Independence: covariance of input, mechanism. 11]
    L --> M[Causal direction implies dependence in anticausal. 12]
    C --> N[Causal captures physical, statistical is epiphenomenal. 13]
    N --> O[Causal Markov Condition. 14]
    N --> P[Common cause links dependence to graph. 15]
    N --> Q[Kolmogorov complexity formalizes graphical results. 16]
    A --> R[Causal direction matters in ML. 17]
    R --> S[Semisupervised learning: causal vs anticausal. 18]
    R --> T[Independent mechanisms found new exoplanets. 19]
    R --> U[Fairness as causal inference problem. 20]
    A --> V[Neural architecture inverts causal mechanisms. 21]
    V --> W[Independence enables specialization during training. 22]
    V --> X[Goal: structural causal models for transfer. 23]
    V --> Y[Representing interventional distributions is open question. 24]
    A --> Z[Causality relates to 'thinking'. 25]
    A --> AA[Industrial revolutions: steam, information. 26]
    AA --> AB[Industrial information processing requires AI. 27]
    AA --> AC[Information may be conserved like energy. 28]
    AC --> AD[Current AI uses 'crude' information processing. 29]
    A --> AE[Open problems in causality, representation learning. 30]
    class A,B scholkopf;
    class C,D,E,F,G,H,I,J,K,L,M,N,O,P,Q causation;
    class R,S,T,U machine;
    class V,W,X,Y,Z machine;
    class AA,AB,AC,AD physics;
    class AE causation;
```

Resume:

1.-Bernhard Schölkopf is known for developing support vector machines and leading the kernel revolution in the early 2000s before deep learning.

2.-Dependence versus causation is a long-standing issue in the philosophy of science and in science in general.

3.-There is a strong correlation between the number of storks and human birth rates in Germany, but correlation doesn't imply causation.

4.-If two observables X and Y are statistically dependent, then by Reichenbach's common cause principle there exists a variable Z that causally influences both; as special cases, Z may coincide with X or with Y (direct causation).
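
A minimal simulation of this principle (all names and coefficients illustrative): a hidden Z drives both X and Y, making them correlated even though neither causes the other, and conditioning on Z removes the dependence.

```python
# Reichenbach's common cause principle in simulation: a hidden
# confounder Z drives both X and Y, so they correlate although
# neither causes the other.
import numpy as np

rng = np.random.default_rng(0)
z = rng.normal(size=100_000)            # unobserved common cause
x = 2.0 * z + rng.normal(size=z.size)   # Z -> X
y = -1.5 * z + rng.normal(size=z.size)  # Z -> Y

print(np.corrcoef(x, y)[0, 1])          # strongly negative: X, Y dependent

# Conditioning on Z (here: removing its linear contribution)
# makes X and Y independent again.
print(np.corrcoef(x - 2.0 * z, y + 1.5 * z)[0, 1])  # ~0
```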

5.-Without additional assumptions, we cannot distinguish cause from effect based on just observational data of two variables.

6.-A causal model contains genuinely more information than a statistical model. Causal models were further developed by Judea Pearl and others.

7.-In a causal graph, arrows represent direct causation. Each node is assigned a function that computes its value from its parents and an independent noise variable.
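
A toy structural causal model for a chain X -> Y -> Z, sketching this definition; the specific functions below are invented for illustration, the talk only asserts the general form X_i := f_i(parents, noise).

```python
# Toy structural causal model over the chain X -> Y -> Z.
# Each node is a function of its parents plus an independent noise term.
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
u_x, u_y, u_z = rng.normal(size=(3, n))  # jointly independent noises

x = u_x                       # X := U_X  (root node: pure noise)
y = x ** 2 + 0.1 * u_y        # Y := f_Y(X, U_Y)
z = np.tanh(y) + 0.1 * u_z    # Z := f_Z(Y, U_Z)
# Pushing the noises through the functions yields samples from the
# observational joint distribution p(x, y, z).
```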

8.-Every change in an observed distribution must come from a change in the causal conditionals/mechanisms or the noise variables.

9.-Factorizing a distribution according to the wrong causal graph means a localized change in the system can no longer be expressed by changing a single factor: several factors must change jointly to preserve the overall distribution.
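
A numeric check (binary variables, made-up probabilities): for a ground-truth model X -> Y, shifting the cause marginal p(x) leaves p(y|x) untouched but changes both factors of the anticausal factorization p(y) p(x|y).

```python
# Ground truth X -> Y: the joint factorizes as p(x, y) = p(x) p(y|x).
import numpy as np

p_y_given_x = np.array([[0.9, 0.1],   # p(y | x=0)
                        [0.2, 0.8]])  # p(y | x=1)

def anticausal_factors(p_x):
    joint = p_x[:, None] * p_y_given_x   # p(x, y)
    p_y = joint.sum(axis=0)              # anticausal factor p(y)
    p_x_given_y = joint / p_y            # anticausal factor p(x|y)
    return p_y, p_x_given_y

print(anticausal_factors(np.array([0.7, 0.3])))
print(anticausal_factors(np.array([0.4, 0.6])))
# Changing p(x) alone altered BOTH p(y) and p(x|y), while the causal
# factor p(y|x) stayed fixed: the wrong factorization cannot localize
# the change.
```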

10.-Causal decomposition into invariant conditionals makes it easier to learn from different tasks/datasets, explaining why modeling phonemes helps model acoustics.

11.-Statistical independence of cause and mechanism can be formalized as a vanishing covariance between the input density and the log-derivative of the mechanism.
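
One concrete version of this is the IGCI condition (Daniušis et al., from Schölkopf's line of work; stated here under its assumptions of a deterministic, invertible mechanism y = f(x) with x rescaled to [0, 1]):

$$\operatorname{Cov}(\log f', p_x) \;=\; \int_0^1 \log f'(x)\, p_x(x)\, dx \;-\; \int_0^1 \log f'(x)\, dx \;=\; 0,$$

where the covariance is taken with respect to the uniform measure on [0, 1]: the slope of the mechanism carries no information about where the input density concentrates.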

12.-Provable asymmetry: independence in causal direction implies dependence in anticausal direction, allowing inference of cause vs effect from data.
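
A sketch of the resulting inference rule on synthetic data, using the slope-based IGCI estimator (the smaller estimated E[log |f'|] indicates the causal direction); a toy under the method's assumptions, not a robust implementation.

```python
# IGCI-style cause-effect inference on a deterministic pair y = x**3.
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(size=5_000)   # cause, already on [0, 1]
y = x ** 3                    # mechanism f, maps [0, 1] to [0, 1]

def igci_score(a, b):
    """Empirical E[log |slope|] of the map a -> b (sorted finite differences)."""
    order = np.argsort(a)
    da = np.diff(a[order])
    db = np.diff(b[order])
    keep = (da != 0) & (db != 0)
    return np.mean(np.log(np.abs(db[keep] / da[keep])))

# The smaller score wins: here the x -> y score is smaller, so the
# method correctly infers that x is the cause.
print(igci_score(x, y), igci_score(y, x))
```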

13.-Causal structure captures the physical mechanisms that generate the statistical dependences. Statistical structure is an epiphenomenon of the underlying causal model.

14.-A causal model implies the Causal Markov Condition: every node is conditionally independent of its non-descendants given its parents in the graph.
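
A simulated check of the condition on a chain X -> Y -> Z (linear-Gaussian for simplicity, where partial correlation is a valid conditional-independence test):

```python
# Causal Markov Condition on the chain X -> Y -> Z: X and Z are
# dependent marginally but independent given the parent Y.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
x = rng.normal(size=n)
y = 0.8 * x + rng.normal(size=n)
z = 0.8 * y + rng.normal(size=n)

print(np.corrcoef(x, z)[0, 1])   # clearly nonzero: dependent

# Partial correlation given Y: correlate residuals after regressing on Y.
beta = lambda a: np.cov(a, y)[0, 1] / np.var(y)
print(np.corrcoef(x - beta(x) * y, z - beta(z) * y)[0, 1])  # ~0
```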

15.-Reichenbach's common cause principle links statistical dependence to the causal graph. But statistical independence is not fundamental; independence of causal mechanisms is.

16.-Kolmogorov complexity formalizes independence without probability, allowing the graphical-model results to be proven in algorithmic terms. It also implies a thermodynamic arrow of time from the causal model.
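
In this algorithmic formulation (Janzing and Schölkopf's algorithmic Markov condition; notation sketched from that work), independence of mechanisms means the factors of the causal factorization share no algorithmic information, which makes the causal direction the one with the shorter total description:

$$I\bigl(P(X) : P(Y \mid X)\bigr) \overset{+}{=} 0 \;\;\Longrightarrow\;\; K\bigl(P(X)\bigr) + K\bigl(P(Y \mid X)\bigr) \;\le\; K\bigl(P(Y)\bigr) + K\bigl(P(X \mid Y)\bigr),$$

with I denoting algorithmic mutual information, K Kolmogorov complexity, and equalities holding up to additive constants.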

17.-Causal direction makes a difference in machine learning - generative direction shows independence between layers, discriminative shows increasing dependence.

18.-Semisupervised learning cannot help on causal problems, where p(x) carries no information about p(y|x), but can potentially help on anticausal ones, where the two are dependent. This matches benchmark results.
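
A toy anticausal illustration (synthetic data, sketch only; uses scikit-learn): when the class y generates the features x, the unlabeled marginal p(x) already exposes the class structure.

```python
# Anticausal setting y -> x: p(x) is a two-component mixture whose
# components line up with the classes, so unlabeled data helps.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=2_000)                     # hidden class
x = rng.normal(loc=np.where(y == 0, -2.0, 2.0)).reshape(-1, 1)

gmm = GaussianMixture(n_components=2, random_state=0).fit(x)  # no labels used
pred = gmm.predict(x)
print(max(np.mean(pred == y), np.mean(pred != y)))     # accuracy up to label swap
```

In the causal direction x -> y the same trick buys nothing: by independence of mechanisms, p(x) carries no information about p(y|x).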

19.-Removing confounding by exploiting independent mechanisms and the half-sibling structure of the data enabled the discovery of new exoplanets in Kepler space telescope data.
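
A sketch of the underlying idea, half-sibling regression (synthetic data; the real pipeline operates on Kepler light curves): systematic instrument noise N affects the target star Y and many other stars X, while the target's true signal Q is independent of X, so the residual of regressing Y on X recovers Q.

```python
# Half-sibling regression: remove shared systematics by predicting the
# target star from other stars and keeping the residual, Q_hat = Y - E[Y|X].
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
T, n_stars = 2_000, 50
noise = rng.normal(size=T)                            # shared instrument noise N
X = np.outer(noise, rng.normal(size=n_stars)) \
    + 0.1 * rng.normal(size=(T, n_stars))             # other stars see N too
signal = 0.3 * np.sin(np.linspace(0.0, 40.0, T))      # transit-like signal Q
y = signal + 1.5 * noise                              # target star Y = Q + f(N)

q_hat = y - Ridge(alpha=1.0).fit(X, y).predict(X)     # Q_hat = Y - E[Y|X]
print(np.corrcoef(q_hat, signal)[0, 1])               # ~1: signal recovered
```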

20.-Enforcing fairness can be framed as a causal inference problem, and a technique for doing so was developed using causal methods.

21.-Neural architecture learns to invert independent causal mechanisms from mixed data via competition between experts and discriminator feedback.

22.-Independence of causal mechanisms enables the experts to specialize to individual mechanisms during competitive training. The trained experts generalize to novel input classes.
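
A heavily simplified sketch of this competition (not the paper's architecture: one-dimensional data, mechanisms reduced to additive shifts, and the adversarial discriminator replaced by a fixed Gaussian log-likelihood score on the canonical distribution):

```python
# Competing experts: each batch comes from one unknown mechanism; only
# the best-scoring expert trains on it, so experts specialize.
import torch

torch.manual_seed(0)
N_EXPERTS, STEPS, BATCH = 3, 3000, 64
true_shifts = [-4.0, 0.0, 4.0]                       # the unknown mechanisms

# Each expert is a single learnable inverse shift.
experts = [(0.1 * torch.randn(1)).requires_grad_() for _ in range(N_EXPERTS)]
opts = [torch.optim.Adam([e], lr=0.05) for e in experts]

def score(x):
    # Stand-in for the discriminator: log-likelihood under canonical N(0, 1).
    return (-0.5 * x ** 2).mean()

for _ in range(STEPS):
    shift = true_shifts[torch.randint(N_EXPERTS, (1,)).item()]
    x = torch.randn(BATCH) + shift                   # batch from one mechanism
    scores = [score(x + e) for e in experts]
    winner = max(range(N_EXPERTS), key=lambda i: scores[i].item())
    opts[winner].zero_grad()                         # winner-take-all update
    (-scores[winner]).backward()
    opts[winner].step()

print([round(-e.item(), 2) for e in experts])        # ~ permutation of true_shifts
```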

23.-Goal is learning structural causal models that enable task transfer via independent, reusable components. Related to disentanglement.

24.-Much progress in learning representations of i.i.d. data, but representing interventional distributions of causal models is an open question.

25.-Representing causal models for reasoning and planning has to do with "thinking" - acting in imagined spaces per Konrad Lorenz.

26.-First industrial revolution driven by steam engine (energy). Current "revolution" started mid-20th century, driven by information (cybernetics).

27.-Information processing at industrial scale requires computers. Intelligent information processing may require AI and machine learning.

28.-Information may be a conserved quantity in physics like energy. We can convert and process it but not create it.

29.-Current AI success based on "crude" information processing. Deeper understanding may come from causality - statistical information is an epiphenomenon.

30.-Open problems remain in understanding causality and time, representation learning for causal models. Much more research is needed.

Knowledge Vault built by David Vivancos 2024