Bernhard Schoelkopf ICLR 2018 - Invited Talk - Learning Causal Mechanisms

**Concept Graph & Resume using Claude 3 Opus | Chat GPT4 | Gemini Adv | Llama 3:**

graph LR
classDef scholkopf fill:#f9d4d4, font-weight:bold, font-size:14px;
classDef causation fill:#d4f9d4, font-weight:bold, font-size:14px;
classDef statistics fill:#d4d4f9, font-weight:bold, font-size:14px;
classDef machine fill:#f9f9d4, font-weight:bold, font-size:14px;
classDef physics fill:#f9d4f9, font-weight:bold, font-size:14px;
A[Bernhard Schoelkopf ICLR 2018] --> B[Schölkopf: SVMs, kernel revolution. 1]
A --> C[Dependence vs. causation: historical issue. 2]
C --> D[Storks, births correlation ≠ causation. 3]
C --> E[Dependence implies common cause. 4]
C --> F[Observational data can't distinguish cause, effect. 5]
C --> G[Causal models > statistical models. 6]
C --> H[Causal graph: arrows represent direct causation. 7]
H --> I[Distribution changes from mechanisms or noise. 8]
H --> J[Wrong factorization requires changing factors. 9]
H --> K[Causal decomposition aids learning across tasks. 10]
H --> L[Independence: covariance of input, mechanism. 11]
L --> M[Causal direction implies dependence in anticausal. 12]
C --> N[Causal captures physical, statistical is epiphenomenal. 13]
N --> O[Causal Markov Condition. 14]
N --> P[Common cause links dependence to graph. 15]
N --> Q[Kolmogorov complexity formalizes graphical results. 16]
A --> R[Causal direction matters in ML. 17]
R --> S[Semisupervised learning: causal vs anticausal. 18]
R --> T[Independent mechanisms found new exoplanets. 19]
R --> U[Fairness as causal inference problem. 20]
A --> V[Neural architecture inverts causal mechanisms. 21]
V --> W[Independence enables specialization during training. 22]
V --> X[Goal: structural causal models for transfer. 23]
V --> Y[Representing interventional distributions is open question. 24]
A --> Z[Causality relates to 'thinking'. 25]
A --> AA[Industrial revolutions: steam, information. 26]
AA --> AB[Industrial information processing requires AI. 27]
AA --> AC[Information may be conserved like energy. 28]
AC --> AD[Current AI uses 'crude' information processing. 29]
A --> AE[Open problems in causality, representation learning. 30]
class A,B scholkopf;
class C,D,E,F,G,H,I,J,K,L,M,N,O,P,Q causation;
class R,S,T,U machine;
class V,W,X,Y,Z machine;
class AA,AB,AC,AD physics;
class AE causation;

**Resume:**

**1.-**Bernhard Schölkopf is known for his work on support vector machines and for leading the kernel revolution of the early 2000s, before deep learning.

**2.-**Dependence versus causation is a big historical issue in philosophy of science and science in general.

**3.-**There is a strong correlation between the number of storks and human birth rates in Germany, but correlation doesn't imply causation.

**4.-**If two observables X and Y are statistically dependent, there exists a third variable Z that causally influences both of them (as a special case, Z may coincide with X or Y).

**5.-**Without additional assumptions, we cannot distinguish cause from effect based on just observational data of two variables.

**6.-**A causal model contains genuinely more information than a statistical model. Causal models were further developed by Judea Pearl and others.

**7.-**In a causal graph, arrows represent direct causation. Each node has a function giving its value based on its parents.

**8.-**Every change in an observed distribution must come from a change in the causal conditionals/mechanisms or the noise variables.

**9.-**When a distribution is factorized according to the wrong causal graph, changing one factor requires changing the others too in order to keep the overall distribution consistent.

**10.-**Causal decomposition into invariant conditionals makes it easier to learn from different tasks/datasets, explaining why modeling phonemes helps model acoustics.

**11.-**Independence of cause and mechanism can be formalized as vanishing covariance between the input density and the logarithm of the mechanism's derivative.

**12.-**Provable asymmetry: independence in causal direction implies dependence in anticausal direction, allowing inference of cause vs effect from data.

**13.-**Causal structure captures physical mechanisms that generate statistical independence. Statistical structure is an epiphenomenon of underlying causal model.

**14.-**A causal model implies the Causal Markov Condition: a node is conditionally independent of its non-descendants given its parents in the graph.

**15.-**Reichenbach's common cause principle links statistical dependence to causal graph. But statistical independence is not fundamental, causal independence is.

**16.-**Kolmogorov complexity formalizes independence without probability, proving graphical model results. Implies thermodynamic arrow of time from causal model.

**17.-**Causal direction makes a difference in machine learning - generative direction shows independence between layers, discriminative shows increasing dependence.

**18.-**Semi-supervised learning should not help for causal problems (predicting effect from cause), but can help for anticausal ones, where p(x) and p(y|x) are dependent. This matches benchmark results.

**19.-**Removing confounding by exploiting independent mechanisms and half-sibling structure enabled discovering new exoplanets in Kepler telescope data.

**20.-**Enforcing fairness can be framed as causal inference problem. Technique developed using causal methods.

**21.-**Neural architecture learns to invert independent causal mechanisms from mixed data via competition between experts and discriminator feedback.

**22.-**Independence of causal mechanisms enables specialization of experts to mechanisms during competitive training. Generalizes to novel input classes.

**23.-**Goal is learning structural causal models that enable task transfer via independent, reusable components. Related to disentanglement.

**24.-**Much progress in learning representations of i.i.d. data, but representing interventional distributions of causal models is an open question.

**25.-**Representing causal models for reasoning and planning has to do with "thinking" - acting in imagined spaces per Konrad Lorenz.

**26.-**First industrial revolution driven by steam engine (energy). Current "revolution" started mid-20th century, driven by information (cybernetics).

**27.-**Information processing at industrial scale requires computers. Intelligent information processing may require AI and machine learning.

**28.-**Information may be a conserved quantity in physics like energy. We can convert and process it but not create it.

**29.-**Current AI success based on "crude" information processing. Deeper understanding may come from causality - statistical information is an epiphenomenon.

**30.-**Open problems remain in understanding causality and time, representation learning for causal models. Much more research is needed.
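
Points 3 and 4 can be illustrated with a small simulation. This is only a sketch under stated assumptions: the variable names (`z` for a hidden common cause, `x` and `y` for the two observables), the linear Gaussian form and the noise levels are all hypothetical choices, not figures from the talk. It shows that two variables with no causal link between them can still be strongly correlated when a third variable drives both, and that the correlation disappears once that variable's contribution is removed.

```python
import random

def pearson(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    vx = sum((a - mx) ** 2 for a in xs)
    vy = sum((b - my) ** 2 for b in ys)
    return cov / (vx * vy) ** 0.5

def simulate_common_cause(n=5000, seed=0):
    """Z causes both X and Y; there is no arrow between X and Y."""
    rng = random.Random(seed)
    z = [rng.gauss(0, 1) for _ in range(n)]        # hypothetical common cause
    x = [zi + rng.gauss(0, 0.5) for zi in z]       # e.g. "storks"
    y = [zi + rng.gauss(0, 0.5) for zi in z]       # e.g. "births"
    return z, x, y

z, x, y = simulate_common_cause()

# X and Y are strongly correlated although neither causes the other.
r_xy = pearson(x, y)

# Subtracting Z's (known, illustrative) contribution leaves only the
# independent noise terms, so the residuals are essentially uncorrelated.
rx = [xi - zi for xi, zi in zip(x, z)]
ry = [yi - zi for yi, zi in zip(y, z)]
r_resid = pearson(rx, ry)

print("corr(X, Y)            =", round(r_xy, 3))
print("corr(X - Z, Y - Z)    =", round(r_resid, 3))
```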
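
Points 6 to 8 describe a causal model as a set of functions, one per node, each taking the node's parents and a noise variable as input. A minimal sketch, assuming a hypothetical two-node graph X → Y with made-up linear assignments, shows how such a model answers an interventional question that the observational distribution alone does not: the function `sample_scm` and its `do_x` parameter are illustrative names, not an existing API.

```python
import random

def sample_scm(n, seed=0, do_x=None):
    """Sample from the illustrative model  X := N_X,  Y := 2*X + N_Y.

    If do_x is given, the assignment for X is replaced by the constant
    do_x (an intervention); the mechanism for Y is left untouched.
    """
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        n_x = rng.gauss(0, 1)          # noise of X
        n_y = rng.gauss(0, 1)          # noise of Y
        x = n_x if do_x is None else do_x
        y = 2.0 * x + n_y              # mechanism of Y given its parent X
        samples.append((x, y))
    return samples

def mean(values):
    return sum(values) / len(values)

obs = sample_scm(10000)                # observational distribution
intv = sample_scm(10000, do_x=3.0)     # distribution under do(X = 3)

mean_y_obs = mean([y for _, y in obs])
mean_y_do = mean([y for _, y in intv])

print("E[Y]           ~", round(mean_y_obs, 2))
print("E[Y | do(X=3)] ~", round(mean_y_do, 2))
```

Only the assignment for X was changed to produce the second distribution. This is the sense in which a change in the observed distribution is traced back to a change in one mechanism or noise term.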
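
Point 9 can be checked by hand on a tiny discrete example. The sketch below assumes two binary variables with X as the cause of Y; the probability tables are invented for illustration. Only the cause distribution p(x) is changed, while the mechanism p(y|x) is held fixed. In the causal factorization p(x)p(y|x) a single factor changes, but in the anticausal factorization p(y)p(x|y) both factors change.

```python
def joint(p_x, p_y_given_x):
    """Joint distribution p(x, y) = p(x) * p(y | x) over binary x, y."""
    return {(x, y): p_x[x] * p_y_given_x[x][y] for x in (0, 1) for y in (0, 1)}

def anticausal_factors(j):
    """Refactorize the joint in the non-causal order: p(y) and p(x | y)."""
    p_y = {y: j[(0, y)] + j[(1, y)] for y in (0, 1)}
    p_x_given_y = {y: {x: j[(x, y)] / p_y[y] for x in (0, 1)} for y in (0, 1)}
    return p_y, p_x_given_y

# Illustrative mechanism p(y | x); it is the same before and after.
mechanism = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}}

before = joint({0: 0.5, 1: 0.5}, mechanism)   # original p(x)
after = joint({0: 0.8, 1: 0.2}, mechanism)    # only p(x) has changed

p_y_before, p_x_given_y_before = anticausal_factors(before)
p_y_after, p_x_given_y_after = anticausal_factors(after)

print("p(y=0):     ", p_y_before[0], "->", p_y_after[0])
print("p(x=0|y=0): ", p_x_given_y_before[0][0], "->", p_x_given_y_after[0][0])
```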
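
Points 11 and 12 can be written out explicitly. The following is a sketch under assumptions that the summary does not state: a deterministic relation $y = f(x)$, with $f$ a strictly increasing, differentiable map of $[0,1]$ onto itself, $p_X$ the density of the cause and $p_Y$ the density of the effect. The symbols are notation chosen here. Covariance is taken with respect to the uniform distribution on $[0,1]$, so the independence condition reads:

$$
\operatorname{Cov}\left(\log f',\, p_X\right)
= \int_0^1 \log f'(x)\, p_X(x)\, dx \;-\; \int_0^1 \log f'(x)\, dx \int_0^1 p_X(x)\, dx
= 0 .
$$

Because $p_X$ integrates to one, this is the same as $\int_0^1 \log f'(x)\, p_X(x)\, dx = \int_0^1 \log f'(x)\, dx$. For the anticausal direction, the change of variables $y = f(x)$ gives $\int_0^1 \log (f^{-1})'(y)\, p_Y(y)\, dy = -\int_0^1 \log f'(x)\, p_X(x)\, dx$, and therefore

$$
\operatorname{Cov}\left(\log (f^{-1})',\, p_Y\right)
= -\int_0^1 \log f'(x)\, dx \;-\; \int_0^1 \log (f^{-1})'(y)\, dy \;\ge\; 0 .
$$

Both integrals on the right are non-positive by Jensen's inequality, since $f'$ and $(f^{-1})'$ each integrate to one on $[0,1]$. Equality holds only when $f$ is the identity, so independence in the causal direction implies a positive dependence in the anticausal one.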
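
Points 7 and 14 correspond to the standard structural formulation, written here with notation that is assumed rather than taken from the summary: $\mathrm{PA}_i$ denotes the parents of node $X_i$ in the graph, $f_i$ its mechanism and $N_i$ its noise variable, with the noise variables jointly independent.

$$
X_i := f_i(\mathrm{PA}_i, N_i), \qquad i = 1, \dots, n
$$

$$
p(x_1, \dots, x_n) = \prod_{i=1}^{n} p\left(x_i \mid \mathrm{pa}_i\right)
$$

The product form is the causal factorization. Each factor $p(x_i \mid \mathrm{pa}_i)$ is one of the causal conditionals mentioned in point 8.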

Knowledge Vault built by David Vivancos 2024