Causal Learning

Bernhard Schölkopf

**Concept Graph & Resume using Claude 3.5 Sonnet | ChatGPT-4o | Llama 3:**

```mermaid
graph LR
classDef main fill:#f9d9c9, font-weight:bold, font-size:14px
classDef foundations fill:#d4f9d4, font-weight:bold, font-size:14px
classDef causality fill:#d4d4f9, font-weight:bold, font-size:14px
classDef applications fill:#f9f9d4, font-weight:bold, font-size:14px
classDef future fill:#f9d4f9, font-weight:bold, font-size:14px
Main[Causal Learning]
Main --> A[ML Foundations]
Main --> B[Causality Concepts]
Main --> C[Causal Models and Applications]
Main --> D[Challenges and Future Directions]
A --> A1[ML successes: data, models, computation, IID 1]
A --> A2[IID assumption breaks with interventions 2]
A --> A3[Dependency implies common cause principle 3]
A --> A4[Observational data insufficient for causal direction 4]
A --> A5[Causal graphical models: DAG with arrows 5]
A --> A6[Observational distribution inherits causal graph properties 6]
B --> B1[Complex causality complicates conditional independence testing 7]
B --> B2[Causal direction identifiable by mechanism footprints 8]
B --> B3["Cause-mechanism independence: log f'(X), p(x) uncorrelated 9"]
B --> B4[Half-sibling regression removes astronomical systematic noise 10]
B --> B5[Regression recovers latent variable under assumptions 11]
B --> B6[Additive noise model identifies causal direction 12]
C --> C1[Gravitational wave detection: classify strain anomalies 13]
C --> C2[Fair ML: demographic parity, equalized odds 14]
C --> C3[Causal fairness: decisions via resolving variables 15]
C --> C4[Causal conditionals stable across environments 16]
C --> C5[Causal models enable shortest data description 17]
C --> C6[Multi-environment learning finds robust causal components 18]
D --> D1[Causal models between statistical and DE 19]
D --> D2[ML weak in transfer, interventions, time, counterfactuals 20]
D --> D3[Digital revolution focuses on information 21]
D --> D4[AI impact: benefits and potential upheaval 22]
D --> D5[Information understanding may be incomplete 23]
D --> D6[Dependencies from asymmetric causal structures 24]
class Main main
class A,A1,A2,A3,A4,A5,A6 foundations
class B,B1,B2,B3,B4,B5,B6 causality
class C,C1,C2,C3,C4,C5,C6 applications
class D,D1,D2,D3,D4,D5,D6 future
```

**Resume:**

**1.-** Machine learning has had spectacular successes in the last decade, driven by massive data, high-capacity models, computational power, and IID data.

**2.-** The IID assumption is not innocuous - recommending items to users constitutes an intervention that leaves the IID setting.

**3.-** Causality and correlation are connected: if two variables are dependent, then either one causes the other or a third variable causes both (Reichenbach's common cause principle).

**4.-** Without assumptions, observational data cannot distinguish between X->Y, Y->X, and X<-Z->Y. A causal model contains more information than a statistical one.

**5.-** A causal graphical model represents variables as vertices in a DAG, with arrows for direct causation. Unexplained variables provide the randomness.
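As a minimal sketch of point 5, a toy structural causal model (SCM) for the DAG Z → X → Y with Z → Y can be sampled directly; the coefficients and noise scales below are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_scm(n):
    """Sample a toy SCM for the DAG Z -> X -> Y with Z -> Y.

    Each variable is a deterministic function of its parents plus an
    unexplained noise term, which supplies all the randomness.
    """
    z = rng.normal(size=n)             # root cause, pure noise
    x = 2.0 * z + rng.normal(size=n)   # X := 2Z + N_X
    y = x - z + rng.normal(size=n)     # Y := X - Z + N_Y
    return z, x, y

z, x, y = sample_scm(10_000)
corr_xy = np.corrcoef(x, y)[0, 1]      # dependence induced by the graph
```

Sampling always proceeds along the arrows of the DAG; the observational dependence between X and Y is then a consequence of the graph structure.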

**6.-** An observational distribution inherits properties from the causal graph, allowing one to infer a class of compatible graphs by testing conditional independences in the data.
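A minimal sketch of such a test, assuming a linear Gaussian chain X → Z → Y and using partial correlation (regress out the conditioning variable, then correlate the residuals):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20_000
x = rng.normal(size=n)
z = x + 0.5 * rng.normal(size=n)      # X -> Z
y = z + 0.5 * rng.normal(size=n)      # Z -> Y

def partial_corr(a, b, c):
    """Correlation of a and b after linearly regressing out c."""
    ra = a - np.polyval(np.polyfit(c, a, 1), c)
    rb = b - np.polyval(np.polyfit(c, b, 1), c)
    return np.corrcoef(ra, rb)[0, 1]

marginal = np.corrcoef(x, y)[0, 1]     # strong dependence: X and Y correlate
conditional = partial_corr(x, y, z)    # near zero: X is independent of Y given Z
```

Finding X ⟂ Y | Z but not X ⟂ Y is consistent with the chain (and with its Markov equivalence class), which is exactly the kind of graph-class inference the summary describes.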

**7.-** Conditional independence testing becomes hard for complex causal relationships. With only two variables, no conditional independences exist to test.

**8.-** The causal direction may be identifiable by examining footprints left by the causal mechanism in the observed distribution.

**9.-** Independence of cause and mechanism can be formalized: log f'(X) and p(x) are uncorrelated if X->Y but correlated if Y->X.
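A numerical sketch of this asymmetry in an IGCI-style toy setting: a monotone mechanism f on [0, 1] with a uniform cause X. The particular f and the uniform reference measure are assumptions made for the demo, not the talk's exact construction.

```python
import numpy as np

# Illustrative strictly increasing mechanism f on [0, 1], f' in [0.5, 1.5].
xs = np.linspace(0.0, 1.0, 100_000)
fprime = lambda x: 1.0 + 0.5 * np.cos(2 * np.pi * x)
f = lambda x: x + 0.5 * np.sin(2 * np.pi * x) / (2 * np.pi)
p_x = np.ones_like(xs)                     # uniform density of the cause

# Causal direction: log f' and p(x) are uncorrelated (here trivially so).
cov_forward = (np.mean(np.log(fprime(xs)) * p_x)
               - np.mean(np.log(fprime(xs))) * np.mean(p_x))

# Anticausal direction: the density of Y and the slope of f^{-1} both
# derive from f, so they are correlated.
ys = np.linspace(0.0, 1.0, 100_000)
x_of_y = np.interp(ys, f(xs), xs)          # f^{-1} via interpolation
p_y = 1.0 / fprime(x_of_y)                 # density of Y = f(X)
log_ginv = -np.log(fprime(x_of_y))         # log (f^{-1})'(y)
cov_backward = (np.mean(log_ginv * p_y)
                - np.mean(log_ginv) * np.mean(p_y))
```

The forward covariance vanishes while the backward one does not, which is the "footprint" that makes the causal direction readable from the distribution.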

**10.-** Half-sibling regression was used to remove systematic noise in astronomical data by explaining each pixel using other pixels recording different stars.

**11.-** Under certain assumptions, regressing Y on X and subtracting the estimate recovers an unknown latent variable affecting X and Y up to expectation.
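Points 10 and 11 can be sketched together under these assumptions: a shared systematic-noise variable q corrupts both the target pixel and many "half-sibling" pixels that do not see the target's astrophysical signal. All names and numbers below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
n, d = 5_000, 20
q = rng.normal(size=n)                     # systematic (instrument) noise
w = rng.normal(size=d)
siblings = np.outer(q, w) + 0.3 * rng.normal(size=(n, d))  # other stars' pixels
signal = np.sin(np.linspace(0, 20, n))     # astrophysical signal of interest
target = signal + 2.0 * q                  # target pixel: signal + systematics

# Half-sibling regression: predict the target from the siblings (which share
# q but not the signal) and subtract the prediction.
coef, *_ = np.linalg.lstsq(siblings, target, rcond=None)
recovered = target - siblings @ coef
```

Because the siblings carry information about q but none about the signal, subtracting E[target | siblings] removes the systematics and recovers the signal (up to its expectation), as the summary states.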

**12.-** The additive noise model (Y=f(X)+N) makes the causal direction identifiable, because noise that is independent of the input generically cannot be found in the anti-causal direction.
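A sketch of this identifiability argument: fit a regression in both directions and check how strongly the residuals depend on the input. The correlation of squared residuals with the input's magnitude is used here as a crude stand-in for a proper independence test such as HSIC; model and degrees are invented for the demo.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 10_000
x = rng.uniform(-2, 2, n)
noise = rng.uniform(-0.5, 0.5, n)
y = x**3 + noise                    # additive noise model in the causal direction

def dependence_score(cause, effect, deg):
    """Polynomial regression of effect on cause; score how much the
    squared residuals still depend on |cause| (crude independence proxy)."""
    res = effect - np.polyval(np.polyfit(cause, effect, deg), cause)
    return abs(np.corrcoef(res**2, np.abs(cause))[0, 1])

forward = dependence_score(x, y, deg=3)    # residual ~ noise, independent of x
backward = dependence_score(y, x, deg=5)   # residual strongly depends on y
```

The forward fit leaves residuals that look like the independent noise N, while the backward fit leaves heteroscedastic residuals, so the causal direction is the one with the lower dependence score.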

**13.-** Gravitational wave detection data is very noisy. Predicting the strain at each time from its past and future can highlight anomalies such as real events.
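A toy version of this idea: predict each sample linearly from a window of past and future samples, and flag whatever the surrounding context cannot explain. The white-noise model and injected glitch are invented for the sketch.

```python
import numpy as np

rng = np.random.default_rng(4)
n, w = 4_000, 5
strain = rng.normal(size=n)       # stand-in for whitened detector noise
strain[2_000] += 8.0              # injected transient (candidate event)

# Design matrix: w past and w future samples around each time step.
context = np.array([np.r_[strain[t - w:t], strain[t + 1:t + w + 1]]
                    for t in range(w, n - w)])
target = strain[w:n - w]
coef, *_ = np.linalg.lstsq(context, target, rcond=None)
residual = np.abs(target - context @ coef)
anomaly_at = w + int(np.argmax(residual))   # sample the context fails to explain
```

Background noise is well explained by its neighborhood, so the prediction residual stays small everywhere except at the transient, which stands out as the anomaly.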

**14.-** In fair ML, demographic parity requires the decision be independent of sensitive attributes. Equalized odds conditions on the true label.
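Both criteria can be computed directly from decisions; here on synthetic data with a deliberately biased decision rule (all names and numbers invented for the sketch):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 10_000
a = rng.integers(0, 2, n)                  # sensitive attribute
y = rng.integers(0, 2, n)                  # true label
yhat = (rng.random(n) < 0.3 + 0.2 * a).astype(int)   # biased decision rule

# Demographic parity: compare P(Yhat=1 | A=1) with P(Yhat=1 | A=0).
dp_gap = abs(yhat[a == 1].mean() - yhat[a == 0].mean())

# Equalized odds: the same comparison within each true-label stratum.
eo_gaps = [abs(yhat[(a == 1) & (y == t)].mean()
               - yhat[(a == 0) & (y == t)].mean())
           for t in (0, 1)]
```

The decision rule above favors a = 1 regardless of the true label, so both the demographic-parity gap and the per-label equalized-odds gaps come out around 0.2 rather than 0.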

**15.-** Fairness can be framed causally - the decision should only depend on sensitive attributes via resolving variables, not proxy variables.

**16.-** Causal conditionals are more likely to be stable across environments than anti-causal ones. Adversarial examples may arise from anti-causal learning.
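A linear sketch of this stability claim: shifting the distribution of the cause across two environments leaves the causal regression E[Y|X] unchanged but moves the anticausal one E[X|Y]. The model and numbers are illustrative.

```python
import numpy as np

rng = np.random.default_rng(6)

def environment(x_scale, n=50_000):
    x = x_scale * rng.normal(size=n)   # intervene on the cause X
    y = 2.0 * x + rng.normal(size=n)   # causal mechanism, unchanged
    return x, y

slopes_fwd, slopes_bwd = [], []
for scale in (1.0, 3.0):
    x, y = environment(scale)
    slopes_fwd.append(np.polyfit(x, y, 1)[0])   # E[Y|X]: causal conditional
    slopes_bwd.append(np.polyfit(y, x, 1)[0])   # E[X|Y]: anticausal conditional
```

The causal slope stays at 2 in both environments, while the anticausal slope shifts with the intervention on X; a learner relying on the anticausal conditional would therefore be fragile under distribution shift.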

**17.-** Jointly compressing datasets by fitting causal models may reveal invariant mechanisms. The true SCM should enable the shortest description.

**18.-** Learning a large causal model on multi-environment data could find robust components via competition between mechanisms specializing to the environments.

**19.-** A taxonomy places causal models between statistical and differential equation (DE) models: more expressive than statistical models, more learnable than DEs.

**20.-** Compared to animals, ML is weak at transfer, interventional generalization, utilizing time, and counterfactual reasoning. Causality may help.

**21.-** The first two industrial revolutions concerned energy. The current "digital revolution", which began with cybernetics, focuses on information.

**22.-** The industrial revolution had great benefits but also upheaval. Naively assuming all will be positive with AI is unwise.

**23.-** It took over a century after the industrial revolution began to deeply understand energy. We may not yet deeply understand information.

**24.-** Statistical information may just be an epiphenomenon, with dependencies actually due to underlying causal structures which can be asymmetric.

**25.-** Many collaborators and students contributed to the works presented. The speaker thanked them and the audience for their attention.

Knowledge Vault built by David Vivancos 2024