Knowledge Vault 6/22 - ICML 2017
Causal Learning
Bernhard Schölkopf

Concept Graph & Resume using Claude 3.5 Sonnet | ChatGPT-4o | Llama 3:

graph LR
  classDef main fill:#f9d9c9, font-weight:bold, font-size:14px
  classDef foundations fill:#d4f9d4, font-weight:bold, font-size:14px
  classDef causality fill:#d4d4f9, font-weight:bold, font-size:14px
  classDef applications fill:#f9f9d4, font-weight:bold, font-size:14px
  classDef future fill:#f9d4f9, font-weight:bold, font-size:14px
  Main[Causal Learning]
  Main --> A[ML Foundations]
  Main --> B[Causality Concepts]
  Main --> C[Causal Models and Applications]
  Main --> D[Challenges and Future Directions]
  A --> A1[ML successes: data, models, computation, IID 1]
  A --> A2[IID assumption breaks with interventions 2]
  A --> A3[Dependency implies common cause principle 3]
  A --> A4[Observational data insufficient for causal direction 4]
  A --> A5[Causal graphical models: DAG with arrows 5]
  A --> A6[Observational distribution inherits causal graph properties 6]
  B --> B1[Complex causality complicates conditional independence testing 7]
  B --> B2[Causal direction identifiable by mechanism footprints 8]
  B --> B3[Cause-mechanism independence: log f', p!x! uncorrelated 9]
  B --> B4[Half-sibling regression removes astronomical systematic noise 10]
  B --> B5[Regression recovers latent variable under assumptions 11]
  B --> B6[Additive noise model identifies causal direction 12]
  C --> C1[Gravitational wave detection: classify strain anomalies 13]
  C --> C2[Fair ML: demographic parity, equalized odds 14]
  C --> C3[Causal fairness: decisions via resolving variables 15]
  C --> C4[Causal conditionals stable across environments 16]
  C --> C5[Causal models enable shortest data description 17]
  C --> C6[Multi-environment learning finds robust causal components 18]
  D --> D1[Causal models between statistical and DE 19]
  D --> D2[ML weak in transfer, interventions, time, counterfactuals 20]
  D --> D3[Digital revolution focuses on information 21]
  D --> D4[AI impact: benefits and potential upheaval 22]
  D --> D5[Information understanding may be incomplete 23]
  D --> D6[Dependencies from asymmetric causal structures 24]
  class Main main
  class A,A1,A2,A3,A4,A5,A6 foundations
  class B,B1,B2,B3,B4,B5,B6 causality
  class C,C1,C2,C3,C4,C5,C6 applications
  class D,D1,D2,D3,D4,D5,D6 future

Resume:

1.- Machine learning has had spectacular successes in the last decade thanks to massive data, high-capacity models, computational power, and IID data.

2.- The IID assumption is not innocuous - recommending items to users constitutes an intervention that leaves the IID setting.

3.- Causality and correlation are connected: if two variables are dependent, there must be a variable causally influencing both, possibly one of the two themselves (Reichenbach's common cause principle).
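
Stated a bit more formally (notation added here, not from the talk; Z may coincide with X or with Y):

```latex
X \not\perp\!\!\!\perp Y
\;\Longrightarrow\;
\exists\, Z :\; Z \to X, \quad Z \to Y, \quad X \perp\!\!\!\perp Y \mid Z .
```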

4.- Without assumptions, observational data cannot distinguish between X->Y, Y->X, and X<-Z->Y. A causal model contains more information than a statistical one.
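
A minimal numerical sketch of this point (illustrative numpy code, not from the talk): a forward linear-Gaussian model X->Y and a backward model Y->X with matched parameters produce exactly the same joint distribution.

```python
# Sketch (numpy only): two structural models with opposite causal directions
# can induce the same observational (Gaussian) joint distribution.
import numpy as np

rng = np.random.default_rng(0)
n, a, sigma = 200_000, 0.8, 0.5

# Forward SCM: X := N_X,  Y := a*X + N_Y
x = rng.normal(0.0, 1.0, n)
y = a * x + rng.normal(0.0, sigma, n)

# Backward SCM: Y := N'_Y,  X := b*Y + N'_X, with parameters chosen to match
var_y = a**2 + sigma**2
b = a / var_y                        # regression coefficient of X on Y
tau = np.sqrt(1.0 - a**2 / var_y)    # residual std of X given Y
y2 = rng.normal(0.0, np.sqrt(var_y), n)
x2 = b * y2 + rng.normal(0.0, tau, n)

print(np.cov(x, y))    # ~[[1.00, 0.80], [0.80, 0.89]]
print(np.cov(x2, y2))  # same covariance matrix, hence the same Gaussian joint
```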

5.- A causal graphical model represents variables as vertices in a DAG, with arrows for direct causation. Unexplained variables provide the randomness.
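
In the usual notation (added here for reference), the DAG induces the causal Markov factorization, and the structural assignments supply the randomness through jointly independent noise terms U_i:

```latex
p(x_1,\dots,x_n) \;=\; \prod_{i=1}^{n} p\!\left(x_i \mid \mathrm{pa}_i\right),
\qquad
X_i := f_i\!\left(\mathrm{PA}_i,\, U_i\right), \quad U_1,\dots,U_n \ \text{jointly independent.}
```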

6.- An observational distribution inherits properties from the causal graph, allowing one to infer a class of graphs by testing conditional independences in the data.
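
A small sketch of what such a test can look like, assuming a linear-Gaussian chain X -> Z -> Y (illustrative code, not from the talk):

```python
# Sketch (numpy only): a chain X -> Z -> Y implies X ⟂ Y | Z; here a simple
# linear partial correlation serves as the conditional independence test.
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
x = rng.normal(size=n)
z = 0.9 * x + rng.normal(size=n)      # X -> Z
y = 0.9 * z + rng.normal(size=n)      # Z -> Y

def partial_corr(a, b, c):
    """Correlation of a and b after linearly regressing each on c."""
    ra = a - np.polyval(np.polyfit(c, a, 1), c)
    rb = b - np.polyval(np.polyfit(c, b, 1), c)
    return np.corrcoef(ra, rb)[0, 1]

print(np.corrcoef(x, y)[0, 1])   # clearly nonzero: X and Y are dependent
print(partial_corr(x, y, z))     # ~0: consistent with X ⟂ Y | Z
```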

7.- Conditional independence testing becomes hard for complex causal relationships. With only two variables, no conditional independences exist to test.

8.- The causal direction may be identifiable by examining footprints left by the causal mechanism in the observed distribution.

9.- Independence of cause and mechanism can be formalized: log f' and the cause density p(x) are uncorrelated if X->Y, whereas the corresponding quantities are correlated if Y->X.
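
One way this is made precise in the IGCI line of work (stated here with a uniform reference measure on [0, 1]; the exact formulation varies across papers):

```latex
\operatorname{Cov}\!\left[\log f'(U),\, p(U)\right]
  \;=\; \int_0^1 \log f'(x)\, p(x)\, dx \;-\; \int_0^1 \log f'(x)\, dx \;=\; 0,
\qquad U \sim \mathcal{U}[0,1].
```

In the anti-causal direction the analogous covariance (for the inverse mechanism and the density of Y) is generically non-zero, which is the footprint used to infer the direction.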

10.- Half-sibling regression was used to remove systematic noise in astronomical data by explaining each pixel using other pixels recording different stars.
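
A toy version of the idea (synthetic data and illustrative variable names, numpy only):

```python
# Toy half-sibling regression sketch: other stars X share the instrument's
# systematic noise with the target y but not its astrophysical signal, so
# subtracting E[y | X] removes the systematics.
import numpy as np

rng = np.random.default_rng(2)
n_time, n_siblings = 2000, 50

systematics = rng.normal(size=(n_time, 3))              # shared instrument noise
X = systematics @ rng.normal(size=(3, n_siblings)) \
    + 0.1 * rng.normal(size=(n_time, n_siblings))       # other stars' pixels
signal = np.sin(np.linspace(0, 20, n_time))             # target star's true variability
y = signal + systematics @ np.array([3.0, -2.0, 1.5])   # observed, contaminated

coef, *_ = np.linalg.lstsq(X, y, rcond=None)             # E[y | X] via least squares
y_corrected = y - X @ coef

print(np.corrcoef(y, signal)[0, 1])            # low: raw curve dominated by systematics
print(np.corrcoef(y_corrected, signal)[0, 1])  # close to 1 after the correction
```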

11.- Under certain assumptions, regressing Y on X and subtracting the estimate recovers the unknown latent signal in Y, removing the shared noise that affects both X and Y, up to its expectation.
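
The underlying argument, in the notation of the half-sibling regression setting (Q the signal of interest, N the shared systematic noise, Q independent of N, Y = Q + f(N), X = g(N)):

```latex
\hat{Q} \;:=\; Y - \mathbb{E}[Y \mid X]
        \;=\; \bigl(Q - \mathbb{E}[Q]\bigr) \;+\; \bigl(f(N) - \mathbb{E}[f(N) \mid X]\bigr).
```

If f(N) can be (approximately) reconstructed from X, the second term vanishes and Q is recovered up to its expectation.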

12.- The additive noise model (Y = f(X) + N, with N independent of X) makes the causal direction identifiable, because an additive noise term independent of the input generically cannot be found in the anti-causal direction.
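
A minimal sketch of the two-direction comparison (synthetic data; the rank-correlation check below is a crude stand-in for a proper independence test such as HSIC):

```python
# Sketch (numpy + scipy): for Y = f(X) + N the forward regression residual is
# independent of the input, while the backward regression leaves a residual
# whose spread depends on the regressor.
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(3)
n = 20_000
x = rng.uniform(-2, 2, n)
y = x**3 + rng.normal(0.0, 1.0, n)     # true model: X -> Y with additive noise

def residual(inp, out, deg=5):
    """Residual of a polynomial regression of `out` on `inp`."""
    return out - np.polyval(np.polyfit(inp, out, deg), inp)

# Rank correlation between |regressor| and |residual| as a crude dependence proxy
fwd = spearmanr(np.abs(x), np.abs(residual(x, y)))[0]   # ~0: noise independent of X
bwd = spearmanr(np.abs(y), np.abs(residual(y, x)))[0]   # clearly nonzero
print(fwd, bwd)
```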

13.- Gravitational wave detection data is very noisy. Classifying the strain from its past and future can highlight anomalies like real events.

14.- In fair ML, demographic parity requires the decision be independent of sensitive attributes. Equalized odds conditions on the true label.
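
In the standard notation (A the sensitive attribute, Ŷ the decision, Y the true label):

```latex
\text{demographic parity:}\quad \hat{Y} \perp\!\!\!\perp A,
\qquad\qquad
\text{equalized odds:}\quad \hat{Y} \perp\!\!\!\perp A \mid Y .
```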

15.- Fairness can be framed causally - the decision should only depend on sensitive attributes via resolving variables, not proxy variables.

16.- Causal conditionals are more likely to be stable across environments than anti-causal ones. Adversarial examples may arise from anti-causal learning.
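
One way to express the stability claim (illustrative notation; e indexes environments or interventions on the cause):

```latex
p^{e}(x, y) \;=\; p^{e}(x)\, p(y \mid x).
```

The causal conditional p(y|x) is shared across environments, whereas in the anti-causal factorization p^e(y) p^e(x|y) both factors typically change with e.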

17.- Jointly compressing datasets by fitting causal models may reveal invariant mechanisms. The true structural causal model (SCM) should enable the shortest description.

18.- Learning a large causal model on multi-environment data could find robust components via competition between mechanisms specializing to the environments.

19.- A taxonomy places causal models between statistical and differential equation models: more powerful than statistical models, easier to learn from data than differential equations.

20.- Compared to animals, ML is weak at transfer, interventional generalization, utilizing time, and counterfactual reasoning. Causality may help.

21.- The first two industrial revolutions concerned energy. The current "digital revolution", which began with cybernetics, focuses on information.

22.- The industrial revolution had great benefits but also upheaval. Naively assuming all will be positive with AI is unwise.

23.- It took over a century after the industrial revolution began to deeply understand energy. We may not yet deeply understand information.

24.- Statistical information may just be an epiphenomenon, with dependencies actually due to underlying causal structures which can be asymmetric.

25.- Many collaborators and students contributed to the works presented. The speaker thanked them and the audience for their attention.

Knowledge Vault built by David Vivancos 2024