Concept Graph & Resume using Claude 3.5 Sonnet | ChatGPT-4o | Llama 3:
Resume:
1.- Introduction by Jennifer Chayes of Max Welling: research chair at the University of Amsterdam, VP at Qualcomm, with prior NIPS/ICML organizational roles.
2.- Talk titled "Intelligence and Energy Consumption" exploring ways to reduce energy usage in AI.
3.- Analogy between the Industrial Revolution (energy replacing physical work) and the information age that began in the 1940s (data and the efficiency of processing it).
4.- John Wheeler stated information is fundamental to physics - "it from bit".
5.- Black hole entropy is proportional to the event horizon's area; the holographic principle says the physical information in a volume can be encoded on its boundary surface.
6.- Verlinde argues gravity is an entropic force, like a stretched molecule curling up due to thermal fluctuations.
7.- Maxwell's demon thought experiment: using information to apparently violate the 2nd law of thermodynamics, resolved by Landauer's principle (worked bound after this list).
8.- Jaynes showed entropy reflects subjective ignorance in modeling, not just physical property. Leads to Bayesian perspective.
9.- Rissanen developed minimum description length - balancing model complexity and data encoding. Extended by Hinton.
10.- Variational Bayes gives an explicit free energy objective with an expected-energy term and an entropy term; the reparameterization trick makes its gradients tractable (equation after this list).
11.- Goal: use model entropy to run neural networks more efficiently, as the brain does, closing the free energy cycle.
12.- MCMC and variational Bayes are two approaches to approximate Bayesian inference, trading off bias and variance (MCMC is asymptotically unbiased but noisy; variational Bayes is biased but low-variance).
13.- Stochastic gradient Langevin dynamics enables MCMC with minibatches on large datasets; the reparameterization trick plays the analogous role for variational Bayes (update sketch after this list).
14.- The local reparameterization trick converts parameter uncertainty into activation uncertainty; used for compression (sketch after this list).
15.- Growing size of deep learning models - 100 trillion parameters (brain-sized) projected by 2025. Unsustainable energy-wise.
16.- AI value must exceed energy cost to run. Edge devices have additional energy constraints vs cloud.
17.- Measure AI success by intelligence per kilowatt-hour, not accuracy alone; the brain is roughly 100x more energy-efficient than current AI hardware.
18.- Bayesian compression removes parameters and activations whose posterior is dominated by uncertainty; empirical results show large compression ratios with minimal accuracy loss (pruning sketch after this list).
19.- The Concrete distribution enables learning binary masks for model compression/pruning during training (gate sketch after this list).
20.- Compression doesn't necessarily improve interpretability. Helps regularization, confidence estimation, privacy, adversarial robustness.
21.- Spiking neural networks, inspired by event cameras, compute only when inputs change, saving energy when inputs are static.
22.- According to Wheeler and the holographic principle, the universe may itself be a huge computer; some take this further and believe we live in a simulation.
23.- Lydia Liu presents paper on "Delayed Impact of Fair Machine Learning" with co-authors.
24.- Fairness papers are increasing, but the impact of fairness criteria on protected groups is often left to intuition; this paper examines it formally.
25.- Scores (e.g., credit scores) are assumed to predict the outcome of interest; lenders maximize utility by thresholding scores, and fairness criteria shift those thresholds (threshold sketch after this list).
26.- Delayed impact is defined as the change in a group's mean score; it is characterized as a concave function of the acceptance rate (impact-curve sketch after this list).
27.- Fairness criteria (demographic parity, equal opportunity) affect groups differently, sometimes causing harm; the effect depends on the groups' score distributions.
28.- Experiments on FICO data show the fairness criteria lead to very different outcomes for the minority group.
29.- Future work: richer decision spaces beyond binary, alternative welfare measures, studying algorithmic impact on social systems.
30.- Conclusion: intervention beyond pure utility maximization is possible, but care is needed; consider application-specific impacts.
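
For point 7, a worked statement of Landauer's bound, the result that resolves Maxwell's demon: erasing one bit of information must dissipate at least k_B T ln 2 of heat, about 3e-21 joules at room temperature (T = 300 K):

E_{\min} \;=\; k_B T \ln 2 \;\approx\; (1.38\times 10^{-23}\,\mathrm{J/K}) \times (300\,\mathrm{K}) \times 0.693 \;\approx\; 2.9\times 10^{-21}\,\mathrm{J}\ \text{per erased bit}.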
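
For point 10, the variational free energy written with its explicit energy and entropy terms, together with the reparameterization that makes its gradient estimable by sampling; q(\theta) is the approximate posterior, D the data, H the entropy, and f any integrand whose expectation is needed:

F(q) \;=\; \mathbb{E}_{q(\theta)}\!\big[-\log p(D,\theta)\big] \;-\; H\big[q(\theta)\big],
\qquad \theta = \mu + \sigma \odot \epsilon,\quad \epsilon \sim \mathcal{N}(0, I),
\qquad \nabla_{\mu,\sigma}\,\mathbb{E}_{q}\big[f(\theta)\big] \;=\; \mathbb{E}_{\epsilon}\big[\nabla_{\mu,\sigma} f(\mu+\sigma\odot\epsilon)\big].

Minimizing F trades off fitting the data (low expected energy) against keeping the posterior broad (high entropy).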
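
For point 13, a minimal sketch of one stochastic gradient Langevin dynamics update on a minibatch. The gradient functions grad_log_prior and grad_log_lik are hypothetical user-supplied callables, and the constant step size is a simplification; the original method anneals it toward zero.

import numpy as np

def sgld_step(theta, minibatch, grad_log_prior, grad_log_lik, N, eps):
    """One SGLD update with step size eps.

    theta          : current parameter vector (np.ndarray)
    minibatch      : n data points sampled from the full dataset
    grad_log_prior : function theta -> gradient of the log prior
    grad_log_lik   : function (theta, x) -> gradient of the log likelihood of x
    N              : total dataset size (rescales the minibatch gradient)
    """
    n = len(minibatch)
    # Unbiased minibatch estimate of the full-data log-posterior gradient.
    grad = grad_log_prior(theta) + (N / n) * sum(
        grad_log_lik(theta, x) for x in minibatch)
    # Injected Gaussian noise with variance eps turns SGD into an MCMC sampler.
    noise = np.random.normal(0.0, np.sqrt(eps), size=theta.shape)
    return theta + 0.5 * eps * grad + noise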
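
For point 14, a minimal sketch of the local reparameterization trick for a fully connected layer with a factorized Gaussian posterior over weights: instead of sampling a weight matrix, sample the pre-activations directly, converting parameter uncertainty into activation uncertainty. The names w_mu and w_logvar are illustrative.

import numpy as np

def local_reparam_linear(x, w_mu, w_logvar):
    """Sample pre-activations y ~ N(x @ w_mu, x^2 @ exp(w_logvar)).

    x        : (batch, in_features) inputs
    w_mu     : (in_features, out_features) posterior means of the weights
    w_logvar : (in_features, out_features) posterior log-variances
    """
    act_mu = x @ w_mu                      # mean of the pre-activation
    act_var = (x ** 2) @ np.exp(w_logvar)  # variance of the pre-activation
    eps = np.random.normal(size=act_mu.shape)
    return act_mu + np.sqrt(act_var) * eps  # one sample per data point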
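
For point 18, a minimal sketch of one common pruning rule used with Bayesian posteriors over weights: drop weights whose posterior signal-to-noise ratio |mu|/sigma falls below a threshold. This illustrates the idea of removing uncertain parameters; it is not claimed to be the exact criterion from the talk.

import numpy as np

def snr_prune(w_mu, w_sigma, threshold=1.0):
    """Zero out weights whose posterior signal-to-noise ratio is low.

    w_mu, w_sigma : arrays of posterior means and standard deviations
    threshold     : keep a weight only if |mu| / sigma >= threshold
    """
    snr = np.abs(w_mu) / (w_sigma + 1e-12)  # avoid division by zero
    mask = (snr >= threshold).astype(w_mu.dtype)
    return w_mu * mask, mask  # pruned weights and the binary keep-mask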
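
For point 19, a minimal sketch of a binary Concrete (Gumbel-sigmoid) gate: a differentiable relaxation of a Bernoulli mask so that pruning decisions can be learned by gradient descent. The logit array log_alpha and the temperature tau are assumed learnable/tunable; the names are illustrative.

import numpy as np

def binary_concrete_gate(log_alpha, tau=0.5):
    """Sample a relaxed binary mask z in (0, 1) per parameter.

    log_alpha : array of gate logits (learned)
    tau       : temperature; lower values push z toward {0, 1}
    """
    u = np.random.uniform(1e-6, 1.0 - 1e-6, size=np.shape(log_alpha))
    logistic_noise = np.log(u) - np.log(1.0 - u)
    z = 1.0 / (1.0 + np.exp(-(log_alpha + logistic_noise) / tau))
    return z  # multiply weights by z during training; threshold at test time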
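
For point 25, a minimal sketch of threshold selection by expected-utility maximization: a lender accepts an applicant when the expected profit of lending is non-negative. The repay_prob function and the gain/loss values are illustrative assumptions, not numbers from the paper.

def accept(score, repay_prob, gain=1.0, loss=4.0):
    """Accept an applicant iff the expected utility of lending is non-negative.

    repay_prob : function mapping a score to P(repay | score)
    gain       : lender's profit if the loan is repaid
    loss       : lender's loss if the borrower defaults
    """
    p = repay_prob(score)
    expected_utility = p * gain - (1.0 - p) * loss
    return expected_utility >= 0.0

# Example with a hypothetical calibration: the break-even point is where
# p * gain = (1 - p) * loss, i.e. p = loss / (gain + loss) = 0.8 here.
repay = lambda s: s / 100.0          # hypothetical: score 0-100 maps to P(repay)
print(accept(85, repay))             # True  (p = 0.85 >= 0.8)
print(accept(70, repay))             # False (p = 0.70 <  0.8)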
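
For point 26, a minimal sketch of computing delayed impact under simple assumptions: accepted applicants who repay gain points, accepted applicants who default lose points, rejected applicants are unchanged, and sweeping the acceptance rate from 0 to 1 traces out the (typically concave) mean-score-change curve. The score range, repayment model, and point changes below are illustrative, not the paper's values.

import numpy as np

def mean_score_change(scores, accept_rate, repay_prob, gain=75.0, loss=150.0):
    """Expected change in the group's mean score at a given acceptance rate.

    scores      : 1-D array of the group's current scores
    accept_rate : fraction of the group accepted (highest scores first)
    repay_prob  : function mapping a score to P(repay | score)
    gain, loss  : score change on repayment / default (illustrative values)
    """
    order = np.argsort(scores)[::-1]                 # accept best scores first
    n_accept = int(round(accept_rate * len(scores)))
    accepted = scores[order[:n_accept]]
    p = repay_prob(accepted)
    # Rejected applicants contribute zero change to the group mean.
    change = p * gain - (1.0 - p) * loss
    return change.sum() / len(scores)

# Sweep acceptance rates to trace the impact curve for a hypothetical group.
scores = np.random.uniform(300, 850, size=1000)      # illustrative score range
repay = lambda s: (s - 300.0) / 550.0                # hypothetical calibration
curve = [mean_score_change(scores, b, repay) for b in np.linspace(0, 1, 11)]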
Knowledge Vault built by David Vivancos 2024