Intelligence per Kilowatt-hour

Max Welling

**Concept Graph & Resume using Claude 3.5 Sonnet | ChatGPT-4o | Llama 3:**

```mermaid
graph LR
classDef intro fill:#f9d4d4, font-weight:bold, font-size:14px
classDef physics fill:#d4f9d4, font-weight:bold, font-size:14px
classDef energy fill:#d4d4f9, font-weight:bold, font-size:14px
classDef fairness fill:#f9f9d4, font-weight:bold, font-size:14px
classDef future fill:#f9d4f9, font-weight:bold, font-size:14px
Main[Intelligence per Kilowatt-hour]
Main --> A[Max Welling: Intelligence and Energy Consumption 1]
A --> B[Reduce energy usage in AI 2]
A --> C[Analogy: Industrial Revolution and Information Age 3]
Main --> D[Physics and Information]
D --> E[Information fundamental to physics: it from bit 4]
D --> F[Black hole entropy, holographic principle 5]
D --> G[Gravity as entropic force Verlinde 6]
D --> H[Maxwells demon, Landauers principle 7]
Main --> I[Information Theory and AI]
I --> J[Jaynes: entropy as subjective ignorance 8]
I --> K[Rissanen: minimum description length 9]
I --> L[Variational Bayes: free energy equation 10]
I --> M[Use model entropy for neural network efficiency 11]
Main --> N[Energy Efficiency in AI]
N --> O[MCMC and variational Bayes for inference 12]
N --> P[Stochastic gradient Langevin dynamics for MCMC 13]
N --> Q[Local reparameterization for activation uncertainty 14]
N --> R[Deep learning models growing unsustainably large 15]
R --> S[AI value must exceed energy cost 16]
R --> T[Measure intelligence per kilowatt-hour 17]
Main --> U[AI Compression Techniques]
U --> V[Bayesian compression removes uncertain parameters 18]
U --> W[Concrete distribution for binary mask learning 19]
U --> X[Compression benefits: regularization, privacy, robustness 20]
U --> Y[Spiking neural networks for energy efficiency 21]
Main --> Z[Fairness in Machine Learning]
Z --> AA[Delayed Impact of Fair Machine Learning 23]
AA --> AB[Fairness impact on protected groups examined 24]
AA --> AC[Scores correlate with outcomes, thresholding scores 25]
AA --> AD[Delayed impact: mean score change 26]
AA --> AE[Fairness criteria impact groups differently 27]
AA --> AF[FICO data: fairness criteria lead to differences 28]
Main --> AG[Future Directions]
AG --> AH[Universe as huge computer, simulation hypothesis 22]
AG --> AI[Richer decision spaces, alternative welfare measures 29]
AG --> AJ[Care needed in application-specific impacts 30]
class A,B,C intro
class D,E,F,G,H physics
class I,J,K,L,M,N,O,P,Q,R,S,T energy
class U,V,W,X,Y energy
class Z,AA,AB,AC,AD,AE,AF fairness
class AG,AH,AI,AJ future
```


**Resume:**

**1.-** Introduction by Jennifer Chayes of Max Welling: research chair at the University of Amsterdam, VP at Qualcomm, with previous NIPS/ICML leadership roles.

**2.-** Talk titled "Intelligence and Energy Consumption" exploring ways to reduce energy usage in AI.

**3.-** Analogy between Industrial Revolution (energy/physical work) and 1940s information age (data/efficiency).

**4.-** John Wheeler stated information is fundamental to physics - "it from bit".

**5.-** Black hole entropy proportional to event horizon area; holographic principle encodes physics information on universe's surface.

**6.-** Verlinde argues gravity is an entropic force, like a stretched molecule curling up due to thermal fluctuations.

**7.-** Maxwell's demon thought experiment on using information to violate the second law of thermodynamics, resolved by Landauer's principle.
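Landauer's bound is easy to state numerically. The sketch below (the function name and the 300 K room-temperature choice are mine, not the talk's) computes the minimum heat dissipated per erased bit:

```python
import math

# Landauer's principle: erasing one bit of information dissipates at
# least k_B * T * ln(2) joules of heat, regardless of implementation.
K_B = 1.380649e-23  # Boltzmann constant, J/K (exact in the 2019 SI)

def landauer_limit_joules(temperature_kelvin: float) -> float:
    """Minimum energy to erase one bit at the given temperature."""
    return K_B * temperature_kelvin * math.log(2)

# At room temperature (300 K) the bound is roughly 2.9e-21 J per bit,
# many orders of magnitude below what today's hardware dissipates.
e_bit = landauer_limit_joules(300.0)
```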

**8.-** Jaynes showed entropy reflects subjective ignorance in modeling, not just physical property. Leads to Bayesian perspective.

**9.-** Rissanen developed minimum description length - balancing model complexity and data encoding. Extended by Hinton.
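The MDL trade-off can be made concrete with a toy two-part code; all numbers and names below are invented for illustration, not taken from Rissanen or Hinton:

```python
# Two-part MDL code: total bits = bits to describe the model itself
# plus bits to encode the data under that model. A richer model costs
# more to state but may compress each data point better.
def description_length(model_bits: float, n_points: int,
                       nll_bits_per_point: float) -> float:
    return model_bits + n_points * nll_bits_per_point

# With only 1000 points, the small model wins despite encoding each
# point less efficiently; with enough data the trade-off flips.
small = description_length(model_bits=100, n_points=1000, nll_bits_per_point=2.0)
big = description_length(model_bits=5000, n_points=1000, nll_bits_per_point=1.5)
```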

**10.-** Variational Bayes provides explicit free energy equation for models with energy and entropy terms. Reparameterization trick enables gradients.
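A minimal sketch of these two ingredients, assuming a fully factorized Gaussian posterior against a standard-normal prior (function and variable names are illustrative): the closed-form KL term of the free energy, plus a reparameterized sample that keeps gradients flowing to the variational parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

def kl_gauss_std_normal(mu, sigma):
    # KL( N(mu, sigma^2) || N(0, 1) ), elementwise closed form, summed
    return 0.5 * np.sum(sigma**2 + mu**2 - 1.0 - 2.0 * np.log(sigma))

def reparameterize(mu, sigma, n_samples):
    # z = mu + sigma * eps: the sample is now a deterministic,
    # differentiable function of (mu, sigma) plus external noise
    eps = rng.standard_normal((n_samples,) + mu.shape)
    return mu + sigma * eps

mu, sigma = np.zeros(3), np.ones(3) * 0.5
z = reparameterize(mu, sigma, 1000)
# In a full model, the free energy adds the expected negative
# log-likelihood (estimated via z) to this KL term.
free_energy = kl_gauss_std_normal(mu, sigma)
```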

**11.-** Goal: use model entropy to run neural networks more efficiently, like the brain. Closing free energy cycle.

**12.-** MCMC and variational Bayes are two approaches for approximate Bayesian inference, trading off bias and variance.

**13.-** Stochastic gradient Langevin dynamics enables MCMC with minibatches for large datasets. Reparameterization for variational Bayes.
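The SGLD update can be sketched on a toy full-batch target; in Welling and Teh's setting the gradient would come from a minibatch with an appropriate scale factor, and the step size would decay over time:

```python
import numpy as np

rng = np.random.default_rng(1)

def grad_log_p(theta):
    # Gradient of log N(0, 1); stands in for a (minibatch) posterior gradient
    return -theta

def sgld(theta0, step, n_steps):
    # theta <- theta + (step/2) * grad log p(theta) + N(0, step) noise;
    # the injected noise turns gradient descent into an MCMC sampler.
    theta, samples = theta0, []
    for _ in range(n_steps):
        noise = rng.standard_normal() * np.sqrt(step)
        theta = theta + 0.5 * step * grad_log_p(theta) + noise
        samples.append(theta)
    return np.array(samples)

# After burn-in, samples approximate the standard-normal target.
samples = sgld(theta0=3.0, step=0.1, n_steps=5000)
```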

**14.-** Local reparameterization trick converts parameter uncertainty to activation uncertainty. Used for compression.
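A sketch of the trick, assuming a factorized Gaussian over the weights: rather than sampling a weight matrix per example, sample the pre-activations directly from their induced Gaussian, which lowers gradient variance and costs one noise draw per activation.

```python
import numpy as np

rng = np.random.default_rng(2)

def local_reparam(x, w_mu, w_logvar):
    # For W ~ N(w_mu, exp(w_logvar)) elementwise, the pre-activation
    # xW is Gaussian with these moments, so we can sample it directly.
    act_mu = x @ w_mu                       # mean of pre-activation
    act_var = (x**2) @ np.exp(w_logvar)     # variance of pre-activation
    eps = rng.standard_normal(act_mu.shape)
    return act_mu + np.sqrt(act_var) * eps  # one noise sample per example

x = rng.standard_normal((4, 5))             # batch of 4, 5 features
w_mu = rng.standard_normal((5, 3))
w_logvar = np.full((5, 3), -4.0)            # small weight variance
b = local_reparam(x, w_mu, w_logvar)
```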

**15.-** Growing size of deep learning models - 100 trillion parameters (brain-sized) projected by 2025. Unsustainable energy-wise.

**16.-** AI value must exceed energy cost to run. Edge devices have additional energy constraints vs cloud.

**17.-** Measure AI success by intelligence per kilowatt-hour, not just accuracy. Brain is ~100x more efficient.
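A toy version of such a metric, purely illustrative (the talk proposes the measure, not this formula): divide a task-quality score by the energy consumed, and compare a GPU-class power draw against a brain-scale 20 W budget.

```python
# Intelligence per kilowatt-hour as a ratio of task quality to energy.
def intelligence_per_kwh(task_score: float, power_watts: float,
                         runtime_seconds: float) -> float:
    energy_kwh = power_watts * runtime_seconds / 3.6e6  # joules -> kWh
    return task_score / energy_kwh

# Same 95% accuracy, one hour of runtime: a 250 W accelerator vs. a
# 20 W brain-like budget. The lower-power system scores 12.5x higher.
gpu = intelligence_per_kwh(0.95, 250.0, 3600.0)
brain = intelligence_per_kwh(0.95, 20.0, 3600.0)
```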

**18.-** Bayesian compression removes uncertain parameters/activations. Empirical results show large compression with minimal accuracy loss.
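One simple proxy for this idea is pruning by posterior signal-to-noise ratio; the hard threshold below is an illustrative stand-in, not the learned sparsity priors of the Bayesian compression papers.

```python
import numpy as np

rng = np.random.default_rng(3)

def prune_by_snr(w_mu, w_sigma, snr_threshold=1.0):
    # Keep a weight only if its posterior mean is large relative to its
    # posterior uncertainty; zero out everything else.
    keep = np.abs(w_mu) / w_sigma >= snr_threshold
    return np.where(keep, w_mu, 0.0), keep

w_mu = rng.standard_normal(1000)
w_sigma = np.abs(rng.standard_normal(1000)) + 0.1
pruned, keep = prune_by_snr(w_mu, w_sigma, snr_threshold=1.0)
sparsity = 1.0 - keep.mean()  # fraction of weights removed
```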

**19.-** Concrete distribution enables learning binary masks for model compression/pruning during training.
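A binary Concrete (relaxed Bernoulli) sample can be drawn in a few lines; parameter names are mine. At low temperature the samples concentrate near 0 or 1, approximating a hard mask while staying differentiable in `log_alpha`.

```python
import numpy as np

rng = np.random.default_rng(4)

def binary_concrete(log_alpha, temperature, size):
    # Sample Logistic(0, 1) noise, shift by the mask's log-odds, and
    # squash with a temperature-scaled sigmoid.
    u = rng.uniform(1e-6, 1 - 1e-6, size)
    logistic = np.log(u) - np.log1p(-u)
    return 1.0 / (1.0 + np.exp(-(log_alpha + logistic) / temperature))

# Low temperature: nearly-binary gate values for each weight.
mask = binary_concrete(log_alpha=0.0, temperature=0.1, size=1000)
```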

**20.-** Compression doesn't necessarily improve interpretability. Helps regularization, confidence estimation, privacy, adversarial robustness.

**21.-** Spiking neural networks inspired by event cameras to reduce computation when inputs are static. Achieves energy efficiency.

**22.-** Universe may be a huge computer according to Wheeler, holographic principle. Some believe we live in a simulation.

**23.-** Lydia Liu presents paper on "Delayed Impact of Fair Machine Learning" with co-authors.

**24.-** Increasing fairness papers, but impact of criteria on protected groups often left to intuition. Paper examines this.

**25.-** Scores (e.g. credit) assumed to correlate with outcome. Lenders maximize utility by thresholding scores. Fairness changes thresholds.
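The threshold policy can be sketched with made-up gains and losses; the monotone score-to-repayment map below is a toy stand-in, not the paper's FICO data. The lender picks the cutoff maximizing expected utility; a fairness constraint would shift that cutoff per group.

```python
import numpy as np

def expected_utility(scores, repay_prob, cutoff, gain=1.0, loss=4.0):
    # Utility: +gain for each accepted applicant who repays,
    # -loss for each accepted applicant who defaults.
    accepted = scores >= cutoff
    return np.sum(accepted * (repay_prob * gain - (1 - repay_prob) * loss))

def best_cutoff(scores, repay_prob, candidates):
    utils = [expected_utility(scores, repay_prob, c) for c in candidates]
    return candidates[int(np.argmax(utils))]

scores = np.linspace(300, 850, 56)      # FICO-like score grid
repay_prob = (scores - 300) / 550       # toy monotone repayment model
cutoff = best_cutoff(scores, repay_prob, scores)
```

With loss four times gain, accepting pays off only when repayment probability exceeds 0.8, so the utility-maximizing cutoff lands near a score of 740 under this toy model.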

**26.-** Delayed impact defined as mean score change. Characterized as concave curve vs acceptance rate.
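A sketch of that mean-score-change curve under a toy score model (the gain/loss magnitudes and repayment model are invented): accepted applicants who repay gain points, those who default lose points, so the group's mean change rises and then falls as the acceptance rate grows.

```python
import numpy as np

def mean_score_change(scores, repay_prob, cutoff, gain=75.0, loss=150.0):
    # Delayed impact: accepted repayers gain score points, accepted
    # defaulters lose them; average the change over the whole group.
    accepted = scores >= cutoff
    change = accepted * (repay_prob * gain - (1 - repay_prob) * loss)
    return change.mean()

scores = np.linspace(300, 850, 56)
repay_prob = (scores - 300) / 550
# Sweep the cutoff: the resulting curve peaks at an interior cutoff,
# the concave shape described in the paper.
changes = [mean_score_change(scores, repay_prob, c) for c in scores]
```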

**27.-** Fairness criteria (demographic parity, equal opportunity) impact groups differently, sometimes causing harm. Depends on score distributions.

**28.-** Experiments on FICO data show fairness criteria lead to very different outcomes for minority group.

**29.-** Future work: richer decision spaces beyond binary, alternative welfare measures, studying algorithmic impact on social systems.

**30.-** Conclusion: Intervention beyond utility maximization possible, but care needed. Consider application-specific impacts.

Knowledge Vault built by David Vivancos 2024