Concept Graph & Resume using Claude 3.5 Sonnet | ChatGPT-4o | Llama 3:
Resume:
1.- Introduction by Jennifer Chayes of Max Welling: research chair at the University of Amsterdam, VP at Qualcomm, with prior NIPS/ICML organizational roles.
2.- Talk titled "Intelligence and Energy Consumption" exploring ways to reduce energy usage in AI.
3.- Analogy between the Industrial Revolution (energy replacing physical work) and the information age that began in the 1940s (data and the efficiency of processing it).
4.- John Wheeler stated information is fundamental to physics - "it from bit".
5.- Black hole entropy is proportional to the event horizon's area; the holographic principle says the physical information in a volume can be encoded on its boundary surface.
6.- Verlinde argues gravity is an entropic force, like a stretched molecule curling up due to thermal fluctuations.
7.- Maxwell's demon thought experiment: using information to apparently violate the 2nd law of thermodynamics, resolved by Landauer's principle (worked bound after this list).
8.- Jaynes showed entropy reflects subjective ignorance in modeling, not just physical property. Leads to Bayesian perspective.
9.- Rissanen developed minimum description length - balancing model complexity and data encoding. Extended by Hinton.
10.- Variational Bayes gives an explicit free energy objective with an expected-energy term and an entropy term; the reparameterization trick makes its gradients tractable (equation after this list).
11.- Goal: use model entropy to run neural networks more efficiently, as the brain does, closing the free energy cycle.
12.- MCMC and variational Bayes are two approaches to approximate Bayesian inference, trading off bias and variance (MCMC is asymptotically unbiased but noisy; variational Bayes is biased but low-variance).
13.- Stochastic gradient Langevin dynamics enables MCMC with minibatches on large datasets; the reparameterization trick plays the analogous role for variational Bayes (update sketch after this list).
14.- The local reparameterization trick converts parameter uncertainty into activation uncertainty; used for compression (sketch after this list).
15.- Growing size of deep learning models - 100 trillion parameters (brain-sized) projected by 2025. Unsustainable energy-wise.
16.- AI value must exceed energy cost to run. Edge devices have additional energy constraints vs cloud.
17.- Measure AI success by intelligence per kilowatt-hour, not accuracy alone; the brain is roughly 100x more energy-efficient than current AI hardware.
18.- Bayesian compression removes parameters and activations whose posterior is dominated by uncertainty; empirical results show large compression ratios with minimal accuracy loss (pruning sketch after this list).
19.- The Concrete distribution enables learning binary masks for model compression/pruning during training (gate sketch after this list).
20.- Compression doesn't necessarily improve interpretability. Helps regularization, confidence estimation, privacy, adversarial robustness.
21.- Spiking neural networks, inspired by event cameras, compute only when inputs change, saving energy when inputs are static.
22.- According to Wheeler and the holographic principle, the universe may itself be a huge computer; some take this further and believe we live in a simulation.
23.- Lydia Liu presents paper on "Delayed Impact of Fair Machine Learning" with co-authors.
24.- Fairness papers are increasing, but the impact of fairness criteria on protected groups is often left to intuition; this paper examines it formally.
25.- Scores (e.g., credit scores) are assumed to predict the outcome of interest; lenders maximize utility by thresholding scores, and fairness criteria shift those thresholds (threshold sketch after this list).
26.- Delayed impact is defined as the change in a group's mean score; it is characterized as a concave function of the acceptance rate (impact-curve sketch after this list).
27.- Fairness criteria (demographic parity, equal opportunity) affect groups differently, sometimes causing harm; the effect depends on the groups' score distributions.
28.- Experiments on FICO data show the fairness criteria lead to very different outcomes for the minority group.
29.- Future work: richer decision spaces beyond binary, alternative welfare measures, studying algorithmic impact on social systems.
30.- Conclusion: intervention beyond pure utility maximization is possible, but care is needed; consider application-specific impacts.
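
For point 7, a worked statement of Landauer's bound, the result that resolves Maxwell's demon: erasing one bit of information must dissipate at least k_B T ln 2 of heat, about 3e-21 joules at room temperature (T = 300 K):

E_{\min} \;=\; k_B T \ln 2 \;\approx\; (1.38\times 10^{-23}\,\mathrm{J/K}) \times (300\,\mathrm{K}) \times 0.693 \;\approx\; 2.9\times 10^{-21}\,\mathrm{J}\ \text{per erased bit}.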
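
For point 10, the variational free energy written with its explicit energy and entropy terms, together with the reparameterization that makes its gradient estimable by sampling; q(\theta) is the approximate posterior, D the data, H the entropy, and f any integrand whose expectation is needed:

F(q) \;=\; \mathbb{E}_{q(\theta)}\!\big[-\log p(D,\theta)\big] \;-\; H\big[q(\theta)\big],
\qquad \theta = \mu + \sigma \odot \epsilon,\quad \epsilon \sim \mathcal{N}(0, I),
\qquad \nabla_{\mu,\sigma}\,\mathbb{E}_{q}\big[f(\theta)\big] \;=\; \mathbb{E}_{\epsilon}\big[\nabla_{\mu,\sigma} f(\mu+\sigma\odot\epsilon)\big].

Minimizing F trades off fitting the data (low expected energy) against keeping the posterior broad (high entropy).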
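
For point 13, a minimal sketch of one stochastic gradient Langevin dynamics update on a minibatch. The gradient functions grad_log_prior and grad_log_lik are hypothetical user-supplied callables, and the constant step size is a simplification; the original method anneals it toward zero.

import numpy as np

def sgld_step(theta, minibatch, grad_log_prior, grad_log_lik, N, eps):
    """One SGLD update with step size eps.

    theta          : current parameter vector (np.ndarray)
    minibatch      : n data points sampled from the full dataset
    grad_log_prior : function theta -> gradient of the log prior
    grad_log_lik   : function (theta, x) -> gradient of the log likelihood of x
    N              : total dataset size (rescales the minibatch gradient)
    """
    n = len(minibatch)
    # Unbiased minibatch estimate of the full-data log-posterior gradient.
    grad = grad_log_prior(theta) + (N / n) * sum(
        grad_log_lik(theta, x) for x in minibatch)
    # Injected Gaussian noise with variance eps turns SGD into an MCMC sampler.
    noise = np.random.normal(0.0, np.sqrt(eps), size=theta.shape)
    return theta + 0.5 * eps * grad + noise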
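
For point 14, a minimal sketch of the local reparameterization trick for a fully connected layer with a factorized Gaussian posterior over weights: instead of sampling a weight matrix, sample the pre-activations directly, converting parameter uncertainty into activation uncertainty. The names w_mu and w_logvar are illustrative.

import numpy as np

def local_reparam_linear(x, w_mu, w_logvar):
    """Sample pre-activations y ~ N(x @ w_mu, x^2 @ exp(w_logvar)).

    x        : (batch, in_features) inputs
    w_mu     : (in_features, out_features) posterior means of the weights
    w_logvar : (in_features, out_features) posterior log-variances
    """
    act_mu = x @ w_mu                      # mean of the pre-activation
    act_var = (x ** 2) @ np.exp(w_logvar)  # variance of the pre-activation
    eps = np.random.normal(size=act_mu.shape)
    return act_mu + np.sqrt(act_var) * eps  # one sample per data point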
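
For point 18, a minimal sketch of one common pruning rule used with Bayesian posteriors over weights: drop weights whose posterior signal-to-noise ratio |mu|/sigma falls below a threshold. This illustrates the idea of removing uncertain parameters; it is not claimed to be the exact criterion from the talk.

import numpy as np

def snr_prune(w_mu, w_sigma, threshold=1.0):
    """Zero out weights whose posterior signal-to-noise ratio is low.

    w_mu, w_sigma : arrays of posterior means and standard deviations
    threshold     : keep a weight only if |mu| / sigma >= threshold
    """
    snr = np.abs(w_mu) / (w_sigma + 1e-12)  # avoid division by zero
    mask = (snr >= threshold).astype(w_mu.dtype)
    return w_mu * mask, mask  # pruned weights and the binary keep-mask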
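
For point 19, a minimal sketch of a binary Concrete (Gumbel-sigmoid) gate: a differentiable relaxation of a Bernoulli mask so that pruning decisions can be learned by gradient descent. The logit array log_alpha and the temperature tau are assumed learnable/tunable; the names are illustrative.

import numpy as np

def binary_concrete_gate(log_alpha, tau=0.5):
    """Sample a relaxed binary mask z in (0, 1) per parameter.

    log_alpha : array of gate logits (learned)
    tau       : temperature; lower values push z toward {0, 1}
    """
    u = np.random.uniform(1e-6, 1.0 - 1e-6, size=np.shape(log_alpha))
    logistic_noise = np.log(u) - np.log(1.0 - u)
    z = 1.0 / (1.0 + np.exp(-(log_alpha + logistic_noise) / tau))
    return z  # multiply weights by z during training; threshold at test time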
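
For point 25, a minimal sketch of threshold selection by expected-utility maximization: a lender accepts an applicant when the expected profit of lending is non-negative. The repay_prob function and the gain/loss values are illustrative assumptions, not numbers from the paper.

def accept(score, repay_prob, gain=1.0, loss=4.0):
    """Accept an applicant iff the expected utility of lending is non-negative.

    repay_prob : function mapping a score to P(repay | score)
    gain       : lender's profit if the loan is repaid
    loss       : lender's loss if the borrower defaults
    """
    p = repay_prob(score)
    expected_utility = p * gain - (1.0 - p) * loss
    return expected_utility >= 0.0

# Example with a hypothetical calibration: the break-even point is where
# p * gain = (1 - p) * loss, i.e. p = loss / (gain + loss) = 0.8 here.
repay = lambda s: s / 100.0          # hypothetical: score 0-100 maps to P(repay)
print(accept(85, repay))             # True  (p = 0.85 >= 0.8)
print(accept(70, repay))             # False (p = 0.70 <  0.8)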
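
For point 26, a minimal sketch of computing delayed impact under simple assumptions: accepted applicants who repay gain points, accepted applicants who default lose points, rejected applicants are unchanged, and sweeping the acceptance rate from 0 to 1 traces out the (typically concave) mean-score-change curve. The score range, repayment model, and point changes below are illustrative, not the paper's values.

import numpy as np

def mean_score_change(scores, accept_rate, repay_prob, gain=75.0, loss=150.0):
    """Expected change in the group's mean score at a given acceptance rate.

    scores      : 1-D array of the group's current scores
    accept_rate : fraction of the group accepted (highest scores first)
    repay_prob  : function mapping a score to P(repay | score)
    gain, loss  : score change on repayment / default (illustrative values)
    """
    order = np.argsort(scores)[::-1]                 # accept best scores first
    n_accept = int(round(accept_rate * len(scores)))
    accepted = scores[order[:n_accept]]
    p = repay_prob(accepted)
    # Rejected applicants contribute zero change to the group mean.
    change = p * gain - (1.0 - p) * loss
    return change.sum() / len(scores)

# Sweep acceptance rates to trace the impact curve for a hypothetical group.
scores = np.random.uniform(300, 850, size=1000)      # illustrative score range
repay = lambda s: (s - 300.0) / 550.0                # hypothetical calibration
curve = [mean_score_change(scores, b, repay) for b in np.linspace(0, 1, 11)]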
Knowledge Vault built by David Vivancos 2024