Neil Lawrence ICLR 2016 - Keynote - Beyond Backpropagation: Uncertainty Propagation

**Concept Graph & Resume using Claude 3 Opus | Chat GPT4 | Gemini Adv | Llama 3:**

graph LR
classDef mackay fill:#f9d4d4, font-weight:bold, font-size:14px;
classDef machinelearning fill:#d4f9d4, font-weight:bold, font-size:14px;
classDef gaussianprocesses fill:#d4d4f9, font-weight:bold, font-size:14px;
classDef deeplearning fill:#f9f9d4, font-weight:bold, font-size:14px;
classDef latentmodels fill:#f9d4f9, font-weight:bold, font-size:14px;
classDef resources fill:#d4f9f9, font-weight:bold, font-size:14px;
A[Neil Lawrence

ICLR 2016] --> B[Inspirational MacKay

passed from cancer. 1] B --> C[MacKay revolutionized

machine learning. 2] A --> D[Speaker: oil rigs to

PhD student. 3] C --> E[MacKay introduced priors

over NN weights. 4] C --> F[Gaussian processes solved

NN problems then. 5] A --> G[Data explosion advanced

deep learning rapidly. 6] A --> H[Gaussian processes: priors

over functions directly. 7] H --> I[Gaussian processes, NNs

connected under conditions. 8] H --> J[Gaussian processes excel

on small data. 9] H --> K[Gaussian processes model

malaria in Uganda. 10] H --> L[Gaussian processes infer

protein levels. 11] H --> M[MacKay: Gaussian processes

just smoothing machines? 12] A --> N[Deep learning composes

differentiable functions. 13] A --> O[Bayesian inference: priors,

posteriors, predictions. 14] A --> P[Variational inference approximates

intractable posteriors. 15] H --> Q[Gaussian process inference

hard, made tractable. 16] H --> R[Sparse approximations scale

Gaussian processes. 17] H --> S[Composing Gaussian processes

challenging, bounds enable. 18] S --> T[Deep Gaussian processes

compose with uncertainty. 19] T --> U[Deep Gaussian processes

avoid overfitting. 20] A --> V[Latent variable models

represent high-D observations. 21] V --> W[MacKay pioneered neural networks

for unsupervised latents. 22] V --> X[Gaussian process latents

extract low-D structure. 23] X --> Y[Layered Gaussian process

latents learn hierarchies. 24] Y --> Z[Company scaling layered

Gaussian process models. 25] Z --> AA[New approximations reduce

numerical issues scaling. 26] V --> AB[Goal: 'deep health'

personalized medicine models. 27] A --> AC[Resources available:

schools, tutorials, software. 28] A --> AD[Recent research: RNNs,

variational autoencoders. 29] A --> AE[Speaker inspired by

Mackay, laments loss. 30] class A,B,AE mackay; class C,D,E,F,N,O,P,W machinelearning; class G,H,I,J,K,L,M,Q,R,S,T,U gaussianprocesses; class V,X,Y,Z,AA,AB latentmodels; class AC resources; class AD deeplearning;


**Resume:**

**1.-**David MacKay was an inspirational figure who passed away from cancer at age 48, leaving behind a young family.

**2.-**MacKay revolutionized machine learning and information theory. A symposium was held shortly before his death to honor his broad influence.

**3.-**The speaker worked on oil rigs implementing neural networks before becoming a PhD student. Neural networks are functions built by passing weighted sums of inputs through nonlinearities.

**4.-**MacKay introduced priors over weights in neural networks, turning them into classes of functions. Weight decay implements this idea.
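
To make point 4 concrete, here is a minimal sketch (the names `map_objective`, `map_weights`, and the prior precision `alpha` are illustrative, not from the talk): a zero-mean Gaussian prior on the weights of a linear model contributes exactly the quadratic weight-decay penalty to the negative log posterior, and the MAP estimate is ridge regression.

```python
import numpy as np

def map_objective(w, X, y, alpha):
    """Negative log posterior of linear regression with prior
    w ~ N(0, I/alpha); the prior term is the weight-decay penalty."""
    residual = y - X @ w
    return 0.5 * residual @ residual + 0.5 * alpha * w @ w

def map_weights(X, y, alpha):
    """Closed-form minimizer of map_objective (ridge regression)."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(d), X.T @ y)
```

With `alpha = 0` the prior vanishes and the solution reduces to ordinary least squares; larger `alpha` shrinks the weights toward the prior mean of zero.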

**5.-**With limited data in that era, Gaussian processes seemed to solve many machine learning problems that neural networks aimed to address.

**6.-**Digital data explosion in areas like vision, speech, language allowed deep learning methods to advance rapidly and achieve impressive results.

**7.-**Gaussian processes take a different modeling approach - placing priors over functions directly. Covariance functions relate inputs to covariances.
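
The covariance-function idea can be sketched as follows, assuming a squared-exponential (RBF) kernel as one common choice (names are illustrative): any finite set of inputs gets a joint Gaussian with this covariance, so we can draw sample functions directly from the prior.

```python
import numpy as np

def rbf_kernel(X1, X2, lengthscale=1.0, variance=1.0):
    """Squared-exponential covariance: nearby inputs are highly
    correlated, distant inputs nearly independent."""
    sq_dists = (np.sum(X1**2, axis=1)[:, None]
                + np.sum(X2**2, axis=1)[None, :]
                - 2.0 * X1 @ X2.T)
    return variance * np.exp(-0.5 * sq_dists / lengthscale**2)

# A prior over functions: evaluate the kernel on a grid of inputs
# and draw samples from the resulting multivariate Gaussian.
X = np.linspace(-3.0, 3.0, 50)[:, None]
K = rbf_kernel(X, X)
rng = np.random.default_rng(1)
samples = rng.multivariate_normal(np.zeros(50), K + 1e-8 * np.eye(50), size=3)
```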

**8.-**Gaussian processes and neural networks are connected - as hidden layers increase, neural nets converge to Gaussian processes under certain conditions.
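
Point 8 refers to results connecting infinitely wide networks and Gaussian processes. As a hedged illustration in the same spirit (the random Fourier features construction, not the talk's own derivation): the inner product of a wide layer of random cosine features converges to the RBF covariance as the width grows, so the wide random layer behaves like a GP covariance.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 20000                           # layer width
W = rng.normal(size=D)              # random input weights (1-D inputs)
b = rng.uniform(0.0, 2.0 * np.pi, size=D)

def features(x):
    """One wide layer of random cosine units."""
    return np.sqrt(2.0 / D) * np.cos(W * x + b)

def k_hat(x1, x2):
    """Empirical covariance induced by the random layer."""
    return features(x1) @ features(x2)

def k_rbf(x1, x2):
    """The RBF kernel that k_hat converges to as D grows."""
    return np.exp(-0.5 * (x1 - x2) ** 2)
```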

**9.-**For small datasets, Gaussian processes often outperform other methods. They provide good uncertainty estimates for tasks like Bayesian optimization.
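
A minimal GP regression sketch shows the uncertainty behaviour point 9 relies on: posterior variance shrinks near the training data and reverts to the prior far away. This follows the standard Cholesky-based equations; `gp_posterior` and its parameters are illustrative names.

```python
import numpy as np

def gp_posterior(X_train, y_train, X_test, lengthscale=1.0, noise=1e-2):
    """Posterior mean and variance of an RBF-kernel GP at X_test."""
    def k(A, B):
        d = (np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :]
             - 2.0 * A @ B.T)
        return np.exp(-0.5 * d / lengthscale**2)
    K = k(X_train, X_train) + noise * np.eye(len(X_train))
    K_s = k(X_train, X_test)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    v = np.linalg.solve(L, K_s)
    mean = K_s.T @ alpha                              # posterior mean
    var = np.diag(k(X_test, X_test)) - np.sum(v**2, axis=0)  # posterior variance
    return mean, var
```

The variance output is what Bayesian optimization uses to decide where to evaluate next.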

**10.-**Gaussian processes have been applied to model malaria spread in Uganda, inferring missing reports. Visualization is key for impact.

**11.-**Gaussian processes can infer unobserved protein levels in gene regulatory networks by placing priors on the dynamics as differential equations.

**12.-**Despite their power, MacKay noted Gaussian processes are just sophisticated smoothing machines, questioning if we "threw the baby out with the bathwater."

**13.-**Deep learning composes differentiable functions to learn representations. Propagating gradients through the composition is key to optimizing them.
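
A toy two-layer composition makes the gradient-propagation point explicit; this is a hand-derived chain rule on a scalar model, not code from the talk.

```python
import numpy as np

def forward(x, w1, w2):
    """Depth-2 composition f(x) = w2 * tanh(w1 * x)."""
    h = np.tanh(w1 * x)   # first differentiable function
    return w2 * h, h      # second composed on top

def backward(x, w1, w2):
    """Gradients of f w.r.t. the weights, by the chain rule."""
    _, h = forward(x, w1, w2)
    dy_dw2 = h                        # local gradient of the outer layer
    dy_dw1 = w2 * (1.0 - h**2) * x    # chain rule through tanh'(z) = 1 - tanh(z)^2
    return dy_dw1, dy_dw2
```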

**14.-**Bayesian inference involves specifying prior distributions, computing posterior distributions over parameters, and making predictions by marginalizing the posterior.
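
The prior-to-posterior-to-prediction pipeline can be sketched with the simplest conjugate case, inferring the mean of a Gaussian (illustrative names, not from the talk): the posterior is again Gaussian, with precisions adding.

```python
import numpy as np

def gaussian_posterior(prior_mean, prior_var, obs, obs_var):
    """Conjugate update for the mean of a Gaussian likelihood:
    posterior precision = prior precision + n / obs_var."""
    n = len(obs)
    post_var = 1.0 / (1.0 / prior_var + n / obs_var)
    post_mean = post_var * (prior_mean / prior_var + obs.sum() / obs_var)
    return post_mean, post_var

# Prediction marginalizes the posterior: y* ~ N(post_mean, post_var + obs_var).
```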

**15.-**Variational inference approximates intractable posteriors with simpler distributions, turning integration into optimization problems. It gives probabilistic neural network training.
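
As a hedged sketch of point 15: the toy model below has a known Gaussian posterior, and maximizing a Monte Carlo estimate of the ELBO over the mean of a Gaussian variational family recovers the posterior mean, turning an integral into an optimization. Common random numbers keep the grid search stable; all names are illustrative.

```python
import numpy as np

# Toy model: theta ~ N(0, 1), y_i ~ N(theta, 1).  The exact posterior
# is N(sum(y) / (n + 1), 1 / (n + 1)).
rng = np.random.default_rng(0)
y = rng.normal(2.0, 1.0, size=20)
eps = rng.normal(size=4000)        # shared noise for every candidate m

def elbo(m, s):
    """Monte Carlo ELBO for q(theta) = N(m, s^2)."""
    theta = m + s * eps            # samples from q
    log_prior = -0.5 * theta**2
    log_lik = -0.5 * ((y[:, None] - theta[None, :]) ** 2).sum(axis=0)
    log_q = -0.5 * eps**2 - np.log(s)
    return np.mean(log_prior + log_lik - log_q)

grid = np.linspace(0.0, 3.0, 61)
best_m = grid[np.argmax([elbo(m, 0.25) for m in grid])]
```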

**16.-**Gaussian process inference is hard due to priors on infinite-dimensional functions. Variational approximations and augmentation make it tractable.

**17.-**Sparse approximations allow Gaussian processes to scale to large datasets. Parameters increase to tighten a lower bound on the marginal likelihood.
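
Point 17 can be illustrated with a Nyström-style low-rank surrogate built from m inducing inputs, the same structure sparse variational GP approximations exploit: adding inducing points tightens the approximation, while linear algebra cost drops from O(n^3) to O(n m^2). All names here are illustrative.

```python
import numpy as np

def rbf(A, B, lengthscale=1.0):
    d = (np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :]
         - 2.0 * A @ B.T)
    return np.exp(-0.5 * d / lengthscale**2)

def low_rank_approx(X, Z):
    """Nystrom surrogate K ≈ K_nm K_mm^{-1} K_mn from inducing inputs Z."""
    K_mm = rbf(Z, Z) + 1e-8 * np.eye(len(Z))   # jitter for stability
    K_nm = rbf(X, Z)
    return K_nm @ np.linalg.solve(K_mm, K_nm.T)

X = np.linspace(0.0, 5.0, 200)[:, None]
K = rbf(X, X)
err = lambda Z: np.linalg.norm(K - low_rank_approx(X, Z))
Z5 = np.linspace(0.0, 5.0, 5)[:, None]
Z20 = np.linspace(0.0, 5.0, 20)[:, None]
```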

**18.-**Composing Gaussian processes to make deep models is challenging due to intractability of the resulting integral. Variational bounds enable it.

**19.-**Deep Gaussian processes give a way to compose stochastic processes while maintaining uncertainty. Theory may help understand how deep learning works.

**20.-**On small datasets, deep Gaussian processes can avoid overfitting as much as shallow ones while increasing flexibility.

**21.-**Latent variable models represent high-dimensional observations through lower-dimensional unobserved variables. Motion capture data demonstrates this concept.

**22.-**MacKay pioneered using neural networks for unsupervised latent variable models through density networks, but limited data restricted their effectiveness then.

**23.-**Gaussian process latent variable models can extract meaningful low-dimensional structures and even infer the latent dimensionality needed from little data.

**24.-**Layered Gaussian process latent variable models applied to handwriting and motion capture aim to learn hierarchical, abstract representations.

**25.-**Scaling up layered Gaussian process models is a key challenge being addressed by forming a company to develop them further.

**26.-**New approximations from the company reduce numerical issues when scaling these models, showing promising early results over previous approaches.

**27.-**The ultimate goal is "deep health" - integrating all aspects of an individual's health data into comprehensive models for personalized medicine.

**28.-**Educational resources are available to learn more about Gaussian processes, including a summer school, tutorial, and open-source software.

**29.-**Recent research extends Gaussian processes to recurrent neural network architectures and introduces variational autoencoders with deep Gaussian process priors.

**30.-**The speaker attributes his research direction and inspiration to MacKay's influence, lamenting the loss of MacKay's presence for his family.

Knowledge Vault built by David Vivancos 2024