Knowledge Vault 2/45 - ICLR 2014-2023
Koray Kavukcuoglu ICLR 2018 - Invited Talk - From Generative Models to Generative Agents

Concept Graph & Resume using Claude 3 Opus | ChatGPT-4 | Gemini Advanced | Llama 3:

```mermaid
graph LR
    classDef Main fill:#f9d4d4, font-weight:bold, font-size:14px;
    classDef DeepMind fill:#d4f9d4, font-weight:bold, font-size:14px;
    classDef WaveNet fill:#d4d4f9, font-weight:bold, font-size:14px;
    classDef Impala fill:#f9f9d4, font-weight:bold, font-size:14px;
    classDef SPIRAL fill:#f9d4f9, font-weight:bold, font-size:14px;
    classDef Integer fill:#d4f9f9, font-weight:bold, font-size:14px;
    A[Koray Kavukcuoglu ICLR 2018] --> B[Kavukcuoglu: generative models, agents. 1]
    A --> C[Unsupervised learning: understand, explain data. 2]
    C --> D[Rich representations enable generalization, transfer. 3]
    A --> E[WaveNet: end-to-end audio model. 4]
    E --> F[Dilated convolutions model long-range dependencies. 5]
    E --> G[Human-level speech Google Assistant. 6]
    E --> H[Parallel WaveNet: inverse autoregressive flow. 7]
    H --> I[Probability distillation, losses improve efficiency. 8]
    E --> J[Deep learning enables rapid development. 9]
    A --> K[Deep RL: sequential decisions, representations. 10]
    K --> L[Impala: scalable off-policy actor-critic. 11]
    L --> M[DeepMind Lab: multitask 3D world. 12]
    L --> N[Stability, transfer key for multitask. 13]
    L --> O[Decoupled acting, learning for parallelization. 14]
    L --> P[V-trace balances on-policy, off-policy. 15]
    L --> Q[Impala outperforms A3C on DMLab-30. 16]
    A --> R[SPIRAL: unsupervised RL for programs. 17]
    R --> S[Agent, engine, discriminator interact. 18]
    R --> T[Learns MNIST, Omniglot from feedback. 19]
    R --> U[Programs enable sim-to-real transfer. 20]
    R --> V[Future: interpretability, tools, environment learning. 21]
    A --> W[Wu: integer training, inference. 22]
    W --> X[Edge devices: power, memory, precision. 23]
    W --> Y[WAGE: weights, activations, gradients, errors. 24]
    Y --> Z[Mapping, shifting, rounding, pure SGD. 25]
    Y --> AA[Good accuracy on CIFAR, ImageNet. 26]
    W --> AB[Depth, data affect precision needs. 27]
    W --> AC[Distributions, bottlenecks key for accuracy. 28]
    W --> AD[Reduces energy, area, memory costs. 29]
    W --> AE[Enables on-device learning. 30]
    class A Main;
    class B,C,D DeepMind;
    class E,F,G,H,I,J WaveNet;
    class K,L,M,N,O,P,Q Impala;
    class R,S,T,U,V SPIRAL;
    class W,X,Y,Z,AA,AB,AC,AD,AE Integer;
```

Resume:

1.-Koray Kavukcuoglu gave an invited talk on generative models and generative agents, surveying DeepMind's recent work on unsupervised learning.

2.-Unsupervised learning aims to understand and explain the data-generating environment; the quality of generated samples indicates how much the model has understood.

3.-Rich representations learned through unsupervised learning should enable generalization and transfer.

4.-WaveNet is an end-to-end generative model of raw audio that can produce realistic speech and music samples.

5.-WaveNet uses dilated convolutional layers to model long-range dependencies efficiently during training but generates samples autoregressively.
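
As a concrete illustration of point 5, here is a minimal numpy sketch of a 1-D dilated causal convolution. This shows the mechanism only, not DeepMind's implementation; the filter taps, dilation schedule, and input signal are made up:

```python
import numpy as np

def dilated_causal_conv1d(x, w, dilation):
    """Causal 1-D convolution: y[t] depends only on x[t], x[t-d], x[t-2d], ...

    x: input signal, shape (T,)
    w: filter taps, shape (K,); w[0] multiplies the most recent sample
    dilation: spacing d between taps
    """
    T, K = len(x), len(w)
    y = np.zeros(T)
    for t in range(T):
        for k in range(K):
            idx = t - k * dilation
            if idx >= 0:          # zero-pad the past to preserve causality
                y[t] += w[k] * x[idx]
    return y

# Stacking layers with dilations 1, 2, 4, 8, ... grows the receptive
# field exponentially: R = (K - 1) * sum(dilations) + 1 past samples
# (here 16), while each layer stays cheap to train in parallel.
x = np.random.randn(32)
h = x
for d in (1, 2, 4, 8):
    h = np.tanh(dilated_causal_conv1d(h, np.array([0.5, 0.5]), d))
```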

6.-WaveNet achieves human-level performance in text-to-speech and is used in Google Assistant.

7.-Parallel WaveNet makes the model more efficient using an inverse autoregressive flow student model trained by a pre-trained WaveNet teacher.

8.-Parallel WaveNet uses probability density distillation, a power loss, a perceptual loss from speech recognition, and a contrastive loss.
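
The distillation term in point 8 is a KL divergence from student to teacher, estimated on student samples. The sketch below is schematic: it uses an elementwise affine flow as a stand-in for the IAF student, a unit-Gaussian density as a stand-in for the pre-trained WaveNet teacher, a Gaussian base instead of the paper's logistic, and omits the auxiliary power, perceptual, and contrastive losses:

```python
import numpy as np

rng = np.random.default_rng(0)
T = 64  # waveform length (illustrative)

# Stand-in "student": elementwise affine flow x = mu + sigma * z.
# A real IAF student is a stack of autoregressive affine transforms.
mu, log_sigma = rng.normal(size=T) * 0.01, np.zeros(T)

def student_sample(z):
    x = mu + np.exp(log_sigma) * z
    log_det = log_sigma.sum()      # log|det dx/dz| of the flow
    return x, log_det

# Stand-in "teacher": unit-Gaussian log-density per timestep. The real
# teacher is a pre-trained WaveNet, which scores a complete waveform in
# a single parallel pass.
def teacher_log_prob(x):
    return -0.5 * (x**2 + np.log(2 * np.pi)).sum(axis=-1)

# Monte-Carlo estimate of KL(student || teacher) from student samples:
# KL = -H(student) + cross-entropy(student, teacher).
z = rng.normal(size=(256, T))
x, log_det = student_sample(z)
entropy = 0.5 * T * (1 + np.log(2 * np.pi)) + log_det
kl = -entropy - teacher_log_prob(x).mean()
print(f"distillation KL estimate: {kl:.4f}")
```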

9.-Generalization and rapid development for new speakers and languages are key advantages of the deep learning approach.

10.-Deep reinforcement learning combines RL's sequential decision making with deep learning's representation learning to tackle challenging problems.

11.-Impala is a highly scalable and efficient off-policy actor-critic agent architecture developed at DeepMind.

12.-The DeepMind Lab environment enables testing the ability of a single agent to perform multiple tasks in a complex 3D world.

13.-Stability, low hyperparameter sensitivity, and positive transfer between tasks are important for training Impala in the multitask setting.

14.-Impala decouples acting from learning, enabling efficient parallelization and robustness to varying environment rendering speeds.
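
A minimal sketch of this decoupling, assuming a toy single-machine, queue-based setup (the real system distributes actors across many machines and ships parameters back to them, which is elided here):

```python
import queue
import threading
import numpy as np

# Actors roll out trajectories with a cached copy of the policy and push
# them to a queue; one learner consumes batches. Differences in per-actor
# rendering speed only change how often each actor pushes, not the
# learner's throughput.
trajectory_queue = queue.Queue(maxsize=64)
UNROLL = 20

def actor(actor_id, rollouts=5):
    for _ in range(rollouts):
        # Placeholder rollout: a real actor steps an environment with the
        # latest policy parameters fetched from the learner.
        traj = {"actor": actor_id,
                "obs": np.zeros((UNROLL, 4)),
                "actions": np.zeros(UNROLL, dtype=int),
                "behaviour_logits": np.zeros((UNROLL, 6))}
        trajectory_queue.put(traj)   # blocks if the learner falls behind

def learner(total_batches=10, batch_size=4):
    for _ in range(total_batches):
        batch = [trajectory_queue.get() for _ in range(batch_size)]
        # A real learner computes V-trace targets on this batch (see the
        # sketch under point 15), applies one SGD step, and publishes
        # fresh parameters to the actors.

threads = [threading.Thread(target=actor, args=(i,)) for i in range(8)]
threads.append(threading.Thread(target=learner))
for t in threads: t.start()
for t in threads: t.join()
```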

15.-The V-trace off-policy advantage actor-critic algorithm corrects for the lag between the actors' behaviour policy and the learner's policy, balancing on-policy stability against off-policy data reuse in Impala; a sketch of the target computation follows.
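
A numpy sketch of the V-trace targets, following the definitions in the IMPALA paper; the importance ratios `rhos` (pi/mu between learner and behaviour policies) are assumed to be supplied by the caller:

```python
import numpy as np

def vtrace_targets(rewards, values, bootstrap, rhos, gamma=0.99,
                   rho_bar=1.0, c_bar=1.0):
    """V-trace value targets v_s for one unrolled trajectory.

    rewards, rhos: shape (T,); values: V(x_0..x_{T-1}), shape (T,)
    bootstrap: V(x_T), the value estimate at the final state.
    """
    T = len(rewards)
    clipped_rho = np.minimum(rho_bar, rhos)  # controls the fixed point
    clipped_c = np.minimum(c_bar, rhos)      # controls contraction speed
    next_values = np.append(values[1:], bootstrap)
    # Importance-weighted TD errors.
    deltas = clipped_rho * (rewards + gamma * next_values - values)
    vs = np.zeros(T)
    acc = 0.0
    for t in reversed(range(T)):             # backward recursion:
        acc = deltas[t] + gamma * clipped_c[t] * acc
        vs[t] = values[t] + acc               # v_s = V(x_s) + correction
    return vs

# When rhos == 1 (fully on-policy), V-trace reduces to the standard
# n-step Bellman target, recovering plain advantage actor-critic.
```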

16.-Impala demonstrates better data efficiency, performance, and positive transfer compared to A3C when trained on DMLab-30.

17.-SPIRAL is an unsupervised RL approach for training agents to generate programs that lead to preferred environmental states.

18.-SPIRAL uses an agent, an execution engine, and a discriminator, combining RL, program synthesis, and generative adversarial networks.

19.-The SPIRAL agent generates brush strokes, the libmypaint environment renders them, and a discriminator provides a reward signal.
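
Schematically, one SPIRAL episode looks like the sketch below, where `agent_policy`, `render`, and `discriminator` are hypothetical toy stand-ins for the recurrent agent, the libmypaint engine, and the trained discriminator:

```python
import numpy as np

rng = np.random.default_rng(0)

def agent_policy(canvas):          # stand-in: pick a brush-stroke command
    return rng.normal(size=5)      # e.g. (x0, y0, x1, y1, pressure)

def render(canvas, stroke):        # stand-in for the libmypaint engine
    return canvas + 0.1 * np.tanh(stroke).mean()

def discriminator(image):          # higher score = more "real"
    return float(-abs(image.mean() - 0.5))

# One episode: the agent emits a program of strokes, the
# non-differentiable engine executes it, and the discriminator's score
# on the final canvas is the only reward -- so the agent is trained
# with RL rather than by backpropagating through the renderer.
canvas = np.zeros((8, 8))
strokes = []
for step in range(10):
    stroke = agent_policy(canvas)
    strokes.append(stroke)
    canvas = render(canvas, stroke)
reward = discriminator(canvas)     # terminal adversarial reward
print(f"episode reward: {reward:.3f}")
# A policy-gradient update (the paper uses an IMPALA-style learner)
# would reinforce the stroke sequence in proportion to this reward.
```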

20.-SPIRAL learns to generate MNIST digits and Omniglot symbols using only discriminator feedback, with the ability to generalize between domains.

21.-Representing the policy as a general-purpose program enables SPIRAL to transfer from a simulator to a real robot.

22.-Future directions include interpretable program synthesis, tool use, and environment learning with RL agents.

23.-Shuang Wu presented work on training and inference with integers in deep neural networks for edge device deployment.

24.-The key challenges are limited power, memory, and precision, especially for future neuromorphic hardware.

25.-Their WAGE approach constrains weights, activations, gradients, and errors to low-bit-width integers in both training and inference.

26.-Techniques used include linear mapping, distribution shifting, deterministic and stochastic rounding, and pure mini-batch SGD.
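
A numpy sketch of these operators, modeled on the quantization functions described in the WAGE paper; the bit widths and example tensors are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def q(x, k):
    """Linear mapping to k-bit fixed point: unit step sigma = 2**(1-k),
    values clipped into (-1, 1)."""
    sigma = 2.0 ** (1 - k)
    return np.clip(sigma * np.round(x / sigma), -1 + sigma, 1 - sigma)

def shift(x):
    """Distribution shifting: snap a scale factor to a power of two so
    that rescaling becomes a cheap bit shift on integer hardware."""
    return 2.0 ** np.round(np.log2(x))

def stochastic_round(x, k):
    """Stochastic rounding (used for gradients): round up with
    probability equal to the fractional remainder, so small gradient
    components survive in expectation."""
    sigma = 2.0 ** (1 - k)
    y = x / sigma
    floor = np.floor(y)
    y = floor + (rng.random(x.shape) < (y - floor))
    return np.clip(sigma * y, -1 + sigma, 1 - sigma)

# Example: deterministic 8-bit activations, stochastically rounded
# gradients rescaled by a power-of-two shift.
a = q(np.tanh(rng.normal(size=4)), k=8)
g = rng.normal(size=4) * 0.01
g_q = stochastic_round(g / shift(np.max(np.abs(g))), k=8)
```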

27.-Good accuracy is achieved with ternary weights and 8-bit activations, gradients, and errors on CIFAR and ImageNet.

28.-Error and gradient precision requirements were found to be depth-dependent and data-dependent, respectively.

29.-Adjusting internal distributions and avoiding information bottlenecks is key to maintaining accuracy with integer quantization.

30.-Integer quantization reduces energy, area, and memory access costs for DNN accelerators, enabling on-device learning.

Knowledge Vault built by David Vivancos 2024