Knowledge Vault 2/28 - ICLR 2014-2023
Tianqi Chen, Ian Goodfellow, Jon Shlens ICLR 2016 - Net2Net: Accelerating Learning via Knowledge Transfer
<Resume Image >

Concept Graph & Resume using Claude 3 Opus | Chat GPT4 | Gemini Adv | Llama 3:

graph LR
  classDef biologicalArtificial fill:#f9d4d4, font-weight:bold, font-size:14px;
  classDef trainingApproaches fill:#d4f9d4, font-weight:bold, font-size:14px;
  classDef net2net fill:#d4d4f9, font-weight:bold, font-size:14px;
  classDef modelEvolution fill:#f9f9d4, font-weight:bold, font-size:14px;
  A[Tianqi Chen et al. ICLR 2016] --> B[Biological, artificial neural networks analogy 1]
  A --> C[Deep learning: iterative model design 2]
  C --> D[Growing models with data crucial 3]
  A --> E["'Dumb' way: discard, retrain 4"]
  A --> F[Teacher-student approach doesn't converge 5]
  A --> G[Net2Net: transform, continue training 6]
  G --> H[Preserving knowledge is important 7]
  G --> I[Widen networks: duplicate, divide channels 9]
  G --> J[Deepen networks: factor layers 10]
  G --> K[ImageNet experiments with Inception 11]
  K --> L[Widening speeds convergence 3-4x 12]
  K --> M[Deepening improves convergence, accuracy 13]
  K --> N[Wider + deeper slightly outperforms 14]
  K --> O[Net2Net accelerates model exploration 15]
  A --> P[Need better than discarding, retraining 16]
  P --> Q[Reuse models to accelerate training 17]
  Q --> R[Function-preserving transformations avoid slow convergence 18]
  A --> S[Continuous, incremental training beyond one-shot 19]
  S --> T[Net2Net: step towards lifelong learning 20]
  class B biologicalArtificial;
  class C,D,E,F trainingApproaches;
  class G,H,I,J,K,L,M,N,O net2net;
  class P,Q,R,S,T modelEvolution;

Resume:

1.-The speaker draws an analogy between biological and artificial neural networks: more neurons and more data lead to more intelligence, but also to longer training times.

2.-In reality, deep learning involves iterating and experimenting with many neural network designs until finding one that works well enough.

3.-This experimentation loop of growing models as data increases occurs across machine learning and will be crucial for building continually learning systems.

4.-Current approaches to training new models discard the old trained network and retrain from scratch - referred to as the "dumb" way.

5.-Another approach is using the trained network as a teacher to supervise a new student network, but this doesn't converge well.

6.-The speaker proposes "Net2Net" - transforming a trained model into an equivalent new model and continuing training, to enable continuous model evolution.

7.-Experiments show randomly reinitializing over half of a trained network's layers significantly slows convergence, so preserving knowledge is important.

8.-Net2Net uses function-preserving transformations to expand model capacity in width (more channels per layer) or depth (more layers).
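
To make the key property explicit: writing the teacher network as f(x; θ) and the student (widened or deepened) network as g(x; θ'), a function-preserving transformation chooses the student's parameters so that both networks compute the same outputs everywhere. A minimal statement of this condition (the symbols f, g, θ, θ' are notation adopted here, consistent with the paper's framing):

```latex
% Function-preserving initialization: teacher f with parameters \theta,
% student g with parameters \theta' produced by widening or deepening.
\forall x:\quad g(x;\,\theta') \;=\; f(x;\,\theta)
```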

9.-To widen a network, channels are randomly duplicated and their outgoing weights are divided by the number of copies to maintain functional equivalence, with a small amount of noise added to break symmetry.
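
A minimal NumPy sketch of this widening step for a fully connected layer; the function name `net2wider`, the layer shapes, and the noise scale are illustrative assumptions rather than the authors' implementation:

```python
import numpy as np

def net2wider(W1, b1, W2, new_width, noise_std=1e-3):
    """Function-preserving widening of one hidden layer (Net2Wider-style sketch).

    W1: (in_dim, old_width) incoming weights of the layer being widened
    b1: (old_width,) biases of that layer
    W2: (old_width, out_dim) outgoing weights consumed by the next layer
    """
    in_dim, old_width = W1.shape
    assert new_width >= old_width

    # Random mapping: keep every original unit, fill the extra slots with random duplicates.
    mapping = np.concatenate([
        np.arange(old_width),
        np.random.randint(0, old_width, new_width - old_width),
    ])
    # Replication count of each original unit, used to divide the outgoing weights.
    counts = np.bincount(mapping, minlength=old_width)

    # Incoming weights and biases are simply copied from the mapped unit.
    W1_new = W1[:, mapping]
    b1_new = b1[mapping]

    # Outgoing weights are copied and divided by the replication count, so the
    # duplicated units' contributions sum back to the original unit's contribution
    # and the next layer's pre-activation is unchanged.
    W2_new = W2[mapping, :] / counts[mapping][:, None]

    # Small noise breaks the symmetry between duplicates so they can diverge later.
    W1_new = W1_new + noise_std * np.random.randn(*W1_new.shape)
    W2_new = W2_new + noise_std * np.random.randn(*W2_new.shape)
    return W1_new, b1_new, W2_new

# Quick check of functional equivalence (with noise disabled):
rng = np.random.default_rng(0)
x = rng.standard_normal(8)
W1, b1, W2 = rng.standard_normal((8, 16)), rng.standard_normal(16), rng.standard_normal((16, 4))
W1w, b1w, W2w = net2wider(W1, b1, W2, new_width=24, noise_std=0.0)
h, hw = np.maximum(0, x @ W1 + b1), np.maximum(0, x @ W1w + b1w)
assert np.allclose(h @ W2, hw @ W2w)
```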

10.-To make networks deeper, a layer can be factored into two layers, for example by inserting a new layer initialized as an identity mapping, in a way that generalizes to other function-preserving factorizations.
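
A companion sketch of the deepening step for a fully connected ReLU layer; `net2deeper` and its defaults are likewise assumed names, not the paper's code. Because hidden activations after a ReLU are non-negative, an identity-initialized layer passes them through unchanged, so the function is preserved at the moment of insertion:

```python
import numpy as np

def net2deeper(width, noise_std=0.0):
    """Identity-initialized fully connected layer for Net2Deeper-style deepening (sketch).

    Inserted after a ReLU layer with `width` units, the new layer initially computes
    relu(I @ h + 0) = h, since h is already non-negative, leaving the overall function
    unchanged. An optional small noise term perturbs the identity slightly so the new
    layer's units can differentiate during continued training.
    """
    W_new = np.eye(width) + noise_std * np.random.randn(width, width)
    b_new = np.zeros(width)
    return W_new, b_new

# Example: deepen a network by inserting an identity layer after a 256-unit hidden layer.
W_insert, b_insert = net2deeper(256)
```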

11.-Experiments were conducted on ImageNet using Inception to test if Net2Net can speed up the model development/experimentation cycle.

12.-When widening a smaller Inception model, Net2Net quickly reaches the smaller model's performance and then continues improving, yielding a 3-4x speedup over training from scratch.

13.-Similar faster convergence and final accuracy are seen when adding convolutional layers to deepen a standard Inception model.

14.-By applying both wider and deeper Net2Net transforms to Inception, they quickly explore new architectures that slightly outperform the original.

15.-Training the larger models from scratch would converge even more slowly, confirming Net2Net's ability to accelerate model exploration and improvement.

16.-In conclusion, we need better approaches than discarding and retraining models from scratch as data increases and models evolve.

17.-It's possible to reuse trained models to accelerate training of larger models for the new data, as Net2Net demonstrates.

18.-The key is using function-preserving transformations to expand models while avoiding randomly initialized components that slow convergence.

19.-More broadly, continuous and incremental training of models, beyond one-shot training, should be recognized as a crucial need.

20.-Net2Net is just a small step in this direction of enabling continuous model evolution and lifelong learning systems.

Knowledge Vault built by David Vivancos 2024