Tianqi Chen, Ian Goodfellow, Jon Shlens ICLR 2016 - Net2Net: Accelerating Learning via Knowledge Transfer

**Concept Graph & Resume using Claude 3 Opus | ChatGPT-4 | Gemini Advanced | Llama 3:**

```mermaid
graph LR
  classDef biologicalArtificial fill:#f9d4d4, font-weight:bold, font-size:14px;
  classDef trainingApproaches fill:#d4f9d4, font-weight:bold, font-size:14px;
  classDef net2net fill:#d4d4f9, font-weight:bold, font-size:14px;
  classDef modelEvolution fill:#f9f9d4, font-weight:bold, font-size:14px;
  A[Tianqi Chen et al. ICLR 2016] --> B[Biological, artificial neural networks analogy 1]
  A --> C["Deep learning: iterative model design 2"]
  C --> D[Growing models with data crucial 3]
  A --> E["'Dumb' way: discard, retrain 4"]
  A --> F[Teacher-student approach doesn't converge 5]
  A --> G["Net2Net: transform, continue training 6"]
  G --> H[Preserving knowledge is important 7]
  G --> I["Widen networks: duplicate, divide channels 9"]
  G --> J["Deepen networks: factor layers 10"]
  G --> K[ImageNet experiments with Inception 11]
  K --> L[Widening speeds convergence 3-4x 12]
  K --> M[Deepening improves convergence, accuracy 13]
  K --> N[Wider + deeper slightly outperforms 14]
  K --> O[Net2Net accelerates model exploration 15]
  A --> P[Need better than discarding, retraining 16]
  P --> Q[Reuse models to accelerate training 17]
  Q --> R[Function-preserving transformations avoid slow convergence 18]
  A --> S[Continuous, incremental training beyond one-shot 19]
  S --> T["Net2Net: step towards lifelong learning 20"]
  class B biologicalArtificial;
  class C,D,E,F trainingApproaches;
  class G,H,I,J,K,L,M,N,O net2net;
  class P,Q,R,S,T modelEvolution;
```


**Resume:**

**1.-**The speaker makes an analogy between biological and artificial neural networks - more neurons/data leads to more intelligence but longer training times.

**2.-**In reality, deep learning involves iterating and experimenting with many neural network designs until finding one that works well enough.

**3.-**This experiment loop of growing models as data increases happens across machine learning, and will be crucial for building continuously learning systems.

**4.-**Current approaches to training new models discard the old trained network and retrain from scratch - referred to as the "dumb" way.

**5.-**Another approach is using the trained network as a teacher to supervise a new student network, but this doesn't converge well.

**6.-**The speaker proposes "Net2Net" - transforming a trained model into an equivalent new model and continuing training, to enable continuous model evolution.

**7.-**Experiments show randomly reinitializing over half of a trained network's layers significantly slows convergence, so preserving knowledge is important.

**8.-**Net2Net uses function-preserving transformations to expand model capacity in width (more channels per layer) or depth (more layers).

**9.-**For wider networks, channels are randomly duplicated then divided to maintain functional equivalence, with some noise added to break symmetry.
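The duplicate-then-divide idea in point 9 can be sketched for a pair of fully connected layers. This is a minimal NumPy illustration of the Net2WiderNet transformation, not the authors' code: the function name `net2wider` and the two-layer setting are assumptions for the example, and the symmetry-breaking noise mentioned above is omitted for clarity.

```python
import numpy as np

def net2wider(W1, b1, W2, new_width):
    """Widen a hidden layer from n to new_width units, preserving the function.

    W1: (in_dim, n) weights into the layer being widened, b1: (n,) its bias,
    W2: (n, out_dim) weights out of that layer, new_width >= n.
    Returns (U1, c1, U2) such that relu(x @ U1 + c1) @ U2 equals
    relu(x @ W1 + b1) @ W2 for every input x.
    """
    n = W1.shape[1]
    assert new_width >= n
    # Replication mapping g: the first n units map to themselves,
    # each extra unit copies a randomly chosen existing unit.
    g = np.concatenate([np.arange(n),
                        np.random.randint(0, n, new_width - n)])
    counts = np.bincount(g, minlength=n)   # number of copies of each unit
    U1 = W1[:, g]                          # duplicate incoming weights
    c1 = b1[g]
    U2 = W2[g, :] / counts[g][:, None]     # split outgoing weights among copies
    return U1, c1, U2
```

Because each original unit's outgoing weights are divided by its number of copies, the widened network computes exactly the same output before any noise is added.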

**10.-**To make networks deeper, a layer can be factored into two layers, e.g. by initializing the new layer as an identity mapping, which preserves the network's function and generalizes beyond the identity case.
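The identity-mapping case of deepening can likewise be sketched for a fully connected ReLU layer; the function name `net2deeper` is illustrative, and for convolutional layers the analogue is a kernel initialized as an identity filter.

```python
import numpy as np

def net2deeper(width):
    """Net2DeeperNet: weights for a new layer inserted after a ReLU layer
    of the given width. Since relu(relu(h) @ I + 0) == relu(h) (ReLU is
    idempotent on nonnegative inputs), the network's function is unchanged."""
    W_new = np.eye(width)
    b_new = np.zeros(width)
    return W_new, b_new
```

After insertion, the new layer is trained jointly with the rest of the network and can move away from the identity.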

**11.-**Experiments were conducted on ImageNet using Inception to test if Net2Net can speed up the model development/experimentation cycle.

**12.-**When widening a smaller Inception model, Net2Net quickly reaches the smaller model's performance and then keeps improving, yielding a 3-4x speedup over training from scratch.

**13.-**Similar faster convergence and final accuracy are seen when adding convolutional layers to deepen a standard Inception model.

**14.-**By applying both wider and deeper Net2Net transforms to Inception, they quickly explore new architectures that slightly outperform the original.

**15.-**Training the larger models from scratch would converge even more slowly, confirming Net2Net's ability to accelerate model exploration and improvement.

**16.-**In conclusion, we need better approaches than discarding and retraining models from scratch as data increases and models evolve.

**17.-**It's possible to reuse trained models to accelerate training of larger models for the new data, as Net2Net demonstrates.

**18.-**The key is using function-preserving transformations to expand models while avoiding randomly initialized components that slow convergence.

**19.-**More broadly, we should think about continuous and incremental training of models beyond one-shot training as a crucial need.

**20.-**Net2Net is just a small step in this direction of enabling continuous model evolution and lifelong learning systems.

Knowledge Vault built by David Vivancos 2024