Concept Graph & Resume using Claude 3 Opus | Chat GPT4 | Gemini Adv | Llama 3:
Resume:
1.-The talk is a personal retrospective on invertible models and normalizing flows by Laurent Dinh from Google Brain.
2.-Early deep generative models included restricted Boltzmann machines, autoregressive models, and generator network approaches like VAEs and GANs.
3.-Dinh was motivated to pursue tractable maximum likelihood training of generator networks through invertible models.
4.-Recurring themes in Dinh's PhD lab were deep learning, autoencoders, and disentangling factors of variation.
5.-An invertible function paired with its inverse fulfills the autoencoder goal: encoding and then decoding reconstructs the original input exactly (see the reconstruction sketch after this list).
6.-The change of variables formula allows computing the density of a variable transformed by an invertible function (written out after this list).
7.-The Jacobian determinant term in the change of variables formula measures how the mapping locally expands or contracts volume.
8.-Neural autoregressive architectures impose useful sparsity constraints that make the Jacobian triangular, so its determinant is simply the product of its diagonal entries (see the triangular-Jacobian sketch after this list).
9.-Dinh modified a deep invertible network to have triangular weight matrices, allowing tractable density estimation in high dimensions.
10.-Coupling layers modify one part of the input additively as a function of the other part, enabling easy inversion and Jacobian computation (see the additive-coupling sketch after this list).
11.-Composing coupling layers while alternating which part is modified fully transforms the input distribution while preserving invertibility and the tractable Jacobian.
12.-Dinh's initial "NICE" model showed promise but needed improvements based on reviewer feedback and further community research.
13.-Incorporating deep learning techniques such as ResNets, multiplicative coupling terms, multi-scale architectures, and batch normalization improved the invertible models significantly (see the affine-coupling sketch after this list).
14.-The research community made progress on normalizing flows at the architecture level and by developing fundamental building blocks.
15.-Neural ODEs define transformations through ordinary differential equations and provide an alternative way to build invertible layers (the continuous-time change of variables is written out after this list).
16.-Normalizing flows have been applied to many tasks including image, video, speech, text, graphics, physics, chemistry, and reinforcement learning.
17.-The probabilistic roots of flow models make them compatible with variational inference, MCMC, and approximating autoregressive models.
18.-Invertible models can reduce memory usage in backpropagation by reconstructing activations on the fly with the inverse mapping instead of storing them (see the recomputation sketch after this list).
19.-Empirically, flow models can achieve both good sample quality and diversity, though log-likelihood and perceptual sample quality are not always correlated.
20.-Density is not always a good measure of typicality, since a bijection can arbitrarily change the relative density of two points (a worked example follows this list).
21.-Statistical independence does not necessarily imply disentanglement, but weak supervision may help learn disentangled representations.
22.-Using an independent base distribution is convenient but not required; more structured priors can be used.
23.-Promising research directions include learning flows on manifolds, incorporating known structure, handling discrete data, and adaptive sparsity patterns.
24.-Dinh believes invertible models are a stepping stone toward more powerful non-invertible models using piecewise invertible functions and stochastic inversion.
25.-The research community's work, including reviews, blog posts, and educational material, will drive the most promising future developments in normalizing flows.
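Sketches and formulas referenced in the list above:

Sketch for item 5 (a minimal illustration, not the model from the talk): when the encoder is an invertible map, decoding with its exact inverse reconstructs the input with zero error. The orthogonal-matrix encoder below is just a convenient stand-in.

```python
import numpy as np

rng = np.random.default_rng(0)

# Invertible "encoder": a random orthogonal matrix, whose inverse is its transpose.
W, _ = np.linalg.qr(rng.normal(size=(4, 4)))

x = rng.normal(size=4)        # original input
z = W @ x                     # encode
x_rec = W.T @ z               # decode with the exact inverse

print(np.allclose(x, x_rec))  # True: reconstruction is exact
```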
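The change of variables formula of items 6-7, written out for an invertible, differentiable map f with z = f(x) and base density p_Z:

```latex
p_X(x) = p_Z\big(f(x)\big)\,\left|\det \frac{\partial f(x)}{\partial x}\right|,
\qquad
\log p_X(x) = \log p_Z\big(f(x)\big) + \log\left|\det \frac{\partial f(x)}{\partial x}\right|
```

The Jacobian-determinant factor is exactly the local expansion or contraction of volume mentioned in item 7.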
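Triangular-Jacobian sketch for items 8-9 (a toy example, not Dinh's actual architecture): when a map's Jacobian is triangular, its log-determinant reduces to a sum over the diagonal, which costs O(D) rather than O(D^3).

```python
import numpy as np

rng = np.random.default_rng(0)
D = 5

# Toy triangular Jacobian, as produced by a masked/autoregressive architecture
# where output i depends only on inputs 1..i.
J = np.tril(rng.normal(size=(D, D)))
np.fill_diagonal(J, np.abs(np.diag(J)) + 0.1)  # keep the diagonal away from zero

# For a triangular matrix the determinant is the product of its diagonal,
# so the log|det| term of the change of variables is a cheap O(D) sum.
logdet_cheap = np.sum(np.log(np.abs(np.diag(J))))
_, logdet_full = np.linalg.slogdet(J)

print(np.allclose(logdet_cheap, logdet_full))  # True
```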
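Additive-coupling sketch for items 10-11, in the spirit of NICE (the tiny random MLP used as the coupling function m is an arbitrary placeholder): one half of the input passes through unchanged, the other half is shifted by a function of the first half, so inversion is a subtraction and the Jacobian is triangular with unit diagonal (log-det = 0). Stacking layers with alternating halves eventually transforms every dimension.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_mlp(d_in, d_out, hidden=32):
    """Tiny random MLP used as the coupling function m(.) (placeholder weights)."""
    W1 = rng.normal(scale=0.1, size=(d_in, hidden))
    W2 = rng.normal(scale=0.1, size=(hidden, d_out))
    return lambda h: np.tanh(h @ W1) @ W2

def additive_coupling(x, m, flip):
    """y1 = x1, y2 = x2 + m(x1); `flip` alternates which half gets modified."""
    x1, x2 = np.split(x, 2, axis=-1)
    if flip:
        x1, x2 = x2, x1
    y1, y2 = x1, x2 + m(x1)
    if flip:
        y1, y2 = y2, y1
    return np.concatenate([y1, y2], axis=-1)

def additive_coupling_inv(y, m, flip):
    """Exact inverse: x2 = y2 - m(y1). Jacobian is unit-triangular, log-det = 0."""
    y1, y2 = np.split(y, 2, axis=-1)
    if flip:
        y1, y2 = y2, y1
    x1, x2 = y1, y2 - m(y1)
    if flip:
        x1, x2 = x2, x1
    return np.concatenate([x1, x2], axis=-1)

# Stack coupling layers, alternating the modified half (item 11),
# so that every dimension is eventually transformed.
D = 4
ms = [make_mlp(D // 2, D // 2) for _ in range(4)]
x = rng.normal(size=(3, D))

y = x
for i, m in enumerate(ms):
    y = additive_coupling(y, m, flip=(i % 2 == 1))

x_rec = y
for i, m in reversed(list(enumerate(ms))):
    x_rec = additive_coupling_inv(x_rec, m, flip=(i % 2 == 1))

print(np.allclose(x, x_rec))  # True: the whole stack is exactly invertible
```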
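Affine-coupling sketch for the multiplicative coupling terms of item 13, in the spirit of RealNVP (the scale and shift functions below are placeholders, not the networks from the talk): the log-det is now non-zero but still cheap, namely the sum of the log-scales.

```python
import numpy as np

def affine_coupling(x, scale_fn, shift_fn):
    """y1 = x1; y2 = x2 * exp(s(x1)) + t(x1). Returns y and log|det J|."""
    x1, x2 = np.split(x, 2, axis=-1)
    s, t = scale_fn(x1), shift_fn(x1)
    y2 = x2 * np.exp(s) + t
    logdet = np.sum(s, axis=-1)  # triangular Jacobian with diagonal exp(s)
    return np.concatenate([x1, y2], axis=-1), logdet

def affine_coupling_inv(y, scale_fn, shift_fn):
    """Exact inverse: x2 = (y2 - t(y1)) * exp(-s(y1))."""
    y1, y2 = np.split(y, 2, axis=-1)
    s, t = scale_fn(y1), shift_fn(y1)
    return np.concatenate([y1, (y2 - t) * np.exp(-s)], axis=-1)

# Placeholder scale/shift functions (assumptions, not the networks from the talk).
scale_fn = lambda h: 0.5 * np.tanh(h)
shift_fn = lambda h: h ** 2

rng = np.random.default_rng(0)
x = rng.normal(size=(3, 4))
y, logdet = affine_coupling(x, scale_fn, shift_fn)

print(np.allclose(x, affine_coupling_inv(y, scale_fn, shift_fn)))  # True
print(logdet.shape)  # (3,): one log-det term per example
```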
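Continuous-time change of variables behind item 15, as introduced in the Neural ODE line of work: the state and its log-density evolve jointly under an ordinary differential equation, and the Jacobian determinant is replaced by a trace.

```latex
\frac{dz(t)}{dt} = f\big(z(t), t\big),
\qquad
\frac{d \log p\big(z(t)\big)}{dt} = -\operatorname{Tr}\!\left(\frac{\partial f}{\partial z(t)}\right)
```

Integrating both equations from t_0 to t_1 yields an invertible map (run the ODE backward to invert) together with its change in log-density.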
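Recomputation sketch for item 18 (my own illustration; the gradient computation itself is omitted): instead of caching every layer's activations for backpropagation, an invertible network can keep only the output and rebuild intermediate activations backward with the inverse, trading extra compute for memory.

```python
import numpy as np

rng = np.random.default_rng(0)

# Invertible "layers": y = W @ x with orthogonal W, so each inverse is W.T.
layers = [np.linalg.qr(rng.normal(size=(4, 4)))[0] for _ in range(3)]
x = rng.normal(size=4)

# Forward pass that keeps only the final output, not the intermediate activations.
y = x
for W in layers:
    y = W @ y

# Backward sweep: rebuild each layer's input on the fly from its output,
# using the inverse mapping, exactly when that activation would be needed.
act = y
for W in reversed(layers):
    act = W.T @ act  # invert the layer to recover its input activation

print(np.allclose(act, x))  # True: activations recovered without caching them
```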
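A worked instance of item 20's caveat (my own example, not from the talk): take X ~ N(0,1), so p_X(0) > p_X(2). For any smooth bijection g applied to X, the change of variables gives

```latex
Y = g(X), \qquad p_Y\big(g(x)\big) = \frac{p_X(x)}{\lvert g'(x)\rvert}
```

so choosing g with |g'(0)| much larger than |g'(2)| reverses the ordering, making p_Y(g(0)) < p_Y(g(2)): which point looks more "typical" under the density depends on the chosen parameterization, not only on the underlying distribution.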
Knowledge Vault built by David Vivancos 2024