Representation learning on sequential data with latent priors

Jan Chorowski

**Concept Graph & Resume using Claude 3.5 Sonnet | ChatGPT-4o | Llama 3:**

```mermaid
graph LR
classDef main fill:#f9d4f9, font-weight:bold, font-size:14px
classDef basics fill:#f9d4d4, font-weight:bold, font-size:14px
classDef models fill:#d4f9d4, font-weight:bold, font-size:14px
classDef techniques fill:#d4d4f9, font-weight:bold, font-size:14px
classDef challenges fill:#f9f9d4, font-weight:bold, font-size:14px
classDef applications fill:#d4f9f9, font-weight:bold, font-size:14px
Main[Representation learning on sequential data with latent priors] --> A[Fundamental Concepts]
Main --> B[Models and Architectures]
Main --> C[Learning Techniques]
Main --> D[Challenges and Solutions]
Main --> E[Applications and Extensions]
A --> A1[Unsupervised learning: represent unlabeled sequential data 1]
A --> A2[Discover units in speech and handwriting 2]
A --> A3[Latent representation: compact, useful data form 3]
A --> A4[Bottleneck forces efficient representations 7]
A --> A5[Information filtering retains relevant, discards irrelevant 10]
A --> A6[Zero-shot learning: perform on unseen data 11]
B --> B1[Autoencoder: encode, then reconstruct input 4]
B --> B2[VAE: encode data as probability distributions 5]
B --> B3[VQVAE: discrete latent representations via clustering 6]
B --> B4[Autoregressive models predict from past values 8]
B --> B5[Markovian model: probabilistic state transitions 19]
B --> B6[Convolutional Deep Markov Model: CNNs with Markovian dynamics 20]
C --> C1[Probe classifiers analyze unsupervised representations 9]
C --> C2[Smoothness prior: latent representations change smoothly 12]
C --> C3[Time jittering enforces smoothness without collapse 14]
C --> C4[Constrained optimization enforces desired properties 16]
C --> C5[Lagrangian relaxation converts constrained to unconstrained 17]
C --> C6[Greedy algorithm merges latent vectors 18]
D --> D1[Latent collapse ignores latent representations 13]
D --> D2[Piecewise constant representation within units 15]
D --> D3[Variational inference approximates complex distributions 21]
D --> D4[Linguistic prior incorporates language structure knowledge 22]
D --> D5[Contrastive coding contrasts related, unrelated samples 23]
D --> D6[Maximize mutual information between inputs, latents 24]
E --> E1[Wave2Vec: self-supervised speech recognition technique 25]
E --> E2[MIME-CPC: mutual information and contrastive coding 26]
E --> E3[Pixel CNN generates images pixel-by-pixel 27]
E --> E4[WaveNet generates raw audio waveforms 28]
E --> E5[Filter bank reconstruction measures spectrogram reconstruction 29]
E --> E6[Tonal information: pitch patterns carry meaning 30]
class Main main
class A,A1,A2,A3,A4,A5,A6 basics
class B,B1,B2,B3,B4,B5,B6 models
class C,C1,C2,C3,C4,C5,C6 techniques
class D,D1,D2,D3,D4,D5,D6 challenges
class E,E1,E2,E3,E4,E5,E6 applications
```


**Resume:**

**1.-** Unsupervised learning: Technique to learn representations of sequential data without labeled data, useful for understanding structure in documents like the Voynich manuscript.

**2.-** Unsupervised unit discovery: Finding boundaries and clustering data in speech and handwriting to identify characters or phonemes.

**3.-** Latent representation: Capturing essential information from input data in a more compact and useful form.

**4.-** Autoencoder: Neural network that encodes input data, then decodes it to reconstruct the original input.
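To make the encode/reconstruct idea concrete, here is a deliberately tiny toy sketch of my own (not the talk's neural model): the "encoder" halves the sequence length by averaging adjacent frames, the "decoder" reconstructs by duplication, and information that varies faster than the bottleneck allows is lost.

```python
def encode(xs):
    """Compress an even-length sequence to half its size (the bottleneck)."""
    return [(xs[i] + xs[i + 1]) / 2 for i in range(0, len(xs), 2)]

def decode(zs):
    """Reconstruct the input by repeating each latent code twice."""
    return [z for z in zs for _ in range(2)]

def reconstruction_error(xs):
    """Mean squared error between the input and its reconstruction."""
    xhat = decode(encode(xs))
    return sum((x - y) ** 2 for x, y in zip(xs, xhat)) / len(xs)

# A slowly varying sequence survives the bottleneck almost intact,
# while a rapidly alternating one loses most of its information.
smooth = [1.0, 1.1, 2.0, 2.1]
rough = [1.0, -1.0, 1.0, -1.0]
```

A real autoencoder learns both maps with neural networks, but the failure mode is the same: whatever cannot pass through the bottleneck is discarded.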

**5.-** Variational Autoencoder (VAE): Generative model that learns to encode data as probability distributions in latent space.
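Two ingredients make the VAE trainable, sketched below in plain Python (a generic illustration, not code from the talk): the reparameterization trick, which moves the randomness out of the network so gradients can flow, and the KL term that pulls each latent distribution toward the standard-normal prior.

```python
import math
import random

def reparameterize(mu, log_var, rng=None):
    """Reparameterization trick: z = mu + sigma * eps with eps ~ N(0, 1),
    so the sampling step becomes differentiable in mu and log_var."""
    rng = rng or random.Random(0)
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]

def kl_to_standard_normal(mu, log_var):
    """KL(N(mu, sigma^2) || N(0, 1)): the regularizer that keeps the
    encoder's distributions close to the prior."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv)
                      for m, lv in zip(mu, log_var))
```

The KL term is zero exactly when the encoder outputs the prior itself (mu = 0, log_var = 0) and grows as the posterior drifts away from it.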

**6.-** Vector Quantized VAE (VQVAE): VAE variant that uses discrete latent representations by clustering encoder outputs.
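The quantization step at the heart of VQ-VAE can be sketched in a few lines (a minimal illustration, assuming a fixed codebook):

```python
def quantize(z, codebook):
    """VQ step: return the index of the codebook entry nearest to z
    (squared Euclidean distance); the decoder then sees codebook[index]."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(codebook)), key=lambda k: sq_dist(z, codebook[k]))

codebook = [[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]]
```

In a real VQ-VAE the codebook is learned jointly with the encoder, and gradients are passed through the non-differentiable lookup with a straight-through estimator.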

**7.-** Bottleneck: Constraining information flow in a model to force it to learn efficient representations.

**8.-** Autoregressive models: Models that predict future values based on past values, used for reconstructing data from latent representations.
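The simplest autoregressive model is AR(1): predict each value as a scaled copy of the previous one. A toy least-squares fit (my own illustration; the models in the talk are neural) shows the predict-from-the-past idea:

```python
def ar1_coefficient(xs):
    """Least-squares estimate of a in the AR(1) model x[t] ~ a * x[t-1]."""
    num = sum(xs[t] * xs[t - 1] for t in range(1, len(xs)))
    den = sum(xs[t - 1] ** 2 for t in range(1, len(xs)))
    return num / den

def predict_next(xs):
    """Predict the next value from the past -- the essence of autoregression."""
    return ar1_coefficient(xs) * xs[-1]
```

For the doubling sequence 1, 2, 4, 8, 16 the fitted coefficient is 2, so the prediction for the next step is 32.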

**9.-** Probe classifiers: Small supervised classifiers used to analyze information content in unsupervised model representations.
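A probe is intentionally weak, so that high accuracy means the information is easy to read out of the frozen representation. A minimal nearest-centroid probe (my own sketch, using scalar latents for brevity):

```python
def probe_accuracy(latents, labels):
    """Fit a nearest-centroid classifier on frozen scalar latents and
    report its training accuracy; a deliberately simple probe."""
    groups = {}
    for z, y in zip(latents, labels):
        groups.setdefault(y, []).append(z)
    centroids = {y: sum(zs) / len(zs) for y, zs in groups.items()}
    hits = sum(min(centroids, key=lambda y: abs(z - centroids[y])) == y
               for z, y in zip(latents, labels))
    return hits / len(labels)
```

Well-separated latents yield a perfect probe score; uninformative constant latents score at chance level.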

**10.-** Information filtering: Selectively retaining relevant information (e.g., phonemes) while discarding irrelevant information (e.g., speaker identity).

**11.-** Zero-shot learning: Model's ability to perform tasks on unseen data or in new contexts.

**12.-** Smoothness prior: Assumption that latent representations should change smoothly over time for sequential data.
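One common way to express the smoothness prior is a penalty on the difference between consecutive latent vectors (a generic formulation, not necessarily the exact one from the talk):

```python
def smoothness_penalty(latents):
    """Sum of squared differences between consecutive latent vectors;
    adding this to the loss expresses the prior that codes change slowly."""
    return sum(sum((a - b) ** 2 for a, b in zip(z1, z2))
               for z1, z2 in zip(latents, latents[1:]))
```

Note that minimizing this term alone is trivially solved by a constant latent sequence, which connects directly to the latent-collapse problem of item 13.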

**13.-** Latent collapse: When a model ignores latent representations and relies solely on autoregressive decoding.

**14.-** Time jittering: Randomly copying latent vectors to enforce smoothness without causing latent collapse.
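A minimal sketch of the jittering idea (my own reading of the description above): with some probability, each latent is replaced by an immediate neighbour, so the decoder must tolerate local code swaps, which rewards smooth codes without an explicit penalty term.

```python
import random

def time_jitter(latents, p=0.5, rng=None):
    """With probability p, replace the latent at each time step with its
    left or right neighbour (clamped at the sequence boundaries)."""
    rng = rng or random.Random(0)
    out = []
    for t in range(len(latents)):
        s = t
        if rng.random() < p:
            s = min(max(t + rng.choice([-1, 1]), 0), len(latents) - 1)
        out.append(latents[s])
    return out
```

Every output latent comes from within one step of its original position, so the perturbation is strictly local.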

**15.-** Piecewise constant representation: Latent representation that remains constant within units (e.g., phonemes) and changes abruptly at boundaries.
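A piecewise-constant code sequence compresses naturally into runs, one per discovered unit. A small run-length sketch (illustration only):

```python
def run_lengths(codes):
    """Collapse a piecewise-constant code sequence into (code, length)
    runs, e.g. one run per discovered unit such as a phoneme."""
    runs = []
    for c in codes:
        if runs and runs[-1][0] == c:
            runs[-1] = (c, runs[-1][1] + 1)
        else:
            runs.append((c, 1))
    return runs
```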

**16.-** Constrained optimization: Formulating the learning problem with constraints to enforce desired properties in latent representations.

**17.-** Lagrangian relaxation: Converting constrained optimization problems into unconstrained problems with penalty terms.
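Items 16 and 17 can be illustrated with a toy problem of my own (not the talk's formulation): minimize x² subject to x ≥ 1 by relaxing the constraint into a penalty term. As the multiplier lambda grows, the minimizer of the unconstrained surrogate moves to the constraint boundary.

```python
def relaxed(x, lam):
    """Lagrangian-relaxation surrogate for: minimize x^2 subject to x >= 1.
    The constraint violation max(1 - x, 0) enters the loss as a penalty."""
    return x * x + lam * max(1.0 - x, 0.0)

def argmin_grid(f, lo=-3.0, hi=3.0, steps=6001):
    """Brute-force minimizer on a uniform grid (illustration only)."""
    return min((lo + i * (hi - lo) / (steps - 1) for i in range(steps)), key=f)
```

With lam = 0 the minimizer is x = 0 (the constraint is ignored); with lam = 4 the surrogate is minimized at x = 1, exactly on the constraint boundary.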

**18.-** Greedy algorithm: Approach for solving the constrained optimization problem by merging latent vectors.
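One plausible shape of such a greedy merge, sketched with scalar latents for brevity (the talk's algorithm operates on vectors): repeatedly fuse the most similar adjacent pair until a segment budget is met.

```python
def greedy_merge(latents, k):
    """Greedily merge the closest adjacent pair of segments (weighted
    averaging) until only k segments remain, yielding a piecewise-constant
    representation under a fixed segment budget."""
    segs = [(z, 1) for z in latents]            # (segment mean, frame count)
    while len(segs) > k:
        i = min(range(len(segs) - 1),
                key=lambda j: abs(segs[j][0] - segs[j + 1][0]))
        (a, na), (b, nb) = segs[i], segs[i + 1]
        segs[i:i + 2] = [((a * na + b * nb) / (na + nb), na + nb)]
    return [m for m, _ in segs]
```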

**19.-** Markovian dynamic model: Probabilistic model for transitions between latent states over time.
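The defining property is that the next latent state depends only on the current one. A minimal first-order chain sampler (generic illustration):

```python
import random

def sample_chain(transitions, start, steps, rng=None):
    """Sample a state path from a first-order Markov chain;
    transitions[s] maps each successor state to its probability."""
    rng = rng or random.Random(0)
    state, path = start, [start]
    for _ in range(steps):
        nxt, probs = zip(*transitions[state].items())
        state = rng.choices(nxt, weights=probs)[0]
        path.append(state)
    return path
```

With an absorbing state ("b" below never transitions back to "a"), every sampled path respects the transition structure.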

**20.-** Convolutional Deep Markov Model: Model combining convolutional neural networks with Markovian dynamics for latent representations.

**21.-** Variational inference: Technique for approximating complex probability distributions, used in VAEs and related models.

**22.-** Linguistic prior: Incorporating knowledge about language structure into latent representation learning.

**23.-** Contrastive coding: Learning technique that contrasts related and unrelated samples to improve representations.
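The standard loss for this is InfoNCE, sketched here in plain Python: score the anchor against one positive and several negatives, then take the negative log-softmax probability of the positive.

```python
import math

def info_nce(anchor, positive, negatives):
    """InfoNCE-style contrastive loss: dot-product similarities followed by
    the negative log-softmax probability of the positive sample."""
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))
    scores = [dot(anchor, positive)] + [dot(anchor, n) for n in negatives]
    m = max(scores)                      # subtract max for numerical stability
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_z - scores[0]
```

The loss shrinks as the positive becomes more similar to the anchor than the negatives; minimizing it is also a lower-bound surrogate for the mutual-information objective of item 24.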

**24.-** Mutual Information Maximization: Approach to learn representations by maximizing mutual information between inputs and latents.

**25.-** Wave2Vec: Self-supervised learning technique for speech recognition.

**26.-** MIME-CPC: Mutual Information Maximization and Contrastive Predictive Coding, techniques for representation learning.

**27.-** Pixel CNN: Autoregressive model for generating images pixel by pixel, used in handwriting generation example.

**28.-** WaveNet: Neural network for generating raw audio waveforms, used as a decoder in speech models.

**29.-** Filter bank reconstruction: Measure of how well a model can reconstruct speech spectrograms from latent representations.

**30.-** Tonal information: Pitch patterns in languages like Mandarin that carry meaning, potentially lost in some unsupervised models.

Knowledge Vault built by David Vivancos, 2024