Concept Graph & Summary using Claude 3.5 Sonnet | ChatGPT-4o | Llama 3:
Summary:
1.- Unrolled computation graphs: Represent dynamical systems with parameters governing state evolution over time, used in various machine learning applications.
2.- Classic optimization approaches: Backpropagation through time (memory cost grows with unroll length), truncated backprop (biased gradients), real-time recurrent learning (RTRL, computationally expensive), and their approximations.
3.- Chaotic loss landscapes: Long unrolls can lead to chaotic or poorly conditioned loss landscapes, making optimization challenging.
4.- Evolution Strategies (ES): Optimizes a Gaussian-smoothed meta-objective, doesn't require backprop, memory-efficient, can optimize black-box functions, scalable on parallel compute (see the estimator formulas after this list).
5.- Persistent Evolution Strategies (PES): Unbiased approach splitting computation graph into truncated unrolls, accumulating correction terms over full sequence.
6.- PES derivation: Uses a shift in notation, treating the loss as a function of the entire sequence of per-unroll parameters, from which the gradient estimate is derived.
7.- PES decomposition: Breaks the gradient down into a sum of sequential per-unroll estimates, accumulating perturbations over multiple unrolls.
8.- PES implementation: Similar to ES but with per-particle state tracking and perturbation accumulation (see the JAX sketch after this list).
9.- PES variance: Depends on correlation between gradients at each unroll, can decrease with more unrolls under certain conditions.
10.- Synthetic influence balancing task: Demonstrates PES's unbiasedness, converging to correct solutions unlike truncated methods.
11.- Hyperparameter optimization: PES outperforms truncated methods on toy 2D regression task with chaotic regions.
12.- MNIST learning rate schedule: PES converges to optimal region for both differentiable and non-differentiable objectives.
13.- Multi-hyperparameter tuning: PES outperforms truncated ES and random search for tuning 20 hyperparameters of MLP on MNIST.
14.- Learned optimizer training: PES achieves lower losses and more consistency than ES when meta-training MLP-based optimizer.
15.- Continuous control policy learning: PES more efficient than ES on full episodes for swimmer task, while truncated ES fails.
16.- Unbiased gradient estimation: PES provides unbiased estimates from partial unrolls, unlike truncated methods.
17.- Loss surface smoothing: PES inherits this useful characteristic from ES, helping navigate chaotic landscapes.
18.- Parallelizability: PES is easily parallelizable, inheriting this advantage from ES.
19.- Non-differentiable objectives: PES can work with non-differentiable functions like accuracy instead of loss.
20.- Tractable compute and memory cost: PES achieves this while providing unbiased estimates from partial unrolls.
21.- Applications: PES applicable to hyperparameter optimization, training learned optimizers, and reinforcement learning.
22.- Antithetic sampling: Used in practice for PES, sampling pairs of positive and negative perturbations at each time step.
23.- JAX implementation: Example of PES estimator implementation using JAX, demonstrating simplicity and parallelization (a sketch in this spirit appears after this list).
24.- Comparison with truncated ES: PES differs in tracking particle states and accumulating perturbations over time.
25.- Meta-loss surface visualization: Illustrates chaotic regions in hyperparameter optimization tasks where PES excels.
26.- CIFAR-10 experiment: PES outperforms ES in meta-training learned optimizer for training MLP on CIFAR-10.
27.- Mujoco swimmer task: Demonstrates PES's efficiency in learning continuous control policies using partial unrolls.
28.- Bias elimination: PES eliminates bias from truncations by accumulating correction terms over full sequence of unrolls.
29.- Frequent parameter updates: PES allows more frequent updates than full-unroll ES, improving efficiency (see the outer-loop example after this list).
30.- Easy implementation: PES is described as an easy-to-implement modification of ES, making it accessible for various applications.
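To make items 4-7 concrete, the two estimators can be written out; the notation below is a paraphrase (not the paper's exact statement), using the antithetic form from item 22. ES smooths the loss with Gaussian noise and estimates the gradient of the smoothed objective:

    L_\sigma(\theta) = \mathbb{E}_{\epsilon \sim \mathcal{N}(0,\,\sigma^2 I)}\left[ L(\theta + \epsilon) \right]

    \hat{g}^{\mathrm{ES}} = \frac{1}{2N\sigma^2} \sum_{i=1}^{N} \epsilon_i \left( L(\theta + \epsilon_i) - L(\theta - \epsilon_i) \right)

PES instead draws a fresh perturbation at every truncated unroll t but weights the per-unroll loss by the accumulated perturbation \xi, which is what removes the truncation bias:

    \hat{g}^{\mathrm{PES}}_t = \frac{1}{2N\sigma^2} \sum_{i=1}^{N} \xi_{t,i} \left( \hat{L}_t(\theta + \epsilon_{t,i}) - \hat{L}_t(\theta - \epsilon_{t,i}) \right),
    \qquad \xi_{t,i} = \sum_{s \le t} \epsilon_{s,i}

where \hat{L}_t is the loss of unroll t evaluated from each particle's persistent state (item 8).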
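Items 8, 22, and 24 describe the mechanics of a single PES step; the following is a minimal JAX sketch in that spirit, not the paper's own listing. The inner system `unroll` is a hypothetical stand-in (a real application would run, say, a few steps of inner-model training), and all names and shapes are illustrative:

    import jax
    import jax.numpy as jnp

    def unroll(state, theta):
        # Hypothetical inner dynamical system: a short truncated unroll
        # whose loss depends on theta. Returns (new_state, truncation_loss).
        for _ in range(10):
            state = jnp.tanh(theta[0] * state + theta[1])
        return state, jnp.sum(state ** 2)

    def pes_step(theta, states, pert_accum, key, sigma=0.1):
        # One PES gradient estimate from a single truncated unroll.
        #   states:     (2N, d) persistent particle states, stored as
        #               antithetic pairs (+eps particles, then -eps particles)
        #   pert_accum: (N, p) sum of perturbations each pair has received
        #               so far over the current full sequence of unrolls
        n, p = pert_accum.shape
        eps = sigma * jax.random.normal(key, (n, p))
        pert_accum = pert_accum + eps  # the PES correction: accumulate over unrolls
        # Antithetic sampling (item 22): evaluate paired +eps / -eps perturbations,
        # each particle continuing from its own persistent state.
        pos_s, pos_L = jax.vmap(unroll)(states[:n], theta + eps)
        neg_s, neg_L = jax.vmap(unroll)(states[n:], theta - eps)
        # Weight loss differences by the ACCUMULATED perturbation; weighting by
        # this unroll's eps alone would recover biased truncated ES.
        grad_est = jnp.mean(pert_accum * (pos_L - neg_L)[:, None], axis=0) / (2 * sigma**2)
        return grad_est, jnp.concatenate([pos_s, neg_s]), pert_accum

The vmap over particles is what makes the estimator embarrassingly parallel (items 18 and 23).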
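A sketch of the outer loop from item 29, under the same assumptions: theta is updated after every partial unroll while the particles persist; when a full sequence ends, states and pert_accum would be reset so the next sequence starts unbiased:

    key = jax.random.PRNGKey(0)
    theta = jnp.array([0.5, -0.2])          # outer parameters (p = 2)
    N, d = 64, 8                            # antithetic pairs, state size
    states = jnp.zeros((2 * N, d))          # persistent particle states
    pert_accum = jnp.zeros((N, 2))          # accumulated perturbations

    for t in range(20):                     # sequence of truncated unrolls
        key, sub = jax.random.split(key)
        g, states, pert_accum = pes_step(theta, states, pert_accum, sub)
        theta = theta - 0.01 * g            # update after every partial unroll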
Knowledge Vault built by David Vivancos 2024