Concept Graph & Summary using Claude 3.5 Sonnet | ChatGPT-4o | Llama 3:
graph LR
classDef main fill:#f9d4d4, font-weight:bold, font-size:14px
classDef basics fill:#d4f9d4, font-weight:bold, font-size:14px
classDef gradients fill:#d4d4f9, font-weight:bold, font-size:14px
classDef stochastic fill:#f9f9d4, font-weight:bold, font-size:14px
classDef advanced fill:#f9d4f9, font-weight:bold, font-size:14px
Main[Modern Convex Optimization Methods for Large-scale Empirical Risk Minimization]
Main --> A[Optimization Basics]
A --> A1[Loss function plus regularization 1]
A --> A2[Large datasets need tailored optimization 2]
A --> A3[Convex problems reliably solvable 3]
A --> A4[Lipschitz continuity ensures convergence 4]
A --> A5[What makes a function convex? 5]
A --> A6[Convexity-preserving operations 6]
Main --> B[Gradient-based Methods]
B --> B1[Gradient methods handle large dimensions 8]
B --> B2[Gradient descent convergence factors 9]
B --> B3[Nesterov method outperforms gradient descent 10]
B --> B4[Line search, derivative checks 11]
B --> B5["Newton's method: quadratic convergence 12"]
B --> B6[Practical second-order optimization adjustments 13]
Main --> C[Stochastic Methods]
C --> C1[Stochastic gradients: cheap but noisy 14]
C --> C2[Stochastic gradients require smaller steps 15]
C --> C3[Average iterates for noisy gradients 16]
C --> C4[Stochastic Newton: evolving theory 17]
C --> C5[Finite sums: linear convergence 18]
C --> C6[SAG: linear convergence using history 19]
Main --> D[Advanced Techniques]
D --> D1[SVRG: convergence without saved gradients 20]
D --> D2[Reduce SAG/SVRG storage requirements 21]
D --> D3[Clever sampling improves SAG/SVRG 22]
D --> D4[Smoothing tackles non-smooth problems 23]
D --> D5[Projected gradients for constraints 24]
D --> D6[Proximal methods handle composite problems 25]
Main --> E[Special Methods]
E --> E1[Efficient proximal operators exist 26]
E --> E2[Proximal methods: stochastic/Newton variants 27]
E --> E3[ADMM splits complex constraints 28]
E --> E4[Frank-Wolfe: proximal method alternative 29]
E --> E5[Duality: smooth problem transformation 30]
E --> E6[Interior point methods inefficient 7]
class Main main
class A,A1,A2,A3,A4,A5,A6 basics
class B,B1,B2,B3,B4,B5,B6 gradients
class C,C1,C2,C3,C4,C5,C6 stochastic
class D,D1,D2,D3,D4,D5,D6,E,E1,E2,E3,E4,E5,E6 advanced
Summary:
1.- Loss function plus regularization.
2.- Large datasets require tailored optimization.
3.- Convex problems: reliably solvable.
4.- Lipschitz continuity ensures convergence.
5.- What makes a function convex?
6.- Convexity-preserving function operations.
7.- Interior point methods are inefficient for large-scale problems.
8.- Gradient methods handle large dimensions.
9.- Gradient descent: convergence factors.
10.- Nesterov's method beats gradient descent (see the gradient descent vs. Nesterov sketch after this list).
11.- Line search, derivative checks.
12.- Newton's method: quadratic convergence.
13.- Practical 2nd-order optimization tweaks.
14.- Stochastic gradients: cheap but noisy.
15.- Stochastic gradients need smaller steps.
16.- Average iterates for noisy gradients.
17.- Stochastic Newton: theory evolving.
18.- Finite sums: linear convergence.
19.- SAG: linear convergence using history.
20.- SVRG: convergence without saved gradients (see the SVRG sketch after this list).
21.- Reduce SAG/SVRG storage needs.
22.- Clever sampling improves SAG/SVRG.
23.- Smoothing tackles non-smooth problems.
24.- Projected gradients for constraints.
25.- Proximal methods handle composite problems (see the proximal gradient sketch after this list).
26.- Efficient proximal operators abound.
27.- Proximal methods go stochastic/Newton.
28.- ADMM splits complex constraints.
29.- Frank-Wolfe: proximal method alternative.
30.- Duality: smooth problem transformation.
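
Gradient descent vs. Nesterov sketch (points 9-10). A minimal NumPy illustration, not code from the source: plain gradient descent and Nesterov-style acceleration on a strongly convex quadratic, both with the standard 1/L step size. The function and variable names are illustrative.

import numpy as np

def gradient_descent(grad, x0, L, iters=200):
    x = x0.copy()
    for _ in range(iters):
        x -= (1.0 / L) * grad(x)          # fixed step size 1/L
    return x

def nesterov(grad, x0, L, iters=200):
    x, y, t = x0.copy(), x0.copy(), 1.0
    for _ in range(iters):
        x_next = y - (1.0 / L) * grad(y)  # gradient step at the extrapolated point
        t_next = 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * t * t))
        y = x_next + ((t - 1.0) / t_next) * (x_next - x)  # momentum extrapolation
        x, t = x_next, t_next
    return x

rng = np.random.default_rng(0)
M = rng.standard_normal((50, 20))
A = M.T @ M + 0.1 * np.eye(20)            # positive definite, so strongly convex
b = rng.standard_normal(20)
grad = lambda x: A @ x - b                # gradient of 0.5 x^T A x - b^T x
L = np.linalg.eigvalsh(A).max()           # Lipschitz constant of the gradient
x_star = np.linalg.solve(A, b)
for name, method in [("GD", gradient_descent), ("Nesterov", nesterov)]:
    print(name, np.linalg.norm(method(grad, np.zeros(20), L) - x_star))

With the same iteration budget the accelerated iterate is typically much closer to the minimizer, which is the sense in which point 10 says Nesterov's method beats gradient descent.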
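
SVRG sketch (point 20). A minimal, assumption-laden sketch of stochastic variance-reduced gradient on a least-squares finite sum; it stores only one snapshot point and its full gradient rather than a table of per-example gradients. Step size, epoch counts, and names are illustrative choices, not values from the source.

import numpy as np

def svrg(A, b, step, epochs=20, inner=None):
    n, d = A.shape
    inner = inner or n
    w = np.zeros(d)
    rng = np.random.default_rng(0)
    for _ in range(epochs):
        w_snap = w.copy()
        full_grad = A.T @ (A @ w_snap - b) / n         # full gradient at the snapshot
        for _ in range(inner):
            i = rng.integers(n)
            gi_w = A[i] * (A[i] @ w - b[i])            # per-example gradient at w
            gi_snap = A[i] * (A[i] @ w_snap - b[i])    # same example at the snapshot
            w -= step * (gi_w - gi_snap + full_grad)   # variance-reduced update
    return w

rng = np.random.default_rng(1)
A = rng.standard_normal((200, 10))
b = rng.standard_normal(200)
w = svrg(A, b, step=0.01)
print(np.linalg.norm(A.T @ (A @ w - b) / len(b)))      # gradient norm, should be small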
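
Proximal gradient sketch (points 25-26). A minimal sketch, assuming the lasso as the composite problem min_w 0.5/n ||Aw - b||^2 + lam ||w||_1: the smooth part is handled by a gradient step and the L1 term by its proximal operator, soft-thresholding, which is one of the cheap proximal operators point 26 refers to. Parameter values are illustrative.

import numpy as np

def soft_threshold(v, tau):
    # proximal operator of tau * ||.||_1
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def proximal_gradient(A, b, lam, iters=500):
    n, d = A.shape
    L = np.linalg.norm(A, 2) ** 2 / n      # Lipschitz constant of the smooth part
    w = np.zeros(d)
    for _ in range(iters):
        grad = A.T @ (A @ w - b) / n       # gradient step on the smooth loss
        w = soft_threshold(w - grad / L, lam / L)   # prox step handles the L1 term
    return w

rng = np.random.default_rng(2)
A = rng.standard_normal((100, 30))
w_true = np.zeros(30); w_true[:5] = 1.0    # sparse ground truth
b = A @ w_true + 0.01 * rng.standard_normal(100)
w = proximal_gradient(A, b, lam=0.1)
print("nonzero coefficients:", np.count_nonzero(w))   # many coordinates are exactly zero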
Knowledge Vault built by David Vivancos 2024