Knowledge Vault 6/8 - ICML 2015
Modern Convex Optimization Methods for Large-scale Empirical Risk Minimization
Mark Schmidt & Peter Richtárik
< Resume Image >

Concept Graph & Resume using Claude 3.5 Sonnet | ChatGPT-4o | Llama 3:

graph LR
  classDef main fill:#f9d4d4, font-weight:bold, font-size:14px
  classDef basics fill:#d4f9d4, font-weight:bold, font-size:14px
  classDef gradients fill:#d4d4f9, font-weight:bold, font-size:14px
  classDef stochastic fill:#f9f9d4, font-weight:bold, font-size:14px
  classDef advanced fill:#f9d4f9, font-weight:bold, font-size:14px

  Main[Modern Convex Optimization Methods for Large-scale Empirical Risk Minimization]

  Main --> A[Optimization Basics]
  A --> A1[Loss function plus regularization 1]
  A --> A2[Large datasets need tailored optimization 2]
  A --> A3[Convex problems reliably solvable 3]
  A --> A4[Lipschitz functions ensure convergeability 4]
  A --> A5[What makes function convex? 5]
  A --> A6[Convexity-preserving operations 6]

  Main --> B[Gradient-based Methods]
  B --> B1[Gradient methods handle large dimensions 8]
  B --> B2[Gradient descent convergence factors 9]
  B --> B3[Nesterov method outperforms gradient descent 10]
  B --> B4[Line search, derivative checks 11]
  B --> B5[Newtons method: quadratic convergence 12]
  B --> B6[Practical second-order optimization adjustments 13]

  Main --> C[Stochastic Methods]
  C --> C1[Stochastic gradients: cheap but noisy 14]
  C --> C2[Stochastic gradients require smaller steps 15]
  C --> C3[Average iterates for noisy gradients 16]
  C --> C4[Stochastic Newton: evolving theory 17]
  C --> C5[Finite sums: linear convergence 18]
  C --> C6[SAG: linear convergence using history 19]

  Main --> D[Advanced Techniques]
  D --> D1[SVRG: convergence without saved gradients 20]
  D --> D2[Reduce SAG/SVRG storage requirements 21]
  D --> D3[Clever sampling improves SAG/SVRG 22]
  D --> D4[Smoothing tackles non-smooth problems 23]
  D --> D5[Projected gradients for constraints 24]
  D --> D6[Proximal methods handle composite problems 25]

  Main --> E[Special Methods]
  E --> E1[Efficient proximal operators exist 26]
  E --> E2[Proximal methods: stochastic/Newton variants 27]
  E --> E3[ADMM splits complex constraints 28]
  E --> E4[Frank-Wolfe: proximal method alternative 29]
  E --> E5[Duality: smooth problem transformation 30]
  E --> E6[Interior point methods inefficient 7]

  class Main main
  class A,A1,A2,A3,A4,A5,A6 basics
  class B,B1,B2,B3,B4,B5,B6 gradients
  class C,C1,C2,C3,C4,C5,C6 stochastic
  class D,D1,D2,D3,D4,D5,D6,E,E1,E2,E3,E4,E5,E6 advanced

Resume:

1.- Loss function plus regularization.
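
In symbols, the template is min_w (1/n) * sum_i f_i(w) + lambda * r(w). A minimal numpy sketch of one common instance, L2-regularized logistic regression (function and variable names are mine, not the tutorial's):

```python
import numpy as np

def logreg_objective(w, X, y, lam):
    """(1/n) * sum_i log(1 + exp(-y_i * x_i^T w)) + (lam/2) * ||w||^2, with y_i in {-1, +1}."""
    margins = y * (X @ w)
    loss = np.mean(np.logaddexp(0.0, -margins))   # numerically stable log(1 + e^{-m})
    return loss + 0.5 * lam * np.dot(w, w)

def logreg_gradient(w, X, y, lam):
    """Gradient of the objective above."""
    margins = y * (X @ w)
    probs = np.exp(-np.logaddexp(0.0, margins))   # sigma(-m) = 1 / (1 + e^{m}), computed stably
    return -(X.T @ (y * probs)) / len(y) + lam * w
```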

2.- Large datasets require tailored optimization.

3.- Convex problems: reliably solvable.

4.- Lipschitz continuity assumptions ensure provable convergence.

5.- What makes a function convex?
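
For reference, the two standard definitions behind items 4 and 5 (textbook statements, not slide-specific notation):

```latex
% Convexity (item 5): for all x, y and \theta \in [0, 1],
f(\theta x + (1-\theta) y) \;\le\; \theta f(x) + (1-\theta) f(y),
\quad \text{equivalently } \nabla^2 f(x) \succeq 0 \text{ when } f \text{ is twice differentiable.}

% Lipschitz-continuous gradient with constant L (item 4):
\|\nabla f(x) - \nabla f(y)\| \;\le\; L\,\|x - y\|
\;\;\Longrightarrow\;\;
f(y) \;\le\; f(x) + \nabla f(x)^\top (y - x) + \tfrac{L}{2}\,\|y - x\|^2 .
```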

6.- Convexity-preserving function operations.

7.- Interior-point methods are inefficient at this scale.

8.- Gradient methods handle large dimensions.

9.- Gradient descent convergence depends on smoothness, strong convexity, and the step size.

10.- Nesterov's accelerated method beats gradient descent.
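
A minimal sketch of items 9-10, assuming an L-smooth objective exposed through a gradient oracle grad (names are illustrative; the t_k momentum schedule below is one common variant, my choice here):

```python
import numpy as np

def gradient_descent(grad, x0, L, iters=100):
    """Plain gradient descent with the standard 1/L step size."""
    x = x0.copy()
    for _ in range(iters):
        x = x - (1.0 / L) * grad(x)
    return x

def nesterov_accelerated(grad, x0, L, iters=100):
    """Nesterov's accelerated gradient with the t_k momentum schedule."""
    x, y, t = x0.copy(), x0.copy(), 1.0
    for _ in range(iters):
        x_next = y - (1.0 / L) * grad(y)
        t_next = 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * t * t))
        y = x_next + ((t - 1.0) / t_next) * (x_next - x)   # momentum extrapolation
        x, t = x_next, t_next
    return x
```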

11.- Line search, derivative checks.

12.- Newton's method: quadratic convergence.
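
A bare-bones Newton iteration, assuming gradient and Hessian oracles are available (names are illustrative; practical solvers add the line searches and Hessian modifications of items 11 and 13):

```python
import numpy as np

def newton(grad, hess, x0, iters=20, damping=1e-8):
    """Newton's method: solve H d = g and step to x - d each iteration.
    A tiny damping term keeps the linear system well posed."""
    x = x0.copy()
    for _ in range(iters):
        g = grad(x)
        H = hess(x) + damping * np.eye(x.size)
        x = x - np.linalg.solve(H, g)
    return x
```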

13.- Practical 2nd-order optimization tweaks.

14.- Stochastic gradients: cheap but noisy.

15.- Stochastic gradients need smaller steps.

16.- Average iterates for noisy gradients.
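
Items 14-16 in one sketch: plain stochastic gradient with an O(1/sqrt(k)) step size and a running average of the iterates (the oracle grad_i and the step schedule are my illustrative assumptions):

```python
import numpy as np

def sgd_with_averaging(grad_i, n, x0, alpha0=1.0, iters=10000, seed=0):
    """grad_i(x, i) returns the gradient of example i's loss at x."""
    rng = np.random.default_rng(seed)
    x = x0.copy()
    x_bar = np.zeros_like(x0, dtype=float)
    for k in range(1, iters + 1):
        i = rng.integers(n)                            # one cheap, noisy gradient (item 14)
        x = x - (alpha0 / np.sqrt(k)) * grad_i(x, i)   # shrinking steps (item 15)
        x_bar += (x - x_bar) / k                       # running iterate average (item 16)
    return x_bar
```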

17.- Stochastic Newton: theory evolving.

18.- Finite sums: linear convergence.

19.- SAG: linear convergence using history.
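
A sketch of the SAG update, assuming the same per-example gradient oracle grad_i (illustrative, not the authors' reference code):

```python
import numpy as np

def sag(grad_i, n, x0, step, iters=10000, seed=0):
    """Keep the most recent gradient of every example and step along their
    average; gives linear convergence on smooth, strongly convex finite sums."""
    rng = np.random.default_rng(seed)
    x = x0.copy()
    memory = np.zeros((n, x0.size))            # per-example gradient table ("history")
    g_sum = np.zeros_like(x0, dtype=float)     # running sum of the table
    for _ in range(iters):
        i = rng.integers(n)
        g_new = grad_i(x, i)
        g_sum += g_new - memory[i]             # swap example i's old gradient for the new one
        memory[i] = g_new
        x = x - step * g_sum / n
    return x
```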

20.- SVRG: convergence without saved gradients.
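
A sketch of SVRG under the same assumptions; note that only a snapshot point and one full gradient per epoch are kept, not a table of per-example gradients:

```python
import numpy as np

def svrg(grad_i, full_grad, n, x0, step, epochs=20, m=None, seed=0):
    """Variance-reduced SGD: one full gradient per epoch at a snapshot point
    replaces the per-example gradient table that SAG stores."""
    rng = np.random.default_rng(seed)
    m = 2 * n if m is None else m        # common inner-loop length
    x_ref = x0.copy()
    for _ in range(epochs):
        mu = full_grad(x_ref)            # full gradient at the snapshot
        x = x_ref.copy()
        for _ in range(m):
            i = rng.integers(n)
            x = x - step * (grad_i(x, i) - grad_i(x_ref, i) + mu)
        x_ref = x                        # update the snapshot
    return x_ref
```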

21.- Reduce SAG/SVRG storage needs.

22.- Clever sampling improves SAG/SVRG.

23.- Smoothing tackles non-smooth problems.
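
One standard smoothing example (my choice of illustration, not necessarily the slides' example): the Huber function, a smooth surrogate for the absolute value with a 1/mu-Lipschitz gradient:

```python
import numpy as np

def huber(t, mu=0.1):
    """Smooth surrogate for |t|: quadratic for |t| <= mu, linear beyond."""
    t = np.asarray(t, dtype=float)
    return np.where(np.abs(t) <= mu, t * t / (2.0 * mu), np.abs(t) - mu / 2.0)

def huber_grad(t, mu=0.1):
    """Derivative: t/mu near zero, sign(t) in the tails; Lipschitz constant 1/mu."""
    t = np.asarray(t, dtype=float)
    return np.where(np.abs(t) <= mu, t / mu, np.sign(t))
```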

24.- Projected gradients for constraints.
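
A minimal projected-gradient sketch, using Euclidean projection onto an L2 ball as the example constraint (my choice of set; any set with a cheap projection works the same way):

```python
import numpy as np

def project_l2_ball(x, radius=1.0):
    """Euclidean projection onto {x : ||x|| <= radius}."""
    norm = np.linalg.norm(x)
    return x if norm <= radius else (radius / norm) * x

def projected_gradient(grad, project, x0, L, iters=200):
    """Take a gradient step, then project back onto the constraint set."""
    x = project(x0.copy())
    for _ in range(iters):
        x = project(x - (1.0 / L) * grad(x))
    return x
```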

25.- Proximal methods handle composite problems.

26.- Efficient proximal operators exist for many regularizers.
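
Items 25-26 combined in a sketch: the proximal operator of the L1 norm is soft-thresholding, which plugs directly into the proximal-gradient (ISTA-style) iteration (names and parameters are illustrative):

```python
import numpy as np

def soft_threshold(v, tau):
    """Proximal operator of tau * ||.||_1, computable coordinate-wise in O(d)."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def proximal_gradient(grad_smooth, x0, L, lam, iters=200):
    """Proximal gradient for min_x f(x) + lam * ||x||_1 with f L-smooth."""
    x = x0.copy()
    step = 1.0 / L
    for _ in range(iters):
        x = soft_threshold(x - step * grad_smooth(x), step * lam)
    return x
```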

27.- Proximal methods have stochastic and Newton-like variants.

28.- ADMM splits complex constraints.
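
A compact ADMM sketch for the lasso split min_x 0.5*||Ax - b||^2 + lam*||z||_1 subject to x = z (a standard textbook instance, not necessarily the slides' example):

```python
import numpy as np

def admm_lasso(A, b, lam, rho=1.0, iters=200):
    """Scaled-form ADMM: ridge-like x-update, soft-threshold z-update, dual ascent on u."""
    d = A.shape[1]
    x, z, u = np.zeros(d), np.zeros(d), np.zeros(d)   # u is the scaled dual variable
    AtA_rhoI = A.T @ A + rho * np.eye(d)              # factor once in practice
    Atb = A.T @ b
    for _ in range(iters):
        x = np.linalg.solve(AtA_rhoI, Atb + rho * (z - u))                 # x-update
        z = np.sign(x + u) * np.maximum(np.abs(x + u) - lam / rho, 0.0)    # z-update
        u = u + x - z                                                      # dual update
    return z
```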

29.- Frank-Wolfe: proximal method alternative.
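
A Frank-Wolfe sketch over the L1 ball, where the linear minimization oracle only needs one coordinate of the gradient and no projection or prox is ever computed (parameter choices are illustrative):

```python
import numpy as np

def frank_wolfe_l1(grad, d, radius=1.0, iters=200):
    """Frank-Wolfe: move toward the vertex of the L1 ball that minimizes <grad, s>."""
    x = np.zeros(d)
    for k in range(iters):
        g = grad(x)
        j = int(np.argmax(np.abs(g)))         # linear minimization oracle over the ball
        s = np.zeros(d)
        s[j] = -radius * np.sign(g[j])
        gamma = 2.0 / (k + 2.0)               # standard step-size schedule
        x = (1.0 - gamma) * x + gamma * s
    return x
```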

30.- Duality: transform the problem into a smooth one.

Knowledge Vault built by David Vivancos 2024