Knowledge Vault 6/19 - ICML 2016
The convex optimization, game-theoretic approach to learning
Elad Hazan & Satyen Kale
< Resume Image >

Concept Graph & Resume using Claude 3.5 Sonnet | Chat GPT4o | Llama 3:

graph LR
    classDef main fill:#f9d9c9, font-weight:bold, font-size:14px
    classDef basics fill:#d4f9d4, font-weight:bold, font-size:14px
    classDef algorithms fill:#d4d4f9, font-weight:bold, font-size:14px
    classDef advanced fill:#f9f9d4, font-weight:bold, font-size:14px
    classDef applications fill:#f9d4f9, font-weight:bold, font-size:14px
    Main[The convex optimization, game-theoretic approach to learning]
    Main --> A[Basics of Online Learning]
    Main --> B[Core Algorithms]
    Main --> C[Advanced Concepts]
    Main --> D[Applications and Extensions]
    A --> A1[Online learning: sequential decisions on adversarial data 1]
    A --> A2[Repeated game between learner and adversary 2]
    A --> A3[Online convex optimization: choose point, minimize cost 3]
    A --> A4[Online learning modeled as convex optimization 8]
    A --> A5[Bandit setting: only chosen decision loss observed 18]
    B --> B1[Weighted majority algorithm predicts, updates expert weights 4]
    B --> B2[Online gradient descent updates with negative gradient 6]
    B --> B3[FTRL adds regularization to optimization 13]
    B --> B4[FTRL equivalent to mirror descent algorithm 15]
    B --> B5[Follow-the-perturbed-leader adds random perturbations 16]
    B --> B6[AdaGrad adjusts learning rates per feature 17]
    C --> C1[Strong convexity allows faster convergence rates 9]
    C --> C2[Portfolio selection exhibits exp-concavity, allowing logarithmic regret 10]
    C --> C3[Newton methods: faster convergence for exp-concave losses 11]
    C --> C4[Exp3 achieves optimal multi-armed bandit regret 19]
    C --> C5[Linear losses: perturbations achieve optimal regret 20]
    C --> C6[Bandit setting: general lower and upper bounds 24]
    D --> D1[Bandit convex optimization via reduction to full information 21]
    D --> D2[Unbiased estimates plugged into FTRL 22]
    D --> D3[Stochastic optimization connects statistical and adversarial views 25]
    D --> D4[Bandit learning with arm structure: local approximations improve regret 26]
    D --> D5[Additional methods: robustness, unbounded hypothesis spaces 28]
    D --> D6[Online learning insights apply to reinforcement learning 29]
    class Main main
    class A,A1,A2,A3,A4,A5 basics
    class B,B1,B2,B3,B4,B5,B6 algorithms
    class C,C1,C2,C3,C4,C5,C6 advanced
    class D,D1,D2,D3,D4,D5,D6 applications

Resume:

1.- Online learning involves making decisions sequentially based on adversarially chosen data. Examples include spam filtering, portfolio selection, recommendation systems, and ad selection.

2.- Online learning is modeled as a repeated game between a learner and an adversary, with the goal of minimizing regret.
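
For reference, regret is the learner's cumulative loss minus that of the best fixed decision in hindsight. Writing it in the standard notation (decisions x_t in a convex set K, convex losses f_t, assumed here since the summary does not fix notation):

\mathrm{Regret}_T = \sum_{t=1}^{T} f_t(x_t) - \min_{x \in K} \sum_{t=1}^{T} f_t(x)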

3.- Online convex optimization involves the learner choosing a point in a convex set and the adversary choosing a convex cost function.

4.- The weighted majority algorithm assigns weights to experts, predicts based on the weighted majority, and updates weights based on expert performance.

5.- The weighted majority algorithm guarantees total errors are at most about twice the errors of the best expert plus a logarithmic term.
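
A minimal sketch of the deterministic weighted majority rule for binary prediction with N experts. The penalty parameter eta and the function name are illustrative choices, not taken from the tutorial:

import numpy as np

def weighted_majority(expert_preds, outcomes, eta=0.5):
    """Deterministic weighted majority for binary prediction.

    expert_preds: (T, N) array of expert predictions in {0, 1}
    outcomes:     (T,) array of true outcomes in {0, 1}
    eta:          multiplicative penalty applied to experts that err
    Returns the learner's mistake count and the final expert weights.
    """
    T, N = expert_preds.shape
    w = np.ones(N)                       # start with uniform weights
    mistakes = 0
    for t in range(T):
        # predict with the weighted majority vote of the experts
        vote_for_one = w @ expert_preds[t]
        pred = 1 if vote_for_one >= w.sum() / 2 else 0
        mistakes += int(pred != outcomes[t])
        # multiplicatively penalize experts that erred this round
        wrong = expert_preds[t] != outcomes[t]
        w[wrong] *= (1 - eta)
    return mistakes, w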

6.- Online gradient descent involves updating by taking a step in the negative gradient direction and projecting back onto the convex set.

7.- Online gradient descent guarantees regret bounded by the square root of the number of iterations, which is optimal in general.
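
A minimal sketch of online gradient descent with the usual eta_t proportional to 1/sqrt(t) step size. The Euclidean-ball decision set, the step-size constant, and the gradient-oracle interface are illustrative assumptions:

import numpy as np

def project_to_ball(x, radius=1.0):
    """Euclidean projection onto a ball of given radius (example convex set)."""
    norm = np.linalg.norm(x)
    return x if norm <= radius else x * (radius / norm)

def online_gradient_descent(grad_oracle, dim, T, eta0=0.1):
    """grad_oracle(t, x) returns the gradient of the round-t loss at x."""
    x = np.zeros(dim)
    iterates = []
    for t in range(1, T + 1):
        iterates.append(x.copy())
        g = grad_oracle(t, x)
        # gradient step with decaying step size, then project back onto the set
        x = project_to_ball(x - (eta0 / np.sqrt(t)) * g)
    return iterates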

8.- Online learning is modeled as an online convex optimization problem. Examples are formulated in this framework.

9.- Strong convexity of loss functions allows faster convergence rates, such as logarithmic regret instead of square root regret.
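
For concreteness, the bounds usually quoted (e.g., in Hazan's Introduction to Online Convex Optimization; exact constants vary by reference) for G-Lipschitz losses over a set of diameter D, and for alpha-strongly convex losses, are roughly:

% general convex losses, step size \eta_t = D/(G\sqrt{t}):
\mathrm{Regret}_T \le \tfrac{3}{2}\, G D \sqrt{T}
% \alpha-strongly convex losses, step size \eta_t = 1/(\alpha t):
\mathrm{Regret}_T \le \frac{G^2}{2\alpha}\,(1 + \log T)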

10.- The portfolio selection problem, with logarithmic loss functions, exhibits exp-concavity which allows logarithmic regret algorithms.
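
As a reminder of the standard definition being invoked (not a quote from the slides): a loss f is alpha-exp-concave if exp(-alpha f(x)) is concave, and the logarithmic portfolio loss is the canonical example, with r_t the vector of price relatives and x a portfolio on the simplex:

f \ \text{is } \alpha\text{-exp-concave} \iff x \mapsto e^{-\alpha f(x)} \ \text{is concave}
f_t(x) = -\log\left(r_t^{\top} x\right)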

11.- Newton-style second-order methods, such as the Online Newton Step, achieve faster convergence (logarithmic regret) for exp-concave and strongly convex loss functions.
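
The update usually meant here is the Online Newton Step of Hazan, Agarwal, and Kale, sketched below in standard notation (gamma depends on the exp-concavity constant, A_0 = epsilon I, and the projection is with respect to the norm induced by A_t):

A_t = A_{t-1} + \nabla_t \nabla_t^{\top}, \qquad
y_{t+1} = x_t - \tfrac{1}{\gamma} A_t^{-1} \nabla_t, \qquad
x_{t+1} = \Pi_{\mathcal{K}}^{A_t}\!\left(y_{t+1}\right)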

12.- Follow-the-leader, which plays the best decision so far, is unstable. Regularization makes it stable.
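
A tiny simulation of the textbook instability example (alternating linear losses on [-1, 1]; this particular construction is the standard one, not necessarily the exact example shown in the tutorial). Follow-the-leader flips between the endpoints and suffers regret that grows linearly with T:

import numpy as np

# Linear losses f_t(x) = c_t * x on the interval [-1, 1].
# c_1 = 0.5, then the coefficients alternate -1, +1, -1, ...
T = 1000
c = np.array([0.5] + [(-1.0) ** t for t in range(1, T)])

ftl_loss = 0.0
cum = 0.0
for t in range(T):
    x = -1.0 if cum > 0 else 1.0        # FTL: minimize cumulative loss seen so far
    ftl_loss += c[t] * x
    cum += c[t]

best_fixed = min(c.sum() * x for x in (-1.0, 1.0))  # best single point in hindsight
print(ftl_loss - best_fixed)            # regret is roughly T, i.e. linear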

13.- Follow-the-regularized-leader (FTRL) adds a regularization term to the optimization. Common regularizers are L2 and entropy.
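
Written out in the usual notation (linearized losses g_s = grad f_s(x_s), regularizer R, learning rate eta; this is the standard formulation rather than a verbatim quote):

x_{t+1} = \arg\min_{x \in \mathcal{K}} \left\{ \eta \sum_{s=1}^{t} g_s^{\top} x + R(x) \right\}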

14.- FTRL with L2 regularization is equivalent to online gradient descent. FTRL with entropy regularization recovers the weighted majority algorithm.

15.- FTRL is equivalent to the mirror descent algorithm, which alternates between gradient update and Bregman projection.
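
The mirror descent step referred to here, in its standard form with the Bregman divergence B_R induced by the regularizer R (standard notation, assumed rather than quoted):

\nabla R(y_{t+1}) = \nabla R(x_t) - \eta \nabla f_t(x_t), \qquad
x_{t+1} = \arg\min_{x \in \mathcal{K}} B_R\!\left(x \,\|\, y_{t+1}\right)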

16.- Follow-the-perturbed-leader adds random perturbations instead of regularization. It achieves the same regret bounds but only requires linear optimization.
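
A minimal sketch of follow-the-perturbed-leader in the experts setting, assuming exponentially distributed perturbations; the perturbation scale epsilon and the interface are illustrative:

import numpy as np

def follow_the_perturbed_leader(losses, epsilon=0.1, seed=0):
    """losses: (T, N) array of expert losses; returns the chosen expert per round.

    Each round, pick the expert minimizing cumulative past loss minus a random
    perturbation -- only a linear optimization (an argmin) is needed per round.
    """
    rng = np.random.default_rng(seed)
    T, N = losses.shape
    cum = np.zeros(N)
    choices = []
    for t in range(T):
        perturbation = rng.exponential(scale=1.0 / epsilon, size=N)
        choices.append(int(np.argmin(cum - perturbation)))
        cum += losses[t]
    return choices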

17.- AdaGrad is an adaptive regularization method that adjusts learning rates per feature, improving convergence for sparse features.
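
A sketch of the diagonal form of AdaGrad usually meant here: per-coordinate accumulation of squared gradients scales each step, so rarely active (sparse) features keep larger effective learning rates. The base rate and epsilon below are illustrative defaults:

import numpy as np

class DiagonalAdaGrad:
    """Diagonal AdaGrad: x <- x - eta * g / sqrt(sum of past g^2 + eps)."""

    def __init__(self, dim, eta=0.1, eps=1e-8):
        self.x = np.zeros(dim)
        self.sq_grad_sum = np.zeros(dim)   # per-feature accumulator
        self.eta, self.eps = eta, eps

    def step(self, grad):
        self.sq_grad_sum += grad ** 2
        self.x -= self.eta * grad / np.sqrt(self.sq_grad_sum + self.eps)
        return self.x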

18.- In the bandit setting, only the loss of the chosen decision is observed, not the entire loss vector.

19.- The Exp3 algorithm achieves optimal regret bounds for the multi-armed bandit problem by combining exploration and exploitation.
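
A compact sketch of Exp3 for K arms with losses in [0, 1]: sample an arm from the exponential-weights distribution mixed with uniform exploration, then feed an importance-weighted unbiased loss estimate back into the weights. The learning rate and exploration mix below are illustrative, not the tuned constants from the analysis:

import numpy as np

def exp3(loss_fn, K, T, eta=0.05, gamma=0.05, seed=0):
    """loss_fn(t, arm) returns the observed loss in [0, 1] of the pulled arm."""
    rng = np.random.default_rng(seed)
    log_w = np.zeros(K)
    for t in range(T):
        w = np.exp(log_w - log_w.max())
        p = (1 - gamma) * w / w.sum() + gamma / K   # explore/exploit mixture
        arm = rng.choice(K, p=p)
        loss = loss_fn(t, arm)
        est = np.zeros(K)
        est[arm] = loss / p[arm]                    # unbiased importance-weighted estimate
        log_w -= eta * est                          # exponential-weights update
    w = np.exp(log_w - log_w.max())
    return w / w.sum()                              # final arm distribution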

20.- For linear losses and arbitrary action sets, perturbations or self-concordant barrier regularization achieve optimal regret.

21.- The bandit part of the tutorial develops algorithms for bandit convex optimization, using a reduction to the full-information setting.

22.- The idea is to construct unbiased estimates of the unobserved loss (or gradient) vectors through exploration and plug them into an FTRL-style update.

23.- For general convex losses and arbitrary action sets, gradient estimates obtained by random perturbations, combined with regularization, yield sublinear-regret bandit algorithms.

24.- In the bandit setting, the results provide a general lower bound, and a matching general upper bound can be achieved.

25.- The related stochastic optimization problem connects the statistical (i.i.d.) assumptions of that setting with the adversarial environment assumed in online learning.

26.- In bandit learning with information about the relationships between the arms, local approximations to the loss function can be used to improve regret bounds.

27.- For "easy" loss objectives that depend on the distance between parameters, nice parametric applications with sub-linear dependence on the number of parameters can be obtained.

28.- Additional methods provide robustness guarantees and the ability to learn models over unbounded hypothesis spaces.

29.- Insights from online learning can also be applied to reinforcement learning problems such as recurrent policy control and policy gradient methods.

30.- This lecture provided an in-depth look at the field of online learning, its main models, methods, and applications.

Knowledge Vault built by David Vivancos 2024