Concept Graph & Resume using Claude 3.5 Sonnet | ChatGPT-4o | Llama 3:
Resume:
1.- Online learning involves making decisions sequentially based on adversarially chosen data. Examples include spam filtering, portfolio selection, recommendation systems, and ad selection.
2.- Online learning is modeled as a repeated game between a learner and an adversary, with the goal of minimizing regret.
3.- Online convex optimization involves the learner choosing a point in a convex set and the adversary choosing a convex cost function.
4.- The weighted majority algorithm assigns weights to experts, predicts based on the weighted majority, and updates weights based on expert performance (a code sketch follows the list).
5.- The weighted majority algorithm guarantees total errors are at most about twice the errors of the best expert plus a logarithmic term.
6.- Online gradient descent involves updating by taking a step in the negative gradient direction and projecting back onto the convex set (sketched in code after the list).
7.- Online gradient descent guarantees regret on the order of the square root of the number of iterations, which is optimal in general.
8.- Online learning is modeled as an online convex optimization problem. Examples are formulated in this framework.
9.- Strong convexity of loss functions allows faster convergence rates, such as logarithmic regret instead of square-root regret (see the sketch after the list).
10.- The portfolio selection problem, with logarithmic loss functions, exhibits exp-concavity which allows logarithmic regret algorithms.
11.- Newton-style second-order methods can achieve faster convergence for exp-concave loss functions and strongly convex regularizers.
12.- Follow-the-leader, which plays the best decision so far, is unstable. Regularization makes it stable.
13.- Follow-the-regularized-leader (FTRL) adds a regularization term to the optimization. Common regularizers are L2 and entropy (an entropy-regularized sketch follows the list).
14.- FTRL with L2 regularization is equivalent to online gradient descent. FTRL with entropy regularization recovers the weighted majority algorithm.
15.- FTRL is equivalent to the mirror descent algorithm, which alternates between gradient update and Bregman projection.
16.- Follow-the-perturbed-leader adds random perturbations instead of regularization. It achieves the same regret bounds but only requires linear optimization (sketched after the list).
17.- AdaGrad is an adaptive regularization method that adjusts learning rates per feature, improving convergence for sparse features (a sketch follows the list).
18.- In the bandit setting, only the loss of the chosen decision is observed, not the entire loss vector.
19.- The Exp3 algorithm achieves optimal regret bounds for the multi-armed bandit problem by combining exploration and exploitation (sketched after the list).
20.- For linear losses and arbitrary action sets, perturbations or self-concordant barrier regularization achieve optimal regret.
21.- This line of work develops algorithms for bandit convex optimization, using randomized exploration in place of the missing gradient information.
22.- The idea is to construct unbiased estimates of the loss or gradient vectors through exploration and plug them into an FTRL-style algorithm.
23.- For convex losses and arbitrary action sets, perturbation or self-concordant barrier regularization techniques can again achieve near-optimal regret.
24.- In the bandit setting, the results give a general lower bound, and with enough information a matching upper bound can be achieved.
25.- The related stochastic optimization problem connects the statistical (i.i.d.) assumptions of classical learning with the adversarial assumptions of online learning.
26.- In multi-armed online learning with information about the relationships between the arms, local approximations to the loss function can be used to improve regret bounds.
27.- For "easy" loss objectives that depend on the distance between parameters, nice parametric applications with sub-linear dependence on the number of parameters can be obtained.
28.- Additional methods have robustness properties and can learn models in unbounded hypothesis spaces.
29.- Insights from online learning can also be applied to reinforcement learning problems such as policy control and policy gradient methods.
30.- This lecture provided an in-depth look at the field of online learning, its main models, methods, and applications.
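
A minimal sketch of the weighted majority update from items 4-5, assuming binary expert predictions and a multiplicative penalty of 1 - eta; the function name, array shapes, and eta value are illustrative choices, not taken from the lecture:

    import numpy as np

    def weighted_majority(expert_preds, labels, eta=0.5):
        # expert_preds: (T, n) array of 0/1 expert predictions; labels: (T,) 0/1 outcomes.
        T, n = expert_preds.shape
        w = np.ones(n)                                 # one weight per expert
        mistakes = 0
        for t in range(T):
            vote_one = w[expert_preds[t] == 1].sum()
            vote_zero = w[expert_preds[t] == 0].sum()
            pred = 1 if vote_one >= vote_zero else 0   # weighted majority vote
            mistakes += int(pred != labels[t])
            # multiplicatively penalize every expert that erred this round
            w = w * np.where(expert_preds[t] != labels[t], 1.0 - eta, 1.0)
        return mistakes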
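
A minimal sketch of online gradient descent with projection from items 6-7, assuming the caller supplies a gradient oracle and a Euclidean projection; the 1/sqrt(t) step size is one standard choice:

    import numpy as np

    def online_gradient_descent(grad, project, x0, T):
        # grad(t, x): gradient of the round-t loss at x; project(x): projection onto the convex set.
        x = np.array(x0, dtype=float)
        for t in range(1, T + 1):
            eta_t = 1.0 / np.sqrt(t)          # O(1/sqrt(t)) steps give O(sqrt(T)) regret
            x = project(x - eta_t * grad(t, x))
        return x

    def project_unit_ball(x):
        # example projection: the Euclidean unit ball
        norm = np.linalg.norm(x)
        return x if norm <= 1.0 else x / norm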
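
A sketch of the strongly convex case from item 9: for alpha-strongly-convex losses, shrinking the step size like 1/(alpha * t) is one standard way to obtain logarithmic regret; alpha and the oracle interfaces are assumptions of this sketch:

    import numpy as np

    def ogd_strongly_convex(grad, project, x0, T, alpha=1.0):
        # With alpha-strong convexity, step sizes 1/(alpha*t) improve regret from O(sqrt(T)) to O(log T).
        x = np.array(x0, dtype=float)
        for t in range(1, T + 1):
            x = project(x - grad(t, x) / (alpha * t))
        return x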
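
A sketch of FTRL with entropy regularization over the probability simplex from items 13-14, assuming linear losses; the regularized minimizer has a closed form, the exponential-weights distribution, which is how the weighted-majority-style update is recovered:

    import numpy as np

    def ftrl_entropy(loss_vectors, eta=0.1):
        # loss_vectors: (T, n) array; row t holds the linear loss of each of n experts at round t.
        T, n = loss_vectors.shape
        cum_loss = np.zeros(n)
        plays = []
        for t in range(T):
            # argmin over the simplex of <cum_loss, p> + (1/eta) * sum_i p_i log p_i
            # has the closed form p_i proportional to exp(-eta * cum_loss_i)
            logits = -eta * cum_loss
            p = np.exp(logits - logits.max())
            p /= p.sum()
            plays.append(p)
            cum_loss += loss_vectors[t]
        return plays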
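
A sketch of follow-the-perturbed-leader from item 16, assuming a linear optimization oracle over the decision set; the exponential perturbation is one common choice and its scale here is illustrative:

    import numpy as np

    def follow_the_perturbed_leader(loss_vectors, linear_oracle, scale=1.0, rng=None):
        # linear_oracle(c) returns the minimizer of <c, x> over the decision set;
        # only linear optimization is needed, no regularized projection.
        rng = np.random.default_rng() if rng is None else rng
        T, n = loss_vectors.shape
        cum_loss = np.zeros(n)
        plays = []
        for t in range(T):
            noise = rng.exponential(scale=scale, size=n)   # random perturbation of the leader
            plays.append(linear_oracle(cum_loss - noise))
            cum_loss += loss_vectors[t]
        return plays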
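
A sketch of diagonal AdaGrad from item 17; the base learning rate eta and the epsilon stabilizer are illustrative values:

    import numpy as np

    def adagrad(grad, x0, T, eta=0.1, eps=1e-8):
        # grad(t, x): gradient of the round-t loss at x. Each coordinate gets its own
        # effective step size eta / sqrt(sum of its past squared gradients), so
        # rarely-active (sparse) features keep larger steps.
        x = np.array(x0, dtype=float)
        sq_sum = np.zeros_like(x)
        for t in range(1, T + 1):
            g = grad(t, x)
            sq_sum += g * g
            x -= eta * g / (np.sqrt(sq_sum) + eps)
        return x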
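
A sketch of a loss-based Exp3 variant from items 18-19, assuming losses in [0, 1]; using a single parameter gamma for both exploration and the learning rate follows the classic presentation, but the exact constants here are illustrative:

    import numpy as np

    def exp3(pull_arm, n_arms, T, gamma=0.1, rng=None):
        # pull_arm(t, i) returns the loss in [0, 1] of arm i at round t;
        # only the chosen arm's loss is observed.
        rng = np.random.default_rng() if rng is None else rng
        weights = np.ones(n_arms)
        for t in range(T):
            # mix exponential weights (exploitation) with uniform exploration
            p = (1 - gamma) * weights / weights.sum() + gamma / n_arms
            arm = rng.choice(n_arms, p=p)
            loss = pull_arm(t, arm)
            # importance weighting keeps the full loss-vector estimate unbiased
            est = np.zeros(n_arms)
            est[arm] = loss / p[arm]
            weights *= np.exp(-gamma * est / n_arms)
            weights /= weights.max()            # rescale to avoid numerical underflow
        return weights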
Knowledge Vault built by David Vivancos 2024