Concept Graph & Resume using Claude 3.5 Sonnet | ChatGPT-4o | Llama 3:
Resume:
1.- Introduction to structured prediction and joint prediction problems. Avoiding the term "structured prediction" due to association with conditional random fields.
2.- Examples of joint prediction problems: sequence labeling, parsing, matching, machine translation, image segmentation, protein folding.
3.- Approaches to joint prediction: independent predictions, multi-task learning, graphical models, handcrafted solutions. Pros and cons of each.
4.- Characterizing joint prediction problems: learning to optimize a joint loss over discrete predictions, exponential output space, need for output decomposition.
5.- Issues with joint prediction: huge output space, combinatorial loss functions, need to assume output decomposes usefully to make progress.
6.- Sequential decision-making formulation for problems with ordered output decomposition. Parallels to reinforcement learning.
7.- Warm-up 1: Vanilla supervised learning approach to solving joint prediction. Drawbacks due to train/test distribution mismatch.
8.- Warm-up 2: Imitation learning with the DAgger algorithm. Repeatedly queries the expert on the learner's own trajectories and retrains on the aggregated data. Provides theoretical guarantees (see the DAgger sketch after this list).
9.- Comparing imitation learning to reinforcement learning. Access to labeled training data and an optimal reference policy gives imitation learning an advantage.
10.- Limitations of imitation learning: Doesn't capture varying costs of different wrong predictions. Requires too much expert access.
11.- AggreVaTe algorithm: Rolls in with the learned policy, makes one-step deviations, rolls out with the reference policy, and generates cost-sensitive classification examples (see the roll-in/roll-out sketch after this list).
12.- Relationship between multi-class and cost-sensitive classification. Goal is minimizing the expected cost of predictions rather than just the expected number of errors (see the cost-vector example after this list).
13.- Roll-in, roll-out, and one-step deviations in AggreVaTe. Uses the reference policy for roll-outs.
14.- Efficiently calculating losses without explicit roll-outs when using an expert reference policy, by exploiting loss decomposition (e.g. Hamming loss).
15.- Example 1: Sequence labeling with AggreVaTe. Picking a time step, trying all labels, rolling out with the expert, and collecting cost-sensitive examples.
16.- Leveraging loss decomposition in sequence labeling to avoid roll-outs entirely, gaining a computational advantage. Only the number of incorrect past decisions needs to be counted (see the Hamming-loss sketch after this list).
17.- Example 2: Graph labeling with AggreVaTe. Linearizing the graph traversal in a belief propagation-like manner, using a breadth-first search ordering (see the BFS linearization sketch after this list).
18.- Making predictions on a node given its neighbors' information. Multiple inward and outward passes provide interior and exterior information.
19.- Constructing cost-sensitive examples for a node by trying all label perturbations. Hamming loss enables avoiding roll-outs.
20.- Using an online cost-sensitive classifier in the inner loop of AggreVaTe. The one-against-all strategy is common in practice (see the sketch after this list).
21.- Convergence properties of AggreVaTe and its relation to empirical risk minimization. Covered in the second half of the tutorial.
22.- Handling non-optimal experts in learning to search. Recent work addresses this (paper being presented at the conference).
23.- Comparing learning to search and conditional random fields (CRFs). High-level similarities but differing approaches and optimization objectives.
24.- Learning to search reduces the problem to prediction (classification) as a primitive, while CRFs reduce it to probability modeling. The two are hard to compare directly.
25.- Sensitivity of graph labeling results to the linearization order. Multiple inward/outward passes make the order matter less.
26.- Relationship between learning to search and recurrent neural networks. Orthogonal: one is algorithmic, the other representational. Potentially complementary.
27.- Online cost-sensitive classification as a sub-component of AggreVaTe. The choice of classifier affects final prediction quality.
28.- Possibility of optimizing non-decomposable loss functions like F-score using learning to search, but requiring explicit roll-outs.
29.- Decomposability of the loss function enables computational shortcuts in AggreVaTe when using an expert for roll-outs. Hamming loss is an example.
30.- Further discussion of connections to reinforcement learning, handling non-optimal experts, and recurrent neural networks in the second half of the tutorial.
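
Sketch for item 8: a minimal version of the DAgger loop, assuming hypothetical interfaces (run_episode, train_classifier, expert_policy) that stand in for whatever environment and learner are actually used; this is an illustration of the idea, not the tutorial's code.

```python
# Minimal DAgger sketch (item 8): run the current learned policy, ask the expert
# what it would have done in the states the learner actually visited, aggregate
# those corrections, and retrain. All interfaces here are hypothetical.

def dagger(expert_policy, train_classifier, run_episode, n_iterations=10):
    dataset = []                      # aggregated (state, expert_action) pairs
    policy = expert_policy            # iteration 0: behave like the expert
    for _ in range(n_iterations):
        states = run_episode(policy)                          # states visited by the current policy
        dataset += [(s, expert_policy(s)) for s in states]    # expert labels on the learner's states
        policy = train_classifier(dataset)                    # retrain on the aggregated dataset
    return policy
```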
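Sketch for items 11 and 13: how one cost-sensitive example is generated by rolling in with the learned policy, deviating for one step, and rolling out with the reference policy. The roll_in, roll_out, and loss arguments are hypothetical placeholders under the assumption of a fixed-horizon problem.

```python
# AggreVaTe sketch (items 11-13): roll in with the learned policy up to a random
# time step, try every action there (one-step deviation), roll out with the
# reference/expert policy, and record the resulting joint losses as one
# cost-sensitive classification example. Interfaces are hypothetical.

import random

def aggrevate_example(learned_policy, reference_policy, actions,
                      roll_in, roll_out, loss, horizon):
    t = random.randrange(horizon)              # pick a deviation time step
    state = roll_in(learned_policy, t)         # roll in with the learned policy
    costs = {}
    for a in actions:                          # one-step deviation: try each action
        final_output = roll_out(reference_policy, state, a)   # expert finishes the trajectory
        costs[a] = loss(final_output)          # joint loss of the completed output
    return state, costs                        # one cost-sensitive example
```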
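Example for item 12: a small, assumed illustration of how multi-class classification is the special case of cost-sensitive classification with 0/1 costs.

```python
# Multi-class vs cost-sensitive (item 12): a multi-class example with correct
# label y is a cost-sensitive example whose cost vector is 0 for y and 1 for
# every other label; cost-sensitive learning lets different mistakes carry
# different penalties.

def multiclass_as_cost_sensitive(y, label_set):
    return {a: 0.0 if a == y else 1.0 for a in label_set}

print(multiclass_as_cost_sensitive('V', {'N', 'V', 'ADJ'}))
# e.g. {'N': 1.0, 'V': 0.0, 'ADJ': 1.0}
```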
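Sketch for items 14-16: why Hamming loss removes the need for roll-outs when the reference policy is an optimal expert. The function and the toy label set are illustrative assumptions.

```python
# Sequence-labeling shortcut (items 14-16): with Hamming loss and an optimal
# expert for roll-outs, the expert makes no future mistakes, so the cost of
# predicting label `a` at step t is just (past mistakes) + [a != y_t].
# No roll-outs are needed; the past-mistake count is shared by all labels.

def hamming_costs(predicted_prefix, gold_sequence, t, label_set):
    past_mistakes = sum(p != y for p, y in zip(predicted_prefix[:t], gold_sequence[:t]))
    # cost vector over candidate labels at position t
    return {a: past_mistakes + (a != gold_sequence[t]) for a in label_set}

# Example: gold = ['N', 'V', 'N'], learner predicted ['N', 'N'] so far, t = 2
costs = hamming_costs(['N', 'N'], ['N', 'V', 'N'], 2, {'N', 'V'})
# -> {'N': 1, 'V': 2}: one past mistake, plus whether the current label is wrong
```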
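Sketch for items 17-18: one simple way to linearize a graph with a breadth-first ordering so graph labeling becomes a sequence of predictions; the adjacency-list representation and example graph are assumptions for illustration.

```python
# Graph-labeling linearization sketch (items 17-18): order the nodes by a
# breadth-first traversal so the graph labeling problem becomes a sequence of
# predictions, each conditioned on already-visited neighbors. Multiple
# inward/outward passes can then revisit nodes with more context.

from collections import deque

def bfs_order(adjacency, root):
    order, seen, queue = [], {root}, deque([root])
    while queue:
        node = queue.popleft()
        order.append(node)
        for neighbor in adjacency[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return order

# Example: predictions would be made in the order returned here.
graph = {0: [1, 2], 1: [0, 3], 2: [0], 3: [1]}
print(bfs_order(graph, 0))   # [0, 1, 2, 3]
```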
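Sketch for item 20: a one-against-all cost-sensitive classifier built from one cost regressor per label. scikit-learn's SGDRegressor is used only as a convenient stand-in for whatever online learner is actually plugged into the inner loop.

```python
# One-against-all cost-sensitive classification sketch (item 20): train one
# regressor per label to predict the cost of choosing that label; at test time
# pick the label with the smallest predicted cost.

import numpy as np
from sklearn.linear_model import SGDRegressor

class OneAgainstAllCostSensitive:
    def __init__(self, labels):
        self.regressors = {a: SGDRegressor() for a in labels}

    def update(self, features, costs):
        x = np.asarray(features).reshape(1, -1)
        for a, cost in costs.items():                 # one regression example per label
            self.regressors[a].partial_fit(x, [cost])

    def predict(self, features):
        x = np.asarray(features).reshape(1, -1)
        return min(self.regressors, key=lambda a: self.regressors[a].predict(x)[0])
```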
Knowledge Vault built by David Vivancos 2024